How We Made Data Workers’ Inquiry

by Milagros Miceli, DWI’s Principal Investigator and Project Lead

This piece traces how Data Workers’ Inquiry came to life: not as a project with clear answers, but as a messy, stubborn attempt to put research in the hands of the people living the realities researchers usually write about.

I spent years researching data work, traveling from one side of the world to the other to collect stories about the invisible labor that fuels so-called AI technologies. In the slums of Buenos Aires, I sat with young people labeling data, dreaming of better futures that the work would never deliver. In Bulgaria, I met Syrian refugees, some barely out of childhood, labeling satellite imagery for drones and swarm technologies. They trained machines that, today, in times of warmongering and unprecedented surveillance, patrol the skies over their own hometowns.

I published papers about this research. I gave talks about these encounters. I won awards. But at some point, I realized the words were starting to rot in my mouth. No matter how carefully I wrote, how rigorously I cited, how I insisted on the political importance of data work, something was wrong: I had built a career on those data workers’ stories. But what did they get in return? Their working conditions didn’t change because of my research. Their lives didn’t improve because I documented their struggles.

That’s how I started to think about creating the Data Workers’ Inquiry (DWI) project: not as a study or a research project, but as an act of refusal. I didn’t want to speak for or about data workers anymore. I wanted to build a space where they could tell their own stories, and where research was a form of collective thinking and organizing, not extraction.

By mid-2023 the idea was still messy in my head. The only things I knew for sure were that I didn’t want to repeat the harmful research patterns I saw in academia, and that I couldn’t work on this project alone. So I approached then-PhD student Adio Dinika, who had extensive experience researching platform labor in Sub-Saharan Africa. Adio was instrumental in helping me make sense of my incipient ideas and turn them into a draft project proposal. Then Camilla Salim Wagner and Laurenz Sachenbacher joined, running day-to-day operations and troubleshooting endless hurdles to make the DWI project actually happen. Finally, we were joined by data worker and organizer Krystal Kauffman, who helped us stay true to our original purpose while making DWI fit academic expectations.

I was stubborn about one thing: community researchers would be treated and paid fairly—not as participants, but as researchers. Securing funding for this, and more specifically for inviting data workers to name the companies exploiting them and to organize against them if they chose to, was not easy. I pitched the idea to skeptical funders who nodded politely until they realized this wasn’t just another “responsible AI” project. Fortunately, my vision did resonate with some, with DAIR’s founder and director, Dr. Timnit Gebru, being the first to say, “Okay, I’ll help you make DWI happen.” And she did. One year ago today, on May Day, we announced the project with a short video teaser.

Finding data workers who would join us was both easier and harder than I expected. I was lucky to have forged durable relationships with data workers while conducting my earlier research. But explaining what I had in mind for this specific project was more challenging, especially because there was no template or previous research I could point to. At first, recruitment was slow, full of careful conversations and doubts on both sides. But once the first few community researchers joined, something shifted: word traveled through networks built in the shadows of platform labor, and data workers started reaching out to us directly.

Each community researcher designed their own inquiry: they chose the research questions they wanted to answer, the methods they would use to investigate them, and how they would present their findings. Some of the community researchers chose to have their full names disclosed in connection with their investigations, while others preferred total anonymity.

The inquiries resulted in a variety of artifacts, including podcasts, reports, video essays, vlogs, documentaries, and animated videos. The diversity of the outputs was beautiful. But it was also challenging to manage the different timelines, expertise, and support systems needed to produce content in each medium. We pulled in illustrators, translators, captioners, animators, and copy editors to help bring each inquiry to life. Before publication, we had local legal experts scrutinize each piece to ensure that none of the community researchers—who were located in different regions with varying legal jurisdictions—would be exposed to legal risks.

While supporting the creation of the pieces, I faced moments of doubt almost every day. How much should we intervene in the community researchers’ drafts? We wanted to strike a balance between creating artifacts legible to a broad audience and retaining the rawness with which the community researchers communicated their experiences. Every suggested edit felt like a small betrayal, a temptation to make the work “cleaner” and “more acceptable” at the expense of its authenticity. I learned to recognize when what was needed was a helping hand, and when it was better to step back and simply give space.

The stories the community researchers uncovered weren’t just data points—they were wounds, and delving into them wasn’t easy for the researchers themselves. At one point, I realized that the inquiry process was retraumatizing for some, so I brought in a trauma therapist to support them. It wasn’t a perfect solution, and I wish we had engaged the therapist earlier. But this was one of those things you only realize once you’re deep in the process.

And then there were moments when the line between research and organizing simply disappeared. Botlhokwa Ranta, a community researcher and former content moderator, found through her inquiry that a group of former Sama workers, mostly migrant women, were stranded in Nairobi after being fired without pay. To support these workers, we launched a fundraiser with our friends at Superrr, and Ranta helped eleven of the stranded women get home. We also stood by our co-researchers in Germany when they founded a works council to fight the exploitative working conditions at Telus, a data work provider for Meta and other tech giants. And when a group of data workers in Kenya decided to create a Data Labelers Association and asked for our support, we co-organized their launch event and asked our peer organizations to support them along with us.

Still, there were limits. And the existence of these limits was one of the hardest things I had to learn to accept. No matter how much we wanted to, we couldn’t fundamentally change the material conditions under which most data workers live. We couldn’t erase the violence of an industry built on global inequalities and exploited labor. All we could offer was solidarity, and that felt thin.

But I know we did this: we created a repository with 16 pieces by data workers from Kenya, Venezuela, Syria, Lebanon, Brazil, and Germany, each one of them with a unique focus. And we’re publishing eight more inquiries this year. We made every piece open access under a Creative Commons license, so that it could be reused, remixed, and carried forward. The repository has become an important resource for researchers, journalists, and policymakers looking for first-hand accounts of data work, and just as importantly, for other data workers searching for shared struggles and strategies of resistance. We hosted an online event series featuring our community researchers that was viewed by thousands, and we co-organized a data workers’ panel to testify at the European Parliament. Our work has made an impact.

The DWI project has exposed the deep-rooted exploitation at the heart of the AI industry: taking advantage of vulnerable groups like migrants, refugees, and gender minorities, and relying on tactics like wage theft, price discrimination, and workplace retaliation to keep the system running. Inquiries like Fasica Berhane’s revealed how social media platforms became instruments for spreading genocidal content during the 2020–2022 Tigray war, and how the platforms’ failure to hire enough moderators and protect those moderators’ mental health made it impossible to effectively stem the tide of hate, with devastating consequences, including the lynching of civilians. The audiovisual pieces by Yasser Alrayes and the pseudonymized worker “Ruba” showed the link between exploitative working conditions, worker alienation, insufficient training, and the quality of AI training data. The zine “The Unknown Women of Content Moderation” exposed the sexual, mental, and physical abuse endured by migrant women employed as data workers in Nairobi.

These pieces are just a few examples, but they are enough to show that we conducted research that mattered. Yes, it was messy, conflicted, slow, and full of contradictions. But it was research led by the very people whose realities were being studied. Research that prioritized action over publication, and solidarity over prestige.

The Data Workers’ Inquiry project didn’t fix data work or change the AI industry. But it did carve out a space where data workers could investigate their own exploitation, tell their stories in their own words, organize, and fight—on their own terms. And that was new. We created a blueprint for other researchers who, like me, don’t want to reproduce the violence of parachute research and are tired of academic environments that dress themselves in the language of neutrality and objectivity, while ultimately extracting workers’ pain and repackaging it for their own gain.

About the Author

Milagros Miceli

She is a sociologist and computer scientist investigating how ground-truth data for machine learning is produced. Her research focuses on labor conditions and power dynamics in data work. Since 2018, she has continuously engaged with communities of data workers globally.

She is the research lead at the DAIR Institute, head of the Data, Algorithmic Systems, and Ethics research unit at Weizenbaum-Institut, and a lecturer at TU Berlin. She is also the principal investigator of the Data Workers’ Inquiry project.
