Who Gets to Say Whether AI Works? A Call for Participatory Evaluations Across the Global South

When an AI system is evaluated for safety or performance, who is doing the evaluating, and whose reality are they measuring against?
Right now, the answer is largely: researchers and engineers, working in high-resource language settings, using benchmarks built from data that skews heavily toward the Global North. The result is a significant blind spot. AI systems can appear to perform well under standard evaluation conditions while failing, sometimes seriously, when deployed in different linguistic, cultural, or institutional contexts. These failures don't always show up in benchmarks.
Fixing this requires a fundamentally different approach to how evaluation is conducted and who is involved in it.
A Different Kind of Evaluation
We are excited to announce a new collaborative project: a multi-country pilot on participatory AI evaluations, co-led by the Collective Intelligence Project (CIP) and the Global South Network for Trustworthy AI.
The project builds on CIP's Weval platform, which has already been piloted in India and Sri Lanka, and extends that approach into new country contexts across the Global South. The core premise is straightforward: communities and domain experts — people with deep, situated knowledge of the contexts in which AI is actually deployed — are best placed to identify where AI systems fall short.
The process involves working with local partners to develop culturally grounded evaluation prompts, assessing AI responses through both expert and community perspectives, and synthesising findings into structured, community-driven benchmarks. The goal is to surface misalignment, blind spots, and real-world failures that conventional evaluations miss, and to do so in a way that builds local capacity and feeds directly into global AI governance conversations.
We're Looking for Partners
We are currently seeking 2–3 organisations from across the Global South to co-lead country-level pilots. This is an open call! If you know organisations in your networks that would be a strong fit, please share it widely.
Prior experience in AI evaluations is not required. We're looking for organisations with deep roots: in communities, in the sectors where AI is being deployed, and in the governance conversations that will shape how these systems are built and regulated. Ideal partners demonstrate strong engagement with grassroots stakeholders and domain expertise in areas such as health, agriculture, education, or governance.
Selected partners will:
- help shape the evaluation focus and study design,
- contribute to the design of prompts grounded in lived and local knowledge,
- facilitate engagement with community participants, and
- support the documentation and synthesis of findings.
This is a genuinely collaborative process; we envision partner organisations as co-designers, not data collectors.
Why This Matters
Standard AI benchmarks don't ask whether a system can navigate a land dispute in a context where documentation is informal and local governance structures are complex. They don't ask whether health information is accurate when translated into languages underrepresented in training data, or whether a model's assumptions about household structure, gender roles, or financial behaviour hold in contexts very different from those in which it was built.
Participatory evaluation is how we begin to ask those questions systematically. It's how we start building the evidence base needed for more contextually grounded, responsible evaluation.
If your organisation is interested in participating, or if you'd like to find out more before applying, fill out this short form.
Partner With Us
If you’re a government entity, tech company, or civil society organisation that would like to join, partner with, or support the Network, reach out!



