How public involvement can improve the science of AI
Summary
This perspective argues that public involvement — especially engagement with lived-experience experts — is not merely a political or ethical add-on to AI evaluation but a means of strengthening its scientific rigor. Treating AI systems as sociotechnical artifacts whose reliability depends on social and organizational layers, Matias and Price contend that purely technical evaluations routinely miss flaws that participatory methods can surface. They propose a five-stage framework (equipoise, measurement, explanation, inference, interpretation) describing where public participation materially improves the quality of quantitative AI science, and rebut standard objections about generality, subjectivity, reliability, and cost by drawing on established traditions in participatory science.
Key Contributions
- A five-stage framework — equipoise, measurement, explanation, inference, interpretation — for integrating public participation into quantitative AI evaluation.
- A typology of public-involvement models (contributory, cocreated, participatory governance) adapted from civic/citizen science to AI.
- A systematic rebuttal to scientific objections to participatory research in AI.
- A bridge between qualitative sociotechnical scholarship and quantitative evaluation practice.
- Reframing of trustworthy AI as dependent on trustworthy, participatory science rather than on technical assurance alone.
Methods
The paper is a conceptual review-and-perspective piece. The authors synthesize literatures on participatory science, AI fairness, algorithm auditing, and sociotechnical systems, and ground the argument in case studies drawn from more than 15 years of their own work: human rights data analysis (HRDAG truth-commission work in Guatemala and Colombia), the Invisible Institute’s community coding of Chicago police complaints, the Allegheny Family Screening Tool reanalysis, Kaiser Permanente nursing protests, organ allocation governance, and Reddit field experiments on human–algorithm interaction.
Findings
- Mission-critical AI faces interlocking reliability, contextual-performance, security, and transparency problems that technical evaluation alone cannot resolve.
- Cocreated evaluation has uncovered substantive flaws that developers missed. For example, HRDAG/ACLU’s reanalysis of the Allegheny Family Screening Tool revealed a bipartite-ranking bias against Black families that the original AUC-based evaluation did not detect.
- Community labeling of Chicago police complaints exposed allegations of sexual violation hidden by official single-category coding.
- Participatory hypothesis testing in a Reddit field experiment generated a sociotechnical theory of how human fact-checking shapes recommender behavior.
- Equipoise allows adversarial parties (truth commissions, organ allocation stakeholders) to commit to evidence-based processes despite competing interests.
- Community debriefing can preserve study validity by surfacing mid-study platform changes and unobserved confounders.
- Contributory citizen-science models in AI raise consent and labor concerns but can yield robust data when participants find genuine meaning in contributing.
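The Allegheny finding above rests on the point that a single aggregate AUC can look healthy while rankings across groups are skewed. The sketch below shows one way such a gap can arise. It does not reconstruct the HRDAG/ACLU analysis. The records are hypothetical toy data, the group names "A" and "B" are placeholders, and the cross-group AUC comparison is an assumed stand-in for whatever bipartite-ranking metric the reanalysis actually used.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs where the positive is scored higher.

    Ties count as half a win. This is the pairwise (bipartite-ranking) view of AUC.
    """
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical toy records: (group, true_label, risk_score). Not real data.
records = [
    ("A", 1, 0.90), ("A", 1, 0.80), ("A", 0, 0.60), ("A", 0, 0.50),
    ("B", 1, 0.70), ("B", 1, 0.55), ("B", 0, 0.30), ("B", 0, 0.20),
]


def scores(label, group=None):
    """Scores for one label, optionally restricted to one group."""
    return [s for g, y, s in records if y == label and group in (None, g)]


# Aggregate AUC, as a conventional evaluation would report it.
overall_auc = pairwise_auc(scores(1), scores(0))

# Within-group AUCs: ranking quality inside each group separately.
auc_within_a = pairwise_auc(scores(1, "A"), scores(0, "A"))
auc_within_b = pairwise_auc(scores(1, "B"), scores(0, "B"))

# Cross-group AUCs: positives from one group against negatives from the other.
xauc_a_pos_vs_b_neg = pairwise_auc(scores(1, "A"), scores(0, "B"))
xauc_b_pos_vs_a_neg = pairwise_auc(scores(1, "B"), scores(0, "A"))

print(f"overall AUC:            {overall_auc:.4f}")
print(f"within-group AUC (A/B): {auc_within_a:.2f} / {auc_within_b:.2f}")
print(f"xAUC A-pos vs B-neg:    {xauc_a_pos_vs_b_neg:.2f}")
print(f"xAUC B-pos vs A-neg:    {xauc_b_pos_vs_a_neg:.2f}")
```

In this toy data the aggregate and within-group AUCs are all high, yet some group A records with a negative outcome are scored above group B records with a positive outcome. The two cross-group AUCs therefore differ, and that asymmetry never appears in the single aggregate number.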
Connections
This paper sits alongside Gillespie2026-aa and Unknown2025-qj in the emerging argument that participation in AI red-teaming and evaluation is an epistemic resource, not just a legitimacy mechanism — lived-experience expertise yields situated knowledge that closed technical teams structurally cannot produce. Where related work often focuses on red-teaming specifically, Matias and Price generalize the case across the full evaluation pipeline, connecting it to longer traditions in participatory and civic science.