Red-Teaming in the Public Interest
Summary
This paper reframes red-teaming of generative AI as a sociotechnical practice rather than a narrowly technical security exercise, asking what it means for red-teaming to serve the public interest. Through interviews and participant observation at public red-teaming events, the authors trace how the practice’s lineage in cybersecurity and military contexts is being adapted — often awkwardly — to the distinct, diffuse harms of generative AI. They argue that the institutional framing of an event, the participants invited, and the organizational context decisively shape what gets recognized as a vulnerability or harm worth surfacing.
Key Contributions
- An empirical, qualitative account of how public-interest red-teaming is actually practiced around generative AI systems.
- A theoretical reframing of red-teaming as a sociotechnical activity entangled with governance, accountability, and public participation — not merely an adversarial technical method.
- Practical insight for regulators, civil society, and technologists working to operationalize AI safety commitments through participatory mechanisms.
Methods
A qualitative, multi-sited study combining:
- 26 semi-structured interviews with red-teaming practitioners and stakeholders.
- Participant observation at three public red-teaming events centered on generative AI.
- Interpretive analysis situating these practices within broader debates over generative AI governance and accountability.
Findings
- Red-teaming practices diverge sharply depending on organizational setting, the expertise of participants, and how “public interest” is operationally framed.
- Cybersecurity-derived adversarial methods do not transfer cleanly to generative AI; the nature of harms (bias, representational damage, sociocultural risk) demands methodological adaptation.
- Public-facing events surface harm categories and concerns that internal or industry red-teams typically miss, suggesting genuine epistemic value in opening the practice to broader publics.
- The framing of an event functions as a kind of governance act: it conditions what counts as a finding, who has standing to flag harms, and what follow-through is possible.
Connections
This work sits alongside Gillespie2026-aa and Matias2025-px in a growing strand of scholarship that treats red-teaming and adversarial testing as participatory governance infrastructure rather than purely technical evaluation. Together these papers raise overlapping questions about who gets to test AI systems, under what framings, and with what downstream accountability, which makes them natural interlocutors on the politics of public participation in AI safety.