Red-Teaming Generative AI as Sociotechnical Practice

From Adversarial Probing to Sociotechnical Practice

The three papers collected here share a common move: they refuse to treat red-teaming (and AI evaluation more broadly) as a self-contained technical exercise, and instead reframe it as a sociotechnical practice whose rigor, legitimacy, and harm-detection capacity depend on who is involved, under what conditions, and with what values in mind. Gillespie2026-aa makes the case most forcefully for red-teaming specifically, arguing that the practice has been normalized in policy faster than it has been understood empirically. Unknown2025-qj complements this with grounded fieldwork at public red-teaming events, while Matias2025-px generalizes the underlying epistemic claim: public and lived-experience involvement is not a concession to politics but a precondition for scientifically reliable evaluation.

Labor, Value, and the Ghosts Behind Evaluation

A central thread is the effort to make visible red-teaming’s hidden labor and embedded value judgments. Gillespie2026-aa draws a pointed analogy to commercial content moderation: outsourced vendors, crowdworkers, NDAs, and the psychological toll (secondary trauma, moral injury) of inhabiting adversarial personas to elicit harmful outputs. The authors warn that automation rhetoric obscures this labor rather than eliminating it. Critically, when companies define harm taxonomies internally, value-setting collapses into a narrow demographic and institutional perspective. Unknown2025-qj echoes this with the empirical observation that what counts as a “harm” or “vulnerability” is shaped decisively by institutional framing and participant composition, which reinforces the worry that proprietary red-teaming systematically misses harms its participants are not positioned to recognize.

Publics as Epistemic Resource, Not Window Dressing

Where Gillespie2026-aa is largely diagnostic, Matias2025-px offers a constructive epistemology for why broader participation improves evaluation. Lived-experience experts contribute situated knowledge that professional evaluators structurally cannot supply: the HRDAG/ACLU reanalysis of the Allegheny Family Screening Tool and the community recoding of Chicago police complaints both exposed flaws that internal, metric-driven evaluation had missed. The five-stage scheme (equipoise, measurement, explanation, inference, interpretation) gives a methodological vocabulary for what Unknown2025-qj documents ethnographically at public red-teaming events: opening evaluation to diverse publics surfaces considerations invisible to internal teams. Together, these papers reposition participation from legitimacy theater to a source of empirical validity.

Tensions: Democratization vs. Extraction

The papers also converge on a cautionary note. Public and volunteer red-teaming (DEF CON-style events, citizen-science contributions) risks extractive reliance on marginalized communities whose unpaid labor substitutes for, rather than supplements, proper institutional investment (Gillespie2026-aa). Matias2025-px acknowledges parallel concerns about consent and worker treatment in contributory citizen science, while Unknown2025-qj notes that the “public interest” framing of red-teaming events is itself shaped, and sometimes constrained, by the organizations convening them. The shared implication is that participation must be designed with attention to power, compensation, and follow-through, not merely access.

Toward an Agenda

Read together, these three papers sketch the contours of an emerging research program: empirical study of red-teaming as labor and practice (Gillespie2026-aa); ethnographic and interview-based accounts of how public-interest red-teaming actually functions (Unknown2025-qj); and a participatory methodology that integrates lived-experience expertise into the formal stages of AI evaluation science (Matias2025-px). The arc moves from critique of the status quo, through documentation of nascent alternatives, to a constructive epistemology of participatory AI evaluation. It suggests that the rigor and accountability of generative AI safety practices will ultimately stand or fall on how seriously their sociotechnical character is taken.