Little mentioning, moderate attention, great relevance: The quality of online platform data in the Digital Services Act
Summary
This paper examines how data quality is conceptualized in, or notably absent from, the European Union’s Digital Services Act (DSA), particularly Article 40 and its Delegated Regulation, which govern researcher access to data from Very Large Online Platforms. Through a mixed-methods analysis of regulatory documents and 242 stakeholder feedback submissions to two European Commission Calls for Evidence, the authors show that data quality is barely mentioned in official EU texts and was instead pushed onto the agenda by academics and NGOs. They argue that data quality should be understood not as a neutral methodological construct but as a politically contested concept negotiated between three stakeholder perspectives: a research standard (academics), an economic good (platforms), and a regulatory concern (EU institutions).
Key Contributions
- First empirical analysis of how data quality is discursively constructed and contested within the DSA regulatory process.
- A typology of three stakeholder perspectives on platform data quality: research standard, economic good, and European-regulative perspective.
- Application of Multiple Correspondence Analysis as a tool for mapping discursive positions in platform governance debates.
- A bridge between computational social science literature on data quality and critical platform governance scholarship.
- Concrete policy recommendations, including independent institutions for ongoing data quality assessment, benchmark datasets, and audit mechanisms.
- An annotated dataset and reproducible code released via a public repository.
Methods
The authors quantitatively coded 242 feedback entries (from 214 unique organizations) for references to data quality and its intrinsic indicators (accuracy, completeness, consistency), achieving high inter-coder reliability (Krippendorff’s α = 0.92). They then used Multiple Correspondence Analysis (MCA), drawing on Bourdieusian geometric data analysis, to construct a “data quality discursive space,” and applied Hierarchical Clustering on Principal Components (HCPC) to that space to identify four actor profiles. A thematic analysis (following Braun and Clarke) of relevant text passages and a word frequency analysis across the DSA and the draft and final versions of the Delegated Regulation complement the quantitative findings.
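To illustrate the reliability statistic reported above, here is a minimal sketch of Krippendorff's α for nominal codes. It is not the authors' code: the function name `krippendorff_alpha_nominal` and the toy codings are assumptions made for illustration, and the sketch uses the standard coincidence-matrix formulation with a nominal distance (two codes either match or they do not).

```python
from collections import Counter
from itertools import permutations
import math


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` is a list with one entry per coded item; each entry lists the
    codes assigned by the coders who coded that item (missing codes omitted).
    """
    # Coincidence matrix: every ordered pair of codes given to the same unit
    # by different coders, weighted by 1 / (number of codes in the unit - 1).
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue  # a unit coded only once cannot be paired
        for a, b in permutations(codes, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and the total number of pairable codes.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit coded by two coders")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return math.nan  # only one category was used, so alpha is undefined
    return 1 - observed / expected


# Toy data (invented): two coders, four submissions, coded "yes"/"no" for
# "mentions data quality". The coders disagree on the second submission.
toy_units = [["yes", "yes"], ["yes", "no"], ["no", "no"], ["no", "no"]]
toy_alpha = krippendorff_alpha_nominal(toy_units)
print(round(toy_alpha, 3))
```

With one disagreement in four units the toy α is well below the 0.92 reported for the actual coding, which shows how strongly the statistic penalizes disagreement in small samples.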
Findings
- 38.79% of submitters referenced intrinsic data quality at least once; mentions were more frequent in the second consultation period (24.3%) than the first (19.6%).
- Academic/research institutions (35.98%) and NGOs (21.02%) were the most active submitters; 20.56% of submissions came from the USA.
- The first two MCA dimensions explained 69.2% of inertia; dimension 2 separated those mentioning data quality (academics, NGOs) from those who did not (companies, business associations, public authorities).
- Among major platforms, only Meta and Snapchat explicitly raised data quality. Meta strategically inverted the argument, claiming that platform data is too poor to be useful and that researcher access is therefore futile.
- Researchers most frequently invoked completeness and accuracy, tying these to specific collection methods (APIs, data donations, scraping) and calls for independent audits.
- Advocacy partially succeeded: data quality language entered Recitals 7 and 13 of the final Delegated Regulation, but researchers did not secure rights to initiate mediation under Article 13.
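Whether data quality language made it from one regulatory text into the next can be checked with a term count of the kind used in the word frequency analysis described under Methods. The sketch below is only an assumption about how such a count might be set up: the helper `count_terms`, the term list, and the two placeholder snippets are all invented for illustration and are not taken from the DSA, the Delegated Regulation, or the authors' repository.

```python
import re


def count_terms(text, terms):
    """Count case-insensitive whole-phrase matches of each term in `text`."""
    # Collapse whitespace so phrases split across line breaks still match.
    normalized = re.sub(r"\s+", " ", text.lower())
    return {
        term: len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", normalized))
        for term in terms
    }


# Placeholder snippets (invented for this sketch, NOT quotations from the
# DSA or the Delegated Regulation).
toy_documents = {
    "draft": "Providers shall give access to data and describe the data.",
    "final": "Providers shall document data quality, including accuracy\n"
             "and completeness, and give access to data.",
}
terms = ["data quality", "accuracy", "completeness", "consistency"]

toy_counts = {name: count_terms(text, terms) for name, text in toy_documents.items()}
for name, counts in toy_counts.items():
    print(name, counts)
```

Raw counts like these only show that a term is present; documents of different lengths would need normalization (for example, per 1,000 words) before their counts can be compared.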
Connections
This paper sits at the heart of ongoing debates about platform data access in the “post-API age,” directly engaging with work on the DSA’s Article 40 infrastructure such as Ohme2026-nv and broader critical analyses of platform research conditions like Rieder2026-pp and Rieder2025-ju. Its concern with data quality as a methodological-political hybrid resonates with computational social science work on measurement and validity in platform data, including Bak-Coleman2025-pm and Murtfeldt2025-wu, as well as historical perspectives on platform-research relations such as Freelon2024-sc. The argument for independent auditing institutions also connects to platform governance proposals discussed in Helmond2026-ll and Schiffrin_undated-gi.
Podcast
A research-radio episode discusses this paper.