Social Media Research Methods and Digital Traces
The Closing of the Open Web: From APIcalypse to DSA
The papers gathered here are largely organized around a single inflection point: the collapse of open API access to social platforms, the regulatory response that followed, and the methodological and ethical reorganization of the field in its wake. Freelon2024-sc provides the periodization that anchors much of this conversation, tracing the trajectory from “laissez-faire” APIs through authentication, restriction, and the current patchwork of paid, vetted, and unofficial access regimes. Murtfeldt2025-wu quantifies the cost of this closure in a “eulogy” for Twitter research, showing a 13% decline in publications in 2024 after the $42,000/month Enterprise pricing took effect. Yang2026-tq confirms the pattern at the level of the social sciences as a whole: empirical social media research tripled through the early 2020s before plateauing and declining, with Twitter/X and Facebook overrepresented relative to platforms users actually inhabit. The 2025 inventory in Unknown2025-ed60bc90 documents the resulting toolscape.
Regulation as a Contested Settlement
A second cluster examines whether the EU’s Digital Services Act, particularly Article 40, can replace the lost commons with a regulated one. Pierri2025-hm and de-Vreese2026-zx both argue, from inside the policy process, that current turbulence is evidence of meaningful constraint on platform power rather than failure, while warning of resource asymmetries and “independence by permission.” Philipp2026-tl surveys the unevenness of DSA-mandated APIs for election research, and Peters2026-mo shows how “data quality,” barely mentioned in initial EU drafts, was pushed into the final Delegated Regulation through academic and NGO advocacy, framing quality itself as a politically contested concept. Empirical audits temper the optimism: Entrena-Serrano2025-gw documents persistent gaps in TikTok’s Research API, and Rieder2025-ju shows YouTube’s search endpoint is “forgetful by design,” losing the majority of findable videos within months, a finding that directly undermines DSA-style systemic risk monitoring. Lukito2026-nb synthesizes the resulting landscape as a “price, proficiency, or permission” problem that systematically excludes non-WEIRD researchers.
Industry Entanglement and the Integrity of Evidence
Running parallel to the access debate is a sharper, metascientific worry about who gets to do the research that does happen. Bak-Coleman2025-pm and Heiss2026-qv draw the analogy to tobacco, pharma, and fossil-fuel science, arguing that platforms’ exclusive control of data structurally compromises independent evidence. Bak-Coleman2026-mk supplies the quantitative backbone: roughly half of high-profile social media papers have disclosable industry ties, most undisclosed, with an estimated 80% “industrial saturation” once editors and reviewers are included. Munger2025-cz turns this lens on the Meta2020 collaboration, arguing that even methodologically virtuous one-off field experiments lack “temporal” and “poetic” validity on platforms that mutate faster than peer review. Allen2025-ot reads the same situation more constructively, positioning browser-extension experiments as a productive middle ground between lab control and platform cooperation.
Methodological Workarounds: Donations, Plugins, and In-Situ Capture
If platforms will not provide data, researchers increasingly build their own instruments. Iannelli2018-ebd918b7 is the early node in this lineage, demonstrating Facebook ad campaigns as a cheap and controllable recruitment instrument for niche populations (€0.46 per respondent), while also flagging the limits of platform-elaborated “interest” targeting as proxies for offline attributes. Inacio-da-Silva2026-zf extends this logic into crowdsourced ad auditing during the 2018 Brazilian elections. Ulloa2024-jm shifts the focus to web-tracking, showing that the dominant source of measurement error in scraped news consumption is not temporal delay but the ex-situ collection environment itself — paywalls, logins, cookie consents — making in-situ browser capture qualitatively necessary rather than merely preferable. Schulte2026-df and Hepp2026-oi reframe these scattered workarounds as a “research infrastructure problem,” arguing that multi-platform observability should be treated as reusable scaffolding rather than reinvented per study. Fan2026-af complements this with a temporal turn: hyper-longitudinal user-sequences donated across platforms, analyzed through sequence mining, HMMs, process mining, and embeddings.
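As a rough illustration of what the simplest form of sequence analysis on donated traces might look like, the sketch below estimates first-order transition probabilities between platforms from one user's visit sequence. The sequence and the function name are hypothetical, and this is plain transition counting, not the HMM, process-mining, or embedding approaches that Fan2026-af surveys.

```python
from collections import Counter

# Hypothetical donated sequence of platform visits for one user, in time order.
sequence = ["youtube", "tiktok", "tiktok", "instagram", "youtube", "tiktok"]


def transition_probabilities(seq):
    """Estimate P(next platform | current platform) from one ordered sequence."""
    # Count each consecutive (current, next) pair.
    pair_counts = Counter(zip(seq, seq[1:]))
    # Count how often each platform appears as the origin of a transition.
    origin_counts = Counter(seq[:-1])
    return {(a, b): n / origin_counts[a] for (a, b), n in pair_counts.items()}


probs = transition_probabilities(sequence)
```

Counts of this kind are only a descriptive baseline; with many users and long sequences they could serve as a starting point before fitting latent-state models.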
What the Traces Distort
A persistent undercurrent across these papers is the worry that even when data is available, it misrepresents the social world. Oswald2025-km gives this its sharpest formulation through the production–consumption gap: visible discourse is the tip of an iceberg of silent users. Giglietto2022-b30e8b4e shows how Meta’s 100-share anonymization threshold in the URL Shares Dataset produces nearly identical cross-country trajectories driven by Feed algorithm changes rather than local events — a cautionary tale for comparative or longitudinal work. Green2025-ap argues that domain-level partisan audience scores systematically mistake within-source heterogeneity for moderation, distorting echo-chamber research at the URL level. Hartmann2025-px generalizes the point: much disagreement in the echo chamber literature reflects inconsistent conceptualization and operationalization rather than substantive disagreement about reality. Luhring2025-od performs the analogous audit for NewsGuard, finding continuous scores stable but binary trustworthy/untrustworthy thresholds prone to dramatic distortion. Gaisbauer2025-by pushes for a multi-level “political cartography” of news sharing that resists outlet-level reductionism. Anwar2024-34dba628 reviews how Facebook Reactions, despite methodological fragmentation, have become a workable proxy for affective response, particularly for the anger-driven engagement that animates far-right communication.
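The domain-level averaging problem that Green2025-ap describes can be illustrated with a toy calculation. The sketch below uses hypothetical URL-level audience scores on an assumed scale from -1 (left) to +1 (right); the domains, scores, and function name are invented for illustration and do not come from any of the papers.

```python
from statistics import mean, pstdev

# Hypothetical URL-level audience scores, -1 (left) to +1 (right).
url_scores = {
    "mixed-outlet.example": [-0.8, -0.7, 0.7, 0.8],
    "centrist-outlet.example": [-0.1, 0.0, 0.0, 0.1],
}


def domain_summary(scores):
    """Collapse URL-level scores to a domain mean, keeping the spread visible."""
    return {"mean": mean(scores), "spread": pstdev(scores)}


summaries = {domain: domain_summary(s) for domain, s in url_scores.items()}

for domain, summary in summaries.items():
    print(f"{domain}: mean={summary['mean']:.2f}, spread={summary['spread']:.2f}")
```

Both hypothetical domains receive a mean near zero, so a domain-level score would label both as moderate; only the spread distinguishes an outlet whose individual URLs attract sharply divided audiences from one whose URLs are uniformly centrist.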
Detecting Coordination, Inferring Identities
A methodologically distinct strand develops computational machinery for the kinds of inference the new data environment enables. Iannucci2025-eg and Mannocci2025-ig both argue that coordinated inauthentic behavior is inherently multimodal and temporal, and that flattening modalities or using fixed time windows misses or fabricates structure; multiplex community detection with temporal decay emerges as the more faithful representation. Smith2025-kc cleverly exploits the Twitter V1 API cursor to causally identify “attention brokerage” (amplification-driven triadic closure), extending classical brokerage theory into platform-affordance terrain. Bruns2025-fz proposes vector embeddings of network actions as a way out of the “furball” of conventional network visualization. Starbird2025-jj offers a more interpretive method, decomposing rumoring on quote-tweet structures into evidence and frame, showing that misleading content often pairs accurate evidence with distorted framing. Lai2024-to extends latent ideology estimation to YouTube video content via cross-subreddit sharing. Most provocatively, Lee2026-je demonstrates that LLMs can infer political alignment from ostensibly nonpolitical online discourse: a methodological capability that is simultaneously a privacy threat, recapitulating the post–Cambridge Analytica concerns that drove platforms to close their APIs in the first place.
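The contrast between fixed time windows and temporal decay can be made concrete with a small sketch. It weights co-sharing ties between accounts in two ways: a hard cutoff and an exponential decay. The action log, the 60-second window, the decay constant, and the function name are all assumptions chosen for illustration; this covers only the temporal part on a single layer and is not the multiplex method of Iannucci2025-eg or Mannocci2025-ig.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical action log: (account, shared_item, timestamp_in_seconds).
actions = [
    ("a", "url1", 0), ("b", "url1", 10),
    ("a", "url2", 100), ("c", "url2", 161),
    ("b", "url3", 500), ("c", "url3", 5000),
]


def co_action_weights(actions, weight_fn):
    """Sum a time-gap weight over every pair of accounts sharing the same item."""
    by_item = defaultdict(list)
    for account, item, ts in actions:
        by_item[item].append((account, ts))
    weights = defaultdict(float)
    for shares in by_item.values():
        for (u, tu), (v, tv) in combinations(shares, 2):
            if u != v:
                weights[tuple(sorted((u, v)))] += weight_fn(abs(tu - tv))
    return dict(weights)


WINDOW = 60.0  # assumed fixed window, in seconds
TAU = 60.0     # assumed decay constant, in seconds

fixed = co_action_weights(actions, lambda dt: 1.0 if dt <= WINDOW else 0.0)
decayed = co_action_weights(actions, lambda dt: math.exp(-dt / TAU))
```

In this toy log, accounts "a" and "c" share the same item 61 seconds apart: the fixed window discards the tie entirely, while the decayed weight keeps it at reduced strength. The pair "b" and "c", separated by more than an hour, is effectively zero under both schemes.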
Migration, Fragmentation, and the Field’s Near Future
Wang2026-ub closes the circle empirically: even a coordinated, motivated cohort of academic early adopters largely failed to migrate from Twitter to Mastodon, with retention depending on field-specific servers and federated engagement diversity rather than raw network size. Combined with Yang2026-tq’s finding that 82.59% of social media research still uses a single platform, the implication is that the empirical study of social media is being reshaped not only by platform-side restrictions but by a fragmenting user ecosystem the field has not yet learned to study at scale. F2020-6278a4aa foreshadowed much of this — networked publics, micro-targeting, attention competition, and the inadequacy of traditional political-communication methods — and several of the more recent papers can be read as belated methodological responses to challenges that were already legible half a decade ago.