Computational Methods for Social Research (Structure)

The arc of inquiry

Computational social research is in the middle of an identity crisis. A decade of methodological optimism — fueled by open APIs, large-scale digital traces, and increasingly capable language models — is now colliding with a closed platform ecosystem, a regulatory turn in Europe, and a wave of generative AI tools whose epistemic status is unresolved. The papers filed here trace that arc: from data infrastructures and their decay, through methodological adaptations (LLMs, network science, multimodal analysis), to a maturing self-critique of how computational tools shape what we can claim to know about social life.

The data-access problem and the “post-API age”

A first cluster of work grapples with the collapse of open platform data. Freelon2024-sc periodizes social media data access from a “laissez-faire” era to today’s fragmented regime of academic walled gardens and pay-to-play APIs, while Murtfeldt2025-wu quantifies the cost: a decade of ~25% annual growth in Twitter research, then stagnation and decline after the 2023 API shutdown. Bastos2025-ya offers a retrospective eulogy for Twitter as research infrastructure, and Yang2026-tq’s systematic review confirms the pattern is field-wide — social media research tripled, then plateaued, with Twitter/X and Facebook still dominating despite their misalignment with actual public platform usage. Giglietto2026-855a54cb reframes this trajectory as a structural problem: voluntary access regimes are inherently fragile, and the EU’s Digital Services Act offers a partial but uneven corrective. Peters2026-mo shows how even within the DSA process, “data quality” is a politically contested concept that platforms, NGOs, and regulators define differently.

Adjacent papers document the methodological consequences. Rieder2025-ju audits YouTube’s search API and finds it “forgetful by design,” with severe temporal decay that makes historical retrieval unreliable. Ulloa2024-jm shows that even web scraping introduces ~34% content disparity relative to in-situ capture, independent of time delay. Giglietto2022-b30e8b4e uncovers how Meta’s 100-share threshold in the URL Shares Dataset entangles algorithmic curation with the social signals researchers want to measure. Together these papers argue that platform-mediated data is never neutral: thresholds, ranking systems, and access regimes co-produce the empirical record.

Industry influence and the political economy of evidence

A sharper line of critique zooms out from infrastructure to political economy. Bak-Coleman2025-pm argues that tech firms exert tobacco-industry-style influence over the science of their own products, exploiting asymmetries in data, funding, and access. Bak-Coleman2026-mk quantifies the claim: roughly half of high-profile social media papers have undisclosed industry ties, with editorial and reviewer roles also captured, and topical attention skewed toward user-blame framings (misinformation sharing) over platform-level dynamics. Allen2025-ot reads platform-independent experiments (browser extensions, LLM reranking) as a methodological response to this closed environment. The combined picture is that computational social science increasingly operates inside a research infrastructure whose dependencies it must itself audit.

Detecting coordination, ideology, and influence

A second major thread develops computational instruments for studying networked political behavior. Coordinated inauthentic behavior detection has matured from single-modality timing methods toward multiplex and temporal approaches: Iannucci2025-eg integrates time-decayed co-action across multiple layers, Mannocci2025-ig systematically compares flattening strategies and finds multiplex community detection most robust, Yang2025-iv proposes statistical regularities in sharing speed as evasion-resistant signals, and Luceri2025-tr extends these methods to video-first platforms. Minici2024-tf pushes toward graph foundation models that generalize across campaigns, while Bastos2025-ol explores visual signatures of troll farms. Di-Marco2025-aa, notably, runs in the other direction: a formal cascade-influence analysis suggests real coordinated accounts are placed roughly randomly and exert far less influence than optimistic accounts assume.
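
The time-decayed co-action idea can be illustrated with a minimal sketch. It assumes that a "co-action" means two accounts sharing the same object (for example a URL) and that each co-action is weighted by an exponential decay of the time gap; the function name `coaction_weights`, the decay constant `tau`, and the toy events are all hypothetical and are not taken from Iannucci2025-eg or any other cited paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def coaction_weights(events, tau=60.0):
    """Build weighted account-pair edges from shared actions.

    events: iterable of (account, object_id, timestamp_seconds).
    Each pair of distinct accounts acting on the same object adds
    exp(-|dt| / tau) to their edge weight (an assumed decay form).
    """
    by_object = defaultdict(list)
    for account, obj, ts in events:
        by_object[obj].append((account, ts))

    weights = defaultdict(float)
    for shares in by_object.values():
        for (a, ta), (b, tb) in combinations(shares, 2):
            if a == b:
                continue  # ignore an account repeating its own action
            pair = tuple(sorted((a, b)))
            weights[pair] += math.exp(-abs(ta - tb) / tau)
    return dict(weights)


# Toy data: acct_a and acct_b act within seconds of each other twice;
# acct_c shares the same URL an hour later.
events = [
    ("acct_a", "url_1", 0),
    ("acct_b", "url_1", 10),
    ("acct_c", "url_1", 3600),
    ("acct_a", "url_2", 100),
    ("acct_b", "url_2", 105),
]
weights = coaction_weights(events, tau=60.0)
for pair, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(pair, round(w, 4))
```

In a multiplex setting, one such weighted graph would presumably be built per action type (shares, replies, hashtags) before any flattening or community detection; that step is left out here.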

Network thinking also organizes work on cross-platform diffusion and structural mechanisms. Gerard2025-br proposes narrative-cluster affiliation networks as a platform-agnostic alternative to interaction graphs, finding “bridge users” who carry narratives across Truth Social and X. Smith2025-kc develops a causal account of triad transitivity via “attention brokerage.” Bruns2025-fz proposes practice mapping with action embeddings as an alternative to the “hairball” network visualization. Latent-ideology estimation, meanwhile, extends from users to videos in Lai2024-to, and to LLM-derived scaling in Le-Mens2025-qz. Bouchaud2026-lr takes this further: X’s own recommender system inadvertently learns a linear ideological direction in user embeddings, with implications both for algorithmic auditing and privacy regulation.

The LLM turn: promises and limits

The most rapidly developing thread integrates LLMs into research workflows. Several papers treat LLMs as scalable annotators or classifiers: Bailard2024-pj uses a fine-tuned DeBERTa to classify collective action frames; Meher2025-qb shows QLoRA fine-tuning makes conflict-event classification accessible on consumer hardware; Balluff2026-bv uses automated content analysis to detect media-capture patterns; Iris2026-pg combines ChatGPT-4o extraction with manual validation to document Radical Right overrepresentation in European election coverage; Larsson2026-ro applies GPT-4 to a decade of Norwegian Facebook posts. Giglietto2024-cbeb3f70 benchmarks OpenAI’s embeddings against language-specific BERT variants for clustering Italian political news. Other papers push LLMs toward more interpretive tasks: Elfes2026-jb operationalizes Greimas’ actantial model to measure narrative polarization; Waight2025-al uses LLMs to extract claim-subject structures for cross-language narrative similarity; Arminio2025-tw uses VLLMs to render images into connotation-bearing text for clustering; DiGiuseppe2025-es proposes LLM-paired comparisons for scaling open-ended survey responses; Ober2026-vd integrates LLM labeling with topic modeling for qualitative interview analysis; and Marino2024-2fbc690f articulates a three-phase validation protocol for fully LLM-integrated pipelines.
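
To make the paired-comparison idea concrete, here is a hedged sketch of one standard way to turn pairwise judgments into a scale: a Bradley-Terry model fitted with the classic iterative (minorization-maximization) update. Whether DiGiuseppe2025-es uses this estimator is an assumption; the judgments below are hard-coded placeholders standing in for LLM outputs, and `bradley_terry` is a hypothetical helper name.

```python
def bradley_terry(items, comparisons, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict of strengths normalized to sum to 1. Assumes every
    item wins at least once; otherwise its estimate collapses to zero.
    """
    strength = {i: 1.0 / len(items) for i in items}
    wins = {i: 0 for i in items}
    for winner, _ in comparisons:
        wins[winner] += 1

    for _ in range(iters):
        updated = {}
        for i in items:
            denom = 0.0
            for winner, loser in comparisons:
                if i in (winner, loser):
                    other = loser if i == winner else winner
                    denom += 1.0 / (strength[i] + strength[other])
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    return strength


# Placeholder judgments: which of two open-ended responses expresses
# "more" of some latent trait. In practice each pair would come from
# a model or human coder.
items = ["resp_1", "resp_2", "resp_3"]
comparisons = [
    ("resp_1", "resp_2"), ("resp_1", "resp_2"), ("resp_2", "resp_1"),
    ("resp_2", "resp_3"), ("resp_2", "resp_3"), ("resp_3", "resp_2"),
    ("resp_1", "resp_3"), ("resp_1", "resp_3"), ("resp_3", "resp_1"),
]
scale = bradley_terry(items, comparisons)
ranking = sorted(items, key=lambda i: -scale[i])
print(ranking)
```

The resulting strengths are only identified up to scale, which is why the sketch normalizes them; validation against human-coded pairs would still be needed before treating the scale as a measurement.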

But the same cluster contains its own immune system. Balluff2026-if offers a sustained critique of “unreflective LLM adoption,” noting reproducibility breakage, corporate dependency, language bias, and environmental cost. Paci2025-ag documents that even frontier models fail at interpreting implicatures and presuppositions in Italian political speech. Brown2025-jk finds that LLM annotator bias is largely dataset-specific rather than model-specific, with item difficulty far outweighing demographic effects. Lee2026-je demonstrates that LLMs can infer political alignment from ostensibly nonpolitical online posts — a methodological capability that doubles as a privacy threat. Fan2025-ut shows that source and language act as observed confounders in pretrained embeddings, treatable by linear concept erasure. And in a striking recent contribution, Alizadeh2026-es finds that frontier coding agents can reproduce most computational social science findings but exhibit sycophantic behavior when prompts are confirmatory — performance that is impressive and structurally worrying in equal measure.
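
As a rough illustration of what linear concept erasure does, the sketch below uses the simplest variant: projecting out the single direction that separates the mean embeddings of two groups. This is a deliberately simplified stand-in, not the procedure used in Fan2025-ut, and more careful erasure methods also account for covariance structure. The binary label (for example, one of two languages), the toy two-dimensional vectors, and the function names are assumptions.

```python
import math


def mean_vector(vectors):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def erase_direction(embeddings, labels):
    """Remove the mean-difference direction between two labeled groups.

    After projection, both groups have identical mean embeddings, so a
    classifier relying only on that mean shift loses its signal.
    """
    group0 = [e for e, y in zip(embeddings, labels) if y == 0]
    group1 = [e for e, y in zip(embeddings, labels) if y == 1]
    diff = [a - b for a, b in zip(mean_vector(group1), mean_vector(group0))]
    norm = math.sqrt(sum(d * d for d in diff))
    unit = [d / norm for d in diff]

    erased = []
    for e in embeddings:
        proj = sum(x * u for x, u in zip(e, unit))
        erased.append([x - proj * u for x, u in zip(e, unit)])
    return erased


# Toy 2-d "embeddings": the first coordinate mostly encodes the confounder.
embeddings = [[1.0, 2.0], [1.2, 1.8], [3.0, 2.1], [3.2, 1.9]]
labels = [0, 0, 1, 1]
erased = erase_direction(embeddings, labels)

mean0 = mean_vector([e for e, y in zip(erased, labels) if y == 0])
mean1 = mean_vector([e for e, y in zip(erased, labels) if y == 1])
print(mean0, mean1)
```

The trade-off is that any substantive signal correlated with the erased direction is removed too, which is one reason erasure should be checked against the downstream task.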

Multimodal, narrative, and temporal expansions

Several papers stretch the empirical scope of computational analysis beyond text. Gardam2025-er argues for visual-first methods on Instagram; Arora2025-tx develops multimodal frame extraction across text and image; Anwar2024-34dba628 reviews the use of Facebook Reactions as paralinguistic affective signals. Sarmiento2025-as proposes unsupervised framing detection for polarized discourse. Fan2026-af makes a more fundamental conceptual move: communication theory treats communication as a process, but most empirical work aggregates temporally; user-sequence approaches and process-mining methods can recover the missing temporal dimension from digital traces.

Substantive findings and the limits of platform-derived inference

These methodological tools yield substantive claims that recur across the corpus. Algorithmic and curatorial effects appear more modest, conditional, and structural than popular narratives suggest: Brown2026-br finds rabbit holes but not echo chambers or radicalization pathways on YouTube; McNally2025-dn shows that Facebook’s News Feed treats hard news differentially, but in ways that are detectable rather than opaque; Green2025-ap argues that domain-level partisanship measures conflate moderation with heterogeneity, missing networked story-level curation. Oswald2025-km’s production–consumption gap and Nenno2025-xa’s cross-national news values work both caution that visible online discourse is unrepresentative of underlying populations. Luhring2025-od audits NewsGuard and shows that binary trustworthiness cutoffs can distort downstream conclusions. Iannelli2018-ebd918b7 is an early example, using Facebook ad targeting to recruit niche populations while finding inconclusive evidence that platform “interest” categories track stigmatized opinions. Gaisbauer2025-by argues more broadly for multi-level (story, outlet, content) cartographies of news circulation.
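
The point about binary trustworthiness cutoffs can be seen in a small worked example. Everything here is invented for illustration: the domain scores, the visit counts, and the three candidate thresholds are hypothetical and do not reproduce data or cutoffs from Luhring2025-od or from NewsGuard.

```python
def untrustworthy_share(visits, cutoff):
    """Share of visits going to domains scoring below the cutoff.

    visits: list of (trust_score, visit_count) pairs.
    """
    flagged = sum(count for score, count in visits if score < cutoff)
    total = sum(count for _, count in visits)
    return flagged / total


# Hypothetical (score, visits) pairs for five domains.
visits = [(95, 500), (72, 200), (58, 150), (40, 100), (20, 50)]

# The same browsing data under three different binary cutoffs.
shares = {c: untrustworthy_share(visits, c) for c in (50, 60, 75)}
for cutoff, share in shares.items():
    print(cutoff, round(share, 2))
```

With these toy numbers the estimated share of "untrustworthy" exposure moves from 15% to 50% depending only on where the line is drawn, which is the kind of sensitivity a robustness check across thresholds, or a continuous score, would expose.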

Toward reflexive computational social science

A final, increasingly visible thread asks what computational social science is for and under what conditions it can succeed. F2020-6278a4aa anticipated much of this conversation, arguing that fuzzy organizational boundaries, micro-targeting, and attention competition demand methodological reinvention. Ng2026-og argues that agentic AI systems require social theory as a structural prior, not as decoration. The methodological critiques in Balluff2026-if, Bak-Coleman2025-pm, Peters2026-mo, and Giglietto2026-855a54cb converge on a shared point: computational social research is no longer a neutral toolkit applied to platform data, but an institution embedded in regulatory, commercial, and infrastructural politics. The papers in this collection suggest the field is reorienting accordingly: toward platform-independent designs, multimodal and temporal analyses, hybrid human–LLM workflows with explicit validation, and an unusually self-aware accounting of where the data, the tools, and the funding come from.
