ECHO: Encoding Communities via High-order Operators

Summary

ECHO is a self-supervised graph learning framework for attributed community detection that targets two complementary failure modes of GNN-based approaches: a Semantic Wall (over-smoothing and heterophilic poisoning from rigid inductive biases) and a Systems Wall (the O(N²) memory cost of dense pairwise similarity clustering). The system routes each graph to an Isolating (MLP) or Densifying (GraphSAGE) encoder via unsupervised structural heuristics, applies attention-guided K-step diffusion to throttle information flow across heterophilic edges, trains with a memory-sharded full-batch InfoNCE objective, and extracts communities through chunked O(N·K) top-k similarity feeding into modularity maximization. On LFR benchmarks up to 1M nodes and real attributed graphs including Pokec (1.6M nodes, 30M edges), ECHO reaches state-of-the-art NMI on several datasets while sustaining >2,800 nodes/sec on a single A100.

Key Contributions

  • Topology-Aware Router that picks between MLP and GraphSAGE encoders using feature sparsity ρ_X, mean degree ⟨k⟩, and semantic assortativity H_R, avoiding a one-size-fits-all inductive bias.
  • Attention-guided multi-scale diffusion with learned edge weights and an L1 sparsity penalty, dynamically pruning heterophilic edges to mitigate over-smoothing.
  • Memory-sharded full-batch InfoNCE: exact full-batch gradients with VRAM bounded by dynamic chunking of the negative-sample tensor.
  • Chunked O(N·K) similarity extraction with degree-adaptive top-k and mutual-max symmetrization, replacing the dense O(N²) similarity matrix.
  • An open-source unified framework (ECHO-GNN) demonstrated end-to-end on graphs >1.6M nodes / >30M edges on commodity hardware.

Methods

Four-phase pipeline: (1) unsupervised routing on ρ_X, ⟨k⟩, H_R to select Isolating MLP vs. Densifying SAGE; (2) adaptive semantic encoding; (3) K-step diffusion with softmax edge attention from an MLP over concatenated node pairs; (4) chunked cosine top-k extraction with degree-adaptive k_i ∈ [k_min, k_max], followed by Louvain/Leiden modularity maximization via igraph. Training uses an InfoNCE loss with attention-weighted positives and 256 negatives per node, sharding triggered when N×P×d exceeds 2×10⁸ elements. Evaluation spans LFR (N=500–5,000 and scaled to 1M, μ=0.5) and real graphs (Chameleon, Actor, Amazon Photo/Computers, Coauthor CS, CoraFull, YouTube, Pokec), against K-Means, LPA, Leiden, LINE, DGI, MVGRL, and SDCN. Implementation is PyTorch 2.6 with TF32/AMP on a single A100 80GB.

Findings

  • At the LFR boundary μ=0.5 with N=5,000, ECHO reaches NMI 0.3663, well above DGI (0.1607), MVGRL (0.1677), LINE (0.3463), LPA (0.1912), and SDCN (0.1737), and is stable or improving with scale while SDCN degrades.
  • State-of-the-art NMI on Chameleon (0.1701), Amazon Photo (0.7290), Amazon Computers (0.5957), CoraFull (0.5114), and a near-tie with DGI on Coauthor CS (0.7042 vs 0.7071).
  • MVGRL OOMs beyond ~5,000 nodes and SDCN is numerically unstable, underscoring the systems advantage of the sharded design.
  • Throughput: ≈3,266 nodes/s on synthetic YouTube (347s) and ≈2,805 nodes/s on Pokec (582s) on a single A100.
  • The router systematically picks the Isolating MLP for dense/heterophilic graphs (Chameleon, Actor, Amazon Photo/Computers) and the Densifying SAGE for sparse, feature-rich, or large graphs (Coauthor CS, CoraFull, YouTube, Pokec); K=0 dominates mid-scale, K=1 for the largest social graphs.
  • t-SNE on Cora and Amazon Computers shows visibly more separable clusters in ECHO embeddings than in raw features.

Connections

No other papers have been registered under this note’s topics yet, so there are no sibling notes to link. Conceptually, ECHO sits between classical modularity-based community detection (Louvain/Leiden, generalized k-path methods) and self-supervised attributed graph learning (DGI, MVGRL, GraphCL, SDCN), and would naturally connect to future notes on over-smoothing in GNNs, contrastive learning at scale, and heterophily-aware message passing.

Podcast

A research-radio episode discusses this paper: Listen