GNNs for Social Networks: Influence, Communities, and Misinformation
Why Graphs for Social Networks?
Social influence is inherently relational:
- A user's political views are correlated with their friends' views (homophily)
- Misinformation spreads along retweet chains, so the propagation tree matters
- Community structure (echo chambers, polarisation) is a global graph property
- Influence of an account cannot be measured by its own features alone
GNNs capture these relational patterns: structure that content-only models (text classification, user attribute prediction) miss.
Task 1: Fake News and Misinformation Detection
The propagation graph approach: when a news article is shared, it creates a propagation tree (root → shares → reshares). Each node is a user; each edge is a retweet.
Key observations:
- Fake news is often reported to propagate differently from real news: faster initial spread and a shallower tree (consistent with bot amplification), followed by a quick die-off
- Real news: slower spread, deeper tree, more diverse users
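The depth and breadth contrast in these observations can be made concrete with a small sketch. It assumes a cascade is given as a root plus a list of (parent, child) retweet edges; the function name `tree_stats` and the two toy cascades are illustrative and not taken from any cited work.

```python
from collections import defaultdict, deque

def tree_stats(root, edges):
    """Return size, depth, and maximum breadth of a propagation tree.

    edges is a list of (parent, child) pairs, one per retweet.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first traversal from the root records each node's depth.
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            depth_of[child] = depth_of[node] + 1
            queue.append(child)

    # Breadth of a level is the number of nodes at that depth.
    per_level = defaultdict(int)
    for depth in depth_of.values():
        per_level[depth] += 1

    return {
        "size": len(depth_of),
        "depth": max(depth_of.values()),
        "max_breadth": max(per_level.values()),
    }

# Toy cascades of equal size: one wide and shallow, one narrow and deep.
shallow = tree_stats("r", [("r", "a"), ("r", "b"), ("r", "c"), ("r", "d")])
deep = tree_stats("r", [("r", "a"), ("a", "b"), ("b", "c"), ("c", "d")])
```

Hand-crafted statistics like these are what a GNN over the propagation tree can learn implicitly, together with patterns that are harder to name.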
GNN-FakeNews (Bi-GCN; Bian et al., 2020) builds two propagation graphs: top-down (following the spread direction) and bottom-up (tracing back toward the source). It runs a GNN on each, then combines the resulting embeddings with a claim encoder for final classification.
Advantage over content-only methods: two articles with identical content but different propagation patterns can receive different predictions. Structure provides signal that text alone cannot.
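A minimal sketch of the two-direction idea, assuming the same (parent, child) edge list as input. It uses one step of plain mean aggregation with no learned weights, so it illustrates only the data flow and is not the published model; `mean_aggregate` and `bidirectional_embeddings` are hypothetical names.

```python
from collections import defaultdict

def mean_aggregate(features, incoming):
    """One message-passing step: average a node's vector with the
    vectors of the nodes listed in incoming[node]."""
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in incoming.get(node, [])]
        out[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return out

def bidirectional_embeddings(features, edges):
    """Concatenate a top-down and a bottom-up embedding per node."""
    top_down_in = defaultdict(list)   # child receives from its parent
    bottom_up_in = defaultdict(list)  # parent receives from its children
    for parent, child in edges:
        top_down_in[child].append(parent)
        bottom_up_in[parent].append(child)

    td = mean_aggregate(features, top_down_in)
    bu = mean_aggregate(features, bottom_up_in)
    return {node: td[node] + bu[node] for node in features}

# Toy example: a root post "r" retweeted by users "a" and "b".
features = {"r": [1.0, 0.0], "a": [0.0, 1.0], "b": [0.0, 1.0]}
emb = bidirectional_embeddings(features, [("r", "a"), ("r", "b")])
```

In a trained model each direction would have its own weight matrices and several layers, and the per-node vectors would be pooled into one graph-level vector before classification.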
Task 2: Community Detection
Traditional methods: spectral clustering, Louvain algorithm (modularity optimisation). These use only graph structure.
GNN approach: combine node features + graph structure for richer community embeddings.
SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction; Zhang & Chen, 2018) learns community structure implicitly during link prediction training: communities are sets of nodes that tend to be mutually linked.
Graph Autoencoders (GAE/VGAE, Kipf & Welling, 2016):
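An encoder GNN maps the node features X and the adjacency matrix A to latent node embeddings Z, and the model is trained to reconstruct A from Z. The latent Z captures community structure: nodes in the same community cluster together in latent space. Communities are then found by clustering Z (for example, with k-means).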
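The reconstruction step can be sketched with the inner-product decoder used in Kipf & Welling's GAE, where the predicted edge probability is sigmoid(z_i · z_j). The encoder is omitted here; the embeddings `Z_good` and `Z_bad` are hand-picked for illustration, and the function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(Z):
    """Inner-product decoder: A_hat[i][j] = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z]
            for zi in Z]

def reconstruction_loss(A, Z):
    """Mean binary cross-entropy between A and the decoded A_hat."""
    A_hat = decode(Z)
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Clamp to avoid log(0) for extreme predictions.
            p = min(max(A_hat[i][j], 1e-12), 1.0 - 1e-12)
            total -= A[i][j] * math.log(p) + (1 - A[i][j]) * math.log(1.0 - p)
    return total / (n * n)

# Two communities {0, 1} and {2, 3}; self-loops included on the diagonal.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

Z_good = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]]  # matches communities
Z_bad = [[2.0, 0.0], [-2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]   # mixes communities

loss_good = reconstruction_loss(A, Z_good)
loss_bad = reconstruction_loss(A, Z_bad)
```

Embeddings that place community members close together reconstruct A with a lower loss, which is why clustering Z tends to recover communities.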
Task 3: Influence Estimation and Viral Prediction
Influence maximisation: which K users to seed to maximise information spread? A combinatorial problem (NP-hard).
GNN approach (Chen et al., 2021): train a GNN to predict the expected spread from a seed set. The GNN takes the seed set (as initial node activations) and propagates via the graph, simulating the cascade. Output: expected reach after T steps.
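To show what "propagating seed activations through the graph" looks like, here is a hand-written, untrained sketch. It assumes an independent cascade model with a single activation probability `p` (the text does not name a diffusion model) and applies a mean-field update that treats neighbours as independent. It is not the method of Chen et al.; a learned GNN would replace the fixed update rule with trained layers.

```python
def expected_reach(graph, seeds, p, steps):
    """Approximate expected number of active nodes after `steps` rounds.

    graph maps each node to the list of nodes it can activate.
    The update is exact on trees and an approximation elsewhere.
    """
    seeds = set(seeds)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}

    in_nbrs = {v: [] for v in nodes}
    for u, nbrs in graph.items():
        for v in nbrs:
            in_nbrs[v].append(u)

    # active[v]: probability v is active so far.
    # newly[v]: probability v became active in the latest round.
    active = {v: 1.0 if v in seeds else 0.0 for v in nodes}
    newly = dict(active)

    for _ in range(steps):
        nxt = {}
        for v in nodes:
            # Probability that no newly active in-neighbour activates v.
            stay_inactive = 1.0
            for u in in_nbrs[v]:
                stay_inactive *= 1.0 - p * newly[u]
            nxt[v] = (1.0 - active[v]) * (1.0 - stay_inactive)
        for v in nodes:
            active[v] += nxt[v]
        newly = nxt

    return sum(active.values())

chain = {0: [1], 1: [2], 2: []}
reach = expected_reach(chain, seeds=[0], p=0.5, steps=2)  # 1 + 0.5 + 0.25
```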
This replaces expensive Monte Carlo simulation (10,000 cascade simulations per seed set) with a single GNN forward pass.
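For reference, the Monte Carlo baseline being replaced can be sketched as follows, again assuming an independent cascade model with one activation probability `p` for every edge. The function names and the default of 10,000 runs are illustrative.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return the final reach."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active node gets one chance per out-neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def expected_spread(graph, seeds, p=0.1, runs=10_000, seed=0):
    """Average reach over many simulated cascades."""
    rng = random.Random(seed)
    total = sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs))
    return total / runs

star = {0: [1, 2, 3, 4]}
estimate = expected_spread(star, seeds=[0], p=0.5)  # analytic value: 1 + 4 * 0.5 = 3
```

Influence maximisation evaluates this estimate for many candidate seed sets, so the cost of `runs` simulations is paid repeatedly; that repeated cost is what a single forward pass aims to avoid.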
Viral content prediction: given a post's initial shares (first 1 hour), predict total reach at 24 hours. The post's propagation subgraph at 1 hour → GNN → reach prediction. Structure of early spread is highly predictive of final virality.
Task 4: Friend and Follow Recommendation
Link prediction on social graphs: predict (u, v) edge probability = will user u follow user v?
GraphSAGE for link prediction:
- Sample neighbourhoods for u and v
- Compute embeddings h_u, h_v via GNN
- Score = σ(h_u^T h_v) or concat + MLP
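The three steps above can be sketched in plain Python. This is a simplified stand-in: it uses one hop of mean aggregation over a sampled neighbourhood with no learned weights, whereas GraphSAGE applies trained weight matrices and nonlinearities at each layer. All names and the toy graph are assumptions.

```python
import math
import random

def sample_neighbours(graph, node, k, rng):
    """Return at most k neighbours of node, sampled without replacement."""
    nbrs = graph.get(node, [])
    return list(nbrs) if len(nbrs) <= k else rng.sample(nbrs, k)

def embed(graph, features, node, k, rng):
    """Mean of the node's own features and its sampled neighbours' features."""
    sampled = sample_neighbours(graph, node, k, rng)
    group = [features[node]] + [features[n] for n in sampled]
    return [sum(vals) / len(group) for vals in zip(*group)]

def link_score(h_u, h_v):
    """Edge probability as sigmoid of the dot product of two embeddings."""
    dot = sum(a * b for a, b in zip(h_u, h_v))
    return 1.0 / (1.0 + math.exp(-dot))

graph = {"u": ["a", "b", "c"], "v": ["a"]}
features = {
    "u": [1.0, 0.0], "v": [1.0, 0.0],
    "a": [1.0, 1.0], "b": [0.0, 1.0], "c": [1.0, 0.0],
}
rng = random.Random(0)
score = link_score(embed(graph, features, "u", 2, rng),
                   embed(graph, features, "v", 2, rng))
```

Capping the sample size at `k` bounds the cost per node regardless of degree, which matters for accounts with millions of followers.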
On Twitter/Instagram-scale graphs (billions of nodes), neighbourhood sampling (PinSage-style) is necessary.
Challenges Specific to Social Networks
Scale: Facebook has 3B users, 1T+ edges. Full-graph GNN training is infeasible at this size, so models must use minibatch training with neighbourhood sampling.
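Heterophily: although homophily dominates overall, political/social networks often contain many heterophilic edges (users follow people with opposite views to monitor them, debate, or due to bot-following patterns). Standard message passing averages over neighbours and implicitly assumes they are similar, so these edges can blur a node's representation.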
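How heterophilic a given graph is can be checked with the edge homophily ratio, the fraction of edges whose endpoints share a label. A minimal sketch, with toy labels that are purely illustrative:

```python
def edge_homophily(edges, labels):
    """Fraction of edges connecting two nodes with the same label."""
    if not edges:
        raise ValueError("edge_homophily needs at least one edge")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Four users on a path; only the middle edge crosses the label boundary.
labels = {0: "left", 1: "left", 2: "right", 3: "right"}
ratio = edge_homophily([(0, 1), (1, 2), (2, 3)], labels)
```

A ratio near 1 indicates a strongly homophilic graph; lower values suggest that a model designed for heterophily may be worth trying.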
Temporal dynamics: social graphs evolve rapidly. Static GNNs must be retrained; TGN-style dynamic models are preferable.
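Adversarial manipulation: spammers and bots create synthetic edges to boost influence. GNNs trained on observed graphs may encode these manipulated patterns. Adversarially robust GNNs address this in different ways: GNNGuard down-weights or prunes edges between dissimilar nodes, while RobustGCN represents hidden states as distributions so that uncertain neighbours contribute less.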
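The graph-cleaning idea can be illustrated with a simple similarity filter that drops edges whose endpoints have very different features. This is a deliberately simplified heuristic, not the GNNGuard algorithm; the threshold of 0.5 and the toy features are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def prune_edges(edges, features, threshold=0.5):
    """Keep only edges whose endpoints have similarity >= threshold."""
    return [(u, v) for u, v in edges
            if cosine(features[u], features[v]) >= threshold]

features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "bot": [0.0, 1.0]}
edges = [("a", "b"), ("bot", "a")]
cleaned = prune_edges(edges, features)
```

A filter like this also removes genuine heterophilic edges, so the threshold trades robustness against loss of real signal.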
Summary
| Task | Graph structure used | Key model |
|---|---|---|
| Fake news detection | Propagation tree structure | GNN-FakeNews |
| Community detection | Adjacency + features | VGAE, node clustering |
| Influence estimation | Full social graph | GNN cascade simulator |
| Friend recommendation | User-user graph | GraphSAGE, LightGCN |
| Bot detection | Follow/retweet graph | GCN + temporal features |
Social networks demonstrate that GNNs are not just machine learning tools; they are instruments for understanding and intervening in sociotechnical systems. The structural patterns they capture determine how information, influence, and misinformation propagate through society.
References
- Bian, T., Xiao, X., Xu, T., Zhao, P., Huang, W., Rong, Y., & Huang, J. (2020). Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. AAAI 2020 (Bi-GCN, called GNN-FakeNews in this post: bidirectional propagation tree GCN for rumour and fake news detection on Twitter).
- Kipf, T. N., & Welling, M. (2016). Variational Graph Auto-Encoders. arXiv 2016 (VGAE: variational autoencoder on graphs for unsupervised community detection and link prediction).
- Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive Representation Learning on Large Graphs. NeurIPS 2017 (GraphSAGE: inductive node embedding by neighbourhood sampling, widely used for social network tasks including friend recommendation).
