GNNs for Knowledge Graphs: Reasoning and Completion
Knowledge Graphs in Production
- Freebase: 1.9B triples (deprecated; much of its data was migrated to Wikidata)
- Wikidata: 100M+ entities, multilingual, community-maintained
- Google Knowledge Graph: powers the “knowledge panels” in Google Search
- YAGO: derived from Wikipedia, 120M+ facts
- ConceptNet: commonsense knowledge (everyday objects, situations, and how they relate)
These graphs power question answering, search, dialogue systems, and recommendation.
Task 1: Knowledge Base Completion (Link Prediction)
This is the most studied KG task. Given a query (s, r, ?), which entity o completes the triple? A closely related variant asks which relation r holds for a given entity pair (s, o).
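A (s, r, ?) query is answered by having a decoder score every candidate o and then ranking the candidates by that score. The sketch below assumes hand-picked 2-dimensional embeddings; all entity names and values are illustrative, not trained. It shows two decoders. RotatE treats a relation as an element-wise rotation of the head in complex space and scores a triple by the negative distance from the rotated head to the tail. DistMult's score is symmetric in s and o.

```python
import cmath
import math

# Hypothetical toy embeddings (hand-picked, not trained). In each complex dimension,
# "france" is "paris" rotated by the phases of "capital_of" (and "japan" is "tokyo" rotated).
ENT = {
    "paris":  [1 + 0j, 1 + 0j],
    "france": [0 + 1j, 1 + 0j],
    "tokyo":  [-1 + 0j, 0 + 1j],
    "japan":  [0 - 1j, 0 + 1j],
}
# RotatE: a relation is a vector of phases theta; it acts as r_i = exp(i * theta_i).
ROT_PHASE = {"capital_of": [math.pi / 2, 0.0]}


def rotate_score(s, r, o):
    """Negative distance between the rotated head and the tail (higher = more plausible)."""
    rot = [cmath.exp(1j * theta) for theta in ROT_PHASE[r]]
    return -sum(abs(h * q - t) for h, q, t in zip(ENT[s], rot, ENT[o]))


def distmult_score(e_s, w_r, e_o):
    """DistMult on real vectors: sum_i e_s[i] * w_r[i] * e_o[i]."""
    return sum(a * w * b for a, w, b in zip(e_s, w_r, e_o))


def rank_tails(s, r, score=rotate_score):
    """Answer (s, r, ?) by scoring every candidate other than s, best first."""
    candidates = [o for o in ENT if o != s]
    return sorted(candidates, key=lambda o: score(s, r, o), reverse=True)


print(rank_tails("paris", "capital_of"))  # ['france', 'tokyo', 'japan']

# DistMult is symmetric in s and o, so it scores (a, r, b) and (b, r, a) identically.
u, w, v = [1.0, 2.0], [0.5, 1.0], [3.0, 4.0]
print(distmult_score(u, w, v) == distmult_score(v, w, u))  # True
```

Dropping only the head entity from the candidates is a simplification. Standard evaluation ranks against every entity in the KG and filters out other triples already known to be true.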
GNN approach (R-GCN as encoder):
- R-GCN aggregates the multi-hop neighbourhood (one hop per layer) using relation-specific weight matrices
- The resulting entity embeddings encode structural context
- A shallow decoder (DistMult in the original R-GCN paper; RotatE can be used instead) scores candidate triples
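Here is a minimal pure-Python sketch of one R-GCN layer. It assumes mean normalisation (c_{i,r} = |N_i^r|), a self-connection matrix, and an inverse edge added for every triple. The weights are hypothetical fixed matrices, not trained ones. Entity "d" gets all-zero input features to mimic a sparse or newly added entity, so its output embedding comes entirely from its neighbour.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(h, triples, W_rel, W_self):
    """One R-GCN layer (sketch):
    h_i' = ReLU(W_self h_i + sum_r sum_{j in N_i^r} (1 / |N_i^r|) W_r h_j).

    h: {entity: feature vector}; triples: [(s, r, o)]; W_rel: {relation: matrix}.
    Each triple sends a message s -> o along r and o -> s along r + "_inv",
    so information flows in both directions.
    """
    neigh = defaultdict(lambda: defaultdict(list))  # target -> relation -> sources
    for s, r, o in triples:
        neigh[o][r].append(s)
        neigh[s][r + "_inv"].append(o)

    out = {}
    for i, h_i in h.items():
        acc = matvec(W_self, h_i)  # self-connection term
        for r, sources in neigh[i].items():
            norm = 1.0 / len(sources)  # c_{i,r} = |N_i^r|
            for j in sources:
                msg = matvec(W_rel[r], h[j])
                acc = [a + norm * m for a, m in zip(acc, msg)]
        out[i] = [max(0.0, a) for a in acc]  # ReLU
    return out


I2 = [[1.0, 0.0], [0.0, 1.0]]    # identity
SWAP = [[0.0, 1.0], [1.0, 0.0]]  # swaps the two coordinates
W_rel = {"likes": SWAP, "likes_inv": I2}  # hypothetical weights, not trained
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0],
     "d": [0.0, 0.0]}  # "d": a new entity with uninformative input features
triples = [("a", "likes", "b"), ("c", "likes", "b"), ("d", "likes", "a")]

out = rgcn_layer(h, triples, W_rel, I2)
print(out["b"])  # [0.5, 2.0]: self term plus the mean of the two "likes" messages
print(out["d"])  # [1.0, 0.0]: d's embedding comes entirely from its neighbour a
```

Stacking L such layers gives each entity an L-hop receptive field. The output vectors are what a decoder such as DistMult would then score.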
Why a GNN encoder can beat purely shallow methods:
- Sparse entities (few triples) benefit from neighbourhood aggregation: they borrow strength from well-connected neighbours
- Inductive: an entity not seen in training can get an embedding from its neighbours, provided its input features are not per-entity learned vectors
- Multi-hop patterns: “friend of my friend” inference through transitive relation patterns
Task 2: Entity Alignment
Two KGs in different languages or from different sources often describe the same real-world entities (e.g., Barack Obama in English Wikidata and 巴拉克·奥巴马 in the Chinese Baidu Baike).
Entity alignment: find the one-to-one mapping between entities across the KGs that refer to the same real-world object. The mapping is usually partial, because each KG covers entities the other lacks.
GNN approach:
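- Run a GNN on each KG (often with shared weights) to produce entity embeddings
- Seed alignment: a few known equivalent pairs serve as anchors, and training pulls their embeddings together so the two embedding spaces line up
- Align: for the remaining entities, pick pairs (e_1, e_2) with high embedding similarity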
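Once both KGs are embedded in a shared space, the alignment step can be sketched as a nearest-neighbour search. The example assumes the embeddings are already trained. The vectors, and the choice of cosine similarity with greedy matching, are illustrative assumptions. Hits@1 on held-out pairs is a common way to evaluate alignment.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def align(emb1, emb2):
    """Greedy alignment: map each KG1 entity to its most cosine-similar KG2 entity."""
    return {e1: max(emb2, key=lambda e2: cosine(v1, emb2[e2])) for e1, v1 in emb1.items()}


def hits_at_1(pred, gold):
    """Fraction of held-out gold pairs whose top-ranked match is correct."""
    return sum(pred[e1] == e2 for e1, e2 in gold.items()) / len(gold)


# Hypothetical embeddings, assumed to already share one space after GNN encoding
# and training on the seed pair (Barack_Obama, 巴拉克·奥巴马).
emb_en = {"Barack_Obama": [0.9, 0.1, 0.0], "Honolulu": [0.1, 0.9, 0.1], "Chicago": [0.0, 0.2, 0.9]}
emb_zh = {"巴拉克·奥巴马": [0.8, 0.2, 0.1], "檀香山": [0.2, 0.8, 0.0], "芝加哥": [0.1, 0.1, 1.0]}
test_pairs = {"Honolulu": "檀香山", "Chicago": "芝加哥"}  # held out, not seeds

pred = align(emb_en, emb_zh)
print(pred["Honolulu"], pred["Chicago"])  # 檀香山 芝加哥
print(hits_at_1(pred, test_pairs))  # 1.0
```

Greedy matching can map two source entities to the same target. Methods that enforce a one-to-one mapping add a matching step on top of the similarity scores.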
KECG / RDGCN: use relational GNNs with attention to produce relation-aware embeddings, then align across KGs using known anchor pairs. GNNs propagate alignment information from anchors to nearby entities.
Task 3: Multi-Hop Reasoning
Complex query answering: “Who is the CEO of the company headquartered in the city where the 2020 Olympics were held?”
This requires a chain of reasoning:
- 2020 Olympics → host city → Tokyo
- Tokyo → companies headquartered there → many candidates
- Company → CEO → answer
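When every needed fact is present, this chain is just a sequence of set-valued relation projections, with the middle hop following headquartered_in backwards. The sketch assumes a toy KG whose company and person names are hypothetical placeholders. A single missing edge breaks this exact traversal, which is part of why learned, soft reasoning methods are used instead.

```python
from collections import defaultdict

# Hypothetical toy KG: company and person names are placeholders, not real data.
TRIPLES = [
    ("olympics_2020", "host_city", "tokyo"),
    ("company_x", "headquartered_in", "tokyo"),
    ("company_y", "headquartered_in", "tokyo"),
    ("company_z", "headquartered_in", "osaka"),
    ("company_x", "ceo", "person_x"),
    ("company_y", "ceo", "person_y"),
    ("company_z", "ceo", "person_z"),
]

FORWARD = defaultdict(set)   # (subject, relation) -> objects
BACKWARD = defaultdict(set)  # (object, relation) -> subjects
for s, r, o in TRIPLES:
    FORWARD[(s, r)].add(o)
    BACKWARD[(o, r)].add(s)


def project(entities, relation, inverse=False):
    """Follow `relation` from every entity in the set (against it if inverse=True)."""
    index = BACKWARD if inverse else FORWARD
    out = set()
    for e in entities:
        out |= index.get((e, relation), set())
    return out


cities = project({"olympics_2020"}, "host_city")               # {'tokyo'}
companies = project(cities, "headquartered_in", inverse=True)  # company_x, company_y
answers = project(companies, "ceo")
print(sorted(answers))  # ['person_x', 'person_y']
```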
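Neural LP / DRUM: learn rules (soft logical implications) end to end as differentiable programs. Each relation is represented as an adjacency matrix. A rule body is scored by chaining attention-weighted sums of these matrices, which scores every path with a given relation sequence at once. This is message passing in spirit, although neither model is a GNN in the usual sense.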
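The differentiable-rule idea can be sketched with the TensorLog-style operators that Neural LP builds on. Each relation r is an adjacency matrix M_r, a query starts as a one-hot vector over entities, and each reasoning step applies an attention-weighted mix of relation matrices. The attention weights here are hand-set stand-ins for what a learned controller would output, and the toy entities are hypothetical.

```python
# Hypothetical toy KG (the person name is a placeholder).
ENTS = ["alice", "paris", "france", "berlin", "germany"]
IDX = {e: i for i, e in enumerate(ENTS)}
TRIPLES = [
    ("alice", "born_in", "paris"),
    ("paris", "city_in", "france"),
    ("berlin", "city_in", "germany"),
]
RELATIONS = ["born_in", "city_in"]


def adjacency(relation):
    """M_r[i][j] = 1 if (e_i, r, e_j) is in the KG."""
    M = [[0.0] * len(ENTS) for _ in ENTS]
    for s, r, o in TRIPLES:
        if r == relation:
            M[IDX[s]][IDX[o]] = 1.0
    return M


ADJ = {r: adjacency(r) for r in RELATIONS}


def vecmat(v, M):
    """Row vector times matrix: moves weight from each entity to its r-neighbours."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def soft_rule_scores(head, attention_per_step):
    """Score every entity as the tail of a soft rule with len(attention_per_step) steps.
    Step t applies the operator sum_r a_t[r] * M_r."""
    v = [0.0] * len(ENTS)
    v[IDX[head]] = 1.0  # one-hot start at the query entity
    for attn in attention_per_step:
        nxt = [0.0] * len(ENTS)
        for rel, weight in attn.items():
            moved = vecmat(v, ADJ[rel])
            nxt = [a + weight * b for a, b in zip(nxt, moved)]
        v = nxt
    return dict(zip(ENTS, v))


# Hand-set attention (a learned controller would produce these); the soft rule
# roughly reads nationality(X, Z) <- born_in(X, Y) AND city_in(Y, Z).
attention = [{"born_in": 0.9, "city_in": 0.1}, {"born_in": 0.1, "city_in": 0.9}]
scores = soft_rule_scores("alice", attention)
print(max(scores, key=scores.get))  # france
```

Because every step is a matrix product, the score is differentiable in the attention weights. This is what lets such rules be learned by gradient descent instead of by discrete search.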
MINERVA: frames query answering as a Markov decision process. An agent starts at the query entity and follows relation edges step by step. An LSTM encodes the path history, and a policy network selects the next edge. The result is interpretable: the path taken is the reasoning chain.
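The MDP framing can be sketched with a small environment. The state is the current entity (plus the query), the actions are the outgoing edges plus a NO_OP self-loop, and a terminal reward of 1 is given if the walk ends at the answer. MINERVA's actual policy is a learned LSTM-based network trained with policy gradients. Here it is replaced by a hypothetical hand-written preference table with greedy decoding, so only the environment logic is representative.

```python
# Hypothetical toy graph: entity -> outgoing (relation, neighbour) edges.
EDGES = {
    "alice": [("born_in", "paris"), ("works_for", "company_x")],
    "paris": [("city_in", "france")],
    "company_x": [("headquartered_in", "tokyo")],
    "france": [],
    "tokyo": [],
}


def actions(entity):
    """Available moves: every outgoing edge plus a NO_OP self-loop, so the
    agent can stay put once it reaches an answer."""
    return EDGES[entity] + [("NO_OP", entity)]


# Placeholder policy: a hand-written preference table standing in for a learned
# network. It scores an edge label given the query relation; unknown pairs get 0.
PREFERENCE = {
    ("nationality", "born_in"): 1.0,
    ("nationality", "city_in"): 1.0,
    ("nationality", "NO_OP"): 0.5,
}


def policy(query_rel, edge_rel):
    return PREFERENCE.get((query_rel, edge_rel), 0.0)


def greedy_rollout(start, query_rel, steps=3):
    """Walk a fixed number of steps from the query entity, taking the
    highest-scoring edge each time; the visited path is the explanation."""
    path, current = [], start
    for _ in range(steps):
        rel, nxt = max(actions(current), key=lambda a: policy(query_rel, a[0]))
        path.append((current, rel, nxt))
        current = nxt
    return current, path


def reward(final_entity, answer):
    return 1.0 if final_entity == answer else 0.0  # terminal 0/1 reward


final, path = greedy_rollout("alice", "nationality")
print(path)  # [('alice', 'born_in', 'paris'), ('paris', 'city_in', 'france'), ('france', 'NO_OP', 'france')]
print(reward(final, "france"))  # 1.0
```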
Task 4: Question Answering over KGs (KGQA)
Task: natural language question → SPARQL-like query over KG → answer entities.
GNN + BERT approach:
- BERT encodes the question; entity and relation mentions are identified and linked to the KG
- GNN propagates over the relevant KG subgraph
- Output scores over candidate entities → answer
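GRAFT-Net, PullNet: retrieve a question-specific subgraph from the KG (a k-hop neighbourhood around the linked entities, which PullNet expands iteratively). They add retrieved text documents as extra nodes and run a GNN over the combined graph for hybrid KG+text QA.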
Challenges
Scalability: Wikidata has 100M+ entities, so running a GNN over the full graph is impractical. The practical approach is to extract the relevant k-hop neighbourhood and run the GNN on that subgraph.
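Here is a sketch of breadth-first k-hop extraction. It treats edges as undirected for reachability and returns the induced subgraph, meaning all triples whose two endpoints are both kept. On real KGs, hub entities can make even a 2-hop neighbourhood very large, so practical systems typically add neighbour sampling or pruning, which this sketch omits.

```python
from collections import defaultdict, deque


def k_hop_subgraph(triples, seeds, k):
    """Keep triples whose two endpoints are both within k hops of some seed entity.
    Reachability ignores edge direction; the result is the induced subgraph."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    dist = {e: 0 for e in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search, stopping expansion at depth k
        e = queue.popleft()
        if dist[e] == k:
            continue
        for nb in adj[e]:
            if nb not in dist:
                dist[nb] = dist[e] + 1
                queue.append(nb)
    return [t for t in triples if t[0] in dist and t[2] in dist]


chain = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "d"), ("x", "r", "y")]
print(k_hop_subgraph(chain, {"a"}, 1))  # [('a', 'r', 'b')]
print(k_hop_subgraph(chain, {"a"}, 2))  # [('a', 'r', 'b'), ('b', 'r', 'c')]
```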
Relation diversity: Wikidata has 8,000+ relation types. R-GCN's basis decomposition keeps the parameter count manageable by sharing a small set of basis matrices across relations. More recent models use type-specific attention (HGT).
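Basis decomposition writes each relation matrix as W_r = Σ_b a_{r,b} V_b. The B basis matrices V_b are shared across relations, and only the coefficients a_{r,b} are relation-specific. The sketch compares parameter counts for one layer, ignoring the self-connection and biases. The hidden size d = 100 and B = 30 bases are illustrative assumptions; the 8,000 relations come from the text.

```python
import random


def param_counts(num_rel, num_bases, d):
    """Weights in one d x d R-GCN layer: full per-relation matrices vs. basis decomposition."""
    full = num_rel * d * d
    basis = num_bases * d * d + num_rel * num_bases  # B shared V_b plus a_{r,b} coefficients
    return full, basis


def make_basis_layer(num_rel, num_bases, d, seed=0):
    """Return W(r) = sum_b a[r][b] * V[b], built from randomly initialised parameters."""
    rng = random.Random(seed)
    V = [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)] for _ in range(num_bases)]
    a = [[rng.gauss(0.0, 1.0) for _ in range(num_bases)] for _ in range(num_rel)]

    def W(r):
        return [[sum(a[r][b] * V[b][i][j] for b in range(num_bases)) for j in range(d)]
                for i in range(d)]

    return W


print(param_counts(8000, 30, 100))  # (80000000, 540000)
W = make_basis_layer(num_rel=5, num_bases=2, d=3)
print(len(W(0)), len(W(0)[0]))  # 3 3
```

Under these assumptions, the decomposition reduces roughly 80M weights to about 0.54M. Rare relations also reuse bases shaped by frequent ones instead of fitting a full matrix from a handful of triples.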
Incomplete KGs: all KGs are incomplete, so models must handle missing context gracefully. This is one reason GNNs, which use whatever neighbourhood is available, tend to outperform shallow per-entity embeddings on sparse entities.
Summary
| Task | Graph structure used | Key model |
|---|---|---|
| Link prediction | Multi-relational neighbourhood | R-GCN + RotatE |
| Entity alignment | Cross-KG structure similarity | KECG, RDGCN |
| Multi-hop reasoning | Reasoning paths | MINERVA, DRUM |
| Question answering | KG subgraph + text | GRAFT-Net |
GNNs have become a core building block of modern knowledge graph systems. They provide inductive, structure-aware entity representations for completion, alignment, reasoning, and question answering at scale.
References
- Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., & Welling, M. (2018). Modeling Relational Data with Graph Convolutional Networks. ESWC 2018 (R-GCN: relation-specific weight matrices for entity classification and link prediction in knowledge graphs).
- Sun, Z., Deng, Z.-H., Nie, J.-Y., & Tang, J. (2019). RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR 2019 (RotatE: relations as rotations in complex space, handling symmetry, antisymmetry, inversion, and composition patterns).
- Das, R., Dhuliawala, S., Zaheer, M., Vilnis, L., Durugkar, I., Krishnamurthy, A., Smola, A., & McCallum, A. (2018). Go for a Walk and Arrive at the Answer: Reasoning over Paths in Knowledge Bases using Reinforcement Learning. ICLR 2018 (MINERVA: RL-based multi-hop path traversal for knowledge base question answering).
