GNNs for Knowledge Graphs: Reasoning and Completion

TL;DR: Knowledge graphs (Freebase, Wikidata, ConceptNet) are massive multi-relational graphs. GNNs power three core tasks: link prediction (filling in missing triples), entity alignment (matching entities across KGs), and multi-hop reasoning (answering questions that require several reasoning steps). They also support question answering over KGs. The key advantage over shallow embedding methods: GNNs capture neighbourhood context and can be applied inductively to unseen entities.

Knowledge Graphs in Production

  • Freebase: 1.9B triples (deprecated, absorbed by Wikidata)
  • Wikidata: 100M+ entities, multilingual, community-maintained
  • Google Knowledge Graph: powers Google Search “knowledge panels”
  • Yago: derived from Wikipedia, 120M+ facts
  • ConceptNet: commonsense knowledge (objects, situations, relationships)

These graphs power question answering, search, dialogue systems, and recommendation.

Task 1: Link Prediction

The most studied KG task: given an entity pair (s, o), which relation r holds? Or, given (s, r, ?), which entity o completes the triple?

GNN approach (R-GCN as encoder):

  1. R-GCN aggregates multi-hop neighbourhood with relation-specific weights
  2. Entity embeddings encode structural context
  3. Shallow decoder (DistMult, RotatE) scores candidate triples
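
The three steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the toy triples, the dimensions, and the function names (`rgcn_layer`, `distmult_score`, `rank_objects`) are assumptions made for the example, the weights are random and untrained, and messages flow only along the subject → object direction (the published R-GCN also adds inverse relations).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy KG (hypothetical): (subject, relation, object) triples, 4 entities, 2 relations.
triples = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (0, 1, 3)]
num_entities, num_relations, dim = 4, 2, 8


def rgcn_layer(h, triples, w_rel, w_self):
    """One R-GCN-style layer: relation-specific messages, averaged per (node, relation)."""
    out = h @ w_self  # self-loop term
    # Number of incoming edges per (target node, relation), used for normalisation.
    counts = np.zeros((h.shape[0], w_rel.shape[0]))
    for s, r, o in triples:
        counts[o, r] += 1
    for s, r, o in triples:
        out[o] += (h[s] @ w_rel[r]) / counts[o, r]
    return np.maximum(out, 0.0)  # ReLU


def distmult_score(h, rel_diag, s, r, o):
    """DistMult decoder: sum_k h_s[k] * R_r[k] * h_o[k]."""
    return float(np.sum(h[s] * rel_diag[r] * h[o]))


def rank_objects(h, rel_diag, s, r):
    """Rank every entity as a candidate object for the query (s, r, ?)."""
    scores = (h[s] * rel_diag[r]) @ h.T
    return np.argsort(-scores)


# Steps 1 and 2: two stacked layers give each entity a 2-hop receptive field.
h0 = rng.normal(size=(num_entities, dim))
layers = [
    (rng.normal(scale=0.3, size=(num_relations, dim, dim)),
     rng.normal(scale=0.3, size=(dim, dim)))
    for _ in range(2)
]
h = h0
for w_rel, w_self in layers:
    h = rgcn_layer(h, triples, w_rel, w_self)

# Step 3: a shallow decoder scores candidate triples from the encoded entities.
rel_diag = rng.normal(size=(num_relations, dim))
ranking = rank_objects(h, rel_diag, s=0, r=1)
```

In a trained model the encoder and decoder parameters would be learned jointly, typically with negative sampling; here the ranking is arbitrary because nothing has been trained. DistMult is symmetric in s and o, which is one reason decoders such as RotatE are used for asymmetric relations.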

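Why a GNN encoder can beat purely shallow methods:

  • Sparse entities (few triples) benefit from neighbourhood aggregation: they borrow strength from well-connected neighbours
  • Inductive: a new entity that was not in the training set can get an embedding computed from its neighbours, provided the encoder does not rely on a learned per-entity input embedding; a shallow model has no embedding-table row for it
  • Multi-hop patterns: stacked layers capture compositional patterns such as “friend of my friend”, which shallow methods can only model one triple at a time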

Task 2: Entity Alignment

Two KGs in different languages or from different sources often refer to the same real-world entities (Barack Obama in English Wikidata and 巴拉克·奥巴马 in Chinese Baidu Baike).

Entity alignment: find a mapping (usually assumed to be one-to-one, and in practice partial) between entities across KGs that refer to the same real-world object.

GNN approach:

  1. Run GNN on each KG independently → entity embeddings
  2. Align: find pairs (e_1, e_2) with high embedding similarity
  3. Seed alignment: a few known pairs used as anchors to align the embedding spaces
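
As a rough sketch of steps 2 and 3, the example below aligns two embedding spaces with a handful of seed pairs and then matches entities by cosine similarity. Note the swapped-in technique: it uses an orthogonal Procrustes fit, whereas alignment models such as those discussed here typically train the encoder with a margin-based loss on the seeds. The embeddings are synthetic stand-ins for GNN outputs, and all names (`procrustes_rotation`, `align`, `perm`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 8, 4

# Synthetic stand-in for GNN output on KG A.
emb_a = rng.normal(size=(n, dim))

# Synthetic KG B: same entities under a hidden permutation, in a rotated space,
# with a little noise. Entity i in A corresponds to entity perm[i] in B.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
perm = rng.permutation(n)
emb_b = np.empty_like(emb_a)
emb_b[perm] = emb_a @ rotation + 0.01 * rng.normal(size=(n, dim))


def procrustes_rotation(src, tgt):
    """Orthogonal matrix R minimising ||src @ R - tgt|| in the Frobenius norm."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def align(emb_a, emb_b, seed_pairs):
    """Fit a rotation on the seed pairs, then match by cosine similarity."""
    a_idx = [a for a, _ in seed_pairs]
    b_idx = [b for _, b in seed_pairs]
    rot = procrustes_rotation(emb_a[a_idx], emb_b[b_idx])
    mapped = emb_a @ rot
    mapped /= np.linalg.norm(mapped, axis=1, keepdims=True)
    normed_b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = mapped @ normed_b.T
    return similarity.argmax(axis=1)  # best match in B for each entity in A


# Five known anchor pairs; the remaining three entities are aligned from structure alone.
seed_pairs = [(i, int(perm[i])) for i in range(5)]
predicted = align(emb_a, emb_b, seed_pairs)
```

Greedy nearest-neighbour matching does not enforce a one-to-one mapping. If that constraint matters, an assignment solver or a mutual-nearest-neighbour filter is a common addition.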

KECG / RDGCN: use relational GNNs with attention to produce relation-aware embeddings, then align across KGs using known anchor pairs. GNNs propagate alignment information from anchors to nearby entities.

Why structure helps: "Barack Obama" in English and "巴拉克·奥巴马" in Chinese have very different surface forms. But they share the same neighbourhood structure: both are connected to "USA", "Harvard", "Nobel Peace Prize" (in their respective KGs). GNN embeddings that encode structural position are naturally more alignable than text-based embeddings.

Task 3: Multi-Hop Reasoning

Complex query answering: “Who is the CEO of the company headquartered in the city where the 2020 Olympics were held?”

This requires a chain of reasoning:

  • 2020 Olympics → host city → Tokyo
  • Tokyo → headquartered companies → various
  • Company → CEO → answer

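Neural LP / DRUM: learn rules (soft logical implications) as differentiable programs. Strictly speaking these are not GNNs: they score all paths of a given relation sequence by chaining relation adjacency matrices, with learned attention weights choosing which relation to follow at each step. The computation still resembles message passing along typed edges.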
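
A small sketch of path scoring with soft rules, in the spirit of this family of models rather than a reproduction of either one. Each step follows a weighted mixture of relations, implemented as a product with relation adjacency matrices. The entities `CompanyX` and `PersonY`, the relation names, and the weights are hypothetical placeholders; in a real model the weights would be learned.

```python
import numpy as np

# Hypothetical toy KG for the Olympics example.
entities = ["Olympics2020", "Tokyo", "CompanyX", "PersonY"]
idx = {name: i for i, name in enumerate(entities)}
triples = [
    ("Olympics2020", "host_city", "Tokyo"),
    ("Tokyo", "headquarters_of", "CompanyX"),
    ("CompanyX", "ceo", "PersonY"),
]

# One adjacency matrix per relation: adj[r][s, o] = 1 if (s, r, o) is in the KG.
relations = sorted({r for _, r, _ in triples})
adj = {r: np.zeros((len(entities), len(entities))) for r in relations}
for s, r, o in triples:
    adj[r][idx[s], idx[o]] = 1.0


def follow_rule(start, steps):
    """Propagate a one-hot start vector through a sequence of soft relation choices.

    `steps` is a list of {relation: weight} dicts, one per hop.
    Returns a score for every entity as the end point of the path.
    """
    x = np.zeros(len(entities))
    x[idx[start]] = 1.0
    for weights in steps:
        step_matrix = sum(w * adj[r] for r, w in weights.items())
        x = x @ step_matrix
    return x


# Hard rule: host_city, then headquarters_of, then ceo.
hard = follow_rule("Olympics2020", [{"host_city": 1.0}, {"headquarters_of": 1.0}, {"ceo": 1.0}])

# Soft rule: each hop spreads weight over several relations.
soft = follow_rule(
    "Olympics2020",
    [{"host_city": 0.9, "ceo": 0.1}, {"headquarters_of": 0.8, "host_city": 0.2}, {"ceo": 1.0}],
)
answer = entities[int(np.argmax(soft))]
```

Because every operation is a matrix product, the relation weights can be trained by gradient descent, and the highest-weight relation at each hop can be read off afterwards as a rule.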

MINERVA: framed as a Markov decision process in which an agent starts at the query entity and follows relation edges step by step. An LSTM encodes the path history at each step, and a policy network selects the next edge from the current entity's outgoing edges. The result is interpretable, because the path the agent walks is the reasoning chain.

Task 4: Question Answering over KGs (KGQA)

Task: natural language question → SPARQL-like query over KG → answer entities.

GNN + BERT approach:

  1. BERT encodes the question → extract entities and relation mentions
  2. GNN propagates over the relevant KG subgraph
  3. Output scores over candidate entities → answer

GRAFT-Net, PullNet: retrieve relevant subgraph from KG (k-hop around mentioned entities), run GNN, combine with document retrieval for hybrid KG+text QA.
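
The retrieval step, taking the k-hop neighbourhood around the entities mentioned in a question, can be sketched with a breadth-first search. This is a simplified illustration under stated assumptions: the function name `k_hop_subgraph`, the toy triples, and the choice to ignore edge direction during retrieval are all made up for the example, and the systems named above use more selective retrieval than plain BFS.

```python
from collections import defaultdict, deque


def k_hop_subgraph(triples, seeds, k):
    """Return the entities within k hops of any seed, and the triples among them."""
    neighbours = defaultdict(set)
    for s, _, o in triples:
        neighbours[s].add(o)
        neighbours[o].add(s)  # ignore edge direction when retrieving

    distance = {entity: 0 for entity in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for nb in neighbours[node]:
            if nb not in distance:
                distance[nb] = distance[node] + 1
                queue.append(nb)

    kept = set(distance)
    sub_triples = [(s, r, o) for s, r, o in triples if s in kept and o in kept]
    return kept, sub_triples


# Hypothetical chain-shaped KG: a -> b -> c -> d -> e, plus an unrelated edge.
toy_triples = [
    ("a", "r1", "b"),
    ("b", "r2", "c"),
    ("c", "r1", "d"),
    ("d", "r2", "e"),
    ("x", "r1", "y"),
]
nodes, sub = k_hop_subgraph(toy_triples, seeds=["a"], k=2)
```

The GNN then runs only on `sub`, so its cost depends on the size of the retrieved neighbourhood rather than on the full KG. High-degree hub entities can still blow up the subgraph, which is why practical systems cap or rank the neighbours they keep.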

Challenges

Scalability: Wikidata has 100M+ entities. Running a GNN over the full graph for every query is impractical. The practical approach is to extract a subgraph (the relevant k-hop neighbourhood) and run the GNN on that subgraph.

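Relation diversity: Wikidata has 8,000+ relation types, and a separate weight matrix per relation would mean a very large parameter count and overfitting on rare relations. R-GCN addresses this with basis decomposition, where each relation's matrix is a learned combination of a small set of shared basis matrices; more recent models use type-specific attention (HGT).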
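
Basis decomposition writes each relation matrix as W_r = Σ_b a_{rb} V_b, with the basis matrices V_b shared across relations. The sketch below compares parameter counts. The figure of 8,000 relations comes from the text; the embedding size (64) and number of bases (30) are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_relations, num_bases, dim = 8000, 30, 64  # dim and num_bases are assumed values

# Shared basis matrices V_b and per-relation coefficients a_{rb}.
bases = rng.normal(size=(num_bases, dim, dim))
coeffs = rng.normal(size=(num_relations, num_bases))


def relation_weight(r):
    """W_r = sum_b coeffs[r, b] * bases[b], built on demand."""
    return np.tensordot(coeffs[r], bases, axes=1)


# One full matrix per relation versus shared bases plus coefficients.
full_params = num_relations * dim * dim                                # 32,768,000
decomposed_params = num_bases * dim * dim + num_relations * num_bases  # 362,880
```

Under these assumed sizes the decomposition needs roughly 90 times fewer parameters, and rare relations share statistical strength with frequent ones through the common bases.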

Incomplete KGs: all KGs are incomplete. Models must handle missing context gracefully — one reason GNNs (which leverage available neighbourhood) outperform entity-only embeddings for sparse entities.

Summary

| Task | Graph structure used | Key model |
| --- | --- | --- |
| Link prediction | Multi-relational neighbourhood | R-GCN + RotatE |
| Entity alignment | Cross-KG structure similarity | KECG, RDGCN |
| Multi-hop reasoning | Reasoning paths | MINERVA, DRUM |
| Question answering | KG subgraph + text | GRAFT-Net |

GNNs have become a core building block of modern knowledge graph systems: they provide inductive, structure-aware entity representations that support reasoning, completion, and alignment, with subgraph extraction keeping them tractable at scale.

References