Knowledge Graph Embeddings vs GNNs

TL;DR: Shallow KG embeddings (TransE, DistMult, ComplEx, RotatE) learn one vector per entity and one per relation. They are fast and scalable but transductive: they cannot handle entities unseen during training without retraining. GNN-based approaches (R-GCN, CompGCN) compute structure-aware embeddings from each entity's neighbourhood. They are slower, can be inductive (generalising to new entities), and are better suited to multi-hop reasoning. Hybrid approaches combine a GNN encoder with a shallow scoring function.

The Knowledge Graph Completion Task

A knowledge graph (KG) is a collection of triples (subject, relation, object), e.g., (John_Lennon, member_of, The_Beatles). Real-world KGs are almost always incomplete: many true triples are missing. KG completion is the task of predicting these missing triples.

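Evaluation: for each test triple, form the query (s, r, ?), score every candidate object, and record the rank of the correct one. Metrics: MRR (mean reciprocal rank, the average of 1/rank over test queries) and Hits@k (the fraction of queries whose correct entity ranks in the top k). In the common "filtered" setting, other candidates that are also known true answers are removed before ranking.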
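
The ranking protocol can be sketched in a few lines of NumPy. This is a minimal illustration, not a benchmark harness: the helpers `rank_of_true` and `mrr_and_hits` are hypothetical names, ties are broken optimistically (only strictly higher scores push the true entity down), and the scores are made-up toy numbers.

```python
import numpy as np

def rank_of_true(scores, true_idx, known_true=None):
    """1-based rank of the correct entity among all candidates (higher score = better).

    If known_true is given, the other known correct answers are removed
    before ranking (the "filtered" setting).
    """
    s = np.array(scores, dtype=float)  # copy so the caller's array is untouched
    if known_true:
        for idx in known_true:
            if idx != true_idx:
                s[idx] = -np.inf
    # Optimistic tie handling: only strictly higher scores count against the true entity.
    return int(np.sum(s > s[true_idx])) + 1

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits

# Three toy queries (s, r, ?) over 5 candidate entities (hypothetical scores).
scores = np.array([
    [0.1, 0.9, 0.3, 0.2, 0.0],  # correct object: entity 1
    [0.8, 0.1, 0.7, 0.2, 0.5],  # correct object: entity 2
    [0.4, 0.6, 0.9, 0.1, 0.3],  # correct object: entity 0
])
true = [1, 2, 0]
ranks = [rank_of_true(scores[i], true[i]) for i in range(len(true))]
mrr, hits = mrr_and_hits(ranks)

# Filtered setting: if entity 2 is also a known answer to query 3, it is excluded.
filtered_rank = rank_of_true(scores[2], 0, known_true={0, 2})
```

Raw ranks here are 1, 2 and 3; filtering removes entity 2 from the third query, so its rank improves to 2.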

Shallow KG Embeddings

These methods assign a learned embedding to each entity and relation, then score triples with a simple function.

TransE (Bordes et al., 2013)

f(s, r, o) = -||e_s + w_r - e_o||

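Interprets relations as translations in embedding space: e_o ≈ e_s + w_r. Simple and effective, especially for one-to-one relations, and it can model inverse relations (w_r2 ≈ -w_r1). It cannot model symmetric relations such as friends_with: requiring both e_o ≈ e_s + w_r and e_s ≈ e_o + w_r forces w_r ≈ 0 and therefore e_s ≈ e_o. It also struggles with one-to-many, many-to-one, and many-to-many relations, because every valid object of (s, r) is pulled towards the same point e_s + w_r.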
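
A minimal NumPy sketch of the TransE score follows. The entity and relation names are hypothetical, the vectors are random rather than trained, and the object embedding is constructed so that the translation holds exactly. Using the L1 norm by default is an assumption (the original paper allows L1 or L2).

```python
import numpy as np

def transe_score(e_s, w_r, e_o, p=1):
    """TransE plausibility: -||e_s + w_r - e_o||_p (higher = more plausible)."""
    return -np.linalg.norm(e_s + w_r - e_o, ord=p)

rng = np.random.default_rng(0)
d = 4
e_paris = rng.normal(size=d)         # hypothetical entity embedding
w_capital_of = rng.normal(size=d)    # hypothetical relation embedding
e_france = e_paris + w_capital_of    # constructed so the translation holds exactly
e_germany = rng.normal(size=d)       # unrelated random entity

good = transe_score(e_paris, w_capital_of, e_france)    # zero distance: maximal score
bad = transe_score(e_paris, w_capital_of, e_germany)    # some negative score
```

Because the score is a negative distance, the best possible value is 0; training pushes true triples towards it and corrupted triples away from it.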

DistMult (Yang et al., 2015)

f(s, r, o) = e_sᵀ · diag(w_r) · e_o = Σ_k e_s[k] · w_r[k] · e_o[k]

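A three-way elementwise (trilinear) product. Because multiplication is commutative, f(s, r, o) = f(o, r, s) for every relation, so DistMult treats all relations as symmetric and cannot model antisymmetric ones such as parent_of.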
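
The symmetry is easy to check numerically. In this sketch (random, untrained vectors; `distmult_score` is a hypothetical helper), swapping subject and object leaves the score unchanged up to floating-point rounding, whatever the relation vector is.

```python
import numpy as np

def distmult_score(e_s, w_r, e_o):
    """DistMult: sum_k e_s[k] * w_r[k] * e_o[k]."""
    return float(np.sum(e_s * w_r * e_o))

rng = np.random.default_rng(1)
e_a, e_b, w_r = rng.normal(size=(3, 8))  # two entities and one relation, random

forward = distmult_score(e_a, w_r, e_b)   # f(a, r, b)
backward = distmult_score(e_b, w_r, e_a)  # f(b, r, a): same value
```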

ComplEx (Trouillon et al., 2016)

f(s, r, o) = Re( Σ_k e_s[k] · w_r[k] · ē_o[k] )

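Uses complex-valued embeddings; ē_o is the complex conjugate of e_o. The conjugate makes the score asymmetric in s and o: a relation with purely real w_r behaves symmetrically, while one with purely imaginary w_r is antisymmetric (f(s, r, o) = -f(o, r, s)), so ComplEx handles both kinds of relation. In the original paper it outperformed TransE and DistMult on the FB15k and WN18 benchmarks.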
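
The conjugate on e_o is what breaks DistMult's symmetry. This sketch (hypothetical helper, random untrained vectors) checks two extreme cases with NumPy complex arrays: a purely real relation vector gives the same score in both directions, and a purely imaginary one flips the sign when subject and object are swapped.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx: Re(sum_k e_s[k] * w_r[k] * conj(e_o[k]))."""
    return float(np.real(np.sum(e_s * w_r * np.conj(e_o))))

rng = np.random.default_rng(2)
d = 8

def rand_complex(size):
    return rng.normal(size=size) + 1j * rng.normal(size=size)

e_a, e_b = rand_complex(d), rand_complex(d)
w_sym = rng.normal(size=d) + 0j    # purely real relation vector -> symmetric
w_anti = 1j * rng.normal(size=d)   # purely imaginary relation vector -> antisymmetric

sym_fwd, sym_bwd = complex_score(e_a, w_sym, e_b), complex_score(e_b, w_sym, e_a)
anti_fwd, anti_bwd = complex_score(e_a, w_anti, e_b), complex_score(e_b, w_anti, e_a)
```

The reason: swapping s and o is equivalent to conjugating w_r, which leaves a real w_r unchanged and negates an imaginary one. A general w_r mixes both behaviours.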

RotatE (Sun et al., 2019)

f(s, r, o) = -||e_s ∘ w_r - e_o||, where ∘ is the elementwise (Hadamard) product and every |w_r[k]| = 1

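Each relation is a rotation in complex space: w_r[k] = exp(iθ_r[k]), so a relation is stored as d phase angles. This handles symmetry (rotation by π), antisymmetry, inversion (rotation by -θ_r), and composition (adding phases), a richer relational geometry than the earlier models.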
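
A short sketch of RotatE's geometry, assuming relations are parametrised by phase angles θ_r so that w_r[k] = exp(iθ_r[k]), as in the RotatE paper. The L2 norm over complex coordinates and the helper name `rotate_score` are assumptions. The checks cover symmetry (rotation by π), inversion (negated phases) and composition (summed phases).

```python
import numpy as np

def rotate_score(e_s, theta_r, e_o):
    """RotatE: -||e_s * exp(i * theta_r) - e_o||, rotating each complex coordinate."""
    w_r = np.exp(1j * theta_r)  # unit-modulus: |w_r[k]| = 1
    return -float(np.linalg.norm(e_s * w_r - e_o))

rng = np.random.default_rng(3)
d = 6
e_a = rng.normal(size=d) + 1j * rng.normal(size=d)

# Symmetry: rotating by pi twice is the identity, so both directions fit.
theta_sym = np.full(d, np.pi)
e_b = e_a * np.exp(1j * theta_sym)

# Inversion and composition with random phases.
theta1, theta2 = rng.uniform(0, 2 * np.pi, size=(2, d))
e_c = e_a * np.exp(1j * theta1)       # (a, r1, c) holds exactly
e_e = e_c * np.exp(1j * theta2)       # (c, r2, e) holds exactly

sym_scores = (rotate_score(e_a, theta_sym, e_b), rotate_score(e_b, theta_sym, e_a))
inverse_score = rotate_score(e_c, -theta1, e_a)         # r1 inverse maps c back to a
composed_score = rotate_score(e_a, theta1 + theta2, e_e)  # r1 then r2 == one rotation
```

All four scores are zero up to rounding, which is the best possible value for this negative-distance score.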

Key Properties of Shallow Methods

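| Property | TransE | DistMult | ComplEx | RotatE |
|---|---|---|---|---|
| Parameters | Ed + Rd | Ed + Rd | 2Ed + 2Rd | 2Ed + Rd |
| Transductive | Yes | Yes | Yes | Yes |
| Inductive | No | No | No | No |
| Symmetric relations | No | Yes | Yes | Yes |
| Antisymmetric relations | Yes | No | Yes | Yes |
| Composition | Yes | No | No | Yes |

E = number of entities, R = number of relations, d = embedding dimension. ComplEx and RotatE embeddings are complex-valued, hence the factor 2 when counting real parameters; RotatE stores each relation as d phase angles.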

Transductive: every entity must be seen during training. Each entity's embedding is a row in a learned lookup table, so an entity added at test time has no embedding until the model is retrained.

GNN-Based KG Completion

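R-GCN (Schlichtkrull et al., 2018) and CompGCN use GNNs as encoders, producing entity embeddings that are computed from the graph structure around each entity rather than looked up from its identity alone. Because an embedding is a function of the neighbourhood, a GNN can in principle embed an entity unseen during training from its edges, provided the input node features are not themselves per-entity learned vectors.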
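
For concreteness, here is a sketch of one R-GCN-style layer following the update rule of Schlichtkrull et al. (2018): h_v' = σ(W_0 h_v + Σ_r Σ_{u ∈ N_r(v)} (1/|N_r(v)|) W_r h_u). It is simplified (no inverse-edge relation types, no basis decomposition, ReLU as σ), the weights are random rather than trained, and the input features are assumed to be fixed vectors. The last lines show the inductive case: a new entity's embedding is computed from its edges with the same weights, and existing embeddings do not change.

```python
import numpy as np

def rgcn_layer(H, triples, W_rel, W_self):
    """One simplified R-GCN-style layer.

    H:       (num_entities, d_in) input node features
    triples: list of (s, r, o) edges; messages flow s -> o
    W_rel:   (num_relations, d_in, d_out), one weight matrix per relation
    W_self:  (d_in, d_out) self-connection weight W_0
    """
    out = H @ W_self  # self-connection term W_0 h_v (new array, H is not modified)
    # c_{v,r}: number of neighbours of v under relation r (normalisation constant)
    counts = {}
    for s, r, o in triples:
        counts[(o, r)] = counts.get((o, r), 0) + 1
    for s, r, o in triples:
        out[o] += (H[s] @ W_rel[r]) / counts[(o, r)]
    return np.maximum(out, 0.0)  # ReLU as sigma

rng = np.random.default_rng(4)
num_ent, num_rel, d_in, d_out = 4, 2, 5, 3
H = rng.normal(size=(num_ent, d_in))          # hypothetical fixed input features
W_rel = rng.normal(size=(num_rel, d_in, d_out))
W_self = rng.normal(size=(d_in, d_out))
triples = [(0, 0, 1), (2, 0, 1), (3, 1, 1), (1, 1, 0)]
Z = rgcn_layer(H, triples, W_rel, W_self)

# Inductive use: unseen entity 4 arrives with its own features and one edge.
H_new = np.vstack([H, rng.normal(size=(1, d_in))])
Z_new = rgcn_layer(H_new, triples + [(0, 0, 4)], W_rel, W_self)
```

No retraining is involved: entity 4 reuses the relation weights learned (here, sampled) for everyone else.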

CompGCN (Vashishth et al., 2020)

CompGCN generalises R-GCN by composing entity and relation embeddings during message passing:

h_v = σ( Σ_{(u,r) ∈ N(v)} W_λ(r) · (h_u ∘ z_r) )

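Where ∘ is a composition operator φ (subtraction, elementwise multiplication, or circular correlation; here ∘ denotes a generic operator, not only the Hadamard product), z_r is the relation embedding, and W_λ(r) is a direction-specific weight matrix, with separate matrices for original edges, inverse edges, and self-loops. Relation embeddings are also updated at each layer (z_r' = W_rel · z_r). The final entity and relation embeddings are passed to a KG-embedding decoder; the paper pairs CompGCN with TransE, DistMult, and ConvE scoring functions.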

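Why composition matters: the operators mirror shallow scoring functions. TransE's translation (e_o - e_s ≈ w_r) corresponds to the subtraction operator, DistMult's elementwise product to multiplication, and HolE's scoring to circular correlation. CompGCN builds these into message passing: when aggregating from neighbour u via relation r, the message is the composition of h_u and the relation embedding z_r. This lets the GNN encode relational context directly into node embeddings.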
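
A simplified CompGCN layer can be sketched as below. Assumptions: no degree normalisation or basis vectors, tanh as σ, separate embeddings for inverse relations and for the self-loop, and the dictionary keys (`orig`, `inv`, `loop`, `rel`) are hypothetical names for the W_λ(r) matrices and W_rel. Circular correlation uses the FFT identity a ⋆ b = IFFT(conj(FFT(a)) · FFT(b)) for real vectors.

```python
import numpy as np

def circular_correlation(a, b):
    """(a ⋆ b)[k] = sum_i a[i] * b[(i + k) % d], computed via the FFT."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

COMPOSE = {
    "sub": lambda h, z: h - z,   # TransE-style
    "mult": lambda h, z: h * z,  # DistMult-style
    "corr": circular_correlation,  # HolE-style
}

def compgcn_layer(H, Z, triples, W, op="sub"):
    """One simplified CompGCN layer.

    H: (n, d) entity embeddings
    Z: dict 'rel' (R, d), 'inv' (R, d), 'loop' (d,) relation embeddings
    W: dict 'orig', 'inv', 'loop', 'rel' -> (d, d) weight matrices
    """
    phi = COMPOSE[op]
    out = np.array([W["loop"] @ phi(h, Z["loop"]) for h in H])  # self-loop messages
    for s, r, o in triples:
        out[o] += W["orig"] @ phi(H[s], Z["rel"][r])  # s -> o along r
        out[s] += W["inv"] @ phi(H[o], Z["inv"][r])   # o -> s along r inverse
    H_next = np.tanh(out)
    Z_next = {k: v @ W["rel"].T for k, v in Z.items()}  # z_r' = W_rel z_r
    return H_next, Z_next

rng = np.random.default_rng(5)
n, R, d = 3, 2, 4
H = rng.normal(size=(n, d))
Z = {"rel": rng.normal(size=(R, d)), "inv": rng.normal(size=(R, d)), "loop": rng.normal(size=d)}
W = {k: rng.normal(size=(d, d)) / np.sqrt(d) for k in ("orig", "inv", "loop", "rel")}
triples = [(0, 0, 1), (1, 1, 2)]
results = {op: compgcn_layer(H, Z, triples, W, op=op) for op in COMPOSE}
```

Swapping `op` changes only how a neighbour and its relation are combined; the aggregation and weight structure stay the same.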

When to Use Shallow vs GNN Methods

| Scenario | Recommendation |
|---|---|
| Very large KG (millions of entities) | Shallow (RotatE, ComplEx): scalable |
| New entities at test time (inductive) | GNN (R-GCN, CompGCN) |
| Few triples per entity | GNN (leverages neighbourhood structure) |
| Many triples per entity | Shallow is sufficient |
| Multi-hop reasoning required | GNN or neural LP models |
| Production system where speed matters | Shallow (single embedding lookup) |

Multi-Hop Reasoning

Shallow methods score each triple in isolation, so they cannot directly reason over multi-hop paths (e.g., “X is the sibling of Y’s parent” → X is an aunt/uncle of Y). A GNN with L layers propagates information up to L hops, so each entity's embedding reflects its L-hop neighbourhood; this enables implicit multi-hop reasoning.
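
The hop-by-hop growth can be illustrated without any learning. This sketch (hypothetical toy graph and helper) computes which nodes can influence a target after L rounds of message passing, with messages flowing from subject to object. For the sibling/parent example, Y's embedding depends on P after one layer and on X only after two. It shows that the information is reachable, not that a trained GNN actually learns the aunt/uncle rule.

```python
import numpy as np

def receptive_field(adj, target, num_layers):
    """Boolean mask of nodes whose features can reach `target` in num_layers rounds.

    adj[u, v] = 1 means a message flows from u to v.
    """
    reach = np.zeros(adj.shape[0], dtype=bool)
    reach[target] = True
    for _ in range(num_layers):
        # A node joins if it sends a message to any node already in the field.
        reach = reach | (adj @ reach.astype(int) > 0)
    return reach

# Hypothetical toy KG: X --sibling_of--> P --parent_of--> Y
names = ["X", "P", "Y"]
adj = np.zeros((3, 3), dtype=int)
adj[0, 1] = 1  # X -> P
adj[1, 2] = 1  # P -> Y

one_layer = [names[i] for i in np.flatnonzero(receptive_field(adj, 2, 1))]
two_layers = [names[i] for i in np.flatnonzero(receptive_field(adj, 2, 2))]
```

Here `one_layer` is `["P", "Y"]` and `two_layers` is `["X", "P", "Y"]`: a one-layer GNN cannot see X when embedding Y.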

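Neural LP (Yang et al., 2017) and MINERVA (Das et al., 2018) take this further with explicit path-based reasoning: Neural LP learns differentiable logical rules over relation paths, and MINERVA trains a reinforcement-learning agent to walk from the query entity to the answer. Both yield interpretable rules or paths, but are slower.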

Summary

Shallow KG embeddings are fast, scalable, and well-understood. GNN-based methods are inductive, structure-aware, and better for multi-hop patterns. The trend in the field is hybrid: use a GNN encoder to produce structure-aware entity embeddings, then score with a shallow decoder (DistMult, RotatE). This combines structural awareness with score function expressiveness.
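
The hybrid pattern can be sketched end to end: a one-layer relational GNN encoder (R-GCN-style, chosen here as an assumption) produces entity embeddings, and a DistMult decoder scores every candidate object for a query (s, r, ?). All weights are random; in practice encoder and decoder would be trained jointly, for example with negative sampling. The function names are hypothetical.

```python
import numpy as np

def encode(X, triples, W_rel, W_self):
    """Hypothetical one-layer relational GNN encoder (mean over incoming messages per relation)."""
    out = X @ W_self
    counts = {}
    for s, r, o in triples:
        counts[(o, r)] = counts.get((o, r), 0) + 1
    for s, r, o in triples:
        out[o] += X[s] @ W_rel[r] / counts[(o, r)]
    return np.tanh(out)

def distmult_all_objects(E, w_r, s):
    """DistMult score of (s, r, o) for every candidate o: sum_k E[s,k] * w_r[k] * E[o,k]."""
    return E @ (E[s] * w_r)

rng = np.random.default_rng(6)
n, R, d_in, d = 5, 2, 6, 4
X = rng.normal(size=(n, d_in))                      # hypothetical input features
W_rel = rng.normal(size=(R, d_in, d)) / np.sqrt(d_in)
W_self = rng.normal(size=(d_in, d)) / np.sqrt(d_in)
dec_rel = rng.normal(size=(R, d))                   # decoder relation vectors w_r
triples = [(0, 0, 1), (1, 1, 2), (3, 0, 2), (4, 1, 0)]

E = encode(X, triples, W_rel, W_self)               # structure-aware entity embeddings
scores = distmult_all_objects(E, dec_rel[0], s=0)   # query (entity 0, relation 0, ?)
ranking = np.argsort(-scores)                       # best candidate first
```

The decoder is unchanged from the shallow model; only the source of the entity vectors differs, which is why any shallow scoring function can be swapped in.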

References