Knowledge Graph Embeddings vs GNNs


Published: May 01, 2024

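TL;DR: Shallow KG embeddings (TransE, DistMult, ComplEx, RotatE) learn one vector per entity and one per relation. They are fast and scalable but transductive: they cannot handle entities unseen during training without retraining. GNN-based approaches (R-GCN, CompGCN) compute structure-aware embeddings by message passing. They are slower, can be inductive (generalising to new entities, provided the input features do not come from a per-entity lookup table), and are better suited to multi-hop reasoning. Hybrid approaches combine a GNN encoder with a shallow scoring function.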

The Knowledge Graph Completion Task

Key Insight: Shallow KG embeddings are like a phone book — each entity gets exactly one entry, and lookup is instant. GNN-based methods are like a detective's case file — each entity's embedding is assembled from its neighbourhood context. The phone book scales to millions of entries but cannot handle a new person who just arrived; the case file generalises to new people but costs more to build.

A knowledge graph (KG) is a collection of triples (subject, relation, object), e.g. (John_Lennon, member_of, The_Beatles). Real-world KGs are incomplete: many true triples are missing. KG completion is the task of predicting those missing triples.

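Evaluation: for each test query (s, r, ?), score every candidate object and record the rank of the true one. Metrics: MRR (mean reciprocal rank, the average of 1/rank over queries) and Hits@k (the fraction of queries whose true object ranks in the top k). Results are usually reported in the filtered setting, where other known true objects are removed from the candidate list before ranking.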
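Once each query has been reduced to the rank of its true object, both metrics are a few lines of arithmetic. The sketch below assumes 1-based ranks; the helper names `rank_of_true` and `mrr_and_hits` are hypothetical, and ties are broken pessimistically, which is only one of several conventions in use.

```python
def rank_of_true(scores, true_idx):
    """1-based rank of the true candidate (higher score = better).

    Ties are broken pessimistically: every other candidate scoring at least
    as high as the true one is counted as ranked above it.
    """
    target = scores[true_idx]
    return 1 + sum(1 for j, s in enumerate(scores) if j != true_idx and s >= target)


def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hits@k from a list of 1-based ranks."""
    if not ranks:
        raise ValueError("need at least one rank")
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits


# Toy usage: four queries whose true objects ranked 1st, 3rd, 2nd and 10th.
mrr, hits = mrr_and_hits([1, 3, 2, 10])
print(f"MRR = {mrr:.3f}, Hits@1 = {hits[1]}, Hits@3 = {hits[3]}, Hits@10 = {hits[10]}")
```

For these ranks, MRR = (1 + 1/3 + 1/2 + 1/10) / 4 ≈ 0.483, Hits@1 = 0.25, Hits@3 = 0.75 and Hits@10 = 1.0.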

Shallow KG Embeddings

These methods assign a learned embedding to each entity and relation, then score triples with a simple function.

TransE (Bordes et al., 2013)

f(s, r, o) = -||e_s + w_r - e_o||

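Interprets relations as translations in embedding space: e_o ≈ e_s + w_r. This naturally captures antisymmetric and compositional relations and works well for one-to-one relations. It cannot model symmetric relations (e.g. friends_with): scoring both (s, r, o) and (o, r, s) highly forces w_r ≈ 0 and hence e_s ≈ e_o. It also struggles with one-to-many, many-to-one and many-to-many relations, where several distinct entities must all sit near the same point e_s + w_r.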
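A minimal sketch of the TransE score in plain Python, with toy 2-d vectors (all values hypothetical). It also shows the symmetry problem: a translation that maps Alice onto Bob necessarily maps Bob away from Alice.

```python
import math


def transe_score(e_s, w_r, e_o, p=2):
    """TransE plausibility -||e_s + w_r - e_o||_p (higher = more plausible).

    p=1 gives the L1 distance, p=2 (default) the Euclidean distance.
    """
    diff = [s + r - o for s, r, o in zip(e_s, w_r, e_o)]
    if p == 1:
        return -sum(abs(x) for x in diff)
    if p == 2:
        return -math.sqrt(sum(x * x for x in diff))
    raise ValueError("p must be 1 or 2")


# Hypothetical embeddings for a symmetric relation, friends_with.
e_alice, e_bob = [0.0, 0.0], [1.0, 0.0]
w_friends = [1.0, 0.0]
forward = transe_score(e_alice, w_friends, e_bob)  # zero distance: a perfect fit
reverse = transe_score(e_bob, w_friends, e_alice)  # -2.0: the reverse direction is penalised
print(forward, reverse)
```

Making both directions score well would require w_friends ≈ 0, which collapses e_alice and e_bob onto the same point.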

DistMult (Yang et al., 2015)

f(s, r, o) = e_s · diag(W_r) · e_o = Σ_k e_s[k] · W_r[k] · e_o[k]

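Each dimension contributes an elementwise product, so the score is symmetric in s and o: f(s, r, o) = f(o, r, s) for every relation. DistMult therefore cannot model antisymmetric relations such as parent_of.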
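The symmetry is easy to see numerically. In this sketch (toy values; the relation vector name is hypothetical and holds the diagonal of W_r), parent_of receives exactly the same score in both directions, which is precisely what an antisymmetric relation should not do.

```python
def distmult_score(e_s, w_r, e_o):
    """DistMult: sum_k e_s[k] * w_r[k] * e_o[k], where w_r is the diagonal of W_r."""
    return sum(s * r * o for s, r, o in zip(e_s, w_r, e_o))


# Hypothetical vectors: whatever values we pick, swapping s and o leaves the score unchanged.
e_x, e_y = [1.0, 2.0], [3.0, -1.0]
w_parent_of = [0.5, 4.0]
print(distmult_score(e_x, w_parent_of, e_y))  # -6.5
print(distmult_score(e_y, w_parent_of, e_x))  # -6.5 as well
```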

ComplEx (Trouillon et al., 2016)

f(s, r, o) = Re( Σ_k e_s[k] · W_r[k] · ē_o[k] ), where e_s, W_r, e_o ∈ ℂ^d and ē_o is the complex conjugate of e_o

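Uses complex-valued embeddings. Because of the conjugate on e_o, swapping s and o can change the score whenever W_r has a nonzero imaginary part, so ComplEx handles both symmetric and antisymmetric relations. In the original paper it outperformed TransE and DistMult on the standard FB15K and WN18 benchmarks.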
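The conjugate on e_o is what breaks DistMult's symmetry. A sketch using Python's built-in complex numbers (toy 1-d embeddings, hypothetical values): a purely imaginary relation makes the score antisymmetric, while a purely real one leaves it symmetric, so one model can represent both kinds of relation.

```python
def complex_score(e_s, w_r, e_o):
    """ComplEx: Re(sum_k e_s[k] * w_r[k] * conj(e_o[k])) over complex vectors."""
    return sum(s * r * o.conjugate() for s, r, o in zip(e_s, w_r, e_o)).real


e_x, e_y = [1 + 1j], [2 + 0j]  # hypothetical 1-d complex embeddings

# Purely imaginary relation: swapping s and o flips the sign (antisymmetric).
w_anti = [1j]
print(complex_score(e_x, w_anti, e_y), complex_score(e_y, w_anti, e_x))  # -2.0 2.0

# Purely real relation: the score is symmetric, as in DistMult.
w_sym = [2 + 0j]
print(complex_score(e_x, w_sym, e_y), complex_score(e_y, w_sym, e_x))  # 4.0 4.0
```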

RotatE (Sun et al., 2019)

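f(s, r, o) = -||e_s ∘ w_r - e_o||, where e_s, w_r, e_o ∈ ℂ^d, ∘ is the elementwise (Hadamard) product, and every |w_r[k]| = 1, so each w_r[k] = exp(iθ_r[k]) is a pure rotation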

Relations as rotations in complex space. Handles symmetry, antisymmetry, inversion, composition — a richer relational geometry.
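A sketch of the RotatE score that checks three of these patterns numerically. Assumptions: the relation vector is called w_r (matching TransE's notation) and is stored as phases θ_r; the distance is the sum of per-dimension moduli; all embedding values are hypothetical.

```python
import cmath
import math


def rotate_score(e_s, theta_r, e_o):
    """RotatE: -||e_s ∘ w_r - e_o|| with w_r[k] = exp(i * theta_r[k]), so |w_r[k]| = 1.

    The distance is the sum of the per-dimension moduli |e_s[k] * w_r[k] - e_o[k]|.
    """
    w_r = [cmath.exp(1j * t) for t in theta_r]
    return -sum(abs(s * r - o) for s, r, o in zip(e_s, w_r, e_o))


e_x = [1 + 0j, 1j]  # hypothetical 2-d complex embedding

# Symmetry: rotating by pi twice returns to the start, so both directions fit.
e_y = [-1 + 0j, -1j]
print(rotate_score(e_x, [math.pi] * 2, e_y), rotate_score(e_y, [math.pi] * 2, e_x))  # both ~0

# Inversion: the inverse relation is the opposite rotation, -theta.
theta = [0.3, 1.2]
e_z = [x * cmath.exp(1j * t) for x, t in zip(e_x, theta)]
print(rotate_score(e_x, theta, e_z), rotate_score(e_z, [-t for t in theta], e_x))  # both ~0

# Composition: rotating by theta, then theta2, equals rotating by theta + theta2.
theta2 = [0.5, -0.4]
e_w = [z * cmath.exp(1j * t) for z, t in zip(e_z, theta2)]
print(rotate_score(e_x, [a + b for a, b in zip(theta, theta2)], e_w))  # ~0
```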

Key Properties of Shallow Methods

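Property	TransE	DistMult	ComplEx	RotatE
Parameters	|E|d + |R|d	|E|d + |R|d	2|E|d + 2|R|d	2|E|d + |R|d
Transductive	Yes	Yes	Yes	Yes
Inductive	No	No	No	No
Symmetric relations	No	Yes	Yes	Yes
Antisymmetric	Yes	No	Yes	Yes
Composition	Yes	No	No	Yes

|E| and |R| are the numbers of entities and relations, and d is the embedding dimension. ComplEx and RotatE entity vectors store a real and an imaginary part per dimension; RotatE relations store only one phase per dimension.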

Transductive: requires all entities seen during training. Cannot embed new entities at test time without retraining.

Worked Example: TransE vs R-GCN on a Mini-KG

Consider a tiny KG with 3 entities {A, B, C} and 2 triples:

(A, member_of, B)
(A, born_in, C)

TransE learns vectors: e_A, e_B, e_C, w_member_of, w_born_in in ℝ².

Score (A, member_of, B): maximise -||e_A + w_member_of - e_B||
Score (A, born_in, C): maximise -||e_A + w_born_in - e_C||
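Entity B has no outgoing edges: its embedding e_B is shaped only by the training terms in which it appears as an object, chiefly as the target of A’s member_of edge. TransE’s score never looks at B’s neighbourhood, so there is no explicit structural context.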
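The mini-KG is small enough to train directly. Below is a minimal TransE training sketch in ℝ², under simplifying assumptions that differ from the original paper: squared L2 distance (for simple gradients), a margin ranking loss against every corrupted object instead of sampled negatives, plain SGD, and no normalisation of entity vectors. Hyperparameters and helper names are hypothetical.

```python
import random

# Mini-KG from the worked example.
entities = ["A", "B", "C"]
relations = ["member_of", "born_in"]
triples = [("A", "member_of", "B"), ("A", "born_in", "C")]

# Hypothetical hyperparameters; vectors live in R^2 as in the text.
dim, margin, lr, epochs = 2, 1.0, 0.05, 300
rng = random.Random(0)
emb = {x: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for x in entities + relations}


def dist(s, r, o):
    """Squared distance ||e_s + w_r - e_o||^2 (smaller = more plausible)."""
    return sum((a + b - c) ** 2 for a, b, c in zip(emb[s], emb[r], emb[o]))


def sgd_step(s, r, o, sign):
    """Decrease (sign=+1) or increase (sign=-1) dist(s, r, o) by one gradient step.

    The gradient is +2v for e_s and w_r and -2v for e_o; when s == o the two
    entity updates cancel, which matches the true gradient.
    """
    v = [a + b - c for a, b, c in zip(emb[s], emb[r], emb[o])]
    for k in range(dim):
        g = sign * 2 * v[k]
        emb[s][k] -= lr * g
        emb[r][k] -= lr * g
        emb[o][k] += lr * g


def negatives(o):
    """Every corrupted object for a triple whose true object is o."""
    return [x for x in entities if x != o]


def total_loss():
    """Margin ranking loss summed over all (positive, corrupted) pairs."""
    return sum(max(0.0, margin + dist(s, r, o) - dist(s, r, n))
               for s, r, o in triples for n in negatives(o))


initial = total_loss()
for _ in range(epochs):
    for s, r, o in triples:
        for n in negatives(o):
            if margin + dist(s, r, o) - dist(s, r, n) > 0:  # hinge is active
                sgd_step(s, r, o, +1)  # pull the true triple together
                sgd_step(s, r, n, -1)  # push the corrupted triple apart

print(f"loss: {initial:.3f} -> {total_loss():.3f}")
print(dist("A", "member_of", "B") < dist("A", "member_of", "C"))
```

Note that e_B changes only through the loss terms that mention B; nothing in the scorer consults B’s neighbourhood.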

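R-GCN on the same graph: R-GCN adds an inverse edge for every triple, so when computing h_B it aggregates a message from A via member_of^{-1} (plus a self-loop). B’s embedding now encodes “I am a group that A belongs to”, and with a second layer it also receives A’s born_in information about C. This is structural context that TransE’s score function cannot represent.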

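If we then add the triple (B, located_in, C) at test time, TransE cannot use it without further training, because its scores never consult the graph. R-GCN can incorporate it at inference simply by running message passing over the updated graph, so C’s information reaches B with no weight updates. The same mechanism underlies inductive generalisation: a brand-new entity can be embedded from its neighbours, as long as its input features do not depend on a per-entity table learned in training.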
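The test-time update can be sketched with a single R-GCN-style layer, h_v' = tanh(W_0 · h_v + Σ_r Σ_{u ∈ N_r(v)} W_r · h_u / |N_r(v)|), in which every triple also contributes an inverse edge. Everything here is a hypothetical toy: one-hot input features, hand-set weight matrices standing in for trained ones, and the assumption that located_in was a relation type seen in training (R-GCN needs one weight matrix per relation). Because one-hot inputs are themselves a per-entity lookup, this shows new edges being used, not new entities being embedded.

```python
import math
from collections import defaultdict


def eye(n, scale=1.0):
    """n x n identity matrix, optionally scaled."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, triples, W_self, W_rel):
    """One R-GCN-style layer with mean aggregation per relation type."""
    # neigh[v][r] = neighbours u such that (v, r, u) holds, inverse edges included.
    neigh = defaultdict(lambda: defaultdict(list))
    for s, r, o in triples:
        neigh[s][r].append(o)
        neigh[o][r + "_inv"].append(s)
    out = {}
    for v, hv in h.items():
        acc = matvec(W_self, hv)  # self-connection W_0 h_v
        for r, us in neigh[v].items():
            for u in us:
                msg = matvec(W_rel[r], h[u])
                acc = [a + m / len(us) for a, m in zip(acc, msg)]  # 1/c_{v,r} = 1/|N_r(v)|
        out[v] = [math.tanh(a) for a in acc]
    return out


h = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}  # one-hot inputs
rel_types = ["member_of", "born_in", "located_in"]
W_rel = {name: eye(3) for r in rel_types for name in (r, r + "_inv")}  # hypothetical weights
W_self = eye(3, 0.5)

kg = [("A", "member_of", "B"), ("A", "born_in", "C")]
before = rgcn_layer(h, kg, W_self, W_rel)["B"]
after = rgcn_layer(h, kg + [("B", "located_in", "C")], W_self, W_rel)["B"]
print(before)  # third component is 0.0: nothing from C reaches B yet
print(after)   # third component is tanh(1) ≈ 0.762: C's features now reach B
```

The weights are identical in both calls; only the graph changed.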

GNN-Based KG Completion

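R-GCN (Schlichtkrull et al., 2018) and CompGCN use GNNs as encoders, producing entity embeddings that are informed by the graph structure, not just the entity’s identity. An R-GCN layer keeps one weight matrix per relation type (primes denote the next layer):

h_v' = σ( W_0 · h_v + Σ_r Σ_{u ∈ N_r(v)} (1/c_{v,r}) · W_r · h_u )

where N_r(v) is v’s set of neighbours under relation r (inverse relations included) and c_{v,r} is a normalisation constant such as |N_r(v)|. A decoder (DistMult in the R-GCN paper) then scores triples from the final embeddings.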

CompGCN (Vashishth et al., 2020)

CompGCN generalises R-GCN by composing entity and relation embeddings during message passing:

h_v = σ( Σ_{(u,r) ∈ N(v)} W_λ(r) · (h_u ∘ z_r) )

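Here ∘ is a composition operator (subtraction, elementwise multiplication, or circular correlation), z_r is the relation embedding, and W_λ(r) is a direction-specific weight matrix: λ(r) selects one matrix for original edges, one for inverse edges and one for self-loops. Relation embeddings are also transformed at each layer (z_r' = W_rel · z_r). The composition operator is chosen independently of the decoder; the paper evaluates each operator with TransE, DistMult and ConvE scoring functions.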

Why composition matters: TransE uses subtraction (e_o - e_s ≈ w_r). CompGCN builds this into the message passing — when aggregating from neighbour u via relation r, the message is the composition of h_u and the relation embedding z_r. This lets the GNN encode relational context directly into node embeddings.
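The three composition operators and a single CompGCN-style node update can be sketched as follows. This is a simplification under stated assumptions: hypothetical toy vectors, a separate hypothetical embedding for the inverse relation, identity matrices standing in for the learned weights W_λ(r), tanh as σ, and no per-layer relation update or degree normalisation.

```python
import math


def comp_sub(h, z):
    """Subtraction, h - z (TransE-inspired)."""
    return [a - b for a, b in zip(h, z)]


def comp_mult(h, z):
    """Elementwise multiplication, h * z (DistMult-inspired)."""
    return [a * b for a, b in zip(h, z)]


def comp_corr(h, z):
    """Circular correlation (HolE-inspired): out[k] = sum_i h[i] * z[(i + k) % d]."""
    d = len(h)
    return [sum(h[i] * z[(i + k) % d] for i in range(d)) for k in range(d)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def compgcn_update(edges, h, z, W, compose):
    """h_v' = tanh(sum over (u, r, direction) of W[direction] @ compose(h_u, z_r)).

    edges lists v's incoming messages as (u, r, direction), with direction in
    {"orig", "inv", "self"} playing the role of lambda(r).
    """
    acc = [0.0] * len(next(iter(h.values())))
    for u, r, direction in edges:
        msg = matvec(W[direction], compose(h[u], z[r]))
        acc = [a + m for a, m in zip(acc, msg)]
    return [math.tanh(a) for a in acc]


I2 = [[1.0, 0.0], [0.0, 1.0]]
W = {"orig": I2, "inv": I2, "self": I2}  # hypothetical stand-ins for W_O, W_I, W_S
h = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
z = {"member_of_inv": [0.5, 0.5], "self_loop": [0.0, 0.0]}

# B receives from A along the inverse of (A, member_of, B), plus its own self-loop.
edges_B = [("A", "member_of_inv", "inv"), ("B", "self_loop", "self")]
for name, op in [("sub", comp_sub), ("mult", comp_mult), ("corr", comp_corr)]:
    print(name, compgcn_update(edges_B, h, z, W, op))
```

Swapping the operator changes the message B receives while the rest of the layer stays the same.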

When to Use Shallow vs GNN Methods

Scenario	Recommendation
Very large KG (millions of entities)	Shallow (RotatE, ComplEx) — scalable
New entities at test time (inductive)	GNN (R-GCN, CompGCN)
Few triples per entity	GNN (leverages neighbourhood structure)
Many triples per entity	Shallow is usually sufficient
Multi-hop reasoning required	GNN or path-based models (Neural LP, MINERVA)
Production system, speed matters	Shallow (single embedding lookup)

Multi-Hop Reasoning

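Shallow methods score triples in isolation: each score depends only on the two endpoint embeddings and the relation, so they cannot directly reason over multi-hop paths (e.g., “X is the sibling of Y’s parent” → X is an aunt/uncle of Y). RotatE can encode composition patterns between relations, but it still scores each query from its endpoints alone. An L-layer GNN aggregates information from each entity’s L-hop neighbourhood, enabling implicit multi-hop reasoning.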

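Neural LP (Yang et al., 2017) and MINERVA (Das et al., 2018) take this further with explicit path-based reasoning: Neural LP learns differentiable logical rules, and MINERVA trains a reinforcement-learning agent to walk the graph towards the answer. Both are typically slower than embedding-based scoring.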
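The multi-hop reach of a GNN comes down to which entities can influence a node after L rounds of message passing. A sketch, assuming messages flow along both edge directions (as with R-GCN’s and CompGCN’s inverse edges) and using a hypothetical three-entity family graph: Y’s embedding can only “see” X, the sibling of Y’s parent, from the second layer on. Reachability is necessary but not sufficient; whether a model actually learns the aunt/uncle pattern depends on training.

```python
def receptive_field(triples, node, layers):
    """Entities whose input features can reach `node` after `layers` rounds of
    message passing, treating every edge as bidirectional (inverse edges) and
    keeping self-loops (a node always sees itself)."""
    adj = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    seen, frontier = {node}, {node}
    for _ in range(layers):
        frontier = {u for v in frontier for u in adj.get(v, ())} - seen
        seen |= frontier
    return seen


# Hypothetical family graph: X is the sibling of P, and P is the parent of Y.
family = [("X", "sibling_of", "P"), ("P", "parent_of", "Y")]
for layers in (1, 2):
    print(layers, sorted(receptive_field(family, "Y", layers)))
# 1 layer: ['P', 'Y']; 2 layers: ['P', 'X', 'Y']
```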

Summary

Shallow KG embeddings are fast, scalable, and well understood. GNN-based methods are structure-aware, can be inductive, and are better suited to multi-hop patterns. A common pattern, followed by R-GCN and CompGCN themselves, is hybrid: a GNN encoder produces structure-aware entity embeddings, and a shallow decoder (e.g. DistMult or RotatE) scores triples from them. This combines structural awareness with the expressiveness of the score function.
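A stripped-down version of the hybrid recipe, as a sketch under stated assumptions: the encoder is a hypothetical one-layer mean aggregation over each entity and its neighbours (no relation-specific weights or nonlinearity, unlike R-GCN or CompGCN), and the decoder is DistMult. The graph is the mini-KG from the worked example with made-up input vectors; in a real model the input embeddings, encoder weights and relation vectors are trained end to end.

```python
def encode(h, triples):
    """Hypothetical one-layer encoder: average each entity's vector with its
    neighbours' vectors, using both edge directions plus a self-loop."""
    neigh = {v: [v] for v in h}
    for s, _, o in triples:
        neigh[s].append(o)
        neigh[o].append(s)
    return {
        v: [sum(h[u][k] for u in us) / len(us) for k in range(len(h[v]))]
        for v, us in neigh.items()
    }


def distmult(e_s, w_r, e_o):
    """DistMult decoder: sum_k e_s[k] * w_r[k] * e_o[k]."""
    return sum(a * b * c for a, b, c in zip(e_s, w_r, e_o))


h = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, 0.0]}  # hypothetical input vectors
w = {"member_of": [1.0, 1.0]}                             # hypothetical relation vector
kg = [("A", "member_of", "B"), ("A", "born_in", "C")]

enc = encode(h, kg)  # encode once, then score any number of candidate triples
for o in ["A", "B", "C"]:
    print(o, round(distmult(enc["A"], w["member_of"], enc[o]), 3))
# B scores highest for the query (A, member_of, ?)
```

Encoding once and reusing the result for every candidate keeps the per-query cost close to that of a shallow model.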

References

Bordes, A., Usunier, N., Garcia-Durán, A., Weston, J., & Yakhnenko, O. (2013). Translating Embeddings for Modeling Multi-relational Data. NeurIPS 2013 (TransE).
Yang, B., Yih, W.-T., He, X., Gao, J., & Deng, L. (2015). Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR 2015 (DistMult).
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016). Complex Embeddings for Simple Link Prediction. ICML 2016 (ComplEx).
Sun, Z., Deng, Z.-H., Nie, J.-Y., & Tang, J. (2019). RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR 2019.
Vashishth, S., Sanyal, S., Nitin, V., & Talukdar, P. (2020). Composition-based Multi-Relational Graph Convolutional Networks. ICLR 2020 (CompGCN).


Alessio Borgi
