R-GCN: Relational Graph Convolutional Networks

4 minute read

Published: April 29, 2024

TL;DR: R-GCN (Schlichtkrull et al., 2018) adapts GCN for knowledge graphs with typed edges. Each relation type r has its own weight matrix W_r. Node h_v aggregates messages from neighbours for each relation separately, then sums. The main challenge: |R| weight matrices → over-parameterisation. Solved by basis decomposition.

The Problem: Multi-Relational Graphs

A knowledge graph (KG) is a directed multigraph where edges have types (relations). For example, Freebase contains entities (John_Lennon, Beatles, UK) connected by relations (member_of, born_in, from_country).

Standard GCN cannot handle this — it uses a single aggregation weight and cannot distinguish “member_of” from “born_in” edges.

The R-GCN Update Rule

h^{(k+1)}_v = σ( W^{(k)}_0 h^{(k)}_v + Σ_{r ∈ R} Σ_{u ∈ N_r(v)} (1/c_{v,r}) W^{(k)}_r h^{(k)}_u )

Where:

N_r(v) = neighbours of v connected via relation r
c_{v,r} = normalisation constant (typically N_r(v) )
W^{(k)}_0 = self-loop weight (own representation)
W^{(k)}_r = relation-specific weight matrix

Interpretation: for each relation r, node v collects messages from all neighbours connected by r, transforms them by W_r, and sums. All relation-specific sums are then added together with the self-loop.

This is equivalent to running a separate GCN on each relation’s adjacency subgraph and summing the results.

Directionality: Knowledge graphs are directed. R-GCN handles this by treating each directed relation r and its inverse r^{-1} as two separate relation types. The "A-member_of-B" edge creates a message from B to A using W_{member_of}, and an inverse message from A to B using W_{member_of^{-1}}. This lets information flow in both directions.

The Parameter Problem: Basis Decomposition

For a KG with

= 100 relations and embedding dimension d = 200, the weight matrices have 100 × 200 × 200 = 4 million parameters — just for one layer. This is prone to overfitting, especially when many relations have few training examples.

Basis decomposition (the R-GCN solution): express each W_r as a linear combination of B shared basis matrices:

W_r = Σ_{b=1}^{B} a_{rb} · V_b

Where V_b ∈ ℝ^{d×d} are shared bases and a_{rb} are relation-specific scalars. This reduces parameters from

×d² to B×d² +

×B — much smaller when B «

Block-diagonal decomposition (alternative): partition d into blocks, and each relation’s weight is block-diagonal. This reduces computation without sharing bases.

R-GCN for Entity Classification

Task: given a KG with some labelled entities, predict labels for unlabelled entities.

Example: Freebase entity type classification — is this entity a Person, Organisation, or Location?

Setup:

Entity features: one-hot or learned embeddings
2-layer R-GCN
Final h^{(K)}_v fed to softmax classifier
Trained with cross-entropy on labelled entities

On the AIFB and MUTAG entity classification benchmarks, R-GCN outperforms hand-crafted KG embedding methods.

R-GCN for Link Prediction

Task: predict missing triples (subject, relation, object) — i.e., “does this relation exist between these two entities?”

Setup: R-GCN as an encoder, DistMult as a decoder.

Encoder: run R-GCN to get entity embeddings e_s, e_o
Decoder (DistMult): score the triple (s, r, o):

f(s, r, o) = e_s · diag(W_r) · e_o

Training: binary cross-entropy with negative sampling

This encoder-decoder split (GNN encodes structure, shallow decoder scores triples) is a common pattern for KG link prediction.

R-GCN vs DistMult / TransE

Before R-GCN, KG link prediction used shallow embedding methods:

TransE: entity embeddings s + r ≈ o
DistMult: e_s · W_r · e_o scoring
ComplEx: complex-valued extensions

These encode individual entity/relation embeddings but ignore graph structure. R-GCN adds structural context: entity embeddings are informed by their graph neighbourhood, not just their own learned vectors.

Summary

Property	R-GCN
Handles typed edges	Yes — separate W_r per relation
Handles typed nodes	Partial — via features; no type-specific architecture
Scales to many relations	Via basis decomposition
Tasks	Entity classification, link prediction
Key hyperparameter	Number of bases B (usually 2–10)
Limitation	Does not use attention; all neighbours equally weighted

R-GCN is the foundational model for applying GNNs to knowledge graphs. It introduced the relation-specific weight matrix pattern that nearly all subsequent heterogeneous GNN architectures inherit.

References

Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., & Welling, M. (2018). Modeling Relational Data with Graph Convolutional Networks. ESWC 2018.
Kipf, T. N., & Welling, M. (2016). Variational Graph Auto-Encoders. arXiv preprint (graph autoencoder for link prediction).
Yang, B., Yih, W.-T., He, X., Gao, J., & Deng, L. (2015). Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR 2015 (DistMult decoder).

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Alessio Borgi

R-GCN: Relational Graph Convolutional Networks

The Problem: Multi-Relational Graphs

The R-GCN Update Rule

The Parameter Problem: Basis Decomposition

R-GCN for Entity Classification

R-GCN for Link Prediction

R-GCN vs DistMult / TransE

Summary

References

Share on

You May Also Enjoy

Open Problems and Future Directions in Sheaf Neural Networks

Sheaf GNNs for Molecular Property Prediction

Sheaf GNNs for Heterophilic Node Classification: Benchmarks and Results

Sheaves Meet Attention: Transformer-Inspired Sheaf Models