R-GCN: Relational Graph Convolutional Networks

4 minute read

Published:

TL;DR: R-GCN (Schlichtkrull et al., 2018) adapts GCN for knowledge graphs with typed edges. Each relation type r has its own weight matrix W_r. Node h_v aggregates messages from neighbours for each relation separately, then sums. The main challenge: |R| weight matrices → over-parameterisation. Solved by basis decomposition.

The Problem: Multi-Relational Graphs

A knowledge graph (KG) is a directed multigraph where edges have types (relations). For example, Freebase contains entities (John_Lennon, Beatles, UK) connected by relations (member_of, born_in, from_country).

Standard GCN cannot handle this — it uses a single aggregation weight and cannot distinguish “member_of” from “born_in” edges.

The R-GCN Update Rule

h^{(k+1)}_v = σ( W^{(k)}_0 h^{(k)}_v + Σ_{r ∈ R} Σ_{u ∈ N_r(v)} (1/c_{v,r}) W^{(k)}_r h^{(k)}_u )

Where:

  • N_r(v) = neighbours of v connected via relation r
  • c_{v,r} = normalisation constant (typicallyN_r(v))
  • W^{(k)}_0 = self-loop weight (own representation)
  • W^{(k)}_r = relation-specific weight matrix

Interpretation: for each relation r, node v collects messages from all neighbours connected by r, transforms them by W_r, and sums. All relation-specific sums are then added together with the self-loop.

This is equivalent to running a separate GCN on each relation’s adjacency subgraph and summing the results.

Directionality: Knowledge graphs are directed. R-GCN handles this by treating each directed relation r and its inverse r^{-1} as two separate relation types. The "A-member_of-B" edge creates a message from B to A using W_{member_of}, and an inverse message from A to B using W_{member_of^{-1}}. This lets information flow in both directions.

The Parameter Problem: Basis Decomposition

For a KG withR= 100 relations and embedding dimension d = 200, the weight matrices have 100 × 200 × 200 = 4 million parameters — just for one layer. This is prone to overfitting, especially when many relations have few training examples.

Basis decomposition (the R-GCN solution): express each W_r as a linear combination of B shared basis matrices:

W_r = Σ_{b=1}^{B} a_{rb} · V_b
Where V_b ∈ ℝ^{d×d} are shared bases and a_{rb} are relation-specific scalars. This reduces parameters fromR×d² to B×d² +R×B — much smaller when B «R.

Block-diagonal decomposition (alternative): partition d into blocks, and each relation’s weight is block-diagonal. This reduces computation without sharing bases.

R-GCN for Entity Classification

Task: given a KG with some labelled entities, predict labels for unlabelled entities.

Example: Freebase entity type classification — is this entity a Person, Organisation, or Location?

Setup:

  • Entity features: one-hot or learned embeddings
  • 2-layer R-GCN
  • Final h^{(K)}_v fed to softmax classifier
  • Trained with cross-entropy on labelled entities

On the AIFB and MUTAG entity classification benchmarks, R-GCN outperforms hand-crafted KG embedding methods.

Task: predict missing triples (subject, relation, object) — i.e., “does this relation exist between these two entities?”

Setup: R-GCN as an encoder, DistMult as a decoder.

  1. Encoder: run R-GCN to get entity embeddings e_s, e_o
  2. Decoder (DistMult): score the triple (s, r, o):
f(s, r, o) = e_s · diag(W_r) · e_o
  1. Training: binary cross-entropy with negative sampling

This encoder-decoder split (GNN encodes structure, shallow decoder scores triples) is a common pattern for KG link prediction.

R-GCN vs DistMult / TransE

Before R-GCN, KG link prediction used shallow embedding methods:

  • TransE: entity embeddings s + r ≈ o
  • DistMult: e_s · W_r · e_o scoring
  • ComplEx: complex-valued extensions

These encode individual entity/relation embeddings but ignore graph structure. R-GCN adds structural context: entity embeddings are informed by their graph neighbourhood, not just their own learned vectors.

Summary

PropertyR-GCN
Handles typed edgesYes — separate W_r per relation
Handles typed nodesPartial — via features; no type-specific architecture
Scales to many relationsVia basis decomposition
TasksEntity classification, link prediction
Key hyperparameterNumber of bases B (usually 2–10)
LimitationDoes not use attention; all neighbours equally weighted

R-GCN is the foundational model for applying GNNs to knowledge graphs. It introduced the relation-specific weight matrix pattern that nearly all subsequent heterogeneous GNN architectures inherit.

References