HAN: Heterogeneous Graph Attention Networks

4 minute read

Published:

TL;DR: HAN (Wang et al., 2019) uses pre-defined meta-paths to decompose a heterogeneous graph into multiple homogeneous views. GAT-style attention aggregates neighbours within each view (node-level attention), and a second attention layer combines the meta-path views (semantic-level attention). The model learns both which neighbours matter and which relationships matter.

The Two-Level Attention Idea

HAN’s key insight: in a heterogeneous graph, not all neighbours are equally informative, and not all relationship types are equally relevant to the task. Two separate attention mechanisms handle these two sources of variability.

  • Node-level attention: given a meta-path Φ, weight the importance of different neighbours connected via Φ
  • Semantic-level attention: weight the importance of different meta-paths Φ₁, …, Φ_P for the overall prediction

Step 1: Meta-Path-Based Neighbour Projection

For each meta-path Φ, the heterogeneous graph contains a projected meta-path graph: a homogeneous graph where an edge connects node u to node v if there exists a path of type Φ between them.

For example, in an academic network:

  • APA (Author → Paper → Author): co-authorship graph
  • APCPA (Author → Paper → Conference → Paper → Author): same-venue collaboration graph

Each meta-path gives a different view of the Author-Author relationships.

Before attention, project all node features to a common dimension (since different node types may have different feature dimensions):

h'_v = σ( W_{τ(v)} · h_v )

Step 2: Node-Level Attention (Within Meta-Path)

For a given meta-path Φ, apply GAT-style attention over meta-path neighbours:

Compute unnormalised attention between nodes i and j:

e^Φ_{ij} = σ( a^Φ · [h'_i || h'_j] )
Where a^Φ ∈ ℝ^{2d’} is a meta-path-specific attention vector and denotes concatenation.

Normalise with softmax:

α^Φ_{ij} = softmax_j( e^Φ_{ij} ) = exp(e^Φ_{ij}) / Σ_{k ∈ N^Φ_i} exp(e^Φ_{ik})

Aggregate:

z^Φ_i = σ( Σ_{j ∈ N^Φ_i} α^Φ_{ij} · h'_j )

z^Φ_i is node i’s embedding under meta-path Φ. With multi-head attention, concatenate K heads:

z^Φ_i = ||_{k=1}^{K} σ( Σ_{j ∈ N^Φ_i} α^{Φ,k}_{ij} h'_j )

Step 3: Semantic-Level Attention (Across Meta-Paths)

After computing {z^{Φ_1}_i, …, z^{Φ_P}_i}, we need to combine them. Each meta-path may contribute differently to the task.

Learn a meta-path importance score β^{Φ_p} (shared across nodes, since a meta-path’s importance is a global property):

w^{Φ_p} = (1/|V|) Σ_{i ∈ V} q^T · tanh( W · z^{Φ_p}_i + b )

Normalise:

β^{Φ_p} = exp(w^{Φ_p}) / Σ_p exp(w^{Φ_p})

Final embedding:

h_i = Σ_{p=1}^{P} β^{Φ_p} · z^{Φ_p}_i
What semantic attention learns: If the task is author classification (research area prediction), APA (co-authorship) may get high β — co-authors usually work in the same area. APCPA (same venue) may get lower β — researchers can publish in the same venue across areas. HAN learns this automatically from labeled data, without manual meta-path weighting.

Full HAN Pipeline

Input: Heterogeneous graph G with multiple node/edge types
       Pre-defined meta-paths {Φ₁, ..., Φ_P}

For each meta-path Φ_p:
  1. Project node features to common space
  2. Construct meta-path graph (who is connected via Φ_p?)
  3. Apply node-level GAT attention → z^{Φ_p}_i for each node

Semantic-level attention:
  4. Compute meta-path importance β^{Φ_p}
  5. Weighted combination: h_i = Σ_p β^{Φ_p} z^{Φ_p}_i

Classifier:
  6. MLP(h_i) → ŷ_i

Limitations

  1. Manual meta-path definition: domain knowledge required to define meaningful meta-paths. Wrong meta-paths → poor performance.

  2. Combinatorial explosion: for complex HINs with many relation types, the number of meaningful meta-paths grows combinatorially.

  3. Node-type coverage: HAN focuses on target node type; other node types are only intermediaries in meta-paths.

  4. Scalability: constructing meta-path graphs and running multiple GATs is expensive for large graphs.

Later architectures (HGT: Heterogeneous Graph Transformer) address these by learning relation-specific attention without explicit meta-path definition.

Summary

ComponentWhat it learns
Node projection W_{τ(v)}Common embedding space across types
Node-level attention α^Φ_{ij}Which neighbours matter (per meta-path)
Semantic attention β^{Φ_p}Which meta-paths matter (per task)
Multi-head node attentionMultiple perspectives within each meta-path

HAN was the first model to apply hierarchical attention to heterogeneous graphs, showing that the which-relation matters as much as the which-neighbour problem in complex multi-relational data.

References