What Is a Graph? Nodes, Edges, Features, and Labels
Published:
Graphs Are Everywhere
Most structured data is relational โ entities connected by relationships:
- Social networks: users (nodes) connected by friendships (edges)
- Molecules: atoms (nodes) connected by bonds (edges)
- Citation networks: papers (nodes) connected by citations (edges)
- Road networks: intersections (nodes) connected by roads (edges)
- Knowledge graphs: entities (nodes) connected by relations (typed edges)
Standard deep learning assumes inputs are grids (images), sequences (text), or fixed-size vectors. Graphs have variable size, irregular structure, and no canonical ordering โ making them fundamentally different.
Graph Anatomy
A graph G = (V, E) consists of:
V โ a set of nodes (also called vertices). V = N is the number of nodes. - E โ V ร V โ a set of edges. Each edge (u, v) โ E indicates a relationship between nodes u and v.
Node Features
Nodes are rarely bare identifiers. Each node v โ V has a feature vector x_v โ โ^d. Stacked into a matrix:
Examples:
- In a citation network: x_v = bag-of-words representation of the paper
- In a molecule: x_v = atom type, charge, hybridisation state
- In a social network: x_v = age, location, activity features
Edge Features
Edges can also carry features e_{uv} โ โ^k:
- In a molecule: bond type (single/double/aromatic), bond length
- In a knowledge graph: relation type (one-hot)
- In a road network: distance, speed limit, traffic volume
Labels
What you want to predict determines the task level:
| Task level | Label | Example |
|---|---|---|
| Node | y_v per node | Paper topic (node classification) |
| Edge | y_{uv} per edge | Will users u and v become friends? (link prediction) |
| Graph | y_G per graph | Is this molecule toxic? (graph classification) |
The Adjacency Matrix
A graphโs structure is encoded in an adjacency matrix A โ {0,1}^{NรN}:
For an undirected graph, A is symmetric. For a weighted graph, A[u,v] = weight of edge (u,v).
The adjacency matrix is rarely stored explicitly for large graphs (too sparse) โ instead, edge lists or sparse formats are used.
Neighbourhood
The neighbourhood of node v is the set of nodes directly connected to it:
| The degree of node v is | N(v) | โ the number of neighbours. Degree is one of the most fundamental structural properties of a node. |
What GNNs Learn
A GNN takes as input:
- The graph structure (adjacency matrix or edge list)
- Node features X
- (Optionally) Edge features
And produces as output:
- Node embeddings h_v โ โ^dโ for each node (used for node classification)
- Edge embeddings h_{uv} for each edge (used for link prediction)
- Graph embedding h_G โ โ^dโ (used for graph classification)
The core operation: each node aggregates information from its neighbours, combines it with its own features, and updates its representation โ iterating this over multiple rounds.
Summary
| Concept | Notation | Example | ย | ย |
|---|---|---|---|---|
| Node set | V, | V | =N | Papers, atoms, users |
| Edge set | E, | E | =M | Citations, bonds, friendships |
| Node features | X โ โ^{Nรd} | Bag-of-words, atom type | ย | ย |
| Edge features | E โ โ^{Mรk} | Bond type, relation type | ย | ย |
| Adjacency matrix | A โ {0,1}^{NรN} | Who is connected to whom | ย | ย |
| Node label | y_v | Paper topic | ย | ย |
| Graph label | y_G | Molecule toxicity | ย | ย |
Graphs are the natural language of relational data. GNNs are the deep learning architectures that speak it.
References
- Bondy, J. A., & Murty, U. S. R. (2008). Graph Theory. Springer. โ Standard reference for graph theory fundamentals.
- Bronstein, M. M., Bruna, J., Cohen, T., & Veliฤkoviฤ, P. (2021). Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv preprint.
