Andrej Risteski
@andrejristeski.bsky.social
Machine learning researcher. Professor in ML department at CMU.
Finally, this lens doesn't consider computation per layer. A typical GNN layer (e.g. a GCN-style layer) for the edge-based architecture would be more expensive than for the node-based architecture. It would be nice if "rewiring" ideas from the literature could get the best of both worlds.
June 24, 2025 at 3:54 PM
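To make the per-layer cost point concrete, here is a rough counting sketch (a toy model used purely for illustration, not the paper's exact layers): a node-state layer sends one message per edge, while an edge-state layer that lets each edge aggregate over edges sharing an endpoint touches roughly sum_v deg(v)^2 pairs, which is precisely what blows up on hubs.

```python
# Rough message-count comparison (illustrative toy model, not the paper's layers):
# a node-state layer sends one message per (node, neighbor) pair, while an
# edge-state layer that aggregates over edges sharing an endpoint touches
# roughly deg(v)^2 edge pairs at each node v.

def node_layer_messages(degrees):
    # one message per (node, neighbor) pair: sum of degrees = 2|E|
    return sum(degrees)

def edge_layer_messages(degrees):
    # pairs of edges meeting at each node (order and self-pairs included)
    return sum(d * d for d in degrees)

# A "hub" (star with 1000 leaves) vs. a 3-regular graph on 1001 nodes.
star = [1000] + [1] * 1000
regular = [3] * 1001
print(node_layer_messages(star), edge_layer_messages(star))        # 2000 vs 1001000
print(node_layer_messages(regular), edge_layer_messages(regular))  # 3003 vs 9009
```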
This echoes a message from recent position papers (e.g. arxiv.org/pdf/2502.14546) that we need better benchmarks for graph-based tasks to "see" fine-grained differences between improved architectures/methods/etc. (This argument has been made more broadly for benchmarks in SciML)
June 24, 2025 at 3:54 PM
Empirically, on "standard" benchmarks edge embeddings provide only modest benefit---but it's easy to construct (synthetic, diagnostic, but natural) datasets which have a "hub" structure on which edge embeddings perform dramatically better.
June 24, 2025 at 3:54 PM
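The diagnostic datasets aren't spelled out in this thread, so purely as an illustration of the "hub" structure, here is one way such a synthetic family could be generated (the networkx construction and the bit-agreement task are hypothetical stand-ins, not necessarily what the paper uses):

```python
# Hypothetical "hub" graph family: a star graph whose routes between leaves
# all pass through one high-degree node. Assumes networkx is available.
import random
import networkx as nx

def hub_graph(n_leaves, seed=0):
    rng = random.Random(seed)
    g = nx.star_graph(n_leaves)  # node 0 is the hub, nodes 1..n_leaves are leaves
    for v in range(1, n_leaves + 1):
        # a toy task: predict whether two leaves' bits agree, which forces
        # information about the bits to be routed through the hub
        g.nodes[v]["bit"] = rng.randint(0, 1)
    return g

graphs = [hub_graph(n, seed=i) for i, n in enumerate([16, 64, 256])]
print([g.number_of_nodes() for g in graphs])  # [17, 65, 257]
```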
The proof techniques are reminiscent of, and inspired by, techniques from time–space trade-offs in computational complexity, which measure "information flow" between variables of a function. A nice survey here, if interested: cs.brown.edu/people/jsava...
June 24, 2025 at 3:54 PM
In some ways, this supports remarks in a recent position paper (arxiv.org/abs/2505.15547) that phenomena like oversquashing conflate topological and computational bottlenecks --- and in fact, that their interaction matters a lot.
Oversmoothing, Oversquashing, Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
June 24, 2025 at 3:54 PM
The separation manifests on graphs with topological bottlenecks (e.g. hub nodes), when solving the task requires routing information through a narrow cut: the hub nodes either need a lot of memory, or the length of the protocol has to grow.
June 24, 2025 at 3:54 PM
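The rough intuition behind this (my paraphrase of the post, not the paper's actual proof): if solving the task forces k bits of information across a narrow cut, and the node states sitting on that cut hold at most m bits, then the number of rounds T has to satisfy

```latex
\[
  T \cdot m \;\gtrsim\; k ,
\]
```

so either the memory m at the hub is large, or the protocol length T grows.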
We show that there are natural functions computable by edge architectures with constant depth and memory. However, in any node-based architecture, depth × memory has to be Ω(√n), where n = # nodes of the graph. Moreover, without the memory constraint there is *no* separation.
June 24, 2025 at 3:54 PM
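Restating the claim from this post as a formula (the exact function family and constants are in the paper): there is a natural f computable by an edge-state architecture with depth O(1) and memory O(1) per processor, while any node-state architecture computing f needs

```latex
\[
  \mathrm{depth} \times \mathrm{memory} \;=\; \Omega\!\bigl(\sqrt{n}\bigr),
  \qquad n = \text{number of nodes},
\]
```

and with unbounded per-processor memory the separation disappears.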
We examine a modest architectural change: edges maintain their own state (common GNNs maintain node states). This change was introduced in the literature mostly to naturally handle edge-centric tasks & edge features---but we show it has representational benefits too.
June 24, 2025 at 3:54 PM
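A minimal sketch of the architectural difference as described in this post (toy update rules; the paper's layers may differ): a node-based layer updates a state per node from neighboring node states, whereas an edge-based layer keeps a state per edge and updates it from edges sharing an endpoint.

```python
# Toy node-state vs. edge-state updates (illustrative only; states are floats).

def node_layer(state, neighbors):
    # state: {node: float}, neighbors: {node: [node, ...]}
    return {v: state[v] + sum(state[u] for u in neighbors[v]) for v in state}

def edge_layer(state, coincident):
    # state: {edge: float}, coincident: {edge: [edges sharing an endpoint]}
    return {e: state[e] + sum(state[f] for f in coincident[e]) for e in state}

# Path graph 0 - 1 - 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
coincident = {(0, 1): [(1, 2)], (1, 2): [(0, 1)]}
print(node_layer({0: 1.0, 1: 0.0, 2: 2.0}, neighbors))
print(edge_layer({(0, 1): 1.0, (1, 2): 3.0}, coincident))
```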
We introduce an additional resource budget---memory per processor (viewing GNNs as a message-passing protocol)---which roughly models the dimensionality of the hidden representation (at finite precision), similar to the viewpoint in arxiv.org/abs/1907.03199.
What graph neural networks cannot learn: depth vs width
June 24, 2025 at 3:54 PM
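Spelled out, under this viewpoint a processor holding a d-dimensional hidden state at b bits of precision has

```latex
\[
  \mathrm{memory} \;\approx\; d \cdot b \ \text{bits},
\]
```

e.g. a 64-dimensional embedding at 16-bit precision is about 1024 bits of state per node.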
The Weisfeiler-Lehman hierarchy lens classifies GNNs according to their distinguishing power w.r.t. the graph isomorphism problem (see e.g. arxiv.org/abs/2204.07697). While powerful, this lens ignores other computational resources that can influence expressiveness.
Theory of Graph Neural Networks: Representation and Learning
June 24, 2025 at 3:54 PM
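For context on the WL lens, here is a compact sketch of 1-WL color refinement, the test that bounds the distinguishing power of standard message-passing GNNs:

```python
# Compact 1-WL (color refinement) sketch.

def wl_colors(neighbors, rounds):
    # neighbors: {node: [node, ...]}
    colors = {v: 0 for v in neighbors}
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in neighbors[v])))
                for v in neighbors}
        # relabel signatures with small integers so colors stay compact
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in neighbors}
    return colors

# A triangle and a 3-node path get different color histograms after one round.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(wl_colors(triangle, 2).values()), sorted(wl_colors(path, 2).values()))
```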
Congratulations!
December 9, 2024 at 10:43 PM