
Expressive power of graph neural networks and the Weisfeiler-Lehman test

source link: https://towardsdatascience.com/expressive-power-of-graph-neural-networks-and-the-weisefeiler-lehman-test-b883db3c7c49?gi=209e8dc96e14

TL;DR: Do you have a feeling that deep learning on graphs is a bunch of heuristics that work sometimes and nobody has a clue why? In this post, I discuss the graph isomorphism problem, the Weisfeiler-Lehman heuristic for graph isomorphism testing, and how it can be used to analyse the expressive power of graph neural networks. This is the first in a series of three posts on the expressivity of graph neural networks. In Part 2, I will discuss how to depart from the Weisfeiler-Lehman hierarchy, and in Part 3, I will suggest why it may be a good idea to revisit the whole graph isomorphism framework.


Traditional feed-forward networks (multi-layer perceptrons) are known to be universal approximators: they can approximate any smooth function to any desired accuracy. For graph neural networks, which have emerged relatively recently, the representation properties are less understood. It is often observed in experiments that graph neural networks excel on some datasets but at the same time perform disappointingly on others. In order to get to the root of this behaviour, one has to answer the question: how powerful are graph neural networks?

One of the challenges is that graphs encountered in applications are combinations of continuous and discrete structures (node and edge features, and connectivity, respectively), and thus this question can be posed in different ways. One possible formulation is whether graph neural networks can distinguish between different types of graph structures. This is a classical question in graph theory known as the graph isomorphism problem, which aims to determine whether two graphs are topologically equivalent [1]. Two isomorphic graphs have the same connectivity and differ only by a permutation of their nodes.

Somewhat surprisingly, the exact complexity class of the graph isomorphism problem is unknown. It is not known to be solvable in polynomial time nor to be NP-complete, and is sometimes attributed to a special “GI class” [2].

Weisfeiler-Lehman test. The seminal 1968 paper of Boris Weisfeiler and Andrey Lehman [3] proposed an efficient heuristic, now known as the Weisfeiler-Lehman (WL) test, that was initially believed to be a polynomial-time solution to the graph isomorphism problem [4]. A counterexample was found a year later; however, it appears that the WL test works for almost all graphs, in the probabilistic sense [5].


Example of execution of the Weisfeiler-Lehman test on two isomorphic graphs. Curly brackets denote multisets. The algorithm stops once the colouring no longer changes and produces an output (a histogram of colours). Equal outputs for the two graphs suggest that they are possibly isomorphic.

The WL test is based on iterative graph recolouring [6] (“colour” in graph theory refers to a discrete node label), starting with all nodes of identical colour. At each step, the algorithm aggregates the colours of nodes and their neighbourhoods representing them as multisets [7], and hashes the aggregated colour multisets into unique new colours. The algorithm stops upon reaching a stable colouring. If at that point the colourings of the two graphs differ, the graphs are deemed non-isomorphic. However, if the colourings are the same, the graphs are possibly (but not necessarily) isomorphic. In other words, the WL test is a necessary but insufficient condition for graph isomorphism. There exist non-isomorphic graphs for which the WL test produces identical colouring and thus considers them “possibly isomorphic”; the test is said to fail in this case. One such example is shown in the following figure:


Two non-isomorphic graphs on which the WL graph isomorphism test fails, as evident from the identical colouring it produces. In chemistry, these graphs represent the molecular structure of two different compounds, decalin (left) and bicyclopentyl (right). Figure adapted from [14].
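The recolouring procedure described above fits in a few lines of Python. The sketch below is illustrative rather than canonical: representing graphs as adjacency dictionaries and using sorted signature tuples in place of a hash function are my own choices.

```python
from collections import Counter

def wl_colouring(adj):
    """1-WL colour refinement; `adj` maps every node to a list of neighbours."""
    colours = {v: 0 for v in adj}  # start with all nodes identically coloured
    while True:
        # Each node's signature: its own colour plus the multiset
        # (here: a sorted tuple) of its neighbours' colours.
        sig = {v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
               for v in adj}
        # "Hash" each distinct signature to a fresh compact colour id.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new_colours = {v: palette[sig[v]] for v in adj}
        # Stable colouring: the histogram of colours no longer changes.
        if Counter(new_colours.values()) == Counter(colours.values()):
            return Counter(new_colours.values())
        colours = new_colours

# A triangle with a pendant node, under two different node labellings:
g1 = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
g2 = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b', 'd'], 'd': ['c']}
print(wl_colouring(g1) == wl_colouring(g2))  # True: possibly isomorphic

# A 4-node path vs. a 3-leaf star: different histograms, hence non-isomorphic.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wl_colouring(path) == wl_colouring(star))  # False
```

Note that the test is one-sided, as described above: a `False` proves non-isomorphism, while a `True` only means the graphs are possibly isomorphic.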

Graph isomorphism networks. Keyulu Xu [9] and Christopher Morris [10] (and, at least two years earlier, Thomas Kipf in his blog post) noticed that the WL test bears a striking resemblance to graph message-passing neural networks [8], a way of performing convolution-like operations on graphs. In a message-passing layer, the features of each node are updated by aggregating the features of its neighbours. The choice of the aggregation and update operations is crucial: only injective multiset functions make the scheme equivalent to the WL algorithm. Some popular choices of aggregators used in the literature, such as maximum or mean, are actually strictly less powerful than WL and fail to distinguish between very simple graph structures:


Examples of graph structures that cannot be distinguished by the max aggregator but can be distinguished by the mean aggregator (first and second), and that can be distinguished by neither max nor mean (first and third). The reason is that the features aggregated in this way from the neighbours of the black node will be the same. Figure adapted from [9].
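The failure of non-injective aggregators is easy to reproduce numerically. In the toy example below (the scalar feature values are my own illustration, not taken from the paper), two black nodes whose neighbourhood feature multisets are {1} and {1, 1} are indistinguishable by mean and max, but not by sum:

```python
def aggregate(multiset):
    """Apply three common neighbourhood aggregators to a feature multiset."""
    return {
        "sum": sum(multiset),
        "mean": sum(multiset) / len(multiset),
        "max": max(multiset),
    }

a = [1.0]        # neighbour features of the black node in the first graph
b = [1.0, 1.0]   # neighbour features of the black node in the second graph
print(aggregate(a))  # {'sum': 1.0, 'mean': 1.0, 'max': 1.0}
print(aggregate(b))  # {'sum': 2.0, 'mean': 1.0, 'max': 1.0}
# Only 'sum' separates the two multisets: mean and max collapse them.
```

This is exactly the sense in which sum aggregation is injective on multisets while mean and max are not.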

Xu proposed a choice of aggregation and update functions that makes message-passing neural networks equivalent to the WL algorithm, calling the resulting architecture Graph Isomorphism Network (GIN). This is as powerful as a standard message-passing neural network can get. But more than a new architecture, the main impact was formulating the question of expressiveness in a simple setting that could be related to a classical problem in graph theory. This idea has already spurred multiple follow-up works.
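The GIN update has the form h_v ← MLP((1 + ε)·h_v + Σ over the neighbours u of h_u), with the sum providing injective multiset aggregation. Here is a minimal NumPy sketch of one such layer; the two-layer ReLU MLP, the weight shapes, and the example path graph are my own arbitrary choices, not the configuration used in the paper:

```python
import numpy as np

def gin_layer(h, adj, W1, W2, eps=0.0):
    """One GIN-style layer: h_v <- MLP((1 + eps) * h_v + sum of neighbour features).
    h: (n, d) node features; adj: (n, n) binary adjacency matrix."""
    agg = (1.0 + eps) * h + adj @ h          # injective sum aggregation
    return np.maximum(agg @ W1, 0.0) @ W2    # toy two-layer MLP with ReLU

rng = np.random.default_rng(0)
n, d, hidden = 4, 8, 16
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)  # a 4-node path graph
h = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, hidden))
W2 = rng.normal(size=(hidden, d))
out = gin_layer(h, adj, W1, W2, eps=0.1)
print(out.shape)  # (4, 8): one updated feature vector per node
```

Stacking such layers and reading out a (sum-pooled) graph-level feature mirrors the iterations and the final colour histogram of the WL test.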

Weisfeiler-Lehman hierarchy. One direction of extending the results of Xu and Morris is using more powerful graph isomorphism tests. Proposed by László Babai, the k-WL test is a higher-order extension of the Weisfeiler-Lehman algorithm that works on k-tuples of nodes instead of individual nodes. With the exception of 1-WL and 2-WL, which are equivalent, (k+1)-WL is strictly stronger than k-WL for any k ≥ 2, i.e. there exist examples of graphs on which k-WL fails and (k+1)-WL succeeds, but not vice versa. k-WL is thus a hierarchy of increasingly powerful graph isomorphism tests, sometimes referred to as the Weisfeiler-Lehman hierarchy [10].

It is possible to design graph neural networks that follow the k-WL test and are thus strictly more powerful than message-passing architectures. The first such architecture, k-GNN, was proposed by Morris [11]. A key difference between traditional message-passing neural networks and such higher-order GNNs is that the latter are non-local, as the k-WL algorithm operates on k-tuples of nodes. This has significant implications both for the implementation of the algorithm and for its computational and memory complexity: k-GNNs require 𝒪(nᵏ) memory. As a way of mitigating complexity, Morris devised a local version of k-GNNs based on aggregation in local neighbourhoods, which, however, is less expressive than k-WL.
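To get a feel for why the 𝒪(nᵏ) requirement bites, here is a back-of-the-envelope calculation. The figure of 8 bytes per k-tuple is an illustrative assumption of mine; real k-GNNs store a feature vector per tuple, which is even worse:

```python
# Memory needed to keep one 8-byte float per k-tuple of nodes.
n = 10_000  # a moderately sized graph
for k in (1, 2, 3):
    gigabytes = n ** k * 8 / 1e9
    print(f"k={k}: {gigabytes:g} GB")
# Already at k=3 the state no longer fits on any realistic machine.
```

Even for a 10,000-node graph, k = 2 costs on the order of a gigabyte and k = 3 on the order of terabytes, which is why local or structurally restricted variants are of practical interest.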

Somewhat different higher-order graph architectures were proposed by Haggai Maron, in whose Ph.D. defence at the Weizmann Institute I had the privilege of participating in September 2019. Maron defined a class of Invariant Graph Networks (IGN) based on k-order tensors [12] and showed that they are as powerful as k-WL. IGNs are derived from a different variant of k-WL [10] and are more advantageous than k-GNNs in terms of complexity. In particular, the 3-WL-equivalent IGN has “only” quadratic complexity, making it probably the only practically useful graph neural network architecture strictly more powerful than message passing, but still a far cry from the linear complexity of the latter [16].

From a theoretical standpoint, the works on provably powerful graph neural networks provided a rigorous mathematical framework that can help interpret and compare different algorithms. There have been multiple follow-up works that extended these results using methods from graph theory and distributed local algorithms [14].

From a practical standpoint, however, these new architectures have hardly had a significant impact so far: for example, the latest benchmarks [15] show that recent provably powerful algorithms actually underperform older techniques. This is not an uncommon situation in machine learning, where there are often big gaps between theory and practice. One explanation could be deficiencies in the benchmark itself. But perhaps more profound reasons are that better expressivity does not necessarily offer better generalisation (sometimes quite the opposite), and that the graph isomorphism model might not correctly capture the actual notion of graph similarity in specific applications; I will discuss this in my next posts. For sure, this line of work is extremely fruitful for building bridges to other disciplines and bringing in methods previously not used in the field of deep learning on graphs.

