This is a blog post about the paper 'On Graph Neural Networks Versus Graph-Augmented MLPs' by Lei Chen, Zhengdao Chen, and Joan Bruna, published as a conference paper at ICLR 2021.

I. Introduction 

GNNs (Graph Neural Networks) have grown in popularity over recent years, and researchers are asking whether they hold a clear advantage over simpler alternatives such as GA-MLPs (Graph-Augmented Multi-Layer Perceptrons). This paper presents a comprehensive comparison between GNNs and GA-MLPs, highlighting both the limitations and the advantages of each family of models.


II. GNNs 

GNNs (Graph Neural Networks) are used in many applications, from image classification to drug discovery. Because they compute node representations by repeatedly passing messages within node neighborhoods, they offer great flexibility and design power across multiple network depths.
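For reference, the paper works with a standard message-passing formulation of a depth-$K$ GNN; written in the notation popularized by Xu et al. [xxix] (the exact symbols here are mine), each layer reads

$$h_v^{(k)} = \mathrm{COMBINE}^{(k)}\Big(h_v^{(k-1)},\ \mathrm{AGGREGATE}^{(k)}\big(\{\, h_u^{(k-1)} : u \in \mathcal{N}(v) \,\}\big)\Big), \qquad k = 1, \dots, K,$$

with the node features as initial embeddings, $h_v^{(0)} = x_v$, and, for graph-level predictions, a final pooling step

$$h_G = \mathrm{READOUT}\big(\{\, h_v^{(K)} : v \in V \,\}\big).$$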


The trainable functions mentioned above, AGGREGATE, COMBINE and READOUT, can be chosen in different ways, and each choice results in a different GNN architecture.
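To make this concrete, below is a minimal sketch of one such layer in plain NumPy (an illustrative instantiation with sum aggregation and a linear-plus-ReLU combine step, not any specific architecture from the paper):

```python
import numpy as np

def gnn_layer(A, H, W_self, W_neigh):
    """One message-passing layer. A: (n, n) adjacency matrix, H: (n, d) node embeddings."""
    aggregated = A @ H                              # AGGREGATE: sum of neighbor embeddings
    combined = H @ W_self + aggregated @ W_neigh    # COMBINE: mix self and neighborhood information
    return np.maximum(combined, 0.0)                # elementwise ReLU nonlinearity

def readout(H):
    """READOUT: sum-pool the final node embeddings into a single graph embedding."""
    return H.sum(axis=0)
```

Swapping the sum for a mean or max, or replacing the linear combine step with a multi-layer perceptron, already yields different members of the GNN family.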


III. GA-MLPs 

GA-MLPs (Graph-Augmented Multi-Layer Perceptrons) can be seen as a simplified alternative to GNNs: node features are first augmented with fixed, non-learnable graph operators (such as powers of the adjacency matrix), and a learnable node-level function, typically an MLP, is then applied to the augmented features.
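A minimal sketch of this two-stage pipeline is given below (my own simplification: the operator family, its normalization and the MLP architecture are illustrative assumptions, not the paper's exact construction):

```python
import numpy as np

def augment_features(A, X, K):
    """Stage 1 (fixed, non-learnable): stack X, AX, A^2 X, ..., A^K X column-wise."""
    feats, cur = [X], X
    for _ in range(K):
        cur = A @ cur
        feats.append(cur)
    return np.concatenate(feats, axis=1)        # shape (n, (K + 1) * d)

def node_mlp(Z, W1, b1, W2, b2):
    """Stage 2 (learnable): a two-layer MLP applied independently to every node (row)."""
    return np.maximum(Z @ W1 + b1, 0.0) @ W2 + b2
```

The key difference from a GNN is that graph propagation happens only once, up front, with no trainable parameters interleaved between the propagation steps.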


Although GNNs face training issues such as over-smoothing, exploding gradients and bottleneck effects [ii] [iii] [iv] [v] [vi] [vii], GA-MLPs cannot simply be considered a better alternative, since they have their own limitations, such as being unable to approximate certain functions that count attributed walks from a node.

Figure 1: Example of a graph pair that cannot be distinguished by GA-MLPs but can be distinguished by GNNs [i]

IV. Related Works 

IV.1. Depth in GNNs 

As briefly mentioned in the introduction, GNNs face certain issues when the depth of the network grows too large, which results in an exponential loss of expressive power [viii]. However, the authors of this paper focus on general GNNs of finite depth, in which case GNNs present an advantage over GA-MLPs. The authors also point to examples of deep GNN architectures [ix] [x] with impressive results even at around 30 layers. Thus, there is still an ongoing need to study the depth of GNNs. 

IV.2. Existing GA-MLP-type models 

Several GA-MLP-type models have been proposed to compete with GNNs while improving computational efficiency. One such example is the Simple Graph Convolution (SGC) [xi], which removes the intermediate weight matrices and non-linearities of a GCN (the resulting formula is recalled below). Other proposed GA-MLP-type models include GFN (Graph Feature Network) [xii], gfNN (Graph Filter Neural Network) [xiii] and SIGN (Scalable Inception Graph Neural Network) [xiv].
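Concretely, in the notation of Wu et al. [xi], the SGC classifier collapses $K$ propagation steps into a single linear model:

$$\hat{Y} = \mathrm{softmax}\big(S^{K} X \Theta\big), \qquad S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2},$$

where $\tilde{A}$ is the adjacency matrix with added self-loops, $\tilde{D}$ its degree matrix, $X$ the node features, and $\Theta$ the only learnable weight matrix.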

IV.3. Expressive Power of GNNs 

Taking the Weisfeiler-Lehman (WL) [xv] graph isomorphism test as a baseline, classical message-passing GNNs are at most as powerful as the WL test at distinguishing pairs of non-isomorphic graphs. 
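For intuition, here is a minimal sketch of the 1-WL color refinement procedure (the relabelling scheme and stopping criterion are simplified for illustration):

```python
def wl_colors(adj, rounds):
    """adj: dict mapping each node to an iterable of its neighbors.
    Returns the color (an integer) of every node after `rounds` refinement steps."""
    colors = {v: 0 for v in adj}                    # start from a uniform coloring
    for _ in range(rounds):
        # New signature = (own color, sorted multiset of neighbor colors)
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return colors
```

Two graphs whose final color multisets differ are certainly non-isomorphic; if the multisets agree, the test is inconclusive, and message-passing GNNs inherit exactly this limitation.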

Thus, there have been many attempts to build GNN models whose expressive power is not limited by the WL test [xvi] [xvii] [xviii] [xix] [xx] [xxi] [xxii] [xxiii] [xxiv].

V. Methodology 

The analysis focuses on the expressive power of GNNs and GA-MLPs, viewed first as graph isomorphism tests and then as functions on rooted graphs. For both viewpoints, the authors derive a number of propositions. 


V.1. As Graph Isomorphism Tests 

The authors ask whether graph isomorphism tests alone are enough to compare the expressive powers of GNNs and GA-MLPs. They show that GNNs can distinguish certain graph pairs that GA-MLPs cannot, while GA-MLPs with suitable operators have been shown to surpass the WL test, something that classical GNNs cannot achieve. 

V.2. As Functions on Rooted Graphs 

The authors view both models as functions on egonets [xxv], which are rooted graphs, and consider fitting a function to input-output pairs $(G_n[i_n], y_{i_n})$, where $G_n[i_n]$ denotes the graph $G_n$ rooted at node $i_n$. This viewpoint is used to evaluate GA-MLPs' ability to approximate functions on the space of rooted graphs, denoted $\mathcal{E}$. 

VI. Experiments and Results 


VI.1. Equivalence Classes and Egonets 

Following the propositions of V.2, this experiment counts the number of equivalence classes of rooted graphs induced by GNNs and by GA-MLPs, on rooted graphs whose node features have been removed. For GNNs of depth K a WL-type refinement process is applied, while for GA-MLPs the induced equivalence classes are obtained by comparing the augmented features. The conclusion is that the number of equivalence classes induced by GA-MLP-A is far lower than the number induced by GNNs; the gap is clearly visible in Table 1 [xxvii].

Table 1: Difference in the number of equivalence classes induced by selected GNNs and GA-MLPs [i]
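As a toy illustration of the spirit of this comparison (not the paper's exact counting procedure), the GA-MLP-A signature of a node in a featureless graph reduces to its walk counts, which can be computed directly:

```python
import numpy as np

def ga_mlp_signatures(A, K):
    """On a graph with constant (all-ones) node features, the GA-MLP-A features of
    node i are the numbers of walks of lengths 0..K starting at i."""
    x = np.ones(A.shape[0])
    cols = [x.copy()]
    for _ in range(K):
        x = A @ x
        cols.append(x.copy())
    return [tuple(row) for row in np.stack(cols, axis=1)]   # one signature per node

def num_equivalence_classes(signatures):
    """Nodes (rooted graphs) with identical signatures fall into the same class."""
    return len(set(signatures))
```

The GNN side of the comparison would instead refine rooted graphs with a WL-type process such as the one sketched in section IV.3, which separates far more equivalence classes.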

VI.2. Attributed Walks 

For this experiment, the Cora dataset and a generated RRG (random regular graph) are used. Each node is assigned one of two node features, blue for even-indexed nodes and red for odd-indexed nodes, and the task is to count attributed walks rooted at each node. Results are shown in Table 2 [xxviii] [xxix].

Table 2: Counting attributed walks on Cora and RRG. The “+” models contain double the operator powers. [i] 
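To illustrate what "counting attributed walks" means, one way to compute such counts with plain matrix products is sketched below (my own illustration of the general idea; the exact family of walks targeted in the paper's experiment may differ):

```python
import numpy as np

def count_attributed_walks(A, colors, pattern):
    """For each node, count the walks of length len(pattern) - 1 starting at it whose
    visited node colors follow `pattern` (e.g. ["red", "blue", "red"]).
    A: (n, n) adjacency matrix; colors: length-n list of color labels."""
    diag = {c: np.diag([1.0 if col == c else 0.0 for col in colors])
            for c in set(colors)}
    M = diag[pattern[0]]                 # keep only walks starting from nodes of the first color
    for c in pattern[1:]:
        M = M @ A @ diag[c]              # extend walks by one step, requiring color c at the new node
    return M @ np.ones(A.shape[0])       # entry i = number of matching walks starting at node i
```

Functions of this kind depend on the joint pattern of structure and node attributes along a walk, which is exactly the information a GA-MLP's fixed augmentation step cannot recover in general.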

VI.3. SBM (Stochastic Block Models) 

As mentioned earlier in this blog post, GNNs are more flexible than GA-MLPs. To illustrate this, the authors compare two GA-MLP variants against an sGNN: GA-MLP-A with K = 120, and GA-MLP-H with K = 120 and the operator family Ω generated from the Bethe Hessian matrix [xxx]. The comparison is carried out on community detection on binary SBMs (two underlying communities), as seen in Figure 2 [xxxi]. One result of this experiment is that for some choices of Ω, GA-MLP performance can be poor, whereas GNNs do better in this setting because their flexibility allows for a higher learning capability. 

Figure 2: Community detection for 5 choices of group connectivities on binary SBM. [i] 
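To make the setting concrete, a binary SBM can be sampled as below (an illustrative snippet of my own; it does not reproduce the five connectivity settings used in the paper):

```python
import numpy as np

def sample_binary_sbm(n, p_in, p_out, rng=None):
    """Sample an undirected graph with two equal-size communities: nodes in the same
    community are linked with probability p_in, nodes in different ones with p_out."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.array([0] * (n // 2) + [1] * (n - n // 2))
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample the upper triangle only
    A = (upper | upper.T).astype(float)                # symmetrize, no self-loops
    return A, labels
```

The community detection task is then to recover `labels` from `A` alone, which is where a GNN's learnable, layer-wise processing can adapt in ways a fixed operator family Ω cannot.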

VII. Conclusions 

While both GNNs and GA-MLPs have nuanced advantages and disadvantages, GA-MLPs offer better computational scalability, whereas GNNs offer flexibility and a greater ability to discover structural features. With suitable operators, GA-MLPs can distinguish almost all non-isomorphic graphs; however, unlike GNNs, they are unable to count attributed walks. 

VIII. Own Review 

VIII.1. Strengths 

I found the paper to be quite comprehensive: the authors give clear guidance on when and where one might prefer GNNs over GA-MLPs, and vice versa. Throughout the paper, they present many nuanced differences between GNNs and GA-MLP-type models, while leaving the reader with the clear understanding that both families have their advantages and disadvantages. Even though not all factors were analyzed (e.g., optimization was left out of scope), the authors keep the comparison fair and focus on how each method can serve the needs of a given research goal, e.g., GNNs with their flexibility and GA-MLPs with their computational scalability. 

VIII.2. Weaknesses 

The experimental part could have been more detailed, and the GA-MLP variants mentioned as examples could have been briefly described. The authors could also have focused on a single GA-MLP type and provided a more detailed comparison, rather than enumerating so many options; a less general approach to the comparison might have been more engaging. 

VIII.3. Future Work 

A promising direction for future work is already mentioned by the authors in the conclusions: integrating optimization into the analysis and examining how it interacts with the other factors studied in the paper (e.g., attributed walks, equivalence classes). 






References

i Lei Chen, Zhengdao Chen, and Joan Bruna. On graph neural networks versus graph-augmented MLPs. In International Conference on Learning Representations, 2021. 

ii Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016. 

iii Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 3538–3545. Association for the Advancement of Artificial Intelligence, 2018. 

iv Sitao Luan, Mingde Zhao, Xiao-Wen Chang, and Doina Precup. Break the ceiling: Stronger multiscale deep graph convolutional networks. In Advances in neural information processing systems, pp. 10945–10955, 2019. 

v Kenta Oono and Taiji Suzuki. Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations, 2020. 

vi Emanuele Rossi, Fabrizio Frasca, Ben Chamberlain, Davide Eynard, Michael Bronstein, and Federico Monti. Sign: Scalable inception graph neural networks. arXiv preprint arXiv:2004.11198, 2020. 

vii Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. arXiv preprint arXiv:2006.05205, 2020. 

viii Kenta Oono and Taiji Suzuki. Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations, 2020. 

ix Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. Deepgcns: Can gcns go as deep as cnns? In Proceedings of the IEEE International Conference on Computer Vision, pp. 9267–9276, 2019. 

x Guohao Li, Chenxin Xiong, Ali Thabet, and Bernard Ghanem. Deepergcn: All you need to train deeper gcns. arXiv preprint arXiv:2006.07739, 2020a. 

xi Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. volume 97 of Proceedings of Machine Learning Research, pp. 6861–6871, Long Beach, California, USA, 09–15 Jun 2019. PMLR. 

xii Ting Chen, Song Bian, and Yizhou Sun. Are powerful graph neural nets necessary? A dissection on graph classification. arXiv preprint arXiv:1905.04579, 2019a. 

xiii Hoang NT and Takanori Maehara. Revisiting graph neural networks: All we have is low-pass filters. arXiv preprint arXiv:1905.09550, 2019. 

xiv Emanuele Rossi, Fabrizio Frasca, Ben Chamberlain, Davide Eynard, Michael Bronstein, and Federico Monti. Sign: Scalable inception graph neural networks. arXiv preprint arXiv:2004.11198, 2020. 

xv B Weisfeiler and A Leman. The reduction of a graph to canonical form and the algebra which appears therein. Nauchno-Technicheskaya Informatsia, 2(9):12-16, 1968. 

xvi Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. Association for the Advancement of Artificial Intelligence, 2019. 

xvii Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. In Advances in Neural Information Processing Systems, pp. 2153–2164, 6 2019a. 

xviii Zhengdao Chen, Soledad Villar, Lei Chen, and Joan Bruna. On the equivalence between graph isomorphism testing and function approximation with gnns. In Advances in Neural Information Processing Systems, pp. 15868–15876, 2019c. 

xix Christopher Morris and Petra Mutzel. Towards a practical k-dimensional weisfeiler-leman algorithm. arXiv preprint arXiv:1904.01543, 2019. 

xx Jiaxuan You, Rex Ying, and Jure Leskovec. Position-aware graph neural networks. In International Conference on Machine Learning, pp. 7134–7143, 2019. 

xxi Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. arXiv preprint arXiv:2006.09252, 2020. 

xxii Pan Li, Yanbang Wang, Hongwei Wang, and Jure Leskovec. Distance encoding–design provably more powerful gnns for structural representation learning. arXiv preprint arXiv:2009.00142, 2020b. 

xxiii Daniel Flam-Shepherd, Tony Wu, Pascal Friederich, and Alan Aspuru-Guzik. Neural message passing on high order paths. arXiv preprint arXiv:2002.10413, 2020. 

xxiv Ryoma Sato, Makoto Yamada, and Hisashi Kashima. Approximation ratios of graph neural networks for combinatorial problems. In Advances in Neural Information Processing Systems, pp. 4081–4090, 2019. 

xxv Victor M Preciado and Ali Jadbabaie. From local measurements to network spectral properties: Beyond degree distributions. In 49th IEEE Conference on Decision and Control (CDC), pp. 2686–2691. IEEE, 2010. 

xxvii Sergei Ivanov, Sergei Sviridov, and Evgeny Burnaev. Understanding isomorphism bias in graph data sets. arXiv preprint arXiv:1910.12091, 2019. 

xxviii Lei Chen, Zhengdao Chen, and Joan Bruna. On graph neural networks versus graph-augmented MLPs. In International Conference on Learning Representations, 2021. 

xxix Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. 

xxx Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013. 

xxxi Lei Chen, Zhengdao Chen, and Joan Bruna. On graph neural networks versus graph-augmented MLPs. In International Conference on Learning Representations, 2021. 
