
Geometric Deep Learning: A Quick Tour


The following document provides a whirlwind tour of some fundamental concepts in geometric deep learning.

Photo by Clint Adair on Unsplash

Find the LaTeX-written version of this article here

The mathematical derivations might not be shown fully rigorously and some equations are stated without proof. This is done intentionally to keep the document short yet reasonably comprehensive. For a deeper analysis, please refer to Bronstein et al. (2017), Gong (2018), Kipf et al. (2017), Hammond et al. (2011), Shuman et al. (2011) and http://geometricdeeplearning.com/ . This is a living document, so please let me know if you find any errors or inconsistencies and I will fix them as soon as possible. I am by no means an expert in this field; this note is merely written for my own learning.

We start by reviewing the concept of the Laplacian matrix of a graph and its eigenvalue decomposition. Next, we define the convolution operation on a graph and show that it is equivalent to applying a filter in the spectral domain of the graph, where the spectrum refers to the eigenvalues of the Laplacian. We then show that spectral convolution, by construction, amounts to applying a function of the Laplacian to the input signal. Lastly, we show how to approximate the Laplacian filter with a sum of Chebyshev polynomials to reduce the computational complexity (computing the full eigendecomposition costs O(n³), and filtering with the dense eigenvector matrix costs O(n²) per signal, where n is the number of vertices in the graph).

Graph Laplacian

Assume we have an undirected, connected and weighted graph G = (V, E, W), where V is a set of |V| = n vertices, E is a set of edges and W is a set of weights Wᵢⱼ for each edge i~j. Define D as the degree matrix, D = diag(Σⱼ Wᵢⱼ). The normalized graph Laplacian Δ can then be defined as

Δ = I - D^{-1/2} W D^{-1/2}

We can perform an eigenvalue decomposition of the Laplacian matrix Δ:

Δ = Φ Λ Φᵀ    (5)

where the columns of Φ are the orthonormal eigenvectors of Δ and Λ = diag(λ₁, …, λₙ) contains the corresponding eigenvalues.
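
As a concrete illustration (not part of the original derivation), the following minimal NumPy sketch builds the normalized Laplacian of a small weighted graph and computes its eigendecomposition; the example weight matrix is arbitrary.

```python
import numpy as np

# Symmetric weight matrix W of a small undirected, connected, weighted graph (n = 4).
W = np.array([
    [0.0, 1.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.0, 2.0, 1.0, 0.0],
])

# Degree matrix D = diag(sum_j W_ij) and its inverse square root.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Normalized graph Laplacian: Delta = I - D^{-1/2} W D^{-1/2}.
Delta = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt

# Eigenvalue decomposition Delta = Phi Lambda Phi^T (eigh, since Delta is symmetric).
lam, Phi = np.linalg.eigh(Delta)
print(lam)  # eigenvalues of the normalized Laplacian lie in [0, 2]
```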

Spectral Convolution

Motivated by the convolution theorem of the Fourier transform, we define graph convolution as applying a filter to the Laplacian spectral components. In a nutshell, the idea is to project the input function f onto its "Fourier" basis, here the Laplacian eigenvectors, by computing Φᵀf. We then multiply by the filtered eigenvalues ĥ(Λ), resulting in ĥ(Λ)Φᵀf. Lastly, we apply an "inverse Fourier transform" by multiplying with Φ, resulting in Φ ĥ(Λ) Φᵀf.

f ⋆ h = Φ ĥ(Λ) Φᵀ f

After some rearrangement, we can see that the right-hand side can be written as a function of the normalized Laplacian matrix: Φ ĥ(Λ) Φᵀ f = ĥ(Φ Λ Φᵀ) f = ĥ(Δ) f, using Δ = Φ Λ Φᵀ from equation 5.
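
To make this concrete, here is a rough sketch (continuing from the snippet above, with an arbitrary low-pass filter used purely for illustration) of the spectral convolution Φ ĥ(Λ) Φᵀ f in NumPy:

```python
# Graph signal: one scalar value per vertex.
f = np.array([1.0, 0.0, -1.0, 2.0])

# Example spectral filter h_hat applied elementwise to the eigenvalues;
# an exponential low-pass filter is used here only as an illustration.
h_hat = np.exp(-2.0 * lam)

# "Fourier transform" Phi^T f, filter in the spectral domain, then transform back.
f_filtered = Phi @ (h_hat * (Phi.T @ f))
```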

Now, as in a convolutional neural network, we would like to apply more than one convolutional filter and also add a nonlinear output transformation. Suppose we use p filters and a nonlinearity ξ; the spectral convolutional network is then defined as follows.

gⱼ = ξ( Φ ĥⱼ(Λ) Φᵀ f ),   j = 1, …, p    (12)
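
A minimal sketch of such a layer (reusing Phi and f from the snippets above, with random placeholder filter values and a ReLU standing in for ξ) might look like this:

```python
p = 2                           # number of spectral filters
H = np.random.randn(p, len(f))  # H[j] holds the filtered eigenvalues h_hat_j(Lambda)

G = np.zeros((len(f), p))
for j in range(p):
    # Phi diag(h_hat_j(Lambda)) Phi^T f, followed by the nonlinearity xi (here ReLU).
    G[:, j] = np.maximum(Phi @ (H[j] * (Phi.T @ f)), 0.0)
```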

Approximation with Chebyshev Polynomials

ChebNets

The main problem with equation 12 is that we need to compute all the eigenvectors, which costs O(|V|³), and every filtering step then involves dense multiplications with Φ at O(|V|²) cost. This does not scale to very large graphs. The main idea in ChebNets is to approximate Φ ĥ(Λ) Φᵀ with a sum of orthogonal Chebyshev polynomials, which can be calculated iteratively using the recurrence

T₀(x) = 1,  T₁(x) = x,  Tₖ(x) = 2x Tₖ₋₁(x) - Tₖ₋₂(x)

We first approximate the eigenvalue filter with a weighted sum of r Chebyshev polynomials evaluated at the rescaled eigenvalue matrix Λ^. This rescaling is needed because the Chebyshev polynomials form an orthogonal basis only on the interval [-1, 1], while the eigenvalues lie in [0, λₘₐₓ]. To map the latter onto the former, we apply x ↦ 2x/λₘₐₓ - 1, so that x = 0 maps to 2(0)/λₘₐₓ - 1 = -1 and x = λₘₐₓ maps to 2λₘₐₓ/λₘₐₓ - 1 = 1.
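
The rescaling and the three-term recurrence are easy to check numerically; the sketch below (illustrative only, continuing the running example with r = 4) maps the eigenvalues into [-1, 1] and evaluates the first r Chebyshev polynomials:

```python
r = 4
lam_max = lam.max()
lam_rescaled = 2.0 * lam / lam_max - 1.0   # maps [0, lam_max] onto [-1, 1]

# T_0(x) = 1, T_1(x) = x, T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), evaluated elementwise.
T = [np.ones_like(lam_rescaled), lam_rescaled]
for k in range(2, r):
    T.append(2.0 * lam_rescaled * T[k - 1] - T[k - 2])
```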

We can now take equation 12 and substitute the filter with its Chebyshev approximation. Note that Λ^ = 2Λ/λₘₐₓ - I and, similarly, Δ^ = 2Δ/λₘₐₓ - I. By rearranging, we can transform the right-hand side into a sum of Chebyshev polynomials applied to the rescaled Laplacian Δ^.

f ⋆ h ≈ Σⱼ₌₀ʳ⁻¹ αⱼ Tⱼ(Δ^) f    (21)
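
The practical payoff is that this sum can be computed by applying the recurrence directly to Δ^ acting on the signal, so no eigendecomposition is needed at all, only r sparse matrix-vector products. A minimal sketch, with arbitrary example coefficients αⱼ and continuing the running example:

```python
alpha = np.array([0.5, -0.3, 0.1, 0.05])            # example Chebyshev coefficients (r = 4)
Delta_rescaled = 2.0 * Delta / lam_max - np.eye(len(f))

# T_0(Delta^) f = f, T_1(Delta^) f = Delta^ f, then the recurrence applied to vectors.
t_prev, t_curr = f, Delta_rescaled @ f
out = alpha[0] * t_prev + alpha[1] * t_curr
for k in range(2, len(alpha)):
    t_prev, t_curr = t_curr, 2.0 * Delta_rescaled @ t_curr - t_prev
    out += alpha[k] * t_curr
```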

GraphConvNets

GraphConvNets are a special case of ChebNets where we take only two Chebyshev polynomial terms, r = 2. Since the normalized Laplacian Δ is used, its eigenvalues are bounded above by 2, so we can set λₘₐₓ = 2. We can take equation 21 and substitute these assumptions.

f ⋆ h ≈ α₀ f + α₁ (Δ - I) f = α₀ f - α₁ D^{-1/2} W D^{-1/2} f

We now have a graph convolution that is defined purely in terms of the degree matrix D and the weight matrix W. To simplify further, we assume α₀ = -α₁ = α.

f ⋆ h ≈ α (I + D^{-1/2} W D^{-1/2}) f

In practice, however, since the eigenvalues of I + D^{-1/2} W D^{-1/2} lie in the range [0, 2], repeated application of this operator can lead to numerical instabilities. Hence we apply a further renormalization by letting Ŵ = W + I and D̂ = diag(Σⱼ ŵᵢⱼ).

f ⋆ h ≈ α D̂^{-1/2} Ŵ D̂^{-1/2} f

With these assumptions, GraphConvNets bypass the need to explicitly compute Chebyshev polynomials, and the convolution reduces to a weighted sum over each vertex's neighbors (including the vertex itself, via the added self-loops).
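
Putting it together, a single GraphConvNet propagation step can be sketched as below (reusing the weight matrix W from the first snippet; the feature matrix X and the learnable weights Theta are random placeholders):

```python
n = W.shape[0]
X = np.random.randn(n, 3)        # vertex features, one row per vertex
Theta = np.random.randn(3, 2)    # learnable weight matrix (placeholder)

# Renormalization trick: add self-loops, then symmetrically normalize.
W_hat = W + np.eye(n)
d_hat = W_hat.sum(axis=1)
D_hat_inv_sqrt = np.diag(1.0 / np.sqrt(d_hat))
W_norm = D_hat_inv_sqrt @ W_hat @ D_hat_inv_sqrt

# One GraphConvNet layer: X' = xi(D_hat^{-1/2} W_hat D_hat^{-1/2} X Theta), xi = ReLU.
X_next = np.maximum(W_norm @ X @ Theta, 0.0)
```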

References

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. (2017). Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18-42. arXiv: 1611.08097.

Gong, S. (2018). Geometric Deep Learning. Imperial College London MSc Thesis.

Hammond, D. K., Vandergheynst, P., and Gribonval, R. (2011). Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150.

Shuman, D. I., Vandergheynst, P., and Frossard, P. (2011). Chebyshev polynomial approximation for distributed signal processing. 2011 International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS), pages 1-8. arXiv: 1105.1891.

Kipf, T. and Welling, M. (2017). Semi-supervised classification with graph convolutional networks. ICLR 2017.

