
An Intuitive Explanation of EmbedS

source link: https://mc.ai/an-intuitive-explanation-of-embeds/


Most GDL techniques ignore the semantics of the network. EmbedS aims to represent semantic relations as well.

Introduction

In the previous stories, we discussed several GDL techniques that approach the node representation problem from different perspectives. We started with a transductive approach, DeepWalk, and continued with an inductive method, GraphSAGE. In the last story, we discussed NeoDTI, which learns task-specific node embeddings from heterogeneous graphs.

In this story, we will conclude the series on GDL methods with EmbedS. EmbedS targets a common deficiency in most GDL models. Though existing approaches effectively reflect neighborhood-based similarities in node embeddings, they cannot embed higher-level semantics such as subclass relations (e.g., Man is a subclass of Human). This type of semantics is common in knowledge graphs (KGs) such as DBpedia and Freebase, and it should be represented alongside the node embeddings.

A KG differs from the networks we have analyzed so far in that it contains semantically richer information. KGs frequently contain classes, hierarchies, and their interactions, in addition to the members of those classes. For instance, a KG that contains people and TV series will include a categorization of the people and the TV series, whereas the networks we have seen so far represented only the interactions between the members of these classes. Below we can see such a KG.

A KG to represent interactions between people and TV Series

In the displayed KG, the solid lines denote the starred relation between a person and a TV series, whereas the dashed line represents canStartIn, an inter-class relation. We can always add more relation types between these two classes, such as canWatch or canCreate, or introduce new classes such as Movies. If we had used a heterogeneous network (HN), we would have had only the links between members: no class hierarchy and no inter-class interactions.

Note that although KGs look like HNs, we cannot represent the class hierarchy in an HN. A KG class is like a hyper-node that contains a group of nodes and interacts with other classes, rather than a simple node that interacts with nodes of different types. In this sense, KGs are incredibly powerful and extensible representation schemes with rich semantics. These rich semantics cannot be effectively embedded with regular HN embedding techniques such as NeoDTI. EmbedS targets this problem.

EmbedS

EmbedS is presented as an ontology-aware graph embedding method that aims to represent high-level semantics geometrically. EmbedS works on the RDFS format, a markup scheme for representing semantics in networks. For the purposes of this study, the authors define five predicates (or relations): type, subClassOf, subPropertyOf, domain, and range. The aim is to learn property representations, as well as class and node representations.

To represent nodes, they randomly create N-dimensional vectors and update them during the optimization, similar to previous works. Classes, on the other hand, are represented as regions, each defined by an N-dimensional center vector and a radius r. Properties are also represented with regions, where the center is described by two N-dimensional vectors, plus a radius r. Like the node vectors, the class and property centers and their radii are updated during the optimization. Thus, property and class regions enlarge and shrink over time.
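The parameterization above can be sketched with NumPy. This is a minimal illustration of the shapes involved, not the paper's actual initialization; all sizes and names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8            # embedding dimensionality (illustrative choice)
num_nodes = 5
num_classes = 3
num_props = 2

# Nodes: plain N-dimensional vectors, updated during optimization.
node_vecs = rng.normal(size=(num_nodes, N))

# Classes: regions, each an N-dimensional center plus a scalar radius.
class_centers = rng.normal(size=(num_classes, N))
class_radii = np.abs(rng.normal(size=num_classes))

# Properties: regions whose center is described by two N-dimensional
# vectors (assumed here to be one per endpoint), plus a scalar radius.
prop_centers = rng.normal(size=(num_props, 2, N))
prop_radii = np.abs(rng.normal(size=num_props))
```

All of these arrays would be trainable parameters, so the radii (and hence the region sizes) change as training proceeds.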

Now we will explain the intuition behind the loss function used to update the model parameters. EmbedS uses a two-part loss function, where the two parts are summed to obtain the total loss. The first part reflects the neighborhood information between nodes. In other words, the first part is minimized when the embeddings of two nodes can be used to restore the interaction (or link, edge) between them, similar to the NeoDTI approach.
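As a rough sketch of this first part, consider a dot-product link-reconstruction term. The paper's exact scoring function is not given here, so this is only an assumed illustration of the idea that the loss is small when two embeddings restore the observed edge.

```python
import numpy as np

def neighborhood_loss(emb_u, emb_v, observed):
    """Squared error between a dot-product link score and the observed
    interaction (1.0 if an edge exists, 0.0 otherwise). Hypothetical form."""
    score = float(emb_u @ emb_v)      # predicted interaction strength
    return (score - observed) ** 2    # zero when the edge is perfectly restored

u = np.array([1.0, 0.0])
v = np.array([1.0, 0.0])
print(neighborhood_loss(u, v, 1.0))  # 0.0 -- embeddings restore the edge exactly
```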

In the second part of the loss function, the aim is to preserve the relations defined by the five predicates listed above. EmbedS associates a cost with each predicate type based on the predicate itself, the classes, and the nodes involved. Since nodes are points while classes and properties are regions, there are two cases for this loss: we either compute a cost between a point and a region, or between two regions.

To compute the cost between a point and a region, we take the L2 distance between the node and the region center and subtract the radius of the region. To compute a region-to-region cost, we take the L2 distance between the two centers and subtract the sum of the radii. Below we can see a figure that displays the cost between a node and a class, where the cost is shown as a red line.
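The two cost cases translate directly into code. The following is a minimal sketch of the geometry described above; function names are my own.

```python
import numpy as np

def point_region_cost(point, center, radius):
    """L2 distance from the point to the region center, minus the radius.
    Negative values mean the point already lies inside the region."""
    return np.linalg.norm(point - center) - radius

def region_region_cost(center_a, radius_a, center_b, radius_b):
    """L2 distance between the two centers, minus the sum of the radii.
    Negative values mean the two regions overlap."""
    return np.linalg.norm(center_a - center_b) - (radius_a + radius_b)

# A node 5 units from a class center with radius 2 incurs cost 3.
node = np.array([5.0, 0.0])
center = np.zeros(2)
print(point_region_cost(node, center, 2.0))  # 3.0
```

Minimizing these costs pulls a node toward (or into) the class region it should belong to, and pulls related regions toward each other.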

Cost computation between an entity and a class.

To minimize such a loss function, the parameters must be adjusted to satisfy the predicates, not just the neighborhood relations. During the optimization, the radii of the classes and predicates are updated to minimize the loss, as well as the centers. As a result, we expect that if class x is a subclass of class y, then the region of x ends up inside the region of y. The class hierarchy is therefore reflected in the geometric space, and the same holds for the other predicates.
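The expected outcome for subClassOf can be checked geometrically: region x lies inside region y when the distance between their centers plus the radius of x does not exceed the radius of y. A small sketch, with made-up "Man"/"Human" regions for illustration:

```python
import numpy as np

def region_contains(center_outer, radius_outer, center_inner, radius_inner):
    # Inner region sits inside the outer one iff
    # dist(centers) + r_inner <= r_outer.
    return np.linalg.norm(center_inner - center_outer) + radius_inner <= radius_outer

# A small "Man" region sitting well inside a larger "Human" region.
human_c, human_r = np.zeros(3), 5.0
man_c, man_r = np.array([1.0, 0.0, 0.0]), 2.0
print(region_contains(human_c, human_r, man_c, man_r))  # True
```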

Thus, high-level semantics are reflected in the embedding space, and EmbedS is ontology-aware.

Conclusion

EmbedS is a step toward representing richer semantics in networks, something missing from most GDL techniques. Representing classes and predicates with regions, rather than points, is a novel approach that merits further investigation. However, the loss function requires predicate-specific engineering, which can be hard to achieve for large networks. Furthermore, EmbedS has not yet been extensively tested, so its performance across tasks remains an open question.

