60

Graph Algorithms in Neo4j: Triangle Count & Clustering Coefficient

 5 years ago
source link: https://www.tuicool.com/articles/hit/BzIbyei
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using agraph database likeNeo4j.

Last week we once again looked at Community Detection algorithms, with a focus on the Louvain Modularity algorithm.

7bqIbiB.jpg!web

This week we wrap up our exploration of Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm , which measures how many nodes have triangles and the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections.

About Triangle Count and Average Clustering Coefficient

Triangle Count is a community detection graph algorithm that is used to determine the number of triangles passing through each node in the graph. A triangle is a set of three nodes, where each node has a relationship to all other nodes.

Triangle counting gained popularity in social network analysis, where it is used to detect communities and measure the cohesiveness of those communities. It is also used to determine the stability of a graph and is often used as part of the computation of network indices, such as the clustering coefficient.

There are two types of clustering coefficients:

Local clustering coefficient

The local clustering coefficient of a node is the likelihood that its neighbors are also connected. The computation of this score involves triangle counting.

Global clustering coefficient

The global clustering coefficient is the normalized sum of those local clustering coefficients. The transitivity coefficient of a graph is sometimes used, which is three times the number of triangles divided by the number of triples in the graph. For more information, see Finding, Counting and Listing all Triangles in Large Graphs, An Experimental Study .”

When Should I Use Triangle Count and Clustering Coefficient?

Triangles Example

Let’s see how the Triangle Count and Clustering Coefficient algorithm works on a small dataset. The followingCypher statement creates a graph with people and KNOWS relationships between them.

MERGE (alice:Person{id:"Alice"})
MERGE (michael:Person{id:"Michael"})
MERGE (karin:Person{id:"Karin"})
MERGE (chris:Person{id:"Chris"})
MERGE (will:Person{id:"Will"})
MERGE (mark:Person{id:"Mark"})
MERGE (michael)-[:KNOWS]->(karin)
MERGE (michael)-[:KNOWS]->(chris)
MERGE (will)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(michael)
MERGE (mark)-[:KNOWS]->(will)
MERGE (alice)-[:KNOWS]->(michael)
MERGE (will)-[:KNOWS]->(chris)
MERGE (chris)-[:KNOWS]->(karin);

zARNryy.png!web

The following query finds all the KNOWS triangles between people in our graph.

CALL algo.triangle.stream("Person","KNOWS")
YIELD nodeA,nodeB,nodeC
MATCH (a:Person) WHERE id(a) = nodeA
MATCH (b:Person) WHERE id(b) = nodeB
MATCH (c:Person) WHERE id(c) = nodeC
RETURN a.id AS nodeA, b.id AS nodeB, c.id AS node

IbQV73E.png!web

We can see that there are KNOWS triangles containing “Will, Michael and Chris”, “Will, Mark and Michael”, and “Michael, Karin and Chris.” This means that everybody in the triangle knows each other. We work out the clustering coefficient of each person by running the following algorithm.

CALL algo.triangleCount.stream('Person', 'KNOWS')
YIELD nodeId, triangles, coefficient
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.id AS name, triangles, coefficient
ORDER BY coefficient DESC

eMbeQvz.png!web

We learn that Michael is part of the most triangles, but it’s Karin and Mark who are the best at introducing their friends – all of the people who know them, know each other!

Conclusion

As we’ve seen, the Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors that are based on tightly knit clusters.

Next week we’ll look at graph algorithms in practice, where we’ll learn how to apply them in data-intensive applications using data from Yelp’s annual dataset challenge.

Find the patterns in your connected data

A Comprehensive Guide to Graph Algorithms in Neo4j
Read the Ebook

Explore: Clustering Coefficient community detection algorithms cypher global clustering graph algorithms Graph Analytics intelligent solutions local clustering Louvain Modularity Triangle Count

About the Author

Mark Needham & Amy E. Hodler , Neo4j

uyQzymi.jpg!web

Mark Needham is a Support Engineer for Neo4j. He also blogs about software development at markhneedham.com .

Amy is the Analytics and AI Program Manager at Neo4j. She believes a thriving graph ecosystem is essential to catalyze new types of insights. Accordingly, she helps ensure Neo4j partners are successful. In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft, and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager.Amy has a love for science and art with an extreme fascination for complexity science and graph theory. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.

Neo4j Community Disclaimer

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK