30

Graph Algorithms in Neo4j: Louvain Modularity

 5 years ago
source link: https://www.tuicool.com/articles/hit/NvQVv2Z
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Graph algorithms provide the means to understand, model and predict complicated dynamicssuch as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

This blog series is designed to help you better utilize graph analytics and graph algorithms so you can effectively innovate and develop intelligent solutions faster using agraph database likeNeo4j.

Last week we continued our focus on Community Detection algorithms, with a look at the Label Propagation algorithm.

f6Nnami.jpg!web

This week we continue our exploration of Community Detection algorithms, with a look at theLouvain Modularity algorithm, which measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network.

About Louvain Modularity

The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is one of the fastest modularity-based algorithms and works well with large graphs. It also reveals a hierarchy of communities at different scales, which is useful for understanding the global functioning of a network.

In order to understand the Louvain modularity algorithm, we must first look at modularity in general.

yyYBJnN.png!web

Modularity is a measure of how well groups have been partitioned into clusters. It compares the relationships in a cluster compared to what would be expected for a random (or other baseline) number of connections.

qiIBRfZ.png!web

The Louvain algorithm was proposed in 2008. The method consists of repeated application of two steps. The first step is a “greedy” assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.

When Should I Use Louvain?

TIP:Although the Louvain method, and modularity optimization algorithms more generally, have found wide applications across many domains, some problems with these algorithms have been identified:

1. The resolution limit

For larger networks, the Louvain method doesn’t stop with the “intuitive” communities. Instead, there’s a second pass through the community modification and coarse-graining stages, in which several of the intuitive communities are merged together. This is a general problem with modularity optimization algorithms; they have trouble detecting small communities in large networks. It’s a virtue of the Louvain method that something close to the intuitive community structure is available as an intermediate step in the process.

2. The degeneracy problem

There is typically an exponentially large (in network size) number of community assignments with modularities close to the maximum. This can be a severe problem because, in the presence of a large number of high modularity solutions, it’s hard to find the global maximum and difficult to determine if the global maximum is truly more scientifically important than local maxima that achieve similar modularity. Research shows that the different locally optimal community assignments have different structural properties. For more information, see The performance of modularity maximization in practical contexts.”

Louvain Example

Let’s see the Louvain algorithm in action. The following Cypher statement creates a graph of users and friends:

MERGE (nAlice:User {id:"Alice"})
MERGE (nBridget:User {id:"Bridget"})
MERGE (nCharles:User {id:"Charles"})
MERGE (nDoug:User {id:"Doug"})
MERGE (nMark:User {id:"Mark"})
MERGE (nMichael:User {id:"Michael"})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug);

buuUrym.png!web

Now we run Louvain to find communities in the social network. Execute the following query:

CALL algo.louvain.stream("User", "FRIEND", {})
YIELD nodeId, community
MATCH (user:User) WHERE id(user) = nodeId
RETURN user.id AS user, community
ORDER BY community;

ZZVJbyZ.png!web

Our algorithm found two communities with three members each.

Mark, Doug and Charles are all friends with each other, as are Bridget, Alice and Michael. Charles is the only one who has friends in both communities, but he has more in community four so he fits better in that one.

Conclusion

As we’ve seen, the Louvain Modularity algorithm is used to evaluate social structures in Twitter, LinkedIn and YouTube. It’s also used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring, which would be indicated by a higher relationship density than average.

Next week we’ll continue our focus on Community Detection algorithms, with a look at the Triangle Count and Average Clustering Coefficient algorithm.

Find the patterns in your connected data

A Comprehensive Guide to Graph Algorithms in Neo4j
Read the Ebook

Explore: community detection algorithms cypher detecting communities fraud analytics graph algorithms Graph Analytics intelligent solutions Louvain Modularity neo4j Social Networks

About the Author

Mark Needham & Amy E. Hodler , Neo4j

uyQzymi.jpg!web

Mark Needham is a Support Engineer for Neo4j. He also blogs about software development at markhneedham.com .

Amy is the Analytics and AI Program Manager at Neo4j. She believes a thriving graph ecosystem is essential to catalyze new types of insights. Accordingly, she helps ensure Neo4j partners are successful. In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft, and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager.Amy has a love for science and art with an extreme fascination for complexity science and graph theory. When the weather is good, you’re likely to find her cycling the passes in beautiful Eastern Washington.

Neo4j Community Disclaimer

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK