Can your People Analytics do this?

Supercharging People Analytics with Graph Technology

Aug 15 · 8 min read

Background vector created by kjpargeter — www.freepik.com

In a previous article, I made the case for Graphs in People Analytics. You can read the post here: https://towardsdatascience.com/the-dawn-of-a-new-era-for-people-analytics-9a748f7fdc2. In this article, I’ll look more closely at graph technology and demonstrate how a People Graph can enable three key areas of Human Capital Management: understanding organisational networks, tackling underperformance and developing talent.

Graph technology is particularly well suited to People Data applications, where the underlying data is highly connected and the relationships we want to model and understand are constantly evolving. We’ll see how graphs enable exploratory analysis of this kind of data. We’ll also see how graph algorithms can provide valuable insights into our organisational networks, insights that are hard to obtain from conventional relational structures.

The data

I’ll use the HR Metrics and Analytics data made available by the New England College of Business, which is a synthetic dataset comprising 300 employee records. To build the graph I’ll use only a subset of the data available for these 300 employees: name, manager, department, skills and a reference to how that employee was acquired.

Building the graph

I’ll host the graph in Neo4j and build it with an Object Graph Model defined using the neomodel library, together with an API through which applications interact with the graph model objects. Before we start to query the graph, I’ll explain these concepts in a bit more detail, and why they are so useful for People Analytics.

The Object Graph Model

To create the graph I could have written Cypher directly. However, a better approach is to use an OGM (Object Graph Model, a term often expanded as Object Graph Mapper) to create database-aware Python objects that our applications can interact with. An OGM provides two advantages. First, it makes it easier to extend the model definition over time as new data points and relationships are discovered and added to the graph, making the “people picture” even richer. Second, the API (Application Programming Interface) that HR applications use to interact with the graph becomes much easier to maintain, so that its queries stay accurate and continue to meet the needs of the business. The solution architecture will look like this:

[Figure: Solution architecture diagram]

I’ll keep the graph simple but define enough relationships to generate some useful insights. The OGM will have classes for the following node types:

Person: the employee, with properties for name, department, manager, source and skills
Source: where the person was acquired from
Skill: a skill the person has
Department: the department the employee belongs to

It will also have the following relationship types:

HAS SOURCE: Employee has a source
HAS DEPARTMENT: Employee has a department
HAS MANAGER: Employee has a manager
HAS PERFORMANCE SCORE: Employee has a performance score
HAS SKILL: Employee has a skill

The API

The API (Application Programming Interface) is the interface between the applications and the graph. It will have a number of functions for creating node and relationship objects and for searching for nodes in the graph. As you build out your social enterprise, you’ll want to develop many more functions in the API to fetch all manner of different subgraphs, nodes, relationships and properties for applications using the graph model.
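As a rough illustration of the kind of function such an API might contain, the sketch below turns one employee record into parameterised Cypher MERGE statements, using the node labels and relationship types listed earlier. The function names, the record keys and the example employee are assumptions made for illustration; the implementation described in this article uses neomodel classes rather than hand-built Cypher.

```python
# Hypothetical mapping from a record field to (node label, relationship type).
# The labels and relationship types follow the model described in the article.
LINKS = {
    "department": ("Department", "HAS DEPARTMENT"),
    "manager": ("Person", "HAS MANAGER"),
    "source": ("Source", "HAS SOURCE"),
}


def link_statement(label, rel_type):
    """Cypher that links an existing Person to a (possibly new) target node."""
    return (
        "MATCH (p:Person {name: $name}) "
        f"MERGE (t:{label} {{name: $target}}) "
        f"MERGE (p)-[:`{rel_type}`]->(t)"
    )


def build_person_statements(record):
    """Return (cypher, parameters) pairs that upsert one employee and their links."""
    name = record["name"]
    statements = [("MERGE (p:Person {name: $name})", {"name": name})]
    for field, (label, rel_type) in LINKS.items():
        if record.get(field):
            statements.append(
                (link_statement(label, rel_type), {"name": name, "target": record[field]})
            )
    for skill in record.get("skills", []):
        statements.append(
            (link_statement("Skill", "HAS SKILL"), {"name": name, "target": skill})
        )
    return statements


# Made-up employee record, not taken from the dataset.
example = {
    "name": "Example, Erin",
    "department": "Production",
    "manager": "Sample, Sam",
    "source": "Employee Referral",
    "skills": ["Python", "SQL"],
}

statements = build_person_statements(example)
for cypher, params in statements:
    print(cypher, params)
```

Each pair could then be handed to whichever Neo4j client the API uses. Because MERGE only creates a node or relationship when no match exists, re-running the same statements should not create duplicates.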

Now that we have the model defined and a live instance of the graph in Neo4j, we can turn our attention to the questions our graph will help us to answer and the sorts of queries we might want to build into our API.

Understanding People Networks

An area of growing interest in HCM (Human Capital Management), and one which is fundamental in Agile delivery, is being able to understand organisational networks. Assembling individuals and teams around projects becomes easier when we understand the relationships between people and teams in our organisational network.

We can query our graph to see the nodes connected by HAS MANAGER relationships. This org-chart visualisation is useful, but with other types of professional relationships in our graph (coworkers on projects, mentors and mentees, and so on) the picture would be much richer.

[Figure: Person nodes matched on the HAS MANAGER relationship in the Neo4j browser]

Who makes us most vulnerable? Where are the bottlenecks?

Betweenness centrality is a useful algorithm for finding bottlenecks, control points and vulnerabilities. It calculates a centrality score for each node by considering every pair of nodes and counting how many of the shortest paths between them pass through that node. The assumption is that the most central nodes (the bottlenecks, or the nodes that make the network most vulnerable) lie on the largest number of shortest paths. The following query returns the betweenness centrality of the Person nodes in our graph; individuals with the highest scores might be considered bottlenecks or points of vulnerability:

CALL algo.betweenness.stream("Person", "HAS MANAGER")
YIELD nodeId, centrality
RETURN algo.getNodeById(nodeId).name AS name, centrality
ORDER BY centrality DESC

[Figure: Betweenness centrality of Person nodes]
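To make the score more concrete, here is a minimal sketch of betweenness centrality in plain Python, using Brandes' algorithm, a standard way of computing it. The six-person org chart is hypothetical, and the sketch treats reporting lines as undirected, which is an assumption; the direction used by the query above depends on the library's settings.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            visit_order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            node = visit_order.pop()
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # In an undirected graph every pair is visited from both ends.
    return {node: score / 2 for node, score in centrality.items()}


# Hypothetical reporting lines: who is directly connected to whom.
org_chart = {
    "ceo": ["manager_a", "manager_b"],
    "manager_a": ["ceo", "analyst_c", "analyst_d"],
    "manager_b": ["ceo", "analyst_e"],
    "analyst_c": ["manager_a"],
    "analyst_d": ["manager_a"],
    "analyst_e": ["manager_b"],
}

scores = betweenness_centrality(org_chart)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, score)
```

In this toy network manager_a scores higher than the CEO, because more pairs of colleagues can only reach each other through manager_a.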

Who has most influence?

PageRank is a popular algorithm for understanding the overall influence of a node in the network. Whilst Betweenness Centrality measures how often a node lies on the shortest paths between other nodes, PageRank considers the influence of a node’s neighbours, and of their neighbours in turn. The following query returns the PageRank of the Person nodes in our graph; individuals with the highest scores might be considered those with the greatest influence across the organisation:

CALL algo.pageRank.stream("Person", "HAS MANAGER", {iterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
RETURN algo.getNodeById(nodeId).name AS name, score
ORDER BY score DESC

[Figure: PageRank scores of Person nodes]
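The idea behind PageRank can be sketched in a few lines of Python using power iteration, with the same 20 iterations and 0.85 damping factor as the query. The edge list is hypothetical, with each edge pointing from an employee to their manager, and the sketch uses the textbook normalised form in which scores sum to one. That scaling is an assumption: a library implementation may scale scores differently, so compare rankings rather than absolute values.

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Textbook PageRank by power iteration over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical HAS MANAGER edges: (employee, manager).
has_manager = [
    ("manager_a", "ceo"),
    ("manager_b", "ceo"),
    ("analyst_c", "manager_a"),
    ("analyst_d", "manager_a"),
    ("analyst_e", "manager_b"),
]

ranks = pagerank(has_manager)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(name, round(score, 3))
```

Because influence flows along the reporting lines, the CEO ends up with the highest score here, followed by the manager with the most reports.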

With different types of relationships added to the graph (formal, informal, professional and social) the people picture becomes much richer. We can find answers to questions such as: are some teams more interconnected than others, and are some teams more intra-connected than others? If we also consider the other nodes in our graph, such as nodes for performance score, we could determine whether individuals’ connectedness has an impact on performance score. Obtaining answers to these questions using relational models is extremely time-consuming, but with graphs it is relatively straightforward.

Tackling Underperformance

Performance score is represented as a node and relationship in our graph and so we can execute a query to match relationships with HAS PERFORMANCE SCORE to display the following subgraph.

[Figure: Subgraph of nodes matched on the HAS PERFORMANCE SCORE relationship]

The neo4j browser GUI is useful for exploratory analysis as it allows nodes to be “exploded” to show all relationships. For example, if we wanted to explode on the node labelled “Smith, John” the subgraph is extended as shown:

[Figure: Subgraph extended from the “Smith, John” node]

Tackling underperformance is not just about tackling individuals but about tackling a culture, where undesirable individual behaviours or attitudes have created an environment in which performance is not where we might like it to be. Graph algorithms for community detection can be applied here to detect groups of employees that display negative traits.

For example, we can use the Louvain algorithm available in the Neo4j graph algorithms library to identify communities within the formal management structure and report the performance score for each individual in the community.

CALL algo.louvain.stream('Person', 'HAS MANAGER', {})
YIELD nodeId, community
RETURN algo.asNode(nodeId).name AS name, community, algo.asNode(nodeId).performance AS performance
ORDER BY community;

The results we obtain are as follows:

[Figure: Louvain algorithm results]

It is of course helpful to see the distribution of performance scores within the formal management structures (the communities identified by Louvain), but this approach is much more revealing when we start to look at other types of nodes and relationships defined in our model. For example, Louvain Modularity could allow us to identify communities with poor performance within a peer group or cohort, by looking not just at management relationships but at relationships outside the management structure too. Community detection algorithms here provide a tool to identify negative cultures early on.
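Louvain searches for a grouping of nodes that maximises a quantity called modularity, which compares the share of relationships falling inside each community with the share expected if relationships were placed at random. The sketch below does not implement Louvain itself; it only computes the modularity of a given grouping, so that two candidate groupings can be compared. The six-person network and the function name are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    edge_count = len(edges)
    internal_edges = {}  # edges with both ends in the same community
    degree_total = {}    # sum of node degrees per community
    for a, b in edges:
        for node in (a, b):
            community = community_of[node]
            degree_total[community] = degree_total.get(community, 0) + 1
        if community_of[a] == community_of[b]:
            community = community_of[a]
            internal_edges[community] = internal_edges.get(community, 0) + 1
    return sum(
        internal_edges.get(community, 0) / edge_count
        - (degree_total[community] / (2 * edge_count)) ** 2
        for community in degree_total
    )


# Hypothetical working relationships: two tight groups joined by one link.
relationships = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]

two_teams = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_teams}

two_teams_score = modularity(relationships, two_teams)
one_group_score = modularity(relationships, one_group)
print(two_teams_score, one_group_score)
```

Splitting the network into its two tight groups scores higher than treating everyone as a single community, which is the kind of grouping a modularity-based algorithm would prefer.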

Managing Talent

With skills and performance data in our graph model we can already do a lot to develop talent. For example, we might want to identify employees with similar skills, to fill a vacancy or to recommend learning that employees with a similar background and skillset have found useful. Neo4j allows us to provide this data in near real time using a set-similarity measure called Jaccard Similarity, which divides the number of skills two people share by the number of distinct skills they hold between them. With the following query we can identify individuals within our organisation with similar skills.

MATCH (p:Person)-[:`HAS SKILL`]->(skill)
WITH {item:id(p), categories: collect(id(skill))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard.stream(data)
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.asNode(item1).name AS from, algo.asNode(item2).name AS to, intersection, similarity
ORDER BY similarity DESC

[Figure: Jaccard similarity results]
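The calculation behind the query is simple enough to sketch in plain Python: the Jaccard similarity of two skill sets is the size of their intersection divided by the size of their union. The employees and skills below are made up for illustration.

```python
from itertools import combinations


def jaccard(first, second):
    """Jaccard similarity of two collections, treated as sets."""
    first, second = set(first), set(second)
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Hypothetical skill sets keyed by employee.
skills = {
    "employee_1": {"Python", "SQL", "Statistics"},
    "employee_2": {"SQL", "Statistics", "Excel"},
    "employee_3": {"Recruiting", "Excel"},
}

# Score every pair of employees, most similar first.
pairs = sorted(
    ((jaccard(skills[a], skills[b]), a, b) for a, b in combinations(skills, 2)),
    reverse=True,
)
for similarity, a, b in pairs:
    print(a, b, similarity)
```

The same function could compare an employee's skills with the skills attached to a vacancy or a course, which is the idea behind the recommendations discussed next.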

With vacancies defined as nodes in our graph and relationships between vacancies and skills, the graph becomes even more powerful. Using Jaccard Similarity, we can now recommend employees to vacancies. If we include courses as nodes in our graph model too, and relationships between courses and skills that learners acquire, we can recommend employees to courses. In fact, we can now understand career pathways through our organisation.

Where to next?

As new data points and relationships are discovered and added to the graph model, the value we get from the graph improves. With a limited model, I’ve shown how graphs and graph algorithms can provide valuable insights to tackle some of the most challenging areas in HCM. For further information or to see a People Graph in action get in touch by Twitter DM @jamesdhope.

References

[1] Cypher Query Language with Neo4j, https://neo4j.com/developer/cypher-query-language/

[3] Louvain, https://neo4j.com/docs/graph-algorithms/current/algorithms/louvain/

[4] Betweenness Centrality, https://neo4j.com/docs/graph-algorithms/current/algorithms/betweenness-centrality/

[5] PageRank, https://neo4j.com/docs/graph-algorithms/current/algorithms/page-rank/
