Find the Knowledge Hubs in Your Company


A very common struggle

Imagine you are a newly hired manager at a big company that is scaling fast, and you need answers to your questions. How would you know who to ask?

While teams are popping up like mushrooms everywhere with directions changing every odd week, you can’t really rely on outdated organizational charts anymore. Your best bet will be to discover the internal networks and find the most valuable contributors by yourself.

Source: Pixabay by geralt

That’s where Social Network Analysis (SNA) can help.

Before jumping into Network Science though, I’d suggest running through a couple of basic concepts to get familiar with the topic:

We can build networks with Python using data from communication systems like Slack. The individuals are connected by the messages they send and we can keep track of those by downloading the chat history.
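To make the idea concrete, here is a minimal sketch of how messages could be turned into a network with NetworkX. The `messages` list and the `build_message_graph` helper are made up for illustration (they are not taken from the project's notebooks); the assumption is that the chat history has already been reduced to sender/receiver pairs.

```python
import networkx as nx

# Hypothetical sample: (sender_id, receiver_id) pairs extracted from chat history.
messages = [
    ("U01", "U02"),
    ("U02", "U01"),
    ("U01", "U02"),
    ("U03", "U01"),
]


def build_message_graph(pairs):
    """Build a directed graph whose edge weights count the messages sent."""
    graph = nx.DiGraph()
    for sender, receiver in pairs:
        if graph.has_edge(sender, receiver):
            # Repeated messages between the same pair increase the weight.
            graph[sender][receiver]["weight"] += 1
        else:
            graph.add_edge(sender, receiver, weight=1)
    return graph


G = build_message_graph(messages)
print(G.number_of_nodes(), G.number_of_edges())
```

Each person becomes a node and each sender-to-receiver relationship becomes a directed edge, so repeated messages show up as a heavier edge rather than as duplicates.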

The Goal

…is to find:

  • Who has the most connections in the network = who sent/received the most messages?
  • Who acts as a bridge between the most and least connected parts (outliers) of the network = who has the highest influence on the information flow?
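The first question boils down to counting messages per person. A rough sketch on a toy graph (the names and weights are invented, and the edge weight is assumed to hold the number of messages sent) could look like this:

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("Ann", "Ben", 3),  # Ann sent Ben 3 messages
    ("Ben", "Ann", 1),
    ("Cat", "Ann", 2),
])

# On a DiGraph, degree() covers both incoming and outgoing edges;
# weight="weight" sums the message counts instead of counting edges.
message_totals = dict(G.degree(weight="weight"))
busiest = max(message_totals, key=message_totals.get)
print(busiest, message_totals[busiest])
```

If sent and received messages should be looked at separately, `G.out_degree` and `G.in_degree` accept the same `weight` argument.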

The Data

I used a relatively small but real data set: 3 months’ worth of conversations from a Data Analytics Bootcamp I attended earlier this year.

Getting this dataset in shape for analysis took some time, but all steps are included in the EDA notebook. To visualize the networks, I used Python’s NetworkX library; the code can be found in the file named network_graphs.

To remove noise for my first proof of concept, I only looked at 1-on-1 conversations (threads between 2 people) on public channels. This seemed like a sensible choice since I’m new to Network Science myself.
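As an illustration of that filter, here is a small sketch in plain Python. The record layout (a thread ID plus the user who posted) and the `one_on_one_threads` helper are assumptions for the example; the real export needs the wrangling described in the EDA notebook first.

```python
from collections import defaultdict

# Hypothetical records: (thread_id, user_id) for each message posted in a thread.
thread_messages = [
    ("t1", "U01"), ("t1", "U02"), ("t1", "U01"),
    ("t2", "U01"), ("t2", "U02"), ("t2", "U03"),
    ("t3", "U04"),
]


def one_on_one_threads(records):
    """Return {thread_id: sorted participants} for threads with exactly two people."""
    participants = defaultdict(set)
    for thread_id, user_id in records:
        participants[thread_id].add(user_id)
    return {
        thread_id: sorted(users)
        for thread_id, users in participants.items()
        if len(users) == 2
    }


pairs = one_on_one_threads(thread_messages)
print(pairs)
```

Threads with a single poster or with three or more participants are dropped, which keeps only the conversations where sender and receiver are unambiguous.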

Let’s find those hubs!


  • 30 hrs of wrangling
  • 2 Social Network Analysis courses (Datacamp and Coursera, long time no see!)
  • and a few mental blocks later — I found myself puzzled and amazed by my first graph:

Aha! What the heck is this, I thought. I can see the messages as connections between the users but got to find a way to turn these user IDs into real names.

After a bit more data wrangling, I conjured a directed graph with a circular layout to have a clear way of showing who sent messages to whom.
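For readers who want to try this step, the sketch below shows one way to swap IDs for names and compute a circular layout. The `id_to_name` lookup and the sample edges are invented; in practice the mapping would come from the users file of the Slack export.

```python
import networkx as nx

# Hypothetical lookup from Slack user IDs to display names.
id_to_name = {"U01": "Alice", "U02": "Bob", "U03": "Carol"}

G = nx.DiGraph()
G.add_edges_from([("U01", "U02"), ("U02", "U01"), ("U03", "U01")])

# relabel_nodes returns a new graph with renamed nodes by default (copy=True).
named = nx.relabel_nodes(G, id_to_name)

# circular_layout places every node on a circle and returns {node: (x, y)}.
positions = nx.circular_layout(named)

# With matplotlib installed, the graph could then be drawn with:
# nx.draw_networkx(named, pos=positions, arrows=True)
print(sorted(named.nodes))
```

Because the graph is a `DiGraph`, the drawing call renders arrowheads, which is what makes the direction of each message visible.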

Looking good. We can clearly see the direction of the messages now and it all makes sense with the names. Could you guess who has the most connections (both in and out)?

Circling back to our main goal, we need to find the users who are well-connected to more isolated individuals to see…

  • who has knowledge about the tiniest things happening in the network,
  • who is able to spread the information all over the place.

They are called nodes with the highest betweenness. If it’s getting hard to follow, now is a good time to check on the resources mentioned at the beginning of the post.
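Betweenness measures how often a node sits on the shortest paths between other nodes. A toy example (the names are invented) shows how NetworkX computes it:

```python
import networkx as nx

# Toy network: Ann and Ben can only reach Cat and Dan through Hub.
G = nx.DiGraph()
G.add_edges_from([
    ("Ann", "Hub"),
    ("Ben", "Hub"),
    ("Hub", "Cat"),
    ("Hub", "Dan"),
])

# Returns {node: score}; scores are normalized by default.
betweenness = nx.betweenness_centrality(G)
top = max(betweenness, key=betweenness.get)
print(top)
```

Here every shortest path between the other four people runs through `Hub`, so it is the only node with a non-zero score.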

Highlighted in blue: Sian D, Sam K, and Thamo K seem to be your central points of contact

Okay, let’s say you are unlucky and those three are sipping a Pina Colada in Zanzibar at the moment, so we need the next couple of people in line. This graph below features…

  • the ten most connected people to nudge with our questions highlighted as larger nodes and
  • the three yellow ones with connections to the isolated parts of the network.
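A highlighting scheme like this can be sketched by turning the two rankings into per-node sizes and colors. The `style_nodes` helper, the size values, and the gray fallback color are assumptions for the example, not the settings used for the graph in this post.

```python
import networkx as nx


def style_nodes(graph, top_connected=10, top_bridges=3):
    """Return per-node sizes and colors, in graph.nodes order."""
    degree = dict(graph.degree())
    betweenness = nx.betweenness_centrality(graph)
    # Rank nodes by connection count and by betweenness, then keep the top ones.
    hubs = set(sorted(degree, key=degree.get, reverse=True)[:top_connected])
    bridges = set(
        sorted(betweenness, key=betweenness.get, reverse=True)[:top_bridges]
    )
    sizes = [900 if node in hubs else 300 for node in graph.nodes]
    colors = ["yellow" if node in bridges else "lightgray" for node in graph.nodes]
    return sizes, colors


# Toy graph: 0 -> 1 -> 2 -> 3 -> 4
G = nx.path_graph(5, create_using=nx.DiGraph)
sizes, colors = style_nodes(G, top_connected=2, top_bridges=1)
print(sizes, colors)
```

The two lists can be passed to `nx.draw_networkx` as `node_size=sizes` and `node_color=colors`, since both follow the order of `G.nodes`.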

Looking at this graph, there are a couple of other things to mention.

You see those yellow circles and their connections? Analysts say they act as bridges in a network, so they can easily turn into bottlenecks, since they play a central role in keeping and broadcasting information.

The least connected individuals are the outliers of the network, sitting at the edge without many connections (intentionally or not). If they gather in groups and form silos, they can harm the productivity and efficiency of the company, as they will be disconnected from central actions and may go off-track.

Talking about productivity, this part was particularly interesting to me as an ex-consultant who used to help companies find and eliminate roadblocks to high performance. I found a special field of SNA dealing with networks in companies, called Organisational Network Analysis (ONA). It provides fresh and actionable insights into the inner workings of companies, such as how people work together in teams and who the key influencers are for driving change within a company.

Okay, I feel like this is the time to admit: ONA is the reason I did this whole project. I find it fascinating to discover the actual reasons for the everyday struggles of an organization instead of playing by the guesstimates.

Use this repo as a base and take your own team’s Slack data to see who are the most connected/valuable users in your company.

Please note that these metrics can only get you started. Use them wisely and validate them with common sense.

Thanks to Eric Sims, whose presentation on DataTalks encouraged me to continue with this project, and to my brother Mark Szulyovszky and James Clare, my Python tutor, for helping me through the stages where I got stuck.
