

Data Analysis of Most Popular AI Startups Names on AngelList
source link: https://www.tuicool.com/articles/QfUbYjm

AngelList is a U.S. website for startups, angel investors, and job-seekers looking to work at startups. In this tutorial, we want to look at the most popular words used by AI startups in their names.
Step 1: Import Data
We use the data scraped by rodrigosnader. This dataset includes 10,151 records of AI startups on AngelList.
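As a minimal sketch of this step, the snippet below reads a name column from CSV data with Python's standard csv module. The file layout, the id column, and the helper load_startup_names are assumptions made for illustration; the original code is not shown in this copy of the article and may well have used pandas instead.

```python
import csv
import io


def load_startup_names(csv_file, column="name"):
    """Return the non-empty values of `column` from an open CSV file."""
    reader = csv.DictReader(csv_file)
    names = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            names.append(value)
    return names


# Tiny in-memory stand-in for the real dataset (hypothetical rows).
sample_csv = io.StringIO("id,name\n1,MonkeyLearn\n2,NextDeavor\n3,\n")
startup_names = load_startup_names(sample_csv)
print(startup_names)
```

With the real dataset, the same helper could be called on a file opened with open(path, newline="", encoding="utf-8"), where path points at wherever the scraped CSV was saved.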
Step 2: Data Wrangling
We first need to structure our data to better meet our needs.
Split Names
The startup names are stored in the name column.
We can see that some of the companies’ names are built from words run together without spaces (e.g. MonkeyLearn, NextDeavor), so we want to split the names into separate words.
To do this, we can use the wordninja package, which I found in a Stack Overflow thread.
We will store the split names, in lower case, in the names list.
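The splitting step might look like the sketch below. It relies on wordninja.split, which returns a list of words for a run-together string. The regex fallback used when wordninja is not installed is an assumption added here so the snippet runs anywhere; it only splits on CamelCase boundaries and is not part of the original tutorial.

```python
import re

try:
    import wordninja  # third-party: pip install wordninja

    def split_name(name):
        """Split a run-together name into lower-case words using wordninja."""
        return [word.lower() for word in wordninja.split(name)]

except ImportError:

    def split_name(name):
        """Fallback (assumption): split on CamelCase boundaries and digits."""
        pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
        return [piece.lower() for piece in pieces]


startup_names = ["MonkeyLearn", "NextDeavor"]

# Flatten every startup name into one list of lower-case words.
names = [word for name in startup_names for word in split_name(name)]
print(names)
```

The exact split of an invented word such as NextDeavor depends on wordninja's word-frequency model, so the result is worth checking by eye on a sample of names.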
Lemmatization
To improve our analysis, we first lemmatize the names list.
Lemmatization is the process of converting a word to its base form, so that, for example, labs and lab are counted as the same word. You can learn more about the text mining process from this great piece.
Update the names list with words’ base forms.
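One way to do this is with NLTK's WordNetLemmatizer, sketched below; the article does not say which lemmatizer it used, so this choice is an assumption. The snippet also falls back to a crude plural-stripping function when NLTK or its WordNet corpus is unavailable. That fallback is only a stand-in so the example runs, not real lemmatization.

```python
def naive_lemma(word):
    """Crude stand-in (assumption): drop a trailing plural "s"."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


try:
    from nltk.stem import WordNetLemmatizer  # third-party: pip install nltk

    _lemmatizer = WordNetLemmatizer()
    # Probe once: this raises LookupError if the WordNet corpus is missing
    # (it can be fetched with nltk.download("wordnet")).
    _lemmatizer.lemmatize("labs")
    lemmatize = _lemmatizer.lemmatize
except (ImportError, LookupError):
    lemmatize = naive_lemma

names = ["labs", "networks"]

# Replace each word with its base form.
names = [lemmatize(word) for word in names]
print(names)
```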
Stop Words
Stop words like "the" and "a" do not carry useful meaning, so we are going to remove them from the list.
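A minimal sketch of the filtering step follows. The short STOP_WORDS set is a hand-picked assumption for illustration; a fuller list, such as the English stop words shipped with NLTK, would normally be used in practice.

```python
# Small illustrative stop-word set (assumption, not the article's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "for", "in", "to", "is", "it"}

names = ["the", "data", "lab", "a", "network"]

# Keep only the words that are not stop words.
names = [word for word in names if word not in STOP_WORDS]
print(names)
```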
Remove Single Characters
Similarly, we want to remove single characters that provide no meaning from the list.
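This can be sketched as a simple length filter; the sample words are made up for illustration.

```python
names = ["x", "lab", "a", "ai", "i", "data"]

# Drop every one-character token.
names = [word for word in names if len(word) > 1]
print(names)
```

Note that a length filter keeps two-letter tokens such as ai, which are likely meaningful for this dataset.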
Step 3: Generate WordCloud
Top 20 distinct words in the list
Now that we are done with the data wrangling process, let’s take a look at the most popular words in the list.
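Counting the words can be sketched with collections.Counter from the standard library. The toy names list below is a made-up sample, not the real data, so its counts say nothing about the actual ranking.

```python
from collections import Counter

# Toy sample standing in for the cleaned list of words.
names = ["lab", "data", "lab", "network", "data", "lab"]

word_counts = Counter(names)

# most_common(20) returns up to 20 (word, count) pairs, highest count first.
top_words = word_counts.most_common(20)
for word, count in top_words:
    print(word, count)
```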
We can see that lab, technology, data, network, and neural are the top words used by startups in their names.
We can use WordCloud to plot the result for better visualization.
We can now get a WordCloud viz:
To make the WordCloud look cleaner, we can customize its color theme.
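A hedged sketch of both steps, using the wordcloud package, is shown below. The helper make_wordcloud, the image size, the colormap names, and the output filename are assumptions chosen for illustration, and the word list is a toy sample. The import sits inside the function so the rest of the snippet runs even where wordcloud is not installed.

```python
from collections import Counter


def make_wordcloud(frequencies, colormap="viridis", background_color="white"):
    """Build a WordCloud image from a {word: count} mapping."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(
        width=800,
        height=400,
        background_color=background_color,
        colormap=colormap,  # any matplotlib colormap name
    )
    return cloud.generate_from_frequencies(frequencies)


# Toy sample standing in for the cleaned list of words.
names = ["lab", "data", "lab", "network", "data", "lab"]
frequencies = dict(Counter(names))

# Uncomment to render and save the image (requires the wordcloud package):
# make_wordcloud(frequencies, colormap="Blues").to_file("ai_startup_names.png")
```

Changing the colormap argument is one simple way to adjust the color theme; the package also accepts a custom color function for finer control.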
And we are done.