
Data Analysis of the Most Popular AI Startup Names on AngelList

 5 years ago
source link: https://www.tuicool.com/articles/QfUbYjm


AngelList is a U.S. website for startups, angel investors, and job-seekers looking to work at startups. In this tutorial, we want to look at the most popular words used by AI startups in their names.

Step 1: Import Data

We use the data scraped by rodrigosnader. This dataset includes 10,151 records of AI startups on AngelList.
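
The loading code from the original post is not preserved in this copy. As a minimal sketch, assuming the scraped data is a CSV file with a name column, the names could be read with the standard library as follows. The file name ai_startups.csv, the id column, and the helper load_names are placeholders for illustration, not part of the dataset.

```python
import csv
import io


def load_names(csv_file):
    """Return the non-empty values of the `name` column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [
        (row.get("name") or "").strip()
        for row in reader
        if (row.get("name") or "").strip()
    ]


# Tiny in-memory sample standing in for the real file (illustrative rows only).
sample = io.StringIO("name,id\nMonkeyLearn,1\nNextDeavor,2\n")
startup_names = load_names(sample)
print(startup_names)

# With the real dataset (hypothetical file name):
# with open("ai_startups.csv", newline="", encoding="utf-8") as f:
#     startup_names = load_names(f)
```

pandas.read_csv would work just as well here; the csv module simply keeps the sketch free of extra dependencies.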

Step 2: Data Wrangling

We first need to structure our data to better meet our needs.

Split Names

The startup names are stored in the name column.


We can see that some company names are built from words joined without spaces (e.g. MonkeyLearn, NextDeavor), so we want to split the names into separate words.

To do this, we can use the wordninja package, which I found in a Stack Overflow thread.

We will store the split names, in lower-case form, in the names list.
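
A minimal sketch of this step, assuming wordninja is installed and using its split function. The CamelCase regex fallback and the helper split_names are additions for illustration, so that the sketch still runs without the package; the fallback is much weaker than wordninja's dictionary-based splitting.

```python
import re

try:
    import wordninja  # third-party: pip install wordninja

    split_words = wordninja.split
except ImportError:
    # Fallback for illustration only: split on CamelCase boundaries and digits.
    def split_words(text):
        return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)


def split_names(startup_names):
    """Split every name into words and return them lower-cased in one flat list."""
    names = []
    for name in startup_names:
        names.extend(word.lower() for word in split_words(name))
    return names


names = split_names(["MonkeyLearn", "NextDeavor"])
print(names)
```

How a made-up name such as NextDeavor is split depends on the word list wordninja ships with, so it is worth spot-checking a few results by eye.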

Lemmatization

To improve our analysis results, we first lemmatize the names list.

Lemmatization is the process of converting a word to its base form. You can learn more about the text mining process from this great piece (reference 1).

We then update the names list with each word’s base form.
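
The post does not say which lemmatizer was used. The sketch below keeps the lemmatizer pluggable: simple_lemmatize is a toy stand-in that only undoes regular English plurals, and the commented lines show how NLTK's WordNetLemmatizer could be passed in instead, assuming NLTK and its WordNet corpus are available. Both helper names are hypothetical.

```python
def simple_lemmatize(word):
    """Toy stand-in for a real lemmatizer: only undoes regular English plurals."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]
    return word


def lemmatize_names(names, lemmatize=simple_lemmatize):
    """Return a new list in which every word is replaced by its base form."""
    return [lemmatize(word) for word in names]


names = lemmatize_names(["labs", "technologies", "networks", "data"])
print(names)  # ['lab', 'technology', 'network', 'data']

# With NLTK (an assumption; requires `pip install nltk` and nltk.download("wordnet")):
# from nltk.stem import WordNetLemmatizer
# names = lemmatize_names(names, WordNetLemmatizer().lemmatize)
```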

Stop Words

Stop words such as "the" and "a" carry little meaning, so we remove them from the list.
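
A minimal sketch of the filter. The stop-word set here is a short hand-written one chosen for illustration; a fuller list, such as the English stop words that ship with NLTK, could be swapped in.

```python
# Small illustrative stop-word set (an assumption, not the list used in the post).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to"}


def remove_stop_words(names, stop_words=STOP_WORDS):
    """Return the words that are not in the stop-word set, preserving order."""
    return [word for word in names if word not in stop_words]


names = remove_stop_words(["the", "data", "lab", "of", "ai"])
print(names)  # ['data', 'lab', 'ai']
```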

Remove Single Characters

Similarly, we remove single characters from the list, since they carry no meaning on their own.
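
This can be sketched as a simple length filter; the helper name is hypothetical.

```python
def remove_single_characters(names):
    """Keep only the words that are at least two characters long."""
    return [word for word in names if len(word) > 1]


names = remove_single_characters(["x", "data", "ai", "q", "lab"])
print(names)  # ['data', 'ai', 'lab']
```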

Step 3: Generate WordCloud

Top 20 distinct words in the list

Now that we are done with data wrangling, let’s take a look at the most popular words in the list.
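
One way to count the words, sketched with collections.Counter from the standard library. The sample list is made up for illustration and does not reflect the real counts.

```python
from collections import Counter


def top_words(names, n=20):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    return Counter(names).most_common(n)


# Made-up sample; in the tutorial this would be the cleaned names list.
sample = ["lab", "data", "lab", "network", "data", "lab"]
for word, count in top_words(sample):
    print(word, count)
```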

We can see that lab, technology, data, network, and neural are the top words used by startups in their names.


We can use WordCloud to plot the result for better visualization.
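
A minimal sketch using the wordcloud package, assuming it is installed. The helper names, the image size, and the output file name are illustrative choices, not taken from the original post.

```python
from collections import Counter


def word_frequencies(names):
    """Map each word to the number of times it occurs in the list."""
    return dict(Counter(names))


def make_word_cloud(names, **options):
    """Build a word cloud from the word list; extra options go to WordCloud."""
    from wordcloud import WordCloud  # third-party: pip install wordcloud

    settings = {"width": 800, "height": 400, "background_color": "white", **options}
    return WordCloud(**settings).generate_from_frequencies(word_frequencies(names))


# Usage, with the cleaned names list from the previous steps:
# cloud = make_word_cloud(names)
# cloud.to_file("ai_startup_names.png")  # hypothetical output path
```

Passing frequencies directly avoids wordcloud's own tokenizing and stop-word handling, which would otherwise repeat work already done during data wrangling.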

This gives us a first WordCloud visualization of the names.

To make the WordCloud look cleaner, we can customize its color theme.
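
The post does not preserve which colors were used. As a sketch, the wordcloud package accepts either a colormap name or a color_func callable; the function below, blue_shades, is a hypothetical example that draws every word in a shade of blue.

```python
import random


def blue_shades(word, font_size, position, orientation, random_state=None, **kwargs):
    """Return an HSL color string in a blue hue with randomly varied lightness."""
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(30, 60)
    return f"hsl(210, 80%, {lightness}%)"


# Usage with the wordcloud package (assumed installed):
# from wordcloud import WordCloud
# cloud = WordCloud(background_color="white", color_func=blue_shades)
# Alternatively, pick a Matplotlib colormap by name:
# cloud = WordCloud(background_color="white", colormap="viridis")
```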

And we are done.


Reference

  1. Text Mining in Python: Steps and Examples
  2. angel-scraper
  3. How to split text without spaces into list of words?
