
Data Science for E-Commerce with Python

 3 years ago
source link: https://towardsdatascience.com/data-science-for-e-commerce-with-python-a0a97dd7721d

1. Introduction to data and data science

Data is simply a collection of raw facts, such as numbers, words, or observations, whereas data science is the scientific discipline that studies data.

Photo by Franki Chamaki on Unsplash

Nowadays, most e-commerce platforms collect large amounts of user data without hampering the customer experience. Collected data is stored in structured, tabular form to facilitate analysis and interpretation. Platforms also store unstructured data such as images, videos, or documents, which is valuable for studying users' preferences but is frequently harder to process and analyze.

Data analysis provides these companies with insights and metrics that are constantly changing and that allow them to build even better products.

The surge of interest in data science stems from its widespread adoption, driven by the growth of data volumes and computing power. Data growth is directly related to widespread digitization, internet penetration, and the massive adoption of mobile devices, which continuously generate data without human intervention. Computing power, in turn, has enabled data scientists to store, process, and study large volumes of data efficiently.

Nowadays, it is not only big tech companies such as Google, Facebook, Apple, Amazon, or Microsoft that take full advantage of data science in their core businesses. Small and local businesses, as well as startups, have also gradually adopted it to add value.

2. E-commerce Applications of Data Science

E-commerce stands for electronic commerce and represents the online version of physical retail stores. It allows people from all over the world to browse, purchase, and sell products through online platforms.

Although it may seem a fairly simple process from the customer's standpoint, there are several obstacles to overcome in order to provide a seamless online shopping experience, including process-related ones such as product ordering, delivery, and fair pricing.

Still, with the growing number of people looking to shop online, the e-commerce industry is expanding rapidly. This also means that an increasing proportion of traditional businesses are switching to electronic commerce or complementing their business model with it.

In the context of e-commerce industry evolution, Data Science helps to bring maximum value out of the vast amount of data available in such platforms, and helps to switch focus towards customer engagement and experience. It focuses on:

  • Product recommendation for users.
  • Analysis of customer trends and behaviors.
  • Forecasting sales and stock logistics.
  • Optimizing product pricing and payment methods.

Each of these applications involves storing and interpreting large volumes of data, which is where data analysis techniques come in handy.

Photo by Pixabay from Pexels

3. Recommender Systems

One example of applying data analytics techniques is a company's recommender system, which predicts the preference that a user might have for an item based on previous purchases or searches on the platform.

Recommender systems are used strategically to increase conversion rates, elevate customer experience and amplify user engagement.

A large-scale recommender system that has proved to work is Amazon's data-driven, personalized marketing approach, which boosts sales on the platform through intelligent recommendations to users. According to McKinsey Insights magazine, 35% of Amazon's revenue is generated by its recommendation engine. This has been possible because the recommendation system is applied both on-site, on most of the website's pages, and off-site, in email campaigns.

There are two types of recommender systems:

  1. Content-Based Recommendations: This method makes recommendations based on the attributes or features of the product. For instance, if two products share attributes and a user purchased the first, the system should recommend the second, as there is a higher probability that it will match the user's preferences.
  2. Collaborative Recommendations: This method makes recommendations based on the interactions of multiple users. For instance, if several customers have purchased a certain product together with another one, the system should recommend each product to buyers of the other.
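
The two approaches above can be sketched in a few lines of plain Python. This is only a minimal illustration on hypothetical toy data (the product names, tags, and baskets are invented here), not how any production recommender works: content-based scoring is approximated with Jaccard similarity over product attributes, and collaborative scoring with simple co-purchase counts.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy data: product attributes and per-order baskets.
product_tags = {
    "laptop": {"electronics", "computing"},
    "mouse": {"electronics", "computing", "accessory"},
    "keyboard": {"electronics", "computing", "accessory"},
    "blender": {"kitchen", "appliance"},
}
orders = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "keyboard"],
    ["laptop", "keyboard"],
    ["blender"],
]


def jaccard(a, b):
    """Share of attributes that two products have in common."""
    return len(a & b) / len(a | b)


def content_based(product, top_n=2):
    """Rank the other products by attribute overlap with `product`."""
    scores = {
        other: jaccard(product_tags[product], tags)
        for other, tags in product_tags.items()
        if other != product
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


def build_co_purchases(orders):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in orders:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def collaborative(product, co_purchases, top_n=2):
    """Rank products by how often they were bought together with `product`."""
    ranked = sorted(co_purchases[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


co_purchases = build_co_purchases(orders)
print(content_based("mouse"))
print(collaborative("laptop", co_purchases))
```

Note that the content-based function never looks at the orders, and the collaborative one never looks at the tags, which is the essential difference between the two families.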

4. Customer Analytics

Customers are a key factor for any e-commerce company, and providing a great customer experience and satisfaction should be a primary concern. To achieve such a level of service, it is necessary to get to know customers and their preferences.

E-commerce platforms can track a customer's activity from the moment they enter the site until they leave, whether that happens after purchasing or selling a product or after simply browsing. Because of this need to know the customer, every action taken must be recorded and stored as potentially useful data for determining the customer's profile.

The process of generating actionable insights about the customers from their collected data is known as Customer Analytics.

Customer Analytics helps to understand the trends and shifts in customers' behavior in order to adjust business strategies and make key business decisions accordingly. It also provides a means to analyze which customer acquisition and retention channels are actually working and which are not.

In order to build a Customer Analytics platform, e-commerce companies must focus on key features about customers, which include:

  • Customer profiling and segmentation: Customers can be grouped based on their preferences, purchases, and browsing patterns in order to build a personal profile and provide recommendations based on it. In addition, this profiling helps to build target audiences, personalized products, and even marketing strategies that work for each group.
    It also helps to shift focus to the most profitable customers in order to establish better customer relations. Customers can be classified by geographical characteristics, behavior on the platform, demographic features, and psychological characteristics.
  • Sentiment Analysis: This is the process of determining the emotion behind a set of words or sentences in order to identify the sentiment customers express about the products they purchased or sold, through product reviews or support tickets.
    Sentiment classifiers typically label text as positive, negative, or neutral, and help to respond to complaints and improve customer service, among other uses.
  • Churn Analysis: This is the process of analyzing the likelihood that a customer will stop purchasing on the platform, based on their activity, with the goal of optimizing existing acquisition and retention strategies. An improvement in the churn rate can strongly impact the growth and even the sustainability of the business.
  • Lifetime Value Prediction: This is the estimated total revenue that a customer will provide to the business during their relationship with the platform. The estimation is made using factors such as early transaction patterns and the frequency and volume of transactions, among others.
    Predicting lifetime value helps in planning which kinds of customers to invest business resources in to extract the most value from them.
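
To make two of these metrics concrete, here is a rough sketch in plain Python. The transaction log is hypothetical, and both formulas are deliberate simplifications chosen for illustration: churn is measured as the share of previously active customers with no purchase after a cutoff date, and lifetime value as the average order value multiplied by an assumed number of future orders (`expected_orders` is a made-up parameter, not something estimated from data).

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 1, 5), 40.0),
    ("c1", date(2023, 3, 9), 60.0),
    ("c2", date(2023, 1, 20), 100.0),
    ("c3", date(2023, 2, 2), 30.0),
    ("c3", date(2023, 3, 15), 30.0),
    ("c3", date(2023, 3, 28), 30.0),
]


def churn_rate(transactions, cutoff):
    """Share of customers active before `cutoff` with no purchase on or after it."""
    before = {cid for cid, day, _ in transactions if day < cutoff}
    after = {cid for cid, day, _ in transactions if day >= cutoff}
    if not before:
        return 0.0
    return len(before - after) / len(before)


def naive_lifetime_value(transactions, customer_id, expected_orders):
    """Average order value times an assumed number of future orders."""
    amounts = [amt for cid, _, amt in transactions if cid == customer_id]
    if not amounts:
        return 0.0
    return sum(amounts) / len(amounts) * expected_orders


print(churn_rate(transactions, date(2023, 3, 1)))
print(naive_lifetime_value(transactions, "c1", expected_orders=10))
```

In practice both quantities are usually predicted with statistical or machine learning models rather than computed with fixed rules like these.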

5. Data Exploration Process with Python

The first step before analyzing a dataset is to preview the information it contains. To process this information easily we’re going to use Pandas, the Python library for data manipulation and analysis that offers data structures and operations for manipulating numerical tables and time series.

For those who are not familiar with Python, it is a high-level, general-purpose programming language that emphasizes coding efficiency, readability, and script reuse.

Both the datasets and the script can be found on my GitHub following this link. Below, I'll include the code needed to run the analysis on your computer:

# Imports 
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import numpy as np

After importing the necessary libraries, create a pandas dataframe that contains the dataset and explore it:

# Read dataset and preview
data = pd.read_csv('e_commerce.csv')
# Exploring data
data.info()

The output of the info() method on the “data” variable summarizes the information inside the dataset, which consists of a series of transactions made on an e-commerce platform. For each transaction we have the user ID, the purchased product ID, and other descriptive data that will be useful in the process.

Moving on with the analysis, we proceed to clean the null features in the dataframe. As the output of the code below shows, there are 173,638 null fields for product 2, meaning that the user did not purchase more than one product in such cases. Also, there are 383,247 null fields for product 3:

# Count null features in the dataset
data.isnull().sum()

Now, let’s proceed with the replacement of null features with zero values, as we need to have a clean dataset to perform operations with it:

# Replace the null features with 0:
data.fillna(0, inplace=True)
# Re-check that every N/A was replaced with 0:
data.isnull().sum()
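
For readers without the dataset at hand, the null-handling steps can be tried on a tiny stand-in. The frame below is hypothetical (the column names `Product_2` and `Product_3` are assumptions and may not match the real file), and it shows one possible variation: filling only the columns that are known to be optional instead of the whole dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; the real column names may differ.
toy = pd.DataFrame({
    "User_ID": [1000001, 1000001, 1000002],
    "Product_2": [np.nan, 6.0, np.nan],
    "Product_3": [np.nan, np.nan, 14.0],
    "Purchase": [8370, 15200, 1422],
})

null_counts = toy.isnull().sum()   # missing values per column
null_share = toy.isnull().mean()   # same, as a fraction of rows

# Fill only the columns known to be optional, leaving the others untouched.
optional_cols = ["Product_2", "Product_3"]
cleaned = toy.copy()
cleaned[optional_cols] = cleaned[optional_cols].fillna(0)

print(null_counts)
print(cleaned.isnull().sum().sum())
```

Restricting the fill to specific columns avoids silently turning a missing value in an unrelated column, such as the purchase amount, into a zero.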

The dataframe contains every transaction made by each customer. In order to identify the users that spend the most on our platform, let's group by user ID and add up the amount spent:

# Group by User ID and add up the amount spent
# (summing only 'Purchase' avoids aggregating non-numeric columns):
purchases = data.groupby('User_ID')['Purchase'].sum().reset_index()
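
Grouping alone does not rank the users. A possible next step, sketched here on hypothetical toy data rather than the real dataset, is to pick the top spenders with `nlargest`:

```python
import pandas as pd

# Hypothetical transactions; only the two columns needed for the ranking.
toy = pd.DataFrame({
    "User_ID": [1000001, 1000002, 1000001, 1000003, 1000002, 1000001],
    "Purchase": [8370, 1422, 15200, 7969, 1057, 500],
})

# Total spent per user, then the two largest totals.
spend_per_user = toy.groupby("User_ID")["Purchase"].sum()
top_spenders = spend_per_user.nlargest(2)
print(top_spenders)
```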

We can also look up which products each user ID purchased. Let's try with user ID 1000001:

data[data['User_ID'] == 1000001]

After identifying the highest-spending users, let's look at users' age ranges and the average sale for each age group:

purchase_by_age = data.groupby('Age')['Purchase'].mean().reset_index()
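
An average on its own can hide how many transactions stand behind it. As a hedged sketch on hypothetical data, named aggregation returns the mean, the number of transactions, and the total for each age group in one pass:

```python
import pandas as pd

# Hypothetical sample; age groups are stored as strings, as in the article.
toy = pd.DataFrame({
    "Age": ["26-35", "26-35", "51-55", "51-55", "0-17"],
    "Purchase": [8000, 10000, 9500, 10500, 7000],
})

age_summary = (
    toy.groupby("Age")["Purchase"]
    .agg(mean_purchase="mean", transactions="count", total="sum")
    .sort_values("mean_purchase", ascending=False)
)
print(age_summary)
```

Reading the three columns side by side helps to tell apart a group that spends a lot per purchase from one that simply purchases often.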

The group of users aged 51–55 has the highest average purchase amount on the platform, so maybe we should target our marketing campaigns at them. Let's take a look at the average purchase per age group graphically:

# purchase_by_age is a DataFrame, so plot its columns explicitly
plt.figure(figsize=(16,4))
plt.plot(purchase_by_age['Age'], purchase_by_age['Purchase'], color='purple', marker='*')
plt.grid()
plt.xlabel('Age Group', fontsize=10)
plt.ylabel('Average Purchase in $', fontsize=10)
plt.title('Average Sales distributed by age group', fontsize=15)
plt.show()

On the other hand, it would be interesting to find out which age group and gender make more transactions. These two facts can easily be calculated with a few lines of code:

# Grouping by gender and age (number of transactions in each group)
age_and_gender = data.groupby('Age')['Gender'].count().reset_index()
gender = data.groupby('Gender')['Age'].count().reset_index()
# Plot age distribution
plt.figure(figsize=(12,9))
plt.pie(age_and_gender['Gender'], labels=age_and_gender['Age'], autopct='%d%%', colors=['cyan', 'steelblue', 'peru', 'blue', 'yellowgreen', 'salmon', '#0040FF'])
plt.axis('equal')
plt.title("Age Distribution", fontsize=20)
plt.show()

# Plot gender distribution
plt.figure(figsize=(12,9))
plt.pie(gender['Age'], labels=gender['Gender'], autopct='%d%%', colors=['salmon', 'steelblue'])
plt.axis('equal')
plt.title("Gender Distribution", fontsize=20)
plt.show()
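
The same shares can be obtained without plotting. The sketch below, on hypothetical data, uses `value_counts(normalize=True)` for the share of transactions per gender and `pd.crosstab` to count transactions for every age and gender combination, which the two separate pie charts cannot show:

```python
import pandas as pd

# Hypothetical sample of transactions.
toy = pd.DataFrame({
    "Gender": ["M", "M", "F", "M"],
    "Age": ["26-35", "26-35", "18-25", "36-45"],
})

# Fraction of transactions made by each gender.
gender_share = toy["Gender"].value_counts(normalize=True)

# Number of transactions for each (age group, gender) pair.
age_gender = pd.crosstab(toy["Age"], toy["Gender"])

print(gender_share)
print(age_gender)
```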

In addition, we can calculate which of the customers' occupations spend the most per purchase on average. Take a look at the code below:

# Group by occupation:
occupation = data.groupby('Occupation')['Purchase'].mean().reset_index()
# Plot bar chart with line plot:
sns.set(style="white", rc={"lines.linewidth": 3})
fig, ax1 = plt.subplots(figsize=(12,9))
sns.barplot(x=occupation['Occupation'], y=occupation['Purchase'], color='#004488', ax=ax1)
sns.lineplot(x=occupation['Occupation'], y=occupation['Purchase'], color='salmon', marker="o", ax=ax1)
plt.axis([-1,21,8000,10000])
plt.title('Occupation Bar Chart', fontsize=15)
plt.show()
# Restore the default seaborn style
sns.set()

And lastly, we can determine which are the best-selling products in the platform:

# Group by product ID
product = data.groupby('Product_ID')['Purchase'].count().reset_index()
product.rename(columns={'Purchase':'Count'}, inplace=True)
product_sorted = product.sort_values('Count', ascending=False)
# Plot line plot
plt.figure(figsize=(14,8))
plt.plot(product_sorted['Product_ID'][:10], product_sorted['Count'][:10], linestyle='-', color='purple', marker='o')
plt.title("Best-selling Products", fontsize=15)
plt.xlabel('Product ID', fontsize=15)
plt.ylabel('Products Sold', fontsize=15)
plt.show()
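
As a side note, the group, rename, and sort steps can be collapsed into a single call. This is a compact alternative sketched on hypothetical product IDs, assuming each row represents one sale:

```python
import pandas as pd

# Hypothetical sales log: one row per product sold.
toy = pd.DataFrame({"Product_ID": ["P1", "P2", "P1", "P3", "P1", "P2"]})

# value_counts() counts rows per product and sorts in descending order.
best_sellers = toy["Product_ID"].value_counts().head(2)
print(best_sellers)
```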

Conclusion

My aim with this article was to provide an intuition of how global companies apply data science to acquire, retain, and grow their customer base. In addition, I wanted to provide a practical explanation of the theory involved in the e-commerce business, which includes recommender systems and customer analytics.

If you liked the information included in this article, don't hesitate to contact me to share your thoughts. It motivates me to keep on sharing!

