Case study: Uber vs Lyft

Ride-sharing apps dominate the cab market, and unlike public transport fares, their ride prices are not fixed. Prices are greatly affected by the demand for and supply of rides at a given time. So what exactly drives this demand?

In this case study, I will be exploring this dataset, available on Kaggle. The first part of the case study explores the dataset, and the second part will try to find patterns and predict the prices of ride-sharing apps.

The code for this can be found here. I highly suggest going through the code first. Please star the repository if you found it useful.

Let’s start with the exploration of the dataset. There are two main CSV files in the dataset, namely weather.csv and cab_rides.csv. According to the dataset description, the cab ride data covers various types of cabs for Uber and Lyft and their prices for the given locations. You can also see whether there was any surge in the price at that time. The weather data contains weather attributes like temperature, rain, clouds, etc. for all the locations taken into consideration.

The cab_rides.csv file contains the following columns:

distance: distance of the ride
cab_type: Uber or Lyft
timestamp: timestamp of the ride
destination: destination of the ride
source: source location of the ride
price: the price of the ride
surge_multiplier: the surge multiplier experienced at the moment of booking the ride
product_id
name: Category of the ride

weather.csv contains the following columns:

temp: Temperature at the location
location
clouds: I couldn’t find proper documentation for this, but I assumed that a smaller number means a less cloudy sky.
pressure: Atmospheric pressure in bars
rain: Again, there is no documentation for the exact meaning, but it can be interpreted as a measure of rainfall
timestamp
humidity: Humidity percentage
wind: wind speed at the location

Alright *rubs hands together*, let's get started.

(If anyone knows how to make gists out of cells of a notebook, please guide me;-;)

After reading in the datasets, the first thing you’ll notice is that the weather data is only available hourly, whereas the ride data is recorded much more frequently, so we need to reconcile the two before we can use them together. This has been done in the notebook available on GitHub, so make sure to check that out.
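
One way to line the two files up, as a rough sketch and not necessarily what the notebook does: floor the timestamps to the hour, average the weather readings per location and hour, and left-join that onto the rides by source location. The sample rows are made up, the helper name attach_hourly_weather is hypothetical, and I am assuming the timestamps have already been parsed into datetimes.

```python
import pandas as pd

# Made-up sample rows that reuse the column names listed above.
rides = pd.DataFrame({
    "source": ["Fenway", "Fenway", "Back Bay", "Back Bay"],
    "timestamp": pd.to_datetime([
        "2018-11-26 08:05:13", "2018-11-26 08:47:02",
        "2018-11-26 09:15:40", "2018-11-26 10:02:11",
    ]),
    "price": [12.5, 16.0, 9.0, 11.0],
})
weather = pd.DataFrame({
    "location": ["Fenway", "Fenway", "Back Bay"],
    "timestamp": pd.to_datetime([
        "2018-11-26 08:10:00", "2018-11-26 08:40:00", "2018-11-26 09:30:00",
    ]),
    "temp": [40.0, 42.0, 39.0],
})


def attach_hourly_weather(rides, weather, weather_cols=("temp",)):
    """Left-join hourly mean weather onto rides by source location and hour."""
    # Truncate each ride to the start of its hour.
    rides = rides.assign(hour=rides["timestamp"].dt.floor("h"))
    # Collapse the weather readings to one row per location and hour.
    hourly = (
        weather.assign(hour=weather["timestamp"].dt.floor("h"))
        .groupby(["location", "hour"], as_index=False)[list(weather_cols)]
        .mean()
        .rename(columns={"location": "source"})
    )
    # Rides with no weather reading in their hour keep NaN in the weather columns.
    return rides.merge(hourly, on=["source", "hour"], how="left")


merged = attach_hourly_weather(rides, weather)
```

A left join keeps every ride, so the missing weather values can be filled or dropped later depending on the analysis.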

Let’s start with a basic plot to see the number of rides each app gets on a daily basis. For this, we will segregate the data day wise.
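
A minimal sketch of the day-wise split, using made-up rows. I am assuming here that the raw timestamp column holds Unix epoch milliseconds; check the units in your copy of the data and switch to unit="s" if they are in seconds.

```python
import pandas as pd

HOUR_MS = 3_600_000          # milliseconds in one hour
BASE_MS = 1_543_190_400_000  # 2018-11-26 00:00:00 UTC as epoch milliseconds

# Made-up rides: three on the first day, two on the second.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Lyft", "Uber", "Uber", "Lyft"],
    "timestamp": [BASE_MS + h * HOUR_MS for h in (1, 5, 7, 25, 30)],
})

# Assumption: epoch milliseconds. Convert, then keep only the calendar date.
rides["date"] = pd.to_datetime(rides["timestamp"], unit="ms").dt.date

# One row per day, one column per app, values are ride counts.
rides_per_day = rides.groupby(["date", "cab_type"]).size().unstack(fill_value=0)
```

Calling rides_per_day.plot(kind="bar") would then draw the two apps side by side for each day (this needs matplotlib installed).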

Right off the bat, the plot suggests that Uber gets noticeably more rides than Lyft in this dataset. What’s really interesting to me here is that, regardless of the day, the difference in the number of rides stays about the same. This could mean that each app has a stable client base and that a user who starts using one app does not easily switch to the other.

Next, we’ll look at the peak and dead hours for both the apps.
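
The hourly view can be sketched the same way by grouping on the hour of day instead of the date. The rows below are made up, and I am assuming the timestamps are already datetimes.

```python
import pandas as pd

# Made-up rides spread over two days.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Uber", "Lyft", "Uber", "Lyft"],
    "timestamp": pd.to_datetime([
        "2018-11-26 08:15:00", "2018-11-27 08:40:00", "2018-11-26 08:55:00",
        "2018-11-26 21:10:00", "2018-11-27 21:30:00",
    ]),
})

rides_per_hour = (
    rides.assign(hour=rides["timestamp"].dt.hour)  # 0 to 23
    .groupby(["hour", "cab_type"])
    .size()
    .unstack(fill_value=0)
    .reindex(range(24), fill_value=0)  # keep hours with no rides as zeros
)
```

Reindexing to all 24 hours keeps the dead hours visible as zeros instead of silently dropping them from the plot.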

This graph does not give any major insights; most of it just confirms our basic expectations, such as a spike after 8 in the morning, followed by a busy day and another spike after 9 at night. Again, the similarity between the two graphs seems to support our first inference.

Now, let’s do some price analysis.

Lyft is more expensive than Uber. This could explain why Lyft has a significantly smaller user base. Or maybe Lyft’s rides are simply more premium than Uber’s. In any case, charging a higher price for less distance traveled is not great, especially when Uber controls the market.
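
A hedged sketch of how this comparison could be computed. The numbers are invented purely to show the mechanics, and the price_per_distance column is my own name for total price divided by total distance. I am also assuming that some rides may have no recorded price and dropping them first.

```python
import pandas as pd

# Invented rides; the last one has no recorded price.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Uber", "Lyft", "Lyft", "Uber"],
    "price": [10.0, 14.0, 15.0, 21.0, None],
    "distance": [2.0, 3.5, 2.0, 3.0, 1.0],
})

# Only rides with a price can take part in a price comparison.
priced = rides.dropna(subset=["price"])
by_app = priced.groupby("cab_type")

summary = by_app.agg(
    mean_price=("price", "mean"),
    mean_distance=("distance", "mean"),
)
# Total price over total distance, so long rides are not under-weighted.
summary["price_per_distance"] = by_app["price"].sum() / by_app["distance"].sum()
```

Looking at price per unit of distance alongside the raw average helps separate "Lyft rides cost more" from "Lyft rides are just longer".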

Here, green represents the prices of Lyft rides and orange represents the prices of Uber rides, grouped by source and destination location. Again, Uber is considerably cheaper in each location.

Now, I wanted to explore the possibility that Lyft might just have more premium services so I plotted the average price.
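
The per-category average is a two-level group-by on the app and the ride category. This is a small sketch; the category names and prices below are illustrative values, not results from the dataset.

```python
import pandas as pd

# Illustrative rows only: category names and prices are made up.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Uber", "Uber", "Lyft", "Lyft", "Lyft"],
    "name": ["UberX", "UberX", "Black", "Lyft", "Lux", "Lux"],
    "price": [9.0, 11.0, 20.0, 10.5, 24.0, 26.0],
})

# Mean price for every (app, category) pair, cheapest first.
avg_by_category = (
    rides.groupby(["cab_type", "name"])["price"]
    .mean()
    .sort_values()
)
```

Sorting the result makes it easy to put comparable tiers of the two apps next to each other.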

Since they are direct competitors, I am going to assume that they offer similar categories of rides, which makes my theory that Lyft’s rides might be more premium look false. And well, the final inference is that Lyft is just more expensive ( ¯\_(ツ)_/¯ )

After this, I wanted to look at how price varies with distance traveled for each location, both as the source and as the destination. My hope was to find expensive locations, i.e., places where starting a ride or ending a ride costs more.
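
One possible way to make this comparison easier to read, as a sketch and not what the notebook does: bucket the distances and compare the mean price per bucket for each source. The bin edges and sample rows are arbitrary choices of mine.

```python
import pandas as pd

# Made-up rides from two sources.
rides = pd.DataFrame({
    "source": ["Fenway", "Fenway", "Fenway", "Back Bay", "Back Bay"],
    "distance": [0.5, 1.5, 1.8, 0.6, 1.4],
    "price": [8.0, 12.0, 14.0, 6.0, 9.0],
})

# Bucket distances so rides of similar length are compared with each other.
rides["distance_bin"] = pd.cut(rides["distance"], bins=[0, 1, 2, 3])

# Rows are distance buckets, columns are source locations, values are mean prices.
price_by_source = (
    rides.groupby(["source", "distance_bin"], observed=True)["price"]
    .mean()
    .unstack("source")
)
```

Swapping "source" for "destination" gives the matching table for ride endpoints, and a source whose column sits above the others in every bucket is a candidate expensive location.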

These plots are quite complicated, and I think there must be a better way to represent this data. Please let me know if you have anything better in mind.

It’s tough to draw conclusions from these plots, but some things do come to mind. When starting a ride from Fenway (the brown line), both Uber and Lyft are more expensive than from other locations. A simple Google search tells me that Fenway is home to Fenway Park, the baseball stadium that hosts MLB games, which could explain this. Both apps seem to take advantage of Fenway being a high-value area where people go for leisure, and charge more for rides leaving it.

As for the destination plots, a similar trend can be seen for the Financial District (the dark blue line). One possible conclusion is that the Financial District is full of offices that working people need to get to, and both Uber and Lyft take advantage of this.

This completes a mini case study on this dataset.

What’s next?

Well, the obvious next step is to create a model that can take all these parameters, such as source, distance, and weather, and then predict the price of the ride. And that’s what I’m going to do.

Follow me so that you can get to know when that article is up and please do give claps if you found any of these interesting.

This is my first attempt at something like this, and I’d love to hear your thoughts on it!
