Case study: Uber vs Lyft

 3 years ago
source link: https://blog.usejournal.com/case-study-uber-vs-lyft-a62769000d7d

Photo by Thought Catalog on Unsplash

Ride-sharing apps dominate the cab market, and unlike public transport fares, their prices are not constant. They are greatly affected by the demand for and supply of rides at a given time. So what exactly drives this demand?

In this case study, I explore this dataset, available on Kaggle. The first part explores the dataset, and the second part tries to find patterns and predict the prices of ride-sharing apps.

The code for this can be found here. I highly suggest going through the code first. Please star the repository if you found it useful.

Let’s start with the exploration of the dataset. There are two main CSV files in the dataset, namely weather.csv and cab_rides.csv. According to the dataset description, the cab ride data covers various types of cabs for Uber and Lyft and their prices for the given locations. You can also find out whether there was any surge in the price at that time. The weather data contains weather attributes like temperature, rain, cloud, etc. for all the locations taken into consideration.

The cab_rides.csv file contains the following columns:

  • distance: distance of the ride
  • cab_type: Uber or Lyft
  • timestamp: timestamp of the ride
  • destination: destination of the ride
  • source: source location of the ride
  • price: the price of the ride
  • surge_multiplier: the surge multiplier experienced at the moment of booking the ride
  • product_id
  • name: Category of the ride

weather.csv contains the following columns:

  • temp: Temperature at the location
  • location
  • clouds: I couldn’t find proper documentation for this, but I assumed that a smaller number means a less cloudy sky.
  • pressure: Atmospheric pressure in bars
  • rain: Again, there is no documentation on the exact meaning, but it can be interpreted as a measure of rainfall
  • timestamp
  • humidity: Humidity percentage
  • wind: wind speed at the location
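
To get a first feel for the two files, here is a minimal sketch of reading them with pandas. The inline rows are made-up stand-ins so the snippet runs without the Kaggle download, and the `product_id` values are hypothetical. The column names follow the lists above; the real files may name or encode them differently.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real files (illustrative values only).
CAB_RIDES_SAMPLE = """distance,cab_type,timestamp,destination,source,price,surge_multiplier,product_id,name
1.1,Lyft,1543204800,North Station,Fenway,11.0,1.0,hypothetical_id_1,Lyft
2.3,Uber,1543208400,Financial District,Back Bay,,1.0,hypothetical_id_2,UberX
"""
WEATHER_SAMPLE = """temp,location,clouds,pressure,rain,timestamp,humidity,wind
40.1,Fenway,0.9,1.01,,1543204800,80,5.2
41.3,Back Bay,0.4,1.01,0.02,1543208400,70,6.1
"""


def load_frames(cab_rides_source, weather_source):
    """Read both CSVs; each argument is a file path or a file-like object."""
    rides = pd.read_csv(cab_rides_source)
    weather = pd.read_csv(weather_source)
    return rides, weather


# With the real dataset this would be load_frames("cab_rides.csv", "weather.csv").
rides, weather = load_frames(io.StringIO(CAB_RIDES_SAMPLE), io.StringIO(WEATHER_SAMPLE))

print(rides.shape, weather.shape)
# Count missing values per column, a useful first check before any analysis.
print(rides.isna().sum())
print(weather.isna().sum())
```

Checking for missing values early matters here because columns such as price and rain may be empty for some rows, and later averages silently skip those rows.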

Alright *rubs hands together*, let's get started.

[Image: notebook cell reading in the datasets]

(If anyone knows how to make gists out of cells of a notebook, please guide me;-;)

After reading in the datasets, the first thing you’ll notice is that the weather data is only recorded by the hour, whereas the ride data is available at a much finer granularity, so we need to align the two before we can use them together. This has been done in the notebook available on GitHub, so make sure to check that out.
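
The notebook has the actual fix. As a rough sketch of one possible approach (not necessarily the one used there), both timestamps can be floored to the hour and the weather joined onto each ride by source location and hour. The function name `attach_hourly_weather` is hypothetical, and the code assumes both `timestamp` columns are Unix epochs in the same unit, which may not hold for the real files.

```python
import pandas as pd


def attach_hourly_weather(rides, weather, unit="s"):
    """Join per-hour average weather onto each ride by source location and hour.

    Assumes both frames carry a Unix-epoch `timestamp` column in the same unit.
    """
    rides = rides.copy()
    weather = weather.copy()
    rides["hour"] = pd.to_datetime(rides["timestamp"], unit=unit).dt.floor("h")
    weather["hour"] = pd.to_datetime(weather["timestamp"], unit=unit).dt.floor("h")

    # Average all readings that fall in the same hour at the same location.
    hourly = (
        weather.drop(columns="timestamp")
        .rename(columns={"location": "source"})
        .groupby(["source", "hour"], as_index=False)
        .mean()
    )
    # A left join keeps every ride, even when no weather reading matches.
    return rides.merge(hourly, how="left", on=["source", "hour"])


# Made-up rows: two Fenway readings in the same hour, none for Back Bay.
rides = pd.DataFrame({
    "timestamp": [1543204800 + 2000, 1543204800 + 2500],
    "source": ["Fenway", "Back Bay"],
    "price": [11.0, 9.5],
})
weather = pd.DataFrame({
    "timestamp": [1543204800 + 600, 1543204800 + 1800],
    "location": ["Fenway", "Fenway"],
    "temp": [40.0, 42.0],
})
merged = attach_hourly_weather(rides, weather)
print(merged[["source", "hour", "temp"]])
```

Rides without a matching reading end up with missing weather values rather than being dropped, so how to handle those gaps stays an explicit decision.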

Let’s start with a basic plot to see the number of rides each app gets on a daily basis. For this, we will group the data by day.
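
A minimal sketch of that grouping, assuming `timestamp` is a Unix epoch in seconds and that "day" means calendar date in UTC (both are assumptions, and the sample rows are made up). The helper name `rides_per_day` is hypothetical.

```python
import pandas as pd


def rides_per_day(rides, unit="s"):
    """Count rides per calendar day (UTC) for each cab_type."""
    day = pd.to_datetime(rides["timestamp"], unit=unit).dt.normalize().rename("day")
    # size() counts rows per (day, cab_type); unstack puts one app per column.
    return rides.groupby([day, "cab_type"]).size().unstack(fill_value=0)


# Made-up rides: two on 2018-11-26 and one on 2018-11-27.
rides = pd.DataFrame({
    "timestamp": [1543204800, 1543204900, 1543280000],
    "cab_type": ["Uber", "Lyft", "Uber"],
})
daily = rides_per_day(rides)
print(daily)
# The day-to-day gap between the two apps.
print(daily["Uber"] - daily["Lyft"])
```

Calling `daily.plot()` on the result (with matplotlib installed) would give one line per app.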

[Figure: number of rides per day for Uber and Lyft]

Right off the bat, we can see that Uber gets noticeably more rides than Lyft in this dataset. What’s really interesting to me here is that, regardless of the day, the difference in the number of rides remains about the same. This could mean that each app has a stable client base and that a user who starts using one app does not switch to the other easily.

Next, we’ll look at the peak and dead hours for both the apps.
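
One way to sketch this is to count rides by hour of day per app. The code below assumes `timestamp` is a Unix epoch in seconds and reads hours in UTC, so they would need shifting to local time before being interpreted as morning or night. The helper name `rides_per_hour` and the sample rows are made up for illustration.

```python
import pandas as pd


def rides_per_hour(rides, unit="s"):
    """Count rides per hour of day (0-23, UTC) for each cab_type."""
    hour = pd.to_datetime(rides["timestamp"], unit=unit).dt.hour.rename("hour")
    counts = rides.groupby([hour, "cab_type"]).size().unstack(fill_value=0)
    # Reindex so hours with no rides at all still show up as zeros.
    return counts.reindex(range(24), fill_value=0)


# Made-up rides: two Uber rides in hour 4 and one Lyft ride in hour 9.
rides = pd.DataFrame({
    "timestamp": [1543204800, 1543204900, 1543204800 + 5 * 3600],
    "cab_type": ["Uber", "Uber", "Lyft"],
})
hourly = rides_per_hour(rides)
# Busiest hour for each app.
print(hourly.idxmax())
```

On the full dataset, `hourly.idxmin()` would point at the quietest hour for each app in the same way.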

[Figure: number of rides by hour of day for Uber and Lyft]

This graph does not give any major insights. Most of it just confirms our basic ideas, such as the spike after 8 in the morning followed by a busy day, and another spike after 9 at night. Again, the similarity between the two graphs seems to confirm our first inference.

Now, let’s do some price analysis.

[Figure: price comparison between Uber and Lyft]

Lyft is more expensive than Uber. This could explain why Lyft has a significantly smaller user base. Or maybe Lyft’s rides are just more premium than Uber’s. In any case, charging a higher price for less distance traveled is not great, especially when Uber controls the market.
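
To put a number on "higher price for less distance", one option is to compare average price, average distance, and price per unit of distance for each app. This is only a sketch on made-up rows; the helper name `price_summary` is hypothetical, and it assumes rows with a missing price or a zero distance should be skipped.

```python
import pandas as pd


def price_summary(rides):
    """Mean price, distance, and price per unit distance for each cab_type."""
    priced = rides.dropna(subset=["price"])
    # Guard against division by zero for rides with no recorded distance.
    priced = priced[priced["distance"] > 0]
    priced = priced.assign(price_per_distance=priced["price"] / priced["distance"])
    columns = ["price", "distance", "price_per_distance"]
    return priced.groupby("cab_type")[columns].mean()


# Made-up rides; the last Uber row has no price and is ignored.
rides = pd.DataFrame({
    "cab_type": ["Lyft", "Lyft", "Uber", "Uber"],
    "price": [10.0, 20.0, 8.0, None],
    "distance": [2.0, 2.0, 2.0, 3.0],
})
summary = price_summary(rides)
print(summary)
```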

[Figure: Lyft (green) and Uber (orange) ride prices by source and destination location]

Here, green represents the prices of Lyft rides and orange represents the prices of Uber rides, grouped by source and destination location. Again, Uber is considerably cheaper at each location.

Now, I wanted to explore the possibility that Lyft might just have more premium services so I plotted the average price.
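
A sketch of how such averages could be computed per ride category, assuming the `name` column holds the category as described in the column list. The helper name `average_price_by_category` is hypothetical, and the sample rows and prices are made up.

```python
import pandas as pd


def average_price_by_category(rides):
    """Mean price for each (cab_type, name) pair, cheapest first."""
    return (
        rides.dropna(subset=["price"])
        .groupby(["cab_type", "name"])["price"]
        .mean()
        .sort_values()
    )


# Made-up rides; the row with a missing price is ignored.
rides = pd.DataFrame({
    "cab_type": ["Uber", "Uber", "Uber", "Lyft", "Lyft", "Lyft"],
    "name": ["UberX", "UberX", "Black", "Lyft", "Lux", "Lux"],
    "price": [8.0, 10.0, 20.0, 12.0, 25.0, None],
})
avg_price = average_price_by_category(rides)
print(avg_price)
```

Sorting by price puts comparable tiers from the two apps next to each other, which makes it easier to judge whether one app simply offers pricier categories.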

[Figure: average price for Uber and Lyft]

Considering that they are direct competitors, I am going to assume that they offer similar categories of rides, which suggests that my theory that Lyft’s rides might be more premium is false. And well, the final inference is that Lyft is just more expensive ( ¯\_(ツ)_/¯ )

After this, I wanted to look at the price distribution against distance traveled, with each location as the source and as the destination. My hope was to find expensive locations, i.e. locations where starting a ride from or going to that location would be expensive.
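
One way to sketch this comparison is to bin the distances and average the price per bin for each location. The helper name `price_by_distance`, the 0.5 bin width, and the sample rows are all assumptions made for illustration, not what the notebook does.

```python
import numpy as np
import pandas as pd


def price_by_distance(rides, by="source", bin_width=0.5):
    """Mean price per distance bin, with one column per location in `by`."""
    priced = rides.dropna(subset=["price"]).copy()
    # Label each ride with the lower edge of its distance bin.
    priced["distance_bin"] = np.floor(priced["distance"] / bin_width) * bin_width
    return priced.pivot_table(
        index="distance_bin", columns=by, values="price", aggfunc="mean"
    )


# Made-up rides from two source locations.
rides = pd.DataFrame({
    "source": ["Fenway", "Fenway", "Fenway", "Back Bay"],
    "distance": [0.6, 0.7, 1.2, 0.6],
    "price": [10.0, 12.0, 15.0, 8.0],
})
table = price_by_distance(rides)
print(table)
```

Passing `by="destination"` gives the destination view, and filtering the rides by `cab_type` first would give one table per app.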

[Figures: price against distance traveled for each source location and each destination location, for Uber and Lyft]

These plots are quite complicated, and I think there could be a better way to represent this data. Please let me know if you have anything better in mind.

It’s tough to draw conclusions from these plots, but some things do come to mind. When starting a ride from Fenway (the brown line), both Uber and Lyft are more expensive than from other locations. A simple Google search tells me that Fenway is home to Fenway Park, a baseball park hosting MLB games, which could explain this. Both apps seem to take advantage of Fenway being an area of high economic value where people go for leisure, and charge more for rides leaving that area.

As for the destination plots, a similar trend can be noticed for the Financial District (the dark blue line). One possible conclusion is that the Financial District has offices that working people need to get to, and both Uber and Lyft take advantage of this.

This completes a mini case study on this dataset.

What’s next?

Well, the obvious next step is to create a model that can take all these parameters, such as source, distance, and weather, and then predict the price of the ride. And that’s what I’m going to do.

Follow me so that you can get to know when that article is up and please do give claps if you found any of these interesting.

This is my first attempt at something like this, and I’d love to hear your thoughts on it!
