

Uber Freight Near-Real-Time Analytics Architecture
source link: https://www.infoq.com/news/2022/11/uber-freight-analysis/?itm_source=infoq&itm_medium=popular_widget&itm_campaign=popular_content_list&itm_content=

Nov 08, 2022 2 min read
Uber Freight is the Uber platform dedicated to connecting shippers with carriers. Because providing reliable service to shippers is crucial for Uber Freight, the company developed the Carrier Scorecard, which tracks several metrics including on-time pickup/delivery, tracking automation, and late cancellations. This information must be shown in near real-time in the Carrier app, so the architecture that serves it has to be fast enough to meet the requirements.
The requirements for this architecture are data freshness, latency, reliability, and accuracy. Performance scores must be updated with low latency once a load is completed or bounced, and every carrier must be able to view their performance score in the app with little delay. The data must be processed and served with a high level of reliability, and the entire system must recover gracefully in case of failure. Performance metrics must be calculated accurately.
Some potential solutions were considered before designing and implementing the final architecture, particularly for the metrics aggregation stage. Two of them were on-the-fly aggregation with MySQL and pre-aggregation of data with MySQL. Both have drawbacks; the main problem is upserting records in batches to ensure that the historical data stay correctly up to date. Another option considered was to use two OLAP database tables, one storing the raw data and the other storing the metrics, with events triggering asynchronous functions to update the metrics table. This solution does not scale, especially when write traffic is high.

The final architecture uses Kafka, Flink, and Pinot. The Kafka events, generated by backend services, are aggregated by Flink. The aggregated data are ingested into Pinot, which uses real-time ingestion from Kafka to cover the last three days of data. Historical data are ingested from HDFS.
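To make the aggregation step concrete, here is a minimal sketch in plain Python of folding load events into per-carrier scorecard rates. It is not Flink code and not Uber's implementation: Flink would apply this kind of fold continuously over a Kafka stream, whereas the sketch runs over an in-memory list. The `LoadEvent` fields, the `METRICS` names, and the `aggregate_scorecards` function are all assumptions made for illustration, since the article does not describe the real event schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadEvent:
    """Hypothetical event shape; the real Kafka schema is not given in the article."""
    carrier_id: str
    on_time_pickup: bool
    on_time_delivery: bool
    late_cancellation: bool


# Assumed metric names, loosely based on the scorecard metrics the article lists.
METRICS = ("on_time_pickup", "on_time_delivery", "late_cancellation")


def aggregate_scorecards(events):
    """Fold load events into per-carrier load counts and metric rates."""
    counts = defaultdict(lambda: {"loads": 0, **{m: 0 for m in METRICS}})
    for event in events:
        row = counts[event.carrier_id]
        row["loads"] += 1
        for metric in METRICS:
            # bool is a subclass of int, so True adds 1 and False adds 0.
            row[metric] += getattr(event, metric)
    return {
        carrier_id: {
            "loads": row["loads"],
            **{f"{m}_rate": row[m] / row["loads"] for m in METRICS},
        }
        for carrier_id, row in counts.items()
    }
```

Calling `aggregate_scorecards` on a list of `LoadEvent` objects returns one dictionary per carrier, which is roughly the shape of row a serving store would then hold for the app to query.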
Apache Pinot provides index optimization techniques such as JSON indexes, sorted columns, and star-tree indexes to accelerate query performance. Fast queries give carriers a better interactive experience. To achieve the 250 ms query latency on a table, two kinds of indexes are created on the Pinot tables: an inverted index and a sorted index. The inverted index can speed up queries with a WHERE clause by a factor of 10. The sorted index on the carrier's unique ID reduces the table size by half, which also reduces query latency.
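The two index types can be illustrated with a small sketch in plain Python. It shows the general idea only, not Pinot's internal data structures: an inverted index maps each column value to the rows that contain it, and a sorted column lets a lookup be answered with a binary search that yields one contiguous row range. The function names and the sample `carrier_ids` and `statuses` columns are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_inverted_index(values):
    """Map each distinct column value to the list of row ids that contain it."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return dict(index)


def sorted_row_range(sorted_values, value):
    """Return the half-open [start, end) row range for `value` in a sorted column."""
    return bisect_left(sorted_values, value), bisect_right(sorted_values, value)


# Hypothetical columns of a table whose rows are sorted by carrier id.
carrier_ids = ["c1", "c1", "c2", "c2", "c2", "c3"]
statuses = ["completed", "bounced", "completed", "completed", "bounced", "completed"]
```

With these columns, `build_inverted_index(statuses)` lets a filter on status jump straight to the matching row ids instead of scanning every row, and `sorted_row_range(carrier_ids, "c2")` locates all rows for one carrier as a single range. Storing a range per value instead of a value per row is one plausible reason a sorted column takes less space.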
Neutrino is the query gateway used to access the Pinot datasets. It is a different deployment of Presto in which the coordinator and worker run on each host and can run queries independently. Neutrino accepts PrestoSQL queries and translates them into the Pinot query language. A Redis cache is added in front of Neutrino to store aggregated metrics for a maximum of 12 hours, which yields a cache hit rate of more than 90%.
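The caching layer in front of the query gateway follows a read-through pattern with expiry, which can be sketched as follows. This is an in-memory stand-in written in plain Python, not Redis and not Uber's code; the `TTLCache` class, its `get_or_compute` method, and the injectable `clock` parameter are assumptions made for illustration. Only the 12-hour maximum age comes from the article.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        """Return a fresh cached value, or call compute() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


TWELVE_HOURS = 12 * 60 * 60  # maximum age of cached metrics, from the article
```

A caller would wrap the expensive query in a function and pass it as `compute`, for example `cache.get_or_compute(carrier_id, run_query)` where `run_query` is a hypothetical function that queries the gateway. The `hits` and `misses` counters make it possible to track a hit rate like the one the article reports.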
Uber observed a statistically significant boost across all key metrics since it started to provide performance information to Freight drivers: -0.4% late cancellations, +0.6% on-time pickup, +1% on-time drop-off, and +1% auto-tracking performance. These performance improvements resulted in an estimated cost saving of $1.5 million in 2021.
About the Author
Claudio Masolo
Claudio is a software architect with a focus on big data and ML architecture. In his spare time he likes running, reading, and playing old video games.