The magic behind Uber’s data-driven success

Uber, the ride-hailing giant, is a household name worldwide. We all recognize it as the platform that connects riders with drivers for hassle-free transportation. But what most people don’t realize is that behind the scenes, Uber is not just a transportation service; it’s a data and analytics powerhouse. Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success.

Uber’s DNA as an analytics company

At its core, Uber’s business model is deceptively simple: connect a customer at point A to their destination at point B. With a few taps on a mobile device, riders request a ride; then, Uber’s algorithms work to match them with the nearest available driver and calculate the optimal price. But the simplicity ends there. Every transaction, every cent matters. A ten-cent difference in each transaction translates to a staggering $657 million annually. Uber’s prowess as a transportation, logistics and analytics company hinges on their ability to leverage data effectively.
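The arithmetic behind that figure checks out against the trip volume cited later in this article (18 million trips per day); a quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope check of the $657 million figure, assuming the
# 18 million trips/day cited elsewhere in this article.
trips_per_day = 18_000_000
cents_per_trip = 0.10          # a ten-cent difference per transaction
annual_impact = trips_per_day * cents_per_trip * 365
print(f"${annual_impact / 1e6:.0f} million per year")  # → $657 million per year
```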

The pursuit of hyperscale analytics

The scale of Uber’s analytical endeavor requires careful selection of data platforms capable of virtually limitless analytical processing. Consider the magnitude of Uber’s footprint.1 The company operates in more than 10,000 cities with more than 18 million trips per day. To maintain analytical superiority, Uber stores 256 petabytes of data and processes 35 petabytes every day. They support 12,000 monthly active analytics users running more than 500,000 queries every single day.

To power this mammoth analytical undertaking, Uber chose the open source Presto distributed query engine. Teams at Facebook developed Presto to handle high numbers of concurrent queries on petabytes of data and designed it to scale up to exabytes of data. Presto was able to achieve this level of scalability by completely separating analytical compute from data storage. This allowed them to focus on SQL-based query optimization to the nth degree.

What is Presto?

Presto is an open source distributed SQL query engine for data analytics and the data lakehouse, designed for running interactive analytic queries against datasets of all sizes, from gigabytes to petabytes. It excels in scalability and supports a wide range of analytical use cases. Presto’s cost-based query optimizer, dynamic filtering and extensibility through user-defined functions make it a versatile tool in Uber’s analytics arsenal. To achieve maximum scalability and support a broad range of analytical use cases, Presto separates analytical processing from data storage. When a query is constructed, it passes through a cost-based optimizer, then data is accessed through connectors, cached for performance and analyzed across a series of servers in a cluster. Because of its distributed nature, Presto scales for petabytes and exabytes of data.

The evolution of Presto at Uber

Beginning of a data analytics journey

Uber began their analytical journey with a traditional analytical database platform at the core of their analytics. However, as their business grew, so did the amount of data they needed to process and the number of insight-driven decisions they needed to make. The cost and constraints of traditional analytics soon reached their limit, forcing Uber to look elsewhere for a solution.

Uber understood that digital superiority required the capture of all their transactional data, not just a sampling. They stood up a file-based data lake alongside their analytical database. While this side-by-side strategy enabled data capture, they quickly discovered that the data lake worked well for long-running queries, but it was not fast enough to support the near-real time engagement necessary to maintain a competitive advantage.

To address their performance needs, Uber chose Presto because of its ability, as a distributed platform, to scale in linear fashion and because of its commitment to ANSI-SQL, the lingua franca of analytical processing. They set up a couple of clusters and began processing queries at a much faster speed than anything they had experienced with Apache Hive, a distributed data warehouse system, on their data lake.

Continued high growth

As the use of Presto continued to grow, Uber joined the Presto Foundation, the neutral governing body behind the Presto open source project, as a founding member alongside Facebook. Their initial contributions were based on their need for growth and scalability. Uber focused on contributing to several key areas within Presto:

Automation: To support growing usage, the Uber team went to work on automating cluster management to make it simple to keep clusters up and running. Automation enabled Uber to grow to their current state with more than 256 petabytes of data, 3,000 nodes and 12 clusters. They also put process automation in place to quickly set up and take down clusters.

Workload Management: Because different kinds of queries have different requirements, Uber made sure that traffic is well-isolated. This enables them to batch queries based on speed or accuracy. They have even created subcategories for a more granular approach to workload management.

Because much of the work done on their data lake is exploratory in nature, many users want to execute untested queries on petabytes of data. Large, untested workloads run the risk of hogging all the resources. In some cases, the queries run out of memory and do not complete.

To address this challenge, Uber created and maintains sample versions of datasets. If they know a certain user is doing exploratory work, they simply route them to the sampled datasets. This way, the queries run much faster. There may be inaccuracy because of sampling, but it allows users to discover new viewpoints within the data. If the exploratory work needs to move on to testing and production, they can plan appropriately.
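The routing idea described above can be sketched in a few lines. This is a hypothetical illustration (the function names, sampling fraction and in-memory lists are mine, not Uber's), showing only the shape of the approach: exploratory users get a fixed sample, everyone else gets the full dataset.

```python
import random

def make_sample(dataset, fraction=0.01, seed=42):
    """Create a fixed, repeatable sample of a dataset for exploratory queries."""
    rng = random.Random(seed)
    return [row for row in dataset if rng.random() < fraction]

def route_query(user_mode, full_dataset, sampled_dataset):
    """Route exploratory users to the sample; production users to the full data."""
    return sampled_dataset if user_mode == "exploratory" else full_dataset

full = list(range(1_000_000))
sample = make_sample(full, fraction=0.01)
target = route_query("exploratory", full, sample)
print(len(target) / len(full))   # roughly 0.01 (fast but approximate)
```

The trade-off in the text shows up directly: the sampled path scans about 1% of the rows, so queries finish far sooner, at the cost of sampling error.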

Security: Uber adapted Presto to take users’ credentials and pass them down to the storage layer, specifying the precise data to which each user has access permissions. As Uber has done with many of its additions to Presto, they contributed their security upgrades back to the open source Presto project.

The technical value of Presto at Uber

Analyzing complex data types with Presto

As a digital native company, Uber continues to expand its use cases for Presto. For traditional analytics, they are bringing data discipline to their use of Presto. They ingest data in snapshots from operational systems. It lands as raw data in HDFS. Next, they build model data sets out of the snapshots, cleanse and deduplicate the data, and prepare it for analysis as Parquet files.
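The cleanse-and-deduplicate step above can be sketched with toy data. This is an assumption-laden miniature (the record shape, `trip_id`/`snapshot_ts` fields and drop-incomplete rule are illustrative, not Uber's schema); at Uber's scale the output would be written as Parquet on HDFS rather than returned as a list.

```python
# Hypothetical sketch of building a model dataset from raw snapshots:
# drop incomplete records, keep only the newest snapshot of each trip.
raw_snapshots = [
    {"trip_id": 1, "fare": 12.5, "snapshot_ts": 100},
    {"trip_id": 1, "fare": 12.5, "snapshot_ts": 200},   # duplicate, newer
    {"trip_id": 2, "fare": None, "snapshot_ts": 100},   # missing fare: dropped
    {"trip_id": 3, "fare": 8.0,  "snapshot_ts": 150},
]

def build_model_dataset(rows):
    latest = {}
    for row in rows:
        if row["fare"] is None:          # cleanse: discard incomplete records
            continue
        key = row["trip_id"]             # deduplicate: keep the newest snapshot
        if key not in latest or row["snapshot_ts"] > latest[key]["snapshot_ts"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["trip_id"])

model = build_model_dataset(raw_snapshots)
print([r["trip_id"] for r in model])   # → [1, 3]
```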

For more complex data types, Uber uses Presto’s complex SQL features and functions, especially when dealing with nested or repeated data, time-series data or data types like maps, arrays, structs and JSON. Presto also applies dynamic filtering, which can significantly improve the performance of queries with selective joins by avoiding reading data that would be filtered out by join conditions. For example, a Parquet file can store data as BLOBs within a column. Uber users can run a Presto query that extracts a JSON document and filters out the data specified by the query. The caveat is that doing this defeats the purpose of Parquet’s columnar format. It is a quick way to do the analysis, but it does sacrifice some performance.
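The extract-then-filter pattern can be illustrated in plain Python (in Presto SQL this is what a JSON-extraction function in a WHERE clause does). The row data and field names here are invented for the example; note that every blob must be parsed per row, which is exactly the loss of columnar pruning the caveat above describes.

```python
import json

# Hypothetical rows where one column stores a JSON blob per record.
rows = [
    {"city": "SF",  "payload": '{"surge": 1.8, "product": "uberX"}'},
    {"city": "NYC", "payload": '{"surge": 1.0, "product": "uberXL"}'},
    {"city": "LA",  "payload": '{"surge": 2.3, "product": "uberX"}'},
]

def extract_and_filter(rows, min_surge):
    out = []
    for row in rows:
        doc = json.loads(row["payload"])   # parse the blob for every row
        if doc["surge"] >= min_surge:      # filter on the extracted field
            out.append((row["city"], doc["surge"]))
    return out

print(extract_and_filter(rows, 1.5))   # → [('SF', 1.8), ('LA', 2.3)]
```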

Extending the analytical capabilities and use cases of Presto

To extend the analytical capabilities of Presto, Uber uses many out-of-the-box functions provided with the open source software. Presto provides a long list of functions, operators, and expressions as part of its open source offering, including standard functions, maps, arrays, mathematical, and statistical functions. In addition, Presto also makes it easy for Uber to define their own functions. For example, tied closely to their digital business, Uber has created their own geospatial functions.
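As a flavor of what a geospatial user-defined function computes, here is the standard haversine great-circle distance in Python. This is a generic textbook formula, not Uber's actual UDF; in Presto it would be registered as a function callable from SQL.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))   # mean Earth radius ~6371 km

# San Francisco to Los Angeles: roughly 560 km great-circle
print(round(haversine_km(37.7749, -122.4194, 34.0522, -118.2437)))
```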

Uber chose Presto for the flexibility it provides with compute separated from data storage. As a result, they continue to expand their use cases to include ETL, data science, data exploration, online analytical processing (OLAP), data lake analytics and federated queries.

Pushing the real-time boundaries of Presto

Uber also upgraded Presto to support real-time queries and to run a single query across data in motion and data at rest. To support very low latency use cases, Uber runs Presto as a microservice on their infrastructure platform and moves transaction data from Kafka into Apache Pinot, a real-time distributed OLAP data store, used to deliver scalable, real-time analytics.

According to the Apache Pinot website, “Pinot is a distributed and scalable OLAP (Online Analytical Processing) datastore, which is designed to answer OLAP queries with low latency. It can ingest data from offline batch data sources (such as Hadoop and flat files) as well as online data sources (such as Kafka). Pinot is designed to scale horizontally, so that it can handle large amounts of data. It also provides features like indexing and caching.”

This combination supports a high volume of low-latency queries. For example, Uber has created a dashboard called Restaurant Manager in which restaurant owners can look at orders in real time as they are coming into their restaurants. Uber has made the Presto query engine connect to real-time databases.

To summarize, here are some of the key differentiators of Presto that have helped Uber:

Speed and Scalability: Presto’s ability to handle massive amounts of data and process queries at lightning speed has accelerated Uber’s analytics capabilities. This speed is essential in a fast-paced industry where real-time decision-making is paramount.

Self-Service Analytics: Presto has democratized data access at Uber, allowing data scientists, analysts and business users to run their queries without relying heavily on engineering teams. This self-service analytics approach has improved agility and decision-making across the organization.

Data Exploration and Innovation: The flexibility of Presto has encouraged data exploration and experimentation at Uber. Data professionals can easily test hypotheses and gain insights from large and diverse datasets, leading to continuous innovation and service improvement.

Operational Efficiency: Presto has played a crucial role in optimizing Uber’s operations. From route optimization to driver allocation, the ability to analyze data quickly and accurately has led to cost savings and improved user experiences.

Federated Data Access: Presto’s support for federated queries has simplified data access across Uber’s various data sources, making it easier to harness insights from multiple data stores, whether on-premises or in the cloud.

Real-Time Analytics: Uber’s integration of Presto with real-time data stores like Apache Pinot has enabled the company to provide real-time analytics to users, enhancing their ability to monitor and respond to changing conditions rapidly.

Community Contribution: Uber’s active participation in the Presto open source community has not only benefited their own use cases but has also contributed to the broader development of Presto as a powerful analytical tool for organizations worldwide.

The power of Presto in Uber’s data-driven journey

Today, Uber relies on Presto to power some impressive metrics, as shared in their latest Presto presentation in August 2023.

Uber’s success as a data-driven company is no accident. It’s the result of a deliberate strategy to leverage cutting-edge technologies like Presto to unlock the insights hidden in vast volumes of data. Presto has become an integral part of Uber’s data ecosystem, enabling the company to process petabytes of data, support diverse analytical use cases, and make informed decisions at an unprecedented scale.

Getting started with Presto

If you’re new to Presto and want to check it out, we recommend this Getting Started page where you can try it out.

Alternatively, if you’re ready to get started with Presto in production you can check out IBM watsonx.data, a Presto-based open data lakehouse. Watsonx.data is a fit-for-purpose data store, built on an open lakehouse architecture, supported by querying, governance and open data formats to access and share data.

1 Uber. EMA Technical Case Study, sponsored by Ahana. Enterprise Management Associates (EMA). 2023.



Digital Innovation and Transformation

MBA Student Perspectives


Uber knows you: how data optimizes our rides


While Uber transports people and meals around the world without owning a car, they still rely on fuel: Data, data and more data – the magic word for Uber.

Everyone knows Uber. But dude, they know you at least equally well!

While Uber transports people around the world without owning a car, there is only one fuel that powers Uber: Data. This is the secret key driving growth of the Silicon Valley start-up revolutionizing the taxi industry. What makes Uber unique is that the data-driven insights don’t just stay within its internal dashboards but are implemented in real time into its services to generate an unprecedented user experience for both customers and drivers. 1

Wait, what’s the use of knowing my way to work?

Come on, you can do better! Uber uses data in many different ways with two applications standing out.

Matching Algorithms


From the moment you open the app until you reach your destination, Uber’s routing engine and matching algorithms are working hard. Given the planned route and time of day, prediction models forecast the driving time and allocate the optimal driver through a process called batch matching.

Through machine learning, the models become more accurate in their predictive power with each ride completed. This matching algorithm allows Uber to minimize the number of variables a customer has to enter. In addition, it offers lower wait times and a more reliable experience for riders. Drivers, in turn, get more time to earn. 1
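The difference between batch matching and naive one-at-a-time assignment can be shown with a toy sketch. Everything here is illustrative (rider/driver names, the ETA table, and the brute-force search, which only works for tiny batches); real systems use learned ETA models and scalable solvers.

```python
from itertools import permutations

def batch_match(riders, drivers, eta):
    """Try every assignment of drivers to riders; keep the lowest total ETA."""
    best, best_cost = None, float("inf")
    for perm in permutations(drivers, len(riders)):
        cost = sum(eta[r][d] for r, d in zip(riders, perm))
        if cost < best_cost:
            best, best_cost = dict(zip(riders, perm)), cost
    return best, best_cost

eta = {  # predicted pickup minutes for each rider/driver pair (made up)
    "rider_a": {"d1": 2, "d2": 3},
    "rider_b": {"d1": 3, "d2": 9},
}
assignment, total = batch_match(["rider_a", "rider_b"], ["d1", "d2"], eta)
# Greedy nearest-first would give rider_a → d1 (2 min) and force
# rider_b → d2 (9 min), total 11; the batch solution totals only 6.
print(assignment, total)   # → {'rider_a': 'd2', 'rider_b': 'd1'} 6
```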

Surge Pricing

The instant implementation of live data allows Uber to effectively operate a dynamic pricing model. Using geo-location coordinates from drivers, street traffic and ride demand data, the so-called Geosurge algorithm compares theoretical ideals with what is actually implemented in the real world and makes alterations based on the time of the journey. Using this process, fares are updated in real time based on demand. In addition, this allows prices to be adjusted specifically to different areas in cities, so that some neighborhoods may have surge pricing while others do not. 2
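A minimal sketch of demand-based surging, assuming a toy supply/demand ratio; the real Geosurge model draws on live geo-located demand, traffic and driver supply, so the function, cap and numbers here are purely illustrative.

```python
def surge_multiplier(open_requests, available_drivers, cap=4.0):
    """Toy surge rule: no surge while supply covers demand, scale up after."""
    if available_drivers == 0:
        return cap
    ratio = open_requests / available_drivers
    return round(min(cap, max(1.0, ratio)), 2)

print(surge_multiplier(30, 20))   # demand is 1.5x supply → 1.5
print(surge_multiplier(10, 20))   # plenty of drivers → 1.0
```

Because the inputs are per-area counts, running this over each neighborhood independently reproduces the behavior described above: some neighborhoods surge while others do not.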


Furthermore, smart machine learning algorithms take multiple data inputs and predict where the highest demand is going to be. During peak times, drivers receive live data in the form of heat maps to compare the demand in different areas. 3


This system allows Uber to position drivers optimally, ensuring that there is no supply and demand shortage. In doing so, Uber creates the most efficient market and maximizes the number of rides it can provide, which in turn benefits all parties. 1

But that’s billions of data – how do they manage?

That’s right, Uber gives about 15 million rides per day. 4 To manage this data flood, Uber introduced its own machine learning platform called Michelangelo, which is used to create different models for Uber’s various services.

Michelangelo is an internal ML-as-a-service platform that democratizes and optimizes the scaling of AI, ML and Deep Learning. It enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. For the Geeks, visit this page where Michelangelo is presented in detail. 5
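The end-to-end workflow that Michelangelo covers can be sketched as a toy pipeline. The stages mirror the list above (manage data, train, evaluate, deploy, predict); the "model" is a trivial mean predictor and every name here is invented, purely to make the stages concrete.

```python
def manage_data(raw):
    return [x for x in raw if x is not None]          # manage data: cleanse

def train(data):
    return sum(data) / len(data)                      # train: fit a mean model

def evaluate(model, holdout):
    """Mean absolute error of the model on held-out points."""
    return sum(abs(x - model) for x in holdout) / len(holdout)

deployed = {}
def deploy(name, model):
    deployed[name] = model                            # deploy: register the model

def predict(name):
    return deployed[name]                             # predict: serve it

data = manage_data([10, 12, None, 14])
model = train(data)                                   # mean of 10, 12, 14 = 12.0
deploy("eta_v1", model)
print(predict("eta_v1"), evaluate(model, [11, 13]))   # → 12.0 1.0
```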


Boy, this sounds expensive – was it really necessary?

Hell yes! Before Michelangelo was born, Uber’s ML operations faced big challenges such as bad data quality, high data latency, lack of efficiency and scalability, and poor reliability. With its business growing exponentially, the amount of incoming data increased every day.

“Being Uber means being efficient!” – Travis Kalanick, co-founder of Uber

To realize Michelangelo, new data scientists, analysts and engineers had to be hired, and computing power and internet bandwidth had to be heavily increased. 6,7 There are no exact spending figures available, but Uber’s financials show that R&D spending increased by over $150 million 8 in the year prior to implementation in 2017. Although the entire amount was certainly not invested in this project, we expect that quite some money was spent on Uber’s new best buddy.

So, all their problems are solved now?

You have no idea! Even though Uber has managed to successfully process and use vast amounts of data, they still face major challenges. The most important to mention here are the status of its drivers, tax issues, constitutional issues and of course the rising competition from companies such as Lyft, Didi or Grab. 9 In my view, however, Uber remains a highly competitive company with virtually no limits. Consider the diverse offerings such as package and food delivery, the upcoming driverless technologies and of course even air taxis, which is by the way my favorite idea!

But Jesus! Think about how much data you need to manage for that!

1 How Uber uses data science to reinvent transportation? (projectpro.io)

2 How Surge Pricing Works | Drive with Uber | Uber

3 When and where are the most riders? | Driving & Delivering – Uber Help

4 Scaling Machine Learning at Uber with Michelangelo | Uber Blog

5 Data Science at Uber. Uber is one of the most successful… | by Jagandeep Singh | Medium

6 Uber’s Big Data Platform: 100+ Petabytes with Minute Latency | Uber Blog

7 Evolving Michelangelo Model Representation for Flexibility at Scale | Uber Blog

8 Uber R&D spending worldwide 2018 | Statista

9 4 Challenges Uber Will Face in the Next Years (investopedia.com)

10 Uber-Surge-Pricing.jpg, Forbes blog image (via gettagged.us)

Student comments on Uber knows you: how data optimizes our rides

Yannik — thanks for the post, it was both hilariously written AND interesting. It was thought-provoking to read about how Uber is able to adjust its services in real-time, versus using big data as an input to make its product better in the long-term. Even though Uber and Lyft have achieved mass scale, I do wonder if they will continue to be competitive with rising prices and the increased ubiquity of big data as a business asset.

Great post!

Something I have always thought of is if and how algorithms can be trained to show empathy and act ethically. Your point about Uber being able to selectively surge charge brings back memories of Uber surcharging during mass shootings. I wonder if at some point algorithms will be able to cross reference what is going on in the public domain (news, online, etc) with location info and at some point make these ethical decisions without human intervention.

Great blog and an interesting read Yannik! Uber has definitely done a great job in eliminating the customer pain points around commuting by leveraging customer data. But I see their increasing challenges, especially in developing economies like India: frequent cancellations by drivers, drivers insisting on cash payment due to a lack of payment transparency for drivers (which was sorted two months back by Uber after being in India for almost 10 years), poor customer service, and now rising competition with electric-vehicle ride-hailing players. Uber has been able to do well in the US and some parts of the European market, but it has struggled from the beginning in developing markets due to stiff competition. I’m really curious to know what their next growth strategy will be, what their future holds, and how they are going to use the plethora of customer data to make their next bet.

Yannik, this was an awesome read! I used Uber/Lyft on a daily basis when I worked in consulting and am still a frequent user of it now so I love asking the drivers about how the app works for them. One of the fascinating things I heard was that if a top-rated driver is on their way to pick up a non-top-rated user and a top-rated user subsequently requests a ride, the app will cancel the original ride to the non-top-rated user and redirect the driver to the top-rated user instead. I understood this as the app ensuring that their top-rated users have the best service from their best drivers (not necessarily to incentivize users to be better riders, since most users are unaware of this mechanism) but reading from your post, it strikes me that it may also be a cost saving mechanism to link its best drivers and users to “minimize the number of variables” for both parties and curate highly efficient rides to increase capacity.


How Uber uses data science to reinvent transportation?

Understand how the ride sharing service Uber uses big data and data science to reinvent transportation and logistics globally.


With more than 8 million users, 1 billion Uber trips and 160,000+ people driving for Uber across 449 cities in 66 countries, Uber is the fastest-growing startup, standing at the top of its game. Tackling problems like poor transportation infrastructure in some cities, unsatisfactory customer experiences, late cars, poor fulfilment, drivers refusing to accept credit cards and more, Uber has “eaten the world” in less than 5 years and is a remarkable name to reckon with when it comes to solving transportation problems for people.


If you have ever booked an Uber, you might know how simple the process is: just press a button, set the pickup location, request a car, go for a ride and pay with a click of a button. The process is simple, but there is a lot going on behind the scenes. The secret key driving the growth of the $51 billion start-up is the big data it collects and leverages for insightful and intelligent decision making. While Uber moves people around the world without owning any cars, data moves Uber. With the ambition to build the most intelligent company on the planet by completely solving problems for riders, big data and data science are at the heart of everything Uber does: surge pricing, better cars, detecting fake rides, fake cards and fake ratings, estimating fares and driver ratings. Read on to understand how Uber makes clever use of big data and data science to reinvent transportation and logistics globally.

Uber Big Data and Data Science

Table of Contents

  • Big Data at Uber
  • Data Products at Uber: Surge Pricing
  • Matching Algorithms at Uber
  • Fare Estimates
  • Uber Data Science Tools


“Uber lives or dies by data. Their overall mission and their sustainability is completely dependent on how good their data is. The more data they can collect, the more information they can derive from patterns and behaviours. Their ability to increase profits is all dependent on that.” - said Spencer, a former Uber driver.

There is no need to look for a local taxi or to tip a bellman for the ride, you are just a click away from a high quality customer experience with Uber’s revolutionizing data driven business model. Data is the biggest asset for Uber and its complete business model is based on the big data principle of crowdsourcing. Anybody with a car willing to help someone get to a desired location can offer help in getting them there.


It is tricky to get sufficient detail on Uber’s big data infrastructure, but we have some interesting information about it here. Uber’s data is collected in a Hadoop data lake, and it uses Spark and Hadoop to process the data. Uber’s data comes from a range of data types and databases, like SOA database tables, schemaless data stores and the event messaging system Apache Kafka.


Uber is greedy about what data it collects, and with relatively cheap storage options like Hadoop and Spark, it has data about every single GPS point for every trip taken on Uber. Uber stores historic information about its system and capabilities to make data science easier for its data scientists down the road. Keeping change logs and versioning database schemas helps data scientists answer every question on hand. With the data Uber has, data scientists can answer questions like what the Uber system looked like at a particular point in time from a customer perspective, a supply behaviour perspective, an inter-server communication perspective, or even the state of a database.

With a huge database of drivers, as soon as a user requests a car, the algorithms match the user with the most suitable, nearest driver within a 15-second window. Uber stores and analyses data on every single trip the users take, which is leveraged to predict the demand for cars, set the fares and allocate sufficient resources. The data science team at Uber also performs in-depth analysis of the public transport networks across different cities so that they can focus on cities that have poor transportation and make the best use of the data to enhance the customer service experience.


In fact, Uber drivers continue to generate data for Uber even when they are not carrying any passengers, because they transmit data back to the central platform at Uber, which is used to draw inferences on traffic patterns. The data is stored in the database for supply-and-demand algorithm analysis. Driver data is used for autonomous car research, surge pricing, tracking the location of drivers, monitoring drivers’ speed, motion and acceleration, and identifying whether a driver is working for a competing cab-sharing company.

Big data analysis spans diverse functions at Uber: machine learning, data science, marketing, fraud detection and more. Uber data consists of information about trips, billing, the health of the infrastructure and the other services behind its app. City operations teams use Uber big data to calculate driver incentive payments and predict many other real-time events. The complete process of data streaming is done through a Hadoop Hive-based analytics platform, which gives the right people and services the required data at the right time.


“Whether it’s calculating Uber’s surge pricing, helping drivers to avoid accidents, or finding the optimal positioning of cars to maximize profits, data is central to what Uber does. All these data problems…are really crystalized in this one math with people all over the world trying to get where they want to go. That’s made data extremely exciting here, it’s made engaging with Spark extremely exciting.” – Uber’s Head of Data, Aaron Schildkrout

Data Science at Uber

Data science is an integral part of Uber’s products and philosophy. Uber does an exceptional job of hiring data-oriented people throughout the company through its exclusive Uber Analytics test v3.1. Any individual applying for a job at Uber that requires analysing back-end extracts from the application has to take the Uber Analytics Test.


On the product front, Uber’s data team is behind all the predictive models powering the ride-sharing cab service, right from predicting that “Your driver will be here in 3 minutes” to estimating fares and showing surge prices and heat maps to drivers on where to position themselves within the city. The business success of Uber depends on its ability to create a positive user experience through statistical data analysis. What makes Uber unique is that the data-science-driven insights don’t just stay within dashboards or company reports, but are implemented in real time into its services to create a positive user experience for customers and drivers.


To create the most efficient market and maximize the number of rides it can provide, Uber uses surge pricing. Suppose you are running late and too stressed to take public transport; Uber could come to your rescue. However, you soon notice that they will charge you 1.5 times more than the usual rate.

Sometimes when you try to book an Uber, what you thought would be a $10 ride is going to cost 2 or 3 or even 4 times more. This is due to the surge pricing algorithms that Uber implements behind the scenes. Data science is at the heart of Uber’s surge pricing algorithm: given a certain demand, what is the right price for a car based on the economic conditions? The king of ride-sharing services maintains the surge pricing algorithm to ensure that their passengers always get a ride when they need one, even if it comes at the cost of an inflated price. Uber has even applied for a patent on big-data-informed pricing, i.e., surge pricing.

Most of the predictive models at Uber follow the business logic of how pricing decisions are made. For instance, Geosurge (Uber’s name for its surge, or dynamic, pricing model) looks at the available data and compares theoretical ideals with what actually happens in the real world. Uber’s surge pricing model is based on both geo-location and demand for rides, so that drivers can be positioned efficiently. Data science methodologies are used extensively to analyse the short-term effects of surge pricing on customer demand and its long-term effects on customer retention. Uber depends on regression analysis to find out which neighbourhoods will be busiest so it can activate surge pricing to get more drivers on the road.
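As a hedged illustration of the idea (this is not Uber’s actual Geosurge model), a surge multiplier can be sketched as a function of the demand/supply ratio in an area. The thresholds, the 0.5 scaling factor and the 4.0 cap below are invented for illustration:

```python
# Illustrative sketch of a demand-based surge multiplier.
# NOT Uber's Geosurge algorithm: the demand/supply-ratio heuristic
# and all constants are assumptions for illustration only.

def surge_multiplier(open_requests: int, available_drivers: int,
                     cap: float = 4.0) -> float:
    """Return a price multiplier >= 1.0 based on the demand/supply ratio."""
    if available_drivers <= 0:
        return cap
    ratio = open_requests / available_drivers
    if ratio <= 1.0:  # supply meets demand: no surge
        return 1.0
    # Scale linearly with excess demand, rounded to one decimal place,
    # and capped so prices never exceed `cap` times the base fare.
    return min(round(1.0 + 0.5 * (ratio - 1.0), 1), cap)

print(surge_multiplier(10, 10))   # balanced market -> 1.0
print(surge_multiplier(20, 10))   # twice the demand -> 1.5
print(surge_multiplier(100, 5))   # extreme shortage -> capped at 4.0
```

A real model would of course be estimated from historical demand curves rather than hand-tuned, but the shape (multiplier rising with excess demand, bounded above) matches how surge pricing is described here.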

Uber recently announced that it is going to limit the use of surge pricing through machine learning. The machine learning algorithms will take multiple data inputs and predict where demand will be highest so that Uber drivers can be redirected there. This keeps supply in line with demand, so surge pricing rarely has to be invoked at all. Uber has not yet confirmed when this new system would be rolled out.


Timing is everything at Uber. Given a pickup location, a drop-off location and the time of day, predictive models developed at Uber predict how long it will take a driver to cover the distance. Uber has sophisticated routing and matching algorithms that direct cars to people and people to places. From the moment you open the Uber app until you reach your destination, Uber’s routing engine and matching algorithms are hard at work.
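As a deliberately naive sketch of the ETA problem: Uber’s real models use road networks and historical traffic data, so everything below, including the average-speed and detour-factor assumptions, is illustrative only:

```python
import math

# Naive baseline ETA: straight-line (haversine) distance divided by an
# assumed average city speed, with a detour factor for street routing.
# All constants here are assumptions, not Uber's.

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def eta_minutes(pickup, dropoff, avg_speed_kmh=25.0):
    """Naive ETA: detour-adjusted distance over an assumed average speed."""
    dist = haversine_km(*pickup, *dropoff) * 1.2  # 20% street detour factor
    return dist / avg_speed_kmh * 60

sf_downtown = (37.7749, -122.4194)
sf_airport = (37.6213, -122.3790)
print(f"{eta_minutes(sf_downtown, sf_airport):.0f} min")
```

A production model would replace the straight-line heuristic with shortest-path routing over a road graph plus learned traffic speeds per segment and time of day.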

Uber follows a supplier-pick matching model, in which the customer specifies the variables associated with the service (in this case, a ride request in the Uber app) and the platform makes a match by sending the request to the most suitable service providers. Any ride request is first sent to the nearest available Uber driver, where “nearest” is determined by comparing the customer’s location with each driver’s expected time of arrival. The driver then accepts or rejects the request. This matching algorithm works well for Uber because the transaction is highly commoditized: the customer has to decide on very few variables before a match is made.
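The dispatch loop described above can be sketched roughly as follows. The data structures and the accept/reject mapping are hypothetical stand-ins, not Uber’s API:

```python
# Sketch of first-dispatch matching: offer the request to the driver with
# the lowest estimated time of arrival; if they decline, fall through to
# the next-nearest driver. Inputs are invented stand-ins for illustration.

def match_request(driver_etas, accepts):
    """driver_etas: {driver_id: eta_minutes}; accepts: {driver_id: bool}.
    Returns the first accepting driver in ascending-ETA order, or None."""
    for driver in sorted(driver_etas, key=driver_etas.get):
        if accepts.get(driver, False):
            return driver
    return None  # no driver accepted; a real system would widen the search

etas = {"d1": 7.0, "d2": 3.0, "d3": 5.0}
print(match_request(etas, {"d2": True}))               # nearest driver accepts
print(match_request(etas, {"d2": False, "d3": True}))  # d2 declines, d3 matched
```

Note how commoditization simplifies the problem: since the rider specifies almost nothing beyond pickup and destination, ETA alone can drive the ranking.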


Uber uses a mixture of internal and external data to estimate fares. Uber calculates fares automatically using street traffic data, GPS data and its own algorithms that make alterations based on the time of the journey. It also analyses external data like public transport routes to plan various services.
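A time-and-distance fare formula of the kind described might look like the sketch below. The rate constants are invented, since actual rates vary by city, product and demand:

```python
# Hedged sketch of a time-and-distance fare formula. The base fare,
# per-km, per-minute and minimum-fare constants are invented; real
# Uber fares vary by city, product, traffic and demand.

def estimate_fare(distance_km, duration_min, surge=1.0,
                  base=2.00, per_km=1.10, per_min=0.30, minimum=7.00):
    """Fare = (base + distance + time components) * surge, with a floor."""
    fare = (base + per_km * distance_km + per_min * duration_min) * surge
    return round(max(fare, minimum), 2)

print(estimate_fare(8.0, 20))             # off-peak trip
print(estimate_fare(8.0, 20, surge=1.5))  # same trip during surge
print(estimate_fare(1.0, 3))              # short hop hits the minimum fare
```

The street-traffic and GPS data the paragraph mentions would feed the `distance_km` and `duration_min` inputs, while the surge multiplier would come from a demand model.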


Python is the go-to data science programming language at Uber and is used extensively by the Uber data team. Commonly used third-party modules for data science at Uber include NumPy, SciPy, Matplotlib and Pandas. The team occasionally uses R, Octave or Matlab for prototypes or one-off data science projects, but not in the production stack. D3 is the preferred data visualization tool at Uber, and Postgres the preferred SQL database.
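For flavour, here is a toy example of the kind of exploratory trip analysis such a Pandas-based stack enables; the trip records are fabricated sample data:

```python
import pandas as pd

# Toy exploratory analysis with Pandas (one of the libraries named above).
# The trip records are fabricated sample data, not real Uber data.

trips = pd.DataFrame({
    "hour": [8, 8, 9, 18, 18, 18, 23],
    "fare": [12.5, 9.0, 11.0, 15.0, 14.5, 16.0, 22.0],
})

# Average fare and trip count per hour of day: a first look at peak demand.
by_hour = trips.groupby("hour")["fare"].agg(["mean", "count"])
print(by_hour)

busiest = by_hour["count"].idxmax()
print("Busiest hour:", busiest)
```

The same aggregation at Uber’s scale would run over distributed storage, but the analytical idiom (group, aggregate, rank) is identical.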

What can you expect in the future from Uber’s data-driven methodologies?

With initiatives like UberFresh for grocery deliveries, UberRush for package courier services and UberChopper offering helicopter rides to the wealthy, Uber is set to revolutionize private transportation globally. Uber knows the popular nightclubs in a city and the best-in-class restaurants, and it has data about traffic patterns across different regions. Uber’s data may soon be combined with customer-specific personal data in exchange for benefits, making Uber a true big data company. Soon, citizens might not even mind sharing their SSN with Uber if it uses their data to book a restaurant with good live music for a romantic Valentine’s Day dinner and arranges a luxury-car pickup for them and their partner.


So the next time you take an Uber ride, think of the data science going on behind the scenes. The quality of service you enjoy is due to big data being analysed and data science being applied to create a better riding experience for you.


How Uber Uses Data and Analytics (Case Study)

Everyone knows Uber as a shared service for point-to-point transportation, but not everyone knows Uber as a data and analytics company.

In this EMA technical case study, sponsored by Ahana, you’ll learn about:

  • What is Presto?
  • The evolution of its use at Uber
  • The analytical use cases of Presto, and more. 


Big Data in Practice by Bernard Marr


42 UBER: How Big Data Is At The Centre Of Uber’s Transportation Business

Uber is a smartphone app-based taxi booking service which connects users who need to get somewhere with drivers willing to give them a ride. The service has been hugely popular. Since being launched to serve San Francisco in 2009, the service has expanded to many major cities on every continent except Antarctica, and the company is now valued at $41 billion. The business is rooted firmly in Big Data, and leveraging this data more effectively than traditional taxi firms has played a huge part in its success.

What Problem Is Big Data Helping To Solve?

Uber’s entire business model is based on the very Big Data principle of crowdsourcing: anyone with a car who is willing to help someone get to where they want to go can offer to help get them there. This gives greater choice for those who live in areas where there is little public transport, and helps to cut the number of cars on our busy streets by pooling journeys.

How Is Big Data Used In Practice?

Uber store and monitor data on every journey their users take, and use it to determine demand, allocate resources and set fares. The company also carry out in-depth analysis of public transport networks in the cities they serve, so they can focus coverage in poorly served areas and provide links to buses and trains.

Uber hold a vast database of drivers in all of the cities they cover, so when a passenger asks for a ride, they can ...



The Big Problem with Uber’s Big Data: Ethics and Regulation of Data Ownership


“Technology is neither good nor bad; nor is it neutral” (Kranzberg 1986, p. 545)

That is why it is key to understand how we, as users and moderators, give additional meaning to technological features and participate in the complex chain of interactions they bring across society. Living in the mere beginnings of the era of “Big Data”, it is pressing to address the cultural and ethical implications of a phenomenon often idolised and seen as a universal answer by many in the business and scientific spheres (Boyd & Crawford 2012). Who should have access to big datasets? Should the use of Big Data be regulated, and how should we address the privacy concerns that come with mass information collection? Using Danah Boyd and Kate Crawford’s “Critical Questions for Big Data”, we will analyse how gig economy companies, taking Uber as an example, are handling Big Data and why it has caused a series of ethical controversies and recent legislative action.

Questioning Big Data

Boyd and Crawford define Big Data as a “cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology” (Boyd & Crawford 2012, p.663). This definition breaks away from the understanding that Big Data is just a dataset too large for human comprehension and transforms the term into a more complex phenomenon of social, not scientific, origins. Consequently, this leaves room for theorising and critiquing the role of Big Data in many social shifts. Boyd and Crawford develop six provocative claims about the influence of this phenomenon, three of which have a specific relevance to the case of Uber and will serve as the theoretical backbone of analysis – evaluating how Big Data changed the definition of work but created new ethical issues and failed to deliver on its promise of objectivity. 

Big Data Changes Definitions

Big Data is at the core of Uber’s business model. It collects, analyses, and stores huge amounts of information that is later used to fuel the algorithms of the platform and produce an “optimised” (according to pre-determined criteria in the AI) personalised service. With these abilities, Uber gained an extraordinary market-breaking advantage in the ride hailing industry (Rogers 2015). Most importantly, it redefined how “work” is perceived by introducing the “on-demand digital independent contractor” (Malin & Chandler 2017) model. 

Big Data gave Uber enough power and agency to be able to attract workers with its ease of use and escape the classic employee-employer relationship, defining itself as a data-powered platform that serves as a mediator between drivers and consumers (Wilhelm 2018). With this position, Uber relies solely on Big Data and the algorithms that collect and use it to balance the complex relationship between service providers and customers, an approach that seems heavily technologically deterministic. Nevertheless, for good or bad, Uber and the data-powered gig economy have irreversibly changed the way people define work in the service industry – to the point that “app workers” account for the majority of the ride-hailing and delivery labour force (Malin & Chandler, 2017).

Just Because It Is Accessible Does Not Make It Ethical

Boyd and Crawford make the important point that Big Data can produce “destabilising amounts of knowledge and information that lack the regulating force” (Boyd & Crawford 2012, p.666). Uber is experiencing this effect more and more recently with a growing amount of legislative action taken against the company’s data collection policies and lack of algorithmic transparency. The ethics of data ownership and availability have become the “next frontier in the fight for gig workers’ rights” (Clarke 2021). 

As Uber drivers are considered independent contractors and not employees, the company has not deemed it necessary to share with its workers the data it collects about their work or how that data influences the algorithm’s assessment of individual workers. Drivers also have no way to retrieve their personal data, erase it, or migrate it if they decide to start working on a competing platform (although the GigCV initiative is currently trying to make the latter possible).

The ethics behind data ownership in the gig economy is a heavily disputed topic, but recent court decisions are turning the debate in favour of workers (Reshaping Work, 2021). In a landmark case of March 2021, Amsterdam’s District Court ruled that Uber must disclose “data used to deduct earnings, assign work, and suspend drivers” and also shed light on how driver surveillance systems are used in the Netherlands (Ongweso Jr, 2021). Similar rulings across Europe suggest that the debate around regulating Big Data is more a “when” and “how” than an “if” question at that point. 

Claims to Objectivity and Accuracy Are Misleading 

The Uber algorithm takes into account many aspects when allocating work to its drivers: work performance, previous interactions with customer service, customer ratings, cancellation rate, completion rate, earnings profile and fraud probability score, among others (Clarke 2021). However, nobody truly knows the exact extent of data collection or the way the algorithms utilise this information. Uber is notoriously reluctant to share such data with researchers, policymakers, or the public. Nevertheless, there are jurisdictions where Uber has been legally forced to provide certain datasets to data scientists, most notably in Chicago. This led to the discovery of bias and racial discrimination in the company’s dynamic pricing algorithms in a study of over 68 million Uber rides in Chicago (Wiggers 2020). Critiquing Big Data with a study based on big datasets is exactly the kind of self-reflexivity that is often lacking in the scientific community (Boyd and Crawford 2012), but this trend can also be explained by the lack of openly accessible datasets, which makes a larger territorial study on the subject impossible.

We Are Our Tools

There is a “deep industrial drive toward gathering and extracting maximal value from data” (Boyd & Crawford 2012) and that is not inherently negative. However, we should remain mindful and question the ethical implications of this new data-driven society. As the example of Uber showcases, Big Data is not a magical universal solution, and its flawed collection and interpretation can cause serious social divides and issues. “We are our tools” (Boyd and Crawford 2012, p.675) and we should be aware of, and responsible for, the consequences they cause.


Using Big Data to Estimate Consumer Surplus: The Case of Uber

Estimating consumer surplus is challenging because it requires identification of the entire demand curve. We rely on Uber’s “surge” pricing algorithm and the richness of its individual level data to first estimate demand elasticities at several points along the demand curve. We then use these elasticity estimates to estimate consumer surplus. Using almost 50 million individual-level observations and a regression discontinuity design, we estimate that in 2015 the UberX service generated about $2.9 billion in consumer surplus in the four U.S. cities included in our analysis. For each dollar spent by consumers, about $1.60 of consumer surplus is generated. Back-of-the-envelope calculations suggest that the overall consumer surplus generated by the UberX service in the United States in 2015 was $6.8 billion.
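For intuition, the $1.60-per-dollar figure can be sanity-checked under a constant-elasticity demand assumption. This is a simplification for illustration, not the paper’s actual regression discontinuity specification, and the elasticity value below is implied by the ratio rather than estimated:

```python
# Back-of-the-envelope check of surplus-per-dollar under a
# constant-elasticity demand curve q(p) = q0 * (p/p0)**eps with eps < -1.
# Integrating q(p) from p0 to infinity gives consumer surplus
# CS = q0*p0 / (|eps| - 1), while expenditure is q0*p0, so
# surplus per dollar spent collapses to 1 / (|eps| - 1).

def surplus_per_dollar(eps: float) -> float:
    """Consumer surplus per dollar of spending for constant elasticity eps."""
    assert eps < -1, "surplus is unbounded for inelastic constant-elasticity demand"
    return 1.0 / (-eps - 1.0)

# An elasticity of -1.625 would reproduce the paper's $1.60 of surplus
# per dollar spent (an implied value under this toy model, not the
# paper's estimate).
print(round(surplus_per_dollar(-1.625), 2))
```

The point of the exercise is only that the headline ratio pins down one parameter of a simple demand curve; the paper itself estimates elasticities at several points along the curve and integrates under it.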

We are grateful to Josh Angrist, Keith Chen, Joseph Doyle, Hank Farber, Alan Krueger, Greg Lewis, Jonathan Meer, and Glen Weyl for helpful comments and discussions. We are also grateful to Mattie Toma for excellent research assistance. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

Peter Cohen transitioned from paid independent contractor to full-time employee of Uber during the writing of the paper. As a current employee, he has an equity stake in the company.

Jonathan Hall was an employee and shareholder of Uber Technologies before, during, and after the writing of this paper.



UPDATED 14:27 EDT / JUNE 17 2023


Uber’s real-time architecture represents the future of data apps: Meet the architects who built it


BREAKING ANALYSIS by Dave Vellante with George Gilbert

Uber Technologies Inc. has one of the most amazing business models ever created. The company’s mission is underpinned by technology that helps people go anywhere and get anything — and the results have been stunning.

In just over a decade, Uber has become a firm with more than $30 billion in annual revenue, an annual bookings run rate of $126B and a market capitalization near $90 billion today. Moreover, the company’s productivity metrics are three to five times greater than what you’d find at a typical technology company when, for example, measured by revenue per employee. In our view, Uber’s technology stack represents the future of enterprise data apps where organizations will essentially create real-time digital twins of their businesses and in doing so, deliver enormous customer value.

In this Breaking Analysis, we introduce you to one of the architects behind Uber’s groundbreaking fulfillment platform. We’ll explore their objectives, the challenges they had to overcome, how Uber has done it and why we believe its platform is a harbinger for the future.

The technical team behind Uber’s fulfillment platform

Shown here are some of the technical team members behind Uber’s fulfillment platform. These individuals went on a two-year journey to create one of the most loved applications in the world today.


It was our pleasure to welcome Uday Kiran Medisetty, distinguished engineer at Uber. He has led, bootstrapped and scaled up major real-time platform initiatives in his time at Uber and agreed to share how the team actually accomplished this impressive feat of software and networking engineering.

Watch this clip of Uday describing his background and role at Uber.

Uber as the future of enterprise data apps

Back in March, we introduced this idea of Uber as an instructive example of the future of enterprise data apps. We put forward the premise that the future of digital business applications will manifest itself as digital twins that represent the people, places and things of a business operation. We said that, increasingly, business logic will be embedded into data, on which applications will be built from a set of coherent data elements.


The evolution of enterprise applications

When we follow the progression of enterprise applications throughout history, it’s useful to share some examples of inflection points on the journey and where we are today.

The graphic below describes the history in simple terms, starting with enterprise 1.0 which focused on departmental or back office automation. The orange represents the enterprise resource planning movement, where a company like Ford integrated all its financials, supply chain and other internal resources into a coherent set of data and activities that drove productivity. Then came Web 2.0 for the enterprise.

And here we’re talking about using data and machine intelligence in a custom platform to manage an internal value chain using modern techniques. We use Amazon.com Inc.’s retail operation (not Amazon Web Services Inc.) as the example.


Uber represents a major milestone in application experiences

Highlighted by the red dotted line, we show “Enterprise Ecosystem” apps, which is where we place Uber. Uber is one of the first, if not the first, to build a custom platform to manage an external ecosystem. To the right we show the Consumer Metaverse represented in the gaming industry.

Our fundamental premise is that what Uber has built represents what eventually mainstream companies are going to be able to buy. That is packaged apps that use artificial intelligence to orchestrate an Uber-like ecosystem experience with off-the-shelf software and cloud services. Because very few organizations possess a team like Uber’s, we believe an off-the-shelf capability that can be easily deployed by organizations will be in high demand as real-time data apps become mainstream platforms.

With this as background, we dove into a series of Q&A with Uday that we’ll summarize below with our questions and Uday’s response in the pull quotes.

Q1.   Uday, can you explain in layman’s terms how the architecture of an application orchestrating an entire ecosystem differs significantly from the traditional model of packaged apps managing repeatable processes? 

One of the fascinating things about building any platforms for Uber is how we need to interconnect what’s happening in the real world to build large scale, real-time applications that can orchestrate all of this at scale. There is a real person waiting in the real world to get a response from our application whether they can continue with the next step or not. If you think about our scale, with, for example, the last FIFA World Cup, we had 1.6 million concurrent consumers interacting with our platform at that point in time. This includes riders, eaters, merchants, drivers, couriers and all of these different entities. They are trying to do things in the real world and our applications have to be real-time, everything needs to be consistent, it needs to be performant, and on top of all of this, we need to be cost-effective at scale. Because if we are not performant, if we’re not leveraging the right set of resources, then we can explode our overall cost of managing the infrastructure. So these are all some unique challenges in building an Uber-like application. And we can go into more details on various aspects both at breadth and also in depth.

Our takeaway:

Uber’s application is a complex orchestration of real-time, large-scale activities, demanding immediate, reliable and efficient responses to various user interactions. Performance and cost-effectiveness are pivotal to handle its massive user base without inflating infrastructure costs. We believe this represents a radical shift and a monumental transformation from the application’s perspective, especially considering our common experiences within the world of enterprise tech.

Watch this short clip of Uday explaining how Uber’s app connects real-world entities and how it differs from conventional enterprise apps.

Q2. Uday, based on a series of  blog posts  that you and the  team authored , we know you ran into limits back in 2014 with the existing architecture. What limitations did the 2014 architecture pose to Uber’s mission at scale, prompting the need for an architectural rewrite as you mentioned in your blogs? Particularly, could you elaborate on the tradeoff you discussed — optimizing for availability over latency and consistency? Why was this a problem and how did Uber address this issue?

If you think back to 2014 and what were the most production-ready databases available at that point, we could not have used traditional SQL-like systems because of the scale that we had even at that point in time. The only option we had, which provided us some sort of scalable real-time databases, was NoSQL kinds of systems. So we were leveraging Cassandra, and the entire application that drives the state of the online orders, the state of the driver sessions, all of the jobs, all of the waypoints, all of that was stored in Cassandra. And over the last eight years we have seen [a big change in] the kind of fulfillment use cases that we need to build for. So whatever assumptions we made in our core data models and what kinds of entities we interact with has completely changed. So we had to, if not anything else, change our application just for that reason. Then second, because the entire application was designed with availability as the main requirement, and latency and consistency were more of a best-effort mechanism, whenever things went wrong, it made it really hard to debug. For example, we don’t want a scenario where, if you request a ride, two drivers show up at your pickup point, because the system could not reconcile whether this trip was already assigned to a particular driver or it wasn’t assigned to anyone. And those were real problems that would happen if we didn’t have a consistent system. So there were several problems at the infrastructure layer at that point. One is consistency, which I mentioned already, and because we didn’t have any atomicity, we had to make sure the system automatically reconciles and patches the data when things go out of sync based on what we expect the data to be. There were a lot of scalability issues. Because we were getting to a best-effort consistency, we were using at the application layer some sort of hashing.
And what we would do is [try to] get all of the updates for a given user routed to a same instance and have a queue in that instance so that even if a database is not providing consistency, we have a queue of updates. So we made sure there’s only one update at any point in time. That works when you have updates only in two entities, so then at least you can do application level orchestration to ensure they might eventually get in sync, but it doesn’t scale beyond that. And because you are using hashing, we could not scale our cluster beyond the vertical limit. And that also inhibited our scale challenges, especially for large cities that we want to handle, we couldn’t go beyond a certain scale. So these were the key infrastructure problems that we had to fundamentally fix so that we can set ourselves up for the next decade or two.

The architecture of Uber’s app in 2014, while suitable for its time, faced substantial challenges as the company grew and its use cases evolved. The architecture faced issues around consistency, lack of atomicity, hashing collisions that limited scalability and the like. These, coupled with changing business requirements, highlighted the necessity for an architectural rewrite to ensure the platform’s sustainability and success in the decades to come.
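The per-user routing workaround Uday describes, hashing each user to a single owning instance that serializes updates through a queue, might be sketched as below. The instance count and update format are invented for illustration:

```python
import hashlib
from collections import defaultdict, deque

# Sketch of the 2014-era workaround: with no transactional database,
# route every update for a given user to one owning instance via hashing
# and apply updates there one at a time. All details here are invented.

N_INSTANCES = 4

def owner_instance(user_id: str) -> int:
    """Deterministically map a user to an instance (simple mod-hashing).
    This illustrates the drawback Uday calls out: changing N_INSTANCES
    remaps users, so the cluster cannot grow past a vertical limit
    without reshuffling state."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % N_INSTANCES

# One FIFO queue per instance serializes updates, giving per-user ordering
# even though the underlying store is only eventually consistent.
queues = defaultdict(deque)

def enqueue_update(user_id: str, update: dict) -> int:
    inst = owner_instance(user_id)
    queues[inst].append((user_id, update))
    return inst

a = enqueue_update("rider-42", {"state": "requested"})
b = enqueue_update("rider-42", {"state": "matched"})
print(a == b, len(queues[a]))  # same instance owns both updates, in order
```

This gives application-level serialization across at most a couple of entities, which is exactly why, as Uday notes, it stopped scaling once transactions had to span many objects.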

Watch this three-minute clip of Uday explaining the challenges Uber was facing back in 2014 that necessitated an architectural change.

Q3. Uday, considering Uber’s expansion beyond just drivers and riders to support new verticals and businesses, could you discuss how improvements in database capabilities have affected your approach to consistency and latency? Furthermore, could you elaborate on how you’ve generalized the platform to accommodate these new business ventures?

So that’s a great question. You know, one of the things we had to make sure was as the kind of entities change within our system, as we have to build new fulfillment flows, we need to build a modular and leverageable system at the application level. At the end of the day, we want the engineers building core applications and core fulfillment flows abstracted away from all of the underlying complexities around infrastructure, scale, provisioning, latency, consistency. They should get all of this for free, and they don’t need to think about it. When they build something, they get the right experience out of the box. So what we had to do was, at our programming layer, we had a modular architecture where every entity, for example, let’s say there is an order… there is an order representation, there’s a merchant, there’s a user or an organization representation, and we can store these objects as individual tables and we can store the relationships between them as another table that stores the relationships between these objects. So whenever new objects get into the system and whenever we need to introduce new relationships, they are stored transactionally within our system. We use the core database, you can think of it as a transactional key value store. At the database layer, we still only store the key columns that we need and rest of the data is stored as a serialized blob so that we don’t have to continuously update the database. Anytime we add new attributes for a merchant or for a user, we want to [minimize] operational overhead. But at the high level, every object is a table and then every relationship is a row in another table, and then whenever new objects or relationships get introduced, they are transactionally committed.

Our takeaway: a semantic layer built in the database

What Uday describes above is an implementation of a semantic layer within the database. Uber has developed a flexible and modular application architecture that supports the introduction and management of new business verticals and entities, freeing engineers from infrastructure and data complexities. This architecture enables the smooth integration of new objects or relationships, maintained transactionally within their system, allowing for scalable and efficient system growth.
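A rough sketch of this objects-as-tables, relationships-as-rows model follows, using SQLite as a stand-in for the transactional key-value store (Uber uses Spanner; the table and column names here are invented):

```python
import json
import sqlite3

# Sketch of the data model Uday describes: each entity type is a table,
# relationships live in a separate table, and non-key attributes are kept
# in a serialized blob column to avoid schema churn. SQLite stands in for
# the real transactional store; names are illustrative, not Uber's.

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders    (id TEXT PRIMARY KEY, blob TEXT);
CREATE TABLE drivers   (id TEXT PRIMARY KEY, blob TEXT);
CREATE TABLE relations (from_id TEXT, to_id TEXT, kind TEXT);
""")

# New objects and the relationship between them commit in one transaction,
# so a trip can never be half-assigned (the "two drivers show up" bug).
with db:
    db.execute("INSERT INTO orders VALUES (?, ?)",
               ("order-1", json.dumps({"pickup": "A", "dropoff": "B"})))
    db.execute("INSERT INTO relations VALUES (?, ?, ?)",
               ("order-1", "driver-9", "assigned_to"))

row = db.execute(
    "SELECT to_id FROM relations WHERE from_id = 'order-1'").fetchone()
print(row[0])
```

The blob column mirrors the point about operational overhead: adding a new attribute for a merchant or user changes only the serialized payload, not the schema.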

Watch this two-minute clip where Uday explains how Uber efficiently deals with the added complexity of new elements and vertical use cases.

The critical aspects of Uber’s new architecture


Above we show a chart from Uber Engineering. One objective in our research is to understand how the new architecture differs from the previous system and how Uber manages the mapping between the physical world (people, places and things) and the logical user experience. The way we read this graphic: the green on the lefthand side is the application layer, intermixed with the data platforms. The righthand side represents the new architecture and shows that Uber has separated the application services (at the top, in green) from the data management below, where Google Cloud Spanner comes into play.

Q4. Uday, could you help us grasp the key aspects and principles of your new architecture, particularly in contrast with your previous one? Could you explain how this new architecture differs from the previous one and what these changes mean to your business?

We went through some of the details of the previous architecture earlier. Like the core data was stored in Cassandra and because we want to have low-latency reads, we had a Redis cache as a backup whenever Cassandra fails, or whenever we want some low-latency reads, and we went through  Ringpop , which is the application-layer shard management, so that the request gets routed to the instance we need. And there was one pattern I didn’t mention, which was the  Saga pattern , which was a paper from a few decades ago. Ultimately there was a point in time where the kinds of transactions that we had to build evolved beyond just two objects. Imagine, for example, a case where we want to have a batch offer, which means a single driver could accept multiple trips at the same time or not. Now, you don’t have a one-to-one association, you have a single driver, I have maybe two trips, four trips, five trips, and you have some other object that is establishing this association. Now if we need to create a transaction across all of these objects, we tried using Saga as a pattern, extending our application-layer transaction coordination. But again, it became even more complex because if things go wrong, we have to also write compensating actions. So that the system is always in a state where it can proceed. We don’t want users to get stuck and not get new trips. So in the new architecture, the key foundations we mentioned, one was around strong consistency and linear scalability. So the NewSQL kind of databases provide that. And we went through an exhaustive evaluation in 2018 across multiple choices. And at that point in time we picked [Google] Spanner as the option. We moved all of the transaction coordination and scalability concerns to the database layer, and at the application layer, we focus on building the right programming model for building new fulfillment flows. And the core transactional data is stored in Spanner.
We limit the number of RPCs [remote procedure calls] that go from our on-prem data centers to Google Cloud because it’s a latency-sensitive operation, right? And we don’t want to have a lot of chatter between these two worlds. And we have an on-prem cache which will still provide point-in-time snapshot reads across multiple entities so that they’re consistent with each other. So for most use cases, they can read from the cache, and Spanner is only used if I want strong reads for a particular object. And if I want cached reads across multiple objects, I go to my cache. If I want to search across multiple objects, then we have our own search system which is indexed on the specific properties that we need, so that if I want to get all of the nearby orders that are currently not assigned to anyone, we can do that low-latency search at scale. And obviously we also emit Kafka events within the Uber stack, so then we can build all sorts of near-real-time or OLAP [online analytical processing] applications, and these also land as raw tables from which you can build more derived tables using Spark jobs. But all of those things are happening within Uber’s infrastructure, and we use Spanner for strong reads and core transactions that we want to commit across all of the entities, establishing those relationships that I mentioned.

Watch this clip of Uday describing the salient elements of Uber’s new architecture and the role of Google Spanner.

Here’s our summary of Uber’s new architecture:

Uber’s new architecture addressed the complexity of its previous system, emphasizing three key changes:

  • Transition to a consistent and scalable database:  The old architecture relied heavily on Cassandra for storing core data and Redis cache as a backup for low-latency reads. The application layer was managed by Ringpop for request routing, and the Saga pattern was used for transaction coordination. The Saga pattern, however, added complexity, particularly when handling transactions across multiple objects, such as batch offers to drivers. The new architecture prioritizes strong consistency and linear scalability. After an evaluation in 2018, Uber chose Google Spanner, which moved transaction coordination and scalability concerns to the database layer.
  • Shift of focus at the application layer:  In the new architecture, the application layer focuses on developing the right programming model for building new fulfillment flows. Transactional data is stored in Spanner, reducing the number of RPCs from Uber’s on-premises data centers to Google Cloud to limit latency.
  • Integrating cache, search and data streaming systems:  For most use cases, Uber’s system can read from an on-premises cache. Spanner is used for strong reads of individual objects and core transactions. A separate search system has been developed for fast, scalable searches across multiple objects. This search system allows for operations like finding unassigned nearby orders. Kafka is used for emitting events in real time within the Uber stack, aiding in the development of real-time and batch applications.
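A rough sketch may help make the read path concrete: snapshot reads come from the cache, strong reads go to Spanner, and multi-object queries go to the search index. This is our own illustration under invented names, not Uber’s code.

```python
# Illustrative sketch (not Uber's code) of routing reads across an
# on-prem snapshot cache, the strongly consistent store of record,
# and a search index for multi-object queries.

class ReadRouter:
    def __init__(self, cache, spanner, search_index):
        self.cache = cache          # point-in-time snapshot cache (dict)
        self.spanner = spanner      # strongly consistent store (dict)
        self.search = search_index  # list of orders indexed for search

    def read(self, entity_id, strong=False):
        """Strong reads go to the store of record; other reads are
        served from the snapshot cache when the entry exists."""
        if not strong:
            hit = self.cache.get(entity_id)
            if hit is not None:
                return hit
        return self.spanner[entity_id]

    def find_unassigned_nearby(self, region):
        """Multi-object queries (e.g. nearby unassigned orders) hit
        the search system, never the transactional store."""
        return [o for o in self.search
                if o["region"] == region and o["driver"] is None]
```

The point of the split is that the expensive, strongly consistent store is touched only when strictly necessary; everything else is served by systems tuned for that access pattern.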

Uber has shifted its architecture significantly, focusing on consistency and scalability with the introduction of Google Spanner. The application layer has been streamlined, concentrating on building new fulfillment flows, while the management of transaction coordination and scalability concerns has been shifted to the database layer. An integrated cache, search and data streaming system further optimizes the operation, allowing for more efficient data retrieval and real-time event management.

Q5. Uday, would it be accurate to state that the successful alignment between the elements in the application and the database allowed Uber to leverage the transactional strengths of the database at both layers, simplifying coordination at the application level? Additionally, can you explain Uber’s deep hybrid architecture, with part of the application operating on-premises and part leveraging unique services from Google Cloud.

[Note: Uber also uses Oracle Cloud Infrastructure for infrastructure services, but that was out of scope for this discussion. Bob Evans covered this development in a  blog post  claiming Uber was shuttering its own data centers, a claim that is questionable given Uday’s commentary around latency.]

Absolutely. And I think one more interesting fact is that most engineers don’t even need to understand what’s behind the scenes. They don’t need to know it’s being powered by Spanner or any database. The guarantee that we provide to application developers who are building, for example, fulfillment flows is: they have a set of entities and they say, “Hey, for this user action, these are the entities that need to be transactionally consistent and these are the updates I want to make to them.” And then behind the scenes our application layer leverages Spanner’s transaction buffering, makes updates to each and every entity, and then once all the updates are made, we commit, so all the updates are reflected in storage and the next strong read will see the latest update.
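The contract Uday describes, where a developer names the entities that must be transactionally consistent and a coordinator buffers per-entity updates until a single commit, can be modeled in a few lines. This is a simplified sketch under our own naming, not Uber’s actual API.

```python
class BufferedTransaction:
    """Toy model of server-side transaction buffering: updates to each
    entity are staged independently and become visible only on commit."""

    def __init__(self, store):
        self.store = store  # committed state: entity id -> attributes
        self.buffer = {}    # staged updates, keyed by entity id

    def update(self, entity_id, **changes):
        # Each entity stages its own update into the buffer independently.
        self.buffer.setdefault(entity_id, {}).update(changes)

    def commit(self):
        # Once every entity has staged its part, apply everything at
        # once, so the next strong read sees all entities' latest state.
        for entity_id, changes in self.buffer.items():
            self.store.setdefault(entity_id, {}).update(changes)
        self.buffer.clear()
```

The buffering is what keeps the application modular: each entity’s code commits its piece without knowing which other entities share the transaction.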

Q6. The database decision obviously was very important. What was it about Spanner that led you to that choice? Spanner is a globally consistent database. What else about it made it easier for all the applications’ data elements to share their status? You said you did a detailed evaluation. How did you land on Spanner? 

There are a lot of dimensions that we evaluate, but one is we wanted to build using a NewSQL database, because we want the mix of, for example, the ACID [Atomicity, Consistency, Isolation and Durability] guarantees that SQL systems provide and the horizontal scalability that NoSQL kind of systems provide. Building large-scale applications using NewSQL databases, at least around the time when we started, we didn’t have that many examples to choose from. Even within Uber we were kind of the first application managing live orders using a NewSQL-based system. But we need external consistency, right? Spanner provides the strictest concurrency-control guarantee for transactions, so that when transactions are committed in a certain order, any specific read after that sees the latest data. That is very important because, imagine we assigned a particular job to a specific driver or courier, and the next moment if we see that, oh, this driver is not assigned to anyone, we might make a wrong business decision and assign them one more trip, and that will lead to wrong outcomes. Then horizontal scalability, because Spanner automatically shards and then it’ll rebalance the shards. So then we have this horizontal scalability; in fact, we have our own autoscaler that listens to our load and Spanner signals and constantly adds and removes nodes, because the traffic pattern Uber has changes based on the time of day and the day of the week. It’s very curvy. So then we can make sure we have the right number of nodes provisioned to handle the scale at that point in time. I mentioned the server-side transaction buffering; that was very important for us so that we can have a modular application, so that each entity that I’m representing can commit updates independently, and a layer above is coordinating across all of these entities.
And once all of these entities have updated their part, then we can commit the overall transaction. So the transaction buffering on the server side helped us make the application layer modular. Then all the things around stale reads, point-in-time reads, bounded staleness reads: these help us build the right caching layer, so that for most reads our cache hit rate is probably in the high 60s, 70 [percent]. So for most reads, we can go to our on-prem cache, and only on a cache miss or for strong reads do we go to the storage system. So these were the key things we wanted from NewSQL. Then Spanner was the one, because of the time to market, because it’s already productionized and we can leverage that solution. But all of these interactions are behind an  ORM [object relational mapping] layer  with the guarantees that we need. So this will help us, over time, figure out if we need to evaluate other options or not. But right now, most developers don’t need to understand what is powering things behind the scenes.

Watch this clip of Uday describing the decision process around Google Spanner and the impact on developer productivity.

Here’s our summary. There are five key aspects around the choice of Spanner and its impact on developers:

  • Simplification for developers:  Engineers at Uber aren’t burdened with understanding the intricacies of the underlying database system, such as Spanner. Instead, they can focus on developing features and workflows, secure in the knowledge that they can specify entities that must be transactionally consistent and that the system will handle this reliably.
  • Why Spanner:  Uber wanted to take advantage of the consistency guarantees of SQL systems and the horizontal scalability of NoSQL systems. At the time they were looking, there were not many NewSQL options that fulfilled these requirements. Spanner stood out because it offered external consistency, which ensures that when transactions are committed in a certain order, any subsequent read will see the latest data. This is vital to Uber’s operations as it prevents erroneous decisions based on stale data.
  • Scalability:  Spanner also provides horizontal scalability, automatically sharding and rebalancing shards, which is important given Uber’s variable traffic patterns.
  • Server-side transaction buffering:  This feature was essential for Uber, as it allows the system to commit updates to each entity independently. The application layer coordinates across all of these entities, and once each entity has updated its part, the overall transaction can be committed.
  • Caching capabilities:  Spanner’s features around stale reads, point-in-time reads and bounded staleness reads allowed Uber to build an effective caching layer. Most reads can be handled by their on-premises cache, reducing the need to access the storage system directly.

Uber’s system benefits significantly from the use of Google Spanner, which provides consistency, scalability and efficient transaction handling, among other features. The introduction of Spanner has streamlined operations at the application layer and facilitated the implementation of a reliable caching layer. Importantly, this setup shields most developers from having to understand the complexities of the underlying database technology, letting them focus on the application level concerns.
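To see how bounded staleness enables such a cache, here is a minimal sketch. The class, parameters and clock injection are our own assumptions; in Spanner, staleness bounds are expressed per query, but the caching idea is the same.

```python
import time

class BoundedStalenessCache:
    """Toy cache: serve a read from cache only while the entry is
    younger than the staleness bound; strong reads and expired entries
    fall through to the backing store."""

    def __init__(self, store, max_staleness_s=5.0, clock=time.monotonic):
        self.store = store
        self.max_staleness_s = max_staleness_s
        self.clock = clock
        self.entries = {}  # key -> (value, cached_at)

    def read(self, key, strong=False):
        now = self.clock()
        if not strong and key in self.entries:
            value, cached_at = self.entries[key]
            if now - cached_at <= self.max_staleness_s:
                return value              # hit within the staleness bound
        value = self.store[key]           # miss, expiry, or strong read
        self.entries[key] = (value, now)
        return value
```

A cache hit rate in the high 60s to 70 percent, as Uday cites, means only a minority of reads ever reach the store of record.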

Q7. Please explain how Uber managed to establish systemwide coherency across its data elements. In other words, how did Uber design and develop technology to ensure a unified understanding or agreement on the definitions of critical elements like drivers, riders and pricing? We’re specifically interested in the aspects of the system that enable this coherency.

There are many objects we need to think about, considering what a user sees in the app that need to be coherent and which ones can be kind of stale, but you don’t necessarily notice because not everything needs to have the same amount of guarantees, same amount of latency and so on, right? So if you think about some of the attributes that we manage, we talked about the concept of orders, if a consumer places any intent that is an order within our system, a single intent might require us to decompose that intent into multiple sub objects. For example, if you place an Uber Eats order, there is one job for the restaurant to prepare the food and there is one job object for the courier to pick up and then drop off. And for the courier job object, we have many waypoints, which is the pickup waypoint, dropoff waypoint, each waypoint can have its own set of tasks that you need to perform. For example, it could be taking a signature, taking a photo, paying at the store, all sorts of tasks, right? And all of these are composable and leverageable. So I can build new things using the same set of objects. And in any kind of marketplace we have supply and demand and we need to ensure there is a right kind of dispatching and matching paradigms. In some cases, we offer one job to one supply. In some cases it could be image to end, in some cases it is blast to many supplies. In some cases, they might see some other surface where these are all of the nearby jobs that you can potentially handle. So this is another set of objects which is super-real-time, because when a driver sees an offer card in the app, it goes away in 30 seconds and in 30, 40 seconds they need to make a decision, and based on that we have to figure out the next step, because within Uber’s application, we have changed the user’s expectation of how quickly we can perform things. If we are off by a few seconds, people will start canceling. 
Uber is hyper-local, so we have a lot of attributes around latitude, longitude, route line, the driver’s current location, our ETAs. These are probably some of the hardest to get right, because we constantly ingest the current driver location every four seconds; we have a lot of latitude/longitude data. The throughput of this system itself is like hundreds of thousands of updates per second. But not every update will require us to change the ETA, right? Your ETA is not changing every four seconds. Your routing is not changing every four seconds. So we do some magic behind the scenes to check: have you crossed city boundaries? Only then might we require you to update something. Have you crossed some product boundaries? Only then do we require you to do some things. So we do inferences to limit the number of updates that we are making to the core transactional system, and then we only store the data that we need. And then there’s a complete parallel system that manages the whole pipeline of, for example, how we receive the driver side of the equation and generate navigation and such for drivers, and then how we convert these updates and show them on the rider app. That stream is completely decoupled from the core orders and jobs. And if you think about the Uber system, it’s not just about building the business platform layer. We have a lot of our own sync infrastructure at the edge API layer, because we need to make sure all of the application data is kept in sync. They’re going through choppy network conditions, they might be unreliable, and we need to make sure that they get the updates as quickly as possible with low latency, irrespective of what kind of network condition they are in. So there are a lot of engineering challenges at that layer as well.
Ultimately, all of this is working together to provide you the visibility that I can exactly see what’s going on, because if you’re waiting for your driver, if they don’t move, you might cancel assuming that they might not show up. And we need to make sure that those updates flow through, not just our system, but also from our system back to the rider app as quickly as possible.

Watch Uday describe how Uber managed to establish system-wide coherency across its data elements.

Uber’s system is a complex web of various components working in tandem to ensure smooth operations and a positive user experience. Data is categorized based on its necessity for real-time updates, and Uber has built technology to intelligently determine when updates are needed. This helps to maintain a harmonious data environment that can handle numerous scenarios, ensuring coherency across all its data elements.
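One pattern Uday describes above, persisting a driver-location update only when it crosses a boundary that downstream logic cares about, can be sketched like this. It is a toy illustration; the class name and the region function are our assumptions, not Uber’s code.

```python
class LocationIngestor:
    """Toy filter: write a driver-location update to the core
    transactional system only when it crosses a boundary that
    downstream logic cares about (e.g. a city boundary)."""

    def __init__(self, region_of):
        self.region_of = region_of  # maps (lat, lon) -> region id
        self.last_region = {}       # driver id -> last persisted region
        self.persisted = []         # stand-in for core-system writes

    def ingest(self, driver_id, lat, lon):
        region = self.region_of(lat, lon)
        if self.last_region.get(driver_id) != region:
            self.last_region[driver_id] = region
            self.persisted.append((driver_id, region))
            return True   # boundary crossed: write through
        return False      # suppressed: no core-system update needed
```

At hundreds of thousands of location updates per second, filtering like this is what keeps the transactional system from being overwhelmed by data that changes nothing it stores.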

Our analysis suggests that Uber’s current system operates on two distinct layers: the application layer and the database layer. At the application layer, Uber manages various entities or “things,” and it effectively translates these entities down to the database layer. The transactional semantics of the database make it easier to manage and coordinate these entities.

However, this also highlights that Uber treats the “liveliness” of data as a separate attribute, which allows the company to manage data updates and communication in a way that isn’t strictly tied to its representation in the database. This strategy involves managing updates based on specific properties of each data element.

This approach is noteworthy, especially in light of previous  discussions with Walmart , which highlighted the importance of data prioritization and efficient communication from stores and other edge locations.

Our assertion is that Uber’s strategy could provide a model for other businesses that need to manage complex, real-time data across distributed systems.

How orchestrating real-world entities is different

[Chart: the data apps 3.0 stack]

The chart above attempts to describe these 3.0 apps: starting at the bottom you have the platform resources; then the data layer that provides the single version of truth; then the application services that govern and orchestrate the digital representations of real-world entities (drivers, riders, packages and the like); and all of that supports what the customers see: the Uber app.

A big difference from the cloud stack that we all know is you’re not directly aware of consuming compute and storage. Rather, Uber is offering up physical “things” – access to drivers, merchants, services and the like – and translating that into software.

Q8. Uday, could you explain how Uber decided between using commercial off-the-shelf software and developing its own intellectual property to meet its objectives? Could you outline the thought process behind the “build versus buy” decisions you made and the role of open source software? 

In general, we rely on a lot of open-source technologies, commercial off-the-shelf software and, in some cases, in-house-developed solutions. Ultimately it depends on the specific use case, time to market; maybe you want to optimize for cost, optimize for maintainability. All of these factors come into the picture. For the app, the core orders and the core fulfillment system, we talked about Spanner and how we leverage that with some specific guarantees. We use Spanner even for our identity use cases where we want to manage, for example, especially in large organizations, your business rules, your AD [Active Directory] groups, how we capture that for our consumers; that has to be in sync. But there are a lot of other services across microservices, across Uber, that leverage Cassandra if their use case is high write throughput. And we leverage Redis for all kinds of caching needs. We leverage ZooKeeper for low-level infrastructure platform storage needs. And we also have a system built on top of MySQL with a Raft-based consensus algorithm, called Docstore. For the majority of use cases, that is our go-to solution: it provides shard-local transactions and it’s a multi-model database. So it’s useful for most kinds of use cases and it’s optimized for cost, because we manage the stateful layer and we deploy it on our nodes. So for most applications, that gives us the balance of cost and efficiency, and for applications that need the strongest level of requirements, like fulfillment or identity, we use Spanner. For high write throughput, we use Cassandra. And beyond this, when we think about our metrics system, M3DB: it’s open-source software, open-sourced by Uber and contributed to the community a few years ago; it’s a time-series database. We ingest millions of metric data points per second, and we had to build something on our own.
And now it’s an active community, and there are a bunch of other companies leveraging M3DB for metric storage. So ultimately, in some cases we might have built something and open-sourced it, in some cases we leverage off-the-shelf software, and in some cases we use completely open-source software and contribute some new features. For example, for the data lake, Uber pioneered Apache Hudi back in 2016 and contributed it. So then we have one of the largest transactional data lakes, with maybe 200-plus petabytes of data that we manage.

Watch this clip where Uday describes Uber’s make vs. buy decision and the role of open source software.

Uber’s approach demonstrates a balance between using off-the-shelf and open-source solutions, contributing to open-source projects, and developing in-house solutions. This strategy is determined by factors such as specific use cases, time-to-market considerations, cost-efficiency and maintainability. It underscores the importance of a diversified strategy in software and database solutions to meet the unique needs of various operations.

The value of real-time data apps

This next snippet we’re sharing comes from an Enterprise Technology Research roundtable.

[Chart: ETR roundtable snippet]

Q9. Uday, how did Uber change the engine in midflight going from the previous architecture without disrupting its business? 

Designing a [pure] greenfield system is one thing, but moving from whatever you have to [a new] system is 10X harder. The hardest engineering challenge that we had to solve was how we go from A to B without impacting any user. We don’t have the luxury of downtime where, “Hey, you know, we’re going to shut off Uber for an hour and then do this migration behind the scenes.” We went through how the previous system was using Cassandra with some in-memory queue, and the new system is strongly consistent. The core database guarantees are different, the application APIs are different, so what we had to build was a proxy layer so that for any user request we have backward compatibility, and then we shadow what is going to the old system and the new system. But because the properties of which transaction gets committed in old and new are also different, it’s extremely hard to even shadow and get the right metrics to give us confidence. Ultimately, that is the shadowing part. And then what we did was we tagged a particular driver and a particular order that gets created, whether it’s created in the old system or the new system, and then we gradually migrated all of the drivers and orders from old to new. So at a point in time you might see that the marketplace is kind of split, where half of those orders and earners are in the old and half of them are in the new, and then once all of the orders are moved, we switch over the state of the remaining earners from old to new. So one, we had to solve a lot of unique challenges on shadowing, and two, we had to do a lot of unique tricks to make sure that we give the perception that there is no downtime, and then move that state without losing any context, without losing any jobs in flight and so on.
And then if there is a driver who’s currently completing a trip in the old stack, we let that complete and the moment they’re done with that trip, we switch them to the new stack so that their state is not transferred midway through a trip. And so once you create new trips and new earners through new and then switch them after they complete the trip, we have a safe point to migrate. This is similar to 10 years ago, I was at VMware and we used to work on how do you do vMotion — virtual machine migration, from one host to another host — so this was kind of like that challenge. What is the point at which you can split, you can move the state without having any application impact? So those are the tricks that we had to do.
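The safe-point idea, switching a driver to the new stack only when no trip is in flight, can be modeled roughly as follows. This is our illustration with invented names, not Uber’s migration code.

```python
class StackMigrator:
    """Toy model of the cutover: a driver on the old stack moves to
    the new one only at a safe point, i.e. with no trip in flight."""

    def __init__(self):
        self.stack = {}      # driver id -> "old" or "new"
        self.in_flight = {}  # driver id -> currently on a trip?

    def register(self, driver_id, on_trip=False):
        self.stack[driver_id] = "old"
        self.in_flight[driver_id] = on_trip

    def try_migrate(self, driver_id):
        if self.in_flight[driver_id]:
            return False               # wait for the trip to complete
        self.stack[driver_id] = "new"
        return True

    def complete_trip(self, driver_id):
        self.in_flight[driver_id] = False
        self.try_migrate(driver_id)    # safe point reached: switch over
```

The design choice mirrors the vMotion analogy Uday draws: find the moment when state can move without the application noticing, and only migrate then.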

Watch this clip of Uday explaining how Uber transitioned from its original system to the new architecture with zero user disruption.

Q10. How is Uber planning to manage a future where the ecosystem could be 10 or 100 times larger with more data than currently? And have you considered scenarios where the centralized database might not be the core of your data management strategy? In other words, when does this architecture run out of gas? 

That’s where the tradeoffs come in. We need to be really careful about not putting so much data in the core system that manages these entities and these relationships that we overwhelm it and end up hitting scale bottlenecks. For example, the fare item that you see both on the rider app and on the driver app is made up of hundreds of line items with different business rules, specific to different geos, different localities, different tax items. We don’t store all of that in the core object. But one attribute of a fare that we can leverage is that a fare only changes if the core properties of the rider’s requirements change. So every time you change your dropoff, we regenerate the fare. So I have one fare UID; every time we regenerate, we create a new version of that fare, and we store these two UIDs along with my core order object, so that I can store in a completely different system my fare UID, fare version, and all of the data with all of the line items, all of the context that we used to generate those line items. Because what we need to save transactionally is the fare version UID. When we save the order, we don’t need to save all of the fare attributes along with it. So these are some design choices that we make to ensure that we limit the amount of data that we store for these entities. In some cases we might store the data, in some cases we might version the data and then store it along with that. In some cases, if it is OK for that data to be stale and it doesn’t need to be coherent with the core orders and jobs, it can be saved in a completely different online storage. And then we have the presentation layer where we generate the UI screen. There, we can enrich this data and then generate the screen that we need. So all of this will make sure that we limit the scale of growth of the core transactional system, and then we leverage other systems that are more suited to the specific needs of those data attributes.
But still all of them tie into the order object, and there’s an association that we maintain. And the second question, on how we make sure we don’t run out of gas: we kind of went through that already. One, obviously we are doing our own scale testing, our own projected-growth testing, to make sure that we are constantly ahead of our growth and that the system can scale. And then we are also very diligent about looking at the properties of the data and choosing the right technology, so that we limit the amount of data that we store in that system and use specific kinds of systems that are catered to those use cases. For example, all of our matching systems, if they want to query all of the nearby jobs and nearby supplies, we don’t go to the transactional system for that. We have our own in-built search platform where we are doing real-time ingestion of all of this data using CDC [change data capture], so then we have all kinds of anchors so that we can do real-time, on-the-fly generation of all of the jobs, because the more context you have, the better marketplace optimizations we can make, and that can give you efficiency at scale, right? Otherwise, we’ll make imperfect decisions, which will hurt the overall marketplace efficiency.
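The fare-versioning design Uday describes, storing only a fare UID and version transactionally with the order while the full line items live in a separate store, can be sketched like this. Class and method names are our own assumptions.

```python
import uuid

class FareStore:
    """Toy split: an order row keeps only (fare UID, version); the
    full line items live in a separate store keyed by that pair."""

    def __init__(self):
        self.detail = {}    # (fare_uid, version) -> line items
        self.versions = {}  # fare_uid -> latest version number

    def new_fare(self, line_items):
        fare_uid = str(uuid.uuid4())
        self.versions[fare_uid] = 1
        self.detail[(fare_uid, 1)] = line_items
        return fare_uid, 1

    def regenerate(self, fare_uid, line_items):
        # e.g. the rider changed the dropoff: same UID, a new version
        version = self.versions[fare_uid] + 1
        self.versions[fare_uid] = version
        self.detail[(fare_uid, version)] = line_items
        return fare_uid, version

    def lookup(self, fare_uid, version):
        return self.detail[(fare_uid, version)]
```

The order object then needs to commit only the two small identifiers, while hundreds of line items grow in a system built for that volume.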

Watch this series of clips where Uday addresses the question of what happens when Uber’s volume increases by 10 times or 100 times and if or when its current architecture runs out of gas.

Q11. Uday, how do you envision the future of commercial tools in the next three to five years that could potentially simplify the process for mainstream companies, without the extensive technical resources like Uber’s, to build similar applications? Specifically, do you foresee a future where these companies can manage hundreds of thousands of digital twins using more off-the-shelf technology? Is this a plausible scenario in the midterm future?

I think the whole landscape around developer tools and applications is a rapidly evolving space. What’s possible now was not possible five years ago. And it’s constantly changing. But what we see is we need to provide value at the upper layers of the stack, right? And if there is some solution that can provide something off-the-shelf, we move to that so we can focus up the stack. It’s not just building on off-the-shelf or PaaS solutions. Just take the sheer complexity of representing configuration, representing the geodiversity around the world, and then building something that can work for any use case in any country adhering to those specific local rules; that is what I see as the core strength of Uber. We can manage any kind of payment disbursement or payment in the world. We have support for just about any payment method in the world for earners, disbursing billions of payouts to whatever bank account and whatever payment method they need money in. We have a risk system that can handle nuanced use cases around risk and fraud. Our system around fulfillment is managing this; our system around maps is managing all of the ground tolls, surcharges, navigation, all of that, so we have probably one of the largest global map stacks, where we manage our own navigation while leveraging some data from external providers. So this is the core IP and core business strength of Uber, and that is what is allowing us to do many verticals. But again, as for the systems that I can use to build this: over time, absolutely, I see it getting easier for many companies to leverage them. Maybe 15 years ago we didn’t have Spanner, so it was much harder to build this. Now with Spanner or with similar new off-the-shelf databases, it solves one part of the challenge, but then other challenges will emerge.

Watch this clip where Uday gives his thoughts on the future of data apps and the feasibility that more mainstream companies without Uber’s resources will be able to take advantage of off-the-shelf software to develop real-time versions of their businesses.

Final thoughts

Uber has had an immense impact, not just in software development, but also in transforming the way business is conducted, similar to how Amazon.com pioneered managing its internal processes. By orchestrating people, places and things on an internal platform, Uber has done something similar but on a more significant scale. It has done this for an external ecosystem, making it accessible to consumers in real time.

However, a major question remains: How will the industry develop technology that allows mainstream companies to start building their own platforms to manage their ecosystems, akin to Uber’s model? We believe Uber points the way to the future of real-time, data-driven applications, but the exact path the industry takes is an intriguing question for the future.

Keep in touch

Many thanks to Uday Kiran Medisetty for his collaboration on this research. Thanks to Alex Myerson and Ken Shifman on production, podcasts and media workflows for Breaking Analysis. Special thanks to Kristen Martin and Cheryl Knight, who help us keep our community informed and get the word out, and to Rob Hof, our editor in chief at SiliconANGLE.

Remember we publish each week on  Wikibon  and  SiliconANGLE . These episodes are all available as  podcasts wherever you listen .

Email  [email protected] , DM  @dvellante on Twitter  and comment on  our LinkedIn posts .

Also, check out this  ETR Tutorial we created , which explains the spending methodology in more detail. Note:  ETR  is a separate company from Wikibon and SiliconANGLE .  If you would like to cite or republish any of the company’s data, or inquire about its services, please contact ETR at [email protected].

Here’s the full video analysis:

All statements made regarding companies or securities are strictly beliefs, points of view and opinions held by SiliconANGLE Media, Enterprise Technology Research, other guests on theCUBE and guest writers. Such statements are not recommendations by these individuals to buy, sell or hold any security. The content presented does not constitute investment advice and should not be used as the basis for any investment decision. You and only you are responsible for your investment decisions.

Disclosure: Many of the companies cited in Breaking Analysis are sponsors of theCUBE and/or clients of Wikibon. None of these firms or other companies have any editorial control over or advanced viewing of what’s published in Breaking Analysis.

Image:  ifeelstock/Adobe Stock


Cold Call podcast series

Uber’s Strategy for Global Success

How can Uber adapt its business model to compete in unique global markets?


As Uber entered unique regional markets around the world, from New York to Shanghai, it adapted its business model to comply with regulations and compete locally. As the transportation landscape evolves, how can Uber adapt its business model to stay competitive in the long term?

Harvard Business School assistant professor Alexander MacKay describes Uber’s global market strategy and responses by regulators and local competitors in his case, “Uber: Competing Globally.”

HBR Presents is a network of podcasts curated by HBR editors, bringing you the best business ideas from the leading minds in management. The views and opinions expressed are solely those of the authors and do not necessarily reflect the official policy or position of Harvard Business Review or its affiliates.

BRIAN KENNY: The theory of disruptive innovation was first coined by Harvard Business School professor Clayton Christensen in his 1997 book, The Innovator’s Dilemma. The theory explains the phenomenon by which an innovation transforms an existing market or sector by introducing simplicity, convenience, and affordability where complication and high cost are the status quo. Think Netflix disrupting the video rental space. Over the years, the term has been applied liberally and not always correctly to other examples, but every so often, an idea comes along that really fits the bill. Enter Uber, the ridesharing behemoth that turned the car service industry on its head. In a few short years after launching in 2010, Uber became the largest car service in the world, as measured in ride count. Last year, Uber delivered 6.2 billion rides. Today’s case takes us to London in 2019, where Uber is facing the latest in a long list of challenges from regulators threatening their ability to continue operating in that important market. In this episode of Cold Call, we welcome Alexander MacKay to discuss the case entitled, “Uber: Competing Globally.” I’m your host, Brian Kenny, and you’re listening to Cold Call on the HBR Presents network.

Alexander MacKay is in the strategy unit at Harvard Business School. His research focuses on matters of competition, including pricing, demand, and market structure. Alex, thanks for joining us on Cold Call today.

ALEX MACKAY: Thank you, Brian. Very happy to be here.

BRIAN KENNY: The idea of Uber seems so simple, but it was revolutionary in so many ways. And Uber has been in the headlines many times for both good and bad reasons in its decade of existence. So we’re going to touch on a lot of those things today. So thanks for sharing the case with us.

ALEX MACKAY: Brian, I’m very happy to. It’s a little funny, we’ve actually started to see the first few students who have never hailed a traditional taxi in our classrooms. So I think increasingly, the contrast between the two is going to be pretty difficult for people to fully understand.

BRIAN KENNY: Let me ask you to start by telling us what your cold call would be when you set up the class here.

ALEX MACKAY: The case starts off with the current legal battle going on in London. And so the first question I just ask to start the classroom is: What’s the end game for Uber in London? What do they look like 10 years from now? In the midst of this ongoing legal battle, there has been back and forth, some give and take from both sides, Transportation for London, and also on the Uber side as well. And there’s actually a recent court case that has allowed Uber to have a little more time to operate. They bought about 18 more months of time, but this has been also brought with additional, stricter scrutiny, and 18 months from now, they’re going to be at it again trying to figure out exactly what rules Uber’s allowed to operate under.

BRIAN KENNY: It seems like 18 months in the lifetime of Uber is like a decade. Everything seems to happen so quickly for this company. That’s a long period of time. What made you decide to write this case? How does it relate to the work that you’re doing in your research?

ALEX MACKAY: A big focus of my research is on competition policy, particularly the realms of antitrust and regulation. And here we have a company, Uber, whose relationship with regulation has been really essential to its strategy from day one. And I think appreciating the effects of regulation, and how they have impacted Uber’s performance in different markets, is really critical for understanding strategy, and global strategy broadly.

BRIAN KENNY: Let’s just talk a little bit about Uber. I think people are familiar with it, but they may not be familiar with just how large they are in this space. And the space that they’ve sort of created has also blown up and expanded in many ways. So how big is Uber? What does the landscape of ridesharing look like, and where does Uber sit in that landscape?

ALEX MACKAY: Uber globally is the biggest ridesharing company. In 2018, they had over $10 billion in revenue for both ridesharing and their Uber Eats platform. And you mentioned in the introduction, that they had over 6 billion rides in 2019. That’s greater than 15 million rides every day that’s happening on their platform. So really, just an enormous company.

BRIAN KENNY: So they started back in 2010. It’s been kind of an amazing decade of growth for them. How do you explain that kind of rapid expansion?

ALEX MACKAY: They were financed early on with some angel investors. I think Kalanick’s background really helped there to get some early funding. But one of the critical things that allowed them to expand early into many markets that helped their growth was they’re a relatively asset light company. On the ground, they certainly need sales teams, they need translation work to move into different markets, but because the main asset they were providing in these different markets was software, and drivers were bringing their own cars and riders were bringing their own phones, the key pieces of hardware that you need to operate this market, they really didn’t have to invest a ton of capital. In fact, when they launched in Paris, they launched as sort of a prototype, just to show, “Hey, we can do this in Paris without too much difficulty,” as their first international market. So being able to really scale it across different markets really allowed them to grow. I think by 2015, their market cap was $60 billion, five years after founding, which is just an incredible rate of growth.

BRIAN KENNY: So they’re the biggest car service in the world, but they don’t own any cars. Like what business are they really in, I guess is the question?

ALEX MACKAY: They’re certainly in the business of matching riders to drivers. They’ve been able to do this in a way that doesn’t require them to own cars, just through the use of technology. And so what they’re doing, and this is I think pretty well understood, is that they’re using existing capital, people who have cars that may be going unused, personal cars, and Uber is able to use that and deploy that to give riding services to different customers. Whereas in the traditional taxi model, you could have taxis that you didn’t necessarily own, but you leased them or you rented them, but they had the express purpose of being driven for taxi services. And so it wasn’t using idle capital. You kind of had to create additional capital in order to provide the services.

BRIAN KENNY: So you mentioned Travis Kalanick a little bit earlier, but he was one of the co-founders of the company, and the case goes a little bit into his philosophy of what expansion into new markets should look like. Can you talk a little bit about that?

ALEX MACKAY: Certainly. Yeah. And I think it might even be helpful to talk a bit about his background, which I think provides a little more context before Uber. He dropped out of UCLA to work on his first company, Scour, and that was a peer-to-peer file sharing service, a lot like Napster, and actually predated Napster. And where he was operating was sort of an evolving legal gray area. Eventually, Scour got sued for $250 billion by a collection of entertainment companies and had to file for bankruptcy.

BRIAN KENNY: Wow.

ALEX MACKAY: He followed that up with his next venture, Red Swoosh, and that was software aimed at allowing users to share network bandwidth. So again, it was a little bit ahead of its time, making use of recent advances in technology. Early on though, they got in trouble with the IRS. They weren’t withholding taxes, and there were some other issues with his co-founder, and there was sort of a bad breakup between the two. Despite this, he persevered and ended up selling the company for $23 million in 2007. And after that, his next big thing was Uber. So one thing I just want to point out is that at all three of these companies, he was looking to do something that leveraged new technology to change the world. And by nature, sometimes businesses like that operate in a legal gray area and you have very difficult decisions to make. Some other decisions you have to make are clearly unethical and there’s really no reason to make some of those decisions, like with the taxes and with some other things that came out later on at Uber, but certainly one of the things that any founder who’s looking to change the world with a big new technology company has to deal with, is that often, the legal framework and the regulatory framework around what you’re trying to do isn’t well established.

BRIAN KENNY: Obviously drama seems to follow Travis where he goes. And his expansion strategy was pretty aggressive. It was almost like a warlike mentality in terms of going into a new market. And you could sort of sum it up as saying ask forgiveness. Is that fair?

ALEX MACKAY: Yeah. Yeah. Ask for forgiveness, not permission. I think they were really focused on winning. I think that was sort of their ultimate goal. We describe in the case this policy of principled confrontation: ignore existing regulations until you receive pushback, and then when you do receive pushback, either from local regulators or existing taxicab drivers, mobilize a response to confront it. During their beta launch in 2010, they received a cease-and-desist letter from the city of San Francisco. And they essentially just ignored this letter. They rebranded, they used to be UberCab, and they just took “Cab” out of their name, so now they’re Uber. And you can see their perspective in their press release in response to this. They say, “UberCab is a first to market cutting edge transportation technology, and it must be recognized that the regulations from both city and state regulatory bodies have not been written with these innovations in mind. As such, we are happy to help educate the regulatory bodies on this new generation of technology and work closely with both agencies to ensure compliance.”

BRIAN KENNY: It’s a little arrogant.

ALEX MACKAY: Yeah, so you can see right there, they’re saying, what we’re operating in is sort of this new technology-based realm and the regulators don’t really understand what’s going on. And so instead of complying with the existing regulations, we’re going to try to push regulations to fit what we’re trying to do.

BRIAN KENNY: The case is pretty epic in the way it cuts a sweeping arc across the world, looking at the challenges that they faced in each market they entered, and none more interesting, I think, than New York City, which is obviously an enormous market. Can you talk a little bit about some of the challenges they faced going into New York, with the cab industry being as prevalent as it was and is?

ALEX MACKAY: Yeah, absolutely. I mean, I think it’s pretty well known for people who are familiar with New York that there were restrictions on the number of medallions which allowed taxis to operate. So there was a limited number of taxis that could drive around New York City. This restriction had really driven up the value of these medallions to the taxi owners. And if you had the experience of taking taxis in New York City prior to the advent of Uber, what you’d find is that there were some areas where the service was very, very good. Downtown, Midtown Manhattan, you could almost always find a taxi, but there are other parts of the city where it was very difficult at times to find a cab. And when you got in a cab, you weren’t sure that you were always going to be given a fair ride. And so Uber coming in and providing this technology that allowed you to pick up a ride from anywhere and sort of track the route as you’re going on really disrupted this market. Consumers love them. They had a thousand apps signups before they even launched. Kalanick mentioned this in terms of their launch strategy, we have to go here because the consumers really want us here. But immediately, they started getting pushback from the taxicab owners who were threatened by this new mode of transportation. They argued that they should be under the same regulations that the taxis were. And there were a lot of local government officials that were sort of mobilized against Uber as well. De Blasio, the Mayor of New York, wrote opinion articles against Uber, claiming that they were contributing to congestion. There was a lot of concern that maybe they had some safety issues, and the taxi drivers and the owners brought a lawsuit against Uber for evading these regulations. 
And then later on, and this was the case in many local governments, de Blasio introduced a bill to put additional restrictions on Uber that would make them look a lot more like a traditional taxi operating model, with limited number of licenses and strict requirements for reporting.

BRIAN KENNY: And this is the same scenario that’s going to play out almost with every city that they go into because there is such an established infrastructure for the taxi industry in those places. They have lobbyists. They’re tied into the political networks. In some instances, it was revealed that they’ve been connected with organized crime. So not for the faint of heart, right, trying to expand into some of the biggest cities in the United States.

ALEX MACKAY: Absolutely. Absolutely. And what’s sort of fascinating about the United States is it’s actually a place where a company can engage in this battle over regulation on the ground. And de Blasio writes his opinion article and pushes forward this bill. Uber responds by taking out an ad campaign, over $3 million, opposing these regulations and calling out de Blasio. So again, we sort of have this fascinating example of Uber mobilizing their own lobbyists, their lawyers, but also public advertising to sort of convince the residents of New York City that de Blasio and the regulators that are trying to come down on them are in the wrong.

BRIAN KENNY: Yeah. And at the end of the day, it’s consumers that they’re really making this appeal to, because I guess my question is, are these regulations stifling innovation? And if they are, who pays the ultimate price for that, Uber or the consumer?

ALEX MACKAY: Consumers definitely loved Uber. And I don’t think any of the regulators were trying to stifle innovation. I don’t think they would say that. I think their biggest concern, their primary concern was safety, and a secondary and related concern here was losing regulatory oversight over the transportation sector. So this is a public service that had been fairly tightly regulated for a long time, and there was some concern that what happens when this just becomes almost a free market sector. At the same time, these regulators have the lobbyists from the taxicab industry and other interested parties in their ear trying to convince them that Uber really is like a taxi company and should be regulated, and really emphasizing the safety concerns and other concerns to try to get stricter regulations put on Uber. And part of that may be valid. I think you certainly should be concerned about safety and there are real concerns there, but part of it is simply the strategic game that rivals are going to play between each other. And the taxicab industry sees Uber as a threat. It’s in their best interest to lobby the regulators to come down on Uber.

BRIAN KENNY: And what’s amazing to me is that while all this is playing out, they’re not turning their tails and running. They’re continuing to push forward and expand into other parts of the world. So can you talk a little bit about what it was like trying to go into countries in Latin America, countries in Asia, where the regulations and the regulatory infrastructure is quite different than it is in the US?

ALEX MACKAY: In the case, we have anecdotes, vignettes, one for each continent. And their experience in each continent was actually pretty different. Even within a continent, you’re going to have very different regulatory frameworks for each country. So we pick and focus on a few, just to highlight how the experience is very different in different countries. One thing that’s interesting: in Latin America, we focus on Bogota in Colombia, where Uber launched secretly and was pretty early on considered to be illegal, but continued to operate despite the official policy of being illegal in Colombia. And they were able to do that in a way that you may not be able to do so easily in the United States, just because of the different layers of enforcement and policy considerations that are present in Colombia and not necessarily in the United States. Now, when I talk about the current state of Uber in different countries, this is continually evolving. So they temporarily suspended their operations early in 2020 in Colombia. Now they’re back. This is a continual back-and-forth game that they’re playing with the regulators in different markets.

BRIAN KENNY: And in a place like Colombia, are they not worried about violence and the potential for violence against their drivers?

ALEX MACKAY: Absolutely. So this is true sort of around the world. I think in certain countries, violence becomes a little bit more of a concern. And what they found in Colombia is they did have more incidents where taxi drivers decided to take things into their own hands and threaten Uber drivers and Uber riders, sometimes with weapons. Another decision Uber had to make that was related to that was whether or not to allow riders to pay in cash. Because in the United States, they’d exclusively used credit cards, but in Latin America and some other countries like India, consumers tended to prefer to use cash to pay, and allowing that sort of opened up this additional risk that Uber didn’t really have a great system in place to protect them from. Because when you go to cash, you’re not able to track every rider quite as easily, and there’s just a bigger chance for fraud or for robbery and that sort of thing popping up.

BRIAN KENNY: Going into Asia was also quite a challenge for them. Can you talk a little bit about some of the challenges they faced, particularly in China?

ALEX MACKAY: They had very different experiences in each country in Asia. China was a unique case that is very fascinating, because when Uber launched there, there were already existing technology-based, you might call them, rideshare companies that were fairly prominent: Didi and Kuaidi. These companies later merged into one company, DiDi, which is huge. It’s on par with Uber in terms of its global presence as a ridesharing company. When Uber launched there, they didn’t fully anticipate all the changes they would have to make in going into a very different environment. In China, besides having established competitors, Google Maps didn’t work, and they relied on that mapping software to do their location services. So they had to completely redo their location services. They also relied on credit cards for payments, and in China, consumers increasingly used apps to do their payments. And this became a bit of a challenge because the main apps that Chinese customers used, primarily WeChat and Alipay, were actually owned by parent companies of the rival ridesharing company. So Uber had to essentially negotiate with its rivals in order to have consumers pay for their ridesharing services. And so here are a few localization issues that you could argue Uber didn’t fully anticipate when they launched. The other thing about competing in China that’s interesting is that Chinese policy regarding competition is very different from policy in the United States and much of Europe. For the most part, there’s not the traditional antitrust view of protecting the consumers first and foremost. That certainly comes into play, but the Chinese government has other objectives, including promoting domestic firms.
And so if you think about launching into a country where there’s a large established domestic rival, that certainly increases the difficulty of success, because when push comes to shove, the government is likely to come down on the side of your rival, which is the domestic company, and not the foreign entrant.

BRIAN KENNY: Yeah, which is understandable, I guess, to some extent. This sounds exhausting, to be sort of fighting skirmishes on all these fronts in all these different places in the world. How does that affect the morale or tear at the fabric maybe of the culture at a company like Uber, where they’re trying to manage this on a global scale and running into challenges every step of the way?

ALEX MACKAY: It certainly has an effect. I think Uber did a very good job at recruiting teams of people who really wanted to win. And so, if that’s the consistent message you’re sending to your teams, then these challenges may actually be considered somewhat exciting. And so I think by bringing in that sort of person, they actually fueled this desire to win in these markets and really kept the momentum going. One of the downsides, of course, is that if you exclusively focus on winning and getting around the existing regulations, there does become this challenge of what’s ethical and what’s not ethical. And in certain business areas, there actually often is a little bit of a gray line. I mean, you can see this outside of ridesharing. It’s a much broader thing to think about, but regulation of pharmaceuticals, regulation of use of new technologies such as drones, often the technology outpaces the regulation by a little bit and there’s this lag in trying to figure out what actually is the right thing to do. I think it’s a fair question whether or not you can disentangle this sort of principled confrontation that’s so pervasive throughout the company culture when it comes to regulation from principled confrontation over other ethical issues that are not necessarily business driven, and whether or not it’s easy to maintain that separation. And I think that’s a fair question, certainly worthy of debate. But what I think is important is you can set up a company where you are abiding by ethical standards that are very clear, but you’re still going to face challenges on the legal side when you’re developing a new business in an area with new technology.

BRIAN KENNY: That’s a great insight. I mean, I found myself asking myself as I got through the case, I can’t tell if Uber is the victim or the aggressor in all of this. And I guess the answer is they’re a little bit of both.

ALEX MACKAY: Yeah. I think it’s fair to characterize them as an aggressor, and I think you sort of need to be if you want to succeed and if you want to change the world in a new technology area. In some sense, they’re a victim in that we’re all the victim as consumers and as firms of regulations that are sometimes difficult to adapt in real time to changing market conditions. And there’s a good reason why they are sticky over time, but sometimes that can be very costly. Going back to something we talked about earlier, I think there are hardly any consumers that wanted Uber kicked out of New York City. I think everyone realized this was just so much superior to any other option they had, that they were really willing to fight to keep Uber around in the limited ways they could.

BRIAN KENNY: So let’s go back to the central issue in the case then, which is, how important is it to them, in terms of their global strategy, to have a presence in a place like London? They’re still not profitable by the way, we should point that out, that despite the fact that they are the largest in the space, they haven’t turned the corner to profitability yet. I would imagine London’s kind of important.

ALEX MACKAY: Absolutely. London is a key international city, and a presence there is important for Uber’s overall brand. So many people travel through London, and it’s a real benefit for anyone who travels to be able to use the same service in any city you stop in. At the same time, they’re facing these increasing regulatory pressures from London, and so it’s a real question whether or not, 10 years from now, they look substantially different from the established taxi industry that’s there. And you can kind of see this battle playing out across different markets. Take Ghana as another example: when they entered there, they actually entered with a framework for understanding. They helped build the regulations for ridesharing services in Ghana when they entered. But over time, that evolved to additional restrictions as the existing taxi companies pushed back on them. So I think a key lesson here in all of this is that the regulations that you see at any given point in time aren’t absolutely fixed; for anyone starting a technology-based company, there will be regulations that do get created that affect your business. Stepping outside of transportation, we can see that going on now with the big tech firms and the antitrust investigations they’re under. And the policymakers in the US and Europe are really trying to evolve the set of regulations to reflect the different businesses that Apple, Facebook, Microsoft, Google are involved in.

BRIAN KENNY: One thing we haven’t touched on, and it’s not touched on in the case obviously because it just sort of started fairly recently, is the pandemic and the implications of the pandemic for the rideshare industry as fewer people find themselves in need of going anywhere. Have you given any thought to that and whether that’s going to have any effect on the regulations?

ALEX MACKAY: It certainly could. Uber is in a somewhat fortunate position, at least if you judge by their market capitalization, with respect to the pandemic. Initially their stocks took a pretty big hit, but rebounded pretty quickly, and part of this is because the primary part of their business is the transportation through Uber X, but they do also offer the delivery services through Uber Eats, and that business has really picked up during this pandemic. There’s certainly a mix of views about the future, but I think most people do believe that at some point we’ll get back to business as usual, at least for Uber services, when we come up with a vaccine. I think most people anticipate that they’ll be resuming use of Uber once it becomes safe to do so. And I think, to be frank, a lot of people already have resumed using Uber, especially people who don’t have cars or who see it as a valuable alternative or a safer alternative to public transit.

BRIAN KENNY: Yeah, that’s a really good point. And the Uber Eats thing is interesting as another example of how it’s important for businesses to re-imagine the business that they’re in because that, in many ways, may be helping them through a really tough patch here. This has been a really interesting conversation, Alex, I want to ask you one final question, which is, as the students are packing up to leave class, what’s the one thing you want them to take away from the case?

ALEX MACKAY: So I would hope the students take away the importance of regulation in business strategy. And I think the case of Uber really highlights that. If you look at the conversation around Uber for the first 10 years of their existence, it was essentially around the superiority of their technology and not so much how they handled regulation. If you think back to the cease-and-desist letter that San Francisco issued in 2010, if Uber had simply stopped operations then, we wouldn’t have the ridesharing world that we have today. So their strategy of principled confrontation with respect to regulation was really essential for their future growth. Again, this does raise important ethical considerations as you’re operating in a legal gray area, but it’s certainly an essential part of strategy.

BRIAN KENNY: Alex, thanks so much for joining us on Cold Call today. It’s been great talking to you.

ALEX MACKAY: Thank you so much, Brian.

BRIAN KENNY: If you enjoy Cold Call, you might like other podcasts on the HBR Presents network. Whether you’re looking for advice on navigating your career, you want the latest thinking in business and management, or you just want to hear what’s on the minds of Harvard Business School professors, the HBR Presents network has a podcast for you. Find them on Apple Podcasts or wherever you listen. I’m your host, Brian Kenny, and you’ve been listening to Cold Call, an official podcast of Harvard Business School on the HBR Presents network.


Efficiently Managing the Supply and Demand on Uber’s Big Data Platform


With Uber’s business growth and the fast adoption of big data and AI, Big Data scaled to become our most costly infrastructure platform. To reduce operational expenses, we developed a holistic framework with 3 pillars: platform efficiency, supply, and demand (using supply to describe the hardware resources that are made available to run big data storage and compute workload, and demand to describe those workloads).  In this post, we will share our work on managing supply and demand.  For more details about the context of the larger initiative and improvements in platform efficiency, please refer to our earlier posts: Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data , and Cost-Efficient Open Source Big Data Platform at Uber .

Given that the vast majority of Uber’s infrastructure is on-prem, we will start with some of the open technologies that we applied onsite.

Cheap and Big HDDs

While the storage market has shifted considerably from HDD to SSD over the last 5 years, HDD still offers a capacity-per-IOPS profile that suits big data workloads. One of the reasons is that most big data workloads are sequential scans instead of random seeks. Still, the conventional wisdom is that bigger HDDs with fewer IOPS/TB can negatively affect the performance of big data applications.

Our HDFS clusters at Uber have many thousands of machines, each with dozens of HDDs. At first we believed that the IOPS/TB would be an unavoidable problem, but our investigation showed that it can be mitigated.

[Chart: P99 and average of the 10-minute moving average of HDD IO util% across thousands of HDDs]

The chart above shows the P99 and average values of the 10-minute moving average of HDD IO util% across thousands of HDDs from a subset of our clusters. The P99 values are quite high, hovering around 60% with peaks above 90%. Surprisingly, we found that the average IO util% is less than 10%!

The current machines at Uber have a mix of 2TB, 4TB, and 8TB HDDs, with an average disk size around 4TB. If we move to 16TB HDDs, the average IO util% may go up by as much as 4x, to just under 40%. That is still OK, but what about P99? P99 will likely stay above 90%, which would severely slow down jobs and hurt the user experience. However, if we can load-balance among the HDDs, we can potentially bring the P99 down by a lot.
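The projection above is simple arithmetic; here is a quick sketch using only the figures quoted in the text (roughly 4TB average disk size today, average IO util% under 10%):

```python
# Projected average HDD IO utilization when consolidating onto bigger drives.
# Inputs are the figures quoted in the text, not measured values.
avg_disk_tb = 4        # average disk size in today's fleet
new_disk_tb = 16       # candidate drive size
avg_util_today = 0.10  # upper bound on today's average IO util%

scale = new_disk_tb / avg_disk_tb           # 4x more data (and thus IO) per spindle
projected_avg_util = avg_util_today * scale

print(f"scale factor: {scale:.0f}x")
print(f"projected average IO util%: under {projected_avg_util:.0%}")
```

This is why the average stays safe after consolidation; the tail (P99), not the average, is what load balancing has to fix.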

So how do we balance the load on the HDDs? Here are a few ideas in our plan:

  • Proactive Temperature Balancing : It’s possible for us to predict the temperature of each HDFS block pretty well using hints including when it was created, the location of the associated file in the HDFS directory structure, and the historical access pattern. Proactively balancing out the temperature of each HDFS block is the important first step for us to handle IOPS hotspots. This is especially important when the cluster was recently expanded, in which case new, hot blocks will be unevenly added into the new machines until we proactively perform temperature balancing.
  • Read-time Balancing: Most of our files on HDFS are stored with 3 copies. In the traditional HDFS logic, NameNode will choose a random DataNode to read from. The main reasoning is that statistically, the probability of each copy getting accessed is the same. Given that the blocks are already well balanced, then it seems that the workload on each HDD should be the same. Not many people realize that while the expectation is the same, the actual workload on each HDD may have a huge variance, especially in a short time span of 2-5 seconds, which is the time needed to read a full 256MB to 512MB block, assuming 100-200MB/second sequential read speed. In fact, if we have 1000 HDDs and we try to access 1000 blocks randomly at the same time, each block is randomly chosen from the 3 copies among the 1000 nodes, then by expectation, we will leave around 37% of the HDDs without any workload, while the busiest drive might get 5 or more requests! The idea of the read-time balancing algorithm is simple: each DataNode tracks the number of open read/write block requests for each of the HDDs. The client will query all the DataNodes for how busy the HDD containing the block is, and then pick a random one among the least loaded HDDs. This can dramatically reduce contention for the small number of hotspots, and improve the P99 access latency.
  • Write-time Balancing: Similar to read-time balancing, HDFS also has many choices in deciding on which DataNode and HDD to write the next block. Existing policies like round-robin or space-first don’t take into account how busy each HDD is. We propose that the IO load on each HDD should be an additional, important factor to consider as well, to minimize the variance of IO util% on the HDDs.
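The variance argument behind Read-time Balancing is easy to check with a quick simulation. This is an illustrative sketch, not Uber's implementation: picking a random replica for each read is, in aggregate, like throwing 1000 balls uniformly into 1000 bins, which leaves about e⁻¹ ≈ 37% of drives idle while a few drives absorb 5 or more requests.

```python
import random
from collections import Counter

def simulate(n_drives=1000, n_reads=1000, seed=42):
    """Assign each concurrent read to a uniformly random drive (the net
    effect of picking a random replica) and report the load skew."""
    rng = random.Random(seed)
    load = Counter(rng.randrange(n_drives) for _ in range(n_reads))
    idle_frac = (n_drives - len(load)) / n_drives  # drives with zero requests
    return idle_frac, max(load.values())

idle_frac, busiest = simulate()
print(f"idle drives: {idle_frac:.0%}, busiest drive: {busiest} concurrent reads")
```

Over the 2-5 seconds it takes to stream a full block, that busiest drive serves several sequential reads at once, which is exactly the P99 latency the least-loaded-replica policy is meant to avoid.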

With these changes, we are ready to onboard the 16TB HDDs onto our HDFS fleet. The cost savings come from not only the reduced cost per TB of the 16TB HDDs, but also, more importantly, the reduced TCO (Total Cost of Ownership) due to a smaller machine fleet size and reduced power needs per TB.

For readers who know HDFS well, you might be wondering: what about the Erasure Coding feature in HDFS 3.0? Erasure Coding dramatically reduces the storage overhead of each block from 3x to around 1.5x (depending on the configuration). In fact, all of the balancing ideas above still work, although Read-time Balancing becomes a bit more complex, since Erasure Code decoding is necessary whenever the blocks chosen for a read include at least one parity block.
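The ~1.5x figure follows directly from the coding parameters; a small sketch, assuming the Reed-Solomon schemes commonly used with HDFS erasure coding:

```python
def storage_overhead(data_units, parity_units):
    """Physical bytes stored per logical byte under a
    (data, parity) erasure coding scheme."""
    return (data_units + parity_units) / data_units

print(storage_overhead(6, 3))    # RS(6,3):  1.5x
print(storage_overhead(10, 4))   # RS(10,4): 1.4x
print(storage_overhead(1, 2))    # plain 3-way replication: 3.0x
```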

On-Prem HDFS with 3 copies is in general more expensive than the object storage offerings in the cloud for Big Data workloads. However, the move from 2TB, 4TB and 8TB HDDs to 16TB HDDs dramatically reduces the price gap, even without the Erasure Code.

Free CPU and Memory on HDFS Nodes

Thanks to the abundance of network bandwidth among our machines and racks, we moved to a disaggregated architecture where the Apache® Hadoop® Distributed File System (HDFS) and Apache Hadoop YARN run on separate racks of machines. That improved the manageability of our hardware fleet and allowed us to scale storage and compute capacity independently.

However, this architecture also leaves a good amount of free CPU and memory on the HDFS nodes, since the minimum CPU and memory available on one machine is still much larger than what the HDFS DataNode needs. As a result, we decided to run the YARN NodeManager on those HDFS nodes to utilize the spare capacity.

So are we going back to the colocated HDFS and YARN of the old world? Not necessarily. The earlier colocated model was a requirement due to network bandwidth limitations. The new colocated model is an option.  We only use it when there is an opportunity to optimize for cost. In the new colocated model, we still enjoy the independent scaling of Storage and Compute since we still buy a good number of pure Compute racks for extra YARN workloads.


Free Compute Resources without Guarantee

In the earlier post Challenges and Opportunities to Dramatically Reduce the Cost of Uber’s Big Data , we talked about the Disaster Recovery requirements where we have a second region with redundant compute capacity for Big Data. While this compute capacity is not guaranteed (e.g. in case of disaster), we can still leverage that for many use cases, like maintenance jobs that we will describe later in the demand section. Moreover, this Failover Capacity reserved for Disaster Recovery is not the only source of non-guaranteed compute resources. We were able to identify several more significant sources in Online Services:

  • Failover Capacity: Similar to the Big Data Platform, our Online Services and Storage platforms also have a significant amount of failover capacity reserved for disaster recovery. Unlike Big Data, however, the demand of Online Services and Storage is much less elastic, so we can re-allocate some of that failover capacity for Big Data.
  • Periodic Traffic Patterns : Online Services’ utilization of resources follows the pattern of our platform traffic very well. For example, weekday usage and weekend usage are very different, rush hours and midnights are different, and the annual peak of traffic is on New Year’s Eve. While Uber’s global presence reduces the variance of traffic compared to a single continent, the pattern is still obvious, which implies that we have a lot of compute resources to use when traffic is off-peak.
  • Dynamically free CPUs: In order to keep a reasonably low P99 latency for RPCs, we do not want to run CPUs at over 65%, since otherwise some instances of some services may randomly get overloaded and suffer significant performance degradation. However, we can run low-priority processes in the background on those machines to harvest the free CPU cycles. When the RPC workload spikes, the low-priority processes are slowed down or even suspended.
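The "dynamically free CPUs" idea reduces to an admission rule for low-priority work. A minimal sketch, where the 65% ceiling is the figure quoted above but the function name, headroom value, and API are illustrative, not Uber's:

```python
RPC_CPU_CEILING = 0.65  # above this, RPC P99 latency degrades (figure from the text)

def harvestable_cores(total_cores, rpc_util, headroom=0.05):
    """Cores that low-priority background work may use right now.

    Keeps a small headroom below the ceiling so that spikes in RPC load
    preempt the background work rather than the serving path.
    """
    budget = RPC_CPU_CEILING - headroom - rpc_util
    return max(0, int(total_cores * budget))

print(harvestable_cores(64, rpc_util=0.30))  # quiet period: plenty to harvest
print(harvestable_cores(64, rpc_util=0.62))  # near the ceiling: nothing to harvest
```

In practice the background work must also be preemptible mid-task, which is what the YARN-on-Peloton integration described next provides.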

It does take some effort to utilize that compute capacity for our Big Data Workload. First of all, we need to make our job schedulers work together. That’s why we started the YARN-on-Peloton project, where Peloton is Uber’s open-source cluster scheduler. This project provides the mechanism to run YARN workload on the Online Compute machines.

Once we have the mechanism, the next step is to determine what kind of workload can run in those non-guaranteed compute resources. We will discuss that further in the Demand section below.

Machine Accounting

Before we started working on the Cost Efficiency project, we were under the assumption that our focus should be on technical innovations like the 2 projects that we mentioned above. To our surprise, there is a lot of free efficiency to be gained by simply better accounting for the machines used by our Big Data Platform.

Over the years, we have accumulated a large number of deprecated systems, dev and test environments, and POC clusters that are either not used any more or over-provisioned for the current use case. This is probably the case not only for Uber but also for many fast-growing companies, where looking back to assess the machine inventory is typically a low priority.

Working through those one by one would be tedious and time-consuming. Instead, we designed and developed several centralized dashboards that let us break down the machines in our hardware fleet along many dimensions. The dashboards are powered by common data pipelines.


In this effort, we leveraged several open-source frameworks and cloud technologies to make it happen quickly. The result was impressive. With a set of clear dashboards, we were able to drive a good number of teams to release several thousand machines in total over 6 months.

Managing the Demand

In the sections above, we covered cost efficiency on the supply side of our Big Data Platform. However, what turned out to be more important for cost efficiency is the demand side.

That’s because many of our Big Data workloads are experimental. Users are in a constant cycle of trying out new experiments and machine learning models, and deprecating old ones. To enable our users to keep innovating as fast as possible, we need tooling that lets them manage their demand with minimal effort.

Ownership Attribution

The first step for any demand management is to establish a strong ownership attribution mechanism for all workloads, storage and compute, on our Big Data Platform.

While this seems obvious and straightforward on the high level, it turns out to be a difficult task for several reasons:

  • People change: Employees can transfer to different teams or leave the company. Organizational structures can change over time, and so it’s hard to keep all ownership information up to date.
  • Lineage: The end-to-end data flow can have 10 or more steps, each of which might be owned by a different person or team. On many occasions, ownership cannot simply be attributed to Data Producers, Data Processors, or Data Consumers. Data Lineage gives us the end-to-end dependency graph, but we still need domain knowledge to accurately attribute the real ownership.
  • Granular ownership: The same data set might need multiple ownerships for different parts. For example, a product team may want to own the log data for a specific period of time, and they are willing to have it deleted afterwards. However, another team like compliance may request the data sets to be stored for longer. Another example is a general mobile log table where many types of messages, each owned by a different team, are logged. One more example is a wide table with many columns, where different column sets are owned by different teams.

To solve these problems, we leveraged a combination of organizational charts and a home-built, hierarchical ownership tool to keep track of the owner information. We also built an end-to-end data lineage system to track the lineage information. We are right now investigating ways to allow granular ownership to data sets.

Consumption Per Business Event

Once all Big Data workloads have a clear owner, the next step is to project out the demand growth and measure whether it is above our forecast. There are several reasons for demand growth, which we categorize into 2 buckets:

  • Business Event growth: Business Event here refers to Uber trips and UberEats orders. When those grow, we anticipate the big data workload to grow proportionally.
  • Per Business Event growth: This is the demand growth net of the business event growth. Common reasons for this include new features, more experiments, or simply more detailed logging or more data analysis and machine learning models. We set this to a predetermined value based on historical data.
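The two buckets combine multiplicatively into a simple per-owner forecast against which monthly usage is checked. A sketch under assumptions: the function name, the example figures, and the comparison logic are illustrative, not Uber's actual model:

```python
def forecast_usage(baseline_pb, business_event_growth, per_event_growth):
    """Forecast = baseline scaled by business-event growth (trips, orders)
    and by the allowed per-event growth, both as fractional monthly rates."""
    return baseline_pb * (1 + business_event_growth) * (1 + per_event_growth)

# Hypothetical team: 2 PB baseline, trips up 10% this month,
# 3% per-event growth allowance set from historical data.
budget = forecast_usage(2.0, 0.10, 0.03)
actual = 2.4
over = actual > budget
print(f"budget: {budget:.3f} PB, actual: {actual} PB, over forecast: {over}")
```

When `over` is true, the owner is asked to dig into the underlying reasons, which is where the breakdown tooling described next comes in.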

Every month, we check with each ownership group whether their usage is above the forecast. In cases where it is, we ask the owners to dig in and understand the underlying reasons. Owners can often find innovative ways to reduce their usage by leveraging domain knowledge of their use cases. In some cases, the ownership group is too big, and they ask us for more detailed breakdowns. That’s why we built a Data Cube that gives them full visibility into whatever dimensional breakdowns they want.

Data Cube for Understanding the Costs

Our Data Cube is defined by a set of dimensions and a set of measures. Take the HDFS Usage Data Cube as an example, with the following dimensions:

  • HDFS cluster
  • HDFS directory
  • Hive database
  • Hive partition

And the following measures:

  • Number of files
  • Number of bytes
  • Estimated annual cost

We built a slice-and-dice UI as shown below. Users can change the filters, the dimensions of the breakdowns, as well as the measure. This powerful and flexible tool has given users detailed visibility into where the biggest opportunities are for potential cost savings.

[Image: the slice-and-dice UI]
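Under the hood, a slice-and-dice query is just a filter over dimension values plus a group-by aggregation of a measure. A minimal sketch with made-up records; the schema and values are illustrative, not Uber's:

```python
from collections import defaultdict

# Each record carries dimensions (cluster, database) and measures
# (file count, bytes, estimated annual cost). Values are invented.
records = [
    {"cluster": "dca1", "database": "trips", "files": 1200, "bytes": 9e12, "cost": 800},
    {"cluster": "dca1", "database": "eats",  "files": 300,  "bytes": 2e12, "cost": 150},
    {"cluster": "phx2", "database": "trips", "files": 900,  "bytes": 7e12, "cost": 600},
]

def slice_and_dice(rows, group_by, measure, **filters):
    """Filter rows on dimension values, then sum one measure per group."""
    out = defaultdict(float)
    for row in rows:
        if all(row[dim] == val for dim, val in filters.items()):
            out[row[group_by]] += row[measure]
    return dict(out)

print(slice_and_dice(records, group_by="cluster", measure="cost"))
print(slice_and_dice(records, group_by="database", measure="bytes", cluster="dca1"))
```

Changing the filter, the group-by dimension, or the measure in the UI maps directly onto the three arguments of this aggregation.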

In addition to the Usage Data Cubes above, we also built drill-down UIs for Queries and Pipelines. While the dimensions and measures are completely different, they serve a similar purpose: giving users a deep understanding of their workloads.


Dead Datasets, Orphan Pipelines, and Beyond

In addition to the passive visualization and exploration tool above, we also proactively analyze the metadata to “guess” where the cost efficiency opportunities are.

  • Dead Datasets are those datasets that nobody has read or written in the last 180 days. These datasets are often created for valid use cases, like experiments and ad hoc analysis. Owners usually want to keep them for some time, but few people remember to delete them when they are no longer needed. After all, people usually remember what they need, and forget about what they don’t need! We built a reminder system that, every 90 days, automatically creates tasks for owners to confirm whether they still need their dead datasets. In addition, we also added support for Table-Level TTL and Partition-Level TTL, which automatically delete a table or its partitions.
  • Orphan Pipelines are ETL pipelines whose output datasets are not read by anyone (except other Orphan Pipelines) in the last 180 days. Similar to Dead Datasets, many Orphan Pipelines can be stopped with owners’ confirmations.
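Orphan detection is naturally a fixed-point computation over the lineage graph: a pipeline is an orphan if every reader of every one of its outputs is itself an orphan pipeline, so orphan status cascades upstream. An illustrative sketch (pipeline and dataset names are made up, and this is not Uber's implementation):

```python
def find_orphans(pipelines, readers):
    """pipelines: {pipeline_name: set of output dataset names}
    readers:   {dataset_name: set of active consumers in the last 180 days,
                where a consumer is another pipeline or the sentinel 'user'}
    Iterate to a fixed point, since stopping one orphan can orphan its upstream."""
    orphans = set()
    changed = True
    while changed:
        changed = False
        for pipeline, outputs in pipelines.items():
            if pipeline in orphans:
                continue
            consumers = set().union(*[readers.get(d, set()) for d in outputs])
            if consumers <= orphans:  # no non-orphan reads any output
                orphans.add(pipeline)
                changed = True
    return orphans

pipelines = {"etl_a": {"ds1"}, "etl_b": {"ds2"}, "etl_c": {"ds3"}}
readers = {"ds1": {"etl_b"}, "ds2": set(), "ds3": {"user"}}
print(sorted(find_orphans(pipelines, readers)))
```

Here `etl_b` writes a dataset nobody reads, which then orphans `etl_a`, whose only reader was `etl_b`; `etl_c` survives because a human still reads its output.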

Next Steps and Open Challenges

The work mentioned above allowed us to save over 25% of our Big Data Spend. However, there are still a lot more opportunities. Looking forward, we are doubling down on our Big Data cost efficiency efforts in the following areas:

Cloud resources

While most of Uber’s infrastructure is on-prem, we also have some cloud presence. Cloud resources have several advantages over on-prem:

  • Elasticity: It takes a lot less time to get additional capacity in the cloud compared to on-prem. While this may not matter much when we can predict our needs accurately, Uber’s business is affected by many external factors, like COVID-19 in early 2020 and the recovery for the foreseeable future. Elasticity will allow us to reduce the capacity buffer (and thus the cost) without taking on too much risk of supply shortages.
  • Flexibility: Cloud provides multiple purchase options like reserved, on-demand, spot/preemptible instances, together with many different instance types with various sizes. This gives us the flexibility to use the cheapest option that fits the workload needs.

Given Uber’s large on-prem presence, it’s not realistic to move to the cloud overnight. However, the distinct advantages of cloud and on-prem resources give us a great opportunity to utilize the strengths of both! In short, why don’t we use on-prem for the base of the workload, and use cloud for the workload above the base, like load spikes and huge ad-hoc workloads? There are open questions related to network bandwidth, latency and cost, but most of our Big Data workload is batch processing which doesn’t require sub-second latency, and network traffic in the cloud is charged in the outbound direction only.

Job Tiering and Opportunistic Computation

In a multi-tenant Big Data Platform, we have many thousands of jobs and tables. Not all of them have the same priority. However, since they are from different owners, it’s hard if not impossible for us to have them agree on the relative priority of everything.

Still, we believe it’s our responsibility to come up with a guideline for tiering the jobs. Different tiers of jobs will have different priorities in getting resources, and they will be charged differently. Combined with the Pricing Mechanism in the Platform Efficiency section, we can work out an optimization algorithm that maximizes customer satisfaction within a limited cost budget, or maintains customer satisfaction while reducing the cost.

From Big Data Workloads to Online Workloads

With everything discussed above and in the previous 2 posts, we are confident that we are on the right track to halve our Big Data spend in the next 2-3 years. Reflecting on these ideas, a lot of them apply to online workloads (Online Services, Online Storage) as well. We are now actively collaborating with our peers on the Online Compute Platform and Online Storage Platform to explore those opportunities. With that, we might be able to dramatically reduce the overall infrastructure spend of Uber! Please stay tuned for future updates.

This initiative would not have been possible without the contributions from over 50 colleagues in and out of Uber’s Data Platform team. We thank them for their hard work and collaboration over the last 2 years.

Apache ® , Apache Hadoop ® , Apache Hive, Apache ORC, Apache Oozie, Apache TEZ, Apache Hadoop YARN, Hadoop, Kafka, Hive, YARN, TEZ, Oozie are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Zheng Shao

Zheng Shao is a Distinguished Engineer at Uber. His focus is on big data cost efficiency as well as data infra multi-region and on-prem/cloud architecture. He is also an Apache Hadoop PMC member and an Emeritus Apache Hive PMC member.

Mohammad Islam

Mohammad Islam is a Distinguished Engineer at Uber. He currently works within the Engineering Security organization to enhance the company's security, privacy, and compliance measures. Before his current role, he co-founded Uber’s big data platform. Mohammad is the author of an O'Reilly book on Apache Oozie and serves as a Project Management Committee (PMC) member for Apache Oozie and Tez.

Posted by Zheng Shao, Mohammad Islam


DataFlair


5 Big Data Case Studies – How big companies use Big Data

Undoubtedly, Big Data has become a game-changer across most modern industries over the last few years. As Big Data continues to permeate our day-to-day lives, the number of companies adopting it continues to increase.

Let us see how Big Data helped these companies perform exponentially better in the market, through these five big data case studies.

Top 5 Big Data Case Studies

Following are the interesting big data case studies –

1. Big Data Case Study – Walmart

Walmart is the largest retailer in the world and the world’s largest company by revenue, with more than 2 million employees and 20,000 stores in 28 countries. It started making use of big data analytics well before the term Big Data came into the picture.

Walmart uses Data Mining to discover patterns that can be used to provide product recommendations to the user, based on which products were bought together.

By applying effective Data Mining, Walmart has increased its customer conversion rate. It has been speeding up big data analysis to provide best-in-class e-commerce technologies, with the goal of delivering a superior customer experience.

The main objective of holding big data at Walmart is to optimize the shopping experience of customers when they are in a Walmart store.

Big data solutions at Walmart are developed with the intent of redesigning global websites and building innovative applications to customize the shopping experience for customers whilst increasing logistics efficiency.

Hadoop and NoSQL technologies are used to provide internal customers with access to real-time data collected from different sources and centralized for effective use.

2. Big Data Case Study – Uber

Uber is the first choice for people around the world when they think of moving people and making deliveries. It uses users’ personal data to closely monitor which features of the service are most used, to analyze usage patterns, and to determine where services should be focused.

Uber matches the supply of and demand for its services, and prices change accordingly. Therefore, one of Uber’s biggest uses of data is surge pricing. For instance, if you are running late for an appointment and book a cab in a crowded place, you should be ready to pay twice the normal fare.

For example, on New Year’s Eve, the price of driving one mile can go from 200 to 1000. In the short term, surge pricing affects the rate of demand, while in the long term it could be the key to retaining or losing customers. Machine learning algorithms are used to determine where demand is strong.
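The supply-and-demand mechanic described above can be illustrated with a toy multiplier. This is a hedged sketch of the general idea, not Uber's actual pricing algorithm; the cap and the linear demand/supply ratio are assumptions for illustration:

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Toy surge model: the price multiplier scales with the
    demand/supply ratio, floored at 1.0 (no discount) and capped
    to limit extreme spikes."""
    if available_drivers == 0:
        return cap
    ratio = ride_requests / available_drivers
    return max(1.0, min(cap, ratio))

print(surge_multiplier(50, 50))    # balanced market: normal fare
print(surge_multiplier(100, 50))   # demand is double the supply
print(surge_multiplier(500, 50))   # New Year's Eve spike: hits the cap
```

The floor and cap reflect the trade-off the text mentions: surge dampens short-term demand, but uncapped spikes risk losing customers long term.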

3. Big Data Case Study – Netflix

It is the most loved American entertainment company specializing in online on-demand streaming video for its customers.

With Big Data, Netflix has become remarkably good at predicting exactly what its customers will enjoy watching. Big Data analytics is the fuel that fires the ‘recommendation engine’ designed to serve this purpose. More recently, Netflix started positioning itself as a content creator, not just a distribution method.

Unsurprisingly, this strategy has been firmly driven by data. Netflix’s recommendation engines and new content decisions are fed by data points such as what titles customers watch, how often playback is stopped, the ratings given, etc. The company’s data stack includes Hadoop, Hive, and Pig, along with traditional business intelligence tools.

Netflix shows us that knowing exactly what customers want is attainable if companies set aside assumptions and make decisions based on Big Data.

4. Big Data Case Study – eBay

A big technical challenge for eBay, as a data-intensive business, is to operate a system that can rapidly analyze and act on data as it arrives (streaming data). Many rapidly evolving methods support streaming data analysis.

eBay is working with several tools, including Apache Spark, Storm, and Kafka. These allow the company’s data analysts to search for information tags associated with the data (metadata) and make it consumable to as many people as possible, with the right level of security and permissions (data governance).

The company has been at the forefront of using big data solutions and actively contributes its knowledge back to the open-source community.

5. Big Data Case Study – Procter & Gamble

Procter & Gamble, whose products we all use 2-3 times a day, is a 179-year-old company. It has recognized the potential of Big Data and put it to use in business units around the globe, with a strong emphasis on using big data to make better, smarter, real-time business decisions.

The Global Business Services organization has developed tools, systems, and processes to give managers direct access to the latest data and advanced analytics. As a result, P&G, despite being among the oldest companies in its market, still holds a great share of it in the face of many emerging competitors.

Big Data predicting the uncertainties

A groundbreaking study in Bangladesh has found that using data from mobile phone networks to track the movements of people across the country helps predict where outbreaks of diseases such as malaria are likely to occur, enabling health authorities to take preventive measures.

Every year, malaria kills more than 400,000 people globally and most of them are children.

Different types of data, including information provided by the Bangladesh ministry of health, are used to create risk maps indicating the likely locations of malaria outbreaks, so local health authorities can be warned to take preventive action, including spraying insecticides and stockpiling bed nets and medicines to protect the population from the disease.

With the various technologies it holds, Big Data helps almost every company or sector that aspires to grow. Analyzing large datasets that are associated with the events of the company can give them insights to increase their customer satisfaction.

If you know more such interesting Big Data case studies, share with us through comments.

Keep improving! Big Data has your back 🙂



Case Study: How Uber Employees Use 20x More Data in Decision-Making

Uber’s analytics team was flooded with requests from Operations Managers on how they could explore important data sources.


Although reports and dashboards were available, Operations Managers at Uber knew that the best and fastest decisions could only be made by exploring the data. Uber tried educating the professionals through meetings, coaching and how-to guides, but this was not enough.

Uber needed a solution to make its global workforce data-driven at scale. Through hands-on upskilling on our platform, thousands of Uber employees now use data in their daily work. Operations, Marketing and Product teams use it for planning and decision-making.

Download the case study and learn more about Uber’s journey to data-driven decision making.

Uber managed to upskill over 24,000 employees in data-driven decision making. Learn more about how we made this happen.

How Data-Driven Companies Succeed: Google, Amazon, and Uber Case Studies

It's no secret that data-driven companies are succeeding. Google, Amazon, and Uber are prime examples of businesses that have used data analytics to improve their performance and better serve their customers. In this blog post, we will take a look at how these three companies use business data and analytics to drive success. We'll explore the different ways they use data to make decisions about everything from product strategy to customer engagement. So, if you're looking for some insights into how to make your business more data-driven, then you've come to the right place!

“Data is the new oil.” — Clive Humby.

Google is always looking for ways to make its employees more successful. Through an initiative called Project Oxygen, it analyzed data from more than 10,000 performance reviews and compared the results with employee retention rates. This allowed Google to identify behaviors common among high-achieving managers who are also good at keeping their people around. Google then created training programs designed specifically to develop those competencies, boosting managers' median favorability scores by 11%.

They also discovered key opportunities, such as extending maternity leave, which cut attrition rates among new mothers in half.

The power of big data analytics has given Amazon the insight it needs to create a more effective sales strategy. By analyzing customer buying habits and supplier interactions, Amazon's data-driven logistics team understands how customers behave when shopping for groceries and which products are popular among consumers generally, yielding insights that drive continual improvements to business performance and service.

Uber analyzes historical data and key metrics, such as the number of ride requests and fulfilled trips in different parts of a city. With this information, it can anticipate which areas will likely see high demand for rides in upcoming periods, so drivers can be sent there in advance rather than circling aimlessly in search of riders.
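The demand-prediction idea above can be illustrated with a toy historical average per zone and hour of day. The zones, hours, and request counts below are invented for illustration; this is not Uber's actual data or method.

```python
from collections import defaultdict

def avg_requests_by_zone_hour(history):
    """Average historical ride requests per (zone, hour-of-day) bucket.

    `history` is a list of (zone, hour, request_count) tuples -- a
    simplified stand-in for real trip logs.
    """
    totals = defaultdict(lambda: [0, 0])  # (zone, hour) -> [sum, count]
    for zone, hour, count in history:
        totals[(zone, hour)][0] += count
        totals[(zone, hour)][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

def hottest_zone(averages, hour):
    """Zone with the highest average historical demand at a given hour."""
    candidates = {z: v for (z, h), v in averages.items() if h == hour}
    return max(candidates, key=candidates.get)

history = [
    ("downtown", 18, 120), ("downtown", 18, 140),
    ("airport", 18, 90), ("airport", 5, 200), ("airport", 5, 180),
]
avg = avg_requests_by_zone_hour(history)
print(hottest_zone(avg, 18))  # downtown (avg 130 vs airport's 90)
print(hottest_zone(avg, 5))   # airport
```

A production system would of course use far richer features (weather, events, real-time signals), but the core idea of estimating demand per area and time bucket is the same.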

The Takeaway

These are just a few examples of how data analytics can be used to improve business performance. As you can see, each company used data in different ways to achieve success. There is no one-size-fits-all approach to becoming a data-driven company. The important thing is to figure out:

  • What type of data will be most helpful for your business?
  • How can you best collect and clean that data?
  • How should you organize the data so it is easy to understand?
  • What insights can be gleaned from the data, and how can you apply them to your business?

Do you have any examples of data-driven companies that you admire? Let us know in the comments below! And be sure to stay tuned for our next blog post, where we'll be taking a closer look at how you can integrate data into your daily business decisions.


Christian Pillat

Franchisee, multi-time founder with deep experience & companies in Marketing, E-Commerce, and Franchise verticals.

Uber announces results for second quarter 2024.

Gross Bookings grew 19% year-over-year, or 21% on a constant currency basis
Income from operations of $796 million; Adjusted EBITDA of $1.6 billion, up 71% year-over-year
Operating cash flow of $1.8 billion; Free cash flow of $1.7 billion

SAN FRANCISCO--(BUSINESS WIRE)-- Uber Technologies, Inc. (NYSE: UBER) today announced financial results for the quarter ended June 30, 2024.

“Uber’s growth engine continues to hum, delivering our sixth consecutive quarter of trip growth above 20 percent, alongside record profitability,” said Dara Khosrowshahi, CEO. “The Uber consumer has never been stronger--more people are using the platform, and more frequently, than ever before--while drivers and couriers earned a new all-time high of $17.9 billion over the quarter.”

“Strong topline trends and operating leverage across the P&L demonstrate the durability of our growth and significant cash flow generation underlying our platform,” said Prashanth Mahendra-Rajah, CFO. “We started share repurchases against our inaugural authorization during the quarter as we continue to drive long-term shareholder return.”

Financial Highlights for Second Quarter 2024

  • Gross Bookings grew 19% year-over-year (“YoY”) to $40.0 billion, or 21% on a constant currency basis, with Mobility Gross Bookings of $20.6 billion (+23% YoY or +27% YoY constant currency) and Delivery Gross Bookings of $18.1 billion (+16% YoY or +17% YoY constant currency). Trips during the quarter grew 21% YoY to 2.8 billion, or approximately 30 million trips per day on average.
  • Revenue grew 16% YoY to $10.7 billion, or 17% on a constant currency basis. Combined Mobility and Delivery revenue grew 19% YoY to $9.4 billion, or 20% on a constant currency basis. Business model changes negatively impacted total revenue YoY growth by 7 percentage points.
  • Income from operations was $796 million, up $470 million YoY and $624 million quarter-over-quarter (“QoQ”).
  • Net income attributable to Uber Technologies, Inc. was $1.0 billion, which includes a $333 million benefit (pre-tax) due to net unrealized gains related to the revaluation of Uber’s equity investments.
  • Adjusted EBITDA of $1.6 billion, up 71% YoY. Adjusted EBITDA margin as a percentage of Gross Bookings was 3.9%, up from 2.7% in Q2 2023.
  • Net cash provided by operating activities was $1.8 billion and free cash flow, defined as net cash flows from operating activities less capital expenditures, was $1.7 billion.
  • Share repurchases: Repurchased $325 million of our common stock under the February 2024 authorization.
  • Unrestricted cash, cash equivalents, and short-term investments were $6.3 billion at the end of the second quarter.
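Free cash flow, as the release defines it, is simply operating cash flow minus capital expenditures. The capital expenditure figure below is implied by the two reported numbers rather than quoted from the release.

```python
# Free cash flow = net cash from operating activities - capital expenditures.
# Figures in $ millions, from the Q2 2024 release.
operating_cash_flow = 1_820
free_cash_flow = 1_721

# Capex is the difference implied by the two reported figures.
implied_capex = operating_cash_flow - free_cash_flow
print(implied_capex)  # 99
```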

Outlook for Q3 2024

For Q3 2024, we anticipate:

  • Our outlook assumes a roughly 4 percentage point currency headwind to total reported YoY growth, including a roughly 7 percentage point currency headwind to Mobility’s reported YoY growth.
  • For perspective, recent strengthening of the US dollar versus other currencies represents an over $400 million expected headwind to Q3 Gross Bookings and is included in our outlook.
  • Adjusted EBITDA of $1.58 billion to $1.68 billion, which represents 45% to 54% YoY growth.
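Constant currency growth, used throughout the release, restates the current period's non-USD results at the prior period's exchange rates before computing growth. A minimal sketch follows; the retranslated total is a hypothetical illustration (Uber does not disclose it here), chosen so the result matches the reported 21%.

```python
def yoy_growth(current, prior):
    """Simple year-over-year growth rate."""
    return (current - prior) / prior

# Reported Q2 Gross Bookings ($B): grew from 33.601 to 39.952.
reported = yoy_growth(39.952, 33.601)  # ~0.189 -> 19%

# For constant currency, the current quarter's non-USD bookings are
# retranslated at the prior year's FX rates. The total below is a
# hypothetical illustration consistent with the reported 21%.
retranslated_current = 40.66
constant_currency = yoy_growth(retranslated_current, 33.601)

print(round(reported * 100), round(constant_currency * 100))  # 19 21
```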

Financial and Operational Highlights for Second Quarter 2024

(In millions, except percentages)

                                                    Q2 2023     Q2 2024    % Change    % Change
                                                                                       (Constant Currency)
Monthly Active Platform Consumers (“MAPCs”)             137         156        14%
Trips                                                 2,282       2,765        21%
Gross Bookings                                      $33,601     $39,952        19%         21%
Revenue                                              $9,230     $10,700        16%         17%
Income from operations                                 $326        $796       144%
Net income attributable to Uber Technologies, Inc.     $394      $1,015       158%
Adjusted EBITDA                                        $916      $1,570        71%
Net cash provided by operating activities            $1,190      $1,820        53%
Free cash flow                                       $1,140      $1,721        51%

See “Definitions of Non-GAAP Measures” and “Reconciliations of Non-GAAP Measures” sections herein for an explanation and reconciliations of non-GAAP measures used throughout this release.

Q2 2023 net income includes a $386 million net benefit (pre-tax) from revaluations of Uber’s equity investments. Q2 2024 net income includes a $333 million net benefit (pre-tax) from revaluations of Uber’s equity investments.

Results by Offering and Segment

(In millions, except percentages)

                              Q2 2023     Q2 2024    % Change    % Change
                                                                 (Constant Currency)
Gross Bookings:
Mobility                      $16,728     $20,554        23%         27%
Delivery                       15,595      18,126        16%         17%
Freight                         1,278       1,272         —%         (1)%
Total                         $33,601     $39,952        19%         21%

Revenue:
Mobility                       $4,894      $6,134        25%         27%
Delivery                        3,057       3,293         8%          9%
Freight                         1,279       1,273         —%         (1)%
Total                          $9,230     $10,700        16%         17%

Mobility Revenue in Q2 2024 was negatively impacted by business model changes in some countries that classified certain sales and marketing costs as contra revenue by $386 million. These changes negatively impacted Mobility revenue YoY growth by 8 percentage points.

Delivery Revenue in Q2 2023 and Q2 2024 were negatively impacted by business model changes in some countries that classified certain sales and marketing costs as contra revenue by $114 million and $413 million, respectively. These changes negatively impacted Delivery revenue YoY growth by 9 percentage points for Q2 2024.

Total revenue in Q2 2023 and Q2 2024 were negatively impacted by business model changes in some countries that classified certain sales and marketing costs as contra revenue by $114 million and $799 million, respectively. These changes negatively impacted total revenue YoY growth by 7 percentage points for Q2 2024.


Revenue Margin:               Q2 2023     Q2 2024
Mobility                        29.3%       29.8%
Delivery                        19.6%       18.2%

Mobility Revenue Margin in Q2 2024 was negatively impacted by business model changes in some countries that classified certain sales and marketing costs as contra revenue by 190 bps.

Delivery Revenue Margin in Q2 2023 and Q2 2024 was negatively impacted by business model changes that classified certain sales and marketing costs as contra revenue by 70 bps and 230 bps, respectively.


(In millions, except percentages)

Segment Adjusted EBITDA:                  Q2 2023     Q2 2024    % Change
Mobility                                   $1,170      $1,567        34%
Delivery                                      329         588        79%
Freight                                       (14)        (12)       14%
Corporate G&A and Platform R&D               (569)       (573)       (1)%
Adjusted EBITDA                              $916      $1,570        71%

Includes costs that are not directly attributable to our reportable segments. Corporate G&A also includes certain shared costs such as finance, accounting, tax, human resources, information technology and legal costs. Platform R&D also includes mapping and payment technologies and support and development of the internal technology infrastructure. Our allocation methodology is periodically evaluated and may change.

“Adjusted EBITDA” is a non-GAAP measure as defined by the SEC. See “Definitions of Non-GAAP Measures” and “Reconciliations of Non-GAAP Measures” sections herein for an explanation and reconciliations of non-GAAP measures used throughout this release.

Financial Highlights for the Second Quarter 2024 (continued)

  • Revenue of $6.1 billion: Mobility Revenue grew 25% YoY and 9% QoQ. The YoY increase was primarily attributable to an increase in Mobility Gross Bookings due to an increase in Trip volumes. Mobility Revenue Margin of 29.8% increased 50 bps YoY and decreased 40 bps QoQ. Business model changes negatively impacted Mobility Revenue Margin by 190 bps in Q2 2024.
  • Adjusted EBITDA of $1.6 billion: Mobility Adjusted EBITDA increased 34% YoY, and Mobility Adjusted EBITDA margin was 7.6% of Gross Bookings compared to 7.0% in Q2 2023 and 7.9% in Q1 2024. Mobility Adjusted EBITDA margin improvement YoY was primarily driven by better cost leverage from higher volume.
  • Revenue of $3.3 billion: Delivery Revenue grew 8% YoY and 2% QoQ. The YoY increase was primarily attributable to an increase in Delivery Gross Bookings due to an increase in Trip volumes. Delivery Revenue Margin of 18.2% decreased 140 bps YoY and was flat QoQ. Business model changes negatively impacted Delivery Revenue Margin by 230 bps in Q2 2024.
  • Adjusted EBITDA of $588 million: Delivery Adjusted EBITDA increased 79% YoY, and Delivery Adjusted EBITDA margin was 3.2% of Gross Bookings, compared to 2.1% in Q2 2023 and 3.0% in Q1 2024. Delivery Adjusted EBITDA margin improvement YoY was primarily driven by better cost leverage from higher volumes and increased Advertising revenue.
  • Revenue of $1.3 billion: Freight Revenue was flat YoY and declined 1% QoQ. Revenue was flat YoY driven by an increase in loads, offset by a decrease in revenue per load as a result of the challenging freight market cycle.
  • Adjusted EBITDA loss of $12 million: Freight Adjusted EBITDA increased $2 million YoY. Freight Adjusted EBITDA margin as a percentage of Gross Bookings increased 20 bps YoY to (0.9%).
  • Corporate G&A and Platform R&D: Corporate G&A and Platform R&D expenses of $573 million, compared to $569 million in Q2 2023, and $604 million in Q1 2024. Corporate G&A and Platform R&D as a percentage of Gross Bookings decreased 30 bps YoY and 20 bps QoQ primarily due to certain one-time benefits and improved fixed cost leverage.
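The segment margin figures quoted above follow from dividing Segment Adjusted EBITDA by segment Gross Bookings. A quick check against the reported Q2 2024 numbers:

```python
# Adjusted EBITDA margin is expressed as a percentage of Gross Bookings
# in the release. Figures in $ millions for Q2 2024.
mobility = {"adj_ebitda": 1_567, "gross_bookings": 20_554}
delivery = {"adj_ebitda": 588, "gross_bookings": 18_126}

def margin_pct(segment):
    """Adjusted EBITDA as a percentage of Gross Bookings."""
    return segment["adj_ebitda"] / segment["gross_bookings"] * 100

print(round(margin_pct(mobility), 1))  # 7.6
print(round(margin_pct(delivery), 1))  # 3.2
```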

GAAP and Non-GAAP Costs and Operating Expenses

  • Cost of revenue excluding D&A: GAAP cost of revenue was $6.5 billion. Non-GAAP cost of revenue was $6.4 billion, representing 16.0% of Gross Bookings, compared to 16.4% in both Q2 2023 and Q1 2024. On a YoY basis, non-GAAP cost of revenue as a percentage of Gross Bookings decreased due to improved cost leverage with Gross Bookings growth outpacing cost of revenue growth.
  • Operations and support: GAAP operations and support was $682 million. Non-GAAP operations and support was $621 million, representing 1.6% of Gross Bookings, compared to 1.8% and 1.6% in Q2 2023 and Q1 2024, respectively. On a YoY basis, non-GAAP operations and support as a percentage of Gross Bookings decreased due to improved fixed cost leverage.
  • Sales and marketing: GAAP sales and marketing was $1.1 billion. Non-GAAP sales and marketing was $1.1 billion, representing 2.7% of Gross Bookings, compared to 3.5% and 2.4% in Q2 2023 and Q1 2024, respectively. On a YoY basis, non-GAAP sales and marketing as a percentage of Gross Bookings decreased due to business model changes in some countries that classified certain sales and marketing costs as contra revenue.
  • Research and development: GAAP research and development was $760 million. Non-GAAP research and development was $483 million, representing 1.2% of Gross Bookings, compared to 1.5% and 1.3% in Q2 2023 and Q1 2024, respectively. On a YoY basis, non-GAAP research and development as a percentage of Gross Bookings decreased due to a decrease in employee headcount costs.
  • General and administrative: GAAP general and administrative was $686 million. Non-GAAP general and administrative was $523 million, representing 1.3% of Gross Bookings, compared to 1.5% in both Q2 2023 and Q1 2024. On a YoY basis, non-GAAP general and administrative as a percentage of Gross Bookings decreased due to improved fixed cost leverage.

Operating Highlights for the Second Quarter 2024

  • Monthly Active Platform Consumers (“MAPCs”) reached 156 million: MAPCs grew 14% YoY to 156 million, driven by continued improvement in consumer activity for both our Mobility and Delivery offerings.
  • Trips of 2.8 billion: Trips on our platform grew 21% YoY, driven by both Mobility and Delivery growth. Monthly trips per MAPC reached an all-time high and grew 6% YoY to 5.9.
  • Supporting earners: Drivers and couriers earned an aggregate $17.9 billion (including tips) during the quarter, with earnings up 19% YoY, or 23% on a constant currency basis.
  • Membership: Launched Uber One, our single cross-platform membership program, in Guatemala, El Salvador and Panama. Uber One is now available across 28 countries. In addition, launched Uber One for Students for $4.99 a month or $48 annually, offering membership benefits and exclusive promos to college students across the US with more countries to come this year.
  • Advertising: Our revenue run-rate from Advertising exceeded $1 billion. Expanded Journey Ads to programmatic buyers, enabling advertisers across our top markets to more easily buy Journey Ads through their preferred demand-side platforms, increasing our addressable market.
  • Maps innovation: Launched and scaled new Uber Maps features to improve and enhance the earner and consumer experience, including launching the ability for earners to provide real-time inputs on road closures, turn restrictions and more; launching street-level imagery for riders to help them navigate to their designated pickup location; and scaling wayfinding at airports to enable step-by-step directions to pickup at our top global airports.
  • Earner and rider safety: Launched safety preferences – a new way for riders to set and forget Uber’s suite of safety features – globally. Expanded the Record My Ride feature and Voice Audio Seatbelt reminders, reminding riders to wear their seatbelt while simultaneously sending them an in-app push notification, broadly across the US and LatAm. In addition, introduced a new in-app verified rider badge in over a dozen US cities.
  • Uber for Business (“U4B”) delegate profiles: Launched U4B delegate profiles, a new tool that gives executive assistants or other non-traveler bookers the capability to request and schedule rides on an executive’s behalf. The product streamlines trip management and offers a ride experience that meets executive standards with top-quality cars and premium service.
  • Low cost Mobility offerings: Launched Scheduled UberX Share, allowing consumers to pre-book their ride to save about 25% vs. UberX. Launched Uber Shuttle in the US, letting consumers reserve up to five seats to or from an airport, concert or sporting event, up to seven days in advance. In addition, announced our license to operate buses via Uber Shuttle in Delhi, India, making Uber the first ride-hailing aggregator to be awarded such a license. Finally, expanded the Uber Shuttle service to more U4B clients globally, including Dell in Brazil and McCormick in Mexico.
  • EMEA expansion: Launched UberX in Hungary and Luxembourg through partnerships with FoTaxi and Webtaxi, respectively. In addition, launched UberX Share and Trains in Spain; UberX Share in the Netherlands; and Transit in Austria.
  • Electrification updates: In July, announced a multi-year strategic partnership with BYD to bring 100,000 new electric vehicles onto the Uber platform across key global markets, with plans to collaborate on future BYD autonomous-capable vehicles to be deployed on the Uber platform. In addition, launched the PowerUp Package in the UK, giving eligible drivers up to £5,000 toward their next vehicle, with discounts of up to £17,000 on select Kia models and £750 in bp Pulse charging credits.
  • Instacart partnership: Partnered with Instacart to power US restaurant delivery within the Instacart app, enabling Instacart customers nationwide to order from hundreds of thousands of restaurants using an Uber Eats interface.
  • Expanded Costco partnership: Expanded our partnership with Costco to additional states and provinces across the US and Canada. Through the partnership, Costco members see additional savings and receive a 20% discount on an annual Uber One membership. Costco is currently available on Uber Eats in the US, Canada, Mexico and Japan.
  • Grocery & Retail merchant selection: Launched partnerships with popular grocery stores and retailers including The Vitamin Shoppe, GNC and Save A Lot in the US; and 7-Eleven in Mexico. Expanded our partnership with Rite Aid to include alcohol delivery for nearly 1,000 locations across eight US states. In addition, strengthened our partnership with Lawson convenience stores in Japan by agreeing to expand quick commerce offerings and integrating their stock management system into the Uber Eats app.
  • Commitment to affordability: Increased merchant-funded offers available on our platform — such as discounts; buy one, get one (“BOGO”) deals; and more — by over 70% YoY on a constant currency basis through improved offer quality and the launch of Happy Hour offers. In addition, launched multi-location ads and budget tooling to improve the enterprise merchant experience of setting up offers across multiple store locations.
  • Merchant growth and discovery features: Revamped the Uber Eats Manager software to provide personalized growth recommendations to merchants, such as running a promotion on a certain dish or adding photos to menu listings. In addition, announced plans to launch a short-form video feed to boost discovery and help restaurants showcase their dishes.
  • Procurement platform expansion: Building upon the success and rapid adoption of Uber Freight Exchange: Contract, expanded the platform to include Uber Freight Exchange: Spot, enabling carriers to use intelligent search, bidding, upfront pricing, and automated tendering for spot freight fulfillment.
  • Aurora partnership expansion: Launched Premier Autonomy with Aurora, an industry first program to democratize access to driverless trucks for carriers of all sizes. In addition, announced that Uber Freight will be one of Aurora's first customers on its Dallas-to-Houston freight route, with driverless hauls for shippers expected at the end of 2024.

Webcast and conference call information

A live audio webcast of our second quarter ended June 30, 2024 earnings release call will be available at https://investor.uber.com/ , along with the earnings press release and slide presentation. The call begins on August 6, 2024 at 5:00 AM (PT) / 8:00 AM (ET). This press release, including the reconciliations of certain non-GAAP measures to their nearest comparable GAAP measures, is also available on that site.

We also provide announcements regarding our financial performance and other matters, including SEC filings, investor events, press and earnings releases, on our investor relations website ( https://investor.uber.com/ ), and our blogs ( https://uber.com/blog ) and Twitter accounts (@uber and @dkhos), as a means of disclosing material information and complying with our disclosure obligations under Regulation FD.

Uber’s mission is to create opportunity through movement. We started in 2010 to solve a simple problem: how do you get access to a ride at the touch of a button? More than 52 billion trips later, we're building products to get people closer to where they want to be. By changing how people, food, and things move through cities, Uber is a platform that opens up the world to new possibilities.

Forward-Looking Statements

This press release contains forward-looking statements regarding our future business expectations which involve risks and uncertainties. Actual results may differ materially from the results predicted, and reported results should not be considered as an indication of future performance. Forward-looking statements include all statements that are not historical facts and can be identified by terms such as “anticipate,” “believe,” “contemplate,” “continue,” “could,” “estimate,” “expect,” “hope,” “intend,” “may,” “might,” “objective,” “ongoing,” “plan,” “potential,” “predict,” “project,” “should,” “target,” “will,” or “would” or similar expressions and the negatives of those terms. Forward-looking statements involve known and unknown risks, uncertainties and other factors that may cause our actual results, performance or achievements to be materially different from any future results, performance or achievements expressed or implied by the forward-looking statements. These risks, uncertainties and other factors relate to, among others: competition, managing our growth and corporate culture, financial performance, investments in new products or offerings, our ability to attract drivers, consumers and other partners to our platform, our brand and reputation and other legal and regulatory developments, particularly with respect to our relationships with drivers and couriers and the impact of the global economy, including rising inflation and interest rates. For additional information on other potential risks and uncertainties that could cause actual results to differ from the results predicted, please see our annual report on Form 10-K for the year ended December 31, 2023 and subsequent quarterly reports and other filings filed with the Securities and Exchange Commission from time to time. 
All information provided in this release and in the attachments is as of the date of this press release and any forward-looking statements contained herein are based on assumptions that we believe to be reasonable as of this date. Undue reliance should not be placed on the forward-looking statements in this press release, which are based on information available to us on the date hereof. We undertake no duty to update this information unless required by law.

Non-GAAP Financial Measures

To supplement our financial information, which is prepared and presented in accordance with generally accepted accounting principles in the United States of America (“GAAP”), we use the following non-GAAP financial measures: Adjusted EBITDA; Free cash flow; Non-GAAP Costs and Operating Expenses; as well as revenue growth rates in constant currency. The presentation of this financial information is not intended to be considered in isolation or as a substitute for, or superior to, the financial information prepared and presented in accordance with GAAP. We use these non-GAAP financial measures for financial and operational decision-making and as a means to evaluate period-to-period comparisons. We believe that these non-GAAP financial measures provide meaningful supplemental information regarding our performance by excluding certain items that may not be indicative of our recurring core business operating results.

We believe that both management and investors benefit from referring to these non-GAAP financial measures in assessing our performance and when planning, forecasting, and analyzing future periods. These non-GAAP financial measures also facilitate management’s internal comparisons to our historical performance. We believe these non-GAAP financial measures are useful to investors both because (1) they allow for greater transparency with respect to key metrics used by management in its financial and operational decision-making and (2) they are used by our institutional investors and the analyst community to help them analyze the health of our business.

There are a number of limitations related to the use of non-GAAP financial measures. In light of these limitations, we provide specific information regarding the GAAP amounts excluded from these non-GAAP financial measures and recommend evaluating these non-GAAP financial measures together with their relevant financial measures in accordance with GAAP.

For more information on these non-GAAP financial measures, please see the sections titled “Key Terms for Our Key Metrics and Non-GAAP Financial Measures,” “Definitions of Non-GAAP Measures” and “Reconciliations of Non-GAAP Measures” included at the end of this release. With regard to forward-looking non-GAAP guidance, we are not able to reconcile the forward-looking non-GAAP Adjusted EBITDA measure to the closest corresponding GAAP measure without unreasonable efforts because we are unable to predict the ultimate outcome of certain significant items. These items include, but are not limited to, significant legal settlements, unrealized gains and losses on equity investments, tax and regulatory reserve changes, restructuring costs and acquisition and financing related impacts.


CONDENSED CONSOLIDATED BALANCE SHEETS
(In millions) (Unaudited)

                                                      December 31,    June 30,
                                                              2023        2024
Assets
Cash and cash equivalents                                   $4,680      $4,497
Short-term investments                                         727       1,795
Restricted cash and cash equivalents                           805         776
Accounts receivable, net                                     3,404       3,783
Prepaid expenses and other current assets                    1,681       1,632
Total current assets                                        11,297      12,483
Restricted cash and cash equivalents                         1,519       2,608
Restricted investments                                       4,779       5,061
Investments                                                  6,101       6,203
Equity method investments                                      353         342
Property and equipment, net                                  2,073       2,034
Operating lease right-of-use assets                          1,241       1,181
Intangible assets, net                                       1,425       1,265
Goodwill                                                     8,151       8,083
Other assets                                                 1,760       2,254
Total assets                                               $38,699     $41,514

Liabilities, redeemable non-controlling interests and equity
Accounts payable                                              $790        $752
Short-term insurance reserves                                2,016       2,387
Operating lease liabilities, current                           190         198
Accrued and other current liabilities                        6,458       6,981
Total current liabilities                                    9,454      10,318
Long-term insurance reserves                                 4,722       5,733
Long-term debt, net of current portion                       9,459       9,454
Operating lease liabilities, non-current                     1,550       1,492
Other long-term liabilities                                    832         734
Total liabilities                                           26,017      27,731
Redeemable non-controlling interests                           654         631
Equity
Common stock                                                     —           —
Additional paid-in capital                                  42,264      43,062
Accumulated other comprehensive loss                          (421)       (479)
Accumulated deficit                                        (30,594)    (30,233)
Total Uber Technologies, Inc. stockholders' equity          11,249      12,350
Non-redeemable non-controlling interests                       779         802
Total equity                                                12,028      13,152
Total liabilities, redeemable non-controlling
  interests and equity                                     $38,699     $41,514


CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS
(In millions, except share amounts which are reflected in thousands, and per share amounts)
(Unaudited)

                                                   Three Months Ended      Six Months Ended
                                                        June 30,               June 30,
                                                     2023        2024       2023        2024
Revenue                                            $9,230     $10,700    $18,053     $20,831
Costs and expenses
Cost of revenue, exclusive of depreciation
  and amortization shown separately below           5,515       6,488     10,774      12,656
Operations and support                                664         682      1,304       1,367
Sales and marketing                                 1,218       1,115      2,480       2,032
Research and development                              808         760      1,583       1,550
General and administrative                            491         686      1,433       1,895
Depreciation and amortization                         208         173        415         363
Total costs and expenses                            8,904       9,904     17,989      19,863
Income from operations                                326         796         64         968
Interest expense                                     (144)       (139)      (312)       (263)
Other income (expense), net                           273         420        565        (258)
Income before income taxes and income (loss)
  from equity method investments                      455       1,077        317         447
Provision for income taxes                             65          57        120          86
Income (loss) from equity method investments            4         (12)        40         (16)
Net income including non-controlling interests        394       1,008        237         345
Less: net loss attributable to non-controlling
  interests, net of tax                                 —          (7)         —         (16)
Net income attributable to Uber Technologies, Inc.   $394      $1,015       $237        $361

Net income per share attributable to Uber Technologies, Inc. common stockholders:
Basic                                               $0.19       $0.49      $0.12       $0.17
Diluted                                             $0.18       $0.47      $0.10       $0.15

Weighted-average shares used to compute net income per share:
Basic                                           2,026,813   2,092,180  2,018,233   2,085,324
Diluted                                         2,079,265   2,150,019  2,066,260   2,151,647

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(in millions)

                                                         Three Months Ended June 30,     Six Months Ended June 30,
                                                             2023          2024              2023          2024
Cash flows from operating activities
Net income including non-controlling interests            $    394      $  1,008          $    237      $    345
Adjustments to reconcile net income to net cash provided by operating activities:
Depreciation and amortization                                  208           181               415           375
Bad debt expense                                                24             9                44            35
Stock-based compensation                                       504           455               974           939
Deferred income taxes                                            6            (7)               16           (23)
Loss (income) from equity method investments, net               (4)           12               (40)           16
Unrealized (gain) loss on debt and equity securities, net     (386)         (333)             (706)          388
Loss from sale of investment                                    74             —                74             —
Impairments of goodwill, long-lived assets and other assets     11             —                78             —
Unrealized foreign currency transactions                         2            59                85           209
Other                                                            6           (79)               10          (138)
Change in assets and liabilities, net of impact of business acquisitions and disposals:
Accounts receivable                                            (13)         (162)              155          (584)
Prepaid expenses and other assets                             (114)         (108)             (233)         (430)
Operating lease right-of-use assets                             42            47                94            93
Accounts payable                                               (19)          (70)              (26)          (24)
Accrued insurance reserves                                     588           692               938         1,385
Accrued expenses and other liabilities                         (87)          141              (229)          731
Operating lease liabilities                                    (46)          (25)              (90)          (81)
Net cash provided by operating activities                    1,190         1,820             1,796         3,236
Cash flows from investing activities
Purchases of property and equipment                            (50)          (99)             (107)         (156)
Purchases of non-marketable equity securities                    —           (58)                —          (232)
Purchases of marketable securities                          (1,361)       (3,288)           (2,207)       (5,317)
Proceeds from maturities and sales of marketable securities  1,127         1,821             1,627         3,851
Proceeds from sale of equity method investment                 703             8               703            17
Other investing activities                                     (11)          (60)               (7)          (81)
Net cash provided by (used in) investing activities            408        (1,676)                9        (1,918)
Cash flows from financing activities
Issuance of term loans and notes, net of issuance costs          —             —             1,121             —
Principal repayment on term loan and notes                      (7)           (7)           (1,144)          (13)
Principal payments on finance leases                           (42)          (35)              (82)          (77)
Proceeds from the issuance of common stock under the
  Employee Stock Purchase Plan                                  85           103                85           103
Repurchases of common stock                                      —          (325)                —          (325)
Other financing activities                                       6            73               (45)           21
Net cash provided by (used in) financing activities             42          (191)              (65)         (291)
Effect of exchange rate changes on cash and cash
  equivalents, and restricted cash and cash equivalents         27           (56)               43          (150)
Net increase (decrease) in cash and cash equivalents,
  and restricted cash and cash equivalents                   1,667          (103)            1,783           877
Cash and cash equivalents, and restricted cash and cash equivalents:
Beginning of period                                          6,793         7,984             6,677         7,004
End of period                                             $  8,460      $  7,881          $  8,460      $  7,881

Other Income (Expense), Net

The following table presents other income (expense), net (in millions):

                                                         Three Months Ended June 30,     Six Months Ended June 30,
                                                             2023          2024              2023          2024
Interest income                                           $    107      $    176          $    194      $    335
Foreign currency exchange gains (losses), net                    1           (83)              (93)         (247)
Unrealized gain (loss) on debt and equity securities, net      386           333               706          (388)
Loss from sale of investment                                   (74)            —               (74)            —
Other, net                                                    (147)           (6)             (168)           42
Other income (expense), net                               $    273      $    420          $    565      $   (258)

During the three and six months ended June 30, 2023, unrealized gain on debt and equity securities, net represents changes in the fair value of our equity securities, primarily including: a $466 million and $521 million unrealized gain on our Aurora investment, respectively; a $151 million and $177 million unrealized gain on our Joby investment, respectively; a $225 million and $113 million unrealized gain on our Grab investment, respectively; partially offset by a $461 million and $104 million unrealized loss on our Didi investment, respectively.

 

During the three months ended June 30, 2024, unrealized gain on debt and equity securities, net represents changes in the fair value of our equity securities, primarily including: a $220 million unrealized gain on our Grab investment, and a $178 million unrealized gain on our Didi investment.

 

During the six months ended June 30, 2024, unrealized loss on debt and equity securities, net represents changes in the fair value of our equity securities, primarily including: a $522 million unrealized loss on our Aurora investment; partially offset by a $109 million gain on our Didi investment and a $96 million gain on our Grab investment.

 

During the three and six months ended June 30, 2023, loss from sale of investment represents an immaterial loss recognized on the sale of our remaining 29% equity interest in MLU B.V. to Yandex, for $703 million in cash. After this transaction, we no longer have an equity interest in MLU B.V.

Stock-Based Compensation Expense

The following table summarizes total stock-based compensation expense by function (in millions):

                                                         Three Months Ended June 30,     Six Months Ended June 30,
                                                             2023          2024              2023          2024
Operations and support                                    $     45      $     54          $     83      $    121
Sales and marketing                                             26            24                50            45
Research and development                                       317           277               607           576
General and administrative                                     116           100               234           197
Total                                                     $    504      $    455          $    974      $    939

Key Terms for Our Key Metrics and Non-GAAP Financial Measures

Adjusted EBITDA. Adjusted EBITDA is a Non-GAAP measure. We define Adjusted EBITDA as net income (loss), excluding (i) income (loss) from discontinued operations, net of income taxes, (ii) net income (loss) attributable to non-controlling interests, net of tax, (iii) provision for (benefit from) income taxes, (iv) income (loss) from equity method investments, (v) interest expense, (vi) other income (expense), net, (vii) depreciation and amortization, (viii) stock-based compensation expense, (ix) certain legal, tax, and regulatory reserve changes and settlements, (x) goodwill and asset impairments/loss on sale of assets, (xi) acquisition, financing and divestitures related expenses, (xii) restructuring and related charges and (xiii) other items not indicative of our ongoing operating performance.

Adjusted EBITDA margin. We define Adjusted EBITDA margin as Adjusted EBITDA as a percentage of Gross Bookings. We define incremental margin as the change in Adjusted EBITDA between periods divided by the change in Gross Bookings between periods.

Aggregate Driver and Courier Earnings. Aggregate Driver and Courier Earnings refers to fares (net of Uber service fee, taxes and tolls), tips, Driver incentives and Driver benefits.

Driver(s). The term Driver collectively refers to independent providers of ride or delivery services who use our platform to provide Mobility or Delivery services, or both.

Driver or restaurant earnings. Driver or restaurant earnings refer to the net portion of the fare or the net portion of the order value that a Driver or a restaurant retains, respectively. These are generally included in aggregate Drivers and Couriers earnings.

Driver incentives. Driver incentives refer to payments that we make to Drivers, which are separate from and in addition to the Driver’s portion of the fare paid by the consumer after we retain our service fee to Drivers. For example, Driver incentives could include payments we make to Drivers should they choose to take advantage of an incentive offer and complete a consecutive number of trips or a cumulative number of trips on the platform over a defined period of time. Driver incentives are recorded as a reduction of revenue or cost of revenue, exclusive of depreciation and amortization. These incentives are generally included in aggregate Drivers and Couriers earnings.

Free cash flow. Free cash flow is a Non-GAAP measure. We define free cash flow as net cash flows from operating activities less capital expenditures.

Gross Bookings. We define Gross Bookings as the total dollar value, including any applicable taxes, tolls, and fees, of: Mobility rides, Delivery orders (in each case without any adjustment for consumer discounts and refunds, Driver and Merchant earnings, and Driver incentives) and Freight Revenue. Gross Bookings do not include tips earned by Drivers. Gross Bookings are an indication of the scale of our current platform, which ultimately impacts revenue.

Monthly Active Platform Consumers (“MAPCs”). We define MAPCs as the number of unique consumers who completed a Mobility ride or received a Delivery order on our platform at least once in a given month, averaged over each month in the quarter. While a unique consumer can use multiple product offerings on our platform in a given month, that unique consumer is counted as only one MAPC.
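The MAPC counting rule above can be sketched in a few lines of Python. This is an illustrative example with hypothetical trip records, not Uber's implementation: a consumer who takes many trips in one month still counts once for that month, and the quarterly figure is the average of the three monthly counts.

```python
from datetime import date

# Hypothetical completed-trip records: (consumer_id, trip_date) pairs.
trips = [
    ("c1", date(2024, 4, 3)), ("c1", date(2024, 4, 20)),   # c1 active in April only
    ("c2", date(2024, 4, 9)), ("c2", date(2024, 5, 14)), ("c2", date(2024, 6, 1)),
    ("c3", date(2024, 6, 27)),
]

# Unique active consumers per month of the quarter.
months = [(2024, 4), (2024, 5), (2024, 6)]
actives_per_month = [
    len({cid for cid, d in trips if (d.year, d.month) == m}) for m in months
]

# MAPCs = monthly unique actives, averaged over the months in the quarter.
mapcs = sum(actives_per_month) / len(months)
print(actives_per_month, mapcs)  # [2, 1, 2] and 5/3
```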

Revenue Margin. We define Revenue Margin as revenue as a percentage of Gross Bookings.

Segment Adjusted EBITDA. We define each segment’s Adjusted EBITDA as segment revenue less the following direct costs and expenses of that segment: (i) cost of revenue, exclusive of depreciation and amortization; (ii) operations and support; (iii) sales and marketing; (iv) research and development; and (v) general and administrative. Segment Adjusted EBITDA also reflects any applicable exclusions from Adjusted EBITDA.

Segment Adjusted EBITDA margin. We define each segment’s Adjusted EBITDA margin as the segment Adjusted EBITDA as a percentage of segment Gross Bookings.

Trips. We define Trips as the number of completed consumer Mobility rides and Delivery orders in a given period. For example, an UberX Share ride with three paying consumers represents three unique Trips, whereas an UberX ride with three passengers represents one Trip. We believe that Trips are a useful metric to measure the scale and usage of our platform.

Definitions of Non-GAAP Measures

We collect and analyze operating and financial data to evaluate the health of our business and assess our performance. In addition to revenue, net income (loss), income (loss) from operations, and other results under GAAP, we use Adjusted EBITDA, free cash flow, Non-GAAP costs and operating expenses, as well as revenue growth rates in constant currency, which are described below, to evaluate our business. We have included these non-GAAP financial measures because they are key measures used by our management to evaluate our operating performance. Accordingly, we believe that these non-GAAP financial measures provide useful information to investors and others in understanding and evaluating our operating results in the same manner as our management team and board of directors. Our calculation of these non-GAAP financial measures may differ from similarly-titled non-GAAP measures, if any, reported by our peer companies. These non-GAAP financial measures should not be considered in isolation from, or as substitutes for, financial information prepared in accordance with GAAP.

Adjusted EBITDA

We define Adjusted EBITDA as net income (loss), excluding (i) income (loss) from discontinued operations, net of income taxes, (ii) net income (loss) attributable to non-controlling interests, net of tax, (iii) provision for (benefit from) income taxes, (iv) income (loss) from equity method investments, (v) interest expense, (vi) other income (expense), net, (vii) depreciation and amortization, (viii) stock-based compensation expense, (ix) certain legal, tax, and regulatory reserve changes and settlements, (x) goodwill and asset impairments/loss on sale of assets, (xi) acquisition, financing and divestitures related expenses, (xii) restructuring and related charges and (xiii) other items not indicative of our ongoing operating performance.

We have included Adjusted EBITDA because it is a key measure used by our management team to evaluate our operating performance, generate future operating plans, and make strategic decisions, including those relating to operating expenses. Accordingly, we believe that Adjusted EBITDA provides useful information to investors and others in understanding and evaluating our operating results in the same manner as our management team and board of directors. In addition, it provides a useful measure for period-to-period comparisons of our business, as it removes the effect of certain non-cash expenses and certain variable charges.
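As a worked example of the definition, the following sketch applies the listed exclusions to the three months ended June 30, 2024, using the figures (in $ millions) from the statements and reconciliation elsewhere in this release. Items with no adjustment in the period (discontinued operations, goodwill and asset impairments, other items) are omitted.

```python
# Starting point: net income attributable to Uber Technologies, Inc.,
# Q2 2024, in $ millions (from this release).
net_income_attributable = 1_015

# Add back (or deduct) each excluded item per the Adjusted EBITDA definition.
adjustments = {
    "net loss attributable to non-controlling interests, net of tax": -7,
    "(income) loss from equity method investments": 12,
    "provision for income taxes": 57,
    "other (income) expense, net": -420,
    "interest expense": 139,
    "depreciation and amortization": 173,
    "stock-based compensation expense": 455,
    "legal, tax, and regulatory reserve changes and settlements": 134,
    "acquisition, financing and divestitures related expenses": 3,
    "restructuring and related charges, net": 9,
}

adjusted_ebitda = net_income_attributable + sum(adjustments.values())
print(adjusted_ebitda)  # 1570
```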

Legal, tax, and regulatory reserve changes and settlements

Legal, tax, and regulatory reserve changes and settlements are primarily related to certain significant legal proceedings or governmental investigations concerning worker classification definitions, or to tax agencies challenging our non-income tax positions. These matters have limited precedent, cover extended historical periods and are unpredictable in both magnitude and timing, and are therefore distinct from the normal, recurring legal, tax and regulatory matters and related expenses incurred in our ongoing operating performance.

Limitations of Non-GAAP Financial Measures and Adjusted EBITDA Reconciliation

Adjusted EBITDA has limitations as a financial measure, should be considered as supplemental in nature, and is not meant as a substitute for the related financial information prepared in accordance with GAAP. These limitations include the following:

  • Adjusted EBITDA excludes certain recurring, non-cash charges, such as depreciation of property and equipment and amortization of intangible assets, and although these are non-cash charges, the assets being depreciated and amortized may have to be replaced in the future, and Adjusted EBITDA does not reflect all cash capital expenditure requirements for such replacements or for new capital expenditure requirements;
  • Adjusted EBITDA excludes stock-based compensation expense, which has been, and will continue to be for the foreseeable future, a significant recurring expense in our business and an important part of our compensation strategy;
  • Adjusted EBITDA excludes certain restructuring and related charges, part of which may be settled in cash;
  • Adjusted EBITDA excludes other items not indicative of our ongoing operating performance;
  • Adjusted EBITDA does not reflect period-to-period changes in taxes, income tax expense or the cash necessary to pay income taxes;
  • Adjusted EBITDA does not reflect the components of other income (expense), net, which primarily includes: interest income; foreign currency exchange gains (losses), net; and unrealized gain (loss) on debt and equity securities, net; and
  • Adjusted EBITDA excludes certain legal, tax, and regulatory reserve changes and settlements that may reduce cash available to us.

Constant Currency

We compare the percent change in our current period results from the corresponding prior period using constant currency disclosure. We present constant currency growth rate information to provide a framework for assessing how our underlying revenue performed excluding the effect of foreign currency rate fluctuations. We calculate constant currency by translating our current period financial results using the corresponding prior period’s monthly exchange rates for our transacted currencies other than the U.S. dollar.
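The translation described above can be sketched as follows. All figures and exchange rates here are hypothetical, chosen only to show the mechanics: current-period local-currency results are translated at the prior period's monthly rates, so the constant-currency growth rate strips out the FX move.

```python
# USD per local-currency unit, by month (hypothetical rates).
prior_rates = {"Jan": 0.92, "Feb": 0.93, "Mar": 0.91}    # prior period
current_rates = {"Jan": 0.84, "Feb": 0.85, "Mar": 0.83}  # current period (currency weakened)

# Current-period revenue in the local currency, by month (hypothetical).
local_revenue = {"Jan": 100.0, "Feb": 110.0, "Mar": 120.0}

# As reported: translate at the current period's own monthly rates.
as_reported = sum(local_revenue[m] * current_rates[m] for m in local_revenue)

# Constant currency: translate at the corresponding prior-period monthly rates.
constant_currency = sum(local_revenue[m] * prior_rates[m] for m in local_revenue)

prior_period_usd = 280.0  # prior-period revenue in USD (hypothetical)
growth_reported = as_reported / prior_period_usd - 1
growth_cc = constant_currency / prior_period_usd - 1
print(round(growth_reported, 4), round(growth_cc, 4))
```

Here local-currency revenue grew, but the weaker currency drags the as-reported growth rate below the constant-currency one.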

Free Cash Flow

We define free cash flow as net cash flows from operating activities less capital expenditures.
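The definition is a single subtraction. Using the six months ended June 30, 2024 figures (in $ millions) from the cash flow statement in this release:

```python
# Free cash flow = net cash from operating activities - capital expenditures.
net_cash_from_operating_activities = 3_236   # six months ended June 30, 2024
purchases_of_property_and_equipment = 156    # capital expenditures, same period

free_cash_flow = net_cash_from_operating_activities - purchases_of_property_and_equipment
print(free_cash_flow)  # 3080
```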

Non-GAAP Costs and Operating Expenses

Costs and operating expenses are defined as: cost of revenue, exclusive of depreciation and amortization; operations and support; sales and marketing; research and development; and general and administrative expenses. We define Non-GAAP costs and operating expenses as costs and operating expenses excluding: (i) stock-based compensation expense, (ii) certain legal, tax, and regulatory reserve changes and settlements, (iii) goodwill and asset impairments/loss on sale of assets, (iv) acquisition, financing and divestiture related expenses, (v) restructuring and related charges and (vi) other items not indicative of our ongoing operating performance.

Reconciliations of Non-GAAP Measures

The following table presents reconciliations of Adjusted EBITDA to the most directly comparable GAAP financial measure for each of the periods indicated (in millions):

                                                         Three Months Ended June 30,     Six Months Ended June 30,
                                                             2023          2024              2023          2024
Net income attributable to Uber Technologies, Inc.        $    394      $  1,015          $    237      $    361
Add (deduct):
  Net loss attributable to non-controlling
    interests, net of tax                                        —            (7)                —           (16)
  (Income) loss from equity method investments                  (4)           12               (40)           16
  Provision for income taxes                                    65            57               120            86
  Other (income) expense, net                                 (273)         (420)             (565)          258
  Interest expense                                             144           139               312           263
Income from operations                                         326           796                64           968
Add (deduct):
  Depreciation and amortization                                208           173               415           363
  Stock-based compensation expense                             504           455               974           939
  Legal, tax, and regulatory reserve changes
    and settlements                                           (155)          134                95           661
  Goodwill and asset impairments/loss on sale of assets         16             —                83            (3)
  Acquisition, financing and divestitures
    related expenses                                            10             3                18             8
  Gain on lease arrangement, net                                (2)            —                (3)            —
  Restructuring and related charges, net                         9             9                31            16
Adjusted EBITDA                                           $    916      $  1,570          $  1,677      $  2,952

The following table presents reconciliations of free cash flow to the most directly comparable GAAP financial measure for each of the periods indicated (in millions):

                                                         Three Months Ended June 30,     Six Months Ended June 30,
                                                             2023          2024              2023          2024
Net cash provided by operating activities                 $  1,190      $  1,820          $  1,796      $  3,236
Purchases of property and equipment                            (50)          (99)             (107)         (156)
Free cash flow                                            $  1,140      $  1,721          $  1,689      $  3,080

The following tables present reconciliations of Non-GAAP costs and operating expenses to the most directly comparable GAAP financial measure for each of the periods indicated, showing the adjustments excluded from each expense line (in millions):

                                                                   Three Months Ended
                                                    June 30, 2023    March 31, 2024    June 30, 2024
Cost of revenue, exclusive of depreciation and amortization
  Legal, tax, and regulatory reserve changes
    and settlements                                       —                 —               (76)

Operations and support
  Restructuring and related charges                      (1)               (2)               (7)
  Acquisition, financing and divestitures
    related expenses                                     (3)                —                 —
  Gain on lease arrangements, net                         1                 —                 —
  Stock-based compensation expense                      (45)              (67)              (54)

Sales and marketing
  Restructuring and related charges                       —                (1)                —
  Stock-based compensation expense                      (26)              (21)              (24)

Research and development
  Restructuring and related charges                      (3)               (3)                —
  Stock-based compensation expense                     (317)             (299)             (277)

General and administrative
  Legal, tax, and regulatory reserve changes
    and settlements                                     155              (527)              (58)
  Goodwill and asset impairments/loss on sale
    of assets                                           (16)                3                 —
  Restructuring and related charges                      (5)               (1)               (2)
  Acquisition, financing and divestitures
    related expenses                                     (7)               (5)               (3)
  Gain on lease arrangements, net                         1                 —                 —
  Stock-based compensation expense                     (116)              (97)             (100)

Investors and analysts: [email protected]
Media: [email protected]
