DOORDASH is a food delivery logistics company that connects customers, restaurants, and drivers to deliver meals. I was fortunate enough to learn about the basic architecture of its big data platform and how machine learning is used to coordinate the various events in its business. The following is a very brief summary:
Let us start by reviewing a basic transaction. A customer logs in to the website (or app) and is presented with a list of nearby restaurants. The customer then places an order for delivery. DOORDASH’s logistics platform has only minutes to perform the following tasks:
- Send the order to the merchant
- Match a specific driver to one or more specific orders
- Track the delivery process and collect data to be used to improve models
At first blush this does not seem especially complicated, but a closer examination reveals a very different picture. Imagine a zip code with hundreds of customers and dozens of restaurants and drivers, each party with its own preferences and limitations. The process of matching drivers to deliveries has to take the following factors into consideration, and it must happen within minutes:
- Food preparation time (highly dependent on the restaurant and food type)
- Kitchen-to-counter time
- Driver’s expected travel and parking time to reach the restaurant
- Delivery time as impacted by traffic and weather
- Location and capacity of the drivers
- Number of deliveries handled by the same driver
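To see how several of these factors interact, here is a minimal Python sketch that combines them into a single delivery estimate. It is an illustration and not DOORDASH’s model. It assumes the following:

- The driver is dispatched at the moment the order is placed.
- The last-leg drive time already accounts for traffic and weather.
- All durations are in minutes.

The `DeliveryEstimate` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeliveryEstimate:
    # Hypothetical fields; all durations are in minutes from order placement.
    prep_time: float               # food preparation in the kitchen
    kitchen_to_counter: float      # hand-off from kitchen to pickup counter
    driver_to_restaurant: float    # driver's travel time to the restaurant
    parking_time: float            # parking and walking in
    restaurant_to_customer: float  # last leg, assumed already adjusted for traffic/weather


def pickup_minute(e: DeliveryEstimate) -> float:
    """Pickup happens only once both the food and the driver are at the counter."""
    food_ready = e.prep_time + e.kitchen_to_counter
    driver_ready = e.driver_to_restaurant + e.parking_time
    return max(food_ready, driver_ready)


def estimated_delivery_minutes(e: DeliveryEstimate) -> float:
    """Minutes from order placement until the food reaches the customer."""
    return pickup_minute(e) + e.restaurant_to_customer


def driver_wait_minutes(e: DeliveryEstimate) -> float:
    """Idle time the driver spends at the counter waiting for the food."""
    food_ready = e.prep_time + e.kitchen_to_counter
    driver_ready = e.driver_to_restaurant + e.parking_time
    return max(0.0, food_ready - driver_ready)


if __name__ == "__main__":
    order = DeliveryEstimate(prep_time=15, kitchen_to_counter=2, driver_to_restaurant=10,
                             parking_time=3, restaurant_to_customer=12)
    print(estimated_delivery_minutes(order), driver_wait_minutes(order))  # 29 4
```

Consider an order with these inputs:

- 15 minutes of prep and 2 minutes kitchen-to-counter, so the food is ready at minute 17.
- A 10-minute drive plus 3 minutes of parking, so the driver arrives at minute 13 and waits 4 minutes.
- A 12-minute last leg, so the customer receives the order at minute 29.

The `max` is the crux. A driver who arrives early wastes time, and one who arrives late lets the food go cold.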
Keep in mind that all of the above has to happen quickly and has to meet the expectations of very demanding (and hungry) customers. The satisfaction of drivers and vendors is critical as well. An elaborate worldwide computing and storage platform has been provisioned to make this possible, and the results are nothing short of miraculous. The following are some key attributes of this infrastructure:
- Incoming customer orders are logged in a row-based operational database (AWS Aurora). The speed and reliability of this database are absolutely critical, and significant steps are taken to ensure its throughput and resilience. In addition to the operational database, a second, column-based database (AWS Redshift) stores all the collected data needed for business intelligence and machine learning jobs. Data flows from the operational database to the analytics database daily through batch jobs orchestrated with Apache Airflow. Along the way, the raw data is transformed into a format better suited to machine learning and analytics jobs. This transfer and transformation must have minimal impact on the performance and availability of the operational database.
- The main purpose of the overall system is matchmaking: pairing the right driver with each order. The cost function for this process combines customer, driver, and vendor satisfaction. The heavy lifting of assigning orders to drivers is done in real time by a massive combinatorial optimizer (essentially a variant of the traveling salesman problem).
- The machine learning models employed by DOORDASH make predictions for the following two tasks:
a. Personalizing what is presented to each individual customer based on their preferences, with the goal of maximizing the probability that an order is placed
b. Predicting the parameters fed to the optimization engine (e.g., travel and parking time, food preparation duration, among many others)
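The daily transfer from the operational database to the analytics database can be illustrated with a small extract-and-transform step, the kind of work a daily Airflow task might run. This is a sketch under several assumptions:

- Python’s built-in `sqlite3` stands in for AWS Aurora.
- A dictionary of per-column lists stands in for Redshift’s columnar layout.
- The `orders` schema and the derived `order_hour` feature are hypothetical.

Loading into Redshift and the Airflow DAG itself are omitted.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema for illustration; not DOORDASH's actual tables.
ORDER_COLUMNS = ("order_id", "restaurant_id", "total_cents", "created_at")


def extract_day(conn: sqlite3.Connection, day: date) -> list[tuple]:
    """Pull one day's orders from the row-oriented operational store."""
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    # One bounded, indexed range scan per run keeps the load on the
    # operational database small and predictable.
    return conn.execute(
        "SELECT order_id, restaurant_id, total_cents, created_at FROM orders "
        "WHERE created_at >= ? AND created_at < ? ORDER BY order_id",
        (start, end),
    ).fetchall()


def to_columnar(rows: list[tuple]) -> dict[str, list]:
    """Pivot rows into per-column lists and derive an ML-friendly feature."""
    columns: dict[str, list] = {name: [] for name in ORDER_COLUMNS}
    for row in rows:
        for name, value in zip(ORDER_COLUMNS, row):
            columns[name].append(value)
    # Derived feature: hour of day, parsed from ISO-8601 "YYYY-MM-DDTHH:MM:SS".
    columns["order_hour"] = [int(ts[11:13]) for ts in columns["created_at"]]
    return columns


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the operational database, with sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, restaurant_id INTEGER, "
        "total_cents INTEGER, created_at TEXT)"
    )
    conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, 10, 2599, "2024-05-01T12:15:00"),
            (2, 11, 1850, "2024-05-01T19:40:00"),
            (3, 10, 3100, "2024-05-02T08:05:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    batch = to_columnar(extract_day(make_demo_db(), date(2024, 5, 1)))
    print(batch["order_id"], batch["order_hour"])  # [1, 2] [12, 19]
```

Reading one bounded, indexed date range per run is one way to keep the batch job’s footprint on the operational database small. Running the extract against a read replica is another common option.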
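The matchmaking step can be illustrated with a toy version of the assignment problem. This is not DOORDASH’s optimizer. It makes three simplifying assumptions:

- Each driver takes at most one order. There is no batching or multi-stop routing, which is where the traveling-salesman flavor comes from.
- It uses exhaustive search, which only works for a handful of drivers.
- It scores each pairing with hypothetical weights for customer, driver, and vendor satisfaction.

```python
from itertools import permutations

# Hypothetical weights balancing the three parties; a real system would tune these.
W_CUSTOMER, W_DRIVER, W_VENDOR = 1.0, 0.5, 0.3


def pair_cost(driver_eta: float, food_ready: float, drive_to_customer: float) -> float:
    """Weighted cost of assigning one driver to one order (times in minutes)."""
    pickup = max(driver_eta, food_ready)
    customer_wait = pickup + drive_to_customer        # total time until delivery
    driver_idle = max(0.0, food_ready - driver_eta)   # driver waiting at the counter
    food_sitting = max(0.0, driver_eta - food_ready)  # food cooling on the counter
    return W_CUSTOMER * customer_wait + W_DRIVER * driver_idle + W_VENDOR * food_sitting


def best_assignment(eta, food_ready, drive_time):
    """Exhaustively search driver-to-order assignments.

    eta[d][o]     minutes for driver d to reach order o's restaurant
    food_ready[o] minutes until order o is at the counter
    drive_time[o] minutes from order o's restaurant to its customer
    Returns (total_cost, drivers) where drivers[o] is the driver for order o.
    Requires len(eta) >= len(food_ready); the search grows factorially.
    """
    best_cost, best_drivers = float("inf"), ()
    for drivers in permutations(range(len(eta)), len(food_ready)):
        cost = sum(
            pair_cost(eta[d][o], food_ready[o], drive_time[o])
            for o, d in enumerate(drivers)
        )
        if cost < best_cost:
            best_cost, best_drivers = cost, drivers
    return best_cost, best_drivers


if __name__ == "__main__":
    eta = [[5, 20], [15, 4]]  # driver 0 is near restaurant 0, driver 1 near restaurant 1
    print(best_assignment(eta, food_ready=[10, 8], drive_time=[7, 6]))  # (35.5, (0, 1))
```

Here driver 0 is close to the first restaurant and driver 1 to the second. The optimizer keeps each driver local (total cost 35.5) rather than crossing them (about 53.1). For realistic sizes, the one-order-per-driver case is a linear assignment problem that the Hungarian algorithm solves in polynomial time (for example, SciPy’s `scipy.optimize.linear_sum_assignment`). Batching and routing push the problem back into NP-hard territory.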
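For the first task, personalization, one common approach is to score every nearby restaurant with a model that estimates the probability of an order, then sort by that score. The sketch below uses a hand-written logistic model. The feature names, weights, and bias are made up for illustration; a production model would learn them from order history. The second task would instead use regression models whose outputs (for example, predicted preparation time) feed the optimizer.

```python
import math

# Hypothetical model parameters; a real model would learn these from order history.
WEIGHTS = {"cuisine_match": 2.0, "past_orders": 0.8, "eta_minutes": -0.05, "rating": 0.6}
BIAS = -3.0


def order_probability(features: dict[str, float]) -> float:
    """Logistic model: P(order) = sigmoid(BIAS + sum of weight * feature)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_restaurants(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Return restaurant names sorted by predicted order probability, highest first."""
    return sorted(candidates, key=lambda name: order_probability(candidates[name]), reverse=True)


if __name__ == "__main__":
    nearby = {
        "taqueria": {"cuisine_match": 1, "past_orders": 3, "eta_minutes": 25, "rating": 4.5},
        "sushi_bar": {"cuisine_match": 0, "past_orders": 0, "eta_minutes": 15, "rating": 4.8},
        "pizzeria": {"cuisine_match": 1, "past_orders": 0, "eta_minutes": 40, "rating": 4.0},
    }
    print(rank_restaurants(nearby))  # ['taqueria', 'pizzeria', 'sushi_bar']
```

With these made-up numbers, the taqueria ranks first. The customer has ordered there before and its cuisine matches their preferences, even though the sushi bar is closer and better rated.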
Machine learning models are continuously trained on both historical and new transactional data. A new model is deployed only after it produces favorable outcomes in the following three areas:
- Favorable results in backtesting on historical data
- Improved accuracy when the new model runs in shadow mode (scoring live requests alongside the current model without affecting decisions)
- Satisfactory performance during A/B testing
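The three deployment gates can be expressed as a single promotion check. This sketch is an assumption rather than DOORDASH’s actual release process. A prediction model (say, for preparation time) must beat the current model’s mean absolute error both in backtesting and in shadow mode. Its A/B test must also show a positive, statistically significant lift in a business metric. The `Evaluation` fields and the default thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


def mae(predictions: list[float], actuals: list[float]) -> float:
    """Mean absolute error between predictions and observed values."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal length")
    return mean(abs(p - a) for p, a in zip(predictions, actuals))


@dataclass
class Evaluation:
    # Hypothetical evaluation summary for a candidate model vs. the current one.
    backtest_mae_new: float      # on held-out historical data
    backtest_mae_current: float
    shadow_mae_new: float        # new model scores live requests without acting on them
    shadow_mae_current: float
    ab_lift: float               # relative change in the A/B target metric (0.01 = +1%)
    ab_p_value: float


def should_deploy(ev: Evaluation, min_lift: float = 0.0, alpha: float = 0.05) -> bool:
    """Promote the new model only if all three gates pass."""
    backtest_ok = ev.backtest_mae_new < ev.backtest_mae_current
    shadow_ok = ev.shadow_mae_new < ev.shadow_mae_current
    ab_ok = ev.ab_lift > min_lift and ev.ab_p_value < alpha
    return backtest_ok and shadow_ok and ab_ok


if __name__ == "__main__":
    actual_prep = [12, 18, 9, 25]
    ev = Evaluation(
        backtest_mae_new=mae([13, 17, 10, 23], actual_prep),
        backtest_mae_current=mae([15, 14, 12, 20], actual_prep),
        shadow_mae_new=3.1, shadow_mae_current=3.4,
        ab_lift=0.012, ab_p_value=0.01,
    )
    print(should_deploy(ev))  # True
```

Requiring all three gates means a model that looks better only on historical data, but not on live traffic, never reaches customers.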
What is striking here is not so much the machine learning models themselves but the tight integration and flawless harmony of dozens of subsystems. I believe the results are truly remarkable.
Matching drivers to packages is nothing new; the likes of FedEx and UPS have been doing it for decades. DOORDASH’s challenge is to do the matching within minutes while keeping three parties satisfied.