Curated papers, articles, and blogs on machine learning in production. Designing your ML system? Learn how other organizations did it.
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
Data Quality
- Reliable and Scalable Data Ingestion at Airbnb (Airbnb, 2016)
- Monitoring Data Quality at Scale with Statistical Modeling (Uber, 2017)
- Data Management Challenges in Production Machine Learning (Paper) (Google, 2017)
- Automating Large-Scale Data Quality Verification (Paper) (Amazon, 2018)
- Meet Hodor — Gojek’s Upstream Data Quality Tool (Gojek, 2019)
- Data Validation for Machine Learning (Paper) (Google, 2019)
- An Approach to Data Quality for Netflix Personalization Systems (Netflix, 2020)
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) (Facebook, 2020)
Data Engineering
- Zipline: Airbnb’s Machine Learning Data Management Platform (Airbnb, 2018)
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering (Airbnb, 2020)
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions (Netflix, 2020)
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand (DoorDash, 2020)
- Revolutionizing Money Movements at Scale with Strong Data Consistency (Uber, 2020)
- Zipline - A Declarative Feature Engineering Framework (Airbnb, 2020)
- Automating Data Protection at Scale, Part 1 (Part 2) (Airbnb, 2021)
- Real-time Data Infrastructure at Uber (Uber, 2021)
- Introducing Fabricator: A Declarative Feature Engineering Framework (DoorDash, 2022)
- Functions & DAGs: introducing Hamilton, a microframework for dataframe generation (Stitch Fix, 2021)
- Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings (Pinterest, 2022)
- Lessons Learned From Running Apache Airflow at Scale (Shopify, 2022)
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Meta, 2022)
- Data Mesh — A Data Movement and Processing Platform @ Netflix (Netflix, 2022)
- Building Scalable Real Time Event Processing with Kafka and Flink (DoorDash, 2022)
Data Discovery
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) (Apache)
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) (WeWork)
- Discovery and Consumption of Analytics Data at Twitter (Twitter, 2016)
- Democratizing Data at Airbnb (Airbnb, 2017)
- Databook: Turning Big Data into Knowledge with Metadata at Uber (Uber, 2018)
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) (Netflix, 2018)
- Amundsen — Lyft’s Data Discovery & Metadata Engine (Lyft, 2019)
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) (Lyft, 2019)
- DataHub: A Generalized Metadata Search & Discovery Tool (Code) (LinkedIn, 2019)
- Amundsen: One Year Later (Lyft, 2020)
- Using Amundsen to Support User Privacy via Metadata Collection at Square (Square, 2020)
- Turning Metadata Into Insights with Databook (Uber, 2020)
- DataHub: Popular Metadata Architectures Explained (LinkedIn, 2020)
- How We Improved Data Discovery for Data Scientists at Spotify (Spotify, 2020)
- How We’re Solving Data Discovery Challenges at Shopify (Shopify, 2020)
- Nemo: Data discovery at Facebook (Facebook, 2020)
- Exploring Data @ Netflix (Code) (Netflix, 2021)
Feature Stores
- Distributed Time Travel for Feature Generation (Netflix, 2016)
- Building the Activity Graph, Part 2 (Feature Storage Section) (LinkedIn, 2017)
- Fact Store at Scale for Netflix Recommendations (Netflix, 2018)
- Zipline: Airbnb’s Machine Learning Data Management Platform (Airbnb, 2018)
- Feature Store: The missing data layer for Machine Learning pipelines? (Hopsworks, 2018)
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code) (Gojek, 2019)
- Michelangelo Palette: A Feature Engineering Platform at Uber (Uber, 2019)
- The Architecture That Powers Twitter's Feature Store (Twitter, 2019)
- Accelerating Machine Learning with the Feature Store Service (Condé Nast, 2019)
- Feast: Bridging ML Models and Data (Gojek, 2020)
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression (DoorDash, 2020)
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed (LinkedIn, 2020)
- Building a Feature Store (Monzo Bank, 2020)
- Butterfree: A Spark-based Framework for Feature Store Building (Code) (QuintoAndar, 2020)
- Building Riviera: A Declarative Real-Time Feature Engineering Framework (DoorDash, 2021)
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory (Uber, 2021)
- ML Feature Serving Infrastructure at Lyft (Lyft, 2021)
- Near real-time features for near real-time personalization (LinkedIn, 2022)
- Building the Model Behind DoorDash’s Expansive Merchant Selection (DoorDash, 2022)
- Open sourcing Feathr – LinkedIn’s feature store for productive machine learning (LinkedIn, 2022)
- Evolution of ML Fact Store (Netflix, 2022)
- Developing scalable feature engineering DAGs (Metaflow + Hamilton via Outerbounds, 2022)
- Feature Store Design at Constructor (Constructor.io, 2023)
Classification
- Prediction of Advertiser Churn for Google AdWords (Paper) (Google, 2010)
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) (LinkedIn, 2011)
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) (Walmart, 2014)
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) (NAVER, 2016)
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) (Google, 2017)
- Discovering and Classifying In-app Message Intent at Airbnb (Airbnb, 2019)
- Teaching Machines to Triage Firefox Bugs (Mozilla, 2019)
- Categorizing Products at Scale (Shopify, 2020)
- How We Built the Good First Issues Feature (GitHub, 2020)
- Testing Firefox More Efficiently with Machine Learning (Mozilla, 2020)
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) (Microsoft, 2020)
- Scalable Data Classification for Security and Privacy (Paper) (Facebook, 2020)
- Uncovering Online Delivery Menu Best Practices with Machine Learning (DoorDash, 2020)
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging (DoorDash, 2020)
- Deep Learning: Product Categorization and Shelving (Walmart, 2021)
- Large-scale Item Categorization for e-Commerce (Paper) (DianPing, eBay, 2012)
- Semantic Label Representation with an Application on Multimodal Product Categorization (Walmart, 2022)
- Building Airbnb Categories with ML and Human-in-the-Loop (Airbnb, 2022)
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb (Airbnb, 2017)
- Using Machine Learning to Predict the Value of Ad Requests (Twitter, 2020)
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) (Netflix, 2020)
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment (DoorDash, 2020)
Forecasting
- Engineering Extreme Event Forecasting at Uber with RNN (Uber, 2017)
- Forecasting at Uber: An Introduction (Uber, 2018)
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber (Uber, 2018)
- Under the Hood of Gojek’s Automated Forecasting Tool (Gojek, 2019)
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) (Google, 2020)
- Retraining Machine Learning Models in the Wake of COVID-19 (DoorDash, 2020)
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) (Atlassian, 2020)
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) (Uber, 2021)
- Managing Supply and Demand Balance Through Machine Learning (DoorDash, 2021)
- Greykite: A flexible, intuitive, and fast forecasting library (LinkedIn, 2021)
- The history of Amazon’s forecasting algorithm (Amazon, 2021)
- DeepETA: How Uber Predicts Arrival Times Using Deep Learning (Uber, 2022)
- Forecasting Grubhub Order Volume At Scale (Grubhub, 2022)
- Causal Forecasting at Lyft (Part 1) (Lyft, 2022)
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) (Amazon, 2003)
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) (Netflix, 2012)
- How Music Recommendation Works — And Doesn’t Work (Spotify, 2012)
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) (Google, 2013)
- Recommending Music on Spotify with Deep Learning (Spotify, 2014)
- Learning a Personalized Homepage (Netflix, 2015)
- The Netflix Recommender System: Algorithms, Business Value, and Innovation (Paper) (Netflix, 2015)
- Session-based Recommendations with Recurrent Neural Networks (Paper) (Telefonica, 2016)
- Deep Neural Networks for YouTube Recommendations (YouTube, 2016)
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper) (Yahoo, 2016)
- To Be Continued: Helping you find shows to continue watching on Netflix (Netflix, 2016)
- Personalized Recommendations in LinkedIn Learning (LinkedIn, 2016)
- Personalized Channel Recommendations in Slack (Slack, 2016)
- Recommending Complementary Products in E-Commerce Push Notifications (Paper) (Alibaba, 2017)
- Artwork Personalization at Netflix (Netflix, 2017)
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) (Twitter, 2017)
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) (Pinterest, 2017)
- Powering Search & Recommendations at DoorDash (DoorDash, 2017)
- How 20th Century Fox uses ML to predict a movie audience (Paper) (20th Century Fox, 2018)
- Calibrated Recommendations (Paper) (Netflix, 2018)
- Food Discovery with Uber Eats: Recommending for the Marketplace (Uber, 2018)
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) (Spotify, 2018)
- Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned (Paper) (LinkedIn, 2018)
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) (Alibaba, 2019)
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) (Alibaba, 2019)
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) (Alibaba, 2019)
- Personalized Recommendations for Experiences Using Deep Learning (TripAdvisor, 2019)
- Powered by AI: Instagram’s Explore recommender system (Facebook, 2019)
- Marginal Posterior Sampling for Slate Bandits (Paper) (Netflix, 2019)
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations (Uber, 2019)
- Music recommendation at Spotify (Spotify, 2019)
- Using Machine Learning to Predict what File you Need Next (Part 1) (Dropbox, 2019)
- Using Machine Learning to Predict what File you Need Next (Part 2) (Dropbox, 2019)
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) (LinkedIn, 2019)
- Temporal-Contextual Recommendation in Real-Time (Paper) (Amazon, 2020)
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) (Amazon, 2020)
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) (Alibaba, 2020)
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) (Alibaba, 2020)
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) (Alibaba, 2020)
- Controllable Multi-Interest Framework for Recommendation (Paper) (Alibaba, 2020)
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) (Alibaba, 2020)
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) (Alibaba, 2020)
- For Your Ears Only: Personalizing Spotify Home with Machine Learning (Spotify, 2020)
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months (Spotify, 2020)
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) (Spotify, 2020)
- The Evolution of Kit: Automating Marketing Using Machine Learning (Shopify, 2020)
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) (LinkedIn, 2020)
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) (LinkedIn, 2020)
- Building a Heterogeneous Social Network Recommendation System (LinkedIn, 2020)
- How TikTok recommends videos #ForYou (ByteDance, 2020)
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) (Google, 2020)
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) (Google, 2020)
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) (Google, 2020)
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) (Tencent, 2020)
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) (Home Depot, 2020)
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) (Ikea, 2020)
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads (Pinterest, 2020)
- Multi-task Learning for Related Products Recommendations at Pinterest (Pinterest, 2020)
- Improving the Quality of Recommended Pins with Lightweight Ranking (Pinterest, 2020)
- Multi-task Learning and Calibration for Utility-based Home Feed Ranking (Pinterest, 2020)
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity (DoorDash, 2020)
- How We Built a Matchmaking Algorithm to Cross-Sell Products (Gojek, 2020)
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) (Twitter, 2021)
- Self-supervised Learning for Large-scale Item Recommendations (Paper) (Google, 2021)
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) (ByteDance, 2021)
- Using AI to Help Health Experts Address the COVID-19 Pandemic (Facebook, 2021)
- Advertiser Recommendation Systems at Pinterest (Pinterest, 2021)
- On YouTube's Recommendation System (YouTube, 2021)
- "Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops (Coveo, 2021)
- Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates (Walmart, 2021)
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper) (Meta, 2021)
- The Amazon Music conversational recommender is hitting the right notes (Amazon, 2022)
- Personalized complementary product recommendation (Paper) (Amazon, 2022)
- Building a Deep Learning Based Retrieval System for Personalized Recommendations (eBay, 2022)
- How We Built: An Early-Stage Machine Learning Model for Recommendations (Peloton, 2022)
- Lessons Learned from Building out Context-Aware Recommender Systems (Peloton, 2022)
- Beyond Matrix Factorization: Using hybrid features for user-business recommendations (Yelp, 2022)
- Improving job matching with machine-learned activity features (LinkedIn, 2022)
- Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Meta, 2022)
- Blueprints for recommender system architectures: 10th anniversary edition (Xavier Amatriain, 2022)
- How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume (Pinterest, 2022)
- RecSysOps: Best Practices for Operating a Large-Scale Recommender System (Netflix, 2022)
- Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations (Slack, 2022)
- Evolving DoorDash’s Substitution Recommendations Algorithm (DoorDash, 2022)
- Homepage Recommendation with Exploitation and Exploration (DoorDash, 2022)
- GPU-accelerated ML Inference at Pinterest (Pinterest, 2022)
- Addressing Confounding Feature Issue for Causal Recommendation (Paper) (Tencent, 2022)
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code) (Amazon, 2016)
- How Lazada Ranks Products to Improve Customer Experience and Conversion (Lazada, 2016)
- Ranking Relevance in Yahoo Search (Paper) (Yahoo, 2016)
- Learning to Rank Personalized Search Results in Professional Networks (Paper) (LinkedIn, 2016)
- Using Deep Learning at Scale in Twitter’s Timelines (Twitter, 2017)
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) (Etsy, 2017)
- Powering Search & Recommendations at DoorDash (DoorDash, 2017)
- Applying Deep Learning To Airbnb Search (Paper) (Airbnb, 2018)
- In-session Personalization for Talent Search (Paper) (LinkedIn, 2018)
- Talent Search and Recommendation Systems at LinkedIn (Paper) (LinkedIn, 2018)
- Food Discovery with Uber Eats: Building a Query Understanding Engine (Uber, 2018)
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) (Alibaba, 2018)
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) (Alibaba, 2018)
- Semantic Product Search (Paper) (Amazon, 2019)
- Machine Learning-Powered Search Ranking of Airbnb Experiences (Airbnb, 2019)
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper) (LinkedIn, 2019)
- The AI Behind LinkedIn Recruiter Search and recommendation systems (LinkedIn, 2019)
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs (LinkedIn, 2019)
- The Secret Sauce Behind Search Personalisation (Gojek, 2019)
- Neural Code Search: ML-based Code Search Using Natural Language Queries (Facebook, 2019)
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) (Alibaba, 2019)
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search (Alibaba, 2019)
- Understanding Searches Better Than Ever Before (Paper) (Google, 2019)
- How We Used Semantic Search to Make Our Search 10x Smarter (Tokopedia, 2019)
- Query2vec: Search query expansion with query embeddings (GrubHub, 2019)
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search (Baidu, 2019)
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) (Amazon, 2020)
- Managing Diversity in Airbnb Search (Paper) (Airbnb, 2020)
- Improving Deep Learning for Airbnb Search (Paper) (Airbnb, 2020)
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences (LinkedIn, 2020)
- Understanding Dwell Time to Improve LinkedIn Feed Ranking (LinkedIn, 2020)
- Ads Allocation in Feed via Constrained Optimization (Paper, Video) (LinkedIn, 2020)
- AI at Scale in Bing (Microsoft, 2020)
- Query Understanding Engine in Traveloka Universal Search (Traveloka, 2020)
- Bayesian Product Ranking at Wayfair (Wayfair, 2020)
- COLD: Towards the Next Generation of Pre-Ranking System (Paper) (Alibaba, 2020)
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) (Pinterest, 2020)
- Driving Shopping Upsells from Pinterest Search (Pinterest, 2020)
- GDMix: A Deep Ranking Personalization Framework (Code) (LinkedIn, 2020)
- Bringing Personalized Search to Etsy (Etsy, 2020)
- Building a Better Search Engine for Semantic Scholar (Allen Institute for AI, 2020)
- Query Understanding for Natural Language Enterprise Search (Paper) (Salesforce, 2020)
- Things Not Strings: Understanding Search Intent with Better Recall (DoorDash, 2020)
- Query Understanding for Surfacing Under-served Music Content (Paper) (Spotify, 2020)
- Embedding-based Retrieval in Facebook Search (Paper) (Facebook, 2020)
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) (JD, 2020)
- QUEEN: Neural query rewriting in e-commerce (Paper) (Amazon, 2021)
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) (Amazon, 2021)
- Seasonal relevance in e-commerce search (Paper) (Amazon, 2021)
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) (Alibaba, 2021)
- How We Built A Context-Specific Bidding System for Etsy Ads (Etsy, 2021)
- Pre-trained Language Model based Ranking in Baidu Search (Paper) (Baidu, 2021)
- Stitching together spaces for query-based recommendations (Stitch Fix, 2021)
- Deep Natural Language Processing for LinkedIn Search Systems (Paper) (LinkedIn, 2021)
- Siamese BERT-based Model for Web Search Relevance Ranking (Paper, Code) (Seznam, 2021)
- SearchSage: Learning Search Query Representations at Pinterest (Pinterest, 2021)
- Query2Prod2Vec: Grounded Word Embeddings for eCommerce (Coveo, 2021)
- 3 Changes to Expand DoorDash’s Product Search Beyond Delivery (DoorDash, 2022)
- Learning To Rank Diversely (Airbnb, 2022)
- How to Optimise Rankings with Cascade Bandits (Expedia, 2022)
- A Guide to Google Search Ranking Systems (Google, 2022)
- Deep Learning for Search Ranking at Etsy (Etsy, 2022)
- Search at Calm (Calm, 2022)
Embeddings
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) (Sears, 2017)
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) (Alibaba, 2018)
- Embeddings@Twitter (Twitter, 2018)
- Listing Embeddings in Search Ranking (Paper) (Airbnb, 2018)
- Understanding Latent Style (Stitch Fix, 2018)
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) (LinkedIn, 2018)
- Personalized Store Feed with Vector Embeddings (DoorDash, 2018)
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) (Moshbit, 2019)
- Machine Learning for a Better Developer Experience (Netflix, 2020)
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) (Google, 2020)
- BERT Goes Shopping: Comparing Distributional Models for Product Representations (Coveo, 2021)
- The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference (Coveo, 2022)
- Embedding-based Retrieval at Scribd (Scribd, 2021)
- Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper) (Apple, 2022)
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper) (Yahoo, 2016)
- Smart Reply: Automated Response Suggestion for Email (Paper) (Google, 2016)
- Building Smart Replies for Member Messages (LinkedIn, 2017)
- How Natural Language Processing Helps LinkedIn Members Get Support Easily (LinkedIn, 2019)
- Gmail Smart Compose: Real-Time Assisted Writing (Paper) (Google, 2019)
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) (Amazon, 2019)
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want (Stitch Fix, 2019)
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code) (LinkedIn, 2020)
- SmartReply for YouTube Creators (Google, 2020)
- Using Neural Networks to Find Answers in Tables (Paper) (Google, 2020)
- A Scalable Approach to Reducing Gender Bias in Google Translate (Google, 2020)
- Assistive AI Makes Replying Easier (Microsoft, 2020)
- AI Advances to Better Detect Hate Speech (Facebook, 2020)
- A State-of-the-Art Open Source Chatbot (Paper) (Facebook, 2020)
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs (Facebook, 2020)
- Deep Learning to Translate Between Programming Languages (Paper, Code) (Facebook, 2020)
- Deploying Lifelong Open-Domain Dialogue Learning (Paper) (Facebook, 2020)
- Introducing Dynabench: Rethinking the way we benchmark AI (Facebook, 2020)
- How Gojek Uses NLP to Name Pickup Locations at Scale (Gojek, 2020)
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) (Baidu, 2020)
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) (Google, 2020)
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) (Salesforce, 2020)
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) (Salesforce, 2020)
- Applying Topic Modeling to Improve Call Center Operations (RICOH, 2020)
- WIDeText: A Multimodal Deep Learning Framework (Airbnb, 2020)
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) (Facebook, 2021)
- How we reduced our text similarity runtime by 99.96% (Microsoft, 2021)
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) (Facebook, 2021)
- Grammar Correction as You Type, on Pixel 6 (Google, 2021)
- Auto-generated Summaries in Google Docs (Google, 2022)
- ML-Enhanced Code Completion Improves Developer Productivity (Google, 2022)
- Words All the Way Down — Conversational Sentiment Analysis (PayPal, 2022)
Sequence Modelling
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) (Sutter Health, 2015)
- Deep Learning for Understanding Consumer Histories (Paper) (Zalando, 2016)
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) (Sutter Health, 2016)
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) (Telefonica, 2017)
- Deep Learning for Electronic Health Records (Paper) (Google, 2018)
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) (Alibaba, 2019)
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) (Alibaba, 2020)
- How Duolingo uses AI in every part of its app (Duolingo, 2020)
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) (Facebook, 2020)
- Using deep learning to detect abusive sequences of member activity (Video) (LinkedIn, 2021)
Computer Vision
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning (Dropbox, 2017)
- Categorizing Listing Photos at Airbnb (Airbnb, 2018)
- Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb (Airbnb, 2019)
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors (Deepomatic)
- Making machines recognize and transcribe conversations in meetings using audio and video (Microsoft, 2019)
- Powered by AI: Advancing product understanding and building new shopping experiences (Facebook, 2020)
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) (Google, 2020)
- Machine Learning-based Damage Assessment for Disaster Relief (Paper) (Google, 2020)
- RepNet: Counting Repetitions in Videos (Paper) (Google, 2020)
- Converting Text to Images for Product Discovery (Paper) (Amazon, 2020)
- How Disney Uses PyTorch for Animated Character Recognition (Disney, 2020)
- Image Captioning as an Assistive Technology (Video) (IBM, 2020)
- AI for AG: Production machine learning for agriculture (Blue River, 2020)
- AI for Full-Self Driving at Tesla (Tesla, 2020)
- On-device Supermarket Product Recognition (Google, 2020)
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) (Google, 2020)
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) (Pinterest, 2020)
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) (Google, 2020)
- Vision-based Price Suggestion for Online Second-hand Items (Paper) (Alibaba, 2020)
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) (Facebook, 2021)
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper) (Alibaba, 2021)
- Identifying Document Types at Scribd (Scribd, 2021)
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) (Walmart, 2021)
- Recognizing People in Photos Through Private On-Device Machine Learning (Apple, 2021)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection (Google, 2022)
- Contrastive language and vision learning of general fashion concepts (Paper) (Coveo, 2022)
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) (Alibaba, 2018)
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) (Alibaba, 2018)
- Reinforcement Learning for On-Demand Logistics (DoorDash, 2018)
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) (Alibaba, 2018)
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) (Alibaba, 2019)
- Productionizing Deep Reinforcement Learning with Spark and MLflow (Zynga, 2020)
- Deep Reinforcement Learning in Production (Part 1) (Part 2) (Zynga, 2020)
- Building AI Trading Systems (Denny Britz, 2020)
- Shifting Consumption towards Diverse content via Reinforcement Learning (Paper) (Spotify, 2022)
- Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms (Meta, 2022)
- How to Optimise Rankings with Cascade Bandits (Expedia, 2022)
- Selecting the Best Image for Each Merchant Using Exploration and Machine Learning (DoorDash, 2023)
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments (Netflix, 2019)
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) (LinkedIn, 2019)
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) (Swedbank, Hopsworks, 2019)
- Preventing Abuse Using Unsupervised Learning (LinkedIn, 2020)
- The Technology Behind Fighting Harassment on LinkedIn (LinkedIn, 2020)
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) (Ant Financial, 2020)
- How Does Spam Protection Work on Stack Exchange? (Stack Exchange, 2020)
- Auto Content Moderation in C2C e-Commerce (Mercari, 2020)
- Blocking Slack Invite Spam With Machine Learning (Slack, 2020)
- Cloudflare Bot Management: Machine Learning and More (Cloudflare, 2020)
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine (SENER, 2020)
- Using Anomaly Detection to Monitor Low-Risk Bank Customers (Rabobank, 2020)
- Fighting fraud with Triplet Loss (OLX Group, 2020)
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) (Facebook, 2020)
- How AI is getting better at detecting hate speech (Part 1) (Part 2) (Part 3) (Part 4) (Facebook, 2020)
- Using deep learning to detect abusive sequences of member activity (Video) (LinkedIn, 2021)
- Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop (Uber, 2022)
- Graph for Fraud Detection (Grab, 2022)
- Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms (Meta, 2022)
- Evolving our machine learning to stop mobile bots (Cloudflare, 2022)
- Improving the accuracy of our machine learning WAF using data augmentation and sampling (Cloudflare, 2022)
- Machine Learning for Fraud Detection in Streaming Services (Netflix, 2022)
- Pricing at Lyft (Lyft, 2022)
Graph
- Building The LinkedIn Knowledge Graph (LinkedIn, 2016)
- Scaling Knowledge Access and Retrieval at Airbnb (Airbnb, 2018)
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) (Pinterest, 2018)
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations (Uber, 2019)
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper) (Alibaba, 2019)
- Contextualizing Airbnb by Building Knowledge Graph (Airbnb, 2019)
- Retail Graph — Walmart’s Product Knowledge Graph (Walmart, 2020)
- Traffic Prediction with Advanced Graph Neural Networks (DeepMind, 2020)
- SimClusters: Community-Based Representations for Recommendations (Paper, Video) (Twitter, 2020)
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) (Alibaba, 2021)
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) (Alibaba, 2021)
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) (JPMorgan Chase, 2021)
- How AWS uses graph neural networks to meet customer needs (Amazon, 2022)
- Graph for Fraud Detection (Grab, 2022)
Optimization
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) (Lyft, 2016)
- The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED) (Grab, 2017)
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats (Uber, 2018)
- Next-Generation Optimization for Dasher Dispatch at DoorDash (DoorDash, 2020)
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning (Thyssen Krupp AG, 2020)
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) (Amazon, 2020)
- Optimizing DoorDash’s Marketing Spend with Machine Learning (DoorDash, 2020)
- Using learning-to-rank to precisely locate where to deliver packages (Paper) (Amazon, 2021)
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) (Rakuten, 2013)
- Using Machine Learning to Index Text from Billions of Images (Dropbox, 2018)
- Extracting Structured Data from Templatic Documents (Paper) (Google, 2020)
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) (Amazon, 2020)
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) (Alibaba, 2020)
- Information Extraction from Receipts with Graph Convolutional Networks (Nanonets, 2021)
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google 2019 - Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel 2019 - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple 2019 - Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM 2019
Generation
- Better Language Models and Their Implications (Paper)
OpenAI 2019 - Image GPT (Paper, Code)
OpenAI 2020 - Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI 2020 - Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar 2020 - Unit Test Case Generation with Transformers
Microsoft 2021
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google 2020 - The Machine Learning Behind Hum to Search
Google 2020
Validation and A/B Testing
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google 2010 - The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google 2015 - Twitter Experimentation: Technical Overview
Twitter 2015 - It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix 2016 - Building Pinterest’s A/B Testing Platform
Pinterest 2016 - Experimenting to Solve Cramming
Twitter 2017 - Building an Intelligent Experimentation Platform with Uber Engineering
Uber 2017 - Scaling Airbnb’s Experimentation Platform
Airbnb 2017 - Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit 2017 - Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber 2018 - Under the Hood of Uber’s Experimentation Platform
Uber 2018 - Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook 2018 - Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab 2018 - Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better 2019 - Detecting Interference: An A/B Test of A/B Tests
LinkedIn 2019 - Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber 2020 - Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka 2020 - Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix 2020 - Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix 2020 - Experimentation with Resource Constraints
Stitch Fix 2020 - Computational Causal Inference at Netflix (Paper)
Netflix 2020 - Key Challenges with Quasi Experiments at Netflix
Netflix 2020 - Making the LinkedIn experimentation engine 20x faster
LinkedIn 2020 - Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn 2020 - How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify 2020 - Improving Experimental Power through Control Using Predictions as Covariate
DoorDash 2020 - Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash 2020 - Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash 2020 - Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash 2020 - Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash 2020 - Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify 2020 - Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix 2021 - Interpreting A/B Test Results: False Negatives and Power
Netflix 2021 - Running Experiments with Google Adwords for Campaign Optimization
DoorDash 2021 - The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash 2021 - Experimentation Platform at Zalando: Part 1 - Evolution
Zalando 2021 - Designing Experimentation Guardrails
Airbnb 2021 - How Airbnb Measures Future Value to Standardize Tradeoffs
Airbnb 2021 - Network Experimentation at Scale (Paper)
Facebook 2021 - Universal Holdout Groups at Disney Streaming
Disney 2021 - Experimentation is a major focus of Data Science across Netflix
Netflix 2022 - Search Journey Towards Better Experimentation Practices
Spotify 2022 - Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb
Airbnb 2022 - Beyond A/B Test: Speeding up Airbnb Search Ranking Experimentation through Interleaving
Airbnb 2022 - Challenges in Experimentation
Lyft 2022 - Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity
Booking 2022 - Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash
DoorDash 2022 - Comparing quantiles at scale in online A/B-testing
Spotify 2022 - Accelerating our A/B experiments with machine learning
Dropbox 2023 - Supercharging A/B Testing at Uber
Uber
Model Management
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast 2018 - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple 2019 - Runway - Model Lifecycle Management at Netflix
Netflix 2020 - Managing ML Models @ Scale - Intuit’s ML Platform
Intuit 2020 - ML Model Monitoring - 9 Tips From the Trenches
Nubank 2021
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook 2020 - How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox 2020 - Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber 2021 - GPU-accelerated ML Inference at Pinterest
Pinterest 2022
Ethics
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn 2020 - LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn 2020 - Introducing Twitter’s first algorithmic bias bounty challenge
Twitter 2021 - Examining algorithmic amplification of political content on Twitter
Twitter 2021 - A closer look at how LinkedIn integrates fairness into its AI products
LinkedIn 2022
Infra
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook 2020 - Elastic Distributed Training with XGBoost on Ray
Uber 2021
MLOps Platforms
- Meet Michelangelo: Uber’s Machine Learning Platform
Uber 2017 - Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast 2018 - Big Data Machine Learning Platform at Pinterest
Pinterest 2019 - Core Modeling at Instagram
Instagram 2019 - Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix 2019 - Managing ML Models @ Scale - Intuit’s ML Platform
Intuit 2020 - Real-time Machine Learning Inference Platform at Zomato
Zomato 2020 - Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
Lyft 2020 - Building Flexible Ensemble ML Models with a Computational Graph
DoorDash 2021 - LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft 2021 - "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo 2021 - MLOps at GreenSteam: Shipping Machine Learning
GreenSteam 2021 - Evolving Reddit’s ML Model Deployment and Serving Architecture
Reddit 2021 - Redesigning Etsy’s Machine Learning Platform
Etsy 2021 - Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper)
Meta 2021 - Building a Platform for Serving Recommendations at Etsy
Etsy 2022 - Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb
Airbnb 2022 - DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn
LinkedIn 2022 - The Magic of Merlin: Shopify's New Machine Learning Platform
Shopify 2022 - Zalando's Machine Learning Platform
Zalando 2022 - Inside Meta's AI optimization platform for engineers across the company (Paper)
Meta 2022 - Monzo’s machine learning stack
Monzo 2022 - Evolution of ML Fact Store
Netflix 2022 - Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline
Binance 2022 - Serving Machine Learning Models Efficiently at Scale at Zillow
Zillow 2022 - Didact AI: The anatomy of an ML-powered stock picking engine
Didact AI 2022 - Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists
Stitch Fix 2022 - Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper)
IBM 2022
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio 2012 - Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google 2014 - Rules of Machine Learning: Best Practices for ML Engineering
Google 2018 - On Challenges in Machine Learning Model Management
Amazon 2018 - Machine Learning in Production: The Booking.com Approach
Booking 2019 - 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking 2019 - Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank 2019 - Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge 2020 - Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook 2020 - The problem with AI developer tools for enterprises
Databricks 2020 - Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber 2021 - Tuning Model Performance
Uber 2021 - Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash 2021 - Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair 2021 - Our approach to building transparent and explainable AI systems
LinkedIn 2021 - 5 Steps for Building Machine Learning Models for Business
Shopify 2021 - Data Is An Art, Not Just A Science—And Storytelling Is The Key
Shopify 2022 - Best Practices for Real-time Machine Learning: Alerting
Nubank 2022 - Automatic Retraining for Machine Learning Models: Tips and Lessons Learned
Nubank 2022 - RecSysOps: Best Practices for Operating a Large-Scale Recommender System
Netflix 2022 - ML Education at Uber: Frameworks Inspired by Engineering Principles
Uber 2022
Team Structure
- What is the most effective way to structure a data science team?
Udemy 2017 - Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix 2016 - Building The Analytics Team At Wish
Wish 2018 - Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix 2019 - Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix - Analytics at Netflix: Who We Are and What We Do
Netflix 2020 - Building a Data Team at a Mid-stage Startup: A Short Story
Erikbern 2021 - A Behind-the-Scenes Look at How Postman’s Data Team Works
Postman 2021 - Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike?
Nubank 2022
Fails
- When It Comes to Gorillas, Google Photos Remains Blind
Google 2018 - 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate 2020 - An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University 2020 - It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI 2020 - A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom 2020 - More in awful-ai
- AI Incident Database
Partnership on AI 2022