Azure Application Architecture Guide

This document provides an overview of Azure architecture best practices and design patterns across many domains including application architecture, data architecture, container architecture, security architecture and more. It includes guidance on technology choices, performance optimization, reliability, security and operational excellence.


Contents

Azure Architecture Center


Architecture icons
Browse all Architectures
Browse Hybrid and Multicloud Architectures
What's new
Application Architecture Guide
Introduction
Architecture styles
Overview
Big compute
Big data
Event-driven architecture
Microservices
N-tier application
Web-queue-worker
Design principles for Azure applications
Overview
Design for self-healing
Make all things redundant
Minimize coordination
Design to scale out
Partition around limits
Design for operations
Use managed services
Use the best data store for the job
Design for evolution
Build for the needs of business
Technology choices
Choose a compute service
Choose a data store
Understand data store models
Select a data store
Criteria for choosing a data store
Choose a load balancing service
Choose a messaging service
Asynchronous messaging options in Azure
Choose an Apache Kafka host
Best practices for cloud applications
API design
API implementation
Autoscaling
Background jobs
Caching
Content Delivery Network
Data partitioning
Data partitioning strategies (by service)
Message encoding considerations
Monitoring and diagnostics
Retry guidance for specific services
Transient fault handling
Performance tuning
Introduction
Scenario 1 - Distributed transactions
Scenario 2 - Multiple backend services
Scenario 3 - Event streaming
Performance antipatterns
Overview
Busy Database
Busy Front End
Chatty I/O
Extraneous Fetching
Improper Instantiation
Monolithic Persistence
No Caching
Synchronous I/O
Responsible Innovation
Overview
Judgment Call
Harms Modeling
Understand Harm
Assess Types of Harm
Community Jury
Azure for AWS Professionals
Overview
Component information
Accounts
Compute
Databases
Messaging
Networking
Regions and Zones
Resources
Security and Identity
Storage
Service comparison
Azure for GCP Professionals
Overview
Services comparison
Microsoft Azure Well-Architected Framework
Overview
Cost Optimization
About
Principles
Design
Checklist
Cost model
Capture requirements
Azure regions
Azure resources
Governance
Initial estimate
Managed services
Performance and price options
Provision
Checklist
AI + Machine Learning
Big data
Compute
Data stores
Messaging
Networking
Cost for networking services
Web apps
Monitor
Checklist
Budgets and alerts
Reports
Reviews
Optimize
Checklist
Autoscale
Reserved instances
VM instances
Caching
Tradeoffs
Operational Excellence
Overview
Principles
Automation
Automation Overview
Repeatable infrastructure
Configure infrastructure
Automate operational tasks
Release engineering
Application development
Continuous integration
Release testing
Performance
Release deployment
Rollback
Observability (operational awareness)
Monitoring
Checklist
Performance Efficiency
Overview
Design
Application design
Application efficiency
Scalability
Capacity planning
Testing
Performance testing
Testing tools
Monitoring
Checklist
Reliability
Overview
Application design
Design overview
Error handling
Failure mode analysis
Backup and recovery
Business metrics
Chaos engineering
Data management
Monitoring and disaster recovery
Recover from a region-wide service disruption
Resiliency testing
Checklist
Security
About
Principles
Design
Governance
Overview
Segmentation strategy
Management groups
Administration
Identity and access management
Checklist
Roles and responsibilities
Control plane
Authentication
Authorization
Best practices
Networking
Network security review
Containment strategies for organizations
Best practices
Storage
Applications and services
Application security considerations
Application classification
Threat analysis
Securing PaaS deployments
Compliance requirements
Configuration and dependencies
Monitor
Health modeling
Tools
Security operations
Review and audit
Identity and network risks
Optimize
Automate
Replace insecure protocols
Elevate security capabilities
Design Patterns
Overview
Categories
Availability
Data management
Design and implementation
Management and monitoring
Messaging
Performance and scalability
Resiliency
Security
Ambassador
Anti-corruption Layer
Asynchronous Request-Reply
Backends for Frontends
Bulkhead
Cache-Aside
Choreography
Circuit Breaker
Claim Check
Command and Query Responsibility Segregation (CQRS)
Compensating Transaction
Competing Consumers
Compute Resource Consolidation
Deployment Stamps
Event Sourcing
External Configuration Store
Federated Identity
Gatekeeper
Gateway Aggregation
Gateway Offloading
Gateway Routing
Geodes
Health Endpoint Monitoring
Index Table
Leader Election
Materialized View
Pipes and Filters
Priority Queue
Publisher/Subscriber
Queue-Based Load Leveling
Retry
Saga
Scheduler Agent Supervisor
Sequential Convoy
Sharding
Sidecar
Static Content Hosting
Strangler Fig
Throttling
Valet Key
Azure categories
AI + Machine Learning
Overview
Technology guide
Cognitive services
Machine learning
Machine learning at scale
Natural language processing
R developer's guide to Azure
Architectures
AI at the edge
AI enrichment in Cognitive Search
AI for Earth
Auditing and risk management
Autonomous systems
Disconnected AI at the edge
Baseball decision analysis with ML.NET and Blazor
Batch scoring for deep learning
Batch scoring with Python
Batch scoring with Spark on Databricks
Batch scoring with R
Business Process Management
Real-time recommendation API
Chatbot for hotel reservations
E-commerce chatbot
Enterprise chatbot disaster recovery
Enterprise-grade conversational bot
Enterprise productivity chatbot
FAQ chatbot
Content Research
Contract Management
Customer churn prediction
Customer feedback
Defect prevention
Distributed deep learning training
Digital Asset Management
Energy supply optimization
Energy demand forecasting
Image classification
Image classification with CNNs
Information discovery with NLP
Interactive voice response bot
Digital text processing
Machine teaching
MLOps for Python models
MLOps technical paper
Upscale ML lifecycle with MLOps
MLOps maturity model
Azure ML service selection guide
Model training with AKS
Movie recommendations
Marketing optimization
Personalized offers
Personalized marketing solutions
Population health management
Hospital patient predictions
Vehicle telematics
Predictive maintenance
Predictive marketing
Quality assurance
Real-time scoring Python models
Real-time scoring R models
Remote patient monitoring
Retail assistant with visual capabilities
Retail product recommendations
Scalable personalization
Speech services
Speech to text conversion
Training Python models
Vision classifier model
Visual assistant
Deploy AI and machine learning at the edge by using Azure Stack Edge
Analytics
Architectures
Advanced analytics
Anomaly detector process
App integration using Event Grid
Automated enterprise BI
Big data analytics with Azure Data Explorer
Content Delivery Network analytics
Data warehousing and analytics
Demand forecasting
Demand forecasting for marketing
Demand forecasting for price optimization
Demand forecasting for shipping
Discovery Hub for analytics
Hybrid big data with HDInsight
Monitoring solution with Azure Data Explorer
ETL using HDInsight
Interactive analytics with Azure Data Explorer
IoT telemetry analytics with Azure Data Explorer
Interactive price analytics
Mass ingestion of news feeds on Azure
Oil and Gas tank level forecasting
Partitioning in Event Hubs and Kafka
Predicting length of stay in hospitals
Predictive aircraft engine monitoring
Real-time analytics on big data
Stream processing with Azure Databricks
Stream processing with Azure Stream Analytics
Tiering applications & data for analytics
Blockchain
Architectures
Blockchain workflow application
Decentralized trust between banks
Supply chain track and trace
Compute
HPC Overview
Architectures
3D video rendering
Computer-aided engineering
Digital image modeling
HPC risk analysis
HPC and big compute
Cloud-based HPC cluster
Hybrid HPC with HPC Pack
Linux virtual desktops with Citrix
Move Azure resources across regions
Run a Linux VM on Azure
Run a Windows VM on Azure
Solaris emulator on Azure VMs
Migrate IBM applications with TmaxSoft OpenFrame
Reservoir simulations
CFD simulations
Containers
AKS Solution Journey
AKS Baseline Cluster
AKS Cluster Best Practices
AKS Workload Best Practices
AKS day-2 operations guide
Triage practices
Introduction
1- Cluster health
2- Node and pod health
3- Workload deployments
4- Admission controllers
5- Container registry connectivity
Common Issues
AKS Example Solutions
Microservices architecture on AKS
Microservices with AKS and Azure DevOps
Secure DevOps for AKS
Building a telehealth system
CI/CD pipeline for container-based workloads
Databases
Guides
Overview
Relational data
Extract, transform, and load (ETL)
Online analytical processing (OLAP)
Online transaction processing (OLTP)
Data Warehousing
Non-relational data
Non-relational data stores
Free-form text search
Time series data
Working with CSV and JSON files
Big Data
Big Data architectures
Batch processing
Real time processing
Technology choices
Analytical data stores
Analytics and reporting
Batch processing
Data lakes
Data storage
Data store comparison
Pipeline orchestration
Real-time message ingestion
Search data stores
Stream processing
Application tenancy in SaaS Databases
Tenancy models
Monitor Azure Databricks jobs
Overview
Send Databricks application logs to Azure Monitor
Use dashboards to visualize Databricks metrics
Troubleshoot performance bottlenecks
Transfer data to and from Azure
Extend on-premises data solutions to Azure
Securing data solutions
Architectures
Apache Cassandra
Azure data platform
Big data analytics with Azure Data Explorer
Campaign optimization with HDInsight Spark
Campaign optimization with SQL Server
DataOps for modern data warehouse
Data streaming
Data cache
Digital campaign management
Digital marketing using Azure MySQL
Finance management using Azure MySQL
Finance management using Azure PostgreSQL
Gaming using Azure MySQL
Gaming using Cosmos DB
Globally distributed apps using Cosmos DB
Hybrid ETL with Azure Data Factory
Intelligent apps using Azure MySQL
Intelligent apps using Azure PostgreSQL
Interactive querying with HDInsight
Loan charge-off prediction with HDInsight Spark
Loan charge-off prediction with SQL Server
Loan credit risk modeling
Loan credit risk with SQL Server
Messaging
Modern data warehouse
N-tier app with Cassandra
Ops automation using Event Grid
Oracle migration to Azure
Personalization using Cosmos DB
Retail and e-commerce using Azure MySQL
Retail and e-commerce using Azure PostgreSQL
Retail and e-commerce using Cosmos DB
Running Oracle Databases on Azure
Serverless apps using Cosmos DB
Streaming using HDInsight
Windows N-tier applications
Developer Options
Microservices
Overview
Guides
Domain modeling for microservices
Domain analysis
Tactical DDD
Identify microservice boundaries
Design a microservices architecture
Introduction
Choose a compute option
Interservice communication
API design
API gateways
Data considerations
Design patterns for microservices
Operate microservices in production
Monitor microservices in Azure Kubernetes Service (AKS)
CI/CD for microservices
CI/CD for microservices on Kubernetes
Migrate to a microservices architecture
Migrate a monolith application to microservices
Modernize enterprise applications with Service Fabric
Migrate from Cloud Services to Service Fabric
Serverless applications
Serverless Functions overview
Serverless Functions examples
Plan for serverless architecture
Serverless Functions decision and planning
Serverless application assessment
Technical workshops and training
Proof of concept or pilot
Develop and deploy serverless apps
Serverless Functions app development
Serverless Functions code walkthrough
CI/CD for a serverless frontend
Serverless Functions app operations
Serverless Functions app security
Architectures
Big data analytics with Azure Data Explorer
CI/CD pipeline using Azure DevOps
Event-based cloud automation
Microservices on Azure Service Fabric
Multicloud with the Serverless Framework
Serverless applications using Event Grid
Serverless event processing
Unified logging for microservices apps
DevOps
Checklist
Guides
Extending Resource Manager templates
Overview
Update a resource
Conditionally deploy a resource
Use an object as a parameter
Property transformer and collector
Architectures
CI/CD pipeline for chatbots with ARM templates
CI/CD for Azure VMs
CI/CD for Azure Web Apps
CI/CD for Containers
CI/CD using Jenkins and AKS
DevSecOps in Azure
DevSecOps in GitHub
DevTest and DevOps for IaaS
DevTest and DevOps for PaaS
DevTest and DevOps for microservices
DevTest Image Factory
CI/CD using Jenkins and Terraform
Hybrid DevOps
Java CI/CD using Jenkins and Azure Web Apps
Jenkins on Azure
SharePoint for Dev-Test
Real time location sharing
Run containers in a hybrid environment
High Availability
Overview
Architectures
IaaS - Web application with relational database
Hybrid
Guides
FSLogix for the enterprise
Administer SQL Server anywhere with Azure Arc
Azure Stack stretched clusters for DR
Azure Stack for remote offices and branches
Architectures
Manage configurations for Azure Arc enabled servers
Azure Automation in a hybrid environment
Azure enterprise cloud file share
Using Azure file shares in a hybrid environment
Azure Functions in a hybrid environment
Back up files and applications on Azure Stack Hub
Azure Automation Update Management
Deploy AI and machine learning at the edge by using Azure Stack Edge
Design a hybrid Domain Name System solution with Azure
Hybrid file services
Hybrid availability and performance monitoring
Hybrid Security Monitoring using Azure Security Center and Azure Sentinel
Manage hybrid Azure workloads using Windows Admin Center
Azure Arc hybrid management and deployment for Kubernetes clusters
Connect standalone servers by using Azure Network Adapter
Disaster Recovery for Azure Stack Hub virtual machines
On-premises data gateway for Azure Logic Apps
Run containers in a hybrid environment
Connect an on-premises network to Azure
Connect an on-premises network to Azure using ExpressRoute
Cross cloud scaling
Cross-platform chat
Extend an on-premises network using ExpressRoute
Extend an on-premises network using VPN
Hybrid connections
Troubleshoot a hybrid VPN connection
Connect using Windows Virtual Desktop
Windows Virtual Desktop for enterprises
Multiple Active Directory forests
Multiple forests with Azure AD DS
Identity
Guides
Identity in multitenant applications
Introduction
The Tailspin scenario
Authentication
Claims-based identity
Tenant sign-up
Application roles
Authorization
Secure a web API
Cache access tokens
Client assertion
Federate with a customer's AD FS
Architectures
AD DS resource forests in Azure
Deploy AD DS in an Azure virtual network
Extend on-premises AD FS to Azure
Hybrid Identity
Integrate on-premises AD domains with Azure AD
Integrate on-premises AD with Azure
Azure AD identity management for AWS
Integration
Architectures
Basic enterprise integration on Azure
Enterprise business intelligence
Enterprise integration using queues and events
Publishing internal APIs to external users
Web and Mobile front-ends
Custom Business Processes
Line of Business Extension
On-premises data gateway for Azure Logic Apps
Internet of Things
Guides
Azure IoT Edge Vision
Overview
Camera selection
Hardware acceleration
Machine learning
Image storage
Alert persistence
User interface
Architectures
IoT reference architecture
Condition Monitoring
IoT and data analytics
IoT using Cosmos DB
Predictive Maintenance for Industrial IoT
IoT Connected Platform
Contactless IoT interfaces
COVID-19 IoT Safe Solutions
Lighting and disinfection system
Light and power for emerging markets
Predictive maintenance with IoT
Process real-time vehicle data using IoT
Safe Buildings with IoT and Azure
Secure access to IoT apps with Azure AD
Voice assistants and IoT devices
IoT using Azure Data Explorer
Buy online, pickup in store
Project 15 Open Platform
Industrial IoT Analytics
Architecture
Recommended services
Data visualization
Considerations
IoT concepts
Introduction to IoT solutions
Devices, platform, and applications
Attestation, authentication, and provisioning
Field and cloud edge gateways
Application-to-device commands
Builders, developers, and operators
IoT patterns
Measure and control loop
Monitor and manage loop
Analyze and optimize loop
Event routing
Solution scaling with application stamps
Management and Governance
Architectures
Archive on-premises data to cloud
Back up cloud applications
Back up on-premises applications
Centralize app configuration and security
Computer forensics
High availability for BCDR
Data Sovereignty & Data Gravity
Enterprise-scale disaster recovery
Updating Windows VMs in Azure
Highly available SharePoint Server 2016
SMB disaster recovery with Azure Site Recovery
SMB disaster recovery with Double-Take DR
Manage configurations for Azure Arc enabled servers
Azure Automation in a hybrid environment
Back up files and applications on Azure Stack Hub
Azure Automation Update Management
Hybrid availability and performance monitoring
Manage hybrid Azure workloads using Windows Admin Center
Azure Arc hybrid management and deployment for Kubernetes clusters
Disaster Recovery for Azure Stack Hub virtual machines
Media
Architectures
Instant broadcasting with serverless
Live streaming digital media
Video-on-demand digital media
Gridwich media processing system
Gridwich architecture
Gridwich concepts
Clean monolith design
Saga orchestration
Project names and structure
Gridwich CI/CD
Content protection and DRM
Gridwich Media Services
Gridwich Storage Service
Gridwich logging
Gridwich message formats
Pipeline variables to Terraform flow
Gridwich procedures
Set up Azure DevOps
Run Azure admin scripts
Set up local dev environment
Create new cloud environment
Maintain and rotate keys
Test Media Services V3 encoding
Migration
Architectures
Adding modern front-ends to legacy apps
Banking system
Banking cloud transformation
Patterns and implementations
JMeter implementation reference
Lift and shift LOB apps
Lift and shift with AKS
Oracle database migration
Migration decision process
Cross-cloud connectivity
Lift and shift to Azure VMs
Refactor
Rearchitect
Serverless computing LOB apps
Unlock Legacy Data with Azure Stack
Decompose apps with Service Fabric
SQL 2008 R2 failover cluster in Azure
Migrate mainframe data to Azure
Migrate Unisys mainframes to Azure
Refactor IBM z/OS mainframe CF on Azure
Mixed Reality
Architectures
Mixed reality design reviews
Facilities management with mixed reality
Training powered by mixed reality
Mobile
Architectures
Custom mobile workforce app
Scalable apps with Azure MySQL
Scalable apps using Azure PostgreSQL
Social app with authentication
Task-based consumer mobile app
Networking
Guides
Add IP spaces to peered virtual networks
Azure Firewall Architecture Guide
Architectures
Virtual network peering and VPN gateways
Deploy highly available NVAs
High availability for IaaS apps
Hub-spoke network topology in Azure
Implement a secure hybrid network
Segmenting Virtual Networks
Azure Automation Update Management
Design a hybrid Domain Name System solution with Azure
Hybrid availability and performance monitoring
Connect standalone servers by using Azure Network Adapter
SAP
Overview
Architectures
SAP HANA on Azure (Large Instances)
SAP HANA Scale-up on Linux
SAP NetWeaver on Windows on Azure
SAP S/4HANA in Linux on Azure
SAP BW/4HANA in Linux on Azure
SAP NetWeaver on SQL Server
SAP deployment using an Oracle DB
Dev/test for SAP
Security
Architectures
Azure AD in Security Operations
Cyber threat intelligence
Highly-secure IaaS apps
Homomorphic encryption with SEAL
Real-time fraud detection
Secure OBO refresh tokens
Securely managed web apps
Web app private database connectivity
Virtual network integrated microservices
Virtual Network security options
Hybrid Security Monitoring using Azure Security Center and Azure Sentinel
MCAS and Azure Sentinel security for AWS
Healthcare platform confidential computing
Storage
Architectures
Media rendering
Medical data storage
HIPAA/HITRUST Health Data and AI
Using Azure file shares in a hybrid environment
Hybrid file services
Web
Architectures
Basic web application
Deployment in App Service Environments
Standard deployment
High availability deployment
E-commerce front end
E-commerce website running in ASE
Highly available SharePoint farm
Highly available multi-region web application
Hybrid SharePoint farm with Microsoft 365
Intelligent product search engine for e-commerce
Magento e-commerce in AKS
Migrate a web app using Azure APIM
Multi-region N-tier application
Multitenant SaaS
Multi-tier web application built for HA/DR
SAP S/4 HANA for Large Instances
Scalable e-commerce web app
Scalable Episerver marketing website
Scalable Sitecore marketing website
Scalable Umbraco CMS web app
Scalable and secure WordPress on Azure
Scalable order processing
Scalable web app
More Scalable web apps
Serverless web app
Simple branded website
Simple digital marketing website
Web app monitoring on Azure
Web and mobile applications with MySQL, Cosmos DB, and Redis
Dynamics Business Central as a Service on Azure
Azure Functions in a hybrid environment
Cloud Adoption Framework
Azure architecture icons
12/18/2020 • 2 minutes to read

Helping our customers design and architect new solutions is core to the Azure Architecture Center’s mission.
Architecture diagrams like those included in our guidance can help communicate design decisions and the
relationships between components of a given workload. On this page you will find an official collection of Azure
architecture icons including Azure product icons to help you build a custom architecture diagram for your next
solution.
Do’s
Use the icon to illustrate how products can work together
In diagrams, we recommend including the product name somewhere close to the icon
Use the icons as they would appear within Azure
Don’ts
Don’t crop, flip or rotate icons
Don’t distort or change icon shape in any way
Don’t use Microsoft product icons to represent your product or service

Example architecture diagram

Browse all Azure architectures to view other examples.

Icon updates
As of November 2020, the folder structure of our collection of Azure architecture icons has changed. The FAQs and
Terms of Use PDF files appear in the first level when you download the SVG icons below. The files in the icons folder
are the same except there is no longer a CXP folder. If you encounter any issues, let us know.

Terms
Microsoft permits the use of these icons in architectural diagrams, training materials, or documentation. You may
copy, distribute, and display the icons only for the permitted use unless granted explicit permission by Microsoft.
Microsoft reserves all other rights.
What's new in the Azure Architecture Center
12/18/2020 • 4 minutes to read

New and updated articles in the Azure Architecture Center

December 2020
New Articles
Performance testing
Testing tools
Refactor IBM z/OS mainframe Coupling Facility (CF) to Azure
Design scalable Azure applications
Plan for capacity
Design Azure applications for efficiency
Design for scaling
Updated Articles
Baseline architecture for an Azure Kubernetes Service (AKS) cluster (#32842d421)

November 2020
New Articles
Use Azure Stack HCI stretched clusters for disaster recovery
Use Azure Stack HCI switchless interconnect and lightweight quorum for Remote Office/Branch Office
Release Engineering Application Development
Release Engineering Continuous integration
Release Engineering Rollback
Project 15 from Microsoft Open Platform for Conservation and Ecological Sustainability Solutions
Unisys mainframe migration
Security logs and audits
Check for identity, network, data risks
Security operations in Azure
Security health modeling in Azure
Azure enterprise cloud file share
Modernize mainframe & midrange data
IoT event routing
Create a Gridwich environment
Gridwich cloud media system
Gridwich CI/CD pipeline
Gridwich clean monolith architecture
Gridwich content protection and DRM
Logging in Gridwich
Gridwich request-response messages
Gridwich project naming and namespaces
Gridwich saga orchestration
Gridwich Storage Service
Gridwich keys and secrets management
Gridwich Media Services setup and scaling
Gridwich pipeline-generated admin scripts
Gridwich Azure DevOps setup
Gridwich local development environment setup
Test Media Services V3 encoding
Gridwich variable flow
Updated Articles
Choosing a data storage technology (#4128cc2d9)
Process real-time vehicle data using IoT (#beeba69f6)
Security monitoring tools in Azure (#4f3a35043)
Building a CI/CD pipeline for microservices on Kubernetes (#c0135f775)

October 2020
New Articles
SQL Server 2008 R2 failover cluster in Azure
Kafka on Azure
Secure application's configuration and dependencies
Application classification for security
Application threat analysis
AKS triage - cluster health
AKS triage - container registry connectivity
AKS triage - admission controllers
AKS triage - workload deployments
AKS triage - node health
Azure Kubernetes Service (AKS) operations triage
Alerts in IoT Edge Vision
Camera selection for IoT Edge Vision
Hardware for IoT Edge Vision
Image storage in IoT Edge Vision
Azure IoT Edge Vision
Machine learning in IoT Edge Vision
User interface in IoT Edge Vision
Configure infrastructure
Repeatable Infrastructure
Automation overview of goals, best practices, and types in Azure
Automated Tasks
Partitioning in Event Hubs and Kafka
Retail - Buy online, pickup in store (BOPIS)
Magento e-commerce platform in Azure Kubernetes Service (AKS)
Web app private connectivity to Azure SQL database
Updated Articles
Security with identity and access management (IAM) in Azure (#2a1154709)
Regulatory compliance (#2a1154709)
Computer forensics Chain of Custody in Azure (#909b776f4)
Overview of the performance efficiency pillar (#ed89cf6ab)
Azure Kubernetes Service (AKS) solution journey (#b63ab6a9f)
GCP to Azure Services Comparison (#a45091c00)
Resiliency patterns (#b5201626c)
Choosing an Azure compute service (#a64329288)
Security patterns (#13add8a06)

September 2020
New Articles
Administrative account security
Enforce governance to reduce risks
Security management groups
Regulatory compliance
Team roles and responsibilities
Segmentation strategies
Tenancy model for SaaS applications
Migrate IBM mainframe applications to Azure with TmaxSoft OpenFrame
Security monitoring tools in Azure
Applications and services
Security storage in Azure
Azure Active Directory IDaaS in Security Operations
Capture cost requirements for an Azure workload
Tradeoffs for costs
Microsoft Azure Well-Architected Framework
Overview of the security pillar
Storage, data, and encryption in Azure
Move Azure resources across regions
FSLogix for the enterprise
Stromasys Charon-SSP Solaris emulator on Azure VMs
Azure Kubernetes Service (AKS) solution journey
IoT analyze and optimize loops
IoT measure and control loops
IoT monitor and manage loops
Hybrid and Multicloud Architectures
Azure Arc hybrid management and deployment for Kubernetes clusters
Manage configurations for Azure Arc enabled servers
Azure Automation in a hybrid environment
Using Azure file shares in a hybrid environment
Azure Functions in a hybrid environment
Connect standalone servers by using Azure Network Adapter
Back up files and applications on Azure Stack Hub
Disaster Recovery for Azure Stack Hub virtual machines
Azure Automation Update Management
Deploy AI and machine learning at the edge by using Azure Stack Edge
On-premises data gateway for Azure Logic Apps
Run containers in a hybrid environment
Design a hybrid Domain Name System solution with Azure
Hybrid file services
Hybrid availability and performance monitoring
Hybrid Security Monitoring using Azure Security Center and Azure Sentinel
Manage hybrid Azure workloads using Windows Admin Center
DevSecOps in GitHub
Network security review
Network security strategies
Security with identity and access management (IAM) in Azure
Azure Messaging cost estimates
Web application cost estimates
Updated Articles
DevTest and DevOps for IaaS solutions (#a2a167058)
DevTest and DevOps for microservice solutions (#a2a167058)
DevTest and DevOps for PaaS solutions (#a2a167058)
Baseline architecture for an Azure Kubernetes Service (AKS) cluster (#9b20a025d)
DevSecOps in Azure (#511e6ee92)

August 2020
New Articles
Multiple forests with AD DS, Azure AD, and Azure AD DS
Multiple forests with AD DS and Azure AD
Virtual network integrated serverless microservices
Big data analytics with Azure Data Explorer
Content Delivery Network analytics
Azure Data Explorer interactive analytics
IoT analytics with Azure Data Explorer
Azure Data Explorer monitoring
Custom Business Processes
Web and Mobile Front Ends
Line of Business Extension
Attestation, authentication, and provisioning
Field and cloud edge gateways
Condition Monitoring for Industrial IoT
Predictive Maintenance for Industrial IoT
Banking system cloud transformation on Azure
JMeter implementation reference for load testing pipeline solution
Patterns and implementations
IoT connected light, power, and internet for emerging markets
Compute
Scale IoT solutions with application stamps
Builders, developers, and operators
IoT application-to-device commands
IoT solution architecture
IoT solutions conceptual overview
Use cached data
Criteria for choosing a data store
Data store decision tree
Updated Articles
Network security and containment in Azure (#75626e3a7)
Data store cost estimates (#2b0e692f9)
Retry guidance for Azure services (#6c8a169c9)
Chaos engineering (#f742721a9)
Understand data store models (#4fbdd828a)
Cost governance for an Azure workload (#a3452805a)
Event-based cloud automation (#d2cca2011)
Serverless event processing (#d2cca2011)
Azure Application Architecture Guide
12/18/2020 • 3 minutes to read

This guide presents a structured approach for designing applications on Azure that are scalable, secure, resilient,
and highly available. It is based on proven practices that we have learned from customer engagements.

Introduction
The cloud is changing how applications are designed and secured. Instead of monoliths, applications are
decomposed into smaller, decentralized services. These services communicate through APIs or by using
asynchronous messaging or eventing. Applications scale horizontally, adding new instances as demand requires.
These trends bring new challenges. Application state is distributed. Operations are done in parallel and
asynchronously. Applications must be resilient when failures occur. Malicious actors continuously target
applications. Deployments must be automated and predictable. Monitoring and telemetry are critical for gaining
insight into the system. This guide is designed to help you navigate these changes.

TRADITIONAL ON-PREMISES → MODERN CLOUD

Monolithic → Decomposed
Designed for predictable scalability → Designed for elastic scale
Relational database → Polyglot persistence (mix of storage technologies)
Synchronized processing → Asynchronous processing
Design to avoid failures (MTBF) → Design for failure (MTTR)
Occasional large updates → Frequent small updates
Manual management → Automated self-management
Snowflake servers → Immutable infrastructure

How this guide is structured


The Azure Application Architecture Guide is organized as a series of steps, from the architecture and design to
implementation. For each step, there is supporting guidance that will help you with the design of your application
architecture.
Architecture style: N-tier, Microservices, Web-queue-worker, Event-driven, Big compute, Big data
Technology choices: Compute, Data stores, Messaging
Application architecture: Reference architectures, Design principles, Design patterns, Best practices
Microsoft Azure Well-Architected Framework: Cost optimization, Operational excellence, Performance efficiency, Reliability, Security
Architecture styles
The first decision point is the most fundamental. What kind of architecture are you building? It might be a
microservices architecture, a more traditional N-tier application, or a big data solution. We have identified several
distinct architecture styles. There are benefits and challenges to each.
Learn more: Architecture styles

Technology choices
Knowing the type of architecture you are building, now you can start to choose the main technology pieces for the
architecture. The following technology choices are critical:
Compute refers to the hosting model for the computing resources that your applications run on. For more
information, see Choose a compute service.
Data stores include databases but also storage for message queues, caches, logs, and anything else that an
application might persist to storage. For more information, see Choose a data store.
Messaging technologies enable asynchronous messages between components of the system. For more
information, see Choose a messaging service.
You will probably have to make additional technology choices along the way, but these three elements (compute,
data, and messaging) are central to most cloud applications and will determine many aspects of your design.

Design the architecture


Once you have chosen the architecture style and the major technology components, you are ready to tackle the
specific design of your application. Every application is different, but the following resources can help you along the
way:
Reference architectures
Depending on your scenario, one of our reference architectures may be a good starting point. Each reference
architecture includes recommended practices, along with considerations for scalability, availability, security,
resilience, and other aspects of the design. Most also include a deployable solution or reference implementation.
Design principles
We have identified 10 high-level design principles that will make your application more scalable, resilient, and
manageable. These design principles apply to any architecture style. Throughout the design process, keep these 10
high-level design principles in mind. For more information, see Design principles.
Design patterns
Software design patterns are repeatable patterns that are proven to solve specific problems. Our catalog of Cloud
design patterns addresses specific challenges in distributed systems. They address aspects such as availability,
resiliency, performance, and security. You can find our catalog of design patterns here.
Best practices
Our best practices articles cover various design considerations including API design, autoscaling, data partitioning,
caching, and so forth. Review these and apply the best practices that are appropriate for your application.
Security best practices
Our security best practices describe how to ensure that the confidentiality, integrity, and availability of your
application aren't compromised by malicious actors.

Quality pillars
A successful cloud application will focus on five pillars of software quality: Cost optimization, Operational
excellence, Performance efficiency, Reliability, and Security.
Leverage the Microsoft Azure Well-Architected Framework to assess your architecture across these five pillars.

Next steps
Architecture styles
Architecture styles
12/18/2020 • 5 minutes to read

An architecture style is a family of architectures that share certain characteristics. For example, N-tier is a common
architecture style. More recently, microservice architectures have started to gain favor. Architecture styles don't
require the use of particular technologies, but some technologies are well-suited for certain architectures. For
example, containers are a natural fit for microservices.
We have identified a set of architecture styles that are commonly found in cloud applications. The article for each
style includes:
A description and logical diagram of the style.
Recommendations for when to choose this style.
Benefits, challenges, and best practices.
A recommended deployment using relevant Azure services.

A quick tour of the styles


This section gives a quick tour of the architecture styles that we've identified, along with some high-level
considerations for their use. Read more details in the linked topics.
N -tier
N-tier is a traditional architecture for enterprise applications. Dependencies are managed by dividing the
application into layers that perform logical functions, such as presentation, business logic, and data access. A layer
can only call into layers that sit below it. However, this horizontal layering can be a liability. It can be hard to
introduce changes in one part of the application without touching the rest of the application. That makes frequent
updates a challenge, limiting how quickly new features can be added.
N-tier is a natural fit for migrating existing applications that already use a layered architecture. For that reason, N-tier is most often seen in infrastructure as a service (IaaS) solutions, or applications that use a mix of IaaS and managed services.
Web-Queue -Worker
For a purely PaaS solution, consider a Web-Queue-Worker architecture. In this style, the application has a web
front end that handles HTTP requests and a back-end worker that performs CPU-intensive tasks or long-running
operations. The front end communicates to the worker through an asynchronous message queue.
Web-queue-worker is suitable for relatively simple domains with some resource-intensive tasks. Like N-tier, the
architecture is easy to understand. The use of managed services simplifies deployment and operations. But with
complex domains, it can be hard to manage dependencies. The front end and the worker can easily become large,
monolithic components that are hard to maintain and update. As with N-tier, this can reduce the frequency of
updates and limit innovation.
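This decoupling can be sketched with a minimal in-process example (illustrative only; a real deployment would use a managed queue such as Azure Queue Storage or Service Bus rather than a local `queue.Queue`):

```python
import queue
import threading

# Hypothetical sketch: the front end enqueues work and acknowledges
# immediately; a background worker drains the queue asynchronously.
task_queue = queue.Queue()
results = {}

def handle_request(job_id, payload):
    """Front end: accept the request, enqueue it, respond right away."""
    task_queue.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}

def worker():
    """Back-end worker: pull messages and run the long-running task."""
    while True:
        job_id, payload = task_queue.get()
        if job_id is None:                 # sentinel: shut down
            break
        results[job_id] = payload.upper()  # stand-in for real work
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()
handle_request(1, "resize image")
handle_request(2, "render report")
task_queue.join()            # wait until the worker drains the queue
task_queue.put((None, None))
t.join()
print(results)  # {1: 'RESIZE IMAGE', 2: 'RENDER REPORT'}
```

Because the queue absorbs bursts, the front end stays responsive even when the worker falls behind; scaling out means adding more worker instances reading the same queue.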
Microservices
If your application has a more complex domain, consider moving to a Microservices architecture. A
microservices application is composed of many small, independent services. Each service implements a single
business capability. Services are loosely coupled, communicating through API contracts.
Each service can be built by a small, focused development team. Individual services can be deployed without a lot
of coordination between teams, which encourages frequent updates. A microservice architecture is more complex
to build and manage than either N-tier or web-queue-worker. It requires a mature development and DevOps
culture. But done right, this style can lead to higher release velocity, faster innovation, and a more resilient
architecture.
Event-driven architecture
Event-Driven Architectures use a publish-subscribe (pub-sub) model, where producers publish events, and
consumers subscribe to them. The producers are independent from the consumers, and consumers are
independent from each other.
Consider an event-driven architecture for applications that ingest and process a large volume of data with very low
latency, such as IoT solutions. The style is also useful when different subsystems must perform different types of
processing on the same event data.
Big Data, Big Compute
Big Data and Big Compute are specialized architecture styles for workloads that fit certain specific profiles. Big
data divides a very large dataset into chunks, performing parallel processing across the entire set, for analysis and
reporting. Big compute, also called high-performance computing (HPC), makes parallel computations across a
large number (thousands) of cores. Domains include simulations, modeling, and 3-D rendering.

Architecture styles as constraints


An architecture style places constraints on the design, including the set of elements that can appear and the
allowed relationships between those elements. Constraints guide the "shape" of an architecture by restricting the
universe of choices. When an architecture conforms to the constraints of a particular style, certain desirable
properties emerge.
For example, the constraints in microservices include:
A service represents a single responsibility.
Every service is independent of the others.
Data is private to the service that owns it. Services do not share data.
By adhering to these constraints, what emerges is a system where services can be deployed independently, faults
are isolated, frequent updates are possible, and it's easy to introduce new technologies into the application.
Before choosing an architecture style, make sure that you understand the underlying principles and constraints of
that style. Otherwise, you can end up with a design that conforms to the style at a superficial level, but does not
achieve the full potential of that style. It's also important to be pragmatic. Sometimes it's better to relax a
constraint, rather than insist on architectural purity.
The following table summarizes how each style manages dependencies, and the types of domain that are best
suited for each.

ARCHITECTURE STYLE / DEPENDENCY MANAGEMENT / DOMAIN TYPE

N-tier: horizontal tiers divided by subnet. Traditional business domain; frequency of updates is low.
Web-Queue-Worker: front-end and back-end jobs, decoupled by async messaging. Relatively simple domain with some resource-intensive tasks.
Microservices: vertically (functionally) decomposed services that call each other through APIs. Complicated domain; frequent updates.
Event-driven architecture: producer/consumer; independent view per subsystem. IoT and real-time systems.
Big data: divide a huge dataset into small chunks; parallel processing on local datasets. Batch and real-time data analysis; predictive analysis using ML.
Big compute: data allocation to thousands of cores. Compute-intensive domains such as simulation.

Consider challenges and benefits


Constraints also create challenges, so it's important to understand the trade-offs when adopting any of these
styles. Do the benefits of the architecture style outweigh the challenges for this subdomain and bounded context?
Here are some of the types of challenges to consider when selecting an architecture style:
Complexity . Is the complexity of the architecture justified for your domain? Conversely, is the style too
simplistic for your domain? In that case, you risk ending up with a "big ball of mud", because the
architecture does not help you to manage dependencies cleanly.
Asynchronous messaging and eventual consistency . Asynchronous messaging can be used to
decouple services, and increase reliability (because messages can be retried) and scalability. However, this
also creates challenges in handling eventual consistency, as well as the possibility of duplicate messages.
Inter-service communication. As you decompose an application into separate services, there is a risk
that communication between services will cause unacceptable latency or create network congestion (for
example, in a microservices architecture).
Manageability . How hard is it to manage the application, monitor, deploy updates, and so on?
Big compute architecture style
12/18/2020 • 3 minutes to read

The term big compute describes large-scale workloads that require a large number of cores, often numbering in
the hundreds or thousands. Scenarios include image rendering, fluid dynamics, financial risk modeling, oil
exploration, drug design, and engineering stress analysis, among others.

Here are some typical characteristics of big compute applications:


The work can be split into discrete tasks, which can be run across many cores simultaneously.
Each task is finite. It takes some input, does some processing, and produces output. The entire application runs
for a finite amount of time (minutes to days). A common pattern is to provision a large number of cores in a
burst, and then spin down to zero once the application completes.
The application does not need to stay up 24/7. However, the system must handle node failures or application
crashes.
For some applications, tasks are independent and can run in parallel. In other cases, tasks are tightly coupled,
meaning they must interact or exchange intermediate results. In that case, consider using high-speed
networking technologies such as InfiniBand and remote direct memory access (RDMA).
Depending on your workload, you might use compute-intensive VM sizes (H16r, H16mr, and A9).

When to use this architecture


Computationally intensive operations such as simulation and number crunching.
Simulations that are computationally intensive and must be split across CPUs in multiple computers (10-
1000s).
Simulations that require too much memory for one computer, and must be split across multiple computers.
Long-running computations that would take too long to complete on a single computer.
Smaller computations that must be run 100s or 1000s of times, such as Monte Carlo simulations.
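A Monte Carlo workload of that kind decomposes into independent tasks that need no communication. The sketch below fans tasks out locally with a thread pool purely to illustrate the shape; on a real big compute cluster, each task would be scheduled onto its own core or node:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

SAMPLES_PER_TASK = 50_000

def simulate(seed):
    """One independent Monte Carlo task: count random points that fall
    inside the unit quarter-circle (a classic pi estimator)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(SAMPLES_PER_TASK)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

# Eight independent tasks; a scheduler like Azure Batch would run
# hundreds or thousands of these across a VM pool.
seeds = range(8)
with ThreadPoolExecutor() as pool:
    hits = list(pool.map(simulate, seeds))

pi_estimate = 4 * sum(hits) / (len(hits) * SAMPLES_PER_TASK)
print(pi_estimate)
```

Each task carries its own seed, so results are reproducible and no task depends on another's output, which is exactly what makes the workload "embarrassingly parallel."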

Benefits
High performance with "embarrassingly parallel" processing.
Can harness hundreds or thousands of computer cores to solve large problems faster.
Access to specialized high-performance hardware, with dedicated high-speed InfiniBand networks.
You can provision VMs as needed to do work, and then tear them down.

Challenges
Managing the VM infrastructure.
Managing the volume of number crunching.
Provisioning thousands of cores in a timely manner.
For tightly coupled tasks, adding more cores can have diminishing returns. You may need to experiment to find
the optimum number of cores.

Big compute using Azure Batch


Azure Batch is a managed service for running large-scale high-performance computing (HPC) applications.
Using Azure Batch, you configure a VM pool and upload the applications and data files. The Batch service then
provisions the VMs, assigns tasks to them, runs the tasks, and monitors progress. Batch can automatically
scale out the VMs in response to the workload. Batch also provides job scheduling.

Big compute running on Virtual Machines


You can use Microsoft HPC Pack to administer a cluster of VMs, and schedule and monitor HPC jobs. With this
approach, you must provision and manage the VMs and network infrastructure. Consider this approach if you
have existing HPC workloads and want to move some or all of them to Azure. You can move the entire HPC cluster to
Azure, or you can keep your HPC cluster on-premises but use Azure for burst capacity. For more information, see
Batch and HPC solutions for large-scale computing workloads.
HPC Pack deployed to Azure
In this scenario, the HPC cluster is created entirely within Azure.
The head node provides management and job scheduling services to the cluster. For tightly coupled tasks, use an
RDMA network that provides very high bandwidth, low latency communication between VMs. For more
information, see Deploy an HPC Pack 2016 cluster in Azure.
Burst an HPC cluster to Azure
In this scenario, an organization is running HPC Pack on-premises, and uses Azure VMs for burst capacity. The
cluster head node is on-premises. ExpressRoute or VPN Gateway connects the on-premises network to the Azure
VNet.
Big data architecture style
12/18/2020 • 10 minutes to read

A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or
complex for traditional database systems.

Diagram: data sources feed two paths. A batch path flows from data storage through batch processing, and a real-time path flows from real-time message ingestion through stream processing. Both paths load an analytical data store that serves analytics and reporting, with orchestration coordinating the workflow.

Big data solutions typically involve one or more of the following types of workload:
Batch processing of big data sources at rest.
Real-time processing of big data in motion.
Interactive exploration of big data.
Predictive analytics and machine learning.
Most big data architectures include some or all of the following components:
Data sources : All big data solutions start with one or more data sources. Examples include:
Application data stores, such as relational databases.
Static files produced by applications, such as web server log files.
Real-time data sources, such as IoT devices.
Data storage : Data for batch processing operations is typically stored in a distributed file store that can
hold high volumes of large files in various formats. This kind of store is often called a data lake. Options for
implementing this storage include Azure Data Lake Store or blob containers in Azure Storage.
Batch processing : Because the data sets are so large, often a big data solution must process data files
using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually these
jobs involve reading source files, processing them, and writing the output to new files. Options include
running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in an
HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight Spark cluster.
Real-time message ingestion : If the solution includes real-time sources, the architecture must include a
way to capture and store real-time messages for stream processing. This might be a simple data store,
where incoming messages are dropped into a folder for processing. However, many solutions need a
message ingestion store to act as a buffer for messages, and to support scale-out processing, reliable
delivery, and other message queuing semantics. Options include Azure Event Hubs, Azure IoT Hubs, and
Kafka.
Stream processing : After capturing real-time messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for analysis. The processed stream data is then written to an
output sink. Azure Stream Analytics provides a managed stream processing service based on perpetually
running SQL queries that operate on unbounded streams. You can also use open source Apache streaming
technologies like Storm and Spark Streaming in an HDInsight cluster.
Analytical data store : Many big data solutions prepare data for analysis and then serve the processed
data in a structured format that can be queried using analytical tools. The analytical data store used to serve
these queries can be a Kimball-style relational data warehouse, as seen in most traditional business
intelligence (BI) solutions. Alternatively, the data could be presented through a low-latency NoSQL
technology such as HBase, or an interactive Hive database that provides a metadata abstraction over data
files in the distributed data store. Azure Synapse Analytics provides a managed service for large-scale,
cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can also
be used to serve data for analysis.
Analysis and reporting: The goal of most big data solutions is to provide insights into the data through
analysis and reporting. To empower users to analyze the data, the architecture may include a data modeling
layer, such as a multidimensional OLAP cube or tabular data model in Azure Analysis Services. It might also
support self-service BI, using the modeling and visualization technologies in Microsoft Power BI or Microsoft
Excel. Analysis and reporting can also take the form of interactive data exploration by data scientists or data
analysts. For these scenarios, many Azure services support analytical notebooks, such as Jupyter, enabling
these users to leverage their existing skills with Python or R. For large-scale data exploration, you can use
Microsoft R Server, either standalone or with Spark.
Orchestration : Most big data solutions consist of repeated data processing operations, encapsulated in
workflows, that transform source data, move data between multiple sources and sinks, load the processed
data into an analytical data store, or push the results straight to a report or dashboard. To automate these
workflows, you can use an orchestration technology such as Azure Data Factory or Apache Oozie and Sqoop.
Azure includes many services that can be used in a big data architecture. They fall roughly into two categories:
Managed services, including Azure Data Lake Store, Azure Data Lake Analytics, Azure Synapse Analytics, Azure
Stream Analytics, Azure Event Hub, Azure IoT Hub, and Azure Data Factory.
Open source technologies based on the Apache Hadoop platform, including HDFS, HBase, Hive, Pig, Spark,
Storm, Oozie, Sqoop, and Kafka. These technologies are available on Azure in the Azure HDInsight service.
These options are not mutually exclusive, and many solutions combine open source technologies with Azure
services.

When to use this architecture


Consider this architecture style when you need to:
Store and process data in volumes too large for a traditional database.
Transform unstructured data for analysis and reporting.
Capture, process, and analyze unbounded streams of data in real time, or with low latency.
Use Azure Machine Learning or Microsoft Cognitive Services.

Benefits
Technology choices . You can mix and match Azure managed services and Apache technologies in HDInsight
clusters, to capitalize on existing skills or technology investments.
Performance through parallelism . Big data solutions take advantage of parallelism, enabling high-
performance solutions that scale to large volumes of data.
Elastic scale . All of the components in the big data architecture support scale-out provisioning, so that you can
adjust your solution to small or large workloads, and pay only for the resources that you use.
Interoperability with existing solutions . The components of the big data architecture are also used for IoT
processing and enterprise BI solutions, enabling you to create an integrated solution across data workloads.

Challenges
Complexity . Big data solutions can be extremely complex, with numerous components to handle data
ingestion from multiple data sources. It can be challenging to build, test, and troubleshoot big data processes.
Moreover, there may be a large number of configuration settings across multiple systems that must be used in
order to optimize performance.
Skillset . Many big data technologies are highly specialized, and use frameworks and languages that are not
typical of more general application architectures. On the other hand, big data technologies are evolving new
APIs that build on more established languages. For example, the U-SQL language in Azure Data Lake Analytics is
based on a combination of Transact-SQL and C#. Similarly, SQL-based APIs are available for Hive, HBase, and
Spark.
Technology maturity . Many of the technologies used in big data are evolving. While core Hadoop
technologies such as Hive and Pig have stabilized, emerging technologies such as Spark introduce extensive
changes and enhancements with each new release. Managed services such as Azure Data Lake Analytics and
Azure Data Factory are relatively young, compared with other Azure services, and will likely evolve over time.
Security . Big data solutions usually rely on storing all static data in a centralized data lake. Securing access to
this data can be challenging, especially when the data must be ingested and consumed by multiple applications
and platforms.

Best practices
Leverage parallelism . Most big data processing technologies distribute the workload across multiple
processing units. This requires that static data files are created and stored in a splittable format. Distributed
file systems such as HDFS can optimize read and write performance, and the actual processing is performed
by multiple cluster nodes in parallel, which reduces overall job times.
Par tition data . Batch processing usually happens on a recurring schedule — for example, weekly or
monthly. Partition data files, and data structures such as tables, based on temporal periods that match the
processing schedule. That simplifies data ingestion and job scheduling, and makes it easier to troubleshoot
failures. Also, partitioning tables that are used in Hive, U-SQL, or SQL queries can significantly improve
query performance.
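A minimal illustration of temporal partitioning (the record layout and monthly granularity are assumptions for the example; match the granularity to your own batch schedule):

```python
from collections import defaultdict
from datetime import date

def partition_key(d: date) -> str:
    """Monthly partition path segment matching a monthly batch schedule,
    e.g. '2020/12' (hypothetical layout for illustration)."""
    return f"{d.year:04d}/{d.month:02d}"

def partition(records):
    """Group (timestamp, payload) records by their temporal partition."""
    parts = defaultdict(list)
    for ts, payload in records:
        parts[partition_key(ts)].append(payload)
    return dict(parts)

events = [(date(2020, 11, 30), "a"),
          (date(2020, 12, 1), "b"),
          (date(2020, 12, 18), "c")]
print(partition(events))  # {'2020/11': ['a'], '2020/12': ['b', 'c']}
```

A monthly job then reads only the matching partition, and a failed run can be retried for just that period.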
Apply schema-on-read semantics. Using a data lake lets you combine storage for files in multiple
formats, whether structured, semi-structured, or unstructured. Use schema-on-read semantics, which
project a schema onto the data when the data is processed, not when it is stored. This builds
flexibility into the solution, and prevents bottlenecks during data ingestion caused by data validation and
type checking.
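The idea can be sketched as follows: store raw lines as-is and project the schema only when reading (the field names here are hypothetical):

```python
import json

# Raw lines land in the lake untouched; note the second record is
# missing a field, which ingestion tolerates.
RAW = [
    '{"device": "d1", "temp": "21.5"}',
    '{"device": "d2"}',
]

def read_with_schema(lines):
    """Project a schema as the data is read, not when it is stored."""
    for line in lines:
        rec = json.loads(line)
        yield {"device": rec["device"],
               "temp": float(rec["temp"]) if "temp" in rec else None}

print(list(read_with_schema(RAW)))
# [{'device': 'd1', 'temp': 21.5}, {'device': 'd2', 'temp': None}]
```

Validation and type coercion happen at processing time, so writers never block on schema checks.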
Process data in-place . Traditional BI solutions often use an extract, transform, and load (ETL) process to
move data into a data warehouse. With larger volumes of data, and a greater variety of formats, big data
solutions generally use variations of ETL, such as transform, extract, and load (TEL). With this approach, the
data is processed within the distributed data store, transforming it to the required structure, before moving
the transformed data into an analytical data store.
Balance utilization and time costs . For batch processing jobs, it's important to consider two factors: The
per-unit cost of the compute nodes, and the per-minute cost of using those nodes to complete the job. For
example, a batch job may take eight hours with four cluster nodes. However, it might turn out that the job
uses all four nodes only during the first two hours, and after that, only two nodes are required. In that case,
running the entire job on two nodes would increase the total job time, but would not double it, so the total
cost would be less. In some business scenarios, a longer processing time may be preferable to the higher
cost of using underutilized cluster resources.
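The arithmetic in that example works out as follows (assuming, for illustration, a flat rate of one currency unit per node-hour):

```python
RATE = 1.0  # assumed price per node-hour

# Four-node run: billed for 4 nodes over the full 8 hours,
# even though only the first 2 hours actually need all four.
cost_four_nodes = 4 * 8 * RATE            # 32.0 node-hours

# Useful work performed: 4 nodes busy for 2 h, then 2 nodes for 6 h.
work = 4 * 2 + 2 * 6                      # 20 node-hours

# Two-node run: the same work takes longer but wastes nothing.
hours_two_nodes = work / 2                # 10.0 hours, not 16
cost_two_nodes = 2 * hours_two_nodes * RATE

print(hours_two_nodes, cost_two_nodes, cost_four_nodes)  # 10.0 20.0 32.0
```

The job time grows from 8 to 10 hours rather than doubling, and the cost drops from 32 to 20 node-hours, which is the trade-off the guidance describes.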
Separate cluster resources . When deploying HDInsight clusters, you will normally achieve better
performance by provisioning separate cluster resources for each type of workload. For example, although
Spark clusters include Hive, if you need to perform extensive processing with both Hive and Spark, you
should consider deploying separate dedicated Spark and Hadoop clusters. Similarly, if you are using HBase
and Storm for low latency stream processing and Hive for batch processing, consider separate clusters for
Storm, HBase, and Hadoop.
Orchestrate data ingestion . In some cases, existing business applications may write data files for batch
processing directly into Azure storage blob containers, where they can be consumed by HDInsight or Azure
Data Lake Analytics. However, you will often need to orchestrate the ingestion of data from on-premises or
external data sources into the data lake. Use an orchestration workflow or pipeline, such as those supported
by Azure Data Factory or Oozie, to achieve this in a predictable and centrally manageable fashion.
Scrub sensitive data early . The data ingestion workflow should scrub sensitive data early in the process,
to avoid storing it in the data lake.
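One sketch of such a scrubbing step: hash sensitive fields before the record is written to the lake (the field names and the truncated SHA-256 digest are illustrative choices, not a compliance recommendation):

```python
import hashlib

# Hypothetical set of fields considered sensitive in this pipeline.
SENSITIVE = {"email", "ssn"}

def scrub(record):
    """Replace sensitive string fields with a one-way hash so raw
    values never reach the data lake."""
    return {k: hashlib.sha256(v.encode()).hexdigest()[:12]
            if k in SENSITIVE else v
            for k, v in record.items()}

raw = {"user": "u42", "email": "a@example.com", "reading": 21.5}
print(scrub(raw))
```

Running the scrub inside the ingestion workflow, rather than downstream, keeps any copy of the lake free of raw sensitive values.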

IoT architecture
Internet of Things (IoT) is a specialized subset of big data solutions. The following diagram shows a possible logical
architecture for IoT. The diagram emphasizes the event-streaming components of the architecture.

The cloud gateway ingests device events at the cloud boundary, using a reliable, low latency messaging system.
Devices might send events directly to the cloud gateway, or through a field gateway . A field gateway is a
specialized device or software, usually colocated with the devices, that receives events and forwards them to the
cloud gateway. The field gateway might also preprocess the raw device events, performing functions such as
filtering, aggregation, or protocol transformation.
After ingestion, events go through one or more stream processors that can route the data (for example, to
storage) or perform analytics and other processing.
The following are some common types of processing. (This list is certainly not exhaustive.)
Writing event data to cold storage, for archiving or batch analytics.
Hot path analytics, analyzing the event stream in (near) real time, to detect anomalies, recognize patterns
over rolling time windows, or trigger alerts when a specific condition occurs in the stream.
Handling special types of non-telemetry messages from devices, such as notifications and alarms.
Machine learning.
The boxes that are shaded gray show components of an IoT system that are not directly related to event streaming,
but are included here for completeness.
The device registry is a database of the provisioned devices, including the device IDs and usually device
metadata, such as location.
The provisioning API is a common external interface for provisioning and registering new devices.
Some IoT solutions allow command and control messages to be sent to devices.

This section has presented a very high-level view of IoT, and there are many subtleties and challenges to
consider. For a more detailed reference architecture and discussion, see the Microsoft Azure IoT Reference
Architecture (PDF download).

Next steps
Learn more about big data architectures.
Event-driven architecture style
12/18/2020 • 3 minutes to read

An event-driven architecture consists of event producers that generate a stream of events, and event
consumers that listen for the events.

Event Consumers

Event Producers Event Ingestion Event Consumers

Event Consumers

Events are delivered in near real time, so consumers can respond immediately to events as they occur. Producers
are decoupled from consumers — a producer doesn't know which consumers are listening. Consumers are also
decoupled from each other, and every consumer sees all of the events. This differs from a Competing Consumers
pattern, where consumers pull messages from a queue and a message is processed just once (assuming no
errors). In some systems, such as IoT, events must be ingested at very high volumes.
An event driven architecture can use a pub/sub model or an event stream model.
Pub/sub : The messaging infrastructure keeps track of subscriptions. When an event is published, it sends
the event to each subscriber. After an event is received, it cannot be replayed, and new subscribers do not
see the event.
Event streaming : Events are written to a log. Events are strictly ordered (within a partition) and durable.
Clients don't subscribe to the stream, instead a client can read from any part of the stream. The client is
responsible for advancing its position in the stream. That means a client can join at any time, and can
replay events.
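The contrast with pub/sub can be sketched with a toy append-only log, where each consumer keeps its own offset, a late joiner can still read history, and any client can replay (illustrative only; real platforms such as Event Hubs or Kafka add partitions, retention, and durability):

```python
class EventLog:
    """Append-only event log; clients track their own read positions."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        """Read everything from a given position; the same events can
        be read again by any client at any time."""
        return self._events[offset:]

log = EventLog()
for e in ["created", "updated", "deleted"]:
    log.append(e)

# A late-joining client can still read the full history...
print(log.read_from(0))   # ['created', 'updated', 'deleted']
# ...while another client resumes from its own saved offset.
print(log.read_from(2))   # ['deleted']
```

In pub/sub, by contrast, the broker pushes each event once and discards it, so neither replay nor late joining is possible.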
On the consumer side, there are some common variations:
Simple event processing . An event immediately triggers an action in the consumer. For example, you
could use Azure Functions with a Service Bus trigger, so that a function executes whenever a message is
published to a Service Bus topic.
Complex event processing. A consumer processes a series of events, looking for patterns in the event
data, using a technology such as Azure Stream Analytics or Apache Storm. For example, you could
aggregate readings from an embedded device over a time window, and generate a notification if the
moving average crosses a certain threshold.
Event stream processing. Use a data streaming platform, such as Azure IoT Hub or Apache Kafka, as a
pipeline to ingest events and feed them to stream processors. The stream processors act to process or
transform the stream. There may be multiple stream processors for different subsystems of the application.
This approach is a good fit for IoT workloads.
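The moving-average example above can be sketched directly. This is a simple stand-in for what Stream Analytics or Storm would express as a windowed query; the function name, window size, and threshold are invented:

```python
# Hypothetical complex-event-processing sketch: a moving average over a
# fixed-size window of device readings, recording an alert whenever the
# average crosses a threshold.
from collections import deque

def detect(readings, window=3, threshold=50.0):
    buf, alerts = deque(maxlen=window), []
    for i, value in enumerate(readings):
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            alerts.append(i)  # index where the moving average crossed
    return alerts

# Readings spike at the end; the average crosses 50 only once the
# window is dominated by high values.
assert detect([10, 20, 30, 80, 90, 100]) == [4, 5]
```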
The source of the events may be external to the system, such as physical devices in an IoT solution. In that case,
the system must be able to ingest the data at the volume and throughput that is required by the data source.
In the logical diagram above, each type of consumer is shown as a single box. In practice, it's common to have
multiple instances of a consumer, to avoid having the consumer become a single point of failure in the system.
Multiple instances might also be necessary to handle the volume and frequency of events. Also, a single consumer
might process events on multiple threads. This can create challenges if events must be processed in order or
require exactly-once semantics. See Minimize Coordination.
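One common mitigation when exactly-once delivery isn't available is to make the consumer idempotent, for example by recording processed event ids so redelivered events are ignored. A minimal sketch (the event shape and in-memory store are invented; a real system would persist the ids):

```python
# Idempotent-consumer sketch: at-least-once delivery can hand the same
# event to a consumer twice, so processing is keyed by an event id and
# duplicates are skipped.

processed_ids = set()
balance = {"total": 0}

def handle(event):
    if event["id"] in processed_ids:   # duplicate delivery: skip
        return False
    processed_ids.add(event["id"])
    balance["total"] += event["amount"]
    return True

events = [
    {"id": "a1", "amount": 10},
    {"id": "a2", "amount": 5},
    {"id": "a1", "amount": 10},  # redelivered after a timeout
]
results = [handle(e) for e in events]
assert results == [True, True, False]
assert balance["total"] == 15
```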

When to use this architecture


Multiple subsystems must process the same events.
Real-time processing with minimum time lag.
Complex event processing, such as pattern matching or aggregation over time windows.
High volume and high velocity of data, such as IoT.

Benefits
Producers and consumers are decoupled.
No point-to-point integrations. It's easy to add new consumers to the system.
Consumers can respond to events immediately as they arrive.
Highly scalable and distributed.
Subsystems have independent views of the event stream.

Challenges
Guaranteed delivery. In some systems, especially in IoT scenarios, it's crucial to guarantee that events are
delivered.
Processing events in order or exactly once. Each consumer type typically runs in multiple instances, for
resiliency and scalability. This can create a challenge if the events must be processed in order (within a
consumer type), or if the processing logic is not idempotent.
Additional considerations
The amount of data to include in an event can be a significant consideration that affects both performance and
cost. Putting all the relevant information needed for processing in the event itself can simplify the processing
code and save additional lookups. Putting the minimal amount of information in an event, like just a couple of
identifiers, will reduce transport time and cost, but requires the processing code to look up any additional
information it needs.
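The payload tradeoff can be illustrated side by side: a "fat" event carries everything the consumer needs, while a "thin" event carries only identifiers and forces a lookup. The store and event shapes here are invented for illustration:

```python
# Sketch of the event-payload tradeoff. Both consumers produce the same
# result; the thin event is cheaper to transport but needs a lookup.

customer_store = {"c42": {"name": "Contoso", "tier": "gold"}}

def process_fat(event):
    # All data is in the event; no extra lookup required.
    return f'{event["customer"]["name"]}:{event["order_total"]}'

def process_thin(event, store):
    # Only an id travels on the wire; the consumer looks the rest up.
    customer = store[event["customer_id"]]
    return f'{customer["name"]}:{event["order_total"]}'

fat = {"customer": {"name": "Contoso", "tier": "gold"}, "order_total": 99}
thin = {"customer_id": "c42", "order_total": 99}
assert process_fat(fat) == process_thin(thin, customer_store) == "Contoso:99"
```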
Microservices architecture style

A microservices architecture consists of a collection of small, autonomous services. Each service is self-contained
and should implement a single business capability.

What are microservices?


Microservices are small, independent, and loosely coupled. A single small team of developers can write and
maintain a service.
Each service is a separate codebase, which can be managed by a small development team.
Services can be deployed independently. A team can update an existing service without rebuilding and
redeploying the entire application.
Services are responsible for persisting their own data or external state. This differs from the traditional
model, where a separate data layer handles data persistence.
Services communicate with each other by using well-defined APIs. Internal implementation details of each
service are hidden from other services.
Services don't need to share the same technology stack, libraries, or frameworks.
Besides the services themselves, some other components appear in a typical microservices architecture:
Management/orchestration. This component is responsible for placing services on nodes, identifying failures,
rebalancing services across nodes, and so forth. Typically this component is an off-the-shelf technology such as
Kubernetes, rather than something custom built.
API Gateway. The API gateway is the entry point for clients. Instead of calling services directly, clients call the API
gateway, which forwards the call to the appropriate services on the back end.
Advantages of using an API gateway include:
It decouples clients from services. Services can be versioned or refactored without needing to update all of
the clients.
Services can use messaging protocols that are not web friendly, such as AMQP.
The API Gateway can perform other cross-cutting functions such as authentication, logging, SSL
termination, and load balancing.
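A gateway's routing and cross-cutting behavior can be sketched in a few lines. This is purely illustrative: the service names, paths, and token check are invented, and a real deployment would use a managed gateway such as Azure API Management or a reverse proxy rather than hand-written code.

```python
# API gateway sketch: route by path prefix and apply a cross-cutting
# check (authentication) at the edge before forwarding.

backends = {
    "/orders": lambda path: f"orders-svc handled {path}",
    "/users":  lambda path: f"users-svc handled {path}",
}

def gateway(path, token):
    if token != "valid-token":          # cross-cutting auth at the edge
        return 401, "unauthorized"
    for prefix, service in backends.items():
        if path.startswith(prefix):
            return 200, service(path)   # forward to the backend service
    return 404, "no route"

assert gateway("/orders/7", "valid-token") == (200, "orders-svc handled /orders/7")
assert gateway("/orders/7", "bad") == (401, "unauthorized")
assert gateway("/unknown", "valid-token") == (404, "no route")
```

Because the auth check lives in one place, individual services don't each need to reimplement it.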

Benefits
Agility. Because microservices are deployed independently, it's easier to manage bug fixes and feature
releases. You can update a service without redeploying the entire application, and roll back an update if
something goes wrong. In many traditional applications, if a bug is found in one part of the application, it
can block the entire release process. New features may be held up waiting for a bug fix to be integrated,
tested, and published.
Small, focused teams. A microservice should be small enough that a single feature team can build, test,
and deploy it. Small team sizes promote greater agility. Large teams tend to be less productive, because
communication is slower, management overhead goes up, and agility diminishes.
Small code base. In a monolithic application, there is a tendency over time for code dependencies to
become tangled. Adding a new feature requires touching code in a lot of places. By not sharing code or data
stores, a microservices architecture minimizes dependencies, and that makes it easier to add new features.
Mix of technologies. Teams can pick the technology that best fits their service, using a mix of technology
stacks as appropriate.
Fault isolation. If an individual microservice becomes unavailable, it won't disrupt the entire application,
as long as any upstream microservices are designed to handle faults correctly (for example, by
implementing circuit breaking).
Scalability. Services can be scaled independently, letting you scale out subsystems that require more
resources, without scaling out the entire application. Using an orchestrator such as Kubernetes or Service
Fabric, you can pack a higher density of services onto a single host, which allows for more efficient
utilization of resources.
Data isolation. It is much easier to perform schema updates, because only a single microservice is
affected. In a monolithic application, schema updates can become very challenging, because different parts
of the application may all touch the same data, making any alterations to the schema risky.

Challenges
The benefits of microservices don't come for free. Here are some of the challenges to consider before embarking
on a microservices architecture.
Complexity. A microservices application has more moving parts than the equivalent monolithic
application. Each service is simpler, but the entire system as a whole is more complex.
Development and testing. Writing a small service that relies on other dependent services requires a
different approach than writing a traditional monolithic or layered application. Existing tools are not
always designed to work with service dependencies. Refactoring across service boundaries can be difficult.
It is also challenging to test service dependencies, especially when the application is evolving quickly.
Lack of governance. The decentralized approach to building microservices has advantages, but it can
also lead to problems. You may end up with so many different languages and frameworks that the
application becomes hard to maintain. It may be useful to put some project-wide standards in place,
without overly restricting teams' flexibility. This especially applies to cross-cutting functionality such as
logging.
Network congestion and latency. The use of many small, granular services can result in more
interservice communication. Also, if the chain of service dependencies gets too long (service A calls B,
which calls C...), the additional latency can become a problem. You will need to design APIs carefully. Avoid
overly chatty APIs, think about serialization formats, and look for places to use asynchronous
communication patterns.
Data integrity. Each microservice is responsible for its own data persistence. As a result, data
consistency can be a challenge. Embrace eventual consistency where possible.
Management. Being successful with microservices requires a mature DevOps culture. Correlated logging
across services can be challenging. Typically, logging must correlate multiple service calls for a single user
operation.
Versioning. Updates to a service must not break services that depend on it. Multiple services could be
updated at any given time, so without careful design, you might have problems with backward or forward
compatibility.
Skillset. Microservices are highly distributed systems. Carefully evaluate whether the team has the skills
and experience to be successful.

Best practices
Model services around the business domain.
Decentralize everything. Individual teams are responsible for designing and building services. Avoid
sharing code or data schemas.
Data storage should be private to the service that owns the data. Use the best storage for each service and
data type.
Services communicate through well-designed APIs. Avoid leaking implementation details. APIs should
model the domain, not the internal implementation of the service.
Avoid coupling between services. Causes of coupling include shared database schemas and rigid
communication protocols.
Offload cross-cutting concerns, such as authentication and SSL termination, to the gateway.
Keep domain knowledge out of the gateway. The gateway should handle and route client requests without
any knowledge of the business rules or domain logic. Otherwise, the gateway becomes a dependency and
can cause coupling between services.
Services should have loose coupling and high functional cohesion. Functions that are likely to change
together should be packaged and deployed together. If they reside in separate services, those services end
up being tightly coupled, because a change in one service will require updating the other service. Overly
chatty communication between two services may be a symptom of tight coupling and low cohesion.
Isolate failures. Use resiliency strategies to prevent failures within a service from cascading. See Resiliency
patterns and Designing reliable applications.

Next steps
For detailed guidance about building a microservices architecture on Azure, see Designing, building, and
operating microservices on Azure.
N-tier architecture style

An N-tier architecture divides an application into logical layers and physical tiers.

[Diagram: clients and remote services reach a web tier through a WAF; the web tier uses a cache and calls two middle tiers, directly or through messaging, which access a data tier.]
Layers are a way to separate responsibilities and manage dependencies. Each layer has a specific responsibility. A
higher layer can use services in a lower layer, but not the other way around.
Tiers are physically separated, running on separate machines. A tier can call to another tier directly, or use
asynchronous messaging (message queue). Although each layer might be hosted in its own tier, that's not
required. Several layers might be hosted on the same tier. Physically separating the tiers improves scalability and
resiliency, but also adds latency from the additional network communication.
A traditional three-tier application has a presentation tier, a middle tier, and a database tier. The middle tier is
optional. More complex applications can have more than three tiers. The diagram above shows an application with
two middle tiers, encapsulating different areas of functionality.
An N-tier application can have a closed layer architecture or an open layer architecture:
In a closed layer architecture, a layer can only call the next layer immediately down.
In an open layer architecture, a layer can call any of the layers below it.
A closed layer architecture limits the dependencies between layers. However, it might create unnecessary network
traffic, if one layer simply passes requests along to the next layer.
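The closed-versus-open rule can be stated precisely with layer indices. This is a hypothetical sketch; the layer names are illustrative:

```python
# Layering-rule sketch: in a closed architecture a layer may call only
# the layer immediately below it; in an open one, any lower layer.

layers = ["presentation", "business", "data"]  # index 0 is highest

def call_allowed(caller, callee, closed=True):
    src, dst = layers.index(caller), layers.index(callee)
    if closed:
        return dst == src + 1      # only the next layer down
    return dst > src               # any lower layer

assert call_allowed("presentation", "business", closed=True)
assert not call_allowed("presentation", "data", closed=True)   # must go via business
assert call_allowed("presentation", "data", closed=False)      # open: allowed
assert not call_allowed("data", "business")                    # never call upward
```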

When to use this architecture


N-tier architectures are typically implemented as infrastructure-as-a-service (IaaS) applications, with each tier
running on a separate set of VMs. However, an N-tier application doesn't need to be pure IaaS. Often, it's
advantageous to use managed services for some parts of the architecture, particularly caching, messaging, and
data storage.
Consider an N-tier architecture for:
Simple web applications.
Migrating an on-premises application to Azure with minimal refactoring.
Unified development of on-premises and cloud applications.
N-tier architectures are very common in traditional on-premises applications, which makes this style a natural fit
for migrating existing workloads to Azure.

Benefits
Portability between cloud and on-premises, and between cloud platforms.
Lower learning curve for most developers.
Natural evolution from the traditional application model.
Open to heterogeneous environments (Windows/Linux).

Challenges
It's easy to end up with a middle tier that just does CRUD operations on the database, adding extra latency
without doing any useful work.
Monolithic design prevents independent deployment of features.
Managing an IaaS application is more work than an application that uses only managed services.
It can be difficult to manage network security in a large system.

Best practices
Use autoscaling to handle changes in load. See Autoscaling best practices.
Use asynchronous messaging to decouple tiers.
Cache semistatic data. See Caching best practices.
Configure the database tier for high availability, using a solution such as SQL Server Always On availability
groups.
Place a web application firewall (WAF) between the front end and the Internet.
Place each tier in its own subnet, and use subnets as a security boundary.
Restrict access to the data tier, by allowing requests only from the middle tier(s).

N-tier architecture on virtual machines


This section describes a recommended N-tier architecture running on VMs.

Each tier consists of two or more VMs, placed in an availability set or virtual machine scale set. Multiple VMs
provide resiliency in case one VM fails. Load balancers are used to distribute requests across the VMs in a tier. A
tier can be scaled horizontally by adding more VMs to the pool.
Each tier is also placed inside its own subnet, meaning their internal IP addresses fall within the same address
range. That makes it easy to apply network security group rules and route tables to individual tiers.
The web and business tiers are stateless. Any VM can handle any request for that tier. The data tier should consist
of a replicated database. For Windows, we recommend SQL Server, using Always On availability groups for high
availability. For Linux, choose a database that supports replication, such as Apache Cassandra.
Network security groups restrict access to each tier. For example, the database tier only allows access from the
business tier.

NOTE
The layer labeled "Business Tier" in our reference diagram is a moniker for the business logic tier. Likewise, we also call the
presentation tier the "Web Tier." In our example, this is a web application, though multi-tier architectures can be used for
other topologies as well (like desktop apps). Name your tiers what works best for your team to communicate the intent of
that logical and/or physical tier in your application - you could even express that naming in resources you choose to
represent that tier (e.g. vmss-appName-business-layer).

For more information about running N-tier applications on Azure:


Run Windows VMs for an N-tier application
Windows N-tier application on Azure with SQL Server
Microsoft Learn module: Tour the N-tier architecture style
Azure Bastion
Additional considerations
N-tier architectures are not restricted to three tiers. For more complex applications, it is common to have
more tiers. In that case, consider using layer-7 routing to route requests to a particular tier.
Tiers are the boundary of scalability, reliability, and security. Consider having separate tiers for services with
different requirements in those areas.
Use virtual machine scale sets for autoscaling.
Look for places in the architecture where you can use a managed service without significant refactoring. In
particular, look at caching, messaging, storage, and databases.
For higher security, place a network DMZ in front of the application. The DMZ includes network virtual
appliances (NVAs) that implement security functionality such as firewalls and packet inspection. For more
information, see Network DMZ reference architecture.
For high availability, place two or more NVAs in an availability set, with an external load balancer to
distribute Internet requests across the instances. For more information, see Deploy highly available network
virtual appliances.
Do not allow direct RDP or SSH access to VMs that are running application code. Instead, operators should
log into a jumpbox, also called a bastion host. This is a VM on the network that administrators use to
connect to the other VMs. The jumpbox has a network security group that allows RDP or SSH only from
approved public IP addresses.
You can extend the Azure virtual network to your on-premises network using a site-to-site virtual private
network (VPN) or Azure ExpressRoute. For more information, see Hybrid network reference architecture.
If your organization uses Active Directory to manage identity, you may want to extend your Active Directory
environment to the Azure VNet. For more information, see Identity management reference architecture.
If you need higher availability than the Azure SLA for VMs provides, replicate the application across two
regions and use Azure Traffic Manager for failover. For more information, see Run Windows VMs in multiple
regions or Run Linux VMs in multiple regions.
Web-Queue-Worker architecture style

The core components of this architecture are a web front end that serves client requests, and a worker that
performs resource-intensive tasks, long-running workflows, or batch jobs. The web front end communicates with
the worker through a message queue.
[Diagram: clients reach the web front end, which uses a cache, an identity provider, remote services, and CDN-served static content; the front end passes work through a queue to the worker, which writes to the database.]

Other components that are commonly incorporated into this architecture include:
One or more databases.
A cache to store values from the database for quick reads.
A CDN to serve static content.
Remote services, such as email or SMS service. Often these are provided by third parties.
Identity provider for authentication.
The web and worker are both stateless. Session state can be stored in a distributed cache. Any long-running work
is done asynchronously by the worker. The worker can be triggered by messages on the queue, or run on a
schedule for batch processing. The worker is an optional component. If there are no long-running operations, the
worker can be omitted.
The front end might consist of a web API. On the client side, the web API can be consumed by a single-page
application that makes AJAX calls, or by a native client application.
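The flow can be sketched with an in-memory queue standing in for Azure Storage queues or Service Bus. All names are invented, and a real worker would run continuously in its own process rather than draining synchronously:

```python
# Web-queue-worker sketch: the front end enqueues a work item and
# returns immediately; a worker drains the queue later.

import queue

work_queue = queue.Queue()
results = {}

def front_end(job_id, payload):
    # Accept the request and enqueue; don't do the heavy work inline.
    work_queue.put((job_id, payload))
    return "accepted"

def worker_drain():
    # In production this runs continuously; here we drain synchronously.
    while not work_queue.empty():
        job_id, payload = work_queue.get()
        results[job_id] = payload.upper()   # stand-in for expensive work

assert front_end("j1", "resize image") == "accepted"
assert front_end("j2", "send email") == "accepted"
worker_drain()
assert results == {"j1": "RESIZE IMAGE", "j2": "SEND EMAIL"}
```

Because the queue decouples the two sides, the front end stays responsive even when the worker falls behind.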

When to use this architecture


The Web-Queue-Worker architecture is typically implemented using managed compute services, either Azure App
Service or Azure Cloud Services.
Consider this architecture style for:
Applications with a relatively simple domain.
Applications with some long-running workflows or batch operations.
When you want to use managed services, rather than infrastructure as a service (IaaS).

Benefits
Relatively simple architecture that is easy to understand.
Easy to deploy and manage.
Clear separation of concerns.
The front end is decoupled from the worker using asynchronous messaging.
The front end and the worker can be scaled independently.

Challenges
Without careful design, the front end and the worker can become large, monolithic components that are
difficult to maintain and update.
There may be hidden dependencies, if the front end and worker share data schemas or code modules.

Best practices
Expose a well-designed API to the client. See API design best practices.
Autoscale to handle changes in load. See Autoscaling best practices.
Cache semi-static data. See Caching best practices.
Use a CDN to host static content. See CDN best practices.
Use polyglot persistence when appropriate. See Use the best data store for the job.
Partition data to improve scalability, reduce contention, and optimize performance. See Data partitioning best
practices.

Web-Queue-Worker on Azure App Service


This section describes a recommended Web-Queue-Worker architecture that uses Azure App Service.

The front end is implemented as an Azure App Service web app, and the worker is implemented as an Azure
Functions app. The web app and the function app are both associated with an App Service plan that
provides the VM instances.
You can use either Azure Service Bus or Azure Storage queues for the message queue. (The diagram shows
an Azure Storage queue.)
Azure Cache for Redis stores session state and other data that needs low latency access.
Azure CDN is used to cache static content such as images, CSS, or HTML.
For storage, choose the storage technologies that best fit the needs of the application. You might use
multiple storage technologies (polyglot persistence). To illustrate this idea, the diagram shows Azure SQL
Database and Azure Cosmos DB.
For more details, see App Service web application reference architecture.
Additional considerations
Not every transaction has to go through the queue and worker to storage. The web front end can perform
simple read/write operations directly. Workers are designed for resource-intensive tasks or long-running
workflows. In some cases, you might not need a worker at all.
Use the built-in autoscale feature of App Service to scale out the number of VM instances. If the load on the
application follows predictable patterns, use schedule-based autoscale. If the load is unpredictable, use
metrics-based autoscaling rules.
Consider putting the web app and the function app into separate App Service plans. That way, they can be
scaled independently.
Use separate App Service plans for production and testing. Otherwise, if you use the same plan for
production and testing, it means your tests are running on your production VMs.
Use deployment slots to manage deployments. This lets you deploy an updated version to a staging slot,
then swap over to the new version. It also lets you swap back to the previous version, if there was a problem
with the update.
Ten design principles for Azure applications

Follow these design principles to make your application more scalable, resilient, and manageable.
Design for self healing. In a distributed system, failures happen. Design your application to be self healing when
failures occur.
Make all things redundant. Build redundancy into your application, to avoid having single points of failure.
Minimize coordination. Minimize coordination between application services to achieve scalability.
Design to scale out. Design your application so that it can scale horizontally, adding or removing new instances
as demand requires.
Partition around limits. Use partitioning to work around database, network, and compute limits.
Design for operations. Design your application so that the operations team has the tools they need.
Use managed services. When possible, use platform as a service (PaaS) rather than infrastructure as a service
(IaaS).
Use the best data store for the job. Pick the storage technology that is the best fit for your data and how it will
be used.
Design for evolution. All successful applications change over time. An evolutionary design is key for continuous
innovation.
Build for the needs of business. Every design decision must be justified by a business requirement.
Design for self healing

Design your application to be self healing when failures occur


In a distributed system, failures can happen. Hardware can fail. The network can have transient failures. Rarely, an
entire service or region may experience a disruption, but even those must be planned for.
Therefore, design an application to be self healing when failures occur. This requires a three-pronged approach:
Detect failures.
Respond to failures gracefully.
Log and monitor failures, to give operational insight.
How you respond to a particular type of failure may depend on your application's availability requirements. For
example, if you require very high availability, you might automatically fail over to a secondary region during a
regional outage. However, that will incur a higher cost than a single-region deployment.
Also, don't just consider big events like regional outages, which are generally rare. You should focus as much, if not
more, on handling local, short-lived failures, such as network connectivity failures or failed database connections.

Recommendations
Retry failed operations. Transient failures may occur due to momentary loss of network connectivity, a dropped
database connection, or a timeout when a service is busy. Build retry logic into your application to handle transient
failures. For many Azure services, the client SDK implements automatic retries. For more information, see Transient
fault handling and the Retry pattern.
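A retry loop with exponential backoff might look like the following sketch. The flaky operation is simulated, and delays are collected rather than slept so the example runs instantly; production code should also add jitter, and should prefer the SDK's built-in retry policy when one exists:

```python
# Retry-with-backoff sketch for transient failures.

def retry(operation, attempts=4, base_delay=0.1):
    delays = []
    for attempt in range(attempts):
        try:
            return operation(), delays
        except ConnectionError:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the error
            delays.append(base_delay * 2 ** attempt)  # 0.1, 0.2, 0.4, ...

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:                     # fails twice, then succeeds
        raise ConnectionError("transient")
    return "ok"

result, delays = retry(flaky)
assert result == "ok"
assert delays == [0.1, 0.2]                # two backoff waits before success
```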
Protect failing remote services (Circuit Breaker). It's good to retry after a transient failure, but if the failure
persists, you can end up with too many callers hammering a failing service. This can lead to cascading failures, as
requests back up. Use the Circuit Breaker pattern to fail fast (without making the remote call) when an operation is
likely to fail.
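A minimal circuit breaker can be sketched as a failure counter that trips open. This is illustrative only; a production breaker (such as the one in Polly for .NET) also adds a half-open state and a timed reset:

```python
# Circuit-breaker sketch: after a threshold of consecutive failures the
# circuit opens and calls fail fast without reaching the remote service.

class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, operation):
        if self.open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = operation()
            self.failures = 0              # success resets the count
            return result
        except ConnectionError:
            self.failures += 1
            raise

breaker = CircuitBreaker(threshold=2)
def always_down():
    raise ConnectionError("service unavailable")

for _ in range(2):                          # two real failures trip the breaker
    try:
        breaker.call(always_down)
    except ConnectionError:
        pass
assert breaker.open
try:
    breaker.call(always_down)               # now fails fast, no remote call
except RuntimeError as e:
    assert "failing fast" in str(e)
```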
Isolate critical resources (Bulkhead). Failures in one subsystem can sometimes cascade. This can happen if a
failure causes some resources, such as threads or sockets, not to get freed in a timely manner, leading to resource
exhaustion. To avoid this, partition a system into isolated groups, so that a failure in one partition does not bring
down the entire system.
Perform load leveling. Applications may experience sudden spikes in traffic that can overwhelm services on the
backend. To avoid this, use the Queue-Based Load Leveling pattern to queue work items to run asynchronously. The
queue acts as a buffer that smooths out peaks in the load.
Fail over. If an instance can't be reached, fail over to another instance. For things that are stateless, like a web
server, put several instances behind a load balancer or traffic manager. For things that store state, like a database,
use replicas and fail over. Depending on the data store and how it replicates, this may require the application to
deal with eventual consistency.
Compensate failed transactions. In general, avoid distributed transactions, as they require coordination across
services and resources. Instead, compose an operation from smaller individual transactions. If the operation fails
midway through, use Compensating Transactions to undo any step that already completed.
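The compensation pattern can be sketched as a list of (action, compensation) pairs, undone in reverse order when a later step fails. The step names are invented for illustration:

```python
# Compensating-transactions (saga) sketch: each step pairs an action
# with a compensation; on failure, completed steps are undone in
# reverse order.

def run_saga(steps):
    done = []
    for name, action, compensate in steps:
        try:
            action()
            done.append((name, compensate))
        except Exception:
            for _, undo in reversed(done):   # roll back what completed
                undo()
            return "compensated"
    return "committed"

log = []

def fail_ship():
    raise IOError("carrier down")

steps = [
    ("reserve", lambda: log.append("reserve"), lambda: log.append("unreserve")),
    ("charge",  lambda: log.append("charge"),  lambda: log.append("refund")),
    ("ship",    fail_ship,                     lambda: None),
]
assert run_saga(steps) == "compensated"
assert log == ["reserve", "charge", "refund", "unreserve"]
```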
Checkpoint long-running transactions. Checkpoints can provide resiliency if a long-running operation fails.
When the operation restarts (for example, it is picked up by another VM), it can be resumed from the last
checkpoint.
Degrade gracefully. Sometimes you can't work around a problem, but you can provide reduced functionality that
is still useful. Consider an application that shows a catalog of books. If the application can't retrieve the thumbnail
image for the cover, it might show a placeholder image. Entire subsystems might be noncritical for the application.
For example, in an e-commerce site, showing product recommendations is probably less critical than processing
orders.
Throttle clients. Sometimes a small number of users create excessive load, which can reduce your application's
availability for other users. In this situation, throttle the client for a certain period of time. See the Throttling pattern.
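A fixed-window rate limiter is one simple way to implement throttling. This sketch uses a manual clock for determinism; the clients, limit, and window are invented:

```python
# Throttling sketch: each client gets a budget of requests per time
# window; requests beyond the budget are rejected until the next window.

class Throttle:
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.counts = {}          # (client, window_index) -> request count

    def allow(self, client, now):
        key = (client, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

t = Throttle(limit=2, window=10)
assert t.allow("tenant-a", now=0)
assert t.allow("tenant-a", now=1)
assert not t.allow("tenant-a", now=2)      # over quota in this window
assert t.allow("tenant-b", now=2)          # other clients are unaffected
assert t.allow("tenant-a", now=11)         # new window, quota resets
```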
Block bad actors. Just because you throttle a client, it doesn't mean the client was acting maliciously. It just means
the client exceeded their service quota. But if a client consistently exceeds their quota or otherwise behaves badly, you
might block them. Define an out-of-band process for users to request being unblocked.
Use leader election. When you need to coordinate a task, use Leader Election to select a coordinator. That way,
the coordinator is not a single point of failure. If the coordinator fails, a new one is selected. Rather than implement
a leader election algorithm from scratch, consider an off-the-shelf solution such as Zookeeper.
Test with fault injection. All too often, the success path is well tested but not the failure path. A system could run
in production for a long time before a failure path is exercised. Use fault injection to test the resiliency of the
system to failures, either by triggering actual failures or by simulating them.
Embrace chaos engineering. Chaos engineering extends the notion of fault injection, by randomly injecting
failures or abnormal conditions into production instances.
For a structured approach to making your applications self healing, see Design reliable applications for Azure.
Make all things redundant

Build redundancy into your application, to avoid having single points of failure
A resilient application routes around failure. Identify the critical paths in your application. Is there redundancy at
each point in the path? When a subsystem fails, will the application fail over to something else?

Recommendations
Consider business requirements. The amount of redundancy built into a system can affect both cost and
complexity. Your architecture should be informed by your business requirements, such as recovery time objective
(RTO). For example, a multi-region deployment is more expensive than a single-region deployment, and is more
complicated to manage. You will need operational procedures to handle failover and failback. The additional cost
and complexity might be justified for some business scenarios and not others.
Place VMs behind a load balancer. Don't use a single VM for mission-critical workloads. Instead, place multiple
VMs behind a load balancer. If any VM becomes unavailable, the load balancer distributes traffic to the remaining
healthy VMs. To learn how to deploy this configuration, see Multiple VMs for scalability and availability.

[Diagram: multiple VMs behind a load balancer.]

Replicate databases. Azure SQL Database and Cosmos DB automatically replicate the data within a region, and
you can enable geo-replication across regions. If you are using an IaaS database solution, choose one that supports
replication and failover, such as SQL Server Always On availability groups.
Enable geo-replication. Geo-replication for Azure SQL Database and Cosmos DB creates secondary readable
replicas of your data in one or more secondary regions. In the event of an outage, the database can fail over to the
secondary region for writes.
Partition for availability. Database partitioning is often used to improve scalability, but it can also improve
availability. If one shard goes down, the other shards can still be reached. A failure in one shard will only disrupt a
subset of the total transactions.
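The availability effect of sharding can be sketched directly: when one shard is down, only the keys that map to it fail. The shard count, layout, and keys are invented for illustration:

```python
# Partitioning-for-availability sketch: requests are routed by a stable
# partition key; an outage in one shard only affects that shard's keys.

SHARDS = 3
shard_up = {0: True, 1: False, 2: True}    # shard 1 is down

def shard_of(user_id):
    return user_id % SHARDS                 # stable partition key

def lookup(user_id):
    s = shard_of(user_id)
    if not shard_up[s]:
        raise ConnectionError(f"shard {s} unavailable")
    return f"profile-{user_id}"

served, failed = [], []
for uid in range(6):
    try:
        served.append(lookup(uid))
    except ConnectionError:
        failed.append(uid)

# Only the keys mapping to the failed shard are affected.
assert failed == [1, 4]
assert len(served) == 4
```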
Deploy to more than one region. For the highest availability, deploy the application to more than one region.
That way, in the rare case when a problem affects an entire region, the application can fail over to another region.
The following diagram shows a multi-region application that uses Azure Traffic Manager to handle failover.
[Diagram: a multi-region deployment in which Azure Traffic Manager routes traffic between Region 1 and Region 2.]

Synchronize front and backend failover. Use Azure Traffic Manager to fail over the front end. If the front end
becomes unreachable in one region, Traffic Manager will route new requests to the secondary region. Depending
on your database solution, you may need to coordinate failing over the database.
Use automatic failover but manual failback. Use Traffic Manager for automatic failover, but not for automatic
failback. Automatic failback carries a risk that you might switch to the primary region before the region is
completely healthy. Instead, verify that all application subsystems are healthy before manually failing back. Also,
depending on the database, you might need to check data consistency before failing back.
Include redundancy for Traffic Manager. Traffic Manager is a possible failure point. Review the Traffic Manager
SLA, and determine whether using Traffic Manager alone meets your business requirements for high availability. If
not, consider adding another traffic management solution as a failback. If the Azure Traffic Manager service fails,
change your CNAME records in DNS to point to the other traffic management service.
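The failover policy above can be sketched as a small state machine: failover happens automatically on a failed health check, while failback waits for an operator's confirmation. This is an illustrative in-memory model, not Traffic Manager's actual API.

```python
# Sketch of "automatic failover, manual failback". The Router class and its
# method names are invented for illustration only.
class Router:
    def __init__(self):
        self.active = "primary"

    def on_health_check(self, primary_healthy: bool) -> str:
        if self.active == "primary" and not primary_healthy:
            self.active = "secondary"      # automatic failover
        return self.active                 # never fails back on its own

    def manual_failback(self, operator_verified: bool) -> str:
        if operator_verified:              # all subsystems confirmed healthy
            self.active = "primary"
        return self.active

router = Router()
during_outage = router.on_health_check(primary_healthy=False)  # "secondary"
after_recovery = router.on_health_check(primary_healthy=True)  # still "secondary"
restored = router.manual_failback(operator_verified=True)      # "primary"
```

Keeping failback manual gives the team time to verify data consistency before traffic returns to the primary region.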
Minimize coordination
12/18/2020 • 4 minutes to read

Minimize coordination between application services to achieve scalability
Most cloud applications consist of multiple application services — web front ends, databases, business processes,
reporting and analysis, and so on. To achieve scalability and reliability, each of those services should run on
multiple instances.
What happens when two instances try to perform concurrent operations that affect some shared state? In some
cases, there must be coordination across nodes, for example to preserve ACID guarantees. In this diagram, Node2
is waiting for Node1 to release a database lock:

[Diagram: Node 1 holds a database lock while updating Orders; Node 2's update to OrderItems is blocked waiting on that lock.]

Coordination limits the benefits of horizontal scale and creates bottlenecks. In this example, as you scale out the
application and add more instances, you'll see increased lock contention. In the worst case, the front-end instances
will spend most of their time waiting on locks.
"Exactly once" semantics are another frequent source of coordination. For example, an order must be processed
exactly once. Two workers are listening for new orders. Worker1 picks up an order for processing. The application
must ensure that Worker2 doesn't duplicate the work, but also if Worker1 crashes, the order isn't dropped.

You can use a pattern such as Scheduler Agent Supervisor to coordinate between the workers, but in this case a
better approach might be to partition the work. Each worker is assigned a certain range of orders (say, by billing
region). If a worker crashes, a new instance picks up where the previous instance left off, but multiple instances
aren't contending.
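The partitioned alternative can be sketched as follows. This is a minimal illustration, assuming a fixed worker count and synthetic order IDs; a real system would map partitions to workers through its queue or messaging platform.

```python
# Sketch: assign each order to exactly one worker by hashing a partition key,
# so workers never contend for the same order.
import hashlib

NUM_WORKERS = 4

def worker_for(order_id: str) -> int:
    """Deterministically map an order to a worker index."""
    digest = hashlib.sha256(order_id.encode()).hexdigest()
    return int(digest, 16) % NUM_WORKERS

# Every node computes the same assignment, so no lock or coordinator is needed
# to decide ownership; a replacement worker simply takes over its range.
orders = ["order-101", "order-102", "order-103", "order-104"]
assignment = {o: worker_for(o) for o in orders}
```

Because the mapping is deterministic, a restarted worker can recompute which orders it owns without consulting the other workers.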

Recommendations
Embrace eventual consistency. When data is distributed, it takes coordination to enforce strong consistency
guarantees. For example, suppose an operation updates two databases. Instead of putting it into a single
transaction scope, it's better if the system can accommodate eventual consistency, perhaps by using the
Compensating Transaction pattern to logically roll back after a failure.
Use domain events to synchronize state. A domain event is an event that records when something happens
that has significance within the domain. Interested services can listen for the event, rather than using a global
transaction to coordinate across multiple services. If this approach is used, the system must tolerate eventual
consistency (see previous item).
Consider patterns such as CQRS and event sourcing. These two patterns can help to reduce contention
between read workloads and write workloads.
The CQRS pattern separates read operations from write operations. In some implementations, the read data
is physically separated from the write data.
In the Event Sourcing pattern, state changes are recorded as a series of events to an append-only data store.
Appending an event to the stream is an atomic operation, requiring minimal locking.
These two patterns complement each other. If the write-only store in CQRS uses event sourcing, the read-only
store can listen for the same events to create a readable snapshot of the current state, optimized for queries. Before
adopting CQRS or event sourcing, however, be aware of the challenges of this approach.
Partition data. Avoid putting all of your data into one data schema that is shared across many application
services. A microservices architecture enforces this principle by making each service responsible for its own data
store. Within a single database, partitioning the data into shards can improve concurrency, because a service
writing to one shard does not affect a service writing to a different shard.
Design idempotent operations. When possible, design operations to be idempotent. That way, they can be
handled using at-least-once semantics. For example, you can put work items on a queue. If a worker crashes in the
middle of an operation, another worker simply picks up the work item.
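A minimal sketch of an idempotent handler under at-least-once delivery. The in-memory store and message IDs are illustrative; in production the deduplication record would be durable and updated atomically with the effect.

```python
# Sketch of an idempotent worker: processing the same queued item twice has
# the same effect as processing it once, so redelivery after a crash is safe.
processed = set()          # in production, a durable store
balance = {"acct-1": 0}

def apply_credit(message_id: str, account: str, amount: int) -> None:
    if message_id in processed:        # duplicate delivery: do nothing
        return
    balance[account] += amount
    processed.add(message_id)          # record only after the effect is applied

apply_credit("msg-42", "acct-1", 100)
apply_credit("msg-42", "acct-1", 100)  # redelivered: no double credit
```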
Use asynchronous parallel processing. If an operation requires multiple steps that are performed
asynchronously (such as remote service calls), you might be able to call them in parallel, and then aggregate the
results. This approach assumes that each step does not depend on the results of the previous step.
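As a sketch, Python's asyncio can fan out independent steps and aggregate the results. The service names and delays are placeholders for remote calls.

```python
# Sketch: when steps are independent, issue them concurrently and aggregate.
import asyncio

async def call_service(name: str, delay: float) -> str:
    await asyncio.sleep(delay)         # stands in for a remote service call
    return f"{name}:ok"

async def aggregate() -> list:
    # All three calls run concurrently; total time is roughly the slowest
    # call, not the sum of all three.
    return await asyncio.gather(
        call_service("inventory", 0.01),
        call_service("pricing", 0.01),
        call_service("shipping", 0.01),
    )

results = asyncio.run(aggregate())
```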
Use optimistic concurrency when possible. Pessimistic concurrency control uses database locks to prevent
conflicts. This can cause poor performance and reduce availability. With optimistic concurrency control, each
transaction modifies a copy or snapshot of the data. When the transaction is committed, the database engine
validates the transaction and rejects any transactions that would affect database consistency.
Azure SQL Database and SQL Server support optimistic concurrency through snapshot isolation. Some Azure
storage services support optimistic concurrency through the use of ETags, including Azure Cosmos DB and Azure
Storage.
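The idea behind ETag-based optimistic concurrency can be sketched with a simple version counter; this toy store stands in for the database engine's validation step.

```python
# Sketch of optimistic concurrency: a write succeeds only if the record is
# unchanged since it was read (the same idea as a conditional ETag update).
class ConflictError(Exception):
    pass

store = {"item": {"value": "a", "version": 1}}

def update(key: str, new_value: str, expected_version: int) -> None:
    record = store[key]
    if record["version"] != expected_version:   # someone else wrote first
        raise ConflictError("retry: re-read and reapply the change")
    record["value"] = new_value
    record["version"] += 1

update("item", "b", expected_version=1)          # succeeds
try:
    update("item", "c", expected_version=1)      # stale version: rejected
except ConflictError:
    pass                                         # caller re-reads and retries
```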
Consider MapReduce or other parallel, distributed algorithms. Depending on the data and type of work to
be performed, you may be able to split the work into independent tasks that can be performed by multiple nodes
working in parallel. See Big compute architecture style.
Use leader election for coordination. In cases where you need to coordinate operations, make sure the
coordinator does not become a single point of failure in the application. Using the Leader Election pattern, one
instance is the leader at any time, and acts as the coordinator. If the leader fails, a new instance is elected to be the
leader.
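A minimal in-memory sketch of lease-based leader election. Real implementations use a shared durable primitive, such as an Azure Blob lease, rather than a local object; the class and timings here are invented for illustration.

```python
# Sketch: the instance holding an unexpired lease is the leader. If the leader
# fails to renew before the lease expires, another instance can take over.
class Lease:
    def __init__(self, duration: float):
        self.duration = duration
        self.holder = None
        self.expires = 0.0

    def try_acquire(self, instance: str, now: float) -> bool:
        if self.holder is None or now >= self.expires:
            self.holder, self.expires = instance, now + self.duration
            return True
        return self.holder == instance   # current holder renews its lease

lease = Lease(duration=10.0)
a = lease.try_acquire("node-1", now=0.0)    # node-1 becomes leader
b = lease.try_acquire("node-2", now=5.0)    # lease still held: rejected
c = lease.try_acquire("node-2", now=15.0)   # node-1 failed to renew: takeover
```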
Design to scale out

Design your application so that it can scale horizontally


A primary advantage of the cloud is elastic scaling — the ability to use as much capacity as you need, scaling out as
load increases, and scaling in when the extra capacity is not needed. Design your application so that it can scale
horizontally, adding or removing new instances as demand requires.

Recommendations
Avoid instance stickiness. Stickiness, or session affinity, is when requests from the same client are always
routed to the same server. Stickiness limits the application's ability to scale out. For example, traffic from a high-
volume user will not be distributed across instances. Causes of stickiness include storing session state in memory,
and using machine-specific keys for encryption. Make sure that any instance can handle any request.
Identify bottlenecks. Scaling out isn't a magic fix for every performance issue. For example, if your backend
database is the bottleneck, it won't help to add more web servers. Identify and resolve the bottlenecks in the
system first, before throwing more instances at the problem. Stateful parts of the system are the most likely cause
of bottlenecks.
Decompose workloads by scalability requirements. Applications often consist of multiple workloads, with
different requirements for scaling. For example, an application might have a public-facing site and a separate
administration site. The public site may experience sudden surges in traffic, while the administration site has a
smaller, more predictable load.
Offload resource-intensive tasks. Tasks that require a lot of CPU or I/O resources should be moved to
background jobs when possible, to minimize the load on the front end that is handling user requests.
Use built-in autoscaling features. Many Azure compute services have built-in support for autoscaling. If the
application has a predictable, regular workload, scale out on a schedule. For example, scale out during business
hours. Otherwise, if the workload is not predictable, use performance metrics such as CPU or request queue length
to trigger autoscaling. For autoscaling best practices, see Autoscaling.
Consider aggressive autoscaling for critical workloads. For critical workloads, you want to keep ahead of
demand. It's better to add new instances quickly under heavy load to handle the additional traffic, and then
gradually scale back.
Design for scale in. Remember that with elastic scale, the application will have periods of scale in, when
instances get removed. The application must gracefully handle instances being removed. Here are some ways to
handle scale in:
Listen for shutdown events (when available) and shut down cleanly.
Clients/consumers of a service should support transient fault handling and retry.
For long-running tasks, consider breaking up the work, using checkpoints or the Pipes and Filters pattern.
Put work items on a queue so that another instance can pick up the work, if an instance is removed in the
middle of processing.
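The checkpointing idea in the last two bullets can be sketched as follows; the checkpoint dict stands in for a durable store that survives the original instance.

```python
# Sketch: break a long-running task into checkpointed chunks so a replacement
# instance can resume after a scale-in event removes the original.
from typing import Optional

checkpoint = {"done": 0}   # durable in production (blob, table, database)

def process(items: list, stop_after: Optional[int] = None) -> int:
    total = 0
    for i in range(checkpoint["done"], len(items)):
        if stop_after is not None and i >= stop_after:
            break                      # instance removed mid-run
        total += items[i]
        checkpoint["done"] = i + 1     # record progress after each item
    return total

items = [1, 2, 3, 4, 5]
first = process(items, stop_after=2)   # original instance finishes 2 items
second = process(items)                # replacement resumes at item 3
```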
Partition around limits

Use partitioning to work around database, network, and compute limits


In the cloud, all services have limits in their ability to scale up. Azure service limits are documented in Azure
subscription and service limits, quotas, and constraints. Limits include number of cores, database size, query
throughput, and network throughput. If your system grows sufficiently large, you may hit one or more of these
limits. Use partitioning to work around these limits.
There are many ways to partition a system, such as:
Partition a database to avoid limits on database size, data I/O, or number of concurrent sessions.
Partition a queue or message bus to avoid limits on the number of requests or the number of concurrent
connections.
Partition an App Service web app to avoid limits on the number of instances per App Service plan.
A database can be partitioned horizontally, vertically, or functionally.
In horizontal partitioning, also called sharding, each partition holds data for a subset of the total data set.
The partitions share the same data schema. For example, customers whose names start with A–M go into
one partition, N–Z into another partition.
In vertical partitioning, each partition holds a subset of the fields for the items in the data store. For
example, put frequently accessed fields in one partition, and less frequently accessed fields in another.
In functional partitioning, data is partitioned according to how it is used by each bounded context in the
system. For example, store invoice data in one partition and product inventory data in another. The schemas
are independent.
For more detailed guidance, see Data partitioning.

Recommendations
Partition different parts of the application. Databases are one obvious candidate for partitioning, but also
consider storage, cache, queues, and compute instances.
Design the partition key to avoid hotspots. If you partition a database, but one shard still gets the majority of
the requests, then you haven't solved your problem. Ideally, load gets distributed evenly across all the partitions.
For example, hash by customer ID and not the first letter of the customer name, because some letters are more
frequent. The same principle applies when partitioning a message queue. Pick a partition key that leads to an even
distribution of messages across the set of queues. For more information, see Sharding.
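A quick sketch of why hashing beats partitioning by name: hashing a synthetic customer ID spreads keys nearly evenly across shards. The shard count and ID format are illustrative.

```python
# Sketch: a hash-based partition key distributes load evenly, whereas the
# first letter of a customer name would create hotspots for common letters.
import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_for(customer_id: str) -> int:
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

counts = Counter(shard_for(f"customer-{n}") for n in range(10_000))
# Each shard receives roughly 2,500 of the 10,000 customers.
```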
Partition around Azure subscription and service limits. Individual components and services have limits, but
there are also limits for subscriptions and resource groups. For very large applications, you might need to partition
around those limits.
Partition at different levels. Consider a database server deployed on a VM. The VM has a VHD that is backed by
Azure Storage. The storage account belongs to an Azure subscription. Notice that each step in the hierarchy has
limits. The database server may have a connection pool limit. VMs have CPU and network limits. Storage has IOPS
limits. The subscription has limits on the number of VM cores. Generally, it's easier to partition lower in the
hierarchy. Only large applications should need to partition at the subscription level.
Design for operations

Design an application so that the operations team has the tools they
need
The cloud has dramatically changed the role of the operations team. They are no longer responsible for managing
the hardware and infrastructure that hosts the application. That said, operations is still a critical part of running a
successful cloud application. Some of the important functions of the operations team include:
Deployment
Monitoring
Escalation
Incident response
Security auditing
Robust logging and tracing are particularly important in cloud applications. Involve the operations team in design
and planning, to ensure the application gives them the data and insight they need to be successful.

Recommendations
Make all things observable. Once a solution is deployed and running, logs and traces are your primary insight
into the system. Tracing records a path through the system, and is useful to pinpoint bottlenecks, performance
issues, and failure points. Logging captures individual events such as application state changes, errors, and
exceptions. Log in production, or else you lose insight at the very times when you need it the most.
Instrument for monitoring. Monitoring gives insight into how well (or poorly) an application is performing, in
terms of availability, performance, and system health. For example, monitoring tells you whether you are meeting
your SLA. Monitoring happens during the normal operation of the system. It should be as close to real-time as
possible, so that the operations staff can react to issues quickly. Ideally, monitoring can help avert problems before
they lead to a critical failure. For more information, see Monitoring and diagnostics.
Instrument for root cause analysis. Root cause analysis is the process of finding the underlying cause of
failures. It occurs after a failure has already happened.
Use distributed tracing. Use a distributed tracing system that is designed for concurrency, asynchrony, and cloud
scale. Traces should include a correlation ID that flows across service boundaries. A single operation may involve
calls to multiple application services. If an operation fails, the correlation ID helps to pinpoint the cause of the
failure.
Standardize logs and metrics. The operations team will need to aggregate logs from across the various services
in your solution. If every service uses its own logging format, it becomes difficult or impossible to get useful
information from them. Define a common schema that includes fields such as correlation ID, event name, IP
address of the sender, and so forth. Individual services can derive custom schemas that inherit the base schema,
and contain additional fields.
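A sketch of a base log schema with service-specific extensions; the field names here are illustrative, not a prescribed standard.

```python
# Sketch: every service emits the same base fields, so logs can be aggregated
# and joined on the correlation ID across service boundaries.
import json
from datetime import datetime, timezone

def log_event(correlation_id: str, event_name: str, service: str, **extra) -> str:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "eventName": event_name,
        "service": service,
    }
    record.update(extra)        # service-specific fields extend the base schema
    return json.dumps(record)

line = log_event("abc-123", "OrderCreated", "orders-api", orderId="o-9")
```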
Automate management tasks, including provisioning, deployment, and monitoring. Automating a task makes it
repeatable and less prone to human errors.
Treat configuration as code. Check configuration files into a version control system, so that you can track and
version your changes, and roll back if needed.
Use platform as a service (PaaS) options

When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS)
IaaS is like having a box of parts. You can build anything, but you have to assemble it yourself. PaaS options are
easier to configure and administer. You don't need to provision VMs, set up VNets, manage patches and updates,
and all of the other overhead associated with running software on a VM.
For example, suppose your application needs a message queue. You could set up your own messaging service on a
VM, using something like RabbitMQ. But Azure Service Bus already provides reliable messaging as a service, and it's
simpler to set up. Just create a Service Bus namespace (which can be done as part of a deployment script) and then
call Service Bus using the client SDK.
Of course, your application may have specific requirements that make an IaaS approach more suitable. However,
even if your application is based on IaaS, look for places where it may be natural to incorporate PaaS options.
These include cache, queues, and data storage.

INSTEAD OF RUNNING...        CONSIDER USING...

Active Directory             Azure Active Directory
Elasticsearch                Azure Search
Hadoop                       HDInsight
IIS                          App Service
MongoDB                      Cosmos DB
Redis                        Azure Cache for Redis
SQL Server                   Azure SQL Database
File share                   Azure NetApp Files

Please note that this is not meant to be an exhaustive list, but a subset of equivalent options.
Use the best data store for the job

Pick the storage technology that is the best fit for your data and how it
will be used
Gone are the days when you would just stick all of your data into a big relational SQL database. Relational
databases are very good at what they do — providing ACID guarantees for transactions over relational data. But
they come with some costs:
Queries may require expensive joins.
Data must be normalized and conform to a predefined schema (schema on write).
Lock contention may impact performance.
In any large solution, it's likely that a single data store technology won't fill all your needs. Alternatives to relational
databases include key/value stores, document databases, search engine databases, time series databases, column
family databases, and graph databases. Each has pros and cons, and different types of data fit more naturally into
one or another.
For example, you might store a product catalog in a document database, such as Cosmos DB, which allows for a
flexible schema. In that case, each product description is a self-contained document. For queries over the entire
catalog, you might index the catalog and store the index in Azure Search. Product inventory might go into a SQL
database, because that data requires ACID guarantees.
Remember that data includes more than just the persisted application data. It also includes application logs, events,
messages, and caches.

Recommendations
Don't use a relational database for everything. Consider other data stores when appropriate. See Choose the
right data store.
Embrace polyglot persistence. In any large solution, it's likely that a single data store technology won't fill all
your needs.
Consider the type of data. For example, put transactional data into SQL, put JSON documents into a document
database, put telemetry data into a time series database, put application logs in Elasticsearch, and put blobs in
Azure Blob Storage.
Prefer availability over (strong) consistency. The CAP theorem implies that a distributed system must make
trade-offs between availability and consistency. (Network partitions, the other leg of the CAP theorem, can never
be completely avoided.) Often, you can achieve higher availability by adopting an eventual consistency model.
Consider the skillset of the development team. There are advantages to using polyglot persistence, but it's
possible to go overboard. Adopting a new data storage technology requires a new set of skills. The development
team must understand how to get the most out of the technology. They must understand appropriate usage
patterns, how to optimize queries, tune for performance, and so on. Factor this in when considering storage
technologies.
Use compensating transactions. A side effect of polyglot persistence is that a single transaction might write data
to multiple stores. If something fails, use compensating transactions to undo any steps that already completed.
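The compensating-transaction idea can be sketched as a list of steps, each paired with an undo action that runs in reverse order on failure. This is a simplified saga with invented step names, not a production pattern implementation.

```python
# Sketch: each completed step registers an "undo" action; on failure, the
# undo actions run in reverse order to logically roll back earlier writes.
def run_saga(steps):
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):   # compensate in reverse order
            undo()
        return "compensated"
    return "committed"

def fail_payment():
    raise RuntimeError("payment failed")   # simulated failure in step 2

inventory = []
steps = [
    (lambda: inventory.append("reserved"), lambda: inventory.pop()),
    (fail_payment, lambda: None),
]
outcome = run_saga(steps)   # payment fails, so the reservation is undone
```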
Look at bounded contexts. Bounded context is a term from domain driven design. A bounded context is an
explicit boundary around a domain model, and defines which parts of the domain the model applies to. Ideally, a
bounded context maps to a subdomain of the business domain. The bounded contexts in your system are a natural
place to consider polyglot persistence. For example, "products" may appear in both the Product Catalog
subdomain and the Product Inventory subdomain, but it's very likely that these two subdomains have different
requirements for storing, updating, and querying products.
Design for evolution

An evolutionary design is key for continuous innovation


All successful applications change over time, whether to fix bugs, add new features, bring in new technologies, or
make existing systems more scalable and resilient. If all the parts of an application are tightly coupled, it becomes
very hard to introduce changes into the system. A change in one part of the application may break another part, or
cause changes to ripple through the entire codebase.
This problem is not limited to monolithic applications. An application can be decomposed into services, but still
exhibit the sort of tight coupling that leaves the system rigid and brittle. But when services are designed to evolve,
teams can innovate and continuously deliver new features.
Microservices are becoming a popular way to achieve an evolutionary design, because they address many of the
considerations listed here.

Recommendations
Enforce high cohesion and loose coupling. A service is cohesive if it provides functionality that logically
belongs together. Services are loosely coupled if you can change one service without changing the other. High
cohesion generally means that changes in one function will require changes in other related functions. If you find
that updating a service requires coordinated updates to other services, it may be a sign that your services are not
cohesive. One of the goals of domain-driven design (DDD) is to identify those boundaries.
Encapsulate domain knowledge. When a client consumes a service, the responsibility for enforcing the
business rules of the domain should not fall on the client. Instead, the service should encapsulate all of the domain
knowledge that falls under its responsibility. Otherwise, every client has to enforce the business rules, and you end
up with domain knowledge spread across different parts of the application.
Use asynchronous messaging. Asynchronous messaging is a way to decouple the message producer from the
consumer. The producer does not depend on the consumer responding to the message or taking any particular
action. With a pub/sub architecture, the producer may not even know who is consuming the message. New
services can easily consume the messages without any modifications to the producer.
Don't build domain knowledge into a gateway. Gateways can be useful in a microservices architecture, for
things like request routing, protocol translation, load balancing, or authentication. However, the gateway should be
restricted to this sort of infrastructure functionality. It should not implement any domain knowledge, to avoid
becoming a heavy dependency.
Expose open interfaces. Avoid creating custom translation layers that sit between services. Instead, a service
should expose an API with a well-defined API contract. The API should be versioned, so that you can evolve the API
while maintaining backward compatibility. That way, you can update a service without coordinating updates to all
of the upstream services that depend on it. Public facing services should expose a RESTful API over HTTP. Backend
services might use an RPC-style messaging protocol for performance reasons.
Design and test against service contracts. When services expose well-defined APIs, you can develop and test
against those APIs. That way, you can develop and test an individual service without spinning up all of its
dependent services. (Of course, you would still perform integration and load testing against the real services.)
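A consumer-side contract check can be sketched as follows; the contract shape and field names are invented for illustration.

```python
# Sketch: validate a stubbed service response against the fields and types the
# contract promises, without spinning up the real dependent service.
CONTRACT = {"orderId": str, "status": str, "total": float}

def satisfies_contract(response: dict) -> bool:
    return all(
        field in response and isinstance(response[field], ftype)
        for field, ftype in CONTRACT.items()
    )

stub_response = {"orderId": "o-1", "status": "created", "total": 9.99}
ok = satisfies_contract(stub_response)      # well-formed response
bad = satisfies_contract({"orderId": "o-1"})  # missing contract fields
```

Tests like this run fast in CI; integration tests against the real service still run separately.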
Abstract infrastructure away from domain logic. Don't let domain logic get mixed up with infrastructure-
related functionality, such as messaging or persistence. Otherwise, changes in the domain logic will require
updates to the infrastructure layers and vice versa.
Offload cross-cutting concerns to a separate service. For example, if several services need to authenticate
requests, you could move this functionality into its own service. Then you could evolve the authentication service
— for example, by adding a new authentication flow — without touching any of the services that use it.
Deploy services independently. When the DevOps team can deploy a single service independently of other
services in the application, updates can happen more quickly and safely. Bug fixes and new features can be rolled
out at a more regular cadence. Design both the application and the release process to support independent
updates.
Build for the needs of the business

Every design decision must be justified by a business requirement


This design principle may seem obvious, but it's crucial to keep in mind when designing a solution. Do you
anticipate millions of users, or a few thousand? Is a one-hour application outage acceptable? Do you expect large
bursts in traffic or a predictable workload? Ultimately, every design decision must be justified by a business
requirement.

Recommendations
Define business objectives, including the recovery time objective (RTO), recovery point objective (RPO), and
maximum tolerable outage (MTO). These numbers should inform decisions about the architecture. For example, to
achieve a low RTO, you might implement automated failover to a secondary region. But if your solution can tolerate
a higher RTO, that degree of redundancy might be unnecessary.
Document service level agreements (SLA) and service level objectives (SLO), including availability and
performance metrics. You might build a solution that delivers 99.95% availability. Is that enough? The answer is a
business decision.
Model the application around the business domain. Start by analyzing the business requirements. Use these
requirements to model the application. Consider using a domain-driven design (DDD) approach to create domain
models that reflect the business processes and use cases.
Capture both functional and nonfunctional requirements. Functional requirements let you judge whether
the application does the right thing. Nonfunctional requirements let you judge whether the application does those
things well. In particular, make sure that you understand your requirements for scalability, availability, and latency.
These requirements will influence design decisions and choice of technology.
Decompose by workload. The term "workload" in this context means a discrete capability or computing task,
which can be logically separated from other tasks. Different workloads may have different requirements for
availability, scalability, data consistency, and disaster recovery.
Plan for growth. A solution might meet your current needs, in terms of number of users, volume of transactions,
data storage, and so forth. However, a robust application can handle growth without major architectural changes.
See Design to scale out and Partition around limits. Also consider that your business model and business
requirements will likely change over time. If an application's service model and data models are too rigid, it
becomes hard to evolve the application for new use cases and scenarios. See Design for evolution.
Manage costs. In a traditional on-premises application, you pay upfront for hardware as a capital expenditure. In a
cloud application, you pay for the resources that you consume. Make sure that you understand the pricing model
for the services that you consume. The total cost will include network bandwidth usage, storage, IP addresses,
service consumption, and other factors. For more information, see Azure pricing. Also consider your operations
costs. In the cloud, you don't have to manage the hardware or other infrastructure, but you still need to manage
your applications, including DevOps, incident response, disaster recovery, and so forth.
Choose an Azure compute service for your
application

Azure offers a number of ways to host your application code. The term compute refers to the hosting model for
the computing resources that your application runs on. The following flowchart will help you to choose a compute
service for your application.
If your application consists of multiple workloads, evaluate each workload separately. A complete solution may
incorporate two or more compute services.

Choose a candidate service


Use the following flowchart to select a candidate compute service.

Definitions:
"Lift and shift" is a strategy for migrating a workload to the cloud without redesigning the application or
making code changes. Also called rehosting. For more information, see Azure migration center.
Cloud optimized is a strategy for migrating to the cloud by refactoring an application to take advantage of
cloud-native features and capabilities.
The output from this flowchart is a starting point for consideration. Next, perform a more detailed evaluation of
the service to see if it meets your needs.
This article includes several tables which may help you to make these tradeoff decisions. Based on this analysis,
you may find that the initial candidate isn't suitable for your particular application or workload. In that case, expand
your analysis to include other compute services.

Understand the basic features


If you're not familiar with the Azure service selected in the previous step, read the overview documentation to
understand the basics of the service.
App Service. A managed service for hosting web apps, mobile app back ends, RESTful APIs, or automated
business processes.
Azure Kubernetes Service (AKS). A managed Kubernetes service for running containerized applications.
Batch. A managed service for running large-scale parallel and high-performance computing (HPC) applications.
Container Instances. The fastest and simplest way to run a container in Azure, without having to provision any
virtual machines and without having to adopt a higher-level service.
Functions. A managed FaaS service.
Service Fabric. A distributed systems platform that can run in many environments, including Azure or on
premises.
Virtual machines. Deploy and manage VMs inside an Azure virtual network.

Understand the hosting models


Cloud services, including Azure services, generally fall into three categories: IaaS, PaaS, or FaaS. (There is also SaaS,
software-as-a-service, which is out of scope for this article.) It's useful to understand the differences.
Infrastructure-as-a-Service (IaaS) lets you provision individual VMs along with the associated networking and
storage components. Then you deploy whatever software and applications you want onto those VMs. This model is
the closest to a traditional on-premises environment, except that Microsoft manages the infrastructure. You still
manage the individual VMs.
Platform-as-a-service (PaaS) provides a managed hosting environment, where you can deploy your application
without needing to manage VMs or networking resources. Azure App Service is a PaaS service.
Functions-as-a-service (FaaS) goes even further in removing the need to worry about the hosting environment.
In a FaaS model, you simply deploy your code and the service automatically runs it. Azure Functions is a FaaS
service.

NOTE
Azure Functions is an Azure serverless compute offering. To see how it compares with other Azure serverless offerings, such as Logic Apps, which provides serverless workflows, read Choose the right integration and automation services in Azure.

There is a spectrum from IaaS to pure PaaS. For example, Azure VMs can autoscale by using virtual machine scale
sets. This automatic scaling capability isn't strictly PaaS, but it's the type of management feature found in PaaS
services.
In general, there is a tradeoff between control and ease of management. IaaS gives the most control, flexibility, and
portability, but you have to provision, configure, and manage the VMs and network components you create. FaaS
services automatically manage nearly all aspects of running an application. PaaS services fall somewhere in
between.

| Criteria | Virtual Machines | App Service | Service Fabric | Azure Functions | Azure Kubernetes Service | Container Instances | Azure Batch |
|---|---|---|---|---|---|---|---|
| Application composition | Agnostic | Applications, containers | Services, guest executables, containers | Functions | Containers | Containers | Scheduled jobs |
| Density | Agnostic | Multiple apps per instance via app service plans | Multiple services per VM | Serverless¹ | Multiple containers per node | No dedicated instances | Multiple apps per VM |
| Minimum number of nodes | 1² | 1 | 5³ | Serverless¹ | 3³ | No dedicated nodes | 1⁴ |
| State management | Stateless or Stateful | Stateless | Stateless or stateful | Stateless | Stateless or Stateful | Stateless | Stateless |
| Web hosting | Agnostic | Built in | Agnostic | Not applicable | Agnostic | Agnostic | No |
| Can be deployed to dedicated VNet? | Supported | Supported⁵ | Supported | Supported⁵ | Supported | Supported | Supported |
| Hybrid connectivity | Supported | Supported⁶ | Supported | Supported⁷ | Supported | Not supported | Supported |

Notes
1. If using Consumption plan. If using App Service plan, functions run on the VMs allocated for your App Service
plan. See Choose the correct service plan for Azure Functions.
2. Higher SLA with two or more instances.
3. Recommended for production environments.
4. Can scale down to zero after job completes.
5. Requires App Service Environment (ASE).
6. Use Azure App Service Hybrid Connections.
7. Requires App Service plan or Azure Functions Premium plan.

DevOps
| Criteria | Virtual Machines | App Service | Service Fabric | Azure Functions | Azure Kubernetes Service | Container Instances | Azure Batch |
|---|---|---|---|---|---|---|---|
| Local debugging | Agnostic | IIS Express, others¹ | Local node cluster | Visual Studio or Azure Functions CLI | Minikube, others | Local container runtime | Not supported |
| Programming model | Agnostic | Web and API applications, WebJobs for background tasks | Guest executable, Service model, Actor model, Containers | Functions with triggers | Agnostic | Agnostic | Command line application |
| Application update | No built-in support | Deployment slots | Rolling upgrade (per service) | Deployment slots | Rolling update | Not applicable | |
Notes
1. Options include IIS Express for ASP.NET or node.js (iisnode); PHP web server; Azure Toolkit for IntelliJ, Azure
Toolkit for Eclipse. App Service also supports remote debugging of deployed web app.
2. See Resource Manager providers, regions, API versions and schemas.

Scalability
| Criteria | Virtual Machines | App Service | Service Fabric | Azure Functions | Azure Kubernetes Service | Container Instances | Azure Batch |
|---|---|---|---|---|---|---|---|
| Autoscaling | Virtual machine scale sets | Built-in service | Virtual machine scale sets | Built-in service | Pod auto-scaling¹, cluster auto-scaling² | Not supported | N/A |
| Load balancer | Azure Load Balancer | Integrated | Azure Load Balancer | Integrated | Azure Load Balancer or Application Gateway | No built-in support | Azure Load Balancer |
| Scale limit³ | Platform image: 1000 nodes per scale set; custom image: 600 nodes per scale set | 20 instances, 100 with App Service Environment | 100 nodes per scale set | 200 instances per Function app | 100 nodes per cluster (default limit) | 20 container groups per subscription (default limit) | 20 core limit (default limit) |
Notes
1. See Autoscale pods.
2. See Automatically scale a cluster to meet application demands on Azure Kubernetes Service (AKS).
3. See Azure subscription and service limits, quotas, and constraints.

Availability
| Criteria | Virtual Machines | App Service | Service Fabric | Azure Functions | Azure Kubernetes Service | Container Instances | Azure Batch |
|---|---|---|---|---|---|---|---|
| SLA | SLA for Virtual Machines | SLA for App Service | SLA for Service Fabric | SLA for Functions | SLA for AKS | SLA for Container Instances | SLA for Azure Batch |
| Multi-region failover | Traffic Manager | Traffic Manager | Traffic Manager, Multi-Region Cluster | Azure Front Door | Traffic Manager | Not supported | Not supported |
For guided learning on Service Guarantees, review Core Cloud Services - Azure architecture and service
guarantees.

Security
Review and understand the available security controls and visibility for each service:
App Service
Azure Kubernetes Service
Batch
Container Instances
Functions
Service Fabric
Virtual machine - Windows
Virtual machine - Linux

Other criteria
| Criteria | Virtual Machines | App Service | Service Fabric | Azure Functions | Azure Kubernetes Service | Container Instances | Azure Batch |
|---|---|---|---|---|---|---|---|
| SSL | Configured in VM | Supported | Supported | Supported | Ingress controller | Use sidecar container | Supported |
| Cost | Windows, Linux | App Service pricing | Service Fabric pricing | Azure Functions pricing | AKS pricing | Container Instances pricing | Azure Batch pricing |
| Suitable architecture styles | N-Tier, Big compute (HPC) | Web-Queue-Worker, N-Tier | Microservices, Event-driven architecture | Microservices, Event-driven architecture | Microservices, Event-driven architecture | Microservices, task automation, batch jobs | Big compute (HPC) |
The output from this flowchart is a starting point for consideration. Next, perform a more detailed evaluation of
the service to see if it meets your needs.

Consider limits and cost


Perform a more detailed evaluation looking at the following aspects of the service:
Service limits
Cost
SLA
Regional availability
Compute comparison tables

Next steps
Core Cloud Services - Azure compute options. This Microsoft Learn module explores how compute services can
solve common business needs.
Understand data store models
12/18/2020 • 12 minutes to read

Modern business systems manage increasingly large volumes of heterogeneous data. This heterogeneity means
that a single data store is usually not the best approach. Instead, it's often better to store different types of data in
different data stores, each focused toward a specific workload or usage pattern. The term polyglot persistence is
used to describe solutions that use a mix of data store technologies. Therefore, it's important to understand the
main storage models and their tradeoffs.
Selecting the right data store for your requirements is a key design decision. There are literally hundreds of
implementations to choose from among SQL and NoSQL databases. Data stores are often categorized by how they
structure data and the types of operations they support. This article describes several of the most common storage
models. Note that a particular data store technology may support multiple storage models. For example, a
relational database management system (RDBMS) may also support key/value or graph storage. In fact, there is a
general trend for so-called multi-model support, where a single database system supports several models. But it's
still useful to understand the different models at a high level.
Not all data stores in a given category provide the same feature-set. Most data stores provide server-side
functionality to query and process data. Sometimes this functionality is built into the data storage engine. In other
cases, the data storage and processing capabilities are separated, and there may be several options for processing
and analysis. Data stores also support different programmatic and management interfaces.
Generally, you should start by considering which storage model is best suited for your requirements. Then consider
a particular data store within that category, based on factors such as feature set, cost, and ease of management.

Relational database management systems


Relational databases organize data as a series of two-dimensional tables with rows and columns. Most vendors
provide a dialect of the Structured Query Language (SQL) for retrieving and managing data. An RDBMS typically
implements a transactionally consistent mechanism that conforms to the ACID (Atomic, Consistent, Isolated,
Durable) model for updating information.
An RDBMS typically supports a schema-on-write model, where the data structure is defined ahead of time, and all
read or write operations must use the schema.
This model is very useful when strong consistency guarantees are important, where all changes are atomic and
transactions always leave the data in a consistent state. However, an RDBMS generally can't scale out horizontally
without sharding the data in some way. Also, the data in an RDBMS must be normalized, which isn't appropriate for
every data set.
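To make the transactional behavior described above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a full RDBMS. The accounts schema and the overdraft CHECK constraint are illustrative only, not part of any Azure service:

```python
import sqlite3

# In-memory database standing in for a full RDBMS; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates are applied, or neither is."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft
    return True

assert transfer(conn, "alice", "bob", 30)        # succeeds as a unit
assert not transfer(conn, "alice", "bob", 1000)  # fails; both updates rolled back
assert dict(conn.execute("SELECT name, balance FROM accounts")) == {"alice": 70, "bob": 80}
```

The failed transfer leaves no partial update behind, which is the "consistent state" guarantee the text refers to.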
Azure services
Azure SQL Database | (Security Baseline)
Azure Database for MySQL | (Security Baseline)
Azure Database for PostgreSQL | (Security Baseline)
Azure Database for MariaDB | (Security Baseline)
Workload
Records are frequently created and updated.
Multiple operations have to be completed in a single transaction.
Relationships are enforced using database constraints.
Indexes are used to optimize query performance.
Data type
Data is highly normalized.
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all
users and processes.
Size of individual data entries is small to medium-sized.
Examples
Inventory management
Order management
Reporting database
Accounting

Key/value stores
A key/value store associates each data value with a unique key. Most key/value stores only support simple query,
insert, and delete operations. To modify a value (either partially or completely), an application must overwrite the
existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation.
An application can store arbitrary data as a set of values. Any schema information must be provided by the
application. The key/value store simply retrieves or stores the value by key.

Key/value stores are highly optimized for applications performing simple lookups, but are less suitable if you need
to query data across different key/value stores. Key/value stores are also not optimized for querying by value.
A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple nodes
on separate machines.
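The scale-out behavior described above comes from partitioning data by key. A minimal sketch, with in-process dictionaries standing in for physical nodes (the node names are hypothetical):

```python
import hashlib

class PartitionedKVStore:
    """Toy key/value store that spreads keys across nodes by hashing the key."""
    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.names = list(node_names)

    def _node_for(self, key):
        # Stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def put(self, key, value):
        # The whole value is overwritten; the store never inspects its contents.
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

store = PartitionedKVStore(["node-a", "node-b", "node-c"])
store.put("session:42", {"user": "alice", "cart": ["book"]})
assert store.get("session:42") == {"user": "alice", "cart": ["book"]}
assert store.get("missing") is None
```

Because each key lives on exactly one node, adding nodes spreads both storage and request load, which is why key/value stores scale horizontally so well.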
Azure services
Azure Cosmos DB Table API, etcd API (preview), and SQL API | (Cosmos DB Security Baseline)
Azure Cache for Redis | (Security Baseline)
Azure Table Storage | (Security Baseline)
Workload
Data is accessed using a single key, like a dictionary.
No joins, lock, or unions are required.
No aggregation mechanisms are used.
Secondary indexes are generally not used.
Data type
Each key is associated with a single value.
There is no schema enforcement.
No relationships between entities.
Examples
Data caching
Session management
User preference and profile management
Product recommendation and ad serving

Document databases
A document database stores a collection of documents, where each document consists of named fields and data.
The data can be simple values or complex elements such as lists and child collections. Documents are retrieved by
unique keys.
Typically, a document contains the data for a single entity, such as a customer or an order. A document may contain
information that would be spread across several relational tables in an RDBMS. Documents don't need to have the
same structure. Applications can store different data in documents as business requirements change.
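A sketch of the flexible-schema behavior described above, using plain Python dictionaries as stand-ins for documents in a collection (the field names are hypothetical):

```python
# Documents in the same collection need not share a structure.
orders = {}  # document key -> document

orders["order-1001"] = {
    "customer": "alice",
    "items": [{"sku": "A1", "qty": 2}],   # nested child collection
    "shipping": {"city": "Seattle"},
}
orders["order-1002"] = {
    "customer": "bob",
    "items": [{"sku": "B7", "qty": 1}],
    "giftMessage": "Happy birthday!",     # optional field; absent elsewhere
}

# Each document is retrieved as a single block by its unique key.
doc = orders["order-1001"]
assert doc["shipping"]["city"] == "Seattle"
assert "giftMessage" not in doc
```

Note how one document carries data that an RDBMS would spread across several tables, and how the two documents differ in shape without any schema change.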

Azure service
Azure Cosmos DB SQL API | (Cosmos DB Security Baseline)
Workload
Insert and update operations are common.
No object-relational impedance mismatch. Documents can better match the object structures used in application
code.
Individual documents are retrieved and written as a single block.
Data requires index on multiple fields.
Data type
Data can be managed in de-normalized way.
Size of individual document data is relatively small.
Each document type can use its own schema.
Documents can include optional fields.
Document data is semi-structured, meaning that data types of each field are not strictly defined.
Examples
Product catalog
Content management
Inventory management

Graph databases
A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes.
Nodes and edges can have properties that provide information about that node or edge, similar to columns in a
table. Edges can also have a direction indicating the nature of the relationship.
Graph databases can efficiently perform queries across the network of nodes and edges and analyze the
relationships between entities. The following diagram shows an organization's personnel database structured as a
graph. The entities are employees and departments, and the edges indicate reporting relationships and the
departments in which employees work.

This structure makes it straightforward to perform queries such as "Find all employees who report directly or
indirectly to Sarah" or "Who works in the same department as John?" For large graphs with lots of entities and
relationships, you can perform very complex analyses very quickly. Many graph databases provide a query
language that you can use to traverse a network of relationships efficiently.
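The reporting-structure query quoted above is a graph traversal. The sketch below uses an adjacency map and breadth-first search; a real graph database would express the same query in its query language (for example, Gremlin), and the employee names are illustrative:

```python
from collections import deque

# Edges directed from employee to manager ("reports to"); names are illustrative.
reports_to = {
    "john": "sarah",
    "mary": "sarah",
    "dan": "john",
    "lisa": "mary",
}

def all_reports(manager):
    """Everyone who reports to `manager`, directly or through a chain of edges."""
    # Invert the edges: manager -> direct reports.
    direct = {}
    for emp, mgr in reports_to.items():
        direct.setdefault(mgr, []).append(emp)
    found, queue = [], deque(direct.get(manager, []))
    while queue:
        emp = queue.popleft()
        found.append(emp)
        queue.extend(direct.get(emp, []))
    return found

assert set(all_reports("sarah")) == {"john", "mary", "dan", "lisa"}
```

A graph database stores the edges natively, so multi-hop traversals like this avoid the repeated joins a relational schema would require.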
Azure services
Azure Cosmos DB Gremlin API | (Security Baseline)
SQL Server | (Security Baseline)
Workload
Complex relationships between data items involving many hops between related data items.
The relationship between data items are dynamic and change over time.
Relationships between objects are first-class citizens, without requiring foreign keys and joins to traverse.
Data type
Nodes and relationships.
Nodes are similar to table rows or JSON documents.
Relationships are just as important as nodes, and are exposed directly in the query language.
Composite objects, such as a person with multiple phone numbers, tend to be broken into separate, smaller
nodes, combined with traversable relationships.
Examples
Organization charts
Social graphs
Fraud detection
Recommendation engines

Data analytics
Data analytics stores provide massively parallel solutions for ingesting, storing, and analyzing data. The data is
distributed across multiple servers to maximize scalability. Large data file formats such as delimited files (CSV),
Parquet, and ORC are widely used in data analytics. Historical data is typically stored in data stores such as Blob
storage or Azure Data Lake Storage Gen2, which are then accessed by Azure Synapse, Databricks, or HDInsight as
external tables. A typical scenario, using data stored as Parquet files for performance, is described in the article Use
external tables with Synapse SQL.
Azure services
Azure Synapse Analytics | (Security Baseline)
Azure Data Lake | (Security Baseline)
Azure Data Explorer | (Security Baseline)
Azure Analysis Services
HDInsight | (Security Baseline)
Azure Databricks | (Security Baseline)
Workload
Data analytics
Enterprise BI
Data type
Historical data from multiple sources.
Usually denormalized in a "star" or "snowflake" schema, consisting of fact and dimension tables.
Usually loaded with new data on a scheduled basis.
Dimension tables often include multiple historic versions of an entity, referred to as a slowly changing
dimension.
Examples
Enterprise data warehouse

Column-family databases
A column-family database organizes data into rows and columns. In its simplest form, a column-family database
can appear very similar to a relational database, at least conceptually. The real power of a column-family database
lies in its denormalized approach to structuring sparse data.
You can think of a column-family database as holding tabular data with rows and columns, but the columns are
divided into groups known as column families. Each column family holds a set of columns that are logically related
together and are typically retrieved or manipulated as a unit. Other data that is accessed separately can be stored in
separate column families. Within a column family, new columns can be added dynamically, and rows can be sparse
(that is, a row doesn't need to have a value for every column).
The following diagram shows an example with two column families, Identity and Contact Info . The data for a
single entity has the same row key in each column-family. This structure, where the rows for any given object in a
column family can vary dynamically, is an important benefit of the column-family approach, making this form of
data store highly suited for storing structured, volatile data.

Unlike a key/value store or a document database, most column-family databases store data in key order, rather than
by computing a hash. Many implementations allow you to create indexes over specific columns in a column family.
Indexes let you retrieve data by column value, rather than by row key.
Read and write operations for a row are usually atomic within a single column family, although some
implementations provide atomicity across the entire row, spanning multiple column families.
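The structure described above can be sketched as nested maps: row key to column family to column to value. The family and column names mirror the Identity and Contact Info example; this is a conceptual model, not a client API:

```python
# Row key -> column family -> column -> value. Rows may be sparse: a row only
# stores the columns it actually has. Family and column names are hypothetical.
people = {
    "row-1": {
        "Identity": {"firstName": "Ava", "lastName": "Jones"},
        "ContactInfo": {"email": "ava@example.com"},
    },
    "row-2": {
        "Identity": {"firstName": "Ben"},        # no lastName: sparse row
        "ContactInfo": {"phone": "555-0100"},    # different columns entirely
    },
}

def get_family(row_key, family):
    """Fetch one column family as a unit, the typical read pattern."""
    return people.get(row_key, {}).get(family, {})

assert get_family("row-2", "Identity") == {"firstName": "Ben"}
assert "lastName" not in get_family("row-2", "Identity")
```

Reading one family avoids touching the others, which is why logically related columns that are accessed together belong in the same family.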
Azure services
Azure Cosmos DB Cassandra API | (Security Baseline)
HBase in HDInsight | (Security Baseline)
Workload
Most column-family databases perform write operations extremely quickly.
Update and delete operations are rare.
Designed to provide high throughput and low-latency access.
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.
Data type
Data is stored in tables consisting of a key column and one or more column families.
Specific columns can vary by individual rows.
Individual cells are accessed via get and put commands.
Multiple rows are returned using a scan command.
Examples
Recommendations
Personalization
Sensor data
Telemetry
Messaging
Social media analytics
Web analytics
Activity monitoring
Weather and other time-series data

Search engine databases


A search engine database allows applications to search for information held in external data stores. A search engine
database can index massive volumes of data and provide near real-time access to these indexes.
Indexes can be multi-dimensional and may support free-text searches across large volumes of text data. Indexing
can be performed using a pull model, triggered by the search engine database, or using a push model, initiated by
external application code.
Searching can be exact or fuzzy. A fuzzy search finds documents that match a set of terms and calculates how
closely they match. Some search engines also support linguistic analysis that can return matches based on
synonyms, genre expansions (for example, matching dogs to pets ), and stemming (matching words with the
same root).
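An inverted index is the core structure behind the behavior described above. Here is a minimal sketch that scores documents by how many query terms they contain, a crude stand-in for real relevance scoring; the sample documents are hypothetical:

```python
import re

docs = {
    "d1": "Azure Search indexes product catalogs",
    "d2": "Search engines rank documents by match score",
}

# Inverted index: term -> set of documents containing it.
index = {}
for doc_id, text in docs.items():
    for term in re.findall(r"\w+", text.lower()):
        index.setdefault(term, set()).add(doc_id)

def search(query):
    """Score each document by how many query terms it contains."""
    terms = re.findall(r"\w+", query.lower())
    scores = {}
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

assert search("search score")[0][0] == "d2"  # d2 matches both terms
```

Real search engines extend this idea with stemming, synonym expansion, and more sophisticated scoring (for example, term-frequency weighting), but the index-then-score shape is the same.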
Azure service
Azure Search | (Security Baseline)
Workload
Data indexes from multiple sources and services.
Queries are ad-hoc and can be complex.
Full text search is required.
Ad hoc self-service query is required.
Data type
Semi-structured or unstructured text
Text with reference to structured data
Examples
Product catalogs
Site search
Logging

Time series databases


Time series data is a set of values organized by time. Time series databases typically collect large amounts of data in
real time from a large number of sources. Updates are rare, and deletes are often done as bulk operations. Although
the records written to a time-series database are generally small, there are often a large number of records, and
total data size can grow rapidly.
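The workload shape described above (sequential appends, sequential range reads) can be sketched as a timestamp-sorted store; the sensor readings are hypothetical:

```python
import bisect

class TimeSeries:
    """Append-mostly store keyed and sorted by timestamp."""
    def __init__(self):
        self.timestamps = []  # kept sorted; in-order appends go at the end
        self.values = []

    def append(self, ts, value):
        # The common case: new points arrive in increasing time order.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def range(self, start, end):
        """Sequential read of a contiguous time window (inclusive)."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))

s = TimeSeries()
for ts, temp in [(100, 21.5), (101, 21.7), (103, 22.0)]:
    s.append(ts, temp)
assert s.range(101, 103) == [(101, 21.7), (103, 22.0)]
```

Keeping the data physically ordered by timestamp is what makes both the append-heavy write pattern and the sequential range reads cheap; bulk deletes of old data become contiguous truncations.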
Azure service
Azure Time Series Insights
Workload
Records are generally appended sequentially in time order.
An overwhelming proportion of operations (95-99%) are writes.
Updates are rare.
Deletes occur in bulk, and are made to contiguous blocks of records.
Data is read sequentially in either ascending or descending time order, often in parallel.
Data type
A timestamp is used as the primary key and sorting mechanism.
Tags may define additional information about the type, origin, and other information about the entry.
Examples
Monitoring and event telemetry.
Sensor or other IoT data.

Object storage
Object storage is optimized for storing and retrieving large binary objects (images, files, video and audio streams,
large application data objects and documents, virtual machine disk images). Large data files are also popularly used
in this model, for example, delimited files (CSV), Parquet, and ORC. Object stores can manage extremely large
amounts of unstructured data.
Azure service
Azure Blob Storage | (Security Baseline)
Azure Data Lake Storage Gen2 | (Security Baseline)
Workload
Identified by key.
Content is typically an asset such as a delimited file, image, or video file.
Content must be durable and external to any application tier.
Data type
Data size is large.
Value is opaque.
Examples
Images, videos, office documents, PDFs
Static HTML, JSON, CSS
Log and audit files
Database backups

Shared files
Sometimes, using simple flat files can be the most effective means of storing and retrieving information. Using file
shares enables files to be accessed across a network. Given appropriate security and concurrent access control
mechanisms, sharing data in this way can enable distributed services to provide highly scalable data access for
performing basic, low-level operations such as simple read and write requests.
Azure service
Azure Files | (Security Baseline)
Workload
Migration from existing apps that interact with the file system.
Requires SMB interface.
Data type
Files in a hierarchical set of folders.
Accessible with standard I/O libraries.
Examples
Legacy files
Shared content accessible among a number of VMs or app instances
Aided with this understanding of different data storage models, the next step is to evaluate your workload and
application, and decide which data store will meet your specific needs. Use the data storage decision tree to help
with this process.
Select an Azure data store for your application
12/18/2020 • 2 minutes to read

Azure offers a number of managed data storage solutions, each providing different features and capabilities. This
article will help you to choose a data store for your application.
If your application consists of multiple workloads, evaluate each workload separately. A complete solution may
incorporate multiple data stores.
Use the following flowchart to select a candidate data store.
START

1. Require compatible format?
   - MySQL: Azure Database for MySQL
   - PostgreSQL: Azure Database for PostgreSQL
   - MariaDB: Azure Database for MariaDB
   - Cassandra: Cosmos DB Cassandra API
   - MongoDB: Cosmos DB MongoDB API
   - Else: continue.
2. Relational data? Yes: SQL Database. No: continue.
3. Semi-structured data, schema-on-read? Yes: Cosmos DB SQL API. No: continue.
4. Need SMB interface? Yes: Azure Files. No: continue.
5. Archive? Yes: Blob Storage (cool or archive access tier). No: continue.
6. Search index data? Yes: Azure Search. No: continue.
7. Time series data? Yes: Time Series Insights. No: continue.
8. File system or object store?
   - File system: Data Lake Store.
   - Object store: continue.
9. Large data (TB/PB) for analysis? Yes: analyze with Azure Synapse, Databricks, or Azure Analysis Services. No: continue.
10. Binary blobs (images, PDF files, etc.)? Yes: Blob Storage. No: continue.
11. Graph data? Yes: Cosmos DB Graph API. No: continue.
12. Transient data? Yes: Azure Cache for Redis. No: Cosmos DB SQL API.
The output from this flowchart is a starting point for consideration. Next, perform a more detailed evaluation of
the data store to see if it meets your needs. Refer to Criteria for choosing a data store to aid in this evaluation.
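The first branches of the flowchart above can be sketched as a function. This is a planning aid only, it covers only the early questions, and the requirement keys are hypothetical:

```python
def suggest_data_store(requirements):
    """Encode the first branches of the data store decision flowchart.
    `requirements` is a dict of booleans and strings; keys are hypothetical."""
    compat = requirements.get("compatible_format")
    if compat in ("MySQL", "PostgreSQL", "MariaDB"):
        return f"Azure Database for {compat}"
    if compat == "Cassandra":
        return "Cosmos DB Cassandra API"
    if compat == "MongoDB":
        return "Cosmos DB MongoDB API"
    if requirements.get("relational"):
        return "SQL Database"
    if requirements.get("semi_structured_schema_on_read"):
        return "Cosmos DB SQL API"
    if requirements.get("needs_smb"):
        return "Azure Files"
    if requirements.get("archive"):
        return "Blob Storage (cool or archive tier)"
    return "continue down the flowchart"

assert suggest_data_store({"compatible_format": "PostgreSQL"}) == "Azure Database for PostgreSQL"
assert suggest_data_store({"relational": True}) == "SQL Database"
assert suggest_data_store({"needs_smb": True}) == "Azure Files"
```

As with the flowchart itself, the answer is only a candidate; the criteria in the next article should drive the final choice.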
Criteria for choosing a data store
12/18/2020 • 3 minutes to read

This article describes the comparison criteria you should use when evaluating a data store. The goal is to help you
determine which data storage types can meet your solution's requirements.

General considerations
Keep the following considerations in mind when making your selection.
Functional requirements
Data format . What type of data are you intending to store? Common types include transactional data,
JSON objects, telemetry, search indexes, or flat files.
Data size . How large are the entities you need to store? Will these entities need to be maintained as a
single document, or can they be split across multiple documents, tables, collections, and so forth?
Scale and structure . What is the overall amount of storage capacity you need? Do you anticipate
partitioning your data?
Data relationships . Will your data need to support one-to-many or many-to-many relationships? Are
relationships themselves an important part of the data? Will you need to join or otherwise combine data
from within the same dataset, or from external datasets?
Consistency model . How important is it for updates made in one node to appear in other nodes, before
further changes can be made? Can you accept eventual consistency? Do you need ACID guarantees for
transactions?
Schema flexibility . What kind of schemas will you apply to your data? Will you use a fixed schema, a
schema-on-write approach, or a schema-on-read approach?
Concurrency . What kind of concurrency mechanism do you want to use when updating and synchronizing
data? Will the application perform many updates that could potentially conflict? If so, you may require
record locking and pessimistic concurrency control. Alternatively, can you support optimistic concurrency
controls? If so, is simple timestamp-based concurrency control enough, or do you need the added
functionality of multi-version concurrency control?
Data movement . Will your solution need to perform ETL tasks to move data to other stores or data
warehouses?
Data lifecycle . Is the data write-once, read-many? Can it be moved into cool or cold storage?
Other supported features . Do you need any other specific features, such as schema validation,
aggregation, indexing, full-text search, MapReduce, or other query capabilities?
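The optimistic concurrency control mentioned in the Concurrency question above can be sketched with a simple version check; this illustrates the pattern, not any particular data store's API:

```python
class VersionedRecord:
    """Optimistic concurrency: writers submit the version they read; a write
    succeeds only if no one else has updated the record in the meantime."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        if expected_version != self.version:
            return False  # conflict: caller must re-read and retry
        self.value = new_value
        self.version += 1
        return True

rec = VersionedRecord({"qty": 10})
_, v = rec.read()
assert rec.write({"qty": 9}, v)      # first writer wins
assert not rec.write({"qty": 8}, v)  # second writer used a stale version
assert rec.read() == ({"qty": 9}, 1)
```

Pessimistic control would instead lock the record for the duration of the update; optimistic control avoids locks but requires callers to handle the retry path.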
Non-functional requirements
Performance and scalability . What are your data performance requirements? Do you have specific
requirements for data ingestion rates and data processing rates? What are the acceptable response times
for querying and aggregation of data once ingested? How large will you need the data store to scale up? Is
your workload more read-heavy or write-heavy?
Reliability . What overall SLA do you need to support? What level of fault-tolerance do you need to provide
for data consumers? What kind of backup and restore capabilities do you need?
Replication . Will your data need to be distributed among multiple replicas or regions? What kind of data
replication capabilities do you require?
Limits . Will the limits of a particular data store support your requirements for scale, number of
connections, and throughput?
Management and cost
Managed service . When possible, use a managed data service, unless you require specific capabilities that
can only be found in an IaaS-hosted data store.
Region availability . For managed services, is the service available in all Azure regions? Does your solution
need to be hosted in certain Azure regions?
Portability . Will your data need to be migrated to on-premises, external datacenters, or other cloud hosting
environments?
Licensing . Do you have a preference of a proprietary versus OSS license type? Are there any other external
restrictions on what type of license you can use?
Overall cost . What is the overall cost of using the service within your solution? How many instances will
need to run, to support your uptime and throughput requirements? Consider operations costs in this
calculation. One reason to prefer managed services is the reduced operational cost.
Cost effectiveness . Can you partition your data, to store it more cost effectively? For example, can you
move large objects out of an expensive relational database into an object store?
Security
Security . What type of encryption do you require? Do you need encryption at rest? What authentication
mechanism do you want to use to connect to your data?
Auditing . What kind of audit log do you need to generate?
Networking requirements . Do you need to restrict or otherwise manage access to your data from other
network resources? Does data need to be accessible only from inside the Azure environment? Does the data
need to be accessible from specific IP addresses or subnets? Does it need to be accessible from applications
or services hosted on-premises or in other external datacenters?
DevOps
Skill set . Are there particular programming languages, operating systems, or other technology that your
team is particularly adept at using? Are there others that would be difficult for your team to work with?
Clients . Is there good client support for your development languages?
Overview of load-balancing options in Azure
12/18/2020 • 4 minutes to read

The term load balancing refers to the distribution of workloads across multiple computing resources. Load
balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overloading
any single resource. It can also improve availability by sharing a workload across redundant computing resources.

Overview
Azure load balancing services can be categorized along two dimensions: global versus regional, and HTTP(S) versus
non-HTTP(S).
Global versus regional
Global load-balancing services distribute traffic across regional backends, clouds, or hybrid on-premises
services. These services route end-user traffic to the closest available backend. They also react to changes in
service reliability or performance, in order to maximize availability and performance. You can think of them
as systems that load balance between application stamps, endpoints, or scale-units hosted across different
regions/geographies.
Regional load-balancing services distribute traffic within virtual networks across virtual machines (VMs) or
zonal and zone-redundant service endpoints within a region. You can think of them as systems that load
balance between VMs, containers, or clusters within a region in a virtual network.
HTTP(S ) versus non-HTTP(S )
HTTP(S) load-balancing services are Layer 7 load balancers that only accept HTTP(S) traffic. They are
intended for web applications or other HTTP(S) endpoints. They include features such as SSL offload, web
application firewall, path-based load balancing, and session affinity.
Non-HTTP/S load-balancing services can handle non-HTTP(S) traffic and are recommended for non-web
workloads.
The following table summarizes the Azure load balancing services by these categories:

| Service | Global/Regional | Recommended traffic |
|---|---|---|
| Azure Front Door | Global | HTTP(S) |
| Traffic Manager | Global | non-HTTP(S) |
| Application Gateway | Regional | HTTP(S) |
| Azure Load Balancer | Regional | non-HTTP(S) |

Azure load balancing services


Here are the main load-balancing services currently available in Azure:
Front Door is an application delivery network that provides global load balancing and site acceleration for
web applications. It offers Layer 7 capabilities for your application, such as SSL offload, path-based routing, fast
failover, and caching, to improve the performance and high availability of your applications.
NOTE
At this time, Azure Front Door does not support Web Sockets.

Traffic Manager is a DNS-based traffic load balancer that enables you to distribute traffic optimally to services
across global Azure regions, while providing high availability and responsiveness. Because Traffic Manager is a
DNS-based load-balancing service, it load balances only at the domain level. For that reason, it can't fail over as
quickly as Front Door, because of common challenges around DNS caching and systems not honoring DNS TTLs.
Application Gateway provides application delivery controller (ADC) as a service, offering various Layer 7 load-
balancing capabilities. Use it to optimize web farm productivity by offloading CPU-intensive SSL termination to the
gateway.
Azure Load Balancer is a high-performance, ultra low-latency Layer 4 load-balancing service (inbound and
outbound) for all UDP and TCP protocols. It is built to handle millions of requests per second while ensuring your
solution is highly available. Azure Load Balancer is zone-redundant, ensuring high availability across Availability
Zones.

Decision tree for load balancing in Azure


When selecting the load-balancing options, here are some factors to consider:
Traffic type. Is it a web (HTTP/HTTPS) application? Is it public facing or a private application?
Global versus regional. Do you need to load balance VMs or containers within a virtual network, load
balance scale units/deployments across regions, or both?
Availability. What is the service SLA?
Cost. See Azure pricing. In addition to the cost of the service itself, consider the operations cost for managing a
solution built on that service.
Features and limits. What are the overall limitations of each service? See Service limits.
The following flowchart will help you to choose a load-balancing solution for your application. The flowchart
guides you through a set of key decision criteria to reach a recommendation.
Treat this flowchart as a starting point. Every application has unique requirements, so use the
recommendation as a starting point. Then perform a more detailed evaluation.
If your application consists of multiple workloads, evaluate each workload separately. A complete solution may
incorporate two or more load-balancing solutions.
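As an illustration only, the two decision dimensions above can be reduced to a small helper. The function name and its two boolean inputs are assumptions for this sketch; a real choice must also weigh SLA, cost, and service limits, as the criteria list notes.

```python
def recommend_load_balancer(web_traffic: bool, global_scope: bool) -> str:
    """Map the two dimensions from the table above to a service.

    Illustrative simplification of the decision tree only; real
    recommendations also consider SLA, cost, and feature limits.
    """
    if web_traffic:
        # HTTP(S) traffic: Layer 7 services.
        return "Azure Front Door" if global_scope else "Application Gateway"
    # Non-HTTP(S) traffic: DNS-based or Layer 4 services.
    return "Traffic Manager" if global_scope else "Azure Load Balancer"
```

For a workload with multiple traffic types, apply the helper per workload and combine the results, as the guidance above suggests.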
Definitions
Internet facing. Applications that are publicly accessible from the internet. As a best practice, application
owners apply restrictive access policies or protect the application by setting up offerings like web application
firewall and DDoS protection.
Global. End users or clients are located beyond a small geographical area. For example, users across
multiple continents, across countries/regions within a continent, or even across multiple metropolitan areas
within a larger country/region.
PaaS. Platform as a service (PaaS) services provide a managed hosting environment, where you can deploy
your application without needing to manage VMs or networking resources. In this case, PaaS refers to
services that provide integrated load balancing within a region. See Choosing a compute service –
Scalability.
IaaS. Infrastructure as a service (IaaS) is a computing option where you provision the VMs that you need,
along with associated network and storage components. IaaS applications require internal load balancing
within a virtual network, using Azure Load Balancer.
Application-layer processing refers to special routing within a virtual network. For example, path-based
routing within the virtual network across VMs or virtual machine scale sets. For more information, see When
should we deploy an Application Gateway behind Front Door?.
Asynchronous messaging options in Azure
12/18/2020 • 17 minutes to read

This article describes the different types of messages and the entities that participate in a messaging infrastructure.
Based on the requirements of each message type, the article recommends Azure messaging services. The options
include Azure Service Bus, Event Grid, and Event Hubs.
At an architectural level, a message is a datagram created by an entity (producer), to distribute information so that
other entities (consumers) can be aware and act accordingly. The producer and the consumer can communicate
directly or optionally through an intermediary entity (message broker). This article focuses on asynchronous
messaging using a message broker.

Messages can be classified into two main categories. If the producer expects an action from the consumer, that
message is a command. If the message informs the consumer that an action has taken place, then the message is
an event.

Commands
The producer sends a command with the intent that the consumer(s) will perform an operation within the scope of
a business transaction.
A command is a high-value message and must be delivered at least once. If a command is lost, the entire business
transaction might fail. Also, a command shouldn't be processed more than once. Doing so might cause an
erroneous transaction. For example, a customer might get duplicate orders or be billed twice.
Commands are often used to manage the workflow of a multistep business transaction. Depending on the
business logic, the producer may expect the consumer to acknowledge the message and report the results of the
operation. Based on that result, the producer may choose an appropriate course of action.

Events
An event is a type of message that a producer raises to announce facts.
The producer (known as the publisher in this context) has no expectation that the events will result in any action.
Interested consumers can subscribe, listen for events, and take actions depending on their consumption scenario.
Events can have multiple subscribers or no subscribers at all. Two different subscribers can react to an event with
different actions and not be aware of one another.
The producer and consumer are loosely coupled and managed independently. The consumer isn't expected to
acknowledge the event back to the producer. A consumer that is no longer interested in the events can
unsubscribe. The consumer is removed from the pipeline without affecting the producer or the overall functionality
of the system.
There are two categories of events:
The producer raises events to announce discrete facts. A common use case is event notification. For
example, Azure Resource Manager raises events when it creates, modifies, or deletes resources. A subscriber
of those events could be a Logic App that sends alert emails.
The producer raises related events in a sequence, or a stream of events, over a period of time. Typically, a
stream is consumed for statistical evaluation. The evaluation can be done within a temporal window or as
events arrive. Telemetry is a common use case, for example, health and load monitoring of a system.
Another case is event streaming from IoT devices.
A common pattern for implementing event messaging is the Publisher-Subscriber pattern.
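The pattern can be sketched as a minimal in-process topic. The Topic class and its method names are illustrative only, not an Azure API; a real broker such as Service Bus or Event Grid adds durability, filtering, and retry handling.

```python
class Topic:
    """Minimal in-process sketch of the Publisher-Subscriber pattern."""

    def __init__(self):
        # Handlers registered by interested consumers.
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        # Each subscriber reacts independently; events with no
        # subscribers are simply discarded, as described above.
        for handler in self._subscribers:
            handler(event)
```

Two subscribers can react to the same event without being aware of one another, which is the loose coupling the text describes.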

Role and benefits of a message broker


An intermediate message broker provides the functionality of moving messages from producer to consumer and
can offer additional benefits.
Decoupling
A message broker decouples the producer from the consumer in the logic that generates and uses the messages,
respectively. In a complex workflow, the broker can encourage business operations to be decoupled and help
coordinate the workflow.
For example, a single business transaction requires distinct operations that are performed in a business logic
sequence. The producer issues a command that signals a consumer to start an operation. The consumer
acknowledges the message in a separate queue reserved for lining up responses for the producer. Only after
receiving the response, the producer sends a new message to start the next operation in the sequence. A different
consumer processes that message and sends a completion message to the response queue. By using messaging,
the services coordinate the workflow of the transaction among themselves.
A message broker provides temporal decoupling. The producer and consumer don't have to run concurrently. A
producer can send a message to the message broker regardless of the availability of the consumer. Conversely, the
consumer isn't restricted by the producer's availability.
For example, the user interface of a web app generates messages and uses a queue as the message broker. When
ready, consumers can retrieve messages from the queue and perform the work. Temporal decoupling helps the
user interface to remain responsive. It's not blocked while the messages are handled asynchronously.
Certain operations can take a long time to complete. After issuing a command, the producer shouldn't have to wait
until the consumer completes it. A message broker facilitates asynchronous processing of messages.
Load balancing
Producers may post a large number of messages that are serviced by many consumers. Use a message broker to
distribute processing across servers and improve throughput. Consumers can run on different servers to spread
the load. Consumers can be added dynamically to scale out the system when needed or removed otherwise.

The Competing Consumers Pattern explains how to process multiple messages concurrently to optimize
throughput, to improve scalability and availability, and to balance the workload.
Load leveling
The volume of messages generated by the producer or a group of producers can be variable. At times there might
be a large volume causing spikes in messages. Instead of adding consumers to handle this work, a message broker
can act as a buffer, and consumers gradually drain messages at their own pace without stressing the system.

The Queue-based Load Leveling Pattern provides more information.


Reliable messaging
A message broker helps ensure that messages aren't lost even if communication fails between the producer and
consumer. The producer can post messages to the message broker and the consumer can retrieve them when
communication is reestablished. The producer isn't blocked unless it loses connectivity with the message broker.
Resilient messaging
A message broker can add resiliency to the consumers in your system. If a consumer fails while processing a
message, another instance of the consumer can process that message. The reprocessing is possible because the
message persists in the broker.

Technology choices for a message broker


Azure provides several message broker services, each with a range of features. Before choosing a service,
determine the intent and requirements of the message.
Azure Service Bus
Azure Service Bus queues are well suited for transferring commands from producers to consumers. Here are some
considerations.
Pull model
A consumer of a Service Bus queue constantly polls Service Bus to check if new messages are available. The client
SDKs and Azure Functions trigger for Service Bus abstract that model. When a new message is available, the
consumer's callback is invoked, and the message is sent to the consumer.
Guaranteed delivery
Service Bus allows a consumer to peek the queue and lock a message from other consumers.
It's the responsibility of the consumer to report the processing status of the message. Only when the consumer
marks the message as consumed does Service Bus remove the message from the queue. If a failure, timeout, or
crash occurs, Service Bus unlocks the message so that other consumers can retrieve it. This way messages aren't
lost in transit.
A producer might accidentally send the same message twice. For instance, a producer instance fails after sending a
message. Another producer replaces the original instance and sends the message again. Azure Service Bus queues
provide a built-in de-duping capability that detects and removes duplicate messages. There's still a chance that a
message is delivered twice. For example, if a consumer fails while processing, the message is returned to the queue
and is retrieved by the same or another consumer. The message processing logic in the consumer should be
idempotent so that even if the work is repeated, the state of the system isn't changed. For more information about
idempotency, see Idempotency Patterns on Jonathon Oliver's blog.
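An idempotent consumer can be sketched by tracking processed message IDs, so a redelivered message leaves the state unchanged. The class and field names are assumptions for this sketch; a production system would persist the seen IDs transactionally with the state change.

```python
class IdempotentConsumer:
    """Sketch of idempotent message handling for at-least-once delivery."""

    def __init__(self):
        self._seen = set()  # IDs of already-processed messages
        self.balance = 0

    def handle(self, message):
        msg_id = message["id"]
        if msg_id in self._seen:
            # Duplicate delivery: skip, so the state isn't changed twice.
            return False
        self.balance += message["amount"]
        self._seen.add(msg_id)
        return True
```

Even if the broker redelivers a message after a consumer crash, the work is applied only once.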
Message ordering
If you want consumers to get the messages in the order they are sent, Service Bus queues guarantee first-in-first-
out (FIFO) ordered delivery by using sessions. A session can have one or more messages. The messages are
correlated with the SessionId property. Messages that are part of a session never expire. A session can be locked
to a consumer to prevent its messages from being handled by a different consumer.
For more information, see Message Sessions.
Message persistence
Service Bus queues support temporal decoupling. Even when a consumer isn't available or is unable to process the
message, it remains in the queue.
Checkpoint long-running transactions
Business transactions can run for a long time. Each operation in the transaction can have multiple messages. Use
checkpointing to coordinate the workflow and provide resiliency in case a transaction fails.
Service Bus queues allow checkpointing through the session state capability. State information is incrementally
recorded in the queue (SetState) for messages that belong to a session. For example, a consumer can track
progress by checking the state (GetState) periodically. If a consumer fails, another consumer can use the state
information to determine the last known checkpoint and resume the session.
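The resume-from-checkpoint flow can be sketched with a simple in-memory stand-in for the SetState/GetState calls. This is not the Service Bus API surface, only an illustration of how a replacement consumer resumes instead of restarting.

```python
class SessionCheckpoints:
    """In-memory stand-in for Service Bus session state (SetState/GetState)."""

    def __init__(self):
        self._state = {}

    def set_state(self, session_id, step):
        # Record the last completed step for the session.
        self._state[session_id] = step

    def get_state(self, session_id):
        # A brand-new session starts from step 0.
        return self._state.get(session_id, 0)
```

If the consumer working on a session fails after step 3, the next consumer reads the state and continues from step 4 rather than replaying the whole transaction.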
Dead-letter queue (DLQ)
A Service Bus queue has a default subqueue, called the dead-letter queue (DLQ) to hold messages that couldn't be
delivered or processed. Service Bus or the message processing logic in the consumer can add messages to the
DLQ. The DLQ keeps the messages until they are retrieved from the queue.
Here are examples of when a message can end up in the DLQ:
A poison message is a message that cannot be handled because it's malformed or contains unexpected
information. In Service Bus queues, you can detect poison messages by setting the MaxDeliveryCount
property of the queue. If the number of times the same message is received exceeds that property value,
Service Bus moves the message to the DLQ.
A message might no longer be relevant if it isn't processed within a certain period. Service Bus queues allow
the producer to post messages with a time-to-live attribute. If this period expires before the message is
received, the message is placed in the DLQ.
Examine messages in the DLQ to determine the reason for failure.
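The retry-then-dead-letter behavior for poison messages can be sketched as follows. Service Bus performs this server-side via the MaxDeliveryCount queue property; the function here only illustrates the flow, and its name and signature are assumptions.

```python
def deliver(message, handler, max_delivery_count=3):
    """Sketch of poison-message handling: retry up to max_delivery_count
    times, then dead-letter instead of retrying forever."""
    dead_letter = []
    for _ in range(max_delivery_count):
        try:
            handler(message)
            return "processed", dead_letter
        except Exception:
            # Processing failed; the message returns to the queue
            # and its delivery count increases.
            continue
    # Delivery count exceeded: move the message to the DLQ.
    dead_letter.append(message)
    return "dead-lettered", dead_letter
```

Messages that land in the dead-letter list can then be examined to determine the reason for failure, as the text above recommends.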
Hybrid solution
Service Bus bridges on-premises systems and cloud solutions. On-premises systems are often difficult to reach
because of firewall restrictions. Both the producer and consumer (either can be on-premises or the cloud) can use
the Service Bus queue endpoint as the pickup and drop off location for messages.
Topics and subscriptions
Service Bus supports the Publisher-Subscriber pattern through Service Bus topics and subscriptions.
This feature provides a way for the producer to broadcast messages to multiple consumers. When a topic receives
a message, it's forwarded to all the subscribed consumers. Optionally, a subscription can have filter criteria that
allows the consumer to get a subset of messages. Each consumer retrieves messages from a subscription in a
similar way to a queue.
For more information, see Azure Service Bus topics.
Azure Event Grid
Azure Event Grid is recommended for discrete events. Event Grid follows the Publisher-Subscriber pattern. When
event sources trigger events, they are published to Event grid topics. Consumers of those events create Event Grid
subscriptions by specifying event types and event handler that will process the events. If there are no subscribers,
the events are discarded. Each event can have multiple subscriptions.
Push Model
Event Grid propagates messages to the subscribers in a push model. Suppose you have an event grid subscription
with a webhook. When a new event arrives, Event Grid posts the event to the webhook endpoint.
Integrated with Azure
Choose Event Grid if you want to get notifications about Azure resources. Many Azure services act as event sources
that have built-in Event Grid topics. Event Grid also supports various Azure services that can be configured as
event handlers. It's easy to subscribe to those topics to route events to event handlers of your choice. For example,
you can use Event Grid to invoke an Azure Function when a blob is created or deleted.
Custom topics
Create custom Event Grid topics, if you want to send events from your application or an Azure service that isn't
integrated with Event Grid.
For example, to see the progress of an entire business transaction, you want the participating services to raise
events as they are processing their individual business operations. A web app shows those events. One way is to
create a custom topic and add a subscription with your web app registered through an HTTP WebHook. As
business services send events to the custom topic, Event Grid pushes them to your web app.
Filtered events
You can specify filters in a subscription to instruct Event Grid to route only a subset of events to a specific event
handler. The filters are specified in the subscription schema. Any event sent to the topic with values that match the
filter is automatically forwarded to that subscription.
For example, content in various formats is uploaded to Blob Storage. Each time a file is added, an event is raised
and published to Event Grid. The event subscription might have a filter that only sends events for images so that an
event handler can generate thumbnails.
For more information about filtering, see Filter events for Event Grid.
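Subscription filtering can be sketched as a predicate over an event's subject prefix and data fields. The parameter and field names here are illustrative, not the exact Event Grid filter schema.

```python
def matches_filter(event, subject_begins_with="", data_filters=None):
    """Sketch of Event Grid-style subscription filtering: forward an
    event only if its subject prefix and data fields match."""
    if not event.get("subject", "").startswith(subject_begins_with):
        return False
    for key, value in (data_filters or {}).items():
        if event.get("data", {}).get(key) != value:
            return False
    return True
```

In the thumbnail example above, a subscription might match only events whose subject starts with the images path, so the handler never sees events for other file types.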
High throughput
Event Grid can route 10,000,000 events per second per region. The first 100,000 operations per month are free.
For cost considerations, see How much does Event Grid cost?
Resilient delivery
Even though successful delivery for events isn't as crucial as commands, you might still want some guarantee
depending on the type of event. Event Grid offers features that you can enable and customize, such as retry
policies, expiration time, and dead lettering. For more information, see Delivery and retry.
Event Grid's retry process can help resiliency but it's not fail-safe. In the retry process, Event Grid might deliver the
message more than once, skip, or delay some retries if the endpoint is unresponsive for a long time. For more
information, see Retry schedule and duration.
You can persist undelivered events to a blob storage account by enabling dead-lettering. There's a delay in
delivering the message to the blob storage endpoint and if that endpoint is unresponsive, then Event Grid discards
the event. For more information, see Dead letter and retry policies.
Azure Event Hubs
When working with an event stream, Azure Event Hubs is the recommended message broker. Essentially, it's a
large buffer that's capable of receiving large volumes of data with low latency. The received data can be read
quickly through concurrent operations. You can transform the received data by using any real-time analytics
provider. Event Hubs also provides the capability to store events in a storage account.
Fast ingestion
Event Hubs is capable of ingesting millions of events per second. The events are only appended to the stream and
are ordered by time.
Pull model
Like Event Grid, Event Hubs also offers Publisher-Subscriber capabilities. A key difference between Event Grid and
Event Hubs is in the way event data is made available to the subscribers. Event Grid pushes the ingested data to the
subscribers, whereas Event Hubs makes the data available in a pull model. As events are received, Event Hubs
appends them to the stream. A subscriber manages its cursor and can move forward and back in the stream, select
a time offset, and replay a sequence at its pace.
Stream processors are subscribers that pull data from Event Hubs for the purposes of transformation and
statistical analysis. Use Azure Stream Analytics and Apache Spark for complex processing such as aggregation over
time windows or anomaly detection.
If you want to act on each event per partition, you can pull the data by using Event Processor Host or by using a
built-in connector such as Logic Apps to provide the transformation logic. Another option is to use Azure Functions.
Partitioning
A partition is a portion of the event stream. The events are divided by using a partition key. For example, several
IoT devices send device data to an event hub. The partition key is the device identifier. As events are ingested, Event
Hubs moves them to separate partitions. Within each partition, all events are ordered by time.
A consumer is an instance of code that processes the event data. Event Hubs follows a partitioned consumer
pattern. Each consumer only reads a specific partition. Having multiple partitions results in faster processing
because the stream can be read concurrently by multiple consumers.
Instances of the same consumer make up a single consumer group. Multiple consumer groups can read the same
stream with different intentions. Suppose an event stream has data from a temperature sensor. One consumer
group can read the stream to detect anomalies such as a spike in temperature. Another can read the same stream
to calculate a rolling average temperature in a temporal window.
Event Hubs supports the Publisher-Subscriber pattern by allowing multiple consumer groups. Each consumer
group is a subscriber.
For more information about Event Hub partitioning, see Partitions.
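Partition-key routing can be sketched as a hash of the key modulo the partition count: all events with the same key (for example, one device's ID) land on one partition and therefore stay ordered. Event Hubs uses its own internal hash, so this function is only an illustration.

```python
import hashlib

def assign_partition(partition_key, partition_count):
    """Sketch of partition assignment: hash the key and take it modulo
    the partition count, so one key always maps to one partition."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partition_count
```

Because the mapping is deterministic, events from device-42 are always appended to the same partition, preserving their time order within that partition.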
Event Hubs Capture
The Capture feature allows you to store the event stream to an Azure Blob storage or Data Lake Storage. This way
of storing events is reliable because even if the storage account isn't available, Capture keeps your data for a
period, and then writes to the storage after it's available.

Storage services can also offer additional features for analyzing events. For example, by taking advantage of
the access tiers of a blob storage account, you can store events in a hot tier for data that needs frequent access.
You might use that data for visualization. Alternately, you can store data in the archive tier and retrieve it
occasionally for auditing purposes.

Capture stores all events ingested by Event Hubs and is useful for batch processing. You can generate reports on
the data by using a MapReduce function. Captured data can also serve as the source of truth. If certain facts were
missed while aggregating the data, you can refer to the captured data.
For details about this feature, see Capture events through Azure Event Hubs in Azure Blob Storage or Azure Data
Lake Storage.
Support for Apache Kafka clients
Event Hubs provides an endpoint for Apache Kafka clients. Existing clients can update their configuration to point
to the endpoint and start sending events to Event Hubs. No code changes are required.
For more information, see Event Hubs for Apache Kafka.

Crossover scenarios
In some cases, it's advantageous to combine two messaging services.
Combining services can increase the efficiency of your messaging system. For instance, in your business
transaction, you use Azure Service Bus queues to handle messages. Queues that are mostly idle and receive
messages occasionally are inefficient because the consumer is constantly polling the queue for new messages. You
can set up an Event Grid subscription with an Azure Function as the event handler. Each time the queue receives a
message and there are no consumers listening, Event Grid sends a notification, which invokes the Azure Function
that drains the queue.

For details about connecting Service Bus to Event Grid, see Azure Service Bus to Event Grid integration overview.
The Enterprise integration on Azure using message queues and events reference architecture shows an
implementation of Service Bus to Event Grid integration.
Here's another example. Event Grid receives a set of events in which some events require a workflow while others
are for notification. The message metadata indicates the type of event. One way is to check the metadata by using
the filtering feature in the event subscription. If it requires a workflow, Event Grid sends it to an Azure Service Bus
queue. The receivers of that queue can take necessary actions. The notification events are sent to Logic Apps to
send alert emails.

Related patterns
Consider these patterns when implementing asynchronous messaging:
Competing Consumers Pattern. Multiple consumers may need to compete to read messages from a queue. This
pattern explains how to process multiple messages concurrently to optimize throughput, to improve scalability
and availability, and to balance the workload.
Priority Queue Pattern. For cases where the business logic requires that some messages are processed before
others, this pattern describes how messages posted by a producer that have a higher priority can be received
and processed more quickly by a consumer than messages of a lower priority.
Queue-based Load Leveling Pattern. This pattern uses a message broker to act as a buffer between a producer
and a consumer to help to minimize the impact on availability and responsiveness of intermittent heavy loads
for both those entities.
Retry Pattern. A producer or consumer might be unable to connect to a queue, but the reasons for this failure may
be temporary and quickly pass. This pattern describes how to handle this situation to add resiliency to an
application.
Scheduler Agent Supervisor Pattern. Messaging is often used as part of a workflow implementation. This
pattern demonstrates how messaging can coordinate a set of actions across a distributed set of services and
other remote resources, and enable a system to recover and retry actions that fail.
Choreography pattern. This pattern shows how services can use messaging to control the workflow of a
business transaction.
Claim-Check Pattern. This pattern shows how to split a large message into a claim check and a payload.
Web API design
12/18/2020 • 28 minutes to read

Most modern web applications expose APIs that clients can use to interact with the application. A well-designed
web API should aim to support:
Platform independence. Any client should be able to call the API, regardless of how the API is
implemented internally. This requires using standard protocols, and having a mechanism whereby the client
and the web service can agree on the format of the data to exchange.
Service evolution. The web API should be able to evolve and add functionality independently from client
applications. As the API evolves, existing client applications should continue to function without
modification. All functionality should be discoverable so that client applications can fully use it.
This guidance describes issues that you should consider when designing a web API.

Introduction to REST
In 2000, Roy Fielding proposed Representational State Transfer (REST) as an architectural approach to designing
web services. REST is an architectural style for building distributed systems based on hypermedia. REST is
independent of any underlying protocol and is not necessarily tied to HTTP. However, most common REST
implementations use HTTP as the application protocol, and this guide focuses on designing REST APIs for HTTP.
A primary advantage of REST over HTTP is that it uses open standards, and does not bind the implementation of
the API or the client applications to any specific implementation. For example, a REST web service could be written
in ASP.NET, and client applications can use any language or toolset that can generate HTTP requests and parse
HTTP responses.
Here are some of the main design principles of RESTful APIs using HTTP:
REST APIs are designed around resources, which are any kind of object, data, or service that can be
accessed by the client.
A resource has an identifier, which is a URI that uniquely identifies that resource. For example, the URI for a
particular customer order might be:

https://adventure-works.com/orders/1

Clients interact with a service by exchanging representations of resources. Many web APIs use JSON as the
exchange format. For example, a GET request to the URI listed above might return this response body:

{"orderId":1,"orderValue":99.90,"productId":1,"quantity":1}

REST APIs use a uniform interface, which helps to decouple the client and service implementations. For
REST APIs built on HTTP, the uniform interface includes using standard HTTP verbs to perform operations
on resources. The most common operations are GET, POST, PUT, PATCH, and DELETE.
REST APIs use a stateless request model. HTTP requests should be independent and may occur in any order,
so keeping transient state information between requests is not feasible. The only place where information is
stored is in the resources themselves, and each request should be an atomic operation. This constraint
enables web services to be highly scalable, because there is no need to retain any affinity between clients
and specific servers. Any server can handle any request from any client. That said, other factors can limit
scalability. For example, many web services write to a backend data store, which may be hard to scale out.
For more information about strategies to scale out a data store, see Horizontal, vertical, and functional data
partitioning.
REST APIs are driven by hypermedia links that are contained in the representation. For example, the
following shows a JSON representation of an order. It contains links to get or update the customer
associated with the order.

{
"orderID":3,
"productID":2,
"quantity":4,
"orderValue":16.60,
"links": [
{"rel":"customer","href":"https://adventure-works.com/customers/3", "action":"GET" },
{"rel":"customer","href":"https://adventure-works.com/customers/3", "action":"PUT" }
]
}

In 2008, Leonard Richardson proposed the following maturity model for web APIs:
Level 0: Define one URI, and all operations are POST requests to this URI.
Level 1: Create separate URIs for individual resources.
Level 2: Use HTTP methods to define operations on resources.
Level 3: Use hypermedia (HATEOAS, described below).
Level 3 corresponds to a truly RESTful API according to Fielding's definition. In practice, many published web APIs
fall somewhere around level 2.

Organize the API around resources


Focus on the business entities that the web API exposes. For example, in an e-commerce system, the primary
entities might be customers and orders. Creating an order can be achieved by sending an HTTP POST request that
contains the order information. The HTTP response indicates whether the order was placed successfully or not.
When possible, resource URIs should be based on nouns (the resource) and not verbs (the operations on the
resource).

https://adventure-works.com/orders // Good

https://adventure-works.com/create-order // Avoid

A resource doesn't have to be based on a single physical data item. For example, an order resource might be
implemented internally as several tables in a relational database, but presented to the client as a single entity.
Avoid creating APIs that simply mirror the internal structure of a database. The purpose of REST is to model
entities and the operations that an application can perform on those entities. A client should not be exposed to the
internal implementation.
Entities are often grouped together into collections (orders, customers). A collection is a separate resource from
the item within the collection, and should have its own URI. For example, the following URI might represent the
collection of orders:

https://adventure-works.com/orders

Sending an HTTP GET request to the collection URI retrieves a list of items in the collection. Each item in the
collection also has its own unique URI. An HTTP GET request to the item's URI returns the details of that item.
Adopt a consistent naming convention in URIs. In general, it helps to use plural nouns for URIs that reference
collections. It's a good practice to organize URIs for collections and items into a hierarchy. For example,
/customers is the path to the customers collection, and /customers/5 is the path to the customer with ID equal to
5. This approach helps to keep the web API intuitive. Also, many web API frameworks can route requests based on
parameterized URI paths, so you could define a route for the path /customers/{id} .
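As a hypothetical sketch of such parameterized routing (the template syntax and function names here are illustrative, not taken from any particular framework), a route like /customers/{id} can be compiled into a regular expression with one named group per parameter:

```python
import re

def compile_route(template):
    """Convert a template like '/customers/{id}' into a compiled regex
    with one named capture group per {parameter}."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")

def match_route(template, path):
    """Return the extracted parameters, or None if the path doesn't match."""
    m = compile_route(template).match(path)
    return m.groupdict() if m else None
```

Real web API frameworks provide this routing for you; the point of the sketch is only that a hierarchical URI scheme maps naturally onto such templates.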
Also consider the relationships between different types of resources and how you might expose these
associations. For example, the /customers/5/orders might represent all of the orders for customer 5. You could
also go in the other direction, and represent the association from an order back to a customer with a URI such as
/orders/99/customer . However, extending this model too far can become cumbersome to implement. A better
solution is to provide navigable links to associated resources in the body of the HTTP response message. This
mechanism is described in more detail in the section Use HATEOAS to enable navigation to related resources.
In more complex systems, it can be tempting to provide URIs that enable a client to navigate through several levels
of relationships, such as /customers/1/orders/99/products . However, this level of complexity can be difficult to
maintain and is inflexible if the relationships between resources change in the future. Instead, try to keep URIs
relatively simple. Once an application has a reference to a resource, it should be possible to use this reference to
find items related to that resource. The preceding query can be replaced with the URI /customers/1/orders to find
all the orders for customer 1, and then /orders/99/products to find the products in this order.

TIP
Avoid requiring resource URIs more complex than collection/item/collection.

Another factor is that all web requests impose a load on the web server. The more requests, the bigger the load.
Therefore, try to avoid "chatty" web APIs that expose a large number of small resources. Such an API may require
a client application to send multiple requests to find all of the data that it requires. Instead, you might want to
denormalize the data and combine related information into bigger resources that can be retrieved with a single
request. However, you need to balance this approach against the overhead of fetching data that the client doesn't
need. Retrieving large objects can increase the latency of a request and incur additional bandwidth costs. For more
information about these performance antipatterns, see Chatty I/O and Extraneous Fetching.
Avoid introducing dependencies between the web API and the underlying data sources. For example, if your data
is stored in a relational database, the web API doesn't need to expose each table as a collection of resources. In fact,
that's probably a poor design. Instead, think of the web API as an abstraction of the database. If necessary,
introduce a mapping layer between the database and the web API. That way, client applications are isolated from
changes to the underlying database schema.
Finally, it might not be possible to map every operation implemented by a web API to a specific resource. You can
handle such non-resource scenarios through HTTP requests that invoke a function and return the results as an
HTTP response message. For example, a web API that implements simple calculator operations such as add and
subtract could provide URIs that expose these operations as pseudo resources and use the query string to specify
the parameters required. For example, a GET request to the URI /add?operand1=99&operand2=1 would return a
response message with the body containing the value 100. However, only use these forms of URIs sparingly.
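A handler for such a pseudo resource might parse the operands from the query string, as in this sketch (the handle_add function is hypothetical; only the /add URI shape comes from the example above):

```python
from urllib.parse import urlsplit, parse_qs

def handle_add(uri):
    """Handle a GET to a pseudo resource such as /add?operand1=99&operand2=1.
    Returns the body that would be sent in the HTTP response."""
    query = parse_qs(urlsplit(uri).query)
    total = int(query["operand1"][0]) + int(query["operand2"][0])
    return str(total)
```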

Define operations in terms of HTTP methods


The HTTP protocol defines a number of methods that assign semantic meaning to a request. The common HTTP
methods used by most RESTful web APIs are:
GET retrieves a representation of the resource at the specified URI. The body of the response message contains
the details of the requested resource.
POST creates a new resource at the specified URI. The body of the request message provides the details of the
new resource. Note that POST can also be used to trigger operations that don't actually create resources.
PUT either creates or replaces the resource at the specified URI. The body of the request message specifies the
resource to be created or updated.
PATCH performs a partial update of a resource. The request body specifies the set of changes to apply to the
resource.
DELETE removes the resource at the specified URI.
The effect of a specific request should depend on whether the resource is a collection or an individual item. The
following table summarizes the common conventions adopted by most RESTful implementations using the e-
commerce example. Not all of these requests might be implemented—it depends on the specific scenario.

Resource            | POST                              | GET                                 | PUT                                            | DELETE
--------------------|-----------------------------------|-------------------------------------|------------------------------------------------|----------------------------------
/customers          | Create a new customer             | Retrieve all customers              | Bulk update of customers                       | Remove all customers
/customers/1        | Error                             | Retrieve the details for customer 1 | Update the details of customer 1, if it exists | Remove customer 1
/customers/1/orders | Create a new order for customer 1 | Retrieve all orders for customer 1  | Bulk update of orders for customer 1           | Remove all orders for customer 1

The differences between POST, PUT, and PATCH can be confusing.


A POST request creates a resource. The server assigns a URI for the new resource, and returns that URI to
the client. In the REST model, you frequently apply POST requests to collections. The new resource is added
to the collection. A POST request can also be used to submit data for processing to an existing resource,
without any new resource being created.
A PUT request creates a resource or updates an existing resource. The client specifies the URI for the
resource. The request body contains a complete representation of the resource. If a resource with this URI
already exists, it is replaced. Otherwise a new resource is created, if the server supports doing so. PUT
requests are most frequently applied to resources that are individual items, such as a specific customer,
rather than collections. A server might support updates but not creation via PUT. Whether to support
creation via PUT depends on whether the client can meaningfully assign a URI to a resource before it exists.
If not, then use POST to create resources and PUT or PATCH to update.
A PATCH request performs a partial update to an existing resource. The client specifies the URI for the
resource. The request body specifies a set of changes to apply to the resource. This can be more efficient
than using PUT, because the client only sends the changes, not the entire representation of the resource.
Technically PATCH can also create a new resource (by specifying a set of updates to a "null" resource), if the
server supports this.
PUT requests must be idempotent. If a client submits the same PUT request multiple times, the results should
always be the same (the same resource will be modified with the same values). POST and PATCH requests are not
guaranteed to be idempotent.
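The idempotency distinction can be made concrete with a toy in-memory store (the CustomerStore class is a hypothetical illustration, not part of any framework): repeating a PUT leaves the store unchanged, while repeating a POST keeps creating resources.

```python
import itertools

class CustomerStore:
    """Minimal in-memory store illustrating why PUT is idempotent
    and POST is not."""
    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST: the server assigns a new URI each time, so repeating
        # the same request creates another resource.
        new_id = next(self._ids)
        self.items[new_id] = data
        return new_id

    def put(self, item_id, data):
        # PUT: the client names the URI and sends a complete
        # representation, so repeating the request leaves the store
        # in the same state.
        self.items[item_id] = data
        return item_id
```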

Conform to HTTP semantics


This section describes some typical considerations for designing an API that conforms to the HTTP specification.
However, it doesn't cover every possible detail or scenario. When in doubt, consult the HTTP specifications.
Media types
As mentioned earlier, clients and servers exchange representations of resources. For example, in a POST request,
the request body contains a representation of the resource to create. In a GET request, the response body contains
a representation of the fetched resource.
In the HTTP protocol, formats are specified through the use of media types, also called MIME types. For non-binary
data, most web APIs support JSON (media type = application/json) and possibly XML (media type =
application/xml).
The Content-Type header in a request or response specifies the format of the representation. Here is an example of
a POST request that includes JSON data:

POST https://adventure-works.com/orders HTTP/1.1


Content-Type: application/json; charset=utf-8
Content-Length: 57

{"Id":1,"Name":"Gizmo","Category":"Widgets","Price":1.99}

If the server doesn't support the media type, it should return HTTP status code 415 (Unsupported Media Type).
A client request can include an Accept header that contains a list of media types the client will accept from the
server in the response message. For example:

GET https://adventure-works.com/orders/2 HTTP/1.1


Accept: application/json

If the server cannot match any of the media type(s) listed, it should return HTTP status code 406 (Not Acceptable).
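A server-side content negotiation check might look like the following sketch. It is deliberately simplified: it ignores quality factors (q-values) and most wildcard forms, both of which a production implementation should honor.

```python
def negotiate(accept_header, supported=("application/json", "application/xml")):
    """Pick the first media type in the Accept header that the server
    supports. Returns the chosen type, or None, which means the server
    should respond with 406 (Not Acceptable)."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()  # drop any q-value
        if media_type in supported:
            return media_type
        if media_type in ("*/*", "application/*"):
            return supported[0]
    return None
```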
GET methods
A successful GET method typically returns HTTP status code 200 (OK). If the resource cannot be found, the method
should return 404 (Not Found).
POST methods
If a POST method creates a new resource, it returns HTTP status code 201 (Created). The URI of the new resource
is included in the Location header of the response. The response body contains a representation of the resource.
If the method does some processing but does not create a new resource, the method can return HTTP status code
200 and include the result of the operation in the response body. Alternatively, if there is no result to return, the
method can return HTTP status code 204 (No Content) with no response body.
If the client puts invalid data into the request, the server should return HTTP status code 400 (Bad Request). The
response body can contain additional information about the error or a link to a URI that provides more details.
PUT methods
If a PUT method creates a new resource, it returns HTTP status code 201 (Created), as with a POST method. If the
method updates an existing resource, it returns either 200 (OK) or 204 (No Content). In some cases, it might not
be possible to update an existing resource. In that case, consider returning HTTP status code 409 (Conflict).
Consider implementing bulk HTTP PUT operations that can batch updates to multiple resources in a collection. The
PUT request should specify the URI of the collection, and the request body should specify the details of the
resources to be modified. This approach can help to reduce chattiness and improve performance.
PATCH methods
With a PATCH request, the client sends a set of updates to an existing resource, in the form of a patch document.
The server processes the patch document to perform the update. The patch document doesn't describe the whole
resource, only a set of changes to apply. The specification for the PATCH method (RFC 5789) doesn't define a
particular format for patch documents. The format must be inferred from the media type in the request.
JSON is probably the most common data format for web APIs. There are two main JSON-based patch formats,
called JSON patch and JSON merge patch.
JSON merge patch is somewhat simpler. The patch document has the same structure as the original JSON
resource, but includes just the subset of fields that should be changed or added. In addition, a field can be deleted
by specifying null for the field value in the patch document.
For example, suppose the original resource has the following JSON representation:

{
"name":"gizmo",
"category":"widgets",
"color":"blue",
"price":10
}

Here is a possible JSON merge patch for this resource:

{
"price":12,
"color":null,
"size":"small"
}

This tells the server to update price , delete color , and add size , while name and category are not modified.
For the exact details of JSON merge patch, see RFC 7396. The media type for JSON merge patch is
application/merge-patch+json .
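The merge patch algorithm in RFC 7396 is short enough to transcribe directly. The following Python sketch follows the RFC's pseudocode: a null value deletes a member, nested objects merge recursively, and any non-object patch simply replaces the target.

```python
def json_merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7396) to a decoded JSON value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)   # null deletes the member
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result
```

Applied to the example above, the patch updates price, deletes color, and adds size, leaving name and category untouched.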

Merge patch is not suitable if the original resource can contain explicit null values, due to the special meaning of
null in the patch document. Also, the patch document doesn't specify the order that the server should apply the
updates. That may or may not matter, depending on the data and the domain. JSON patch, defined in RFC 6902, is
more flexible. It specifies the changes as a sequence of operations to apply. Operations include add, remove,
replace, copy, and test (to validate values). The media type for JSON patch is application/json-patch+json .
Here are some typical error conditions that might be encountered when processing a PATCH request, along with
the appropriate HTTP status code.

Error condition                                                                                     | HTTP status code
----------------------------------------------------------------------------------------------------|------------------------------
The patch document format isn't supported.                                                          | 415 (Unsupported Media Type)
Malformed patch document.                                                                           | 400 (Bad Request)
The patch document is valid, but the changes can't be applied to the resource in its current state. | 409 (Conflict)

DELETE methods
If the delete operation is successful, the web server should respond with HTTP status code 204, indicating that the
process has been successfully handled, but that the response body contains no further information. If the resource
doesn't exist, the web server can return HTTP 404 (Not Found).
Asynchronous operations
Sometimes a POST, PUT, PATCH, or DELETE operation might require processing that takes a while to complete. If
you wait for completion before sending a response to the client, it may cause unacceptable latency. If so, consider
making the operation asynchronous. Return HTTP status code 202 (Accepted) to indicate the request was accepted
for processing but is not completed.
You should expose an endpoint that returns the status of an asynchronous request, so the client can monitor the
status by polling the status endpoint. Include the URI of the status endpoint in the Location header of the 202
response. For example:

HTTP/1.1 202 Accepted


Location: /api/status/12345

If the client sends a GET request to this endpoint, the response should contain the current status of the request.
Optionally, it could also include an estimated time to completion or a link to cancel the operation.

HTTP/1.1 200 OK
Content-Type: application/json

{
"status":"In progress",
"link": { "rel":"cancel", "method":"delete", "href":"/api/status/12345" }
}

If the asynchronous operation creates a new resource, the status endpoint should return status code 303 (See
Other) after the operation completes. In the 303 response, include a Location header that gives the URI of the new
resource:

HTTP/1.1 303 See Other


Location: /api/orders/12345

For more information, see Asynchronous Request-Reply pattern.
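The client side of this pattern is a polling loop. In this sketch, get_status stands in for an HTTP GET to the Location URI from the 202 response and returns a hypothetical (status_code, headers) pair; a real client would also sleep between polls and honor any Retry-After header.

```python
def poll_until_done(get_status, max_polls=10):
    """Poll a status endpoint until an asynchronous operation finishes.
    A 303 response means the operation completed; the new resource's
    URI is in the Location header of that response."""
    for _ in range(max_polls):
        status_code, headers = get_status()
        if status_code == 303:
            return headers["Location"]
        # A real client would time.sleep() here before retrying.
    raise TimeoutError("operation did not complete in time")
```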

Filter and paginate data


Exposing a collection of resources through a single URI can lead to applications fetching large amounts of data
when only a subset of the information is required. For example, suppose a client application needs to find all
orders with a cost over a specific value. It might retrieve all orders from the /orders URI and then filter these
orders on the client side. Clearly this process is highly inefficient. It wastes network bandwidth and processing
power on the server hosting the web API.
Instead, the API can allow passing a filter in the query string of the URI, such as /orders?minCost=n. The web API is
then responsible for parsing and handling the minCost parameter in the query string and returning the filtered
results on the server side.
GET requests over collection resources can potentially return a large number of items. You should design a web
API to limit the amount of data returned by any single request. Consider supporting query strings that specify the
maximum number of items to retrieve and a starting offset into the collection. For example:

/orders?limit=25&offset=50

Also consider imposing an upper limit on the number of items returned, to help prevent Denial of Service attacks.
To assist client applications, GET requests that return paginated data should also include some form of metadata
that indicates the total number of resources available in the collection.
You can use a similar strategy to sort data as it is fetched, by providing a sort parameter that takes a field name as
the value, such as /orders?sort=ProductID. However, this approach can have a negative effect on caching, because
query string parameters form part of the resource identifier used by many cache implementations as the key to
cached data.
You can extend this approach to limit the fields returned for each item, if each item contains a large amount of
data. For example, you could use a query string parameter that accepts a comma-delimited list of fields, such as
/orders?fields=ProductID,Quantity.
Give all optional parameters in query strings meaningful defaults. For example, set the limit parameter to 10
and the offset parameter to 0 if you implement pagination, set the sort parameter to the key of the resource if
you implement ordering, and set the fields parameter to all fields in the resource if you support projections.
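Pulled together, the filtering, sorting, paging, and projection conventions above might be implemented over an in-memory collection as follows. This is an illustrative sketch: the parameter names mirror the example URIs, the cost field is assumed from the minCost example, and the defaults follow the guidance (limit=10, offset=0).

```python
def query_collection(items, min_cost=None, sort=None, limit=10, offset=0, fields=None):
    """Filter, sort, paginate, and project a collection of dicts."""
    results = [i for i in items if min_cost is None or i["cost"] >= min_cost]
    if sort is not None:
        results.sort(key=lambda i: i[sort])
    page = results[offset:offset + limit]
    if fields is not None:
        page = [{f: i[f] for f in fields} for i in page]
    # Include the total so clients can page through the collection.
    return {"total": len(results), "items": page}
```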

Support partial responses for large binary resources


A resource may contain large binary fields, such as files or images. To overcome problems caused by unreliable
and intermittent connections and to improve response times, consider enabling such resources to be retrieved in
chunks. To do this, the web API should support the Accept-Ranges header for GET requests for large resources.
This header indicates that the GET operation supports partial requests. The client application can submit GET
requests that return a subset of a resource, specified as a range of bytes.
Also, consider implementing HTTP HEAD requests for these resources. A HEAD request is similar to a GET request,
except that it only returns the HTTP headers that describe the resource, with an empty message body. A client
application can issue a HEAD request to determine whether to fetch a resource by using partial GET requests. For
example:

HEAD https://adventure-works.com/products/10?fields=productImage HTTP/1.1

Here is an example response message:

HTTP/1.1 200 OK

Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 4580

The Content-Length header gives the total size of the resource, and the Accept-Ranges header indicates that the
corresponding GET operation supports partial results. The client application can use this information to retrieve
the image in smaller chunks. The first request fetches the first 2500 bytes by using the Range header:

GET https://adventure-works.com/products/10?fields=productImage HTTP/1.1


Range: bytes=0-2499

The response message indicates that this is a partial response by returning HTTP status code 206. The Content-
Length header specifies the actual number of bytes returned in the message body (not the size of the resource),
and the Content-Range header indicates which part of the resource this is (bytes 0-2499 out of 4580):

HTTP/1.1 206 Partial Content

Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 2500
Content-Range: bytes 0-2499/4580

[...]

A subsequent request from the client application can retrieve the remainder of the resource.
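The server-side handling of a single-range request can be sketched as follows. This simplified version handles only one ascending byte range of the form bytes=start-end (suffix ranges such as bytes=-500, multiple ranges, and 416 handling are omitted), and the serve_range function itself is hypothetical.

```python
def serve_range(resource: bytes, range_header=None):
    """Return (status, headers, body) for a GET that may carry a
    Range header such as 'bytes=0-2499'."""
    total = len(resource)
    if range_header is None:
        return 200, {"Accept-Ranges": "bytes", "Content-Length": str(total)}, resource
    start, end = range_header.removeprefix("bytes=").split("-")
    start = int(start)
    end = min(int(end), total - 1) if end else total - 1
    body = resource[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        # Content-Length is the size of this chunk, not the whole resource.
        "Content-Length": str(len(body)),
        "Content-Range": f"bytes {start}-{end}/{total}",
    }
    return 206, headers, body
```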
Use HATEOAS to enable navigation to related resources
One of the primary motivations behind REST is that it should be possible to navigate the entire set of resources
without requiring prior knowledge of the URI scheme. Each HTTP GET request should return the information
necessary to find the resources related directly to the requested object through hyperlinks included in the
response, and it should also be provided with information that describes the operations available on each of these
resources. This principle is known as HATEOAS, or Hypermedia as the Engine of Application State. The system is
effectively a finite state machine, and the response to each request contains the information necessary to move
from one state to another; no other information should be necessary.

NOTE
Currently there are no general-purpose standards that define how to model the HATEOAS principle. The examples shown in
this section illustrate one possible, proprietary solution.

For example, to handle the relationship between an order and a customer, the representation of an order could
include links that identify the available operations for the customer of the order. Here is a possible representation:

{
"orderID":3,
"productID":2,
"quantity":4,
"orderValue":16.60,
"links":[
{
"rel":"customer",
"href":"https://adventure-works.com/customers/3",
"action":"GET",
"types":["text/xml","application/json"]
},
{
"rel":"customer",
"href":"https://adventure-works.com/customers/3",
"action":"PUT",
"types":["application/x-www-form-urlencoded"]
},
{
"rel":"customer",
"href":"https://adventure-works.com/customers/3",
"action":"DELETE",
"types":[]
},
{
"rel":"self",
"href":"https://adventure-works.com/orders/3",
"action":"GET",
"types":["text/xml","application/json"]
},
{
"rel":"self",
"href":"https://adventure-works.com/orders/3",
"action":"PUT",
"types":["application/x-www-form-urlencoded"]
},
{
"rel":"self",
"href":"https://adventure-works.com/orders/3",
"action":"DELETE",
"types":[]
}]
}

In this example, the links array has a set of links. Each link represents an operation on a related entity. The data
for each link includes the relationship ("customer"), the URI ( https://adventure-works.com/customers/3 ), the HTTP
method, and the supported MIME types. This is all the information that a client application needs to be able to
invoke the operation.
The links array also includes self-referencing information about the resource itself that has been retrieved. These
have the relationship self.
The set of links that are returned may change, depending on the state of the resource. This is what is meant by
hypertext being the "engine of application state."
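Generating such a links array on the server can be mechanical. The sketch below reproduces the illustrative (non-standard) link format from this section; the order_links helper and its parameters are hypothetical.

```python
def order_links(order_id, customer_id, base="https://adventure-works.com"):
    """Build the links array for an order resource, covering the
    related customer and the order itself (rel 'self')."""
    def link(rel, path, action, types):
        return {"rel": rel, "href": f"{base}{path}", "action": action, "types": types}

    read_types = ["text/xml", "application/json"]
    links = []
    for rel, path in (("customer", f"/customers/{customer_id}"),
                      ("self", f"/orders/{order_id}")):
        links.append(link(rel, path, "GET", read_types))
        links.append(link(rel, path, "PUT", ["application/x-www-form-urlencoded"]))
        links.append(link(rel, path, "DELETE", []))
    return links
```

A real implementation would also vary the set of links with resource state, which is the essence of the principle.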

Versioning a RESTful web API


It is highly unlikely that a web API will remain static. As business requirements change new collections of
resources may be added, the relationships between resources might change, and the structure of the data in
resources might be amended. While updating a web API to handle new or differing requirements is a relatively
straightforward process, you must consider the effects that such changes will have on client applications
consuming the web API. The issue is that although the developer designing and implementing a web API has full
control over that API, the developer does not have the same degree of control over client applications, which may
be built by third-party organizations operating remotely. The primary imperative is to enable existing client
applications to continue functioning unchanged while allowing new client applications to take advantage of new
features and resources.
Versioning enables a web API to indicate the features and resources that it exposes, and a client application can
submit requests that are directed to a specific version of a feature or resource. The following sections describe
several different approaches, each of which has its own benefits and trade-offs.
No versioning
This is the simplest approach, and may be acceptable for some internal APIs. Significant changes could be
represented as new resources or new links. Adding content to existing resources might not present a breaking
change as client applications that are not expecting to see this content will ignore it.
For example, a request to the URI https://adventure-works.com/customers/3 should return the details of a single
customer containing id , name , and address fields expected by the client application:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"id":3,"name":"Contoso LLC","address":"1 Microsoft Way Redmond WA 98053"}

NOTE
For simplicity, the example responses shown in this section do not include HATEOAS links.

If the DateCreated field is added to the schema of the customer resource, then the response would look like this:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"id":3,"name":"Contoso LLC","dateCreated":"2014-09-04T12:11:38.0376089Z","address":"1 Microsoft Way Redmond


WA 98053"}

Existing client applications might continue functioning correctly if they are capable of ignoring unrecognized fields,
while new client applications can be designed to handle this new field. However, if more radical changes to the
schema of resources occur (such as removing or renaming fields) or the relationships between resources change
then these may constitute breaking changes that prevent existing client applications from functioning correctly. In
these situations, you should consider one of the following approaches.
URI versioning
Each time you modify the web API or change the schema of resources, you add a version number to the URI for
each resource. The previously existing URIs should continue to operate as before, returning resources that
conform to their original schema.
Extending the previous example, if the address field is restructured into subfields containing each constituent part
of the address (such as streetAddress , city , state , and zipCode ), this version of the resource could be exposed
through a URI containing a version number, such as https://adventure-works.com/v2/customers/3 :

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"id":3,"name":"Contoso LLC","dateCreated":"2014-09-04T12:11:38.0376089Z","address":{"streetAddress":"1
Microsoft Way","city":"Redmond","state":"WA","zipCode":98053}}

This versioning mechanism is very simple but depends on the server routing the request to the appropriate
endpoint. However, it can become unwieldy as the web API matures through several iterations and the server has
to support a number of different versions. Also, from a purist's point of view, in all cases the client applications are
fetching the same data (customer 3), so the URI should not really be different depending on the version. This
scheme also complicates implementation of HATEOAS as all links will need to include the version number in their
URIs.
Query string versioning
Rather than providing multiple URIs, you can specify the version of the resource by using a parameter within the
query string appended to the HTTP request, such as https://adventure-works.com/customers/3?version=2 . The
version parameter should default to a meaningful value such as 1 if it is omitted by older client applications.
This approach has the semantic advantage that the same resource is always retrieved from the same URI, but it
depends on the code that handles the request to parse the query string and send back the appropriate HTTP
response. This approach also suffers from the same complications for implementing HATEOAS as the URI
versioning mechanism.

NOTE
Some older web browsers and web proxies will not cache responses for requests that include a query string in the URI. This
can degrade performance for web applications that use a web API and that run from within such a web browser.

Header versioning
Rather than appending the version number as a query string parameter, you could implement a custom header
that indicates the version of the resource. This approach requires that the client application adds the appropriate
header to any requests, although the code handling the client request could use a default value (version 1) if the
version header is omitted. The following examples use a custom header named Custom-Header. The value of this
header indicates the version of web API.
Version 1:

GET https://adventure-works.com/customers/3 HTTP/1.1


Custom-Header: api-version=1
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"id":3,"name":"Contoso LLC","address":"1 Microsoft Way Redmond WA 98053"}

Version 2:

GET https://adventure-works.com/customers/3 HTTP/1.1


Custom-Header: api-version=2

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"id":3,"name":"Contoso LLC","dateCreated":"2014-09-04T12:11:38.0376089Z","address":{"streetAddress":"1
Microsoft Way","city":"Redmond","state":"WA","zipCode":98053}}

As with the previous two approaches, implementing HATEOAS requires including the appropriate custom header
in any links.
Media type versioning
When a client application sends an HTTP GET request to a web server it should stipulate the format of the content
that it can handle by using an Accept header, as described earlier in this guidance. Frequently the purpose of the
Accept header is to allow the client application to specify whether the body of the response should be XML, JSON,
or some other common format that the client can parse. However, it is possible to define custom media types that
include information enabling the client application to indicate which version of a resource it is expecting. The
following example shows a request that specifies an Accept header with the value application/vnd.adventure-
works.v1+json. The vnd.adventure-works.v1 element indicates to the web server that it should return version 1 of
the resource, while the json element specifies that the format of the response body should be JSON:

GET https://adventure-works.com/customers/3 HTTP/1.1


Accept: application/vnd.adventure-works.v1+json

The code handling the request is responsible for processing the Accept header and honoring it as far as possible
(the client application may specify multiple formats in the Accept header, in which case the web server can choose
the most appropriate format for the response body). The web server confirms the format of the data in the
response body by using the Content-Type header:

HTTP/1.1 200 OK
Content-Type: application/vnd.adventure-works.v1+json; charset=utf-8

{"id":3,"name":"Contoso LLC","address":"1 Microsoft Way Redmond WA 98053"}

If the Accept header does not specify any known media types, the web server could generate an HTTP 406 (Not
Acceptable) response message or return a message with a default media type.
This approach is arguably the purest of the versioning mechanisms and lends itself naturally to HATEOAS, which
can include the MIME type of related data in resource links.
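Extracting the version from such a vendor media type is a small parsing job. The following sketch assumes the vnd.<vendor>.v<N>+json convention shown above; it is one possible convention, not a standard parser.

```python
import re

def version_from_accept(accept_header, default=1):
    """Extract the API version from a media type such as
    'application/vnd.adventure-works.v1+json'. Falls back to the
    default when no versioned vendor media type is present."""
    m = re.search(r"application/vnd\.[\w.-]+?\.v(\d+)\+json", accept_header)
    return int(m.group(1)) if m else default
```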
NOTE
When you select a versioning strategy, you should also consider the implications for performance, especially caching on the web server. The URI versioning and Query String versioning schemes are cache-friendly, inasmuch as the same URI/query
string combination refers to the same data each time.
The Header versioning and Media Type versioning mechanisms typically require additional logic to examine the values in the
custom header or the Accept header. In a large-scale environment, many clients using different versions of a web API can
result in a significant amount of duplicated data in a server-side cache. This issue can become acute if a client application
communicates with a web server through a proxy that implements caching, and that only forwards a request to the web
server if it does not currently hold a copy of the requested data in its cache.

Open API Initiative


The Open API Initiative was created by an industry consortium to standardize REST API descriptions across
vendors. As part of this initiative, the Swagger 2.0 specification was renamed the OpenAPI Specification (OAS) and
brought under the Open API Initiative.
You may want to adopt OpenAPI for your web APIs. Some points to consider:
The OpenAPI Specification comes with a set of opinionated guidelines on how a REST API should be
designed. That has advantages for interoperability, but requires more care when designing your API to
conform to the specification.
OpenAPI promotes a contract-first approach, rather than an implementation-first approach. Contract-first
means you design the API contract (the interface) first and then write code that implements the contract.
Tools like Swagger can generate client libraries or documentation from API contracts. For example, see
ASP.NET Web API help pages using Swagger.

More information
Microsoft REST API guidelines. Detailed recommendations for designing public REST APIs.
Web API checklist. A useful list of items to consider when designing and implementing a web API.
Open API Initiative. Documentation and implementation details on Open API.
Web API implementation
12/18/2020

A carefully designed RESTful web API defines the resources, relationships, and navigation schemes that are
accessible to client applications. When you implement and deploy a web API, you should consider the physical
requirements of the environment hosting the web API and the way in which the web API is constructed rather than
the logical structure of the data. This guidance focuses on best practices for implementing a web API and
publishing it to make it available to client applications. For detailed information about web API design, see Web API
design.

Processing requests
Consider the following points when you implement the code to handle requests.
GET, PUT, DELETE, HEAD, and PATCH actions should be idempotent
The code that implements these requests should not impose any unintended side effects. The same request repeated over the same resource should result in the same state. For example, sending multiple DELETE requests to the same URI should have the same effect, although the HTTP status code in the response messages may differ. The first DELETE request might return status code 204 (No Content), while a subsequent DELETE request might return status code 404 (Not Found).

NOTE
The article Idempotency Patterns on Jonathan Oliver's blog provides an overview of idempotency and how it relates to data
management operations.
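The DELETE behavior described above can be sketched as follows. This is an illustrative Python sketch over a hypothetical in-memory store, not code from the guide:

```python
# Repeating a DELETE yields the same end state -- the customer is gone --
# even though the status code differs between the first and later calls.
customers = {3: {"name": "Contoso LLC"}}

def delete_customer(customer_id):
    if customer_id in customers:
        del customers[customer_id]
        return 204  # No Content: the resource was deleted by this call
    return 404      # Not Found: already deleted; the state is unchanged

first = delete_customer(3)   # 204
second = delete_customer(3)  # 404, but the store is in the same state
```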

POST actions that create new resources should not have unrelated side -effects
If a POST request is intended to create a new resource, the effects of the request should be limited to the new resource (and possibly any directly related resources if there is some sort of linkage involved). For example, in an e-commerce system, a POST request that creates a new order for a customer might also amend inventory levels and generate billing information, but it should not modify information unrelated to the order or have any other side effects on the overall state of the system.
Avoid implementing chatty POST, PUT, and DELETE operations
Support POST, PUT and DELETE requests over resource collections. A POST request can contain the details for
multiple new resources and add them all to the same collection, a PUT request can replace the entire set of
resources in a collection, and a DELETE request can remove an entire collection.
The OData support included in ASP.NET Web API 2 provides the ability to batch requests. A client application can
package up several web API requests and send them to the server in a single HTTP request, and receive a single
HTTP response that contains the replies to each request. For more information, see Introducing batch support in Web API and Web API OData.
Follow the HTTP specification when sending a response
A web API must return messages that contain the correct HTTP status code to enable the client to determine how to
handle the result, the appropriate HTTP headers so that the client understands the nature of the result, and a
suitably formatted body to enable the client to parse the result.
For example, a POST operation should return status code 201 (Created) and the response message should include
the URI of the newly created resource in the Location header of the response message.
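As a sketch of that rule, the following hypothetical Python function creates a resource and assembles a 201 (Created) response with a Location header; the base URI and the response dictionary shape are assumptions for illustration:

```python
# Illustrative sketch: a POST handler that creates an order and returns
# 201 (Created) with the new resource's URI in the Location header.
def create_order(orders, order):
    order_id = max(orders, default=0) + 1   # next available identifier
    orders[order_id] = order
    return {
        "status": 201,
        "headers": {"Location": f"https://adventure-works.com/orders/{order_id}"},
        "body": order,
    }
```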
Support content negotiation
The body of a response message may contain data in a variety of formats. For example, an HTTP GET request could
return data in JSON, or XML format. When the client submits a request, it can include an Accept header that
specifies the data formats that it can handle. These formats are specified as media types. For example, a client that
issues a GET request that retrieves an image can specify an Accept header that lists the media types that the client
can handle, such as image/jpeg, image/gif, or image/png. When the web API returns the result, it should format the
data by using one of these media types and specify the format in the Content-Type header of the response.
If the client does not specify an Accept header, then use a sensible default format for the response body. As an
example, the ASP.NET Web API framework defaults to JSON for text-based data.
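A minimal negotiation step can be sketched like this. It is an illustrative Python sketch; `negotiate` is a hypothetical helper, and quality values such as `;q=0.8` are ignored for brevity:

```python
# Media types this hypothetical web API can produce.
SUPPORTED = ["application/json", "application/xml"]

def negotiate(accept_header):
    # No Accept header: fall back to a sensible default (JSON).
    if not accept_header:
        return "application/json"
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()  # drop any ";q=" parameters
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return "application/json"
    return None  # nothing acceptable: caller should respond 406 (Not Acceptable)
```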
Provide links to support HATEOAS -style navigation and discovery of resources
The HATEOAS approach enables a client to navigate and discover resources from an initial starting point. This is
achieved by using links containing URIs; when a client issues an HTTP GET request to obtain a resource, the
response should contain URIs that enable a client application to quickly locate any directly related resources. For
example, in a web API that supports an e-commerce solution, a customer may have placed many orders. When a
client application retrieves the details for a customer, the response should include links that enable the client
application to send HTTP GET requests that can retrieve these orders. Additionally, HATEOAS-style links should
describe the other operations (POST, PUT, DELETE, and so on) that each linked resource supports together with the
corresponding URI to perform each request. This approach is described in more detail in API design.
Currently there are no standards that govern the implementation of HATEOAS, but the following example illustrates
one possible approach. In this example, an HTTP GET request that finds the details for a customer returns a
response that includes HATEOAS links that reference the orders for that customer:

GET https://adventure-works.com/customers/2 HTTP/1.1


Accept: text/json
...

HTTP/1.1 200 OK
...
Content-Type: application/json; charset=utf-8
...
Content-Length: ...
{"CustomerID":2,"CustomerName":"Bert","Links":[
{"rel":"self",
"href":"https://adventure-works.com/customers/2",
"action":"GET",
"types":["text/xml","application/json"]},
{"rel":"self",
"href":"https://adventure-works.com/customers/2",
"action":"PUT",
"types":["application/x-www-form-urlencoded"]},
{"rel":"self",
"href":"https://adventure-works.com/customers/2",
"action":"DELETE",
"types":[]},
{"rel":"orders",
"href":"https://adventure-works.com/customers/2/orders",
"action":"GET",
"types":["text/xml","application/json"]},
{"rel":"orders",
"href":"https://adventure-works.com/customers/2/orders",
"action":"POST",
"types":["application/x-www-form-urlencoded"]}
]}

In this example, the customer data is represented by the Customer class shown in the following code snippet. The
HATEOAS links are held in the Links collection property:

public class Customer
{
    public int CustomerID { get; set; }
    public string CustomerName { get; set; }
    public List<Link> Links { get; set; }
    ...
}

public class Link
{
    public string Rel { get; set; }
    public string Href { get; set; }
    public string Action { get; set; }
    public string[] Types { get; set; }
}

The HTTP GET operation retrieves the customer data from storage and constructs a Customer object, and then
populates the Links collection. The result is formatted as a JSON response message. Each link comprises the
following fields:
The relationship between the object being returned and the object described by the link. In this case self
indicates that the link is a reference back to the object itself (similar to a this pointer in many object-oriented
languages), and orders is the name of a collection containing the related order information.
The hyperlink ( Href ) for the object being described by the link in the form of a URI.
The type of HTTP request ( Action ) that can be sent to this URI.
The format of any data ( Types ) that should be provided in the HTTP request or that can be returned in the
response, depending on the type of the request.
The HATEOAS links shown in the example HTTP response indicate that a client application can perform the
following operations:
An HTTP GET request to the URI https://adventure-works.com/customers/2 to fetch the details of the customer
(again). The data can be returned as XML or JSON.
An HTTP PUT request to the URI https://adventure-works.com/customers/2 to modify the details of the customer.
The new data must be provided in the request message in x-www-form-urlencoded format.
An HTTP DELETE request to the URI https://adventure-works.com/customers/2 to delete the customer. The
request does not expect any additional information or return data in the response message body.
An HTTP GET request to the URI https://adventure-works.com/customers/2/orders to find all the orders for the
customer. The data can be returned as XML or JSON.
An HTTP POST request to the URI https://adventure-works.com/customers/2/orders to create a new order for this
customer. The data must be provided in the request message in x-www-form-urlencoded format.

Handling exceptions
Consider the following points if an operation throws an uncaught exception.
Capture exceptions and return a meaningful response to clients
The code that implements an HTTP operation should provide comprehensive exception handling rather than letting
uncaught exceptions propagate to the framework. If an exception makes it impossible to complete the operation
successfully, the exception can be passed back in the response message, but it should include a meaningful
description of the error that caused the exception. The exception should also include the appropriate HTTP status
code rather than simply returning status code 500 for every situation. For example, if a user request causes a
database update that violates a constraint (such as attempting to delete a customer that has outstanding orders),
you should return status code 409 (Conflict) and a message body indicating the reason for the conflict. If some
other condition renders the request unachievable, you can return status code 400 (Bad Request). You can find a full
list of HTTP status codes on the Status code definitions page on the W3C website.
The following code example traps different conditions and returns an appropriate response.

[HttpDelete]
[Route("customers/{id:int}")]
public IHttpActionResult DeleteCustomer(int id)
{
    try
    {
        // Find the customer to be deleted in the repository
        var customerToDelete = repository.GetCustomer(id);

        // If there is no such customer, return an error response
        // with status code 404 (Not Found)
        if (customerToDelete == null)
        {
            return NotFound();
        }

        // Remove the customer from the repository
        // The DeleteCustomer method returns true if the customer
        // was successfully deleted
        if (repository.DeleteCustomer(id))
        {
            // Return a response message with status code 204 (No Content)
            // to indicate that the operation was successful
            return StatusCode(HttpStatusCode.NoContent);
        }
        else
        {
            // Otherwise return a 400 (Bad Request) error response
            return BadRequest(Strings.CustomerNotDeleted);
        }
    }
    catch
    {
        // If an uncaught exception occurs, return an error response
        // with status code 500 (Internal Server Error)
        return InternalServerError();
    }
}

TIP
Do not include information that could be useful to an attacker attempting to penetrate your API.

Many web servers trap error conditions themselves before they reach the web API. For example, if you configure
authentication for a web site and the user fails to provide the correct authentication information, the web server
should respond with status code 401 (Unauthorized). Once a client has been authenticated, your code can perform its own checks to verify that the client should be able to access the requested resource. If this authorization fails, you should return status code 403 (Forbidden).
Handle exceptions consistently and log information about errors
To handle exceptions in a consistent manner, consider implementing a global error handling strategy across the
entire web API. You should also incorporate error logging that captures the full details of each exception; this error log can contain detailed information as long as it is not made accessible to clients over the web.
Distinguish between client-side errors and server-side errors
The HTTP protocol distinguishes between errors that occur due to the client application (the HTTP 4xx status codes),
and errors that are caused by a mishap on the server (the HTTP 5xx status codes). Make sure that you respect this
convention in any error response messages.

Optimizing client-side data access


In a distributed environment involving a web server and client applications, one of the primary sources of concern is the network. It can act as a considerable bottleneck, especially when a client application frequently sends requests or receives data. Therefore you should aim to minimize the amount of traffic that flows across the network. Consider the following points when you implement the code to retrieve and maintain data:
Support client-side caching
The HTTP 1.1 protocol supports caching in clients and intermediate servers through which a request is routed by
the use of the Cache-Control header. When a client application sends an HTTP GET request to the web API, the
response can include a Cache-Control header that indicates whether the data in the body of the response can be
safely cached by the client or an intermediate server through which the request has been routed, and for how long
before it should expire and be considered out-of-date. The following example shows an HTTP GET request and the
corresponding response that includes a Cache-Control header:

GET https://adventure-works.com/orders/2 HTTP/1.1

HTTP/1.1 200 OK
...
Cache-Control: max-age=600, private
Content-Type: text/json; charset=utf-8
Content-Length: ...
{"orderID":2,"productID":4,"quantity":2,"orderValue":10.00}

In this example, the Cache-Control header specifies that the data returned expires after 600 seconds, is suitable only for a single client, and must not be stored in a shared cache used by other clients (it is private). The Cache-Control header could specify public instead, in which case the data can be stored in a shared cache, or it could specify no-store, in which case the data must not be cached by the client. The following code
example shows how to construct a Cache-Control header in a response message:
public class OrdersController : ApiController
{
    ...
    [Route("api/orders/{id:int:min(0)}")]
    [HttpGet]
    public IHttpActionResult FindOrderByID(int id)
    {
        // Find the matching order
        Order order = ...;
        ...
        // Create a Cache-Control header for the response
        var cacheControlHeader = new CacheControlHeaderValue();
        cacheControlHeader.Private = true;
        cacheControlHeader.MaxAge = new TimeSpan(0, 10, 0);
        ...

        // Return a response message containing the order and the cache control header
        OkResultWithCaching<Order> response = new OkResultWithCaching<Order>(order, this)
        {
            CacheControlHeader = cacheControlHeader
        };
        return response;
    }
    ...
}

This code uses a custom IHttpActionResult class named OkResultWithCaching . This class enables the controller to
set the cache header contents:

public class OkResultWithCaching<T> : OkNegotiatedContentResult<T>
{
    public OkResultWithCaching(T content, ApiController controller)
        : base(content, controller) { }

    public OkResultWithCaching(T content, IContentNegotiator contentNegotiator, HttpRequestMessage request,
        IEnumerable<MediaTypeFormatter> formatters)
        : base(content, contentNegotiator, request, formatters) { }

    public CacheControlHeaderValue CacheControlHeader { get; set; }

    public EntityTagHeaderValue ETag { get; set; }

    public override async Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
        HttpResponseMessage response;
        try
        {
            response = await base.ExecuteAsync(cancellationToken);
            response.Headers.CacheControl = this.CacheControlHeader;
            response.Headers.ETag = ETag;
        }
        catch (OperationCanceledException)
        {
            response = new HttpResponseMessage(HttpStatusCode.Conflict) { ReasonPhrase = "Operation was cancelled" };
        }
        return response;
    }
}
NOTE
The HTTP protocol also defines the no-cache directive for the Cache-Control header. Rather confusingly, this directive does
not mean "do not cache" but rather "revalidate the cached information with the server before returning it"; the data can still
be cached, but it is checked each time it is used to ensure that it is still current.

Cache management is the responsibility of the client application or intermediate server, but if properly
implemented it can save bandwidth and improve performance by removing the need to fetch data that has already
been recently retrieved.
The max-age value in the Cache-Control header is only a guide and not a guarantee that the corresponding data
won't change during the specified time. The web API should set the max-age to a suitable value depending on the
expected volatility of the data. When this period expires, the client should discard the object from the cache.
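The client-side expiry check can be sketched as follows. This is illustrative Python (the guide's samples are C#); `ClientCache` is a hypothetical class, and the `now` parameter exists only to make the expiry logic easy to exercise:

```python
import time

# Illustrative client-side cache keyed by URI, honoring the max-age directive:
# an entry is discarded once it is older than max-age seconds.
class ClientCache:
    def __init__(self):
        self._entries = {}  # uri -> (stored_at, max_age_seconds, data)

    def store(self, uri, data, max_age, now=None):
        stored_at = now if now is not None else time.time()
        self._entries[uri] = (stored_at, max_age, data)

    def get(self, uri, now=None):
        entry = self._entries.get(uri)
        if entry is None:
            return None
        stored_at, max_age, data = entry
        current = now if now is not None else time.time()
        if current - stored_at > max_age:
            del self._entries[uri]  # expired: discard; the caller should refetch
            return None
        return data
```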

NOTE
Most modern web browsers support client-side caching by adding the appropriate cache-control headers to requests and
examining the headers of the results, as described. However, some older browsers will not cache the values returned from a
URL that includes a query string. This is not usually an issue for custom client applications which implement their own cache
management strategy based on the protocol discussed here.
Some older proxies exhibit the same behavior and might not cache requests based on URLs with query strings. This could be
an issue for custom client applications that connect to a web server through such a proxy.

Provide ETags to optimize query processing


When a client application retrieves an object, the response message can also include an ETag (Entity Tag). An ETag is
an opaque string that indicates the version of a resource; each time a resource changes the ETag is also modified.
This ETag should be cached as part of the data by the client application. The following code example shows how to
add an ETag as part of the response to an HTTP GET request. This code uses the GetHashCode method of an object to generate a numeric value that identifies the object (you can override this method if necessary and generate your own hash using an algorithm such as MD5):

public class OrdersController : ApiController
{
    ...
    public IHttpActionResult FindOrderByID(int id)
    {
        // Find the matching order
        Order order = ...;
        ...

        var hashedOrder = order.GetHashCode();
        string hashedOrderEtag = $"\"{hashedOrder}\"";
        var eTag = new EntityTagHeaderValue(hashedOrderEtag);

        // Return a response message containing the order and the cache control header
        OkResultWithCaching<Order> response = new OkResultWithCaching<Order>(order, this)
        {
            ...,
            ETag = eTag
        };
        return response;
    }
    ...
}

The response message posted by the web API looks like this:
HTTP/1.1 200 OK
...
Cache-Control: max-age=600, private
Content-Type: text/json; charset=utf-8
ETag: "2147483648"
Content-Length: ...
{"orderID":2,"productID":4,"quantity":2,"orderValue":10.00}

TIP
For security reasons, do not allow sensitive data or data returned over an authenticated (HTTPS) connection to be cached.

A client application can issue a subsequent GET request to retrieve the same resource at any time, and if the
resource has changed (it has a different ETag) the cached version should be discarded and the new version added
to the cache. If a resource is large and requires a significant amount of bandwidth to transmit back to the client,
repeated requests to fetch the same data can become inefficient. To combat this, the HTTP protocol defines the
following process for optimizing GET requests that you should support in a web API:
The client constructs a GET request containing the ETag for the currently cached version of the resource
referenced in an If-None-Match HTTP header:

GET https://adventure-works.com/orders/2 HTTP/1.1


If-None-Match: "2147483648"

The GET operation in the web API obtains the current ETag for the requested data (order 2 in the above
example), and compares it to the value in the If-None-Match header.
If the current ETag for the requested data matches the ETag provided by the request, the resource has not
changed and the web API should return an HTTP response with an empty message body and a status code
of 304 (Not Modified).
If the current ETag for the requested data does not match the ETag provided by the request, then the data has
changed and the web API should return an HTTP response with the new data in the message body and a
status code of 200 (OK).
If the requested data no longer exists then the web API should return an HTTP response with the status code
of 404 (Not Found).
The client uses the status code to maintain the cache. If the data has not changed (status code 304) then the
object can remain cached and the client application should continue to use this version of the object. If the
data has changed (status code 200) then the cached object should be discarded and the new one inserted. If
the data is no longer available (status code 404) then the object should be removed from the cache.

NOTE
If the response header contains the Cache-Control header no-store then the object should always be removed from the
cache regardless of the HTTP status code.
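The client-side half of this conditional GET flow can be sketched as follows. This is illustrative Python; `fetch` stands in for whatever HTTP transport the client uses, and the cache maps a URI to its cached ETag and body:

```python
# Conditional GET from the client's perspective: send the cached ETag in
# If-None-Match, then act on the status code as described in the steps above.
def conditional_get(fetch, cache, uri):
    headers = {}
    if uri in cache:
        headers["If-None-Match"] = cache[uri][0]  # cached ETag
    status, etag, body = fetch(uri, headers)
    if status == 304:              # Not Modified: keep using the cached copy
        return cache[uri][1]
    if status == 200:              # Changed: replace the cached entry
        cache[uri] = (etag, body)
        return body
    if status == 404:              # Gone: purge the cache entry
        cache.pop(uri, None)
        return None
```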

The code below shows the FindOrderByID method extended to support the If-None-Match header. Notice that if the
If-None-Match header is omitted, the specified order is always retrieved:
public class OrdersController : ApiController
{
    [Route("api/orders/{id:int:min(0)}")]
    [HttpGet]
    public IHttpActionResult FindOrderByID(int id)
    {
        try
        {
            // Find the matching order
            Order order = ...;

            // If there is no such order then return NotFound
            if (order == null)
            {
                return NotFound();
            }

            // Generate the ETag for the order
            var hashedOrder = order.GetHashCode();
            string hashedOrderEtag = $"\"{hashedOrder}\"";

            // Create the Cache-Control and ETag headers for the response
            IHttpActionResult response;
            var cacheControlHeader = new CacheControlHeaderValue();
            cacheControlHeader.Public = true;
            cacheControlHeader.MaxAge = new TimeSpan(0, 10, 0);
            var eTag = new EntityTagHeaderValue(hashedOrderEtag);

            // Retrieve the If-None-Match header from the request (if it exists)
            var nonMatchEtags = Request.Headers.IfNoneMatch;

            // If there is an ETag in the If-None-Match header and
            // this ETag matches that of the order just retrieved,
            // then create a Not Modified response message
            if (nonMatchEtags.Count > 0 &&
                String.CompareOrdinal(nonMatchEtags.First().Tag, hashedOrderEtag) == 0)
            {
                response = new EmptyResultWithCaching()
                {
                    StatusCode = HttpStatusCode.NotModified,
                    CacheControlHeader = cacheControlHeader,
                    ETag = eTag
                };
            }
            // Otherwise create a response message that contains the order details
            else
            {
                response = new OkResultWithCaching<Order>(order, this)
                {
                    CacheControlHeader = cacheControlHeader,
                    ETag = eTag
                };
            }

            return response;
        }
        catch
        {
            return InternalServerError();
        }
    }
    ...
}

This example incorporates an additional custom IHttpActionResult class named EmptyResultWithCaching . This class
simply acts as a wrapper around an HttpResponseMessage object that does not contain a response body:
public class EmptyResultWithCaching : IHttpActionResult
{
    public CacheControlHeaderValue CacheControlHeader { get; set; }
    public EntityTagHeaderValue ETag { get; set; }
    public HttpStatusCode StatusCode { get; set; }
    public Uri Location { get; set; }

    public async Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
    {
        HttpResponseMessage response = new HttpResponseMessage(StatusCode);
        response.Headers.CacheControl = this.CacheControlHeader;
        response.Headers.ETag = this.ETag;
        response.Headers.Location = this.Location;
        return response;
    }
}

TIP
In this example, the ETag for the data is generated by hashing the data retrieved from the underlying data source. If the ETag
can be computed in some other way, then the process can be optimized further and the data only needs to be fetched from
the data source if it has changed. This approach is especially useful if the data is large or accessing the data source can result
in significant latency (for example, if the data source is a remote database).

Use ETags to Support Optimistic Concurrency


To enable updates over previously cached data, the HTTP protocol supports an optimistic concurrency strategy. If,
after fetching and caching a resource, the client application subsequently sends a PUT or DELETE request to change or remove the resource, it should include an If-Match header that references the ETag. The web API can then use this information to determine whether the resource has already been changed by another user since it was retrieved and send an appropriate response back to the client application as follows:
The client constructs a PUT request containing the new details for the resource and the ETag for the currently
cached version of the resource referenced in an If-Match HTTP header. The following example shows a PUT
request that updates an order:

PUT https://adventure-works.com/orders/1 HTTP/1.1


If-Match: "2282343857"
Content-Type: application/x-www-form-urlencoded
Content-Length: ...
productID=3&quantity=5&orderValue=250

The PUT operation in the web API obtains the current ETag for the requested data (order 1 in the above
example), and compares it to the value in the If-Match header.
If the current ETag for the requested data matches the ETag provided by the request, the resource has not
changed and the web API should perform the update, returning a message with HTTP status code 204 (No
Content) if it is successful. The response can include Cache-Control and ETag headers for the updated
version of the resource. The response should always include the Location header that references the URI of
the newly updated resource.
If the current ETag for the requested data does not match the ETag provided by the request, then the data has
been changed by another user since it was fetched and the web API should return an HTTP response with an
empty message body and a status code of 412 (Precondition Failed).
If the resource to be updated no longer exists then the web API should return an HTTP response with the
status code of 404 (Not Found).
The client uses the status code and response headers to maintain the cache. If the data has been updated (status code 204) then the object can remain cached (as long as the Cache-Control header does not specify no-store), but the ETag should be updated. If the data was changed by another user (status code 412) or not found (status code 404) then the cached object should be discarded.
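The client-side half of this flow can be sketched as follows. This is illustrative Python; `get` and `put` stand in for the client's HTTP transport, and the retry limit is an assumption:

```python
# Optimistic concurrency from the client's perspective: send the cached ETag
# in If-Match, and on 412 (Precondition Failed) refetch and retry the update.
def update_with_retry(get, put, uri, mutate, max_attempts=3):
    for _ in range(max_attempts):
        etag, resource = get(uri)                      # fetch current version
        status = put(uri, mutate(resource), if_match=etag)
        if status != 412:
            return status   # 204 on success, 404 if the resource is gone
    return 412              # the conflict persisted across all attempts
```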
The next code example shows an implementation of the PUT operation for the Orders controller:

public class OrdersController : ApiController
{
    [HttpPut]
    [Route("api/orders/{id:int}")]
    public IHttpActionResult UpdateExistingOrder(int id, DTOOrder order)
    {
        try
        {
            var baseUri = Constants.GetUriFromConfig();
            var orderToUpdate = this.ordersRepository.GetOrder(id);
            if (orderToUpdate == null)
            {
                return NotFound();
            }

            var hashedOrder = orderToUpdate.GetHashCode();
            string hashedOrderEtag = $"\"{hashedOrder}\"";

            // Retrieve the If-Match header from the request (if it exists)
            var matchEtags = Request.Headers.IfMatch;

            // If there is an ETag in the If-Match header and
            // this ETag matches that of the order just retrieved,
            // or if there is no ETag, then update the Order
            if ((matchEtags.Count > 0 &&
                 String.CompareOrdinal(matchEtags.First().Tag, hashedOrderEtag) == 0) ||
                matchEtags.Count == 0)
            {
                // Modify the order
                orderToUpdate.OrderValue = order.OrderValue;
                orderToUpdate.ProductID = order.ProductID;
                orderToUpdate.Quantity = order.Quantity;

                // Save the order back to the data store
                // ...

                // Create the No Content response with Cache-Control, ETag, and Location headers
                var cacheControlHeader = new CacheControlHeaderValue();
                cacheControlHeader.Private = true;
                cacheControlHeader.MaxAge = new TimeSpan(0, 10, 0);

                hashedOrder = order.GetHashCode();
                hashedOrderEtag = $"\"{hashedOrder}\"";
                var eTag = new EntityTagHeaderValue(hashedOrderEtag);

                var location = new Uri($"{baseUri}/{Constants.ORDERS}/{id}");

                var response = new EmptyResultWithCaching()
                {
                    StatusCode = HttpStatusCode.NoContent,
                    CacheControlHeader = cacheControlHeader,
                    ETag = eTag,
                    Location = location
                };

                return response;
            }

            // Otherwise return a Precondition Failed response
            return StatusCode(HttpStatusCode.PreconditionFailed);
        }
        catch
        {
            return InternalServerError();
        }
    }
    ...
}

TIP
Use of the If-Match header is entirely optional, and if it is omitted the web API will always attempt to update the specified
order, possibly blindly overwriting an update made by another user. To avoid problems due to lost updates, always provide an
If-Match header.

Handling large requests and responses


There may be occasions when a client application needs to issue requests that send or receive data that may be
several megabytes (or bigger) in size. Waiting while this amount of data is transmitted could cause the client
application to become unresponsive. Consider the following points when you need to handle requests that include
significant amounts of data:
Optimize requests and responses that involve large objects
Some resources may be large objects or include large fields, such as graphics images or other types of binary data.
A web API should support streaming to enable optimized uploading and downloading of these resources.
The HTTP protocol provides the chunked transfer encoding mechanism to stream large data objects back to a client.
When the client sends an HTTP GET request for a large object, the web API can send the reply back in piecemeal
chunks over an HTTP connection. The length of the data in the reply may not be known initially (it might be
generated), so the server hosting the web API should send a response message with each chunk that specifies the
Transfer-Encoding: Chunked header rather than a Content-Length header. The client application can receive each
chunk in turn to build up the complete response. The data transfer completes when the server sends back a final
chunk with zero size.
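For example, a chunked exchange might look like the following (the URI and payload are illustrative; each chunk is preceded by its size in hexadecimal, and a zero-size chunk terminates the response):

```http
GET /api/orders/2 HTTP/1.1
Host: adventure-works.com

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

1e
{"orderId":2,"productId":4,"qu
15
antity":2,"price":16}
0

```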
A single request could conceivably result in a massive object that consumes considerable resources. If during the
streaming process the web API determines that the amount of data in a request has exceeded some acceptable
bounds, it can abort the operation and return a response message with status code 413 (Request Entity Too Large).
You can minimize the size of large objects transmitted over the network by using HTTP compression. This approach
helps to reduce the amount of network traffic and the associated network latency, but at the cost of requiring
additional processing at the client and the server hosting the web API. For example, a client application that expects
to receive compressed data can include an Accept-Encoding: gzip request header (other data compression
algorithms can also be specified). If the server supports compression it should respond with the content held in
gzip format in the message body and the Content-Encoding: gzip response header.
You can combine encoded compression with streaming; compress the data first before streaming it, and specify the
gzip content encoding and chunked transfer encoding in the message headers. Also note that some web servers
(such as Internet Information Server) can be configured to automatically compress HTTP responses regardless of
whether the web API compresses the data or not.
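The combined negotiation described above might look like the following on the wire (URI and body are illustrative):

```http
GET /api/orders/2 HTTP/1.1
Host: adventure-works.com
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: gzip
Transfer-Encoding: chunked

<gzip-compressed body, streamed in chunks>
```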
Implement partial responses for clients that do not support asynchronous operations
As an alternative to asynchronous streaming, a client application can explicitly request data for large objects in
chunks, known as partial responses. The client application sends an HTTP HEAD request to obtain information
about the object. If the web API supports partial responses, it should respond to the HEAD request with a response
message that contains an Accept-Ranges header and a Content-Length header that indicates the total size of the
object, but the body of the message should be empty. The client application can use this information to construct a
series of GET requests that specify a range of bytes to receive. The web API should return a response message with
HTTP status 206 (Partial Content), a Content-Length header that specifies the actual amount of data included in the
body of the response message, and a Content-Range header that indicates which part (such as bytes 4000 to 8000)
of the object this data represents.
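For example, a HEAD request followed by a ranged GET might look like the following (the resource URI and sizes are illustrative):

```http
HEAD /resources/images/mountains.jpg HTTP/1.1
Host: adventure-works.com

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 4580000

GET /resources/images/mountains.jpg HTTP/1.1
Host: adventure-works.com
Range: bytes=4000-8000

HTTP/1.1 206 Partial Content
Content-Type: image/jpeg
Content-Length: 4001
Content-Range: bytes 4000-8000/4580000
```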
HTTP HEAD requests and partial responses are described in more detail in API design.
Avoid sending unnecessary 100-Continue status messages in client applications
A client application that is about to send a large amount of data to a server may determine first whether the server
is actually willing to accept the request. Prior to sending the data, the client application can submit an HTTP request
with an Expect: 100-Continue header and a Content-Length header that indicates the size of the data, but with an empty
message body. If the server is willing to handle the request, it should respond with a message that specifies the
HTTP status 100 (Continue). The client application can then proceed and send the complete request including the
data in the message body.
If you are hosting a service by using IIS, the HTTP.sys driver automatically detects and handles Expect: 100-Continue
headers before passing requests to your web application. This means that you are unlikely to see these headers in
your application code, and you can assume that IIS has already filtered any messages that it deems to be unfit or
too large.
If you are building client applications by using the .NET Framework, then all POST and PUT messages will first send
messages with Expect: 100-Continue headers by default. As with the server-side, the process is handled
transparently by the .NET Framework. However, this process results in each POST and PUT request causing two
round-trips to the server, even for small requests. If your application is not sending requests with large amounts of
data, you can disable this feature by using the ServicePointManager class to create ServicePoint objects in the
client application. A ServicePoint object handles the connections that the client makes to a server based on the
scheme and host fragments of URIs that identify resources on the server. You can then set the Expect100Continue
property of the ServicePoint object to false. All subsequent POST and PUT requests made by the client through a
URI that matches the scheme and host fragments of the ServicePoint object will be sent without Expect: 100-
Continue headers. The following code shows how to configure a ServicePoint object that configures all requests
sent to URIs with a scheme of http and a host of www.contoso.com .

Uri uri = new Uri("https://www.contoso.com/");
ServicePoint sp = ServicePointManager.FindServicePoint(uri);
sp.Expect100Continue = false;

You can also set the static Expect100Continue property of the ServicePointManager class to specify the default value
of this property for all subsequently created ServicePoint objects.
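For example, placing the following line early in application startup disables the behavior for every ServicePoint created afterwards:

```csharp
// Disable Expect: 100-Continue for all subsequently created ServicePoint objects
ServicePointManager.Expect100Continue = false;
```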
Support pagination for requests that may return large numbers of objects
If a collection contains a large number of resources, issuing a GET request to the corresponding URI could result in
significant processing on the server hosting the web API, affecting performance, and generate a significant amount
of network traffic, resulting in increased latency.
To handle these cases, the web API should support query strings that enable the client application to refine requests
or fetch data in more manageable, discrete blocks (or pages). The code below shows the GetAllOrders method in
the Orders controller. This method retrieves the details of orders. If this method were unconstrained, it could
conceivably return a large amount of data. The limit and offset parameters are intended to reduce the volume
of data to a smaller subset, in this case only the first 10 orders by default:
public class OrdersController : ApiController
{
    ...
    [Route("api/orders")]
    [HttpGet]
    public IEnumerable<Order> GetAllOrders(int limit=10, int offset=0)
    {
        // Find the number of orders specified by the limit parameter
        // starting with the order specified by the offset parameter
        var orders = ...
        return orders;
    }
    ...
}

A client application can issue a request to retrieve 30 orders starting at offset 50 by using the URI
https://www.adventure-works.com/api/orders?limit=30&offset=50 .
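The elided query in GetAllOrders can be sketched with LINQ's Skip and Take operators, which most LINQ providers translate into an OFFSET/FETCH query against the data store. The ordersRepository name below is hypothetical:

```csharp
// ordersRepository is a hypothetical data access component.
// Always apply a stable ordering before paging, or page boundaries are undefined.
var orders = ordersRepository.GetAll()
                             .OrderBy(o => o.Id)
                             .Skip(offset)
                             .Take(limit)
                             .ToList();
return orders;
```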

TIP
Avoid enabling client applications to specify query strings that result in a URI that is more than 2000 characters long. Many
web clients and servers cannot handle URIs that are this long.

Maintaining responsiveness, scalability, and availability


The same web API might be used by many client applications running anywhere in the world. It is important to
ensure that the web API is implemented to maintain responsiveness under a heavy load, to be scalable to support a
highly varying workload, and to guarantee availability for clients that perform business-critical operations.
Consider the following points when determining how to meet these requirements:
Provide asynchronous support for long-running requests
A request that might take a long time to process should be performed without blocking the client that submitted
the request. The web API can perform some initial checking to validate the request, initiate a separate task to
perform the work, and then return a response message with HTTP code 202 (Accepted). The task could run
asynchronously as part of the web API processing, or it could be offloaded to a background task.
The web API should also provide a mechanism to return the results of the processing to the client application. You
can achieve this by providing a polling mechanism for client applications to periodically query whether the
processing has finished and obtain the result, or enabling the web API to send a notification when the operation
has completed.
You can implement a simple polling mechanism by providing a polling URI that acts as a virtual resource using the
following approach:
1. The client application sends the initial request to the web API.
2. The web API stores information about the request in a table held in table storage or Microsoft Azure Cache, and
generates a unique key for this entry, possibly in the form of a GUID.
3. The web API initiates the processing as a separate task. The web API records the state of the task in the table as
Running.
4. The web API returns a response message with HTTP status code 202 (Accepted), and the GUID of the table entry
in the body of the message.
5. When the task has completed, the web API stores the results in the table, and sets the state of the task to
Complete. Note that if the task fails, the web API could also store information about the failure and set the status
to Failed.
6. While the task is running, the client can continue performing its own processing. It can periodically send a
request to the URI /polling/{guid} where {guid} is the GUID returned in the 202 response message by the web
API.
7. The web API at the /polling/{guid} URI queries the state of the corresponding task in the table and returns a
response message with HTTP status code 200 (OK) containing this state (Running, Complete, or Failed). If the
task has completed or failed, the response message can also include the results of the processing or any
information available about the reason for the failure.
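A minimal sketch of the polling endpoint in step 7 might look like the following; the TaskStatusStore type is hypothetical and stands in for the table storage or cache described above:

```csharp
public class PollingController : ApiController
{
    [Route("polling/{guid}")]
    [HttpGet]
    public IHttpActionResult GetStatus(Guid guid)
    {
        // TaskStatusStore is a hypothetical wrapper around table storage or a cache
        var entry = TaskStatusStore.Find(guid);
        if (entry == null)
        {
            return NotFound();
        }

        // Returns the state (Running, Complete, or Failed) together with any
        // results or failure information recorded by the background task
        return Ok(new { Status = entry.Status, Result = entry.Result });
    }
}
```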
Options for implementing notifications include:
Using a notification hub to push asynchronous responses to client applications. For more information, see Send
notifications to specific users by using Azure Notification Hubs.
Using the Comet model to retain a persistent network connection between the client and the server hosting the
web API, and using this connection to push messages from the server back to the client. The MSDN magazine
article Building a Simple Comet Application in the Microsoft .NET Framework describes an example solution.
Using SignalR to push data in real time from the web server to the client over a persistent network connection.
SignalR is available for ASP.NET web applications as a NuGet package. You can find more information on the
ASP.NET SignalR website.
Ensure that each request is stateless
Each request should be considered atomic. There should be no dependencies between one request made by a client
application and any subsequent requests submitted by the same client. This approach assists in scalability;
instances of the web service can be deployed on a number of servers. Client requests can be directed at any of
these instances and the results should always be the same. It also improves availability for a similar reason; if a web
server fails, requests can be routed to another instance (by using Azure Traffic Manager) while the server is
restarted with no ill effects on client applications.
Track clients and implement throttling to reduce the chances of DOS attacks
If a specific client makes a large number of requests within a given period of time it might monopolize the service
and affect the performance of other clients. To mitigate this issue, a web API can monitor calls from client
applications either by tracking the IP address of all incoming requests or by logging each authenticated access. You
can use this information to limit resource access. If a client exceeds a defined limit, the web API can return a
response message with status 503 (Service Unavailable) and include a Retry-After header that specifies when the
client can send the next request without it being declined. This strategy can help to reduce the chances of a Denial
Of Service (DOS) attack from a set of clients stalling the system.
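A throttling check along these lines could run at the start of each operation; the RequestCounter type and the limit are hypothetical:

```csharp
// RequestCounter is a hypothetical per-client counter, typically backed by a
// shared cache so that all instances of the web API see the same totals
if (RequestCounter.IncrementAndGet(clientId) > requestsPerMinuteLimit)
{
    var response = Request.CreateResponse(HttpStatusCode.ServiceUnavailable);
    response.Headers.Add("Retry-After", "60"); // seconds before the client should retry
    return ResponseMessage(response);
}
```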
Manage persistent HTTP connections carefully
The HTTP protocol supports persistent HTTP connections where they are available. The HTTP 1.0 specification
added the Connection:Keep-Alive header that enables a client application to indicate to the server that it can use the
same connection to send subsequent requests rather than opening new ones. The connection closes automatically
if the client does not reuse the connection within a period defined by the host. This behavior is the default in HTTP
1.1 as used by Azure services, so there is no need to include Keep-Alive headers in messages.
Keeping a connection open can help to improve responsiveness by reducing latency and network congestion, but it
can be detrimental to scalability by keeping unnecessary connections open for longer than required, limiting the
ability of other concurrent clients to connect. It can also affect battery life if the client application is running on a
mobile device; if the application only makes occasional requests to the server, maintaining an open connection can
cause the battery to drain more quickly. To ensure that a connection is not made persistent with HTTP 1.1, the client
can include a Connection:Close header with messages to override the default behavior. Similarly, if a server is
handling a very large number of clients it can include a Connection:Close header in response messages which
should close the connection and save server resources.
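For example, a client that wants a one-shot exchange over HTTP 1.1 can send:

```http
GET /api/orders/2 HTTP/1.1
Host: adventure-works.com
Connection: Close
```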
NOTE
Persistent HTTP connections are a purely optional feature to reduce the network overhead associated with repeatedly
establishing a communications channel. Neither the web API nor the client application should depend on a persistent HTTP
connection being available. Do not use persistent HTTP connections to implement Comet-style notification systems; instead
you should use sockets (or web sockets if available) at the TCP layer. Finally, note Keep-Alive headers are of limited use if a
client application communicates with a server via a proxy; only the connection with the client and the proxy will be persistent.

Publishing and managing a web API


To make a web API available for client applications, the web API must be deployed to a host environment. This
environment is typically a web server, although it may be some other type of host process. You should consider the
following points when publishing a web API:
All requests must be authenticated and authorized, and the appropriate level of access control must be enforced.
A commercial web API might be subject to various quality guarantees concerning response times. It is important
to ensure that the host environment is scalable if the load can vary significantly over time.
It may be necessary to meter requests for monetization purposes.
It might be necessary to regulate the flow of traffic to the web API, and implement throttling for specific clients
that have exhausted their quotas.
Regulatory requirements might mandate logging and auditing of all requests and responses.
To ensure availability, it may be necessary to monitor the health of the server hosting the web API and restart it
if necessary.
It is useful to be able to decouple these issues from the technical issues concerning the implementation of the web
API. For this reason, consider creating a façade, running as a separate process and that routes requests to the web
API. The façade can provide the management operations and forward validated requests to the web API. Using a
façade can also bring many functional advantages, including:
Acting as an integration point for multiple web APIs.
Transforming messages and translating communications protocols for clients built by using varying
technologies.
Caching requests and responses to reduce load on the server hosting the web API.

Testing a web API


A web API should be tested as thoroughly as any other piece of software. You should consider creating unit tests to
validate the functionality.
The nature of a web API brings its own additional requirements to verify that it operates correctly. You should pay
particular attention to the following aspects:
Test all routes to verify that they invoke the correct operations. Be especially aware of HTTP status code 405
(Method Not Allowed) being returned unexpectedly as this can indicate a mismatch between a route and the
HTTP methods (GET, POST, PUT, DELETE) that can be dispatched to that route.
Send HTTP requests that use methods a route does not support, such as submitting a POST request to a specific
resource (POST requests should only be sent to resource collections). In these cases, the only valid response
should be status code 405 (Method Not Allowed).
Verify that all routes are protected properly and are subject to the appropriate authentication and
authorization checks.
NOTE
Some aspects of security such as user authentication are most likely to be the responsibility of the host environment
rather than the web API, but it is still necessary to include security tests as part of the deployment process.

Test the exception handling performed by each operation and verify that an appropriate and meaningful
HTTP response is passed back to the client application.
Verify that request and response messages are well-formed. For example, if an HTTP POST request contains
the data for a new resource in x-www-form-urlencoded format, confirm that the corresponding operation
correctly parses the data, creates the resources, and returns a response containing the details of the new
resource, including the correct Location header.
Verify all links and URIs in response messages. For example, an HTTP POST message should return the URI
of the newly created resource. All HATEOAS links should be valid.
Ensure that each operation returns the correct status codes for different combinations of input. For example:
If a query is successful, it should return status code 200 (OK)
If a resource is not found, the operation should return HTTP status code 404 (Not Found).
If the client sends a request that successfully deletes a resource, the status code should be 204 (No
Content).
If the client sends a request that creates a new resource, the status code should be 201 (Created).
Watch out for unexpected response status codes in the 5xx range. These messages are usually reported by the host
server to indicate that it was unable to fulfill a valid request.
Test the different request header combinations that a client application can specify and ensure that the web
API returns the expected information in response messages.
Test query strings. If an operation can take optional parameters (such as pagination requests), test the
different combinations and order of parameters.
Verify that asynchronous operations complete successfully. If the web API supports streaming for requests
that return large binary objects (such as video or audio), ensure that client requests are not blocked while
the data is streamed. If the web API implements polling for long-running data modification operations, verify
that the operations report their status correctly as they proceed.
You should also create and run performance tests to check that the web API operates satisfactorily under duress.
You can build a web performance and load test project by using Visual Studio Ultimate. For more information, see
Run performance tests on an application before a release.
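As an illustration, a route test that exercises the 404 case with HttpClient against a test deployment might look like the following; the base address and order identifier are illustrative:

```csharp
[TestMethod]
public async Task Get_MissingOrder_Returns404()
{
    using (var client = new HttpClient { BaseAddress = new Uri("http://localhost:9000/") })
    {
        // An identifier known not to exist in the test data set
        var response = await client.GetAsync("api/orders/99999");
        Assert.AreEqual(HttpStatusCode.NotFound, response.StatusCode);
    }
}
```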

Using Azure API Management


On Azure, consider using Azure API Management to publish and manage a web API. Using this facility, you can
generate a service that acts as a façade for one or more web APIs. The service is itself a scalable web service that
you can create and configure by using the Azure portal. You can use this service to publish and manage a web API
as follows:
1. Deploy the web API to a website, Azure cloud service, or Azure virtual machine.
2. Connect the API management service to the web API. Requests sent to the URL of the management API are
mapped to URIs in the web API. The same API management service can route requests to more than one
web API. This enables you to aggregate multiple web APIs into a single management service. Similarly, the
same web API can be referenced from more than one API management service if you need to restrict or
partition the functionality available to different applications.
NOTE
The URIs in HATEOAS links generated as part of the response for HTTP GET requests should reference the URL of the
API management service and not the web server hosting the web API.

3. For each web API, specify the HTTP operations that the web API exposes together with any optional
parameters that an operation can take as input. You can also configure whether the API management service
should cache the response received from the web API to optimize repeated requests for the same data.
Record the details of the HTTP responses that each operation can generate. This information is used to
generate documentation for developers, so it is important that it is accurate and complete.
You can either define operations manually using the wizards provided by the Azure portal, or you can
import them from a file containing the definitions in WADL or Swagger format.
4. Configure the security settings for communications between the API management service and the web
server hosting the web API. The API management service currently supports Basic authentication and
mutual authentication using certificates, and OAuth 2.0 user authorization.
5. Create a product. A product is the unit of publication; you add the web APIs that you previously connected to
the management service to the product. When the product is published, the web APIs become available to
developers.

NOTE
Prior to publishing a product, you can also define user-groups that can access the product and add users to these
groups. This gives you control over the developers and applications that can use the web API. If a web API is subject
to approval, a developer must send a request to the product administrator before being able to access it. The
administrator can grant or deny the request. Existing developers can also be blocked if circumstances change.

6. Configure policies for each web API. Policies govern aspects such as whether cross-domain calls should be
allowed, how to authenticate clients, whether to convert between XML and JSON data formats transparently,
whether to restrict calls from a given IP range, usage quotas, and whether to limit the call rate. Policies can
be applied globally across the entire product, for a single web API in a product, or for individual operations
in a web API.
For more information, see the API Management documentation.

TIP
Azure provides the Azure Traffic Manager which enables you to implement failover and load-balancing, and reduce latency
across multiple instances of a web site hosted in different geographic locations. You can use Azure Traffic Manager in
conjunction with the API Management Service; the API Management Service can route requests to instances of a web site
through Azure Traffic Manager. For more information, see Traffic Manager routing methods.
In this structure, if you are using custom DNS names for your web sites, you should configure the appropriate CNAME record
for each web site to point to the DNS name of the Azure Traffic Manager web site.

Supporting client-side developers


Developers constructing client applications typically require information on how to access the web API, and
documentation concerning the parameters, data types, return types, and return codes that describe the different
requests and responses between the web service and the client application.
Document the REST operations for a web API
The Azure API Management Service includes a developer portal that describes the REST operations exposed by a
web API. When a product has been published it appears on this portal. Developers can use this portal to sign up for
access; the administrator can then approve or deny the request. If the developer is approved, they are assigned a
subscription key that is used to authenticate calls from the client applications that they develop. This key must be
provided with each web API call otherwise it will be rejected.
This portal also provides:
Documentation for the product, listing the operations that it exposes, the parameters required, and the different
responses that can be returned. Note that this information is generated from the details provided in step 3 in
the list in the Using Azure API Management section.
Code snippets that show how to invoke operations from several languages, including JavaScript, C#, Java, Ruby,
Python, and PHP.
A developers' console that enables a developer to send an HTTP request to test each operation in the product
and view the results.
A page where the developer can report any issues or problems found.
The Azure portal enables you to customize the developer portal to change the styling and layout to match the
branding of your organization.
Implement a client SDK
Building a client application that invokes REST requests to access a web API requires writing a significant amount of
code to construct each request and format it appropriately, send the request to the server hosting the web service,
and parse the response to work out whether the request succeeded or failed and extract any data returned. To
insulate the client application from these concerns, you can provide an SDK that wraps the REST interface and
abstracts these low-level details inside a more functional set of methods. A client application uses these methods,
which transparently convert calls into REST requests and then convert the responses back into method return
values. This is a common technique that is implemented by many services, including the Azure SDK.
Creating a client-side SDK is a considerable undertaking as it has to be implemented consistently and tested
carefully. However, much of this process can be made mechanical, and many vendors supply tools that can
automate many of these tasks.
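As a sketch, a hand-written SDK method wrapping the orders API might look like the following; the OrdersClient type and the use of a null return for a missing order are illustrative design choices, not part of any particular SDK:

```csharp
public class OrdersClient
{
    private readonly HttpClient httpClient;

    public OrdersClient(Uri baseAddress)
    {
        httpClient = new HttpClient { BaseAddress = baseAddress };
    }

    // Wraps GET api/orders/{id}; callers work with Order objects
    // rather than URIs, status codes, and serialization details
    public async Task<Order> GetOrderAsync(int id)
    {
        var response = await httpClient.GetAsync($"api/orders/{id}");
        if (response.StatusCode == HttpStatusCode.NotFound)
        {
            return null;
        }
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsAsync<Order>();
    }
}
```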

Monitoring a web API


Depending on how you have published and deployed your web API you can monitor the web API directly, or you
can gather usage and health information by analyzing the traffic that passes through the API Management service.
Monitoring a web API directly
If you have implemented your web API by using the ASP.NET Web API template (either as a Web API project or as a
Web role in an Azure cloud service) and Visual Studio 2013, you can gather availability, performance, and usage
data by using ASP.NET Application Insights. Application Insights is a package that transparently tracks and records
information about requests and responses when the web API is deployed to the cloud; once the package is installed
and configured, you don't need to amend any code in your web API to use it. When you deploy the web API to an
Azure web site, all traffic is examined and the following statistics are gathered:
Server response time.
Number of server requests and the details of each request.
The top slowest requests in terms of average response time.
The details of any failed requests.
The number of sessions initiated by different browsers and user agents.
The most frequently viewed pages (primarily useful for web applications rather than web APIs).
The different user roles accessing the web API.
You can view this data in real time in the Azure portal. You can also create web tests that monitor the health of the
web API. A web test sends a periodic request to a specified URI in the web API and captures the response. You can
specify the definition of a successful response (such as HTTP status code 200), and if the request does not return
this response you can arrange for an alert to be sent to an administrator. If necessary, the administrator can restart
the server hosting the web API if it has failed.
For more information, see Application Insights - Get started with ASP.NET.
Monitoring a web API through the API Management Service
If you have published your web API by using the API Management service, the API Management page on the Azure
portal contains a dashboard that enables you to view the overall performance of the service. The Analytics page
enables you to drill down into the details of how the product is being used. This page contains the following tabs:
Usage. This tab provides information about the number of API calls made and the bandwidth used to handle
these calls over time. You can filter usage details by product, API, and operation.
Health. This tab enables you to view the outcome of API requests (the HTTP status codes returned), the
effectiveness of the caching policy, the API response time, and the service response time. Again, you can filter
health data by product, API, and operation.
Activity. This tab provides a text summary of the numbers of successful calls, failed calls, blocked calls, average
response time, and response times for each product, web API, and operation. This page also lists the number of
calls made by each developer.
At a glance. This tab displays a summary of the performance data, including the developers responsible for
making the most API calls, and the products, web APIs, and operations that received these calls.
You can use this information to determine whether a particular web API or operation is causing a bottleneck, and if
necessary scale the host environment and add more servers. You can also ascertain whether one or more
applications are using a disproportionate volume of resources and apply the appropriate policies to set quotas and
limit call rates.

NOTE
You can change the details for a published product, and the changes are applied immediately. For example, you can add or
remove an operation from a web API without requiring that you republish the product that contains the web API.

More information
ASP.NET Web API OData contains examples and further information on implementing an OData web API by
using ASP.NET.
Introducing batch support in Web API and Web API OData describes how to implement batch operations in a
web API by using OData.
Idempotency patterns on Jonathan Oliver's blog provides an overview of idempotency and how it relates to
data management operations.
Status code definitions on the W3C website contains a full list of HTTP status codes and their descriptions.
Run background tasks with WebJobs provides information and examples on using WebJobs to perform
background operations.
Azure Notification Hubs notify users shows how to use an Azure Notification Hub to push asynchronous
responses to client applications.
API Management describes how to publish a product that provides controlled and secure access to a web API.
Azure API Management REST API reference describes how to use the API Management REST API to build custom
management applications.
Traffic Manager routing methods summarizes how Azure Traffic Manager can be used to load-balance requests
across multiple instances of a website hosting a web API.
Application Insights - Get started with ASP.NET provides detailed information on installing and configuring
Application Insights in an ASP.NET Web API project.
Autoscaling
12/18/2020 • 15 minutes to read

Autoscaling is the process of dynamically allocating resources to match performance requirements. As the
volume of work grows, an application may need additional resources to maintain the desired performance levels
and satisfy service-level agreements (SLAs). As demand slackens and the additional resources are no longer
needed, they can be deallocated to minimize costs.
Autoscaling takes advantage of the elasticity of cloud-hosted environments while easing management overhead.
It reduces the need for an operator to continually monitor the performance of a system and make decisions about
adding or removing resources.
There are two main ways that an application can scale:
Vertical scaling, also called scaling up and down, means changing the capacity of a resource. For
example, you could move an application to a larger VM size. Vertical scaling often requires making the
system temporarily unavailable while it is being redeployed. Therefore, it's less common to automate
vertical scaling.
Horizontal scaling, also called scaling out and in, means adding or removing instances of a resource. The
application continues running without interruption as new resources are provisioned. When the
provisioning process is complete, the solution is deployed on these additional resources. If demand drops,
the additional resources can be shut down cleanly and deallocated.
Many cloud-based systems, including Microsoft Azure, support automatic horizontal scaling. The rest of this
article focuses on horizontal scaling.

NOTE
Autoscaling mostly applies to compute resources. While it's possible to horizontally scale a database or message queue, this
usually involves data partitioning, which is generally not automated.

Overview
An autoscaling strategy typically involves the following pieces:
Instrumentation and monitoring systems at the application, service, and infrastructure levels. These systems
capture key metrics, such as response times, queue lengths, CPU utilization, and memory usage.
Decision-making logic that evaluates these metrics against predefined thresholds or schedules, and decides
whether to scale.
Components that scale the system.
Testing, monitoring, and tuning of the autoscaling strategy to ensure that it functions as expected.
Azure provides built-in autoscaling mechanisms that address common scenarios. If a particular service or
technology does not have built-in autoscaling functionality, or if you have specific autoscaling requirements
beyond its capabilities, you might consider a custom implementation. A custom implementation would collect
operational and system metrics, analyze the metrics, and then scale resources accordingly.

Configure autoscaling for an Azure solution


Azure provides built-in autoscaling for most compute options.
Azure Virtual Machines autoscale via virtual machine scale sets, which manage a set of Azure virtual
machines as a group. See How to use automatic scaling and virtual machine scale sets.
Service Fabric also supports autoscaling through virtual machine scale sets. Every node type in a Service
Fabric cluster is set up as a separate virtual machine scale set. That way, each node type can be scaled in or
out independently. See Scale a Service Fabric cluster in or out using autoscale rules.
Azure App Service has built-in autoscaling. Autoscale settings apply to all of the apps within an App
Service plan. See Scale instance count manually or automatically.
Azure Cloud Services has built-in autoscaling at the role level. See How to configure auto scaling for a
Cloud Service in the portal.
These compute options all use Azure Monitor autoscale to provide a common set of autoscaling functionality.
Azure Functions differs from the previous compute options, because you don't need to configure any
autoscale rules. Instead, Azure Functions automatically allocates compute power when your code is running,
scaling out as necessary to handle load. For more information, see Choose the correct hosting plan for Azure
Functions.
Finally, a custom autoscaling solution can sometimes be useful. For example, you could use Azure diagnostics and
application-based metrics, along with custom code to monitor and export the application metrics. Then you could
define custom rules based on these metrics, and use Resource Manager REST APIs to trigger autoscaling.
However, a custom solution is not simple to implement, and should be considered only if none of the previous
approaches can fulfill your requirements.
Use the built-in autoscaling features of the platform, if they meet your requirements. If not, carefully consider
whether you really need more complex scaling features. Examples of additional requirements may include more
granularity of control, different ways to detect trigger events for scaling, scaling across subscriptions, and scaling
other types of resources.

Use Azure Monitor autoscale


Azure Monitor autoscale provides a common set of autoscaling functionality for virtual machine scale sets, Azure
App Service, and Azure Cloud Services. Scaling can be performed on a schedule, or based on a runtime metric,
such as CPU or memory usage.
Examples:
Scale out to 10 instances on weekdays, and scale in to 4 instances on Saturday and Sunday.
Scale out by one instance if average CPU usage is above 70%, and scale in by one instance if CPU usage falls
below 50%.
Scale out by one instance if the number of messages in a queue exceeds a certain threshold.
Scale out the resource when load increases to ensure availability. Similarly, at times of low usage, scale in, so
you can optimize cost. Always use a scale-out and scale-in rule combination. Otherwise, the autoscaling takes
place only in one direction until it reaches the threshold (maximum or minimum instance counts) set in the
profile.
Select a default instance count that's safe for your workload. The system scales based on that value if maximum
or minimum instance counts are not set.
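The paired scale-out/scale-in rule described above can be sketched as a simple threshold check. This is an illustrative model only, not the Azure Monitor implementation; the 70%/50% thresholds and the instance bounds are example values.

```python
# Illustrative sketch of a paired scale-out/scale-in rule with instance bounds.
# Thresholds and bounds are example values, not Azure defaults.

def desired_instances(current: int, avg_cpu: float,
                      minimum: int = 2, maximum: int = 10) -> int:
    """Return the target instance count for one evaluation of the rule pair."""
    if avg_cpu > 70 and current < maximum:
        return current + 1          # scale out by one instance
    if avg_cpu < 50 and current > minimum:
        return current - 1          # scale in by one instance
    return current                  # between thresholds: no change

print(desired_instances(3, 85.0))   # 4
print(desired_instances(3, 40.0))   # 2
print(desired_instances(3, 60.0))   # 3
```

Note the gap between the two thresholds: without it, the system could oscillate between scaling out and scaling in, as discussed under flapping below.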
For a list of built-in metrics, see Azure Monitor autoscaling common metrics. You can also implement custom
metrics by using Application Insights.
You can configure autoscaling by using PowerShell, the Azure CLI, an Azure Resource Manager template, or the
Azure portal. For more detailed control, use the Azure Resource Manager REST API. The Azure Monitoring Service
Management Library and the Microsoft Insights Library (in preview) are SDKs that allow you to collect metrics
from different resources and perform autoscaling by making use of the REST APIs. For resources where Azure
Resource Manager support isn't available, or if you are using Azure Cloud Services, the Service Management
REST API can be used for autoscaling. In all other cases, use Azure Resource Manager.
Consider the following points when using Azure autoscale:
Consider whether you can predict the load on the application accurately enough to use scheduled
autoscaling, adding and removing instances to meet anticipated peaks in demand. If this isn't possible, use
reactive autoscaling based on runtime metrics, in order to handle unpredictable changes in demand.
Typically, you can combine these approaches. For example, create a strategy that adds resources based on a
schedule of the times when you know the application is busiest. This helps to ensure that capacity is
available when required, without any delay from starting new instances. For each scheduled rule, define
metrics that allow reactive autoscaling during that period to ensure that the application can handle
sustained but unpredictable peaks in demand.
It's often difficult to understand the relationship between metrics and capacity requirements, especially
when an application is initially deployed. Provision a little extra capacity at the beginning, and then monitor
and tune the autoscaling rules to bring the capacity closer to the actual load.
Configure the autoscaling rules, and then monitor the performance of your application over time. Use the
results of this monitoring to adjust the way in which the system scales if necessary. However, keep in mind
that autoscaling is not an instantaneous process. It takes time to react to a metric such as average CPU
utilization exceeding (or falling below) a specified threshold.
Autoscaling rules that use a detection mechanism based on a measured trigger attribute (such as CPU
usage or queue length) use an aggregated value over time, rather than instantaneous values, to trigger an
autoscaling action. By default, the aggregate is an average of the values. This prevents the system from
reacting too quickly, or causing rapid oscillation. It also allows time for new instances that are automatically
started to settle into running mode, preventing additional autoscaling actions from occurring while the
new instances are starting up. For Azure Cloud Services and Azure Virtual Machines, the default period for
the aggregation is 45 minutes, so it can take up to this period of time for the metric to trigger autoscaling
in response to spikes in demand. You can change the aggregation period by using the SDK, but periods of
less than 25 minutes may cause unpredictable results. For Web Apps, the averaging period is much shorter,
allowing new instances to be available in about five minutes after a change to the average trigger measure.
Avoid flapping, where scale-in and scale-out actions continually go back and forth. Suppose there are two
instances, the upper limit is 80% CPU, and the lower limit is 60%. When the load is at 85%, another instance
is added. After some time, the load decreases to 60%. Before scaling in, the autoscale service calculates the
distribution of total load (of three instances) when an instance is removed, taking it to 90%. This means it
would have to scale out again immediately. So, it skips scaling in, and you might never see the expected
scaling results.
The flapping situation can be controlled by choosing an adequate margin between the scale-out and scale-
in thresholds.
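The arithmetic behind the flapping check above can be worked through directly. The numbers are the illustrative ones from the example, not service defaults.

```python
# Worked example of the flapping estimate described above (illustrative numbers).
instances = 3
per_instance_load = 60.0                 # percent CPU on each of the 3 instances
scale_out_threshold = 80.0

total = instances * per_instance_load            # 180% of one instance's capacity
after_scale_in = total / (instances - 1)         # load per instance if one is removed
print(after_scale_in)                            # 90.0

# 90% exceeds the 80% scale-out threshold, so removing an instance would
# immediately trigger a scale-out; the service therefore skips the scale-in.
would_flap = after_scale_in > scale_out_threshold
print(would_flap)                                # True
```

Widening the margin between the thresholds (for example, 80% out and 40% in) makes the post-removal load fall below the scale-out threshold, so the scale-in can proceed.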
Manual scaling is reset by the maximum and minimum instance counts used for autoscaling. If you
manually update the instance count to a value higher than the maximum or lower than the minimum, the
autoscale engine automatically scales back to the minimum (if lower) or the maximum (if higher). For
example, suppose you set the range between 3 and 6. If you have one running instance, the autoscale
engine scales to three instances on its next run. Likewise, if you manually set the scale to eight instances,
autoscale scales it back to six instances on its next run. Manual scaling is temporary unless you reset the
autoscale rules as well.
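This reset behavior amounts to clamping the manually set count into the configured range, as a small sketch shows (the 3–6 range mirrors the example above):

```python
# Sketch: on its next run, the autoscale engine clamps a manually set
# instance count back into the configured [minimum, maximum] range.
def next_run_count(manual: int, minimum: int = 3, maximum: int = 6) -> int:
    return max(minimum, min(maximum, manual))

print(next_run_count(1))   # 3  (below the minimum, scaled up to it)
print(next_run_count(8))   # 6  (above the maximum, scaled back to it)
print(next_run_count(5))   # 5  (within range, left alone)
```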
The autoscale engine processes only one profile at a time. If a condition is not met, then it checks for the
next profile. Keep key metrics out of the default profile because that profile is checked last. Within a profile,
you can have multiple rules. On scale-out, autoscale runs if any rule is met. On scale-in, autoscale requires
all rules to be met.
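The any/all semantics within a profile can be illustrated with two hypothetical rules; the metric names and thresholds here are made up for the example:

```python
# Sketch of per-profile rule semantics: scale out if ANY scale-out rule fires,
# scale in only if ALL scale-in rules agree. Metrics are hypothetical.
metrics = {"cpu": 45.0, "queue_length": 120}

scale_out_rules = [metrics["cpu"] > 70, metrics["queue_length"] > 100]
scale_in_rules = [metrics["cpu"] < 50, metrics["queue_length"] < 10]

print(any(scale_out_rules))  # True  - the queue rule alone triggers scale-out
print(all(scale_in_rules))   # False - CPU is low, but the queue is not drained
```

This asymmetry is deliberately conservative: a single overloaded metric is enough to add capacity, but every metric must look idle before capacity is removed.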
For details about how Azure Monitor scales, see Best practices for Autoscale.
If you configure autoscaling using the SDK rather than the portal, you can specify a more detailed schedule
during which the rules are active. You can also create your own metrics and use them with or without any
of the existing ones in your autoscaling rules. For example, you may wish to use alternative counters, such
as the number of requests per second or the average memory availability, or use custom counters to
measure specific business processes.
When autoscaling Service Fabric, the node types in your cluster are made of virtual machine scale sets at
the back end, so you need to set up autoscale rules for each node type. Take into account the number of
nodes that you must have before you set up autoscaling. The minimum number of nodes that you must
have for the primary node type is driven by the reliability level you have chosen. For more information, see
scale a Service Fabric cluster in or out using autoscale rules.
You can use the portal to link resources such as SQL Database instances and queues to a Cloud Service
instance. This allows you to more easily access the separate manual and automatic scaling configuration
options for each of the linked resources. For more information, see How to: Link a resource to a cloud
service.
When you configure multiple policies and rules, they could conflict with each other. Autoscale uses the
following conflict resolution rules to ensure that there is always a sufficient number of instances running:
Scale-out operations always take precedence over scale-in operations.
When scale-out operations conflict, the rule that initiates the largest increase in the number of instances
takes precedence.
When scale-in operations conflict, the rule that initiates the smallest decrease in the number of instances
takes precedence.
In an App Service Environment, any worker pool or front-end metrics can be used to define autoscale rules.
For more information, see Autoscaling and App Service Environment.

Application design considerations


Autoscaling isn't an instant solution. Simply adding resources to a system or running more instances of a process
doesn't guarantee that the performance of the system will improve. Consider the following points when designing
an autoscaling strategy:
The system must be designed to be horizontally scalable. Avoid making assumptions about instance
affinity; do not design solutions that require that the code is always running in a specific instance of a
process. When scaling a cloud service or web site horizontally, don't assume that a series of requests from
the same source will always be routed to the same instance. For the same reason, design services to be
stateless to avoid requiring a series of requests from an application to always be routed to the same
instance of a service. When designing a service that reads messages from a queue and processes them,
don't make any assumptions about which instance of the service handles a specific message. Autoscaling
could start additional instances of a service as the queue length grows. The Competing Consumers pattern
describes how to handle this scenario.
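The Competing Consumers pattern referenced above can be sketched in miniature with an in-process queue and threads standing in for service instances; a real deployment would use a durable queue service such as Azure Storage queues or Service Bus.

```python
import queue
import threading

# Minimal in-process sketch of the Competing Consumers pattern: any worker may
# pick up any message, so nothing assumes which instance handles a given item.
work = queue.Queue()
results = []
lock = threading.Lock()

def consumer():
    while True:
        try:
            item = work.get_nowait()
        except queue.Empty:
            return                     # queue drained: this worker exits
        with lock:
            results.append(item * 2)   # stand-in for real message processing
        work.task_done()

for i in range(10):
    work.put(i)

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(sorted(results))
```

Because every worker is interchangeable, adding or removing workers (the analogue of autoscaling instances) changes only throughput, not correctness.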
If the solution implements a long-running task, design this task to support both scaling out and scaling in.
Without due care, such a task could prevent an instance of a process from being shut down cleanly when
the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-
running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and
Filters pattern provides an example of how you can achieve this.
Alternatively, you can implement a checkpoint mechanism that records state information about the task at
regular intervals, and save this state in durable storage that can be accessed by any instance of the process
running the task. In this way, if the process is shut down, the work that it was performing can be resumed
from the last checkpoint by using another instance.
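The checkpoint mechanism described above can be sketched as follows. A local temp file stands in for durable storage here purely for illustration; in Azure the checkpoint would live in shared storage (for example, a blob or table) accessible to any instance.

```python
import json
import os
import tempfile

# Sketch of a checkpoint mechanism for a long-running task: progress is saved
# after each chunk so another instance can resume if this one is shut down.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "task_checkpoint.json")

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_index"]
    return 0

def save_checkpoint(next_index: int) -> None:
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_index": next_index}, f)

def run(items, stop_after=None):
    processed = []
    for i in range(load_checkpoint(), len(items)):
        if stop_after is not None and len(processed) >= stop_after:
            return processed                  # simulated shutdown mid-task
        processed.append(items[i].upper())    # stand-in for real work
        save_checkpoint(i + 1)                # record progress durably
    return processed

items = ["a", "b", "c", "d"]
save_checkpoint(0)                   # start fresh for the demo
first = run(items, stop_after=2)     # instance is shut down after two items
second = run(items)                  # another instance resumes from checkpoint
print(first, second)                 # ['A', 'B'] ['C', 'D']
```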
When background tasks run on separate compute instances, such as in worker roles of a cloud-services–
hosted application, you may need to scale different parts of the application using different scaling policies.
For example, you may need to deploy additional user interface (UI) compute instances without increasing
the number of background compute instances, or the opposite of this. If you offer different levels of service
(such as basic and premium service packages), you may need to scale out the compute resources for
premium service packages more aggressively than those for basic service packages in order to meet SLAs.
Consider using the length of the queue over which UI and background compute instances communicate as
a criterion for your autoscaling strategy. This is the best indicator of an imbalance or difference between
the current load and the processing capacity of the background task.
If you base your autoscaling strategy on counters that measure business processes, such as the number of
orders placed per hour or the average execution time of a complex transaction, ensure that you fully
understand the relationship between the results from these types of counters and the actual compute
capacity requirements. It may be necessary to scale more than one component or compute unit in
response to changes in business process counters.
To prevent a system from attempting to scale out excessively, and to avoid the costs associated with
running many thousands of instances, consider limiting the maximum number of instances that can be
automatically added. Most autoscaling mechanisms allow you to specify the minimum and maximum
number of instances for a rule. In addition, consider gracefully degrading the functionality that the system
provides if the maximum number of instances have been deployed, and the system is still overloaded.
Keep in mind that autoscaling might not be the most appropriate mechanism to handle a sudden burst in
workload. It takes time to provision and start new instances of a service or add resources to a system, and
the peak demand may have passed by the time these additional resources have been made available. In
this scenario, it may be better to throttle the service. For more information, see the Throttling pattern.
Conversely, if you do need the capacity to process all requests when the volume fluctuates rapidly, and cost
isn't a major contributing factor, consider using an aggressive autoscaling strategy that starts additional
instances more quickly. You can also use a scheduled policy that starts a sufficient number of instances to
meet the maximum load before that load is expected.
The autoscaling mechanism should monitor the autoscaling process, and log the details of each
autoscaling event (what triggered it, what resources were added or removed, and when). If you create a
custom autoscaling mechanism, ensure that it incorporates this capability. Analyze the information to help
measure the effectiveness of the autoscaling strategy, and tune it if necessary. You can tune both in the
short term, as the usage patterns become more obvious, and over the long term, as the business expands
or the requirements of the application evolve. If an application reaches the upper limit defined for
autoscaling, the mechanism might also alert an operator who could manually start additional resources if
necessary. Note that under these circumstances the operator may also be responsible for manually
removing these resources after the workload eases.

Related patterns and guidance


The following patterns and guidance may also be relevant to your scenario when implementing autoscaling:
Throttling pattern. This pattern describes how an application can continue to function and meet SLAs when
an increase in demand places an extreme load on resources. Throttling can be used with autoscaling to
prevent a system from being overwhelmed while the system scales out.
Competing Consumers pattern. This pattern describes how to implement a pool of service instances that
can handle messages from any application instance. Autoscaling can be used to start and stop service
instances to match the anticipated workload. This approach enables a system to process multiple messages
concurrently to optimize throughput, improve scalability and availability, and balance the workload.
Monitoring and diagnostics. Instrumentation and telemetry are vital for gathering the information that can
drive the autoscaling process.
Background jobs
12/18/2020 • 23 minutes to read

Many types of applications require background tasks that run independently of the user interface (UI). Examples
include batch jobs, intensive processing tasks, and long-running processes such as workflows. Background jobs
can be executed without requiring user interaction--the application can start the job and then continue to process
interactive requests from users. This can help to minimize the load on the application UI, which can improve
availability and reduce interactive response times.
For example, if an application is required to generate thumbnails of images that are uploaded by users, it can do
this as a background job and save the thumbnail to storage when it is complete--without the user needing to wait
for the process to be completed. In the same way, a user placing an order can initiate a background workflow that
processes the order, while the UI allows the user to continue browsing the web app. When the background job is
complete, it can update the stored orders data and send an email to the user that confirms the order.
When you consider whether to implement a task as a background job, the main criterion is whether the task can
run without user interaction and without the UI needing to wait for the job to be completed. Tasks that require
the user or the UI to wait while they are completed might not be appropriate as background jobs.

Types of background jobs


Background jobs typically include one or more of the following types of jobs:
CPU-intensive jobs, such as mathematical calculations or structural model analysis.
I/O-intensive jobs, such as executing a series of storage transactions or indexing files.
Batch jobs, such as nightly data updates or scheduled processing.
Long-running workflows, such as order fulfillment, or provisioning services and systems.
Sensitive-data processing where the task is handed off to a more secure location for processing. For example,
you might not want to process sensitive data within a web app. Instead, you might use a pattern such as the
Gatekeeper pattern to transfer the data to an isolated background process that has access to protected storage.

Triggers
Background jobs can be initiated in several different ways. They fall into one of the following categories:
Event-driven triggers. The task is started in response to an event, typically an action taken by a user or a step
in a workflow.
Schedule-driven triggers. The task is invoked on a schedule based on a timer. This might be a recurring
schedule or a one-off invocation that is specified for a later time.
Event-driven triggers
Event-driven invocation uses a trigger to start the background task. Examples of using event-driven triggers
include:
The UI or another job places a message in a queue. The message contains data about an action that has taken
place, such as the user placing an order. The background task listens on this queue and detects the arrival of a
new message. It reads the message and uses the data in it as the input to the background job.
The UI or another job saves or updates a value in storage. The background task monitors the storage and
detects changes. It reads the data and uses it as the input to the background job.
The UI or another job makes a request to an endpoint, such as an HTTPS URI, or an API that is exposed as a web
service. It passes the data that is required to complete the background task as part of the request. The endpoint
or web service invokes the background task, which uses the data as its input.
Typical examples of tasks that are suited to event-driven invocation include image processing, workflows, sending
information to remote services, sending email messages, and provisioning new users in multitenant applications.
Schedule -driven triggers
Schedule-driven invocation uses a timer to start the background task. Examples of using schedule-driven triggers
include:
A timer that is running locally within the application or as part of the application's operating system invokes a
background task on a regular basis.
A timer that is running in a different application, such as Azure Logic Apps, sends a request to an API or web
service on a regular basis. The API or web service invokes the background task.
A separate process or application starts a timer that causes the background task to be invoked once after a
specified time delay, or at a specific time.
Typical examples of tasks that are suited to schedule-driven invocation include batch-processing routines (such as
updating related-products lists for users based on their recent behavior), routine data processing tasks (such as
updating indexes or generating accumulated results), data analysis for daily reports, data retention cleanup, and
data consistency checks.
If you use a schedule-driven task that must run as a single instance, be aware of the following:
If the compute instance that is running the scheduler (such as a virtual machine using Windows scheduled
tasks) is scaled, you will have multiple instances of the scheduler running. These could start multiple instances
of the task.
If tasks run for longer than the period between scheduler events, the scheduler may start another instance of
the task while the previous one is still running.
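One common guard against both problems is a mutual-exclusion lock that the task must acquire before running. The sketch below uses a local lock file only to illustrate the idea; it protects a single machine, whereas a scaled deployment would need a distributed lease (for example, an Azure blob lease).

```python
import os
import tempfile

# Sketch: guard a schedule-driven task so overlapping or duplicate scheduler
# invocations do not run it twice. A local lock file stands in for a
# distributed lease, which a multi-instance deployment would require.
LOCK = os.path.join(tempfile.gettempdir(), "nightly_job.lock")
if os.path.exists(LOCK):
    os.remove(LOCK)                 # clear any stale lock for this demo

def try_run(task) -> bool:
    try:
        # O_EXCL makes creation atomic: exactly one caller can succeed.
        fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False                # another instance holds the lock: skip
    try:
        task()
        return True
    finally:
        os.close(fd)
        os.remove(LOCK)

ran = []
print(try_run(lambda: ran.append("done")))   # True: first caller runs the task
print(ran)                                   # ['done']
```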

Returning results
Background jobs execute asynchronously in a separate process, or even in a separate location, from the UI or the
process that invoked the background task. Ideally, background tasks are "fire and forget" operations, and their
execution progress has no impact on the UI or the calling process. This means that the calling process does not
wait for completion of the tasks. Therefore, it cannot automatically detect when the task ends.
If you require a background task to communicate with the calling task to indicate progress or completion, you
must implement a mechanism for this. Some examples are:
Write a status indicator value to storage that is accessible to the UI or caller task, which can monitor or check
this value when required. Other data that the background task must return to the caller can be placed into the
same storage.
Establish a reply queue that the UI or caller listens on. The background task can send messages to the queue
that indicate status and completion. Data that the background task must return to the caller can be placed into
the messages. If you are using Azure Service Bus, you can use the ReplyTo and CorrelationId properties to
implement this capability.
Expose an API or endpoint from the background task that the UI or caller can access to obtain status
information. Data that the background task must return to the caller can be included in the response.
Have the background task call back to the UI or caller through an API to indicate status at predefined points or
on completion. This might be through events raised locally or through a publish-and-subscribe mechanism.
Data that the background task must return to the caller can be included in the request or event payload.
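The first option above, a polled status indicator, can be sketched as follows. An in-memory dict and a thread stand in for durable storage and a background process; the names are illustrative only.

```python
import threading
import time

# Sketch of the status-indicator option: the background task writes its state
# and result to shared storage, which the caller polls. A dict stands in for
# durable storage, and a thread for the background process.
status_store = {}

def background_job(job_id: str) -> None:
    status_store[job_id] = {"state": "running", "result": None}
    time.sleep(0.05)                                   # stand-in for real work
    status_store[job_id] = {"state": "done", "result": 42}

t = threading.Thread(target=background_job, args=("job-1",))
t.start()                           # the caller does not block on the job

while status_store.get("job-1", {}).get("state") != "done":
    time.sleep(0.01)                # poll the status indicator
t.join()

print(status_store["job-1"]["result"])   # 42
```

Polling is the simplest of the four mechanisms; the reply-queue and callback options trade polling overhead for more moving parts.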

Hosting environment
You can host background tasks by using a range of different Azure platform services:
Azure Web Apps and WebJobs. You can use WebJobs to execute custom jobs based on a range of different
types of scripts or executable programs within the context of a web app.
Azure Virtual Machines. If you have a Windows service or want to use the Windows Task Scheduler, it is
common to host your background tasks within a dedicated virtual machine.
Azure Batch. Batch is a platform service that schedules compute-intensive work to run on a managed
collection of virtual machines. It can automatically scale compute resources.
Azure Kubernetes Service (AKS). Azure Kubernetes Service provides a managed hosting environment for
Kubernetes on Azure.
The following sections describe each of these options in more detail, and include considerations to help you
choose the appropriate option.
Azure Web Apps and WebJobs
You can use Azure WebJobs to execute custom jobs as background tasks within an Azure Web App. WebJobs run
within the context of your web app as a continuous process. WebJobs also run in response to a trigger event from
Azure Logic Apps or external factors, such as changes to storage blobs and message queues. Jobs can be started
and stopped on demand, and shut down gracefully. If a continuously running WebJob fails, it is automatically
restarted. Retry and error actions are configurable.
When you configure a WebJob:
If you want the job to respond to an event-driven trigger, you should configure it as Run continuously. The
script or program is stored in the folder named site/wwwroot/app_data/jobs/continuous.
If you want the job to respond to a schedule-driven trigger, you should configure it as Run on a schedule. The
script or program is stored in the folder named site/wwwroot/app_data/jobs/triggered.
If you choose the Run on demand option when you configure a job, it will execute the same code as the Run
on a schedule option when you start it.
Azure WebJobs run within the sandbox of the web app. This means that they can access environment variables and
share information, such as connection strings, with the web app. The job has access to the unique identifier of the
machine that is running the job. The connection string named AzureWebJobsStorage provides access to Azure
storage queues, blobs, and tables for application data, and access to Service Bus for messaging and
communication. The connection string named AzureWebJobsDashboard provides access to the job action log
files.
Azure WebJobs have the following characteristics:
Security: WebJobs are protected by the deployment credentials of the web app.
Supported file types: You can define WebJobs by using command scripts (.cmd), batch files (.bat), PowerShell
scripts (.ps1), bash shell scripts (.sh), PHP scripts (.php), Python scripts (.py), JavaScript code (.js), and executable
programs (.exe, .jar, and more).
Deployment: You can deploy scripts and executables by using the Azure portal, by using Visual Studio, by
using the Azure WebJobs SDK, or by copying them directly to the following locations:
For triggered execution: site/wwwroot/app_data/jobs/triggered/{job name}
For continuous execution: site/wwwroot/app_data/jobs/continuous/{job name}
Logging: Console.Out is treated (marked) as INFO. Console.Error is treated as ERROR. You can access
monitoring and diagnostics information by using the Azure portal. You can download log files directly from the
site. They are saved in the following locations:
For triggered execution: Vfs/data/jobs/triggered/jobName
For continuous execution: Vfs/data/jobs/continuous/jobName
Configuration: You can configure WebJobs by using the portal, the REST API, and PowerShell. You can use a
configuration file named settings.job in the same root directory as the job script to provide configuration
information for a job. For example:
{ "stopping_wait_time": 60 }
{ "is_singleton": true }
Considerations
By default, WebJobs scale with the web app. However, you can configure jobs to run on a single instance by
setting the is_singleton configuration property to true. Single-instance WebJobs are useful for tasks that you
do not want to scale or run as simultaneous multiple instances, such as reindexing, data analysis, and similar
tasks.
To minimize the impact of jobs on the performance of the web app, consider creating an empty Azure Web App
instance in a new App Service plan to host long-running or resource-intensive WebJobs.
Azure Virtual Machines
Background tasks might be implemented in a way that prevents them from being deployed to Azure Web Apps, or
these options might not be convenient. Typical examples are Windows services, and third-party utilities and
executable programs. Another example might be programs written for an execution environment that is different
than that hosting the application. For example, it might be a Unix or Linux program that you want to execute from
a Windows or .NET application. You can choose from a range of operating systems for an Azure virtual machine,
and run your service or executable on that virtual machine.
To help you choose when to use Virtual Machines, see Azure App Services, Cloud Services and Virtual Machines
comparison. For information about the options for Virtual Machines, see Sizes for Windows virtual machines in
Azure. For more information about the operating systems and prebuilt images that are available for Virtual
Machines, see Azure Virtual Machines Marketplace.
To initiate the background task in a separate virtual machine, you have a range of options:
You can execute the task on demand directly from your application by sending a request to an endpoint that the
task exposes. This passes in any data that the task requires. This endpoint invokes the task.
You can configure the task to run on a schedule by using a scheduler or timer that is available in your chosen
operating system. For example, on Windows you can use Windows Task Scheduler to execute scripts and tasks.
Or, if you have SQL Server installed on the virtual machine, you can use the SQL Server Agent to execute scripts
and tasks.
You can use Azure Logic Apps to initiate the task by adding a message to a queue that the task listens on, or by
sending a request to an API that the task exposes.
See the earlier section Triggers for more information about how you can initiate background tasks.
Considerations
Consider the following points when you are deciding whether to deploy background tasks in an Azure virtual
machine:
Hosting background tasks in a separate Azure virtual machine provides flexibility and allows precise control
over initiation, execution, scheduling, and resource allocation. However, it will increase runtime cost if a virtual
machine must be deployed just to run background tasks.
There is no facility to monitor the tasks in the Azure portal and no automated restart capability for failed tasks--
although you can monitor the basic status of the virtual machine and manage it by using the Azure Resource
Manager Cmdlets. However, there are no facilities to control processes and threads in compute nodes. Typically,
using a virtual machine will require additional effort to implement a mechanism that collects data from
instrumentation in the task, and from the operating system in the virtual machine. One solution that might be
appropriate is to use the System Center Management Pack for Azure.
You might consider creating monitoring probes that are exposed through HTTP endpoints. The code for these
probes could perform health checks, collect operational information and statistics--or collate error information
and return it to a management application. For more information, see the Health Endpoint Monitoring pattern.
For more information, see:
Virtual Machines
Azure Virtual Machines FAQ
Azure Batch
Consider Azure Batch if you need to run large, parallel high-performance computing (HPC) workloads across tens,
hundreds, or thousands of VMs.
The Batch service provisions the VMs, assigns tasks to the VMs, runs the tasks, and monitors progress. Batch
can automatically scale out the VMs in response to the workload. Batch also provides job scheduling. Azure Batch
supports both Linux and Windows VMs.
Considerations
Batch works well with intrinsically parallel workloads. It can also perform parallel calculations with a reduce step at
the end, or run Message Passing Interface (MPI) applications for parallel tasks that require message passing
between nodes.
An Azure Batch job runs on a pool of nodes (VMs). One approach is to allocate a pool only when needed and then
delete it after the job completes. This maximizes utilization, because nodes are not idle, but the job must wait for
nodes to be allocated. Alternatively, you can create a pool ahead of time. That approach minimizes the time that it
takes for a job to start, but can result in having nodes that sit idle. For more information, see Pool and compute
node lifetime.
For more information, see:
What is Azure Batch?
Develop large-scale parallel compute solutions with Batch
Batch and HPC solutions for large-scale computing workloads
Azure Kubernetes Service
Azure Kubernetes Service (AKS) manages your hosted Kubernetes environment, which makes it easy to deploy and
manage containerized applications.
Containers can be useful for running background jobs. Some of the benefits include:
Containers support high-density hosting. You can isolate a background task in a container, while placing
multiple containers in each VM.
The container orchestrator handles internal load balancing, configuring the internal network, and other
configuration tasks.
Containers can be started and stopped as needed.
Azure Container Registry allows you to store your container images within the Azure boundary. This brings
security, privacy, and proximity benefits.
Considerations
Using AKS requires an understanding of how to use a container orchestrator. Depending on the skill set of your
DevOps team, this may or may not be an issue.
For more information, see:
Overview of containers in Azure
Introduction to private Docker container registries

Partitioning
If you decide to include background tasks within an existing compute instance, you must consider how this will
affect the quality attributes of the compute instance and the background task itself. These factors will help you to
decide whether to colocate the tasks with the existing compute instance or separate them out into a separate
compute instance:
Availability: Background tasks might not need to have the same level of availability as other parts of the
application, in particular the UI and other parts that are directly involved in user interaction. Background
tasks might be more tolerant of latency, retried connection failures, and other factors that affect availability
because the operations can be queued. However, there must be sufficient capacity to prevent the backup of
requests that could block queues and affect the application as a whole.
Scalability: Background tasks are likely to have a different scalability requirement than the UI and the
interactive parts of the application. Scaling the UI might be necessary to meet peaks in demand, while
outstanding background tasks might be completed during less busy times by fewer compute instances.
Resiliency: Failure of a compute instance that just hosts background tasks might not fatally affect the
application as a whole if the requests for these tasks can be queued or postponed until the task is available
again. If the compute instance and/or tasks can be restarted within an appropriate interval, users of the
application might not be affected.
Security: Background tasks might have different security requirements or restrictions than the UI or other
parts of the application. By using a separate compute instance, you can specify a different security
environment for the tasks. You can also use patterns such as Gatekeeper to isolate the background compute
instances from the UI in order to maximize security and separation.
Performance: You can choose the type of compute instance for background tasks to specifically match the
performance requirements of the tasks. This might mean using a less expensive compute option if the tasks
do not require the same processing capabilities as the UI, or a larger instance if they require additional
capacity and resources.
Manageability: Background tasks might have a different development and deployment rhythm from the
main application code or the UI. Deploying them to a separate compute instance can simplify updates and
versioning.
Cost: Adding compute instances to execute background tasks increases hosting costs. You should carefully
consider the trade-off between additional capacity and these extra costs.
For more information, see the Leader Election pattern and the Competing Consumers pattern.

Conflicts
If you have multiple instances of a background job, it is possible that they will compete for access to resources and
services, such as databases and storage. This concurrent access can result in resource contention, which might
cause conflicts in availability of the services and in the integrity of data in storage. You can resolve resource
contention by using a pessimistic locking approach. This prevents competing instances of a task from concurrently
accessing a service or corrupting data.
Another approach to resolve conflicts is to define background tasks as a singleton, so that there is only ever one
instance running. However, this eliminates the reliability and performance benefits that a multiple-instance
configuration can provide. This is especially true if the UI can supply sufficient work to keep more than one
background task busy.
It is vital to ensure that the background task can automatically restart and that it has sufficient capacity to cope
with peaks in demand. You can achieve this by allocating a compute instance with sufficient resources, by
implementing a queueing mechanism that can store requests for later execution when demand decreases, or by
using a combination of these techniques.
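One way to enforce a singleton background task without a fixed configuration is a lease: each instance tries to acquire a named lease before running, and only the holder proceeds. The sketch below models this in memory; in practice the lease would live in a shared store such as a blob lease or a database row, and all the names here are illustrative:

```python
import threading
import time

class LeaseStore:
    """Hypothetical shared lease store; a blob lease or database row in
    practice, an in-memory dict guarded by a lock here."""
    def __init__(self):
        self._lock = threading.Lock()
        self._leases = {}  # lease name -> (holder, expiry time)

    def try_acquire(self, name: str, holder: str, duration: float = 15.0) -> bool:
        """Grant the lease if it is free or has expired; otherwise refuse."""
        now = time.monotonic()
        with self._lock:
            current = self._leases.get(name)
            if current is None or current[1] <= now:
                self._leases[name] = (holder, now + duration)
                return True
            return False

store = LeaseStore()
# Two competing task instances race for the same singleton lease.
print(store.try_acquire("reindex-job", "instance-a"))  # True: lease granted
print(store.try_acquire("reindex-job", "instance-b"))  # False: already held
```

Because the lease expires, a crashed holder does not block the task forever; a surviving instance acquires the lease on the next attempt. This is essentially the mechanism behind the Leader Election pattern referenced below.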
Coordination
The background tasks might be complex and might require multiple individual tasks to execute to produce a result
or to fulfill all the requirements. It is common in these scenarios to divide the task into smaller discrete steps or
subtasks that can be executed by multiple consumers. Multistep jobs can be more efficient and more flexible
because individual steps might be reusable in multiple jobs. It is also easy to add, remove, or modify the order of
the steps.
Coordinating multiple tasks and steps can be challenging, but there are three common patterns that you can use to
guide your implementation of a solution:
Decomposing a task into multiple reusable steps. An application might be required to perform a
variety of tasks of varying complexity on the information that it processes. A straightforward but inflexible
approach to implementing this application might be to perform this processing as a monolithic module.
However, this approach is likely to reduce the opportunities for refactoring the code, optimizing it, or
reusing it if parts of the same processing are required elsewhere within the application. For more
information, see the Pipes and Filters pattern.
Managing execution of the steps for a task. An application might perform tasks that comprise a
number of steps (some of which might invoke remote services or access remote resources). The individual
steps might be independent of each other, but they are orchestrated by the application logic that
implements the task. For more information, see Scheduler Agent Supervisor pattern.
Managing recovery for task steps that fail. An application might need to undo the work that is
performed by a series of steps (which together define an eventually consistent operation) if one or more of
the steps fail. For more information, see the Compensating Transaction pattern.

Resiliency considerations
Background tasks must be resilient in order to provide reliable services to the application. When you are planning
and designing background tasks, consider the following points:
Background tasks must be able to gracefully handle restarts without corrupting data or introducing
inconsistency into the application. For long-running or multistep tasks, consider using checkpointing by
saving the state of jobs in persistent storage, or as messages in a queue if this is appropriate. For example,
you can persist state information in a message in a queue and incrementally update this state information
with the task progress so that the task can be processed from the last known good checkpoint--instead of
restarting from the beginning. When using Azure Service Bus queues, you can use message sessions to
enable the same scenario. Sessions allow you to save and retrieve the application processing state by using
the SetState and GetState methods. For more information about designing reliable multistep processes and
workflows, see the Scheduler Agent Supervisor pattern.
When you use queues to communicate with background tasks, the queues can act as a buffer to store
requests that are sent to the tasks while the application is under higher than usual load. This allows the
tasks to catch up with the UI during less busy periods. It also means that restarts will not block the UI. For
more information, see the Queue-Based Load Leveling pattern. If some tasks are more important than
others, consider implementing the Priority Queue pattern to ensure that these tasks run before less
important ones.
Background tasks that are initiated by messages or process messages must be designed to handle
inconsistencies, such as messages arriving out of order, messages that repeatedly cause an error (often
referred to as poison messages), and messages that are delivered more than once. Consider the following:
Messages that must be processed in a specific order, such as those that change data based on the
existing data value (for example, adding a value to an existing value), might not arrive in the original
order in which they were sent. Alternatively, they might be handled by different instances of a
background task in a different order due to varying loads on each instance. Messages that must be
processed in a specific order should include a sequence number, key, or some other indicator that
background tasks can use to ensure that they are processed in the correct order. If you are using
Azure Service Bus, you can use message sessions to guarantee the order of delivery. However, it is
usually more efficient, where possible, to design the process so that the message order is not
important.
Typically, a background task will peek at messages in the queue, which temporarily hides them from
other message consumers. Then it deletes the messages after they have been successfully processed.
If a background task fails when processing a message, that message will reappear on the queue after
the peek time-out expires. It will be processed by another instance of the task or during the next
processing cycle of this instance. If the message consistently causes an error in the consumer, it will
block the task, the queue, and eventually the application itself when the queue becomes full.
Therefore, it is vital to detect and remove poison messages from the queue. If you are using Azure
Service Bus, messages that cause an error can be moved automatically or manually to an associated
dead letter queue.
Queues provide at-least-once delivery guarantees, but they might deliver the same message
more than once. In addition, if a background task fails after processing a message but before deleting
it from the queue, the message will become available for processing again. Background tasks should
be idempotent, which means that processing the same message more than once does not cause an
error or inconsistency in the application's data. Some operations are naturally idempotent, such as
setting a stored value to a specific new value. However, operations such as adding a value to an
existing stored value without checking that the stored value is still the same as when the message
was originally sent will cause inconsistencies. Azure Service Bus queues can be configured to
automatically remove duplicated messages.
Some messaging systems, such as Azure storage queues and Azure Service Bus queues, support a
de-queue count property that indicates the number of times a message has been read from the
queue. This can be useful in handling repeated and poison messages. For more information, see
Asynchronous Messaging Primer and Idempotency Patterns.
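The messaging considerations above (idempotency, poison messages, and dequeue counts) can be combined in a single consumer loop. The following is an illustrative sketch, not a real queue client: the message dictionaries, the `MAX_DEQUEUE_COUNT` threshold, and the set used as a processed-message checkpoint are all assumptions made for the example:

```python
MAX_DEQUEUE_COUNT = 3

def run_consumer(messages, processed_ids, dead_letter, handler):
    """Drain a list of {"id", "dequeue_count", "body"} messages.

    - Duplicate deliveries are skipped (idempotency via a seen-id set).
    - Messages that keep failing are moved to a dead-letter list once
      their dequeue count exceeds MAX_DEQUEUE_COUNT (poison handling).
    - Failed messages are re-queued for another attempt.
    """
    results = []
    while messages:
        msg = messages.pop(0)
        if msg["id"] in processed_ids:            # duplicate delivery: ignore
            continue
        if msg["dequeue_count"] > MAX_DEQUEUE_COUNT:
            dead_letter.append(msg)               # poison message
            continue
        try:
            results.append(handler(msg["body"]))
            processed_ids.add(msg["id"])          # checkpoint: mark as done
        except Exception:
            msg["dequeue_count"] += 1             # will be retried later
            messages.append(msg)
    return results

def handler(body):
    if body == "bad":
        raise ValueError("always fails")          # simulates a poison message
    return body.upper()

messages = [
    {"id": 1, "dequeue_count": 1, "body": "a"},
    {"id": 1, "dequeue_count": 1, "body": "a"},   # duplicate delivery
    {"id": 2, "dequeue_count": 1, "body": "bad"}, # poison message
]
processed_ids, dead_letter = set(), []
results = run_consumer(messages, processed_ids, dead_letter, handler)
print(results, [m["id"] for m in dead_letter])    # ['A'] [2]
```

With a real service such as Azure Service Bus, the dequeue count and the dead-letter queue are provided by the platform; only the idempotency check and the handler remain the application's responsibility.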

Scaling and performance considerations


Background tasks must offer sufficient performance to ensure they do not block the application, or cause
inconsistencies due to delayed operation when the system is under load. Typically, performance is improved by
scaling the compute instances that host the background tasks. When you are planning and designing background
tasks, consider the following points around scalability and performance:
Azure supports autoscaling (both scaling out and scaling back in) based on current demand and load or on
a predefined schedule, for Web Apps and Virtual Machines hosted deployments. Use this feature to ensure
that the application as a whole has sufficient performance capabilities while minimizing runtime costs.
Where background tasks have a different performance capability from the other parts of an application (for
example, the UI or components such as the data access layer), hosting the background tasks in a
separate compute service allows the UI and background tasks to scale independently to manage the load. If
multiple background tasks have significantly different performance capabilities from each other, consider
dividing them and scaling each type independently. However, note that this might increase runtime costs.
Simply scaling the compute resources might not be sufficient to prevent loss of performance under load.
You might also need to scale storage queues and other resources to prevent a single point of the overall
processing chain from becoming a bottleneck. Also, consider other limitations, such as the maximum
throughput of storage and other services that the application and the background tasks rely on.
Background tasks must be designed for scaling. For example, they must be able to dynamically detect the
number of storage queues in use in order to listen on or send messages to the appropriate queue.
By default, WebJobs scale with their associated Azure Web Apps instance. However, if you want a WebJob to
run as only a single instance, you can create a Settings.job file that contains the JSON data
{ "is_singleton": true }. This forces Azure to run only one instance of the WebJob, even if there are multiple
instances of the associated web app. This can be a useful technique for scheduled jobs that must run as only
a single instance.

Related patterns
Compute Partitioning Guidance
Caching
12/18/2020 • 55 minutes to read

Caching is a common technique that aims to improve the performance and scalability of a system. It does this by
temporarily copying frequently accessed data to fast storage that's located close to the application. If this fast data
storage is located closer to the application than the original source, then caching can significantly improve
response times for client applications by serving data more quickly.
Caching is most effective when a client instance repeatedly reads the same data, especially if all the following
conditions apply to the original data store:
It remains relatively static.
It's slow compared to the speed of the cache.
It's subject to a high level of contention.
It's far away when network latency can cause access to be slow.

Caching in distributed applications


Distributed applications typically implement either or both of the following strategies when caching data:
Using a private cache, where data is held locally on the computer that's running an instance of an application
or service.
Using a shared cache, serving as a common source that can be accessed by multiple processes and machines.
In both cases, caching can be performed client-side and server-side. Client-side caching is done by the process
that provides the user interface for a system, such as a web browser or desktop application. Server-side caching is
done by the process that provides the business services that are running remotely.
Private caching
The most basic type of cache is an in-memory store. It's held in the address space of a single process and accessed
directly by the code that runs in that process. This type of cache is quick to access. It can also provide an effective
means for storing modest amounts of static data, since the size of a cache is typically constrained by the amount
of memory available on the machine hosting the process.
If you need to cache more information than is physically possible in memory, you can write cached data to the
local file system. This will be slower to access than data held in memory, but should still be faster and more
reliable than retrieving data across a network.
If you have multiple instances of an application that uses this model running concurrently, each application
instance has its own independent cache holding its own copy of the data.
Think of a cache as a snapshot of the original data at some point in the past. If this data is not static, it is likely that
different application instances hold different versions of the data in their caches. Therefore, the same query
performed by these instances can return different results, as shown in Figure 1.
Figure 1: Using an in-memory cache in different instances of an application.
Shared caching
Using a shared cache can help alleviate concerns that data might differ in each cache, which can occur with in-
memory caching. Shared caching ensures that different application instances see the same view of cached data. It
does this by locating the cache in a separate location, typically hosted as part of a separate service, as shown in
Figure 2.

Figure 2: Using a shared cache.


An important benefit of the shared caching approach is the scalability it provides. Many shared cache services are
implemented by using a cluster of servers and use software to distribute the data across the cluster transparently.
An application instance simply sends a request to the cache service. The underlying infrastructure determines the
location of the cached data in the cluster. You can easily scale the cache by adding more servers.
There are two main disadvantages of the shared caching approach:
The cache is slower to access because it is no longer held locally to each application instance.
The requirement to implement a separate cache service might add complexity to the solution.

Considerations for using caching


The following sections describe in more detail the considerations for designing and using a cache.
Decide when to cache data
Caching can dramatically improve performance, scalability, and availability. The more data that you have and the
larger the number of users that need to access this data, the greater the benefits of caching become. That's
because caching reduces the latency and contention that's associated with handling large volumes of concurrent
requests in the original data store.
For example, a database might support a limited number of concurrent connections. Retrieving data from a
shared cache, however, rather than the underlying database, makes it possible for a client application to access
this data even if the number of available connections is currently exhausted. Additionally, if the database becomes
unavailable, client applications might be able to continue by using the data that's held in the cache.
Consider caching data that is read frequently but modified infrequently (for example, data that has a higher
proportion of read operations than write operations). However, we don't recommend that you use the cache as
the authoritative store of critical information. Instead, ensure that all changes that your application cannot afford
to lose are always saved to a persistent data store. This means that if the cache is unavailable, your application can
still continue to operate by using the data store, and you won't lose important information.
Determine how to cache data effectively
The key to using a cache effectively lies in determining the most appropriate data to cache, and caching it at the
appropriate time. The data can be added to the cache on demand the first time it is retrieved by an application.
This means that the application needs to fetch the data only once from the data store, and that subsequent access
can be satisfied by using the cache.
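This on-demand (cache-aside) approach can be sketched in a few lines. The dictionary cache and the `query_database` function are stand-ins for a real cache service and data store:

```python
cache = {}
db_reads = []

def query_database(key):
    """Stand-in for the original data store; records each read it serves."""
    db_reads.append(key)
    return f"value-for-{key}"

def get_with_cache(key):
    """On-demand load: only the first access for a key hits the data store."""
    if key in cache:
        return cache[key]
    value = query_database(key)
    cache[key] = value
    return value

get_with_cache("p1")
get_with_cache("p1")
print(db_reads)   # ['p1'] -- the second read was served from the cache
```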
Alternatively, a cache can be partially or fully populated with data in advance, typically when the application starts
(an approach known as seeding). However, it might not be advisable to implement seeding for a large cache
because this approach can impose a sudden, high load on the original data store when the application starts
running.
Often an analysis of usage patterns can help you decide whether to fully or partially prepopulate a cache, and to
choose the data to cache. For example, it can be useful to seed the cache with the static user profile data for
customers who use the application regularly (perhaps every day), but not for customers who use the application
only once a week.
Caching typically works well with data that is immutable or that changes infrequently. Examples include reference
information such as product and pricing information in an e-commerce application, or shared static resources that
are costly to construct. Some or all of this data can be loaded into the cache at application startup to minimize
demand on resources and to improve performance. It might also be appropriate to have a background process
that periodically updates reference data in the cache to ensure it is up-to-date, or that refreshes the cache when
reference data changes.
Caching is less useful for dynamic data, although there are some exceptions to this consideration (see the section
Cache highly dynamic data later in this article for more information). When the original data changes regularly,
either the cached information becomes stale very quickly or the overhead of synchronizing the cache with the
original data store reduces the effectiveness of caching.
Note that a cache does not have to include the complete data for an entity. For example, if a data item represents a
multivalued object such as a bank customer with a name, address, and account balance, some of these elements
might remain static (such as the name and address), while others (such as the account balance) might be more
dynamic. In these situations, it can be useful to cache the static portions of the data and retrieve (or calculate) only
the remaining information when it is required.
We recommend that you carry out performance testing and usage analysis to determine whether prepopulating
or on-demand loading of the cache, or a combination of both, is appropriate. The decision should be based on the
volatility and usage pattern of the data. Cache utilization and performance analysis are particularly important in
applications that encounter heavy loads and must be highly scalable. For example, in highly scalable scenarios it
might make sense to seed the cache to reduce the load on the data store at peak times.
Caching can also be used to avoid repeating computations while the application is running. If an operation
transforms data or performs a complicated calculation, it can save the results of the operation in the cache. If the
same calculation is required afterward, the application can simply retrieve the results from the cache.
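Caching computed results is often called memoization. In Python this can be sketched with the standard-library `functools.lru_cache` decorator; the calculation itself is just a placeholder for any expensive transform:

```python
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=128)
def expensive_transform(n: int) -> int:
    """Stand-in for a costly calculation whose result is worth caching."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(n))

expensive_transform(1000)
expensive_transform(1000)   # the second call is answered from the cache
print(call_count)           # 1
```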
An application can modify data that's held in a cache. However, we recommend thinking of the cache as a transient
data store that could disappear at any time. Do not store valuable data in the cache only; make sure that you
maintain the information in the original data store as well. This means that if the cache becomes unavailable, you
minimize the chance of losing data.
Cache highly dynamic data
When you store rapidly changing information in a persistent data store, it can impose an overhead on the system.
For example, consider a device that continually reports status or some other measurement. If an application
chooses not to cache this data on the basis that the cached information will nearly always be outdated, then the
same consideration could be true when storing and retrieving this information from the data store. In the time it
takes to save and fetch this data, it might have changed.
In a situation such as this, consider the benefits of storing the dynamic information directly in the cache instead of
in the persistent data store. If the data is noncritical and does not require auditing, then it doesn't matter if the
occasional change is lost.
Manage data expiration in a cache
In most cases, data that's held in a cache is a copy of data that's held in the original data store. The data in the
original data store might change after it was cached, causing the cached data to become stale. Many caching
systems enable you to configure the cache to expire data and reduce the period for which data may be out of date.
When cached data expires, it's removed from the cache, and the application must retrieve the data from the
original data store (it can put the newly fetched information back into cache). You can set a default expiration
policy when you configure the cache. In many cache services, you can also stipulate the expiration period for
individual objects when you store them programmatically in the cache. Some caches enable you to specify the
expiration period as an absolute value, or as a sliding value that causes the item to be removed from the cache if it
is not accessed within the specified time. This setting overrides any cache-wide expiration policy, but only for the
specified objects.
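The difference between absolute and sliding expiration can be illustrated with a small sketch. This is not any particular cache service's API; the `now` parameter simply makes the passage of time explicit for the example:

```python
import time

class ExpiringCache:
    """Per-item expiration: absolute TTL, or sliding TTL renewed on access."""
    def __init__(self):
        self._items = {}  # key -> [value, expires_at, ttl, sliding]

    def set(self, key, value, ttl, sliding=False, now=None):
        now = time.monotonic() if now is None else now
        self._items[key] = [value, now + ttl, ttl, sliding]

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None or entry[1] <= now:
            self._items.pop(key, None)    # expired: remove from the cache
            return None
        if entry[3]:                      # sliding: each access renews the TTL
            entry[1] = now + entry[2]
        return entry[0]

cache = ExpiringCache()
cache.set("a", 1, ttl=10, now=0)                 # absolute: gone at t=10
cache.set("b", 2, ttl=10, sliding=True, now=0)
cache.get("b", now=8)                            # access renews "b" until t=18
print(cache.get("a", now=12), cache.get("b", now=12))  # None 2
```

The absolute item expires regardless of use, while the sliding item survives as long as it keeps being accessed within its window.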

NOTE
Consider the expiration period for the cache and the objects that it contains carefully. If you make it too short, objects will
expire too quickly and you will reduce the benefits of using the cache. If you make the period too long, you risk the data
becoming stale.

It's also possible that the cache might fill up if data is allowed to remain resident for a long time. In this case, any
requests to add new items to the cache might cause some items to be forcibly removed in a process known as
eviction. Cache services typically evict data on a least-recently-used (LRU) basis, but you can usually override this
policy and prevent items from being evicted. However, if you adopt this approach, you risk exceeding the memory
that's available in the cache. If the cache is full, an application that attempts to add an item will fail with an exception.
Some caching implementations might provide additional eviction policies. There are several types of eviction
policies. These include:
A most-recently-used policy (in the expectation that the data will not be required again).
A first-in-first-out policy (oldest data is evicted first).
An explicit removal policy based on a triggered event (such as the data being modified).
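The default least-recently-used policy can be sketched with Python's standard-library `OrderedDict`; this is a minimal illustration of the mechanism, not a production cache:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction, the default policy in many cache services."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            evicted, _ = self._items.popitem(last=False)  # evict the LRU item
            return evicted
        return None

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")               # touching "a" makes "b" the least recently used
print(cache.put("c", 3))     # b  -- "b" is evicted to make room for "c"
```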
Invalidate data in a client-side cache
Data that's held in a client-side cache is generally considered to be outside the auspices of the service that
provides the data to the client. A service cannot directly force a client to add or remove information from a client-
side cache.
This means that it's possible for a client that uses a poorly configured cache to continue using outdated
information. For example, if the expiration policies of the cache aren't properly implemented, a client might use
outdated information that's cached locally when the information in the original data source has changed.
If you are building a web application that serves data over an HTTP connection, you can implicitly force a web
client (such as a browser or web proxy) to fetch the most recent information. You can do this by changing the URI
of a resource whenever it is updated. Web clients typically use the URI of a resource as the key in the
client-side cache, so if the URI changes, the web client ignores any previously cached versions of a resource and
fetches the new version instead.

Managing concurrency in a cache


Caches are often designed to be shared by multiple instances of an application. Each application instance can read
and modify data in the cache. Consequently, the same concurrency issues that arise with any shared data store
also apply to a cache. In a situation where an application needs to modify data that's held in the cache, you might
need to ensure that updates made by one instance of the application do not overwrite the changes made by
another instance.
Depending on the nature of the data and the likelihood of collisions, you can adopt one of two approaches to
concurrency:
Optimistic. Immediately prior to updating the data, the application checks to see whether the data in the
cache has changed since it was retrieved. If the data is still the same, the change can be made. Otherwise, the
application has to decide whether to update it. (The business logic that drives this decision will be application-
specific.) This approach is suitable for situations where updates are infrequent, or where collisions are unlikely
to occur.
Pessimistic. When it retrieves the data, the application locks it in the cache to prevent another instance from
changing it. This process ensures that collisions cannot occur, but it can also block other instances that need
to process the same data. Pessimistic concurrency can affect the scalability of a solution and is recommended
only for short-lived operations. This approach might be appropriate for situations where collisions are more
likely, especially if an application updates multiple items in the cache and must ensure that these changes are
applied consistently.
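With the StackExchange.Redis client library (discussed later in this article), an optimistic update can be expressed as a conditional transaction: the write is applied only if the cached value is still the one that was originally read. This is a sketch under that assumption, not a complete implementation; the method name is hypothetical and 'cache' is assumed to be a connected IDatabase instance:

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public static class OptimisticCacheUpdate
{
    // Optimistic update sketch: commit the change only if the cached value
    // has not been modified since it was read.
    public static async Task<bool> TryUpdateAsync(
        IDatabase cache, string key, string originalValue, string newValue)
    {
        ITransaction transaction = cache.CreateTransaction();

        // The transaction only commits if the value is still the one we read
        transaction.AddCondition(Condition.StringEqual(key, originalValue));
        _ = transaction.StringSetAsync(key, newValue);

        // Returns false if the condition failed, meaning another instance
        // changed the data first; the caller decides how to react
        return await transaction.ExecuteAsync();
    }
}
```

A caller that receives false can re-read the item and retry, or abandon the update, depending on its business logic.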
Implement high availability and scalability, and improve performance
Avoid using a cache as the primary repository of data; this is the role of the original data store from which the
cache is populated. The original data store is responsible for ensuring the persistence of the data.
Be careful not to introduce critical dependencies on the availability of a shared cache service into your solutions.
An application should be able to continue functioning if the service that provides the shared cache is unavailable.
The application should not become unresponsive or fail while waiting for the cache service to resume.
Therefore, the application must be prepared to detect the availability of the cache service and fall back to the
original data store if the cache is inaccessible. The Circuit-Breaker pattern is useful for handling this scenario. The
service that provides the cache can be recovered, and once it becomes available, the cache can be repopulated as
data is read from the original data store, following a strategy such as the Cache-aside pattern.
However, system scalability may be affected if the application falls back to the original data store when the cache
is temporarily unavailable. While the data store is being recovered, the original data store could be swamped with
requests for data, resulting in timeouts and failed connections.
Consider implementing a local, private cache in each instance of an application, together with the shared cache
that all application instances access. When the application retrieves an item, it can check first in its local cache,
then in the shared cache, and finally in the original data store. The local cache can be populated using the data in
either the shared cache, or in the database if the shared cache is unavailable.
This approach requires careful configuration to prevent the local cache from becoming too stale with respect to
the shared cache. However, the local cache acts as a buffer if the shared cache is unreachable. Figure 3 shows this
structure.
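The lookup order described above can be sketched as follows. This is illustrative code: the local cache here is a simple dictionary, and GetFromDataStoreAsync is a placeholder for application-specific data access; a production implementation would also bound the local cache's size and expire its entries to limit staleness.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using StackExchange.Redis;

public class TwoLevelCache
{
    private readonly ConcurrentDictionary<string, string> localCache =
        new ConcurrentDictionary<string, string>();

    public async Task<string> GetItemAsync(IDatabase sharedCache, string key)
    {
        // 1. Local, private cache (fastest, but per-instance)
        if (localCache.TryGetValue(key, out string value))
        {
            return value;
        }

        try
        {
            // 2. Shared cache
            value = await sharedCache.StringGetAsync(key);
        }
        catch (RedisConnectionException)
        {
            // Shared cache unavailable; fall through to the data store
            value = null;
        }

        if (value == null)
        {
            // 3. Original data store (placeholder for application code)
            value = await GetFromDataStoreAsync(key);
        }

        // Populate the local cache for subsequent reads
        localCache[key] = value;
        return value;
    }

    private Task<string> GetFromDataStoreAsync(string key)
    {
        // Application-specific data access goes here
        return Task.FromResult($"value-for-{key}");
    }
}
```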

Figure 3: Using a local private cache with a shared cache.


To support large caches that hold relatively long-lived data, some cache services provide a high-availability option
that implements automatic failover if the cache becomes unavailable. This approach typically involves replicating
the cached data that's stored on a primary cache server to a secondary cache server, and switching to the
secondary server if the primary server fails or connectivity is lost.
To reduce the latency that's associated with writing to multiple destinations, the replication to the secondary
server might occur asynchronously when data is written to the cache on the primary server. This approach leads
to the possibility that some cached information might be lost in the event of a failure, but the proportion of this
data should be small compared to the overall size of the cache.
If a shared cache is large, it might be beneficial to partition the cached data across nodes to reduce the chances of
contention and improve scalability. Many shared caches support the ability to dynamically add (and remove)
nodes and rebalance the data across partitions. This approach might involve clustering, in which the collection of
nodes is presented to client applications as a seamless, single cache. Internally, however, the data is dispersed
between nodes following a predefined distribution strategy that balances the load evenly. For more information
about possible partitioning strategies, see Data partitioning guidance.
Clustering can also increase the availability of the cache. If a node fails, the remainder of the cache is still
accessible. Clustering is frequently used in conjunction with replication and failover. Each node can be replicated,
and the replica can be quickly brought online if the node fails.
Many read and write operations are likely to involve single data values or objects. However, at times it might be
necessary to store or retrieve large volumes of data quickly. For example, seeding a cache could involve writing
hundreds or thousands of items to the cache. An application might also need to retrieve a large number of related
items from the cache as part of the same request.
Many large-scale caches provide batch operations for these purposes. This enables a client application to package
up a large volume of items into a single request and reduces the overhead that's associated with performing a
large number of small requests.
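For example, the StackExchange.Redis client library exposes an overload of StringGetAsync that takes an array of keys and issues a single Redis MGET command, so many items are fetched in one round trip. This sketch assumes 'cache' is a connected IDatabase instance:

```csharp
using System.Linq;
using StackExchange.Redis;

// Build the keys for 1000 customer records
RedisKey[] keys = Enumerable.Range(1, 1000)
                            .Select(i => (RedisKey)$"customer:{i}")
                            .ToArray();

// One network request retrieves all 1000 items
RedisValue[] values = await cache.StringGetAsync(keys);
```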

Caching and eventual consistency


For the cache-aside pattern to work, the instance of the application that populates the cache must have access to
the most recent and consistent version of the data. In a system that implements eventual consistency (such as a
replicated data store) this might not be the case.
One instance of an application could modify a data item and invalidate the cached version of that item. Another
instance of the application might attempt to read this item from a cache, which causes a cache-miss, so it reads
the data from the data store and adds it to the cache. However, if the data store has not been fully synchronized
with the other replicas, the application instance could read and populate the cache with the old value.
For more information about handling data consistency, see the Data consistency primer.
Protect cached data
Irrespective of the cache service you use, consider how to protect the data that's held in the cache from
unauthorized access. There are two main concerns:
The privacy of the data in the cache.
The privacy of data as it flows between the cache and the application that's using the cache.
To protect data in the cache, the cache service might implement an authentication mechanism that requires
applications to specify the following:
Which identities can access data in the cache.
Which operations (read and write) these identities are allowed to perform.
To reduce overhead that's associated with reading and writing data, after an identity has been granted write
and/or read access to the cache, that identity can use any data in the cache.
If you need to restrict access to subsets of the cached data, you can do one of the following:
Split the cache into partitions (by using different cache servers) and only grant access to identities for the
partitions that they should be allowed to use.
Encrypt the data in each subset by using different keys, and provide the encryption keys only to identities that
should have access to each subset. A client application might still be able to retrieve all of the data in the cache,
but it will only be able to decrypt the data for which it has the keys.
You must also protect the data as it flows in and out of the cache. To do this, you depend on the security features
provided by the network infrastructure that client applications use to connect to the cache. If the cache is
implemented using an on-site server within the same organization that hosts the client applications, then the
isolation of the network itself might not require you to take additional steps. If the cache is located remotely and
requires a TCP or HTTP connection over a public network (such as the Internet), consider implementing SSL.
Considerations for implementing caching in Azure
Azure Cache for Redis is an implementation of the open source Redis cache that runs as a service in an Azure
datacenter. It provides a caching service that can be accessed from any Azure application, whether the application
is implemented as a cloud service, a website, or inside an Azure virtual machine. Caches can be shared by client
applications that have the appropriate access key.
Azure Cache for Redis is a high-performance caching solution that provides availability, scalability and security. It
typically runs as a service spread across one or more dedicated machines. It attempts to store as much
information as it can in memory to ensure fast access. This architecture is intended to provide low latency and
high throughput by reducing the need to perform slow I/O operations.
Azure Cache for Redis is compatible with many of the various APIs that are used by client applications. If you have
existing applications that already use Redis running on-premises, Azure Cache for Redis provides a quick
migration path to caching in the cloud.
Features of Redis
Redis is more than a simple cache server. It provides a distributed in-memory database with an extensive
command set that supports many common scenarios. These are described later in this document, in the section
Using Redis caching. This section summarizes some of the key features that Redis provides.
Redis as an in-memory database
Redis supports both read and write operations. In Redis, writes can be protected from system failure either by
being stored periodically in a local snapshot file or in an append-only log file. This is not the case in many caches
(which should be considered transitory data stores).
All writes are asynchronous and do not block clients from reading and writing data. When Redis starts running, it
reads the data from the snapshot or log file and uses it to construct the in-memory cache. For more information,
see Redis persistence on the Redis website.

NOTE
Redis does not guarantee that all writes will be saved in the event of a catastrophic failure, but at worst you might lose only
a few seconds worth of data. Remember that a cache is not intended to act as an authoritative data source, and it is the
responsibility of the applications using the cache to ensure that critical data is saved successfully to an appropriate data
store. For more information, see the Cache-aside pattern.

Redis data types


Redis is a key-value store, where values can contain simple types or complex data structures such as hashes, lists,
and sets. It supports a set of atomic operations on these data types. Keys can be permanent or tagged with a
limited time-to-live; when the time-to-live expires, the key and its corresponding value are automatically removed
from the cache.
For more information about Redis keys and values, visit the page An introduction to Redis data types and
abstractions on the Redis website.
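As a brief illustration, the StackExchange.Redis client library (covered later in this article) surfaces these data types through IDatabase methods. The keys and values below are illustrative; 'cache' is assumed to be a connected IDatabase instance:

```csharp
using System;
using StackExchange.Redis;

// List: push items onto a queue of recent events
await cache.ListRightPushAsync("events:recent", "user-login");

// Set: track unique visitors (duplicate additions are ignored)
await cache.SetAddAsync("visitors:today", "user:100");

// Hash: store the fields of an object under a single key
await cache.HashSetAsync("customer:100", new[]
{
    new HashEntry("name", "Contoso"),
    new HashEntry("orders", 42)
});

// Time-to-live: remove the key (and its value) automatically after an hour
await cache.KeyExpireAsync("customer:100", TimeSpan.FromHours(1));
```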
Redis replication and clustering
Redis supports primary/subordinate replication to help ensure availability and maintain throughput. Write
operations to a Redis primary node are replicated to one or more subordinate nodes. Read operations can be
served by the primary or any of the subordinates.
In the event of a network partition, subordinates can continue to serve data and then transparently resynchronize
with the primary when the connection is reestablished. For further details, visit the Replication page on the Redis
website.
Redis also provides clustering, which enables you to transparently partition data into shards across servers and
spread the load. This feature improves scalability, because new Redis servers can be added and the data
repartitioned as the size of the cache increases.
Furthermore, each server in the cluster can be replicated by using primary/subordinate replication. This ensures
availability across each node in the cluster. For more information about clustering and sharding, visit the Redis
cluster tutorial page on the Redis website.
Redis memory use
A Redis cache has a finite size that depends on the resources available on the host computer. When you configure
a Redis server, you can specify the maximum amount of memory it can use. You can also configure a key in a
Redis cache to have an expiration time, after which it is automatically removed from the cache. This feature can
help prevent the in-memory cache from filling with old or stale data.
As memory fills up, Redis can automatically evict keys and their values by following a number of policies. The
default is LRU (least recently used), but you can also select other policies such as evicting keys at random or
turning off eviction altogether (in which case, attempts to add items to the cache fail if it is full). The page Using
Redis as an LRU cache provides more information.
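For a self-hosted Redis server, these limits are set in the redis.conf configuration file. The values shown here are illustrative only:

```
# Cap the memory the cache can consume
maxmemory 2gb

# Evict the least recently used keys when the limit is reached
maxmemory-policy allkeys-lru
```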
Redis transactions and batches
Redis enables a client application to submit a series of operations that read and write data in the cache as an
atomic transaction. All the commands in the transaction are guaranteed to run sequentially, and no commands
issued by other concurrent clients will be interwoven between them.
However, these are not true transactions as a relational database would perform them. Transaction processing
consists of two stages: the first is when the commands are queued, and the second is when the commands are
run. During the command queuing stage, the commands that comprise the transaction are submitted by the
client. If some sort of error occurs at this point (such as a syntax error, or the wrong number of parameters) then
Redis refuses to process the entire transaction and discards it.
During the run phase, Redis performs each queued command in sequence. If a command fails during this phase,
Redis continues with the next queued command and does not roll back the effects of any commands that have
already been run. This simplified form of transaction helps to maintain performance and avoid performance
problems that are caused by contention.
Redis does implement a form of optimistic locking to assist in maintaining consistency. For detailed information
about transactions and locking with Redis, visit the Transactions page on the Redis website.
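The two-stage queue-then-run behavior is visible in the StackExchange.Redis client library (discussed later in this article): commands added to an ITransaction are only queued, and are sent as a single MULTI/EXEC block when the transaction is executed. The keys below are illustrative; 'cache' is assumed to be a connected IDatabase instance:

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

ITransaction transaction = cache.CreateTransaction();

// Commands are queued, not executed, at this point; the returned tasks
// complete only after ExecuteAsync commits the transaction
Task setTitle = transaction.StringSetAsync("post:42:title", "Caching guidance");
Task setScore = transaction.StringSetAsync("post:42:score", 10);

// Sends MULTI, the queued commands, and EXEC in one block; no commands
// from other clients are interleaved between them
bool committed = await transaction.ExecuteAsync();
```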
Redis also supports nontransactional batching of requests. The Redis protocol that clients use to send commands
to a Redis server enables a client to send a series of operations as part of the same request. This can help to
reduce packet fragmentation on the network. When the batch is processed, each command is performed. If any of
these commands are malformed, they will be rejected (which doesn't happen with a transaction), but the
remaining commands will be performed. There is also no guarantee about the order in which the commands in
the batch will be processed.
Redis security
Redis is focused purely on providing fast access to data, and is designed to run inside a trusted environment that
can be accessed only by trusted clients. Redis supports a limited security model based on password
authentication. (It is possible to remove authentication completely, although we don't recommend this.)
All authenticated clients share the same global password and have access to the same resources. If you need more
comprehensive sign-in security, you must implement your own security layer in front of the Redis server, and all
client requests should pass through this additional layer. Redis should not be directly exposed to untrusted or
unauthenticated clients.
You can restrict access to commands by disabling them or renaming them (and by providing only privileged
clients with the new names).
Redis does not directly support any form of data encryption, so all encoding must be performed by client
applications. Additionally, Redis does not provide any form of transport security. If you need to protect data as it
flows across the network, we recommend implementing an SSL proxy.
For more information, visit the Redis security page on the Redis website.

NOTE
Azure Cache for Redis provides its own security layer through which clients connect. The underlying Redis servers are not
exposed to the public network.

Azure Redis cache


Azure Cache for Redis provides access to Redis servers that are hosted at an Azure datacenter. It acts as a façade
that provides access control and security. You can provision a cache by using the Azure portal.
The portal provides a number of predefined configurations. These range from a 53 GB cache running as a
dedicated service that supports SSL communications (for privacy) and primary/subordinate replication with an
SLA of 99.9% availability, down to a 250 MB cache without replication (no availability guarantees) running on
shared hardware.
Using the Azure portal, you can also configure the eviction policy of the cache, and control access to the cache by
adding users to the roles provided. These roles, which define the operations that members can perform, include
Owner, Contributor, and Reader. For example, members of the Owner role have complete control over the cache
(including security) and its contents, members of the Contributor role can read and write information in the cache,
and members of the Reader role can only retrieve data from the cache.
Most administrative tasks are performed through the Azure portal. For this reason, many of the administrative
commands that are available in the standard version of Redis are not available, including the ability to modify the
configuration programmatically, shut down the Redis server, configure additional subordinates, or forcibly save
data to disk.
The Azure portal includes a convenient graphical display that enables you to monitor the performance of the
cache. For example, you can view the number of connections being made, the number of requests being
performed, the volume of reads and writes, and the number of cache hits versus cache misses. Using this
information, you can determine the effectiveness of the cache and if necessary, switch to a different configuration
or change the eviction policy.
Additionally, you can create alerts that send email messages to an administrator if one or more critical metrics fall
outside of an expected range. For example, you might want to alert an administrator if the number of cache
misses exceeds a specified value in the last hour, because it means the cache might be too small or data might be
being evicted too quickly.
You can also monitor the CPU, memory, and network usage for the cache.
For further information and examples showing how to create and configure an Azure Cache for Redis, visit the
page Lap around Azure Cache for Redis on the Azure blog.

Caching session state and HTML output


If you're building ASP.NET web applications that run by using Azure web roles, you can save session state
information and HTML output in an Azure Cache for Redis. The session state provider for Azure Cache for Redis
enables you to share session information between different instances of an ASP.NET web application, and is very
useful in web farm situations where client-server affinity is not available and caching session data in-memory
would not be appropriate.
Using the session state provider with Azure Cache for Redis delivers several benefits, including:
Sharing session state with a large number of instances of ASP.NET web applications.
Providing improved scalability.
Supporting controlled, concurrent access to the same session state data for multiple readers and a single
writer.
Using compression to save memory and improve network performance.
For more information, see ASP.NET session state provider for Azure Cache for Redis.

NOTE
Do not use the session state provider for Azure Cache for Redis with ASP.NET applications that run outside of the Azure
environment. The latency of accessing the cache from outside of Azure can eliminate the performance benefits of caching
data.

Similarly, the output cache provider for Azure Cache for Redis enables you to save the HTTP responses generated
by an ASP.NET web application. Using the output cache provider with Azure Cache for Redis can improve the
response times of applications that render complex HTML output. Application instances that generate similar
responses can use the shared output fragments in the cache rather than generating this HTML output afresh. For
more information, see ASP.NET output cache provider for Azure Cache for Redis.

Building a custom Redis cache


Azure Cache for Redis acts as a façade to the underlying Redis servers. If you require an advanced configuration
that is not covered by the Azure Redis cache (such as a cache bigger than 53 GB) you can build and host your own
Redis servers by using Azure virtual machines.
This is a potentially complex process because you might need to create several VMs to act as primary and
subordinate nodes if you want to implement replication. Furthermore, if you wish to create a cluster, then you
need multiple primary and subordinate servers. A minimal clustered replication topology that provides a high
degree of availability and scalability comprises at least six VMs organized as three pairs of primary/subordinate
servers (a cluster must contain at least three primary nodes).
Each primary/subordinate pair should be located close together to minimize latency. However, each set of pairs
can be running in different Azure datacenters located in different regions, if you wish to locate cached data close
to the applications that are most likely to use it. For an example of building and configuring a Redis node running
as an Azure VM, see Running Redis on a CentOS Linux VM in Azure.

NOTE
If you implement your own Redis cache in this way, you are responsible for monitoring, managing, and securing the service.

Partitioning a Redis cache


Partitioning the cache involves splitting the cache across multiple computers. This structure gives you several
advantages over using a single cache server, including:
Creating a cache that is much bigger than can be stored on a single server.
Distributing data across servers, improving availability. If one server fails or becomes inaccessible, the data
that it holds is unavailable, but the data on the remaining servers can still be accessed. For a cache, this is not
crucial because the cached data is only a transient copy of the data that's held in a database. Cached data on a
server that becomes inaccessible can be cached on a different server instead.
Spreading the load across servers, thereby improving performance and scalability.
Geolocating data close to the users that access it, thus reducing latency.
For a cache, the most common form of partitioning is sharding. In this strategy, each partition (or shard) is a Redis
cache in its own right. Data is directed to a specific partition by using sharding logic, which can use a variety of
approaches to distribute the data. The Sharding pattern provides more information about implementing sharding.
To implement partitioning in a Redis cache, you can take one of the following approaches:
Server-side query routing. In this technique, a client application sends a request to any of the Redis servers
that comprise the cache (probably the closest server). Each Redis server stores metadata that describes the
partition that it holds, and also contains information about which partitions are located on other servers. The
Redis server examines the client request. If it can be resolved locally, it will perform the requested operation.
Otherwise it will forward the request on to the appropriate server. This model is implemented by Redis
clustering, and is described in more detail on the Redis cluster tutorial page on the Redis website. Redis
clustering is transparent to client applications, and additional Redis servers can be added to the cluster (and
the data re-partitioned) without requiring that you reconfigure the clients.
Client-side partitioning. In this model, the client application contains logic (possibly in the form of a library)
that routes requests to the appropriate Redis server. This approach can be used with Azure Cache for Redis.
Create multiple Azure Cache for Redis instances (one for each data partition) and implement the client-side logic
that routes the requests to the correct cache. If the partitioning scheme changes (if additional Azure Cache for
Redis instances are created, for example), client applications might need to be reconfigured.
Proxy-assisted partitioning. In this scheme, client applications send requests to an intermediary proxy service
which understands how the data is partitioned and then routes the request to the appropriate Redis server.
This approach can also be used with Azure Cache for Redis; the proxy service can be implemented as an Azure
cloud service. This approach requires an additional level of complexity to implement the service, and requests
might take longer to perform than using client-side partitioning.
The page Partitioning: how to split data among multiple Redis instances on the Redis website provides further
information about implementing partitioning with Redis.
Implement Redis cache client applications
Redis supports client applications written in numerous programming languages. If you are building new
applications by using the .NET Framework, the recommended approach is to use the StackExchange.Redis client
library. This library provides a .NET Framework object model that abstracts the details for connecting to a Redis
server, sending commands, and receiving responses. It is available in Visual Studio as a NuGet package. You can
use this same library to connect to an Azure Cache for Redis, or a custom Redis cache hosted on a VM.
To connect to a Redis server you use the static Connect method of the ConnectionMultiplexer class. The
connection that this method creates is designed to be used throughout the lifetime of the client application, and
the same connection can be used by multiple concurrent threads. Do not reconnect and disconnect each time you
perform a Redis operation because this can degrade performance.
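One common way to follow this guidance is to create the ConnectionMultiplexer lazily and share the single instance across all threads for the lifetime of the application. This is a sketch; the endpoint and access key in the connection string are placeholders:

```csharp
using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // A single ConnectionMultiplexer, created on first use and then reused.
    // Lazy<T> ensures thread-safe, one-time initialization.
    private static readonly Lazy<ConnectionMultiplexer> lazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(
                "<your DNS name>.redis.cache.windows.net,password=<access key>"));

    public static ConnectionMultiplexer Connection => lazyConnection.Value;
}
```

Callers obtain an IDatabase from RedisConnection.Connection.GetDatabase() wherever a cache operation is needed, without creating new connections.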
You can specify the connection parameters, such as the address of the Redis host and the password. If you are
using Azure Cache for Redis, the password is either the primary or secondary key that is generated for Azure
Cache for Redis by using the Azure portal.
After you have connected to the Redis server, you can obtain a handle on the Redis database that acts as the cache.
The Redis connection provides the GetDatabase method to do this. You can then retrieve items from the cache and
store data in the cache by using the StringGet and StringSet methods. These methods expect a key as a
parameter, and either return the item in the cache that has a matching key (StringGet) or add the item to the
cache with this key (StringSet).
Depending on the location of the Redis server, many operations might incur some latency while a request is
transmitted to the server and a response is returned to the client. The StackExchange library provides
asynchronous versions of many of the methods that it exposes to help client applications remain responsive.
These methods support the Task-based Asynchronous pattern in the .NET Framework.
The following code snippet shows a method named RetrieveItem . It illustrates an implementation of the cache-
aside pattern based on Redis and the StackExchange library. The method takes a string key value and attempts to
retrieve the corresponding item from the Redis cache by calling the StringGetAsync method (the asynchronous
version of StringGet ).
If the item is not found, it is fetched from the underlying data source using the GetItemFromDataSourceAsync
method (which is a local method and not part of the StackExchange library). It's then added to the cache by using
the StringSetAsync method so it can be retrieved more quickly next time.

// Connect to the Azure Cache for Redis
ConfigurationOptions config = new ConfigurationOptions();
config.EndPoints.Add("<your DNS name>.redis.cache.windows.net");
config.Password = "<Redis cache key from management portal>";
ConnectionMultiplexer redisHostConnection = ConnectionMultiplexer.Connect(config);
IDatabase cache = redisHostConnection.GetDatabase();
...
private async Task<string> RetrieveItem(string itemKey)
{
    // Attempt to retrieve the item from the Redis cache
    string itemValue = await cache.StringGetAsync(itemKey);

    // If the value returned is null, the item was not found in the cache,
    // so retrieve the item from the data source and add it to the cache
    if (itemValue == null)
    {
        itemValue = await GetItemFromDataSourceAsync(itemKey);
        await cache.StringSetAsync(itemKey, itemValue);
    }

    // Return the item
    return itemValue;
}

The StringGet and StringSet methods are not restricted to retrieving or storing string values. They can take any
item that is serialized as an array of bytes. If you need to save a .NET object, you can serialize it as a byte stream
and use the StringSet method to write it to the cache.
Similarly, you can read an object from the cache by using the StringGet method and deserializing it as a .NET
object. The following code shows a set of extension methods for the IDatabase interface (the GetDatabase method
of a Redis connection returns an IDatabase object), and some sample code that uses these methods to read and
write a BlogPost object to the cache:
public static class RedisCacheExtensions
{
    public static async Task<T> GetAsync<T>(this IDatabase cache, string key)
    {
        return Deserialize<T>(await cache.StringGetAsync(key));
    }

    public static async Task<object> GetAsync(this IDatabase cache, string key)
    {
        return Deserialize<object>(await cache.StringGetAsync(key));
    }

    public static async Task SetAsync(this IDatabase cache, string key, object value)
    {
        await cache.StringSetAsync(key, Serialize(value));
    }

    static byte[] Serialize(object o)
    {
        byte[] objectDataAsStream = null;

        if (o != null)
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            using (MemoryStream memoryStream = new MemoryStream())
            {
                binaryFormatter.Serialize(memoryStream, o);
                objectDataAsStream = memoryStream.ToArray();
            }
        }

        return objectDataAsStream;
    }

    static T Deserialize<T>(byte[] stream)
    {
        T result = default(T);

        if (stream != null)
        {
            BinaryFormatter binaryFormatter = new BinaryFormatter();
            using (MemoryStream memoryStream = new MemoryStream(stream))
            {
                result = (T)binaryFormatter.Deserialize(memoryStream);
            }
        }

        return result;
    }
}

The following code illustrates a method named RetrieveBlogPost that uses these extension methods to read and
write a serializable BlogPost object to the cache following the cache-aside pattern:
// The BlogPost type
[Serializable]
public class BlogPost
{
    private HashSet<string> tags;

    public BlogPost(int id, string title, int score, IEnumerable<string> tags)
    {
        this.Id = id;
        this.Title = title;
        this.Score = score;
        this.tags = new HashSet<string>(tags);
    }

    public int Id { get; set; }
    public string Title { get; set; }
    public int Score { get; set; }
    public ICollection<string> Tags => this.tags;
}
...
private async Task<BlogPost> RetrieveBlogPost(string blogPostKey)
{
    BlogPost blogPost = await cache.GetAsync<BlogPost>(blogPostKey);
    if (blogPost == null)
    {
        blogPost = await GetBlogPostFromDataSourceAsync(blogPostKey);
        await cache.SetAsync(blogPostKey, blogPost);
    }

    return blogPost;
}

Redis supports command pipelining if a client application sends multiple asynchronous requests. Redis can
multiplex the requests using the same connection rather than receiving and responding to commands in a strict
sequence.
This approach helps to reduce latency by making more efficient use of the network. The following code snippet
shows an example that retrieves the details of two customers concurrently. The code submits two requests and
then performs some other processing (not shown) before waiting to receive the results. The Wait method of the
cache object is similar to the .NET Framework Task.Wait method:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
var task1 = cache.StringGetAsync("customer:1");
var task2 = cache.StringGetAsync("customer:2");
...
var customer1 = cache.Wait(task1);
var customer2 = cache.Wait(task2);

For additional information on writing client applications that can use the Azure Cache for Redis, see the Azure
Cache for Redis documentation. More information is also available at StackExchange.Redis.
The page Pipelines and multiplexers on the same website provides more information about asynchronous
operations and pipelining with Redis and the StackExchange library.

Using Redis caching


The simplest use of Redis for caching is key-value pairs where the value is an uninterpreted string of
arbitrary length that can contain any binary data. (It is essentially an array of bytes that can be treated as a string.)
This scenario was illustrated in the section Implement Redis cache client applications earlier in this article.
Note that keys also contain uninterpreted data, so you can use any binary information as the key. The longer the
key is, however, the more space it will take to store, and the longer it will take to perform lookup operations. For
usability and ease of maintenance, design your keyspace carefully and use meaningful (but not verbose) keys.
For example, use structured keys such as "customer:100" to represent the key for the customer with ID 100 rather
than simply "100". This scheme enables you to easily distinguish between values that store different data types.
For example, you could also use the key "orders:100" to represent the key for the order with ID 100.
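A small sketch of this convention (the helper names here are hypothetical, not part of any library) centralizes key construction so the whole application builds keys consistently:

```csharp
using System;

// Hypothetical helpers that centralize the "type:id" keyspace convention
// described above, so every part of the application builds keys the same way.
public static class CacheKeys
{
    public static string Customer(int id) => $"customer:{id}";
    public static string Order(int id) => $"order:{id}";
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CacheKeys.Customer(100)); // prints "customer:100"
        Console.WriteLine(CacheKeys.Order(100));    // prints "order:100"
    }
}
```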
Apart from one-dimensional binary strings, a value in a Redis key-value pair can also hold more structured
information, including lists, sets (sorted and unsorted), and hashes. Redis provides a comprehensive command set
that can manipulate these types, and many of these commands are available to .NET Framework applications
through a client library such as StackExchange. The page An introduction to Redis data types and abstractions on
the Redis website provides a more detailed overview of these types and the commands that you can use to
manipulate them.
This section summarizes some common use cases for these data types and commands.
Perform atomic and batch operations
Redis supports a series of atomic get-and-set operations on string values. These operations remove the possible
race hazards that might occur when using separate GET and SET commands. The operations that are available
include:
INCR, INCRBY, DECR, and DECRBY, which perform atomic increment and decrement operations on integer
numeric data values. The StackExchange library provides overloaded versions of the
IDatabase.StringIncrementAsync and IDatabase.StringDecrementAsync methods to perform these
operations and return the resulting value that is stored in the cache. The following code snippet illustrates
how to use these methods:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
await cache.StringSetAsync("data:counter", 99);
...
// Increment by 1 (the default); the method returns the resulting value
long incrementedValue = await cache.StringIncrementAsync("data:counter");
// incrementedValue should be 100

// Decrement by 50
long decrementedValue = await cache.StringDecrementAsync("data:counter", 50);
// decrementedValue should be 50

GETSET, which retrieves the value that's associated with a key and changes it to a new value. The
StackExchange library makes this operation available through the IDatabase.StringGetSetAsync method.
The code snippet below shows an example of this method. This code returns the current value that's
associated with the key "data:counter" from the previous example. Then it resets the value for this key back
to zero, all as part of the same operation:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
string oldValue = await cache.StringGetSetAsync("data:counter", 0);

MGET and MSET, which can return or change a set of string values as a single operation. The
IDatabase.StringGetAsync and IDatabase.StringSetAsync methods are overloaded to support this
functionality, as shown in the following example:
ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
// Create a list of key-value pairs
var keysAndValues =
    new List<KeyValuePair<RedisKey, RedisValue>>()
    {
        new KeyValuePair<RedisKey, RedisValue>("data:key1", "value1"),
        new KeyValuePair<RedisKey, RedisValue>("data:key99", "value2"),
        new KeyValuePair<RedisKey, RedisValue>("data:key322", "value3")
    };

// Store the list of key-value pairs in the cache
cache.StringSet(keysAndValues.ToArray());
...
// Find all values that match a list of keys
RedisKey[] keys = { "data:key1", "data:key99", "data:key322" };
// values should contain { "value1", "value2", "value3" }
RedisValue[] values = cache.StringGet(keys);

You can also combine multiple operations into a single Redis transaction as described in the Redis transactions
and batches section earlier in this article. The StackExchange library provides support for transactions through the
ITransaction interface.

You create an ITransaction object by using the IDatabase.CreateTransaction method. You invoke commands to
the transaction by using the methods provided by the ITransaction object.
The ITransaction interface provides access to a set of methods similar to those of the IDatabase interface,
except that all the methods are asynchronous: the commands are only performed when the
ITransaction.Execute method is invoked. The value that's returned by the ITransaction.Execute method
indicates whether the transaction completed successfully (true) or failed (false).
The following code snippet shows an example that increments and decrements two counters as part of the same
transaction:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
ITransaction transaction = cache.CreateTransaction();
var tx1 = transaction.StringIncrementAsync("data:counter1");
var tx2 = transaction.StringDecrementAsync("data:counter2");
bool result = transaction.Execute();
Console.WriteLine("Transaction {0}", result ? "succeeded" : "failed");
Console.WriteLine("Result of increment: {0}", tx1.Result);
Console.WriteLine("Result of decrement: {0}", tx2.Result);

Remember that Redis transactions are unlike transactions in relational databases. The Execute method simply
queues all the commands that comprise the transaction to be run, and if any of them is malformed then the
transaction is stopped. If all the commands have been queued successfully, each command runs asynchronously.
If any command fails, the others still continue processing. If you need to verify that a command has completed
successfully, you must fetch the results of the command by using the Result property of the corresponding task,
as shown in the example above. Reading the Result property will block the calling thread until the task has
completed.
For more information, see Transactions in Redis.
When performing batch operations, you can use the IBatch interface of the StackExchange library. This interface
provides access to a set of methods similar to those accessed by the IDatabase interface, except that all the
methods are asynchronous.
You create an IBatch object by using the IDatabase.CreateBatch method, and then run the batch by using the
IBatch.Execute method, as shown in the following example. This code simply sets a string value, increments and
decrements the same counters used in the previous example, and displays the results:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
IBatch batch = cache.CreateBatch();
batch.StringSetAsync("data:key1", 11);
var t1 = batch.StringIncrementAsync("data:counter1");
var t2 = batch.StringDecrementAsync("data:counter2");
batch.Execute();
Console.WriteLine("{0}", t1.Result);
Console.WriteLine("{0}", t2.Result);

It is important to understand that unlike a transaction, if a command in a batch fails because it is malformed, the
other commands might still run. The IBatch.Execute method does not return any indication of success or failure.
Perform fire and forget cache operations
Redis supports fire and forget operations by using command flags. In this situation, the client simply initiates an
operation but has no interest in the result and does not wait for the command to be completed. The example
below shows how to perform the INCR command as a fire and forget operation:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
await cache.StringSetAsync("data:key1", 99);
...
cache.StringIncrement("data:key1", flags: CommandFlags.FireAndForget);

Specify automatically expiring keys


When you store an item in a Redis cache, you can specify a timeout after which the item will be automatically
removed from the cache. You can also query how much more time a key has before it expires by using the TTL
command. This command is available to StackExchange applications by using the IDatabase.KeyTimeToLive
method.
The following code snippet shows how to set an expiration time of 20 seconds on a key, and query the remaining
lifetime of the key:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
// Add a key with an expiration time of 20 seconds
await cache.StringSetAsync("data:key1", 99, TimeSpan.FromSeconds(20));
...
// Query how much time a key has left to live
// If the key has already expired, the KeyTimeToLive function returns a null
TimeSpan? expiry = cache.KeyTimeToLive("data:key1");

You can also set the expiration time to a specific date and time by using the EXPIREAT command, which is available in
the StackExchange library as the KeyExpireAsync method:
ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
// Add a key with an expiration date of midnight on 1st January 2015
await cache.StringSetAsync("data:key1", 99);
await cache.KeyExpireAsync("data:key1",
    new DateTime(2015, 1, 1, 0, 0, 0, DateTimeKind.Utc));
...

TIP
You can manually remove an item from the cache by using the DEL command, which is available through the StackExchange
library as the IDatabase.KeyDeleteAsync method.

Use tags to cross-correlate cached items


A Redis set is a collection of multiple items that share a single key. You can create a set by using the SADD
command. You can retrieve the items in a set by using the SMEMBERS command. The StackExchange library
implements the SADD command with the IDatabase.SetAddAsync method, and the SMEMBERS command with the
IDatabase.SetMembersAsync method.

You can also combine existing sets to create new sets by using the SDIFF (set difference), SINTER (set intersection),
and SUNION (set union) commands. The StackExchange library unifies these operations in the
IDatabase.SetCombineAsync method. The first parameter to this method specifies the set operation to perform.

The following code snippets show how sets can be useful for quickly storing and retrieving collections of related
items. This code uses the BlogPost type that was described in the section Implement Redis Cache Client
Applications earlier in this article.
A BlogPost object contains four fields—an ID, a title, a ranking score, and a collection of tags. The first code
snippet below shows the sample data that's used for populating a C# list of BlogPost objects:
List<string[]> tags = new List<string[]>
{
    new[] { "iot", "csharp" },
    new[] { "iot", "azure", "csharp" },
    new[] { "csharp", "git", "big data" },
    new[] { "iot", "git", "database" },
    new[] { "database", "git" },
    new[] { "csharp", "database" },
    new[] { "iot" },
    new[] { "iot", "database", "git" },
    new[] { "azure", "database", "big data", "git", "csharp" },
    new[] { "azure" }
};

List<BlogPost> posts = new List<BlogPost>();
int blogKey = 1;
int numberOfPosts = 20;
Random random = new Random();
for (int i = 0; i < numberOfPosts; i++)
{
    blogKey++;
    posts.Add(new BlogPost(
        blogKey,                        // Blog post ID
        string.Format(CultureInfo.InvariantCulture, "Blog Post #{0}",
            blogKey),                   // Blog post title
        random.Next(100, 10000),        // Ranking score
        tags[i % tags.Count]));         // Tags--assigned from a collection
                                        // in the tags list
}

You can store the tags for each BlogPost object as a set in a Redis cache and associate each set with the ID of the
BlogPost . This enables an application to quickly find all the tags that belong to a specific blog post. To enable
searching in the opposite direction and find all blog posts that share a specific tag, you can create another set that
holds the blog posts referencing the tag ID in the key:

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
// Tags are easily represented as Redis Sets
foreach (BlogPost post in posts)
{
    string redisKey = string.Format(CultureInfo.InvariantCulture,
        "blog:posts:{0}:tags", post.Id);
    // Add tags to the blog post in Redis
    await cache.SetAddAsync(
        redisKey, post.Tags.Select(s => (RedisValue)s).ToArray());

    // Now do the inverse so we can figure out which blog posts have a given tag
    foreach (var tag in post.Tags)
    {
        await cache.SetAddAsync(string.Format(CultureInfo.InvariantCulture,
            "tag:{0}:blog:posts", tag), post.Id);
    }
}

These structures enable you to perform many common queries very efficiently. For example, you can find and
display all of the tags for blog post 1 like this:
// Show the tags for blog post #1
foreach (var value in await cache.SetMembersAsync("blog:posts:1:tags"))
{
    Console.WriteLine(value);
}

You can find all tags that are common to blog post 1 and blog post 2 by performing a set intersection operation,
as follows:

// Show the tags in common for blog posts #1 and #2
foreach (var value in await cache.SetCombineAsync(SetOperation.Intersect,
    new RedisKey[] { "blog:posts:1:tags", "blog:posts:2:tags" }))
{
    Console.WriteLine(value);
}

And you can find all blog posts that contain a specific tag:

// Show the ids of the blog posts that have the tag "iot".
foreach (var value in await cache.SetMembersAsync("tag:iot:blog:posts"))
{
    Console.WriteLine(value);
}

Find recently accessed items


A common task required of many applications is to find the most recently accessed items. For example, a blogging
site might want to display information about the most recently read blog posts.
You can implement this functionality by using a Redis list. A Redis list contains multiple items that share the same
key. The list acts as a double-ended queue. You can push items to either end of the list by using the LPUSH (left
push) and RPUSH (right push) commands. You can retrieve items from either end of the list by using the LPOP
and RPOP commands. You can also return a range of elements by using the LRANGE command.
The code snippets below show how you can perform these operations by using the StackExchange library. This
code uses the BlogPost type from the previous examples. As a blog post is read by a user, the
IDatabase.ListLeftPushAsync method pushes the title of the blog post onto a list that's associated with the key
"blog:recent_posts" in the Redis cache.

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
string redisKey = "blog:recent_posts";
BlogPost blogPost = ...; // Reference to the blog post that has just been read
await cache.ListLeftPushAsync(
    redisKey, blogPost.Title); // Push the blog post onto the list

As more blog posts are read, their titles are pushed onto the same list. The list is ordered by the sequence in which
the titles have been added. The most recently read blog posts are toward the left end of the list. (If the same blog
post is read more than once, it will have multiple entries in the list.)
You can display the titles of the most recently read posts by using the IDatabase.ListRangeAsync method. This method
takes the key that contains the list, a starting point, and an ending point. The following code retrieves the titles of
the 10 blog posts (items from 0 to 9) at the left-most end of the list:
// Show latest ten posts
foreach (string postTitle in await cache.ListRangeAsync(redisKey, 0, 9))
{
    Console.WriteLine(postTitle);
}

Note that the ListRangeAsync method does not remove items from the list. To do this, you can use the
IDatabase.ListLeftPopAsync and IDatabase.ListRightPopAsync methods.
To prevent the list from growing indefinitely, you can periodically cull items by trimming the list. The code snippet
below shows you how to remove all but the five left-most items from the list (the LTRIM range bounds are inclusive):

await cache.ListTrimAsync(redisKey, 0, 4);

Implement a leaderboard


By default, the items in a set are not held in any specific order. You can create a sorted set by using the ZADD
command (the IDatabase.SortedSetAddAsync method in the StackExchange library). The items are ordered by using a
numeric value called a score, which is provided as a parameter to the command.
The following code snippet adds the title of a blog post to an ordered list. In this example, each blog post also has
a score field that contains the ranking of the blog post.

ConnectionMultiplexer redisHostConnection = ...;
IDatabase cache = redisHostConnection.GetDatabase();
...
string redisKey = "blog:post_rankings";
BlogPost blogPost = ...; // Reference to a blog post that has just been rated
await cache.SortedSetAddAsync(redisKey, blogPost.Title, blogPost.Score);

You can retrieve the blog post titles and scores in ascending score order by using the
IDatabase.SortedSetRangeByRankWithScoresAsync method:

foreach (var post in await cache.SortedSetRangeByRankWithScoresAsync(redisKey))
{
    Console.WriteLine(post);
}

NOTE
The StackExchange library also provides the IDatabase.SortedSetRangeByRankAsync method, which returns the data in
score order, but does not return the scores.

You can also retrieve items in descending order of scores, and limit the number of items that are returned by
providing additional parameters to the IDatabase.SortedSetRangeByRankWithScoresAsync method. The next example
displays the titles and scores of the top 10 ranked blog posts:

foreach (var post in await cache.SortedSetRangeByRankWithScoresAsync(
    redisKey, 0, 9, Order.Descending))
{
    Console.WriteLine(post);
}

The next example uses the IDatabase.SortedSetRangeByScoreWithScoresAsync method, which you can use to limit
the items that are returned to those that fall within a given score range:

// Blog posts with scores between 5000 and 100000


foreach (var post in await cache.SortedSetRangeByScoreWithScoresAsync(
redisKey, 5000, 100000))
{
Console.WriteLine(post);
}

Message by using channels


Apart from acting as a data cache, a Redis server provides messaging through a high-performance
publisher/subscriber mechanism. Client applications can subscribe to a channel, and other applications or services
can publish messages to the channel. Subscribing applications will then receive these messages and can process
them.
Redis provides the SUBSCRIBE command for client applications to use to subscribe to channels. This command
expects the name of one or more channels on which the application will accept messages. The StackExchange
library includes the ISubscription interface, which enables a .NET Framework application to subscribe and
publish to channels.
You create an ISubscription object by using the GetSubscriber method of the connection to the Redis server.
Then you listen for messages on a channel by using the SubscribeAsync method of this object. The following code
example shows how to subscribe to a channel named "messages:blogPosts":

ConnectionMultiplexer redisHostConnection = ...;
ISubscriber subscriber = redisHostConnection.GetSubscriber();
...
await subscriber.SubscribeAsync("messages:blogPosts",
    (channel, message) => Console.WriteLine("Title is: {0}", message));

The first parameter to the SubscribeAsync method is the name of the channel. This name follows the same conventions
that are used by keys in the cache. The name can contain any binary data, although it is advisable to use relatively
short, meaningful strings to help ensure good performance and maintainability.
Note also that the namespace used by channels is separate from that used by keys. This means you can have
channels and keys that have the same name, although this may make your application code more difficult to
maintain.
The second parameter is an Action delegate. This delegate runs asynchronously whenever a new message
appears on the channel. This example simply displays the message on the console (the message will contain the
title of a blog post).
To publish to a channel, an application can use the Redis PUBLISH command. The StackExchange library provides
the ISubscriber.PublishAsync method to perform this operation. The next code snippet shows how to publish a
message to the "messages:blogPosts" channel:

ConnectionMultiplexer redisHostConnection = ...;
ISubscriber subscriber = redisHostConnection.GetSubscriber();
...
BlogPost blogPost = ...;
await subscriber.PublishAsync("messages:blogPosts", blogPost.Title);

There are several points you should understand about the publish/subscribe mechanism:
Multiple subscribers can subscribe to the same channel, and they will all receive the messages that are
published to that channel.
Subscribers only receive messages that have been published after they have subscribed. Channels are not
buffered, and once a message is published, the Redis infrastructure pushes the message to each subscriber and
then removes it.
By default, messages are received by subscribers in the order in which they are sent. In a highly active system
with a large number of messages and many subscribers and publishers, guaranteed sequential delivery of
messages can slow performance of the system. If each message is independent and the order is unimportant,
you can enable concurrent processing by the Redis system, which can help to improve responsiveness. You can
achieve this in a StackExchange client by setting the PreserveAsyncOrder property of the connection used by the
subscriber to false:

ConnectionMultiplexer redisHostConnection = ...;
redisHostConnection.PreserveAsyncOrder = false;
ISubscriber subscriber = redisHostConnection.GetSubscriber();

Serialization considerations
When you choose a serialization format, consider tradeoffs between performance, interoperability, versioning,
compatibility with existing systems, data compression, and memory overhead. When you are evaluating
performance, remember that benchmarks are highly dependent on context. They may not reflect your actual
workload, and may not consider newer libraries or versions. There is no single "fastest" serializer for all scenarios.
Some options to consider include:
Protocol Buffers (also called protobuf) is a serialization format developed by Google for serializing
structured data efficiently. It uses strongly typed definition files to define message structures. These
definition files are then compiled to language-specific code for serializing and deserializing messages.
Protobuf can be used over existing RPC mechanisms, or it can generate an RPC service.
Apache Thrift uses a similar approach, with strongly typed definition files and a compilation step to
generate the serialization code and RPC services.
Apache Avro provides similar functionality to Protocol Buffers and Thrift, but there is no compilation step.
Instead, serialized data always includes a schema that describes the structure.
JSON is an open standard that uses human-readable text fields. It has broad cross-platform support. JSON
does not use message schemas. Being a text-based format, it is not very efficient over the wire. In some
cases, however, you may be returning cached items directly to a client via HTTP, in which case storing JSON
could save the cost of deserializing from another format and then serializing to JSON.
BSON is a binary serialization format that uses a structure similar to JSON. BSON was designed to be
lightweight, easy to scan, and fast to serialize and deserialize, relative to JSON. Payloads are comparable in
size to JSON. Depending on the data, a BSON payload may be smaller or larger than a JSON payload.
BSON has some additional data types that are not available in JSON, notably BinData (for byte arrays) and
Date.
MessagePack is a binary serialization format that is designed to be compact for transmission over the wire.
There are no message schemas or message type checking.
Bond is a cross-platform framework for working with schematized data. It supports cross-language
serialization and deserialization. Notable differences from other systems listed here are support for
inheritance, type aliases, and generics.
gRPC is an open-source RPC system developed by Google. By default, it uses Protocol Buffers as its
definition language and underlying message interchange format.
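To make the JSON option concrete, the following sketch serializes an object to a string that could then be stored with a call such as StringSetAsync and deserialized on retrieval. It assumes a runtime where System.Text.Json is available; the BlogPostDto type and the commented-out cache call are illustrative, not part of the article's sample:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative DTO mirroring the BlogPost type used earlier in this article.
public class BlogPostDto
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int Score { get; set; }
    public List<string> Tags { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var post = new BlogPostDto
        {
            Id = 1,
            Title = "Blog Post #1",
            Score = 500,
            Tags = new List<string> { "iot", "csharp" }
        };

        // Serialize before caching, for example:
        // await cache.StringSetAsync("blog:posts:1", json);
        string json = JsonSerializer.Serialize(post);

        // Deserialize the string retrieved from the cache.
        BlogPostDto restored = JsonSerializer.Deserialize<BlogPostDto>(json);
        Console.WriteLine(restored.Title); // prints "Blog Post #1"
    }
}
```

Binary formats such as protobuf or MessagePack follow the same store-and-retrieve pattern but produce byte arrays, which Redis string values can hold directly.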

Related patterns and guidance


The following patterns might also be relevant to your scenario when you implement caching in your applications:
Cache-aside pattern: This pattern describes how to load data on demand into a cache from a data store.
This pattern also helps to maintain consistency between data that's held in the cache and the data in the
original data store.
The Sharding pattern provides information about implementing horizontal partitioning to help improve
scalability when storing and accessing large volumes of data.

More information
Azure Cache for Redis documentation
Azure Cache for Redis FAQ
Task-based Asynchronous pattern
Redis documentation
StackExchange.Redis
Data partitioning guide
Best practices for using content delivery networks
(CDNs)
12/18/2020 • 8 minutes to read

A content delivery network (CDN) is a distributed network of servers that can efficiently deliver web content to
users. CDNs store cached content on edge servers that are close to end users to minimize latency.
CDNs are typically used to deliver static content such as images, style sheets, documents, client-side scripts, and
HTML pages. The major advantages of using a CDN are lower latency and faster delivery of content to users,
regardless of their geographical location in relation to the datacenter where the application is hosted. CDNs can
also help to reduce load on a web application, because the application does not have to service requests for the
content that is hosted in the CDN.

In Azure, the Azure Content Delivery Network is a global CDN solution for delivering high-bandwidth content that
is hosted in Azure or any other location. Using Azure CDN, you can cache publicly available objects loaded from
Azure blob storage, a web application, a virtual machine, or any publicly accessible web server.
This topic describes some general best practices and considerations when using a CDN. For more information, see
Azure CDN.

How and why a CDN is used


Typical uses for a CDN include:
Delivering static resources for client applications, often from a website. These resources can be images, style
sheets, documents, files, client-side scripts, HTML pages, HTML fragments, or any other content that the
server does not need to modify for each request. The application can create items at runtime and make
them available to the CDN (for example, by creating a list of current news headlines), but it does not do so
for each request.
Delivering public static and shared content to devices such as mobile phones and tablet computers. The
application itself is a web service that offers an API to clients running on the various devices. The CDN can
also deliver static datasets (via the web service) for the clients to use, perhaps to generate the client UI. For
example, the CDN could be used to distribute JSON or XML documents.
Serving entire websites that consist of only public static content to clients, without requiring any dedicated
compute resources.
Streaming video files to the client on demand. Video benefits from the low latency and reliable connectivity
available from the globally located datacenters that offer CDN connections. Microsoft Azure Media Services
(AMS) integrates with Azure CDN to deliver content directly to the CDN for further distribution. For more
information, see Streaming endpoints overview.
Generally improving the experience for users, especially those located far from the datacenter hosting the
application. These users might otherwise suffer higher latency. A large proportion of the total size of the
content in a web application is often static, and using the CDN can help to maintain performance and
overall user experience while eliminating the requirement to deploy the application to multiple datacenters.
For a list of Azure CDN node locations, see Azure CDN POP Locations.
Supporting IoT (Internet of Things) solutions. The huge numbers of devices and appliances involved in an
IoT solution could easily overwhelm an application if it had to distribute firmware updates directly to each
device.
Coping with peaks and surges in demand without requiring the application to scale, avoiding the
consequent increase in running costs. For example, when an update to an operating system is released for a
hardware device such as a specific model of router, or for a consumer device such as a smart TV, there will
be a huge peak in demand as it is downloaded by millions of users and devices over a short period.

Challenges
There are several challenges to take into account when planning to use a CDN.
Deployment . Decide the origin from which the CDN fetches the content, and whether you need to deploy
the content in more than one storage system. Take into account the process for deploying static content and
resources. For example, you may need to implement a separate step to load content into Azure blob
storage.
Versioning and cache-control . Consider how you will update static content and deploy new versions.
Understand how the CDN performs caching and time-to-live (TTL). For Azure CDN, see How caching works.
Testing . It can be difficult to perform local testing of your CDN settings when developing and testing an
application locally or in a staging environment.
Search engine optimization (SEO) . Content such as images and documents is served from a different
domain when you use the CDN. This can affect SEO for this content.
Content security . Not all CDNs offer access control for content. Some CDN services, including Azure CDN,
support token-based authentication to protect CDN content. For more information, see Securing Azure
Content Delivery Network assets with token authentication.
Client security . Clients might connect from an environment that does not allow access to resources on the
CDN. This could be a security-constrained environment that limits access to only a set of known sources, or
one that prevents loading of resources from anything other than the page origin. A fallback implementation
is required to handle these cases.
Resilience . The CDN is a potential single point of failure for an application.
Scenarios where a CDN may be less useful include:
If the content has a low hit rate, it might be accessed only a few times while it is valid (determined by its
time-to-live setting).
If the data is private, such as for large enterprises or supply chain ecosystems.

General guidelines and good practices


Using a CDN is a good way to minimize the load on your application, and maximize availability and performance.
Consider adopting this strategy for all of the appropriate content and resources your application uses. Consider
the points in the following sections when designing your strategy to use a CDN.
Deployment
Static content may need to be provisioned and deployed independently from the application if you do not include
it in the application deployment package or process. Consider how this will affect the versioning approach you use
to manage both the application components and the static resource content.
Consider using bundling and minification techniques to reduce load times for clients. Bundling combines multiple
files into a single file. Minification removes unnecessary characters from scripts and CSS files without altering
functionality.
If you need to deploy the content to an additional location, this will be an extra step in the deployment process. If
the application updates the content for the CDN, perhaps at regular intervals or in response to an event, it must
store the updated content in any additional locations as well as the endpoint for the CDN.
Consider how you will handle local development and testing when some static content is expected to be served
from a CDN. For example, you could predeploy the content to the CDN as part of your build script. Alternatively,
use compile directives or flags to control how the application loads the resources. For example, in debug mode, the
application could load static resources from a local folder. In release mode, the application would use the CDN.
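The debug/release switch described above can be sketched as follows. This is a minimal illustration; the `DEBUG` flag and the CDN hostname are assumed placeholders, not real configuration:

```python
import os

def asset_url(path: str) -> str:
    """Resolve a static asset URL: local folder in debug, CDN in release."""
    if os.environ.get("DEBUG") == "1":
        # Debug mode: serve from the local application folder.
        return f"/static/{path}"
    # Release mode: serve from the CDN endpoint (placeholder hostname).
    return f"https://mycdn.example.azureedge.net/static/{path}"
```

In ASP.NET and similar frameworks, the same effect is usually achieved with compile directives or environment-specific configuration rather than an environment variable.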
Consider the options for file compression, such as gzip (GNU zip). Compression may be performed on the origin
server by the web application hosting or directly on the edge servers by the CDN. For more information, see
Improve performance by compressing files in Azure CDN.
Routing and versioning
You may need to use different CDN instances at various times. For example, when you deploy a new version of the
application you may want to use a new CDN and retain the old CDN (holding content in an older format) for
previous versions. If you use Azure blob storage as the content origin, you can create a separate storage account
or a separate container and point the CDN endpoint to it.
Do not use the query string to denote different versions of the application in links to resources on the CDN
because, when retrieving content from Azure blob storage, the query string is part of the resource name (the blob
name). This approach can also affect how the client caches resources.
Deploying new versions of static content when you update an application can be a challenge if the previous
resources are cached on the CDN. For more information, see the section on cache control, below.
Consider restricting the CDN content access by country/region. Azure CDN allows you to filter requests based on
the country or region of origin and restrict the content delivered. For more information, see Restrict access to your
content by country/region.
Cache control
Consider how to manage caching within the system. For example, in Azure CDN, you can set global caching rules,
and then set custom caching for particular origin endpoints. You can also control how caching is performed in a
CDN by sending cache-directive headers at the origin.
For more information, see How caching works.
To prevent objects from being available on the CDN, you can delete them from the origin, remove or delete the
CDN endpoint, or in the case of blob storage, make the container or blob private. However, items are not removed
from the CDN until the time-to-live expires. You can also manually purge a CDN endpoint.
Security
The CDN can deliver content over HTTPS (SSL), by using the certificate provided by the CDN, as well as over
standard HTTP. To avoid browser warnings about mixed content, you might need to use HTTPS to request static
content that is displayed in pages loaded through HTTPS.
If you deliver static assets such as font files by using the CDN, you might encounter same-origin policy issues if
you use an XMLHttpRequest call to request these resources from a different domain. Many web browsers prevent
cross-origin resource sharing (CORS) unless the web server is configured to set the appropriate response headers.
You can configure the CDN to support CORS by using one of the following methods:
Configure the CDN to add CORS headers to the responses. For more information, see Using Azure CDN
with CORS.
If the origin is Azure blob storage, add CORS rules to the storage endpoint. For more information, see
Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services.
Configure the application to set the CORS headers. For example, see Enabling Cross-Origin Requests
(CORS) in the ASP.NET Core documentation.
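As a sketch of the third option, the application can attach the CORS response headers itself. The allowed origin below is a placeholder, and a real framework (for example, ASP.NET Core's CORS middleware) would also handle preflight `OPTIONS` requests:

```python
ALLOWED_ORIGINS = {"https://www.example.com"}  # placeholder origin list

def add_cors_headers(headers: dict, request_origin: str) -> dict:
    """Attach CORS headers so browsers allow cross-origin asset requests."""
    if request_origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = request_origin
        # Caches must vary on the requesting origin so a response tagged for
        # one origin is not served to a client from another origin.
        headers["Vary"] = "Origin"
    return headers
```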
CDN fallback
Consider how your application will cope with a failure or temporary unavailability of the CDN. Client applications
may be able to use copies of the resources that were cached locally (on the client) during previous requests, or you
can include code that detects failure and instead requests resources from the origin (the application folder or
Azure blob container that holds the resources) if the CDN is unavailable.
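A minimal fallback sketch, assuming placeholder CDN and origin URLs: try the CDN first, and request the same path from the origin if the CDN request fails.

```python
import urllib.error
import urllib.request

# Placeholder endpoints: the CDN is tried first, then the application origin.
SOURCES = ("https://mycdn.example.azureedge.net", "https://origin.example.com")

def fetch_resource(path: str) -> bytes:
    """Fetch a static resource, falling back to the origin if the CDN fails."""
    for base in SOURCES:
        try:
            with urllib.request.urlopen(f"{base}/{path}", timeout=5) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError):
            continue  # this source failed; try the next one
    raise RuntimeError(f"resource unavailable from all sources: {path}")
```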
Horizontal, vertical, and functional data partitioning
12/18/2020 • 17 minutes to read

In many large-scale solutions, data is divided into partitions that can be managed and accessed separately.
Partitioning can improve scalability, reduce contention, and optimize performance. It can also provide a
mechanism for dividing data by usage pattern. For example, you can archive older data in cheaper data storage.
However, the partitioning strategy must be chosen carefully to maximize the benefits while minimizing adverse
effects.

NOTE
In this article, the term partitioning means the process of physically dividing data into separate data stores. It is not the
same as SQL Server table partitioning.

Why partition data?


Improve scalability. When you scale up a single database system, it will eventually reach a physical
hardware limit. If you divide data across multiple partitions, each hosted on a separate server, you can
scale out the system almost indefinitely.
Improve performance. Data access operations on each partition take place over a smaller volume of
data. Correctly done, partitioning can make your system more efficient. Operations that affect more than
one partition can run in parallel.
Improve security. In some cases, you can separate sensitive and nonsensitive data into different
partitions and apply different security controls to the sensitive data.
Provide operational flexibility. Partitioning offers many opportunities for fine-tuning operations,
maximizing administrative efficiency, and minimizing cost. For example, you can define different strategies
for management, monitoring, backup and restore, and other administrative tasks based on the
importance of the data in each partition.
Match the data store to the pattern of use. Partitioning allows each partition to be deployed on a
different type of data store, based on cost and the built-in features that data store offers. For example,
large binary data can be stored in blob storage, while more structured data can be held in a document
database. See Choose the right data store.
Improve availability. Separating data across multiple servers avoids a single point of failure. If one
instance fails, only the data in that partition is unavailable. Operations on other partitions can continue.
For managed PaaS data stores, this consideration is less relevant, because these services are designed
with built-in redundancy.

Designing partitions
There are three typical strategies for partitioning data:
Horizontal partitioning (often called sharding). In this strategy, each partition is a separate data store,
but all partitions have the same schema. Each partition is known as a shard and holds a specific subset of
the data, such as all the orders for a specific set of customers.
Vertical partitioning. In this strategy, each partition holds a subset of the fields for items in the data
store. The fields are divided according to their pattern of use. For example, frequently accessed fields
might be placed in one vertical partition and less frequently accessed fields in another.
Functional partitioning. In this strategy, data is aggregated according to how it is used by each
bounded context in the system. For example, an e-commerce system might store invoice data in one
partition and product inventory data in another.
These strategies can be combined, and we recommend that you consider them all when you design a
partitioning scheme. For example, you might divide data into shards and then use vertical partitioning to further
subdivide the data in each shard.
Horizontal partitioning (sharding)
Figure 1 shows horizontal partitioning or sharding. In this example, product inventory data is divided into shards
based on the product key. Each shard holds the data for a contiguous range of shard keys (A-G and H-Z),
organized alphabetically. Sharding spreads the load over more computers, which reduces contention and
improves performance.

Figure 1 - Horizontally partitioning (sharding) data based on a partition key.


The most important factor is the choice of a sharding key. It can be difficult to change the key after the system is
in operation. The key must ensure that data is partitioned to spread the workload as evenly as possible across
the shards.
The shards don't have to be the same size. It's more important to balance the number of requests. Some shards
might be very large, but each item has a low number of access operations. Other shards might be smaller, but
each item is accessed much more frequently. It's also important to ensure that a single shard does not exceed the
scale limits (in terms of capacity and processing resources) of the data store.
Avoid creating "hot" partitions that can affect performance and availability. For example, using the first letter of a
customer's name causes an unbalanced distribution, because some letters are more common. Instead, use a hash
of a customer identifier to distribute data more evenly across partitions.
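A hashed distribution can be sketched like this. A stable hash such as SHA-256 is used (rather than Python's per-process salted `hash()`) so that every node maps the same customer to the same shard; the shard count is illustrative:

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(customer_id: str) -> int:
    """Map a customer to a shard using a stable hash of the identifier.

    Hashing spreads keys evenly across shards, unlike partitioning by the
    first letter of a name, which produces hot partitions.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```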
Choose a sharding key that minimizes any future requirements to split large shards, coalesce small shards into
larger partitions, or change the schema. These operations can be very time consuming, and might require taking
one or more shards offline while they are performed.
If shards are replicated, it might be possible to keep some of the replicas online while others are split, merged, or
reconfigured. However, the system might need to limit the operations that can be performed during the
reconfiguration. For example, the data in the replicas might be marked as read-only to prevent data
inconsistences.
For more information about horizontal partitioning, see sharding pattern.
Vertical partitioning
The most common use for vertical partitioning is to reduce the I/O and performance costs associated with
fetching items that are frequently accessed. Figure 2 shows an example of vertical partitioning. In this example,
different properties of an item are stored in different partitions. One partition holds data that is accessed more
frequently, including product name, description, and price. Another partition holds inventory data: the stock
count and last-ordered date.

Figure 2 - Vertically partitioning data by its pattern of use.


In this example, the application regularly queries the product name, description, and price when displaying the
product details to customers. Stock count and last-ordered date are held in a separate partition because these
two items are commonly used together.
Other advantages of vertical partitioning:
Relatively slow-moving data (product name, description, and price) can be separated from the more
dynamic data (stock level and last ordered date). Slow moving data is a good candidate for an application
to cache in memory.
Sensitive data can be stored in a separate partition with additional security controls.
Vertical partitioning can reduce the amount of concurrent access that's needed.
Vertical partitioning operates at the entity level within a data store, partially normalizing an entity to break it
down from a wide item to a set of narrow items. It is ideally suited for column-oriented data stores such as
HBase and Cassandra. If the data in a collection of columns is unlikely to change, you can also consider using
column stores in SQL Server.
Functional partitioning
When it's possible to identify a bounded context for each distinct business area in an application, functional
partitioning is a way to improve isolation and data access performance. Another common use for functional
partitioning is to separate read-write data from read-only data. Figure 3 shows an overview of functional
partitioning where inventory data is separated from customer data.
Figure 3 - Functionally partitioning data by bounded context or subdomain.
This partitioning strategy can help reduce data access contention across different parts of a system.

Designing partitions for scalability


It's vital to consider size and workload for each partition and balance them so that data is distributed to achieve
maximum scalability. However, you must also partition the data so that it does not exceed the scaling limits of a
single partition store.
Follow these steps when designing partitions for scalability:
1. Analyze the application to understand the data access patterns, such as the size of the result set returned by
each query, the frequency of access, the inherent latency, and the server-side compute processing
requirements. In many cases, a few major entities will demand most of the processing resources.
2. Use this analysis to determine the current and future scalability targets, such as data size and workload. Then
distribute the data across the partitions to meet the scalability target. For horizontal partitioning, choosing the
right shard key is important to make sure distribution is even. For more information, see the sharding pattern.
3. Make sure each partition has enough resources to handle the scalability requirements, in terms of data size
and throughput. Depending on the data store, there might be a limit on the amount of storage space,
processing power, or network bandwidth per partition. If the requirements are likely to exceed these limits,
you may need to refine your partitioning strategy or split data out further, possibly combining two or more
strategies.
4. Monitor the system to verify that data is distributed as expected and that the partitions can handle the load.
Actual usage does not always match what an analysis predicts. If so, it might be possible to rebalance the
partitions, or else redesign some parts of the system to gain the required balance.
Some cloud environments allocate resources in terms of infrastructure boundaries. Ensure that the limits of your
selected boundary provide enough room for any anticipated growth in the volume of data, in terms of data
storage, processing power, and bandwidth.
For example, if you use Azure table storage, there is a limit to the volume of requests that can be handled by a
single partition in a particular period of time. (For more information, see Azure storage scalability and
performance targets.) A busy shard might require more resources than a single partition can handle. If so, the
shard might need to be repartitioned to spread the load. If the total size or throughput of these tables exceeds
the capacity of a storage account, you might need to create additional storage accounts and spread the tables
across these accounts.
Designing partitions for query performance
Query performance can often be boosted by using smaller data sets and by running parallel queries. Each
partition should contain a small proportion of the entire data set. This reduction in volume can improve the
performance of queries. However, partitioning is not an alternative for designing and configuring a database
appropriately. For example, make sure that you have the necessary indexes in place.
Follow these steps when designing partitions for query performance:
1. Examine the application requirements and performance:
Use business requirements to determine the critical queries that must always perform quickly.
Monitor the system to identify any queries that perform slowly.
Find which queries are performed most frequently. Even if a single query has a minimal cost, the
cumulative resource consumption could be significant.
2. Partition the data that is causing slow performance:
Limit the size of each partition so that the query response time is within target.
If you use horizontal partitioning, design the shard key so that the application can easily select the right
partition. This prevents the query from having to scan through every partition.
Consider the location of a partition. If possible, try to keep data in partitions that are geographically
close to the applications and users that access it.
3. If an entity has throughput and query performance requirements, use functional partitioning based on
that entity. If this still doesn't satisfy the requirements, apply horizontal partitioning as well. In most cases,
a single partitioning strategy will suffice, but in some cases it is more efficient to combine both strategies.
4. Consider running queries in parallel across partitions to improve performance.
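Step 4 above can be sketched with a thread pool that fans the same query out to every partition and merges the results in the application. The in-memory partitions below stand in for real data stores:

```python
from concurrent.futures import ThreadPoolExecutor

def query_partition(partition: list, min_qty: int) -> list:
    """Stand-in for a real per-partition query."""
    return [row for row in partition if row["qty"] >= min_qty]

def parallel_query(partitions: list, min_qty: int) -> list:
    """Run the same query against every partition in parallel and merge results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(lambda p: query_partition(p, min_qty), partitions)
    return [row for part in results for row in part]
```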

Designing partitions for availability


Partitioning data can improve the availability of applications by ensuring that the entire dataset does not
constitute a single point of failure and that individual subsets of the dataset can be managed independently.
Consider the following factors that affect availability:
How critical the data is to business operations. Identify which data is critical business information, such as
transactions, and which data is less critical operational data, such as log files.
Consider storing critical data in highly available partitions with an appropriate backup plan.
Establish separate management and monitoring procedures for the different datasets.
Place data that has the same level of criticality in the same partition so that it can be backed up together at
an appropriate frequency. For example, partitions that hold transaction data might need to be backed up
more frequently than partitions that hold logging or trace information.
How individual partitions can be managed. Designing partitions to support independent management and
maintenance provides several advantages. For example:
If a partition fails, it can be recovered independently without affecting applications that access data in other
partitions.
Partitioning data by geographical area allows scheduled maintenance tasks to occur at off-peak hours for
each location. Ensure that partitions are not so large that planned maintenance cannot be completed during
this window.
Whether to replicate critical data across partitions. This strategy can improve availability and
performance, but can also introduce consistency issues. It takes time to synchronize changes with every replica.
During this period, different partitions will contain different data values.

Application design considerations


Partitioning adds complexity to the design and development of your system. Consider partitioning as a
fundamental part of system design even if the system initially only contains a single partition. If you address
partitioning as an afterthought, it will be more challenging because you already have a live system to maintain:
Data access logic will need to be modified.
Large quantities of existing data may need to be migrated, to distribute it across partitions.
Users expect to be able to continue using the system during the migration.
In some cases, partitioning is not considered important because the initial dataset is small and can be easily
handled by a single server. This might be true for some workloads, but many commercial systems need to
expand as the number of users increases.
Moreover, it's not only large data stores that benefit from partitioning. For example, a small data store might be
heavily accessed by hundreds of concurrent clients. Partitioning the data in this situation can help to reduce
contention and improve throughput.
Consider the following points when you design a data partitioning scheme:
Minimize cross-partition data access operations. Where possible, keep data for the most common
database operations together in each partition to minimize cross-partition data access operations. Querying
across partitions can be more time-consuming than querying within a single partition, but optimizing partitions
for one set of queries might adversely affect other sets of queries. If you must query across partitions, minimize
query time by running parallel queries and aggregating the results within the application. (This approach might
not be possible in some cases, such as when the result from one query is used in the next query.)
Consider replicating static reference data. If queries use relatively static reference data, such as postal code
tables or product lists, consider replicating this data in all of the partitions to reduce separate lookup operations
in different partitions. This approach can also reduce the likelihood of the reference data becoming a "hot"
dataset, with heavy traffic from across the entire system. However, there is an additional cost associated with
synchronizing any changes to the reference data.
Minimize cross-partition joins. Where possible, minimize requirements for referential integrity across
vertical and functional partitions. In these schemes, the application is responsible for maintaining referential
integrity across partitions. Queries that join data across multiple partitions are inefficient because the application
typically needs to perform consecutive queries based on a key and then a foreign key. Instead, consider
replicating or de-normalizing the relevant data. If cross-partition joins are necessary, run parallel queries over
the partitions and join the data within the application.
Embrace eventual consistency. Evaluate whether strong consistency is actually a requirement. A common
approach in distributed systems is to implement eventual consistency. The data in each partition is updated
separately, and the application logic ensures that the updates are all completed successfully. It also handles the
inconsistencies that can arise from querying data while an eventually consistent operation is running.
Consider how queries locate the correct partition. If a query must scan all partitions to locate the required
data, there is a significant impact on performance, even when multiple parallel queries are running. With vertical
and functional partitioning, queries can naturally specify the partition. Horizontal partitioning, on the other hand,
can make locating an item difficult, because every shard has the same schema. A typical solution is to maintain a
map that is used to look up the shard location for specific items. This map can be implemented in the sharding
logic of the application, or maintained by the data store if it supports transparent sharding.
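A range-based shard map of this kind can be sketched as a small lookup table. The key ranges mirror the A-G / H-Z split from Figure 1, and the shard names are hypothetical:

```python
# Each entry maps a contiguous range [low, high) of first letters to a shard.
# '[' is the character immediately after 'Z', so "H" <= c < "[" covers H-Z.
SHARD_MAP = [("A", "H", "shard-1"), ("H", "[", "shard-2")]

def shard_for_key(key: str) -> str:
    """Route a request to the single shard that holds the item."""
    first = key[:1].upper()
    for low, high, shard in SHARD_MAP:
        if low <= first < high:
            return shard
    raise KeyError(f"no shard covers key {key!r}")
```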
Consider periodically rebalancing shards. With horizontal partitioning, rebalancing shards can help
distribute the data evenly by size and by workload to minimize hotspots, maximize query performance, and work
around physical storage limitations. However, this is a complex task that often requires the use of a custom tool
or process.
Replicate partitions. If you replicate each partition, it provides additional protection against failure. If a single
replica fails, queries can be directed toward a working copy.
If you reach the physical limits of a partitioning strategy, you might need to extend the scalability
to a different level. For example, if partitioning is at the database level, you might need to locate or replicate
partitions in multiple databases. If partitioning is already at the database level, and physical limitations are an
issue, it might mean that you need to locate or replicate partitions in multiple hosting accounts.
Avoid transactions that access data in multiple partitions. Some data stores implement transactional
consistency and integrity for operations that modify data, but only when the data is located in a single partition.
If you need transactional support across multiple partitions, you will probably need to implement this as part of
your application logic because most partitioning systems do not provide native support.
All data stores require some operational management and monitoring activity. The tasks can range from loading
data, backing up and restoring data, reorganizing data, and ensuring that the system is performing correctly and
efficiently.
Consider the following factors that affect operational management:
How to implement appropriate management and operational tasks when the data is
partitioned. These tasks might include backup and restore, archiving data, monitoring the system, and
other administrative tasks. For example, maintaining logical consistency during backup and restore
operations can be a challenge.
How to load the data into multiple partitions and add new data that's arriving from other
sources. Some tools and utilities might not support sharded data operations such as loading data into
the correct partition.
How to archive and delete the data on a regular basis. To prevent the excessive growth of
partitions, you need to archive and delete data on a regular basis (such as monthly). It might be necessary
to transform the data to match a different archive schema.
How to locate data integrity issues. Consider running a periodic process to locate any data integrity
issues, such as data in one partition that references missing information in another. The process can either
attempt to fix these issues automatically or generate a report for manual review.

Rebalancing partitions
As a system matures, you might have to adjust the partitioning scheme. For example, individual partitions might
start getting a disproportionate volume of traffic and become hot, leading to excessive contention. Or you might
have underestimated the volume of data in some partitions, causing some partitions to approach capacity limits.
Some data stores, such as Cosmos DB, can automatically rebalance partitions. In other cases, rebalancing is an
administrative task that consists of two stages:
1. Determine a new partitioning strategy.
Which partitions need to be split (or possibly combined)?
What is the new partition key?
2. Migrate data from the old partitioning scheme to the new set of partitions.
Depending on the data store, you might be able to migrate data between partitions while they are in use. This is
called online migration. If that's not possible, you might need to make partitions unavailable while the data is
relocated (offline migration).
Offline migration
Offline migration is typically simpler because it reduces the chances of contention occurring. Conceptually, offline
migration works as follows:
1. Mark the partition offline.
2. Split-merge and move the data to the new partitions.
3. Verify the data.
4. Bring the new partitions online.
5. Remove the old partition.
Optionally, you can mark a partition as read-only in step 1, so that applications can still read the data while it is
being moved.
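The five steps above can be sketched against an in-memory model. This is purely illustrative — a real migration operates on data stores, not dictionaries — and the partition layout is assumed:

```python
def offline_split(store: dict, old: str, new_a: str, new_b: str, pivot: str) -> None:
    """Split one partition into two while it is offline.

    `store` maps a partition name to {"status": ..., "data": {key: value}}.
    """
    store[old]["status"] = "offline"                  # 1. take the partition offline
    data = store[old]["data"]
    store[new_a] = {"status": "offline",              # 2. split and move the data
                    "data": {k: v for k, v in data.items() if k < pivot}}
    store[new_b] = {"status": "offline",
                    "data": {k: v for k, v in data.items() if k >= pivot}}
    moved = len(store[new_a]["data"]) + len(store[new_b]["data"])
    assert moved == len(data)                         # 3. verify nothing was lost
    store[new_a]["status"] = "online"                 # 4. bring new partitions online
    store[new_b]["status"] = "online"
    del store[old]                                    # 5. remove the old partition
```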

Online migration
Online migration is more complex to perform but less disruptive. The process is similar to offline migration,
except the original partition is not marked offline. Depending on the granularity of the migration process (for
example, item by item versus shard by shard), the data access code in the client applications might have to
handle reading and writing data that's held in two locations, the original partition and the new partition.

Related patterns
The following design patterns might be relevant to your scenario:
The sharding pattern describes some common strategies for sharding data.
The index table pattern shows how to create secondary indexes over data. An application can quickly
retrieve data with this approach, by using queries that do not reference the primary key of a collection.
The materialized view pattern describes how to generate prepopulated views that summarize data to
support fast query operations. This approach can be useful in a partitioned data store if the partitions that
contain the data being summarized are distributed across multiple sites.

Next steps
Learn about partitioning strategies for specific Azure services. See Data partitioning strategies
Data partitioning strategies
12/18/2020 • 31 minutes to read

This article describes some strategies for partitioning data in various Azure data stores. For general guidance about
when to partition data and best practices, see Data partitioning.

Partitioning Azure SQL Database


A single SQL database has a limit to the volume of data that it can contain. Throughput is constrained by
architectural factors and the number of concurrent connections that it supports.
Elastic pools support horizontal scaling for a SQL database. Using elastic pools, you can partition your data into
shards that are spread across multiple SQL databases. You can also add or remove shards as the volume of data
that you need to handle grows and shrinks. Elastic pools can also help reduce contention by distributing the load
across databases.
Each shard is implemented as a SQL database. A shard can hold more than one dataset (called a shardlet). Each
database maintains metadata that describes the shardlets that it contains. A shardlet can be a single data item, or a
group of items that share the same shardlet key. For example, in a multitenant application, the shardlet key can be
the tenant ID, and all data for a tenant can be held in the same shardlet.
Client applications are responsible for associating a dataset with a shardlet key. A separate SQL database acts as a
global shard map manager. This database has a list of all the shards and shardlets in the system. The application
connects to the shard map manager database to obtain a copy of the shard map. It caches the shard map locally,
and uses the map to route data requests to the appropriate shard. This functionality is hidden behind a series of
APIs that are contained in the Elastic Database client library, which is available for Java and .NET.
For more information about elastic pools, see Scaling out with Azure SQL Database.
To reduce latency and improve availability, you can replicate the global shard map manager database. With the
Premium pricing tiers, you can configure active geo-replication to continuously copy data to databases in different
regions.
Alternatively, use Azure SQL Data Sync or Azure Data Factory to replicate the shard map manager database across
regions. This form of replication runs periodically and is more suitable if the shard map changes infrequently, and
does not require Premium tier.
Elastic Database provides two schemes for mapping data to shardlets and storing them in shards:
A list shard map associates a single key to a shardlet. For example, in a multitenant system, the data for
each tenant can be associated with a unique key and stored in its own shardlet. To guarantee isolation, each
shardlet can be held within its own shard.
A range shard map associates a set of contiguous key values to a shardlet. For example, you can group the
data for a set of tenants (each with their own key) within the same shardlet. This scheme is less expensive
than the first, because tenants share data storage, but has less isolation.
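The two schemes can be sketched as simple in-memory structures; the tenant keys and shardlet names below are hypothetical:

```python
# List shard map: one key maps to one shardlet (e.g., per-tenant isolation).
list_shard_map = {"tenant-17": "shardlet-a", "tenant-42": "shardlet-b"}

# Range shard map: a contiguous [low, high) key range maps to a shared shardlet.
range_shard_map = [(1, 100, "shardlet-c"), (100, 200, "shardlet-d")]

def resolve_range(tenant_key: int) -> str:
    """Find the shardlet whose range contains the tenant key."""
    for low, high, shardlet in range_shard_map:
        if low <= tenant_key < high:
            return shardlet
    raise KeyError(tenant_key)
```

In Azure SQL Database, this lookup is provided by the shard map manager in the Elastic Database client library rather than implemented by hand.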

A single shard can contain the data for several shardlets. For example, you can use list shardlets to store data for
different non-contiguous tenants in the same shard. You can also mix range shardlets and list shardlets in the same
shard, although they will be addressed through different maps. The following diagram shows this approach:
Elastic pools make it possible to add and remove shards as the volume of data shrinks and grows. Client
applications can create and delete shards dynamically, and transparently update the shard map manager. However,
removing a shard is a destructive operation that also requires deleting all the data in that shard.
If an application needs to split a shard into two separate shards or combine shards, use the split-merge tool. This
tool runs as an Azure web service, and migrates data safely between shards.
The partitioning scheme can significantly affect the performance of your system. It can also affect the rate at which
shards have to be added or removed, or that data must be repartitioned across shards. Consider the following
points:
Group data that is used together in the same shard, and avoid operations that access data from multiple
shards. A shard is a SQL database in its own right, and cross-database joins must be performed on the client
side.
Although SQL Database does not support cross-database joins, you can use the Elastic Database tools to
perform multi-shard queries. A multi-shard query sends individual queries to each database and merges the
results.
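The fan-out pattern behind a multi-shard query can be sketched as follows. This is an illustrative outline, not the Elastic Database query API; the shard contents are assumed example data.

```python
# Illustrative fan-out sketch (not the Elastic Database query API):
# the same query runs against each shard, and the per-shard result
# sets are merged on the client.
def multi_shard_query(shards, run_query):
    """Run run_query against every shard and merge the row lists."""
    results = []
    for shard in shards:
        results.extend(run_query(shard))
    return results

# In-memory stand-ins for two shard databases (example data):
shard_a = [{"tenant": "a", "orders": 3}]
shard_b = [{"tenant": "b", "orders": 5}]
merged = multi_shard_query([shard_a, shard_b], lambda rows: rows)
```

In practice the per-shard queries run in parallel against real databases, but the merge step on the client is the same.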
Don't design a system that has dependencies between shards. Referential integrity constraints, triggers, and
stored procedures in one database cannot reference objects in another.
If you have reference data that is frequently used by queries, consider replicating this data across shards.
This approach can remove the need to join data across databases. Ideally, such data should be static or slow-
moving, to minimize the replication effort and reduce the chances of it becoming stale.
Shardlets that belong to the same shard map should have the same schema. This rule is not enforced by
SQL Database, but data management and querying becomes very complex if each shardlet has a different
schema. Instead, create separate shard maps for each schema. Remember that data belonging to different
shardlets can be stored in the same shard.
Transactional operations are only supported for data within a shard, and not across shards. Transactions can
span shardlets as long as they are part of the same shard. Therefore, if your business logic needs to perform
transactions, either store the data in the same shard or implement eventual consistency.
Place shards close to the users that access the data in those shards. This strategy helps reduce latency.
Avoid having a mixture of highly active and relatively inactive shards. Try to spread the load evenly across
shards. This might require hashing the sharding keys. If you are geo-locating shards, make sure that the
hashed keys map to shardlets held in shards stored close to the users that access that data.
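Hashing the sharding key, as suggested above, can be sketched like this. The function name and bucket scheme are assumptions for illustration; the point is that a stable hash spreads keys evenly across shards, whereas Python's built-in `hash()` is randomized per process and would not give a consistent mapping.

```python
import hashlib

# Sketch: derive the shard for a key from a stable hash so that load
# spreads evenly across shards. sha256 is used because the mapping
# must be identical across processes and restarts.
def shard_for_key(sharding_key, shard_count):
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % shard_count
```

Note that a purely hashed scheme trades away efficient range queries over the original key, so it suits workloads dominated by point lookups.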

Partitioning Azure table storage


Azure table storage is a key-value store that's designed around partitioning. All entities are stored in a partition,
and partitions are managed internally by Azure table storage. Each entity stored in a table must provide a two-part
key that includes:
The partition key. This is a string value that determines the partition where Azure table storage will place
the entity. All entities with the same partition key are stored in the same partition.
The row key. This is a string value that identifies the entity within the partition. All entities within a partition
are sorted lexically, in ascending order, by this key. The partition key/row key combination must be unique
for each entity and cannot exceed 1 KB in length.
If an entity is added to a table with a previously unused partition key, Azure table storage creates a new partition
for this entity. Other entities with the same partition key will be stored in the same partition.
This mechanism effectively implements an automatic scale-out strategy. All entities in a partition are stored on the
same server in an Azure datacenter, which helps ensure that queries that retrieve data from a single partition run quickly.
Microsoft has published scalability targets for Azure Storage. If your system is likely to exceed these limits, consider
splitting entities into multiple tables. Use vertical partitioning to divide the fields into the groups that are most
likely to be accessed together.
The following diagram shows the logical structure of an example storage account. The storage account contains
three tables: Customer Info, Product Info, and Order Info.
Each table has multiple partitions.
In the Customer Info table, the data is partitioned according to the city where the customer is located. The row
key contains the customer ID.
In the Product Info table, products are partitioned by product category, and the row key contains the product
number.
In the Order Info table, the orders are partitioned by order date, and the row key specifies the time the order
was received. All data is ordered by the row key in each partition.
Consider the following points when you design your entities for Azure table storage:
Select a partition key and row key by how the data is accessed. Choose a partition key/row key combination
that supports the majority of your queries. The most efficient queries retrieve data by specifying the
partition key and the row key. Queries that specify a partition key and a range of row keys can be completed
by scanning a single partition. This is relatively fast because the data is held in row key order. If queries don't
specify which partition to scan, every partition must be scanned.
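The relative cost of these query shapes can be seen in a toy in-memory model of a partitioned table. The table contents are assumed example data, and the dictionaries only stand in for the storage service, but the access patterns mirror the three cases above.

```python
# Toy in-memory model of a partitioned table (assumed example data;
# not the storage service itself), showing the three query shapes.
table = {
    "Seattle": {"C001": {"name": "Alice"}, "C007": {"name": "Bob"}},
    "London": {"C002": {"name": "Carol"}},
}

def point_query(partition_key, row_key):
    """Fastest: both keys together identify exactly one entity."""
    return table[partition_key][row_key]

def row_range_query(partition_key, start, end):
    """Scans a single partition; rows are held in row key order."""
    return [rk for rk in sorted(table[partition_key]) if start <= rk <= end]

def full_table_scan(predicate):
    """No partition key given: every partition must be scanned."""
    return [(pk, rk) for pk, rows in table.items()
            for rk, entity in rows.items() if predicate(entity)]
```

The point query touches one partition and one row; the range query touches one partition; the full scan touches every partition, which is why it should be the exception rather than the rule.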
If an entity has one natural key, then use it as the partition key and specify an empty string as the row key. If
an entity has a composite key consisting of two properties, select the slowest changing property as the
partition key and the other as the row key. If an entity has more than two key properties, use a
concatenation of properties to provide the partition and row keys.
If you regularly perform queries that look up data by using fields other than the partition and row keys,
consider implementing the Index Table pattern, or consider using a different data store that supports
indexing, such as Cosmos DB.
If you generate partition keys by using a monotonic sequence (such as "0001", "0002", "0003") and each
partition only contains a limited amount of data, Azure table storage can physically group these partitions
together on the same server. Azure Storage assumes that the application is most likely to perform queries
across a contiguous range of partitions (range queries) and is optimized for this case. However, this
approach can lead to hotspots, because all insertions of new entities are likely to be concentrated at one end
of the contiguous range. It can also reduce scalability. To spread the load more evenly, consider hashing the
partition key.
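One way to apply that hashing advice is to prefix the monotonically increasing key with a small, stable hash bucket, so that new inserts spread across partitions instead of clustering at one end. The bucket count and key format below are assumptions for illustration.

```python
import hashlib

# Sketch: prefix a monotonically increasing key with a stable hash
# bucket so inserts spread across partitions rather than piling up
# at one end. (Bucket count and format are illustrative choices.)
def hashed_partition_key(sequence_key, buckets=16):
    digest = hashlib.sha256(sequence_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest, "big") % buckets
    return f"{bucket:02d}-{sequence_key}"
```

A query over a contiguous range of the original sequence then has to fan out over the buckets, which is the usual trade-off of hash-prefixed keys.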
Azure table storage supports transactional operations for entities that belong to the same partition. An
application can perform multiple insert, update, delete, replace, or merge operations as an atomic unit, as
long as the transaction doesn't include more than 100 entities and the payload of the request doesn't exceed
4 MB. Operations that span multiple partitions are not transactional, and might require you to implement
eventual consistency. For more information about table storage and transactions, see Performing entity
group transactions.
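A client that wants to honor these limits has to group its writes so that each batch targets a single partition and contains at most 100 entities. The following sketch shows one way to do that grouping; the entity shape is an assumed example, and the 4 MB payload limit is not checked here.

```python
from itertools import groupby

# Sketch of batching for entity group transactions: each batch must
# target a single partition key and contain at most 100 entities
# (the 4 MB payload limit is not modeled here).
MAX_BATCH = 100

def transaction_batches(entities):
    """Yield lists of up to 100 entities sharing one partition key."""
    keyed = sorted(entities, key=lambda e: e["PartitionKey"])
    for _, group in groupby(keyed, key=lambda e: e["PartitionKey"]):
        rows = list(group)
        for i in range(0, len(rows), MAX_BATCH):
            yield rows[i:i + MAX_BATCH]
```

Each yielded list could then be submitted as one atomic entity group transaction; batches for different partitions commit independently of one another.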
Consider the granularity of the partition key:
Using the same partition key for every entity results in a single partition that's held on one server.
This prevents the partition from scaling out and focuses the load on a single server. As a result, this
approach is only suitable for storing a small number of entities. However, it does ensure that all
entities can participate in entity group transactions.
Using a unique partition key for every entity causes the table storage service to create a separate
partition for each entity, possibly resulting in a large number of small partitions. This approach is
more scalable than using a single partition key, but entity group transactions are not possible. Also,
queries that fetch more than one entity might involve reading from more than one server. However, if
the application performs range queries, then using a monotonic sequence for the partition keys
might help to optimize these queries.
Sharing the partition key across a subset of entities makes it possible to group related entities in the
same partition. Operations that involve related entities can be performed by using entity group
transactions, and queries that fetch a set of related entities can be satisfied by accessing a single
server.
For more information, see Azure storage table design guide and Scalable partitioning strategy.

Partitioning Azure blob storage


Azure blob storage makes it possible to hold large binary objects. Use block blobs in scenarios where you need to
upload or download large volumes of data quickly. Use page blobs for applications that require random