
EBOOK

Modern data architectures


for cloud-native
development
Applying version control, testing, and CI/CD
to data changes to break the data monolith
and create a unified source of truth


Table of contents

A modernized database is efficient, agile, and scalable
Modern applications
What is cloud-native? Why does it matter?
Modern data management for cloud-native development
Introducing modern application architectures
Purpose-built databases: NoSQL
DevOps for databases
Key takeaways for Database DevOps
Implementing modern data management processes and tools
Ingesting logs and metrics
Recap: Modern data architecture
Implement your modern data management strategy today
Conclusion

A modernized database is
efficient, agile, and scalable
Our goals as developers are to deliver business value faster and create products
that help us stay competitive in the market. To achieve these goals, we need
to modernize by adopting new technologies and practices.

In this chapter, you’ll learn the importance of test automation, data management,
and data democratization in enterprises, and how to integrate data changes into
your Continuous Integration/Continuous Delivery (CI/CD) pipeline.

Modern applications
Modern applications were born out of a necessity to deliver smaller
features faster to customers. While this directly addresses only the
application architecture aspect, it also forces other teams to build and
execute in a similar manner. In order to continuously deliver features,
organizations need all cross-functional teams to operate as “One Team.”

Key aspects of
modern applications:

• Use independently scalable microservices such as serverless and containers
• Connect through APIs
• Deliver updates continuously
• Adapt quickly to change
• Scale globally
• Are fault tolerant
• Carefully manage state and persistence
• Have security built in

Modern applications require more
performance, scale, and availability
Modern applications are pushing boundaries. Users are connected
all the time and expect microsecond latency and access from mobile
and internet of things (IoT) devices. They also demand the ability
to connect from anywhere they happen to be.

Users ................. 1M+
Data volume ........... Terabytes to petabytes
Locality .............. Global
Performance ........... Microsecond latency
Request rate .......... Millions per second
Access ................ Mobile, IoT, devices
Scale ................. Virtually unlimited
Economics ............. Pay-as-you-go
Developer access ...... Instant API access
Development ........... Apps and storage are decoupled

Example workloads: e-commerce, media streaming, social media, online
gaming, and the shared economy.

What is cloud-native? Why does it matter?

Cloud-native is an evolving term. The vast amount of software that's being
built today needs a place to run, and all the components and processes required
to build an application need to fit together and work cohesively as a system.

The Cloud Native Computing Foundation (CNCF) definition states:

Cloud-native technologies empower organizations to
build and run scalable applications in modern, dynamic
environments such as public, private, and hybrid clouds.

This definition has to apply broadly to everyone, but not everyone has the
same capabilities. This is known as the lowest common denominator problem:
when you try to appeal to a broader group and its range of capabilities, you
also have to limit the capabilities that can be leveraged.

Amazon Web Services (AWS) goes many steps further
by providing a broad set of capabilities that belong to
a family called serverless. Serverless technologies are
more than just AWS Lambda—these services remove
the heavy lifting associated with running, managing,
and maintaining servers. This lets you focus on core
business logic and quickly adding value.

Modern data management
for cloud-native development
As we cover the different capabilities your organization needs to acquire
to go fully cloud-native, it’s useful to view each one as a step in a journey.

The map below is a model for how organizations typically evolve their cloud-native
understanding. As your organization or team moves from stage to stage, capabilities
are gained that make releasing new features and functionality faster, better, and cheaper.
In the following sections, we’ll be focusing on the capability of Modern Data Management.

Stage 5 on that map is Modern Data Management.

A look back—traditional
three-tier application architecture

Let’s first take a look at the traditional architecture model that preceded
cloud-native data management.

Historically, applications were built with monolithic, three-tiered web
architectures that looked something like this graphic, in which there is:

1. A presentation layer hosted by web servers
2. An application layer in which business logic runs
3. A data layer that has database servers

Note: Some implementations have the application layer
horizontally scaled but still bottlenecked at the data layer.

Introducing modern
application architectures
Here is an example of a modern application architecture: a presentation
layer, business logic, and a data layer, connected through events, APIs, and
queues/messages. Yes, it does indeed resemble a three-tier web architecture,
but there are some very important differences.

Pro tip:
The best tool for a job usually differs by use case, so you should build new
applications with purpose-built databases. How and where you store your data
is different in a modern application. Each database is serving a specific
need. This will be covered in depth later.

The first major difference in this architecture is that it doesn't reflect
the entire application—this is not a monolith, it's a single microservice.
The other key differences with this architecture are related to:

Data. How and where you store your data is different in a modern
application. In this diagram, we can see multiple data store options. Each
database is serving a specific need, meaning it's purpose-built for the task
at hand. Additionally, data stores are decoupled.

Application integration and communication. Communication both within the
application as well as between each service is different. In modern data
architectures, there are fundamentally different approaches to using
messaging, events, and APIs within the business.

Compute. The logic you write is important in differentiating your
organization. New compute technologies make focusing on business logic
easier than ever.

How AWS customers use microservices:
AWS customers run hundreds or even thousands of microservices, an approach
that greatly improves their scalability and fault tolerance. Usually,
microservices communicate via well-defined APIs, and many customers start
the process of refactoring by wrapping their applications with an API.
Decoupling data along with business logic

When it comes to the data requirements of modular services, one size does not fit all.
Does the service need massive data volume? High-speed rendering? Data warehousing?
AWS customers are considering what they are doing with their data and choosing the
datastore that best fits that purpose.

Because the only database choice was a relational database—no matter the shape or
function of the data in the application—the data was modeled as relational for decades.
Instead of the use case driving the requirements for the database, it was the other way
around. The database was driving the data model for the application use case.

Is a relational database purpose-built for a normalized schema and to
enforce referential integrity in the database? Absolutely, but the key point
here is that not all application data models or use cases match the
relational model. Developers are building highly distributed and decoupled
applications, and AWS enables them to build these cloud-native applications
by using multiple AWS services for scale, performance, and availability.

Why consider purpose-built databases?

You’ll likely remember that one of the core concepts related to microservices is that each
service needs to own its own data. But why is this important? There are two really big
innovation benefits to following this approach.

1. First, it makes it easier to change your schemas, which is a common
challenge associated with monoliths and shared services. When you can
independently own and change your schema, that encapsulation lets you
evolve your service without breaking service contracts.

2. The second really important point is that each data store can scale independently.
This means that one service may have a small database and another a very large
one, but each service can optimize its scale for performance and cost.

Purpose-built databases:
data models and use cases

Here are some examples of purpose-built databases and common use cases.
With AWS, you get access to all these data stores, which can all be spun up
in minutes. This allows you to focus on your application and business logic
rather than the heavy lifting associated with running and managing databases.

• Relational — Referential integrity, ACID transactions, schema-on-write.
  Use cases: lift and shift, ERP, CRM, finance.
  Solutions: Amazon Aurora, Amazon Relational Database Service (RDS)

• Key-value — High throughput, low-latency reads and writes, endless scale.
  Use cases: real-time bidding, shopping cart, social, product catalog,
  customer preferences.
  Solutions: Amazon DynamoDB, Amazon Managed Apache Cassandra

• Document — Store documents and quickly query on any attribute.
  Use cases: content management, personalization, mobile.
  Solutions: MongoDB, Amazon DocumentDB

• In-memory — Query by key with microsecond latency.
  Use cases: leaderboards, real-time analytics, caching.
  Solution: Amazon ElastiCache

• Graph — Quickly and easily create and navigate relationships between data.
  Use cases: fraud detection, social networking, recommendation engine.
  Solution: Amazon Neptune

• Time series — Collect, store, and process data sequenced by time.
  Use cases: IoT applications, event tracking.
  Solution: Amazon Timestream

• Ledger — Complete, immutable, and verifiable history of all changes to
  application data.
  Use cases: systems of record, supply chain, health care, registrations,
  financial.
  Solution: Amazon Quantum Ledger Database (QLDB)

• Wide column — Scalable, highly available, managed Apache
  Cassandra–compatible service.
  Use cases: build low-latency applications, leverage open source, migrate
  Cassandra to the cloud.
  Solution: Amazon Keyspaces (Managed Cassandra)

Duolingo: A customer success story

Duolingo, the maker of a language learning app, uses purpose-built AWS
databases to serve up over 31 billion items for 80 language courses with
high performance and scalability.

• Primary database: Amazon DynamoDB — 24,000 reads and 3,000 writes per
  second; personalized lessons for users taking six billion exercises
  per month.
• In-memory caching: Amazon ElastiCache — instant access to common words
  and phrases.
• Transactional data: Amazon Aurora — maintains user data.
Purpose-built
databases: NoSQL
So far, we’ve talked about modern applications and how these modern
applications achieve scale and high availability by using purpose-built
databases. Let’s now look at a specific family of databases called NoSQL.

NoSQL stands for Not only SQL. NoSQL databases are non-tabular and
store data differently than relational tables. NoSQL databases come in a
variety of types based on their data model. The main types are document,
key-value, wide-column, and graph. They provide flexible schemas and
scale easily with large amounts of data and high user loads.

Examples of NoSQL databases include Amazon DynamoDB (a key-value, fully
managed data store) and MongoDB (a document database).
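To make the key-value model concrete, here is a minimal Python sketch using
boto3 against DynamoDB. The table name, key schema, and region are
illustrative assumptions, not details from this ebook.

import boto3

# Hypothetical table "Users" with partition key "user_id"
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Users")

# Write an item; aside from its key attributes, the item is schemaless
table.put_item(Item={"user_id": "u-123", "name": "Ana", "plan": "pro"})

# Read it back by key—the access pattern DynamoDB is built around
response = table.get_item(Key={"user_id": "u-123"})
print(response.get("Item"))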

Deeper into traditional three-tier application architectures
As we touched on earlier, developers were stuck with the single three-tier
application architecture before the introduction of cloud technology.
Understanding this previous limitation is helpful in grasping the value of
the cloud-native tools available today.

Web servers (presentation layer), application servers (business logic), and
database servers (data layer)—with basic transactions (OLTP), complex
queries, key-value access, and analytics (OLAP) all forced into the same
relational database.

With the three-tier application architectures, developers were limited to a relational database
management system. And this database was responsible for everything!

Let's look at some common operations that the single database was responsible for:

1. As companies started out, the database was responsible for transactional
create, read, update, and delete (CRUD)-based queries like getUser or
updateOrderStatus. These types of operations are called online transaction
processing (OLTP).

2. As the companies' systems grew, queries grew to match the complexity of
the systems, and teams would join multiple tables and write queries with
nested inner joins.

3. Once the company needed to write reports, the database was responsible
for long-running analytical-style queries that would impact available
resources on the database while the database was servicing customer requests.

4. If the team needed something as simple as a basic key-value datastore
with hundreds of thousands of entries, the database had to also handle
this load.

This ever-growing load often led to downtime, loss of availability,
and inefficient resource use—meaning it was not a scalable approach.

SQL vs. NoSQL

NoSQL was introduced to address the limitations we learned about in the
previous section.

SQL — optimized for storage: normalized/relational, ad hoc queries,
scales vertically, good for OLAP.
NoSQL — optimized for compute: denormalized/hierarchical, instantiated
views, scales horizontally, built for OLTP at scale.

Relational database management systems (RDBMS) were designed in a time when
storage was more expensive as compared to compute. So, SQL and RDBMS evolved
around normalization, where storage is minimized. Replicating and
duplicating data was considered a bad practice.

At that time, developer teams didn't need to think about the access patterns
ahead of time. They normalized, reduced redundancy and storage, and queried
data on the fly—but at the cost of scale.

NoSQL was designed at a time when developers focused on OLTP and storage was
becoming cheaper. Because storage was cheap, duplicating data was suddenly
not a bad thing after all.

If a developer was designing for NoSQL, they had to know their access
patterns ahead of time. Access patterns or query patterns define how the
users and the system access the data to satisfy business needs. Using SQL
means you don't need to know this access pattern ahead of time.

Of the applications developers write today, 90 percent are written to
support common business processes that represent OLTP applications. So,
NoSQL is actually one of the most relevant technologies you can learn as a
developer today.

When we talk about NoSQL, it's important to understand that it's not good
for everything—it's good for a certain class of applications. Those
applications have repeatable access patterns.

In the case of a relational database, the ad-hoc engine gives developers
some flexibility. If a developer doesn't yet understand how they are going
to access data, then it could be very beneficial to have an ad-hoc query
engine, and that's really suitable for an online analytics processing
(OLAP) type of workload.

Product database

Consider a product catalog. In the normalized model, products are split
across related tables—Products (id, type, price, description), plus Books
(author, title, fiction, category, date), Albums (title, genre, producer),
Videos (title, category, fiction, producer, director), Tracks, Actors, and
an Actor/video join table. In the denormalized model, each product is a
single document: a book document carries its title, author, and category
inline; an album document embeds its track list; a video document embeds
its producer, director, and an array of actors.

SQL vs. NoSQL design pattern

The typical relational model is nicely organized in "human-readable"
constructs—tables that are related to categories. The relational database
is modeled to reduce storage.

With the denormalized model, we step into the world of aggregated items,
which are essentially prebuilt items. Rather than running queries/joins
across tables, these items are written as they will be retrieved. This
requires understanding access patterns and writing the data appropriately,
the benefit being nearly limitless scale and consistent performance.

On the denormalized NoSQL side, a developer would simply retrieve a single
product record to receive all the information on a specific product. On the
SQL side—modeled after relational tables—they have to query for the product,
execute joins and inner joins, and potentially make additional queries to
retrieve all the information for a specific product.
Pro tip:
Forget what you know
about relational databases!

An excellent piece of advice is to forget about trying to model your data
the way you would in a relational database. Designing for NoSQL is going to
feel a bit foreign at first. It is going to feel harder than SQL—as you
probably already know how to design schemas and tables in SQL. But RDBMS do
not scale like NoSQL, and the benefits you gain can be well worth the
additional effort to design and think about your access patterns up front.

Some key concepts:

• Forget normalization—replicating and duplicating data is OK.

• There are no joins—forget the idea of one entity type per table. What does
this mean? You might have a table that has different types of entities in
your entity relationship diagram in the same table.

• An example for NoSQL is that you may have a single table where all your
data for a user profile is in a single record. You can return that record
and all associated information with a single query or retrieval, as the
sketch below illustrates.
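Here is a hedged Python sketch of that single-table pattern using boto3 and
DynamoDB. The table layout (a generic "PK"/"SK" key pair) and all names are
illustrative assumptions, not a design from this ebook.

import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical single table holding several entity types per partition
table = boto3.resource("dynamodb", region_name="us-east-1").Table("AppData")

# One query on the partition key returns the profile and its related items
items = table.query(KeyConditionExpression=Key("PK").eq("USER#u-123"))["Items"]

profile = next(i for i in items if i["SK"] == "PROFILE")
orders = [i for i in items if i["SK"].startswith("ORDER#")]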

Designing for the cloud means
thinking differently—your data
access patterns are important

Often developers work with customers who are already using NoSQL but
treating it like a relational database—one entity per table, creating joins,
and carrying general relational concepts into the application.

While this is one way to do it, it's not how NoSQL was designed, and it
won't provide the benefits it was designed for. Watch out for this type of
anti-pattern, where developers fall back into familiar patterns and design
concepts.

To take advantage of NoSQL, developers should design around access patterns
to ensure that a single query is returning the data they need without having
to join and use relational concepts.

Designing for NoSQL and scalability

When designing for NoSQL, developers need to have the following information
up front:

• A defined entity relationship diagram (ERD)—data is relational, of course,
but it doesn't need to be stored that way.

• Access patterns that predetermine the way data will be accessed. If this
can't be done up front, NoSQL may be the wrong approach for the project.

• Indices that are designed around access patterns (see the sketch after
this list).
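Below is a sketch of designing an index around a known access pattern—"fetch
orders by status, newest first." The table and index definitions are
hypothetical illustrations using boto3, not designs from the ebook itself.

import boto3

client = boto3.client("dynamodb", region_name="us-east-1")

# A global secondary index keyed on the access pattern, not on the entity
client.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[{
        "IndexName": "status-created_at-index",
        "KeySchema": [
            {"AttributeName": "status", "KeyType": "HASH"},
            {"AttributeName": "created_at", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    BillingMode="PAY_PER_REQUEST",
)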

NoSQL use cases

Built for the cloud, NoSQL databases can scale nearly infinitely without
impacting performance. Additionally, data models for NoSQL in JSON are
natural for developers, as the data is similar to code. This list of use
cases is not exhaustive; these are just a few examples that take advantage
of NoSQL's scale and flexibility as a data store.

Here we see some typical use cases for NoSQL. Most of them take advantage
of the key benefit of using NoSQL: a flexible schema that allows for changes
and experimentation in the application.

Typical use cases include fraud detection, personalization, inventory and
catalog management, remote asset monitoring, financial services and
payments, instant messaging, geolocation, logistics and asset management,
content management systems, and digital and media management.

Data democracy and silos

We just covered NoSQL and the scalability and flexibility it offers.
Now let's talk about data democracy.

Data democracy does not mean every employee needs to have access to every
bit of data that an organization has. Instead, it's an ongoing process of
enabling everybody in an organization—irrespective of their technical
know-how—to work with data comfortably, to feel confident talking about it,
and, as a result, make data-informed decisions and build customer
experiences powered by data.

One of the key issues that data democracy addresses is data silos. A data
silo is a repository of data that's controlled by a single department or
team, isolated from the rest of the organization—think separate customer
service, finance, marketing, manufacturing, sales, and supply chain data
stores. Data silos tend to arise naturally in large companies because
departments often have their own goals, priorities, and IT budgets.

But any size organization can end up with challenges related to data silos
if they're not intentional about preventing them. Here are some ways to
break down data silos and connect data assets:

• Data integration usually involves extracting siloed data and loading it
into a target system or application

• Data warehouses and data lakes can store large amounts of structured or
unstructured data in a repository

• Enterprise data management and governance focuses on preventing new silos
from forming

• Culture changes that might consist of instituting a data governance
initiative or change management program

The cost of data silos depends on the organization but can impact businesses in terms of finances,
productivity, effectiveness, missed opportunities, and a lack of trust around data.

Data warehouse

A data warehouse is a central repository of information that can be analyzed
to make more informed decisions. In this model, data flows into a data
warehouse from transactional systems, relational databases, and other
sources—typically, in a regular cadence—and feeds BI and reporting.

A key feature of data warehouses is that structured data goes through an
extract, transform, and load (ETL) process, and afterwards that data is
stored in a data warehouse database. The primary persona that uses data
warehouses is the business professional.

Data lake

A data lake is a centralized repository that allows you to store all your
structured and unstructured data at any scale. You can store your data
as-is—without having to first structure it—and run different types of
analytics, from dashboards and visualizations to big data processing,
real-time analytics, and machine learning to guide better decisions.

Some key differences compared to a data warehouse: a data lake holds
semi-structured and unstructured data in addition to the structured data a
warehouse holds. Furthermore, a data lake pulls from multiple data sources
and is not limited to a single data source and structure. The primary
personas that use data lakes are data scientists and data engineers.

Data lakehouse

A data lakehouse brings in the best of both the data warehouse and the data
lake, combining the reliability, structure, and atomicity, consistency,
isolation, and durability (ACID) transactions of data warehouses with the
scalability and agility of data lakes. Structured, semi-structured, and
unstructured data flows through ETL into the lake, with a metadata and
governance layer on top.

A lakehouse enables business intelligence and machine learning for all data.
Lakehouses bring different organizational personas together onto one
platform: data scientists, data engineers, data teams, business
professionals, and software developers.

Why do developers choose lakehouses?

Data governance and management. Lack of data agility and model
reproducibility makes it challenging to meet business-specific requirements.
A lakehouse simplifies the complexity of reporting, risk management, and
compliance by securely streamlining the acquisition, processing, and
transmission of data.

Deeper customer insights. Data silos prevent having a complete view of
customer behaviors, opportunities, and the insights needed for
personalization at scale. Lakehouses unify a variety of data, enabling
personalized experiences that drive opportunities and customer satisfaction.

Real-time decisions. Vendor lock-in and disjointed tools hinder the ability
to perform real-time analytics that drive and democratize smarter business
decisions. Lakehouses allow for rapid ingestion of all your data sources at
scale to make better investment decisions, quickly detect new fraud
patterns, and bring real-time capabilities to risk management practices.

Access to third-party data. Legacy technologies can't harness customer
insights from fast-growing unstructured and alternative data sets, and don't
offer open data sharing capabilities. Data lakehouses bring together vast
amounts of internal and third-party data to share innovative business
solutions, monetize new data products, and deliver advanced analytics
capabilities.
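As a rough illustration of the lakehouse pattern—ACID tables over open
object storage that serve BI and ML alike—here is a hedged PySpark sketch.
It assumes a Spark session with Delta Lake configured (for example, on a
Databricks cluster), and the bucket paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-demo").getOrCreate()

# Land raw, semi-structured events as-is, then commit them to an ACID Delta table
events = spark.read.json("s3://my-bucket/raw/events/")
events.write.format("delta").mode("append").save("s3://my-bucket/lakehouse/events")

# The same table now serves dashboards, ad hoc SQL, and feature pipelines
spark.read.format("delta").load("s3://my-bucket/lakehouse/events") \
    .groupBy("event_type").count().show()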

DevOps for databases
We just talked about data lakehouses and the capabilities that can give
your business an edge. Now we are going to shift focus and talk about
DevOps for databases.

DevOps is defined as the combination of cultural philosophies, practices,
and tools that increases an organization's ability to deliver applications
and services at high velocity. In most DevOps implementations, data is an
afterthought. So, let's talk about a few DevOps capabilities for databases
that can help bring it back into the spotlight.

Continuous Integration/Continuous Delivery
for databases
CI/CD for databases enables the rapid integration of database schema and
logic changes into application development efforts and provides immediate
feedback to developers on any issues that arise. Database CD produces
releasable database code in short, incremental cycles.

CI/CD for databases leverages DevOps tools such as Jenkins, CircleCI, and
AWS CodePipeline, and layers in database-specific tooling that allows
database changes to follow the same development workflows that software
engineering teams are already familiar with.

Database CI/CD flow: develop database code and application code in a dev
sandbox, commit to source control, and let the CI tool build and test an
artifact. If the tests pass, a CD tool deploys the change through dev, test,
and prod; if they fail, the change is rolled back.

In this example, Liquibase—an open-source, database-independent library for
tracking, managing, and applying database schema changes—is layered in to
provide lifecycle capabilities for databases. Benefits of using Liquibase
include version-controlled database schema changes; branching and merging of
database changes, which allows teams of developers to work simultaneously on
database changes; and easy rollback of changes as well as fixing forward.
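The ebook notes later that Liquibase's recommended changelog format is XML.
As a hedged sketch of what a version-controlled schema change might look
like (the table, column, and author names are invented for illustration):

<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">
  <changeSet id="20230101-01" author="jdoe">
    <createTable tableName="customer">
      <column name="id" type="bigint" autoIncrement="true">
        <constraints primaryKey="true" nullable="false"/>
      </column>
      <column name="email" type="varchar(255)"/>
    </createTable>
    <rollback>
      <dropTable tableName="customer"/>
    </rollback>
  </changeSet>
</databaseChangeLog>

Each changeSet is tracked in the database, so the same changelog can roll an
environment forward or back deterministically.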

Backup and disaster recovery


The next capability we’ll look at is backup and disaster recovery (DR).

The database is a very critical system for most applications and needs
to be extremely resilient. Organizations need to have a clear process
and practice for database backup and disaster recovery. Mature
DevOps implementations practice gamedays and chaos engineering
to make sure their backup and DR strategies work as expected,
without causing any downtime.

Test data management capabilities include: powerful version control;
consistent production-like environments; realistic virtual data
environments; sharing environments in their current state; data cloning;
self-service access to test/dev environments; high-performance data masking;
the ability to clone, copy, or suspend test environments; data with 100%
functional coverage; test environment management; parallel development and
testing; and the right data, all the time.

Test data management


The third capability is test data management: Continuous Integration, Continuous Delivery and
Continuous Deployment all depend on strong test automation. And for the test automation to be
reliable and complete, teams need access to complete, high-quality data sets.

Happy path testing—a type of software testing that uses known input and produces an expected
output—is straightforward and often does not require extensive amounts of data.

But for Continuous Integration, Continuous Delivery and Continuous Deployment to work effectively,
the tests need to be complete—verifying all scenarios and edge cases. This is not possible unless you
have access to production-scale data (of course keeping regulatory compliance and any other
organizational policies in mind).

Observability into applications


The last capability we’ll look at is Observability. Observability into metrics, events, logs, and traces
(MELT) data across all systems is a critical capability. With modern applications being hybrid and
distributed, it is important to get visibility into third-party services like email, payments, location
services, delivery services, and even data changes to get the complete picture as you are
debugging your application.

A good example would be debugging a failed order: Was it due to a
third-party payment gateway being down or a data schema change? Without
complete visibility into a transaction, it is impossible to debug and
identify the root cause.

Key takeaways for Database DevOps
Let’s review some of the important concepts we’ve covered so far.

Use cloud-native purpose-built databases

One size doesn't fit all, and developers may be trying to solve a problem
that is already solved for them in a purpose-built database.

NoSQL does not mean non-relational

Relationships are accessed and modeled differently than in SQL. Developers
need to model data differently in NoSQL, so forget what you know about
RDBMS. Think about access patterns when modeling. NoSQL can be a great fit
for OLTP, where databases are read, written, and updated frequently.

Data democracy

Data lakehouses provide the flexibility, cost-efficiency, and scale of data
lakes with the data management and ACID transactions of data warehouses.
Plus, lakehouses are designed for most organizational personas, which
further democratizes data.

DevOps for databases

This is where we highlighted four key capabilities application development
teams need to build:

1. CI/CD for database changes using tools like Liquibase

2. Backup and DR to ensure resiliency and that applications are built to
handle failures

3. Test data management to achieve continuous deployment, where there is no
manual intervention to deploy to production

4. Observability into MELT data across all systems, including databases,
for complete root cause analysis while debugging issues

Implementing modern data
management processes and tools
In this section, we’ll look at some of the best-fit tools you could use to achieve the tenets discussed in
the previous section. At AWS, we’ve long been believers in enabling builders to use the right tool for the
job—and when you build with AWS, you’re provided with choice. You can build using the native services
AWS provides or use AWS Marketplace to acquire third-party software offered by AWS Partners to take
away the heavy lifting and allow your development teams to focus on delivering value to customers.

Let’s take a deeper look at three key components at this stage of your cloud-native journey: a way to get
data out of silos and democratize data for all personas in the company, a purpose-built database for OLTP
applications that can scale to meet the demands of customers, and end-to-end Observability of your
application and data workloads.

Adding development capabilities with AWS Marketplace


Find, try, and acquire tools across the DevOps landscape for building cloud-native applications

Plan | Build | Test | Secure | Release | Operate

Sample AWS and AWS Marketplace solutions across these stages include Amazon
Kinesis, Amazon CloudWatch, Amazon EventBridge, Amazon EKS, Amazon DynamoDB,
AWS CodeCommit, AWS CodeDeploy, AWS Device Farm, Amazon CodeCatalyst,
AWS Cloud9, and AWS Lambda.

3,000+ vendors | 13,000+ products

AWS Marketplace is a cloud marketplace that makes it easy to find, try, and acquire the tools you need to
build cloud-native. More than 13,000 products from over 3,000 Independent Software Vendors are listed in AWS
Marketplace—many of which you can try for free and, if you decide to use, will be billed through your AWS account.

For our example architecture, we’ll use MongoDB for purpose-built databases, Databricks to
democratize data, and Elastic for end-to-end visibility. Here is an architectural diagram that
illustrates how these three components can be implemented.

The architecture runs in a VPC in the AWS Cloud, with observability provided
by Elastic. Engineers commit to a Git repo, which feeds a DevOps pipeline
(build, test, release) that tests DB changes—on failure it notifies the
engineer; on a pass it deploys the change. Dev, test, and prod environments
each pair Amazon S3 with Amazon Aurora, MongoDB Atlas provides the OLTP
database, and the Databricks Lakehouse Platform serves data scientists,
business users, IT users, data warehouse users, and data service users.
MongoDB Atlas provides the purpose-built database for OLTP, along with
Amazon Simple Storage Service (Amazon S3) for object storage and Amazon
Aurora for online analytic processing.

To democratize data, the solution uses the Databricks Lakehouse Platform,
which serves the needs of business professionals who need to run reports and
business analytics, as well as data scientists, data engineers, and software
engineers who need to run machine learning (ML) workloads to surface
business insights.

Observability is provided by Elastic to enable visibility into cloud-native
applications that span multiple Availability Zones, multiple AWS Regions,
multiple accounts, and hybrid clouds.

Try it in AWS Marketplace › | Watch a demo and learn more ›

More about how each component
plays a part in the architecture

MongoDB is a NoSQL document database.


MongoDB Atlas is an integrated suite of cloud-
native database services that allow you to
address a wide variety of use cases—from
transactional to analytical, from search
to data visualizations.

Databricks enables massive-scale data engineering, collaborative data
science, full-lifecycle machine learning, and business analytics. Combining
the best of data warehouses and data lakes, the Databricks Lakehouse
Platform is built on an open and reliable data foundation that efficiently
handles all data types and applies one common security and governance
approach across all data and cloud platforms.

Elastic is the solution historically acknowledged as providing the de facto
search platform: Elasticsearch, Logstash, and Kibana—commonly known as the
ELK stack. Today, Elastic offers three solutions: Elastic Security, Elastic
Enterprise Search, and Elastic Observability.

Elastic offers a SaaS model, a self-managed solution, and hybrid deployments
relying on federation concepts to offer hybrid architectures.

How the solution works

In this architecture, an engineer would check in a database change—also
known as a migration—into a Git repository. The recommended format for
Liquibase is XML, as this format is database agnostic.

The change triggers a DevOps pipeline that reads in the XML, creates the
change set, executes the change, and tests the database changes. If the
change passes the tests, the change is promoted, and the environment
database is updated. If the change fails, the database change is rolled back
and a notification is sent to the engineer.


Because Liquibase maintains a history and sequence of changes, it is a good
fit not only for production databases but also for standing up databases for
sandbox and temporary environments.

Cloud-native NoSQL with MongoDB Atlas

The solution for a NoSQL purpose-built document database is provided by
MongoDB Atlas. MongoDB Atlas is a SaaS solution, and connecting this to your
AWS account is very straightforward.

In this topology, a corporate data center connects over AWS Direct Connect
to your VPC (VPC A) in an AWS Region, where application clients sit behind
an interface endpoint and load balancer. An endpoint service links VPC A to
the MongoDB Atlas VPC (VPC B), where the Atlas cluster nodes are distributed
across subnets A, B, and C.

In this example, AWS Direct Connect is used to connect an on-premises
environment to a MongoDB Atlas cluster. The cluster is distributed to
multiple locations to provide scalability and availability. MongoDB takes
care of managing and maintaining the infrastructure so developers can focus
on building the business logic and application.

The document model is fundamentally different

MongoDB's document model naturally maps to objects in code and eliminates
the need for object-relational mappers (ORMs). It breaks down complex
interdependencies between developer and database administration (DBA) teams.

With MongoDB, developers can represent data of any structure, and each
document can contain different fields. The schema can be modified at any
time. Additionally, MongoDB is strongly typed for ease of processing, with
over 20 binary-encoded JSON data types.

Tabular (relational) data model: related data is split across multiple
records and tables. Document data model: related data is contained in a
single, rich document, for example:

{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : {
    "first" : "John",
    "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ"},
      "geo" : { "type" : "Point", "coord" : [
        -0.109081, 51.5065752]}},
    {...}
  ],
  "dob" : ISODate("1977-04-01T[Link]Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}

The example illustrated here shows the relational model on the left and a document model
on the right. Notice that all the information contained on the left is also on the right side,
which highlights the simplicity of having all information contained in a single JSON object.
It’s easy to read and a developer doesn’t have to understand the complex tabular relationship.
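A brief pymongo sketch of working with a document like this one; the
connection string, database, and collection names are placeholders, not
details from the ebook.

from pymongo import MongoClient

client = MongoClient("mongodb+srv://USER:PASS@cluster0.example.mongodb.net")
customers = client["appdb"]["customers"]

# Insert the aggregate as one document, nested fields and all
customers.insert_one({
    "name": {"first": "John", "last": "Doe"},
    "address": [{"location": "work",
                 "address": {"city": "London", "postal_code": "SE1 8DJ"}}],
})

# One lookup returns the whole aggregate—no joins, no ORM
doc = customers.find_one({"name.last": "Doe"})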

Flexible: Adapt to change

Perhaps the most powerful feature of document databases is the ability to nest objects inside
of documents. A good rule of thumb for structuring data in MongoDB is to prefer embedding
data inside documents as opposed to breaking it apart into separate collections. There are, of
course, some exceptions such as needing to store unbounded lists of items or needing to look
up objects directly without retrieving a parent document.

Add new fields dynamically at runtime

Before:

{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : {
    "first" : "John",
    "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ"},
      "geo" : { "type" : "Point", "coord" : [
        -0.109081, 51.5065752]}},
    {...}
  ],
  "dob" : ISODate("1977-04-01T[Link]Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}

After, with a "phone" array added at runtime:

{
  "_id" : ObjectId("5ad88534e3632e1a35a58d00"),
  "name" : {
    "first" : "John",
    "last" : "Doe" },
  "address" : [
    { "location" : "work",
      "address" : {
        "street" : "16 Hatfields",
        "city" : "London",
        "postal_code" : "SE1 8DJ"},
      "geo" : { "type" : "Point", "coord" : [
        -0.109081, 51.5065752]}}
  ],
  "phone" : [
    { "location" : "work",
      "number" : "+44-1234567890"}
  ],
  "dob" : ISODate("1977-04-01T[Link]Z"),
  "retirement_fund" : NumberDecimal("1292815.75")
}

Note in this example that the name field is a nested object containing both given and family
name components, and that the address field stores an array containing multiple addresses.
Each address can have different fields in it, which makes it easy to store different types of data.
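Here is a sketch of making that runtime schema change with pymongo, reusing
the placeholder customers collection from the earlier sketch:

# $push appends to the "phone" array, creating it if it doesn't exist yet,
# so no migration is needed for documents written before the field existed
customers.update_one(
    {"name.last": "Doe"},
    {"$push": {"phone": {"location": "work", "number": "+44-1234567890"}}},
)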

MongoDB for Microsoft Visual
Studio (Microsoft VS) Code

MongoDB also provides an extension for Microsoft VS Code, which lets you
work with MongoDB and your data directly within your coding environment.
You can use MongoDB for Visual Studio Code to:

• Explore your MongoDB data
• Prototype queries and run MongoDB commands
• Create a Shared Tier Atlas cluster using a Terraform template

Top features of MongoDB Atlas

1. Ad-hoc queries for optimized real-time analytics. You might have noticed
that earlier we talked about NoSQL for online transaction processing—MongoDB
does that as well as online analytics processing.

2. Indexing for better query execution. MongoDB indices can be created on
demand to accommodate real-time, ever-changing query patterns and
application requirements. They can also be declared on any field within any
of your documents, including those nested within arrays (see the sketch
after this list).

3. Replication for better data availability and stability. When your data
only resides in a single database, it is exposed to multiple potential
points of failure, such as a server crash, service interruptions, or even
hardware failure. Any of these events would make accessing your data nearly
impossible. Replication allows you to sidestep these vulnerabilities by
deploying multiple servers for disaster recovery and backup. Horizontal
scaling across multiple servers that house the same data means greatly
increased data availability and stability.

4. Sharding. This is the process of splitting larger datasets across
multiple distributed collections. This helps the database distribute and
better execute queries. Sharding in MongoDB allows for much greater
horizontal scalability.

5. Large-scale load balancing. MongoDB supports this via horizontal scaling
features like replication and sharding. The platform can handle multiple
concurrent read and write requests for the same data with best-in-class
concurrency control and locking protocols that ensure data consistency.
There's no need to add an external load balancer—MongoDB ensures that each
and every user has a consistent view and quality experience with the data
they need to access.

6. Database provisioning, maintenance, and upgrades. MongoDB Atlas handles
all of the heavy lifting behind these tasks. Select the cluster
configuration you want using the UI or API and deploy a new cluster or
update an existing cluster in minutes. Security patches and minor version
upgrades are automatically applied, and all updates occur in a rolling
fashion across the deployment to reduce performance impact to applications.
And if a node goes down, MongoDB Atlas immediately elects a new primary and
restores or replaces the offline node to ensure continuous availability.
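A short pymongo sketch of feature 2, reusing the placeholder customers
collection from earlier: indexes can target nested fields and be added on
demand as query patterns evolve.

# Single-field index on a field nested inside the address array
customers.create_index([("address.city", 1)])

# Compound index matching a "last name, newest first" access pattern
customers.create_index([("name.last", 1), ("dob", -1)])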

Databricks Lakehouse Platform

The Databricks Lakehouse Platform combines the best elements of data lakes
and data warehouses to deliver the reliability, strong governance, and
performance of data warehouses with the openness, flexibility, and ML
support of data lakes.

Try it in AWS Marketplace › | Start Databricks on AWS training ›

In the solution architecture, the Lakehouse Platform sits inside the VPC
alongside the dev, test, and prod environments (each with Amazon S3 and
Amazon Aurora) and gives data scientists, business users, IT users, data
warehouse users, and data service users a single point of access.

This unified approach simplifies the modern data stack by eliminating data silos that traditionally
separate and complicate data engineering, analytics, BI, data science, and machine learning. It’s built
on open source and open standards to maximize flexibility. Additionally, its common approach to data
management, security, and governance helps developers operate more efficiently and innovate faster.

In our solution, Databricks integrates with the data plane in a customer's
account for access to MongoDB, Amazon S3, and Amazon Aurora database
services. The control plane in the Databricks cloud environment allows for
uniform access for different personas in the organization via HTTPS, JDBC,
ODBC, or REST APIs.

Key features of Databricks Lakehouse Platform

The Databricks Lakehouse Platform architecture:

Takes a decentralized approach to data ownership. Organizations can create
many different lakehouses to serve the individual needs of specific business
groups. Based on their needs, they can store and manage various data—images,
video, text, structured tabular data, and related data assets such as ML
models and associated code to reproduce transformations and insights.

Helps organizations manage data as a product by providing different data
team members in domain-specific teams with complete control over the data
lifecycle.

Provides an end-to-end data platform for data management, data engineering,
analytics, data science, and machine learning with integrations for a broad
ecosystem of tools. Adding data management on top of existing data lakes
simplifies data access and sharing—anyone can request access.

Enables discovery of data and other artifacts like code and ML models.
Organizations can assign different administrators to different parts of the
catalog to decentralize control and management of data assets. This
approach—a centralized catalog with federated control—preserves the
independence and agility of the local domain-specific teams while ensuring
data asset reuse across these teams and enforcing a common security and
governance model.

Ingesting logs and metrics

No solution is complete without end-to-end visibility of your application
and data stack. With the complexity of modern environments, tracing issues
or performance problems across microservices that extend into other
Availability Zones, Regions, accounts, and even hybrid clouds is difficult
without a tool to aggregate all this information.

Elastic Cloud is a SaaS solution that can be deployed in minutes across
supported AWS Regions and scales capacity automatically so you can focus on
building features and functionality versus building and maintaining
monitoring tools.

There are over 23 out-of-the-box integrations for AWS products and services
that allow MELT data to be sent into the Elastic Cloud. Elastic supports
cloud-native technologies such as Kubernetes, AWS Lambda, DynamoDB, and
many others.

Try it in AWS Marketplace › | Watch a demo and learn more › | Start a hands-on lab ›

MongoDB Atlas feeds Elastic Cloud, where integrations connect, collect, and
alert; Elasticsearch stores, searches, and analyzes; and Kibana lets you
explore, visualize, and engage—powering enterprise search, observability,
and security.

Ingesting telemetry from MongoDB Atlas

MongoDB monitoring is a critical component of all database administration,
and tight MongoDB cluster monitoring will show the state of your database.
However, due to its complex architecture, monitoring can be a challenging
task. With the Elastic Agent, logs and metrics are streamed to Elastic Cloud
with full visualizations in Kibana.

Elastic Cloud supports integrations with both MongoDB and Databricks for
full observability across applications and these data sources.
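Most telemetry would arrive through the Elastic Agent integrations, but as a
hedged sketch of pushing a custom document into Elastic Cloud with the
official Python client (the cloud ID, API key, and index name are
placeholders):

from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch(cloud_id="YOUR_CLOUD_ID", api_key="YOUR_API_KEY")

# Index one observability document; Kibana can visualize it immediately
es.index(
    index="app-metrics",
    document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "service": "orders",
        "metric": "checkout_latency_ms",
        "value": 42,
    },
)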

Key benefits of Elastic Cloud Observability

Elastic Cloud enables you to:

Get up and running in minutes. This not only includes setup of the Elastic
Cloud services, but also ingesting data from AWS accounts and hybrid cloud
accounts.

Save time on operational overhead and focus on delivering business value by
leveraging managed services. Automatically scale up or down and pay only for
the resources you use.

Make sure your deployment is secure from the ground up, including but not
limited to OS hardening, network security controls, and encryption for data
in motion as well as data at rest.

Collect telemetry data seamlessly. Ingest all business and operational data,
including support for open instrumentation standards and open-source
projects such as OpenTelemetry, Jaeger, and Prometheus. This allows you to
consolidate monitoring tools and efficiently store data at affordable costs
to visualize and analyze historical trends.

Recap: Modern data architecture
To recap the solution:

Liquibase is used to manage database migrations and treat data as part of a
DevOps lifecycle.

The capabilities of MongoDB are leveraged for NoSQL document storage that
can scale to handle the performance and availability requirements of the
most demanding cloud-native applications.

Databricks Lakehouse Platform breaks down data silos and democratizes data.

Elastic Cloud Observability brings together MELT data into one place to
visualize.

An important final point is that all of these services are SaaS offerings on
AWS that abstract away the heavy lifting of managing and maintaining servers
and infrastructure, helping developers focus on creating new features to
drive increased customer satisfaction.

Implement your modern data
management strategy today
Databricks Lakehouse Platform, MongoDB Atlas, and Elastic Cloud can be used together along with AWS
to build the foundation for establishing a well-engineered approach to modern data architecture. Establishing
these capabilities can provide a strong foundation for continuing to advance through your cloud-native journey.
You can explore these and other DevOps and modern data management-focused tools in AWS Marketplace.

To get started, visit: [Link]

AWS Marketplace

Third-party research has found that customers using AWS Marketplace
experience an average time savings of 49 percent when needing to find, buy,
and deploy a third-party solution. Some of the highest-rated benefits of
using AWS Marketplace are identified as:

• Time to value
• Cloud readiness of the solution
• Return on investment

Part of the reason for this is that AWS Marketplace is supported by a team
of solution architects, security experts, product specialists, and other
experts to help you connect with the software and resources you need to
succeed with your applications running on AWS.

Over 13,000 products from 3,000+ vendors:

Buy through AWS Billing using flexible purchasing options:
• Free trial
• Pay-as-you-go
• Hourly | Monthly | Annual | Multi-Year
• Bring your own license (BYOL)
• Seller private offers
• Channel Partner private offers

Deploy with multiple deployment options:
• AWS Control Tower
• AWS Service Catalog
• AWS CloudFormation (Infrastructure as Code)
• Software as a Service (SaaS)
• Amazon Machine Image (AMI)
• Amazon Elastic Container Service (ECS)
• Amazon Elastic Kubernetes Service (EKS)

Get started today
Visit [Link] to find, try and
buy software with flexible pricing and multiple deployment
options to support your use case.

[Link]

Authors:

James Bland
Global Tech Lead for DevOps, AWS

Aditya Muppavarapu
Global Segment Leader for DevOps, AWS
