Distributed SQL
Distributed SQL
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Distributed SQL For Dummies®, MariaDB Special Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright © 2023 by John Wiley & Sons, Inc.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may not
be used without written permission. MariaDB and the MariaDB logo are registered trademarks of
MariaDB. All other trademarks are the property of their respective owners. John Wiley & Sons,
Inc., is not associated with any product or vendor mentioned in this book.
For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact [email protected], or visit www.wiley.com/
go/custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&[email protected].
ISBN: 978-1-394-15978-9 (pbk); ISBN: 978-1-394-15979-6 (ebk). Some blank pages in the
print version may not be included in the ePDF version.
Publisher’s Acknowledgments
Some of the people who helped bring this book to market include the
following:
Project Manager: Client Account Manager:
Carrie Burchfield-Leighton Jeremith Coward
Sr. Managing Editor: Rev Mengle Special Help: Patrick Bossman,
Managing Editor: Camille Graves Lewis Carr, Alejandro Duarte
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Table of Contents
INTRODUCTION................................................................................................ 1
About This Book.................................................................................... 1
Icons Used in This Book........................................................................ 2
Conventions Used In This Book........................................................... 2
Beyond the Book................................................................................... 2
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Crafting the Perfect Schema.............................................................. 23
Keys and slices............................................................................... 23
Tables big and small...................................................................... 25
REPLICAS=ALLNODES.................................................................... 26
Range queries................................................................................. 27
JSON................................................................................................. 28
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Introduction
D
istributed SQL is a relatively new technology that provides a
modern way to scale large databases while maintaining ACID-
level consistency, high availability, and disaster recovery. In
database systems, ACID (atomicity, consistency, isolation, durability)
refers to a standard set of properties that guarantees database trans-
actions are reliably processed. Other scaling solutions for large data-
bases such as NoSQL don’t allow for the same consistency and ease
of querying with industry-standard SQL.
Use distributed SQL in the cloud with SkySQL and allow your
database to take advantage of the cloud-centric architecture that
makes adding nodes simple and inexpensive. SkySQL allows for
simple distributed SQL database management in a powerful and
secure web client.
Introduction 1
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Icons Used in This Book
Like most For Dummies books, you’ll find some icons in the mar-
gins that help you spot important information highlighted in this
book. Here is what they mean:
The Tip icon points out helpful information. This content may
help you save time or money.
The Warning icon alerts you to information that may save you
from making decisions that are harmful or helps you avoid pitfalls.
»» mariadb.com/products/enterprise/xpand
»» mariadb.com/docs/products/mariadb-xpand
»» mariadb.com/products/skysql
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Recognizing the need for distributed SQL
Chapter 1
Getting to Know
Distributed SQL
C
lient-server databases such as Oracle, PostgreSQL, Microsoft
SQL Server, MySQL, and MariaDB were originally designed to
handle smaller data sets at lower throughput and scale.
Increasing scale or throughput requires the capability of a single
machine. These databases now achieve high availability by replicat-
ing the entire database to more machines. NoSQL databases such as
Cassandra and MongoDB focused on scale but with reduced capa-
bilities, especially with regards to joins, transactional integrity, and
SQL — the most popular standard query language. This chapter
explains how distributed SQL addresses scale, integrity, and availa-
bility while maintaining the feature set of a full relational database.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Managing scale
Scaling a database requires potentially handling multiple issues:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
What if you have a house on the scale of the Winchester Mansion
(160 rooms)? In that case, you may not even be able to rent a big
enough moving truck! Renting a lot of smaller ones with more
help is the only feasible method of handling that much load.
This is also the case with databases. Smaller loads may do bet-
ter with a traditional client-server database such as MariaDB
Server, but larger loads with more traffic are probably better with
MariaDB Xpand, a distributed SQL database. Using Xpand in the
Cloud makes it possible for workloads to be rapidly and affordably
scaled, including across regions and globally.
Scale is the one reason to use a distributed database, but it’s not
the only reason. Availability is another. The one thing computing
can guarantee is that things will break. Faults can happen on a
single machine, network device, or even the powerlines that feed
them. For modern services, especially on the scale of a large bank
or SaaS company, losing service for any period of time is unthink-
able. Distributed SQL databases store redundant copies of data on
additional machines in additional locations (such as cloud avail-
ability zones).
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Some databases, such as Oracle, can automatically partition a
database on these values. Other databases may require application
developers to manually divide the data. While sharding is a good
way to share resources, the partitions must also be replicated to
ensure availability.
Sharding isn’t used for availability; it’s used for scalability but is
sometimes used along with high availability techniques to allow
for larger database sizes. When you’re choosing a method for high
availability, decide which tradeoffs, such as risk, performance,
and complexity, that you can live with. When making your deci-
sion, you should also take into consideration feasibility issues like
data size and network topography.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Single writer multiple replicas
When traditional databases such as Microsoft SQL Server, MySQL,
or MariaDB Server need to handle heavy read with fewer write
operations, it’s common to configure a cluster where one node
handles the writes while reads are directed to other nodes. Often,
writes are handled synchronously, meaning all read replicas must
complete updates before a write transaction commits. As long as
an application client isn’t interested in a new or updated row,
they won’t have to wait. Any client interested in a row locked
in the transaction may block until the transaction commits.
Because this blocking occurs until all replicas acknowledge the
update, it may take longer than on a single-instance database.
Systems that fit on a single node without splitting the data are
extremely economical and efficient. Replicas give you high availa-
bility and read-scale but don’t work well in systems where a large
number of write operations exist or the data set grows too large.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Standby instance
Using a synchronously replicated standby instance is a variation
of a multiple-writer system. Instead of having writes go to two
(or more) nodes and replicate to each node, writes go to one node,
and a copy is sent to a standby node. If the primary goes down, the
standby becomes the new primary.
Asynchronous replication
The other methods we talk about in this section assume that
absolute transactional integrity is required with no chance of data
loss during failures. For some data sets, performance is more
important than transactional integrity. It’s possible to config-
ure replicas with lower levels of assurance than full transactional
acknowledgment. In these cases, your system may be okay with
simply receiving a message that the data was sent or received
instead of waiting for the data to be committed.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
FIGURE 1-1: Every slice has a replica on one of the other nodes.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
FIGURE 1-2: Data is assigned to slices using a hash value for even distribution.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Balancing the load with Xpand
Distributed SQL databases use different methods to balance the
load across the nodes by detecting when a node is overused and
responds by moving the data to even out the load. In addition to
traditional load balancing reads and writes, MariaDB redistrib-
utes data when a node gets too busy by using one of the following
methods:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Understanding Distributed SQL
Topographies
When deciding how to deploy a distributed SQL database, consider
the following:
In MariaDB Xpand, you can add a node to a zone with the follow-
ing code:
Disaster recovery
For disaster recovery, some distributed SQL databases support
synchronous replication between regions. Synchronous replica-
tion delivers recovery point objective — zero data loss. It is true
there is severe write performance penalties. So many systems use
highly efficient parallel replication combined with regular back-
ups. In the event of a disaster, the application can failover to a
different geographic region.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
MariaDB Xpand supports parallel asynchronous replication as
well as parallel backup. These capabilities use multiple nodes in
the cluster to replicate data. This capability is essential for high
throughput systems that need frequent backups or those that
replicate to a disaster recovery region.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Understanding when distributed SQL
is used
Chapter 2
Distributed SQL in
the Real World
D
istributed SQL is used for systems of record at high scale
with high availability. Many use cases in many industries
have these requirements. In this chapter, you look at a few
of those use cases across multiple industries.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Ecommerce Use Cases
Ecommerce handles an incredible amount of data. Some tables
tend not to change often but must be able to handle a great many
read operations. Additionally, transactionally intense operations,
such as purchasing, require high availability and scalability.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IoT and Manufacturing Use Cases
There are a multitude of uses for distributed SQL in the Internet
of Things (IoT) use cases found in manufacturing, energy explo-
ration, power generation and distribution, and transportation and
logistics systems. Data from sensors and asset tracking require a
high number of write operations and fewer read operations. These
data sets are commonly accessed by range queries.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Gaming Use Cases
Gaming is a serious and large business worldwide. Games require
low latency and can involve massive amounts of data coming in at
high throughput. Distributed SQL allows games to handle a lot of
concurrent reads and updates and ensure that the system is avail-
able globally at all hours of the day and night.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Deciding whether to cloud host
»» Using Docker
Chapter 3
Getting Started with
Distributed SQL
S
electing the correct deployment model and designing proper
schemas optimize your use of a distributed SQL database
and your investment of time and resources. This chapter
looks at reasons to deploy on-premises or in the cloud. It also
describes how to craft schemas, including the key field necessary
to ensure proper data distribution and other issues particular to
distributed databases.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
also decide if it wants to host the solution itself or purchase a
database-as-a-service solution. This decision is based on whether
Hosting on-premises
Hosting on-premises allows a team more flexibility in terms of
how a system is set up, which resources it’s assigned, if it can
meet specific regulatory requirements. Occasionally, on-premises
systems are used due to more stringent latency or data sovereignty
requirements or because, in an industry like manufacturing, the
system is as reliable as the shopfloor systems. However, on-
premises systems must ensure that adequate backup and upgrade
procedures are in place. Additionally, it’s important to make sure
that there are sufficient zones or protections for redundancy. For
instance, you can use separate racks or ideally a separate building
with sufficient isolated resources.
Cloud hosting
Self-hosting a distributed database in the cloud allows for more
flexibility but also allows the database to grow and shrink as
needed. Instead of procuring more infrastructure than needed,
the organization can use pay as you go. However, it still requires
some of the same expertise as well as backup and upgrade proce-
dures. But, it’s easier to ensure deployment in separate availabil-
ity zones, and many of the redundancies the system would need
are built into the cloud infrastructure.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
A database-as-a-service deployment is much simpler than on-
premises or self-hosted and will generally follow best practices
for that particular database. In the case of MariaDB Xpand, data-
bases can be deployed to AWS and Google Cloud using MariaDB’s
database-as-a-service system called SkySQL. SkySQL allows you
to manage many aspects of your database system, including sys-
tem upgrades.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
2. Start services.
After you register an account, click Access SkySQL.
3. Click Launch New Service, choose the Xpand Distributed
SQL topology, and then you can select options for cloud
provider, instance size, and storage size.
For this case, Figure 3-1 shows the 3-node option, but there
are 1-, 3-, or 6-node options to choose from in the panel.
After you set up your account to deploy Xpand, you have a couple
of connection options:
Most options are in the Main tab in the Connection Setting screen,
as shown in Figure 3-2. The path to the CA certificate is in the
SSL tab.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
FIGURE 3-2: The database connection detail dialog box.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
aren’t efficient on even high-traffic, client-server databases but
are a bigger problem on a high-scale distributed database. The
performance issue is the database must synchronize threads in
order to assure monotonically incremental values. In order to
work around this, some developers have used BINARY(16) fields
and generated universally unique identifiers (UUIDs). These avoid
the issue of thread synchronization with an extremely low risk
of duplicates, but because UUIDs fill the keyspace randomly at a
distance, they index poorly.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
However, for some data it makes sense to pick a different set of
fields in order to avoid hotspots based on usage patterns or the
data itself. When picking a key, ensure the field or set of fields
has a high number of distinct values (NDV). This means if a table
has STATE_CODE (referring to states in the United States) and is
used as a distribution key, the table can’t be divided into 100 slices
(because there are only 50 states and a handful of districts and
territories), so pick a key that has enough NDV.
This key helps the database pick which nodes get which record or
index entry.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
FIGURE 3-3: Consider using larger tables over many small tables.
REPLICAS=ALLNODES
Some tables are frequently accessed as reference data. It’s more
efficient to distribute these tables to every node in the cluster. Con-
sider doing this for tables that are relatively small (<10 megabytes
[MB]), rarely updated, read from frequently, and are frequently
joined with other tables. To do this, specify REPLICAS=ALLNODES
like the following code:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Using this feature allows joins against reference data to avoid a
hop and to happen locally on a single node making for more effi-
cient queries. The tradeoff is that writes to this data are slow (due
to the fact that all nodes have to be updated).
Range queries
Queries that use operators like < and > or keywords like between
are range queries. These types of queries can be more efficient if a
columnar index is used instead of a row index. Columnar indexes
essentially turn the table sideways because data in a column is
often repetitive, so it compresses better than a row index. A lot of
compressed values can be traversed more quickly, so this works
well for range queries.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
JSON
Most modern relational database support JSON. The ANSI SQL
2016 standard even incorporates it. Distributed SQL databases
aren’t different, but how they support it and to what extent var-
ies. In Xpand, JSON is stored in its native format.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Developing applications with Java
Chapter 4
Creating Distributed
SQL Applications
X
pand fully supports most popular languages, frameworks,
and technologies. In fact, you can generally use the
same tools and technologies that you use to connect to
MariaDB and MySQL. This chapter overviews three of the more
popular frameworks in which you can create distributed SQL
applications.
But before you jump into the sections in this chapter, you may
want to check out Chapter 3 (if you haven’t already). The infor-
mation in this chapter is based on the SkySQL setup in Chapter 3.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Regardless of the language you chose, ensure you have: installed
the mariadb client utilities, connected to your SkySQL Xpand
instance, and created the tables the examples depend on. These
are the steps necessary to do that:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
5. Create the ORDER_ITEMS table.
Create the ORDER_ITEMS table with the following SQL
statement:
Java
Java is one of the most common languages for developing appli-
cations with distributed SQL databases, especially MariaDB
Xpand. The code you see in this section demonstrate the processes
of connecting to your created database, inserting, querying, and
deleting rows using the Java Database Connectivity (JDBC) appli-
cation programming interface (API).
Before you code and compile the example, ensure you have the
following:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
»» Maven: Maven is a popular build tool. On MacOS, you can
install it with Homebrew by using this command:
• www.jetbrains.com/idea/download
• www.eclipse.org/downloads
»» A directory structure: Create a directory to hold your
overall project and a subdirectory structure matching src/
main/java/com/example.
<dependencies>
<dependency>
<groupId>org.mariadb.jdbc</groupId>
<artifactId>mariadb-java-client ↩
</artifactId>
<version>3.0.8</version>
</dependency>
</dependencies>
You can find a complete example of creating the Maven build file
here: github.com/mariadb-developers/xpand-dummies/blob/
main/java/pom.xml.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Creating the main application
Create the main Java application by using your favorite editor at
src/main/java/com/example/Application.java. You can find the
complete listing at the following:
https://github.com/mariadb-developers/xpand-
dummies/blob/main/java/src/main/java/com/example/
Application.java
After you’ve entered that code, supply your server, port, and data-
base name along with the properties object to the Java JDBC driver
manager. You do that by following this code:
connection = DriverManager.getConnection(
"jdbc:mariadb://YOURSERVER:YOURPORT/orders",
connConfig);
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
A prepared statement can be constructed inside of a Java try/catch
block. The parameterized values are supplied in order by calling
the appropriate “set” functions on the statement object. Finally,
executeUpdate() tells the database to execute the statement.
Deletes and updates are done in a similar way but with a different
SQL statement. For example, the following statement deletes an
order line item:
Selects are done similarly. A query joins the ORDERS and ORDER_
ITEM table based on the order id. It returns all rows from the
ORDER_ITEM table in the database. The query looks like this:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
orderid = rs.getLong("ORDER_ID");
customerid = rs.getInt("CUSTOMER_ID");
orderdate = rs.getDate("ORDER_DATE");
ordercreated = ↩
rs.getDate("ORDER_CREATED");
enteredby = rs.getString("ENTERED_BY");
itemid = rs.getLong("ITEM_ID");
linenum = rs.getInt("LINE_NUM");
productid = rs.getInt("PRODUCT_ID");
description = ↩
rs.getString("DESCRIPTION"));
}
}
}
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
JavaScript
JavaScript and TypeScript are the fastest growing ecosystems for
new cloud applications, especially MariaDB Xpand. Before you
begin using JavaScript, ensure you have the following installed:
For any Xpand application, you need to ensure the mariadb client
library is loaded with the following:
You need the filesystem library as well in order to read the SSL
certificate. You can get that by running the following code:
const fs = require("fs");
Read the pem file, including the path where you downloaded it,
as follows:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
let connection = await mariadb.createConnection({
host: "YOURHOST",
port: YOURPORT,
ssl: {ca: serverCert},
user: "YOURUSER",
password: "PASSWORD$",
database: "orders"
After this, you need to specify the character set immediately after
connecting by using the following code:
Queries are done the same way. The following query joins data
from the orders and order_items table together and returns all
rows in the database:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
In your function, send this to the database by using the query
function of the connection object. The result is returned in the
“rows” object that can be iterated through the following code:
Deletes and updates are handled the same way. This SQL state-
ment accepts an order id and line number as parameters and
deletes the row with those values from the order_item table:
node application.js
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Python
Partly because of Python’s strength in analytics and Xpand’s
columnar index capability, there are an increasing number of
distributed SQL applications written in Python. Before you begin
ensure you have the following installed:
import mariadb
connection = mariadb.connect(
host="YOURHOST",
port=YOURPORT,
ssl_ca="skysql_chain.pem",
user="YOURUSER",
password="PASSWORD$",
database="orders",
)
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Construct a cursor object from the connection specifying that you
want results to be returned as named_tuples. You must also spec-
ify the character set (utf8) immediately after by using this code:
cursor = connection.cursor(named_tuple=True)
cursor.execute("SET NAMES utf8")
Consult dev.mysql.com/doc/connector-python/en/connector-
python-api-mysqlcursor.html for other options, such as raw or
dictionaries, for the field named_tuples.
cursor.execute(
sqlInsertOrders, (1,"andy"))
connection.commit()
Queries are done the same way. The following query joins data
from the orders and order_items table together and returns all
rows in the database:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
i.line_num, i.product_id, i.description
from orders o inner join order_item i on o.order_ ↩
id = i.order_id"
Send this to the database using the execute function below of the
cursor object.
cursor.execute(
sqlQueryOrdersItems
)
You can then iterate through the returned values by using a for/
range function as follows:
for x in range(0,cursor.rowcount):
row = cursor.fetchone()
print(row)
Deletes and updates are handled the same way. The following SQL
statement accepts an order id and line number as parameters and
deletes the row with those values from the order_item table:
Send this to the database by using the execute function of the cur-
sor object along with the parameters:
connection.commit()
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Running the application
Build and run the application with the following command:
python app.py
• mariadb.com/products/skysql/docs/quickstart: A quick-
start using SkySQL with Python, PHP, JavaScript, or Java
• mariadb.com/products/skysql/docs/reference/sample-
code: Additional language-specific tutorials in different languages
using different frameworks such as R2DBC
• mariadb.com/docs/products/mariadb-xpand: A link to the
Xpand documentation
• mariadb.com/developers: The MariaDB Developer Hub
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Considering if distributed SQL is the
solution
Chapter 5
Deciding if Distributed
SQL Is Right for You
D
istributed SQL databases may not be the right solution for
every application. This chapter helps you consider when a
distributed SQL database is the correct application technol-
ogy and it’s not.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
rate is less than 10,000 queries per second (QPS) or 5,000
transactions per second (TPS), a distributed SQL database
may not be the answer.
»» When you have primarily JSON data that can’t be
represented as tables: When you need to store complex
JSON documents, subdocuments, and fields with no
reasonable way to map them to tables, a distributed SQL
database may not perform as well as a dedicated NoSQL
document database.
Having JSON data doesn’t mean using distributed SQL is
wrong. Distributed SQL databases can handle JSON fields
and also represent tables as JSON.
»» When the application is purely analytical: Distributed SQL
databases weren’t created to handle heavy analysis. They’re
designed for transactional data. While MariaDB Xpand has
some analytical capabilities with columnar indexes, it shouldn’t
be used for complex analytical queries that extract large
quantities of data for a relatively small number of users.
For heavy analysis, consider using a dedicated analytical
database such as MariaDB ColumnStore.
»» When data consistency isn’t an issue: In a scenario where
data consistency (expecting all versions of the data to be the
same) isn’t important, database solutions exist with relaxed
consistency. This is particularly true for read-only databases
where data rarely, if ever, changes. Distributed SQL data-
bases may reduce architecture complexity but have strict
consistency requirements.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
the 50 terabyte (TB) mark with an Oracle database, things start
to get a little hairy. There are scalability solutions (check out
Chapter 2), but each of these performance solutions begins
to involve extra labor and may create bottlenecks. Because
distributed SQL databases spread the data across nodes,
they can use more resources and handle larger database
sizes without intervention by the application developer.
When your database tables have hundreds of millions of
rows, consider using a distributed SQL database.
»» You have a high number of client connections. In
applications that carry many simultaneous client connec-
tions, a server failure means a high client reconnection
requirement. Your operations team needs to regulate
cold-start performance against the number of connections
your database can open at once. By spreading those
connections across more nodes, distributed SQL databases
can handle more connections opened and sustained at once.
Use multiple database proxies to balance the load and
reduce the chance of bottlenecks.
»» You need high availability with multi-zone failure
tolerance. While it’s possible to handle faults using read
replicas and electing a new primary with specialized
software, such as MariaDB MaxScale, performance always
takes a hit. Because distributed SQL databases don’t have
primary nodes and every node can handle traffic, client
interruption is avoided. Data traffic can simply be redirected
to other instances. The recovery process, copying data to
additional nodes, is automatic.
»» Your application is read-heavy. While a primary database
that uses multiple read replicas can handle a lot of reads, it
doesn’t scale linearly. Distributed SQL databases handle
more reads by making sure they’re evenly distributed across
the cluster.
»» You have a lot of writes. Client-server databases can scale
reads with read replicas, but most don’t handle a heavy
number of writes per second. Distributed SQL databases
spread writes evenly to slices across the cluster. Because
only replicas of a slice need to be locked, writes have little
impact on data not involved in the transaction.
»» You have varying or seasonal demand. A key benefit of
distributed SQL is that you can add and remove nodes as
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
needed. This “elastic” scale is useful for scenarios where a site
receives lots of user traffic followed by much lower demand.
»» An existing sharding solution has grown too complex.
Many people have scaled existing relational databases
by manually splitting the data into shards. It is difficult to
maintain the balance and distribution of data while assuring
integrity. Distributed SQL databases balance and rebalance
data automatically.
Distributed SQL databases were designed for high transac-
tion applications.
»» This is a system of record. Systems of record are those that
contain authoritative data requiring strict control over the
reliability of their data. Distributed SQL databases were
designed to be atomicity, consistency, isolation, durability
(ACID) compliant from the ground up. For this reason, they’re
excellent choices for systems of record.
NoSQL databases can provide fault tolerance and scale.
However, they don’t operate the same way that traditional
systems of record operate. Transactions generally don’t
perform as well compared to relational databases.
»» You’ve outgrown your existing relational database. If
you’re using an SQL database but need one that’s more
powerful, reliable, distributes the load more evenly, and
scales reads and writes, a distributed SQL database is a
natural fit.
When migrating from a MySQL or MariaDB database,
MariaDB Xpand is highly compatible. If migrating from
databases such as Oracle or PostgreSQL, they have features
that make migration to a distributed SQL database fairly
painless.
»» You’re migrating to the public cloud. The MariaDB Xpand
distributed SQL database is designed to use availability zones
(AZs) that give your application resiliency and edge computing
advantages. While you can host MySQL, MariaDB, PostgreSQL,
and even Oracle or SQL Server instances in a public cloud,
they aren’t architected for AZs. You can get them to work, but
it’s not a clean fit. Distributed SQL databases are designed to
be self-healing even if an AZ goes down.
AZs can be geographically located near your users, which
improves application efficiency.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
IN THIS CHAPTER
»» Loading data in parallel
Chapter 6
Ten Tips for Success
with Distributed SQL
Applications
A
fter you’ve gotten started with distributed SQL, you can do
several things to improve performance and ensure that
your applications are optimized. In this chapter, here are
ten tips to ensure that success.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
more than 8GB times the number of nodes, you may want to pre-
slice the tables. This step speeds up data loads by allowing for
additional parallelism, so the rebalancer doesn’t try to reslice the
data as it’s being loaded.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
the underlying database infrastructure. Here are some of the
MaxScale features:
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
You can create a columnar index with MariaDB Xpand by using
the CREATE COLUMNAR INDEX statement:
Use MariaDB SkySQL to gain fast and easy control over your data-
base, get built-in high availability, and clear visibility into perfor-
mance and managing your database.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Add Nodes for More Performance
One of the standard ways to scale an application is to add more
nodes. With MariaDB Xpand, you can simply add nodes to get more
database capacity as well. Xpand automatically manages distrib-
uting the data across the new nodes, all while staying online.
These materials are © 2023 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley’s ebook EULA.