SQLSentry - Optimizing SSAS
SQLSentry - Optimizing SSAS
SQL Server
Analysis Services
(SSAS)
by Steven Wright
Optimizing SQL Server Analysis Services (SSAS)
Troubleshooting performance problems with SQL Server Analysis Services (SSAS) can be
a frustrating exercise. This insightful guide by Steven Wright, Product Manager of Analysis
Services at SentryOne, helps uncover the myriad forces that can conspire to degrade SSAS
performance. In this five-part guide, Wright explains MOLAP SSAS architecture, examines MDX
query performance, and offers time-saving tips for tracking trace events and troubleshooting
SSAS bottlenecks. While the information in this book focuses on SSAS in Multidimensional
mode (MOLAP), future editions will expand on differences found in SSAS Tabular mode.
O
ver the past couple years, since SentryOne Introduced BI Sentry for Analysis Services,
I’ve heard many people say they see SSAS as a “black box” on their servers with no
real visibility into troubleshooting performance issues. BI Sentry lifts the shroud from
Analysis Services, but for some it may not be obvious how to interpret all the information
gained from this new visibility.
Many resources cover OLAP architecture and MDX, but much less information is available that
explains the various metrics that are available and how
With a few basic they correlate with each other. In response, I developed
a presentation I’ve given several times now at SQL
pieces of information, Saturdays that I believe serves to “bridge the gap” for DBAs
you should be able to who may be new to Multidimensional SSAS, or possibly
even developers who lack a dedicated administrator and
tell whether there is need to be able to tell where the bottleneck might be on
room for MDX query their SSAS server. I was also privileged to be invited to
an interview on RunAs Radio where I spent some time
optimization, or wheth- discussing this with Richard Campbell and Greg Hughes.
er the bottleneck lies This book is a compilation of those presentations and the
series of blog posts I’ve written about how to troubleshoot
elsewhere. SSAS performance. Anyone working with SSAS should
find this book useful, but users of BI Sentry should find
themselves even more proficient in troubleshooting SSAS performance issues. We’ll start with
a very basic introduction to the service itself and how a query is handled, and move into
specific metrics and identifying bottlenecks.
Basic Architecture
It’s important to understand that SSAS is its own engine. The executable is msmdsrv.exe. It’s
licensed with SQL Server, but in large production environments, you aren’t likely to be running
it on the same machine as a SQL Server. There are two primary types of activity that occur
here—querying and processing.
Types of Activity
Querying can be done by various methods, but the most common type of queries are
Multidimensional Expressions (MDX). This is just an analytical query language, optimized for
calculations and aggregations of data. I won’t go into the how’s and why’s of MDX itself as
there are many resources already available on its mastery. While this series won’t tell you
Part 1: Basic Architecture of SQL Server Analysis Services
how to write MDX, by the time we’re done you should be able to tell, with a few basic pieces of
information described later, whether there is room for query optimization or if the bottleneck
lies elsewhere.
Processing is essentially the maintenance activity of your cubes. It’s how cubes are created
and updated with new data and/or how existing data is recalculated and reorganized. Just
as with SQL Server, what level of processing you perform and when is essential to ensuring
proper performance without interfering with peak querying times (and we’ll cover those later).
An interesting thing about the FE is that it is single threaded. If you are watching the CPUs of
your multi-core server with SSAS while a request is being handled and you only see one core
spike, it’s probably because the FE was doing the work at that time.
That said, when we talk about SSAS cache optimization, warming, etc., we are more often
dealing with the SE caches. That’s a bit of an oversimplification, but this is an introduction,
after all. The SE consists of a Dimension Cache and a Measure Group Cache to store the data
retrieved from the corresponding sources in the file system itself.
The FE then requests any other needed data from the SE. The SE then retrieves whatever it
can from its caches. If still more data is needed, the SE queries the file system on the server.
The data from the file system may, at least in part, come from the Windows file cache.
Remember that we mentioned this distinction earlier. Anything else results in physical disk IO.
These steps may be iterated as needed while data is moved from SE caches to FE caches, etc.
So now that we have a good understanding of how SSAS handles MDX requests, the question
we need to answer is, “What can impact MDX performance, and how can we tell?” That is the
question we will answer in the next section.
I
n the last section, I mentioned that SentryOne BI Sentry “lifts the veil” from the black box
known as SQL Server Analysis Services. But there is little information available that helps
someone new to the technology easily interpret all this new information in order to optimize
the performance of their servers. In this section, I want to dig deeper into the common sources
of bottlenecks and what information is available to troubleshoot these issues.
In the section, I’ll break into six categories the factors that can impact MDX performance on
your SSAS server:
• Processing
• Unoptimized code
• CPU
• Memory
• Disk
• Network
I’ll tackle each one by giving an explanation and refer to specific metrics, primarily performance
counters, that can be used to analyze the given issue.
Processing
I mentioned in the previous section that processing involves
the creation and updating of cubes with new data and/ Querying any cube
or recalculating and reorganizing existing data. I related while processing is
this activity to maintenance on SQL Server and many
of the same factors you consider when you implement occurring can lead to
maintenance in SQL Server apply to processing in SSAS. general resource con-
Processing primarily involves the Storage Engine (SE), tention on the server.
which you’ll remember is multi-threaded, and can
obviously be a resource-intensive operation. Because
processing might involve new data or recalculation of existing data, cached data will become
outdated and will be flushed.
This then means that the next time a query runs that requires that data, the SE will have to
query the file system to retrieve needed data from the Windows cache or the disk system to
replace the flushed data. This will result in a performance hit to the query on its first run after
processing.
Querying while processing is occurring on the same cube can lead to blocks. Querying any
cube while processing is occurring can lead to general resource contention on the server.
So how can we tell when processing is occurring on the server? While there are many
performance metrics that have varying levels of usefulness, I’ll mention a handful for each
topic that will give you better insight for that topic. Don’t infer that if I don’t mention it, it’s not
useful, or vice versa. The point of this book is to give you a better understanding of what’s
going on in SSAS and how to make better use of the information tools such as SentryOne BI
Sentry provide.
• Processing: Rows written/sec tells you the rate of rows written during processing
• Proc Aggregations: Rows created/sec more specifically tells you the rate aggregation rows
are created during processing.
• Proc Indexes: Rows/sec tells the rate of rows from MOLAP files used to create indexes.
So, a non-zero value for these counters confirms that processing is taking place. These metrics
can also help determine the effectiveness of any processing tuning efforts. If your tuning
efforts are effective, you should see an increase in these values accordingly.
I
n the last section, I showed you some counters that can help identify when SSAS processing
is taking place, and at what rate. I then gave some suggestions as to how you might be
able to optimize your processing strategies in order to improve overall performance on your
server. In this section, I’ll talk about unoptimized MDX code as the source of your performance
issues.
Unoptimized Code
There are quite a few resources available on how to write MDX, so I won’t be going into that
here. The point I want to make is that there are ways you can identify queries running on your
SSAS instance that might benefit from optimization, even without your actually knowing MDX.
In this section, I’ll mention a handful of key performance counters that will allow you to identify
issues related to query optimization. You can then take this knowledge to the developers of
the queries with useful feedback as to how to improve performance.
Specific to MDX, there are a few performance counters that can provide a lot of insight. I’ll
group them into three general issues. The first group are:
• MDX: Total cells calculated
• MDX: Number of calculation covers
• MDX: Total Sonar subcubes
Another counter that can indicate serious performance issues with MDX:
This indicates errors in calculations during query execution. A non-zero value here can
indicate issues where unnecessary recalculations are taking place and can even lead to a
fallback to cell-by-cell mode in an effort to eliminate the error.
If the bottleneck appears to be in the formula engine, watch for cell-by-cell calculation,
and eliminate empty cells and tuples using the recommended strategies in the linked
whitepapers mentioned above.
Aside from the counters just mentioned, how do you know where the main bottleneck might
be—in the Formula Engine or Storage Engine? Where should you focus your performance
troubleshooting? I’ll help you further pinpoint those efforts in the next section.
I
n the last section, I talked about how to detect performance issues related to unoptimized
MDX and different strategies based on the type of issue. A big part of ensuring that you
are as efficient as possible with your troubleshooting efforts is to first identify whether the
bottleneck is in the Formula Engine (FE) or Storage Engine (SE). Technically, unoptimized MDX
itself is likely to manifest as an FE bottleneck where an SE bottleneck really resides more with
the underlying cube architecture. That said, all of this really ties back to the first section and
the anatomy of an MDX query.
In part 1, I walked through the anatomy of an MDX query and described what role the FE
and SE play in the handling of a request. When troubleshooting slow MDX performance, it’s
not likely that the bottleneck is entirely with the FE or entirely with the SE. So how do you tell
where the biggest bottleneck is, so you can ensure you are getting the most out of your tuning
efforts?
There are other events and additional detail provided within the trace, but this list covers
the primary events we’ve discussed. Note there can be many of these events—excluding the
command or query begin and end—for a single request. Tracking these events can get a bit
complicated fairly quickly, especially if there are multiple requests hitting the server at the
same time. However, if you are troubleshooting a particular query and run a trace, you do have
the information to add up the duration for each event and come up with an idea of where most
of the time is being spent.
Fortunately, SentryOne BI Sentry does all of this for you. You might have noticed a rather
unique chart on the BI Sentry dashboard.
This chart monitors all of this
activity for you continuously
at a fraction of the overhead a
Profiler trace tends to impart.
It summarizes this information
in real time and retains history
so you can quickly and easily
see where most of the time is
being spent. It even breaks the
information down into more
detail for you. Serialization is
handled by the FE and is related
to NON_EMPTY behavior as
discussed in an earlier post.
Processing and SQL queries
are handled by the SE. By the
way, SQL queries are likely due
to data reads for processing,
but can also be seen if you are
using ROLAP partitions. BI Sentry Dashboard Chart Detail
For a further breakdown of this activity on a query-by-query basis in BI Sentry, refer to the Top
Queries tab. Not only will you see whether your query experienced more FE or SE time, but you
can expand the view for a complete breakdown by Measure Group, Partition, Aggregation, and
Dimension. More information on this functionality is available on our website.
Troubleshooting Bottlenecks
Up to this point, I’ve been focusing on common activity in Analysis Services, how to identify the
bottleneck for that activity, and how to resolve the bottleneck. In the last section, I specifically
discussed how to identify whether the Storage Engine (SE) or Formula Engine (FE) was the
larger bottleneck. Now I want to talk more about identifying bottlenecks with the server’s
physical resources themselves. The four basic areas of server resources to investigate are:
• CPU
• Memory
• Disk
• Network
A lot of these issues, and methods for identifying them, are common to Windows Server
performance in general, but I’ll include pertinent SSAS-specific performance details and
metrics, too.
CPU
Any Windows server includes a handful of counters that provide good indication that the
bottleneck with your server might be related to the CPU(s):
• Processor: % Processor Time
• System: Context Switches/sec
• System: Processor Queue Length
While there is no magic number that indicates what “good” CPU utilization is, you generally
want to get the most utilization without overburdening the system, so look for sustained
periods of 100% utilization on one or all cores to suggest a bottleneck here. The other two
counters, when abnormally high for sustained periods, usually suggest an issue with too many
parallel operations on the server.
So let’s take a look at some SSAS performance metrics that shed some light on processor
utilization. Two sets of SSAS performance counters apply to the formula and storage engines:
• Threads: Query pool
• Threads: Processing pool
• Threads: Processing I/O pool
Don’t let the names confuse you. The query pool refers to FE activity. The processing pool not
only refers to processing, but any SE activity. Prior to SSAS 2012, there was no Processing I/O
pool. It was introduced to further break out processing activity from read activity in support of
query requests by the SE. That query-related read activity is now represented by the Processing
I/O pool. Each of these categories contains counters for Busy, Idle, Job Rate, and Queue
Length. They allow you to see the thread activity for each engine. If you are seeing consistently
high queue lengths, but not high CPU utilization, you might want to adjust your MaxThreads
and/or CoordinatorExecutionMode properties for your SSAS instance. Remember that the FE
is single threaded, so increasing the query pool setting might not improve performance of any
one query, but might improve the performance in handling multiple simultaneous requests.
Memory
Three groups of metrics are relevant in monitoring MOLAP Analysis Services memory:
• Overall usage—how much total memory is SSAS using on the server.
• Cache hit ratios—how efficient is the use of that memory.
• Cache activity—what is happening to the memory.
Memory monitoring is very different in Tabular mode as the entire Storage Engine operates
in–memory. Here we’ll review MOLAP memory usage.
This is the total memory usage for the server process, and should be the same as the Process:
Private Bytes counter for msmdsrv.exe.
NOTE: Do NOT rely on Task Manager for an accurate picture of memory usage.
These counters refer to the background cleaner for SSAS. The first counter refers to the amount
of memory known to the background cleaner. That memory is then divided into shrinkable
and nonshrinkable memory. This describes what portion of that known memory is subject to
purging by the cleaner based on memory limits. The cleaner value is likely to be a bit lower
than the total usage value, but it’s important to know because this value lets you know how
much room you have to actually work with when it comes to memory management. The limits
the cleaner works with are defined by properties indicated by the following two counters:
• Memory: Memory Limit Low KB
• Memory: Memory Limit High KB
A great explanation of these properties and counters, along with real-world examples of their
use is covered in Greg Gonzalez’ blog post, Analysis Services Memory Limits.
While there’s no persistent cache hit ratio counter itself for these caches, as there is for SQL
Server, these metrics let you to calculate the ratio for each cache for a given point in time.
These metrics give a fairly direct indicator of memory pressure on the server. If the Evictions/
sec and/or Cleaner: Memory shrunk KB/sec are consistently non-zero, you likely have memory
pressure on the server. The Cleaner: Memory shrunk counter in particular indicates that you
are exceeding your defined memory limits described earlier.
The answer is cache warming. There are different ways to warm the cache. The simplest is
to run a couple of the most commonly used queries after processing to pull the most likely
needed data into cache.
Disk
In order to determine whether the disk system is a bottleneck for your SSAS instance, you
need to first verify that SSAS is indeed accessing the disk system. I mentioned in the first post
that the SE accesses the file system. So check the previously mentioned counters to verify the
SE is active. A more specific counter is:
• MSAS: Storage Engine Query: Queries from file/sec
Remember, unlike the relational engine, SSAS data may be in the Windows file cache. This
means that even when the above counter is non-zero, it alone does not guarantee physical
disk IO. You’ll want to examine the following three counters to get an idea of how much of the
SE activity is actually reading from disk as opposed to the Windows file cache:
• MSAS: Storage Engine Query: Data bytes/sec
• Physical Disk: Disk Read Bytes/sec
• Cache: Copy Reads/sec
Be sure to account for activity outside of SSAS when using these metrics. For a much more
detailed explanation with great screen shots of BI Sentry for SSAS, see Greg Gonzalez’ blog
post on the subject.
So once you’ve determined SSAS is incurring physical disk IO, what should you check to
ensure the disks are performing optimally? There are many different counters available for
disk performance, and some are more useful than others.
For a long time, disk queue length was considered an important metric, but as server storage
grew to include more and more spindles, was moved to SANs, or included SSDs, this metric
has become less meaningful.
Optimally these values should remain below 10 ms. As you approach latency 20 ms to 30 ms
or more, you’re going to notice performance issues related to the disk system. With SSDs,
latency should be even lower. This isn’t specific to SSAS, but more of a general server guideline.
You might also want to disable Flight Recorder. Without going into the pros and cons of why
you do or don’t want Flight Recorder, many articles suggest a performance improvement by
disabling it. It is basically a file-based trace on your system and will increase IO. It can be
disabled in the properties for the SSAS instance.
A couple of other disk performance tactics worth mentioning are not specific to SSAS. First
note the location of your cube’s files. If they are sharing spindles with system files, or other
busy databases, you’re likely to run into contention.
The same principle applies to SAN allocation. This is often harder to investigate without the
help of your SAN administrator, but be sure you’re not sharing busy spindles on the SAN,
either.
Network
Just as with the relational engine, the network is the component that probably offers the least
visibility. The problem is just as likely to be outside your server. The network itself can be slow,
or the bottleneck might be on the client end. That said, there are some metrics to identify
whether the problem is a local network issue.
This will give you an idea of the traffic on the NIC(s), and let you know if there is a backup in
output. For SSAS, we can at least identify what kind of traffic we are sending through the pipe:
• Processing: Rows read/sec—tells the rate of rows read from all RDBMSs
• Storage Engine Query: Rows sent/sec—tell the rate of rows sent from the server to clients
This should give you better visibility into how your network cards are performing and how much
of that is related to SSAS activity.
About SentryOne
SentryOne is a Charlotte-based tech company whose award-winning solutions empower
Microsoft data professionals to achieve breakthroughperformance across physical, virtual,
and cloud environments.The team shares its expertise at blogs.sentryone.com and
sqlperformance.com.