March 2015
When Good Design Goes Bad
Bob Duffy
Database Architect
Prodata SQL Centre of Excellence
Bob Duffy
• 20 years in database sector, 250+ projects
• Senior Consultant with Microsoft 2005-2008
• One of about 25 MCA for SQL Server globally (aka SQL Ranger)
• SQL MCM on SQL 2005 and 2008
• SQL Server MVP 2009+
• SSAS Maestro
• Database Architect at Prodata SQL Centre of Excellence
• http://blogs.prodata.ie/author/bob.aspx
• [email protected]
@bob_duffy
What We Will Cover
Stored Procedures
Clustered Tables
Identity and Primary Keys
Indexes
Fragmentation
Naming Conventions
Partitioning
ORM
1. Stored Procedures
Why use Stored Procedures ?
The Dreaded Search Screen
http://www.sommarskog.se/dyn-search.html
The Chunky v Chatty Debate
Two Types of “Chunkiness”
Data Transferred per Call
Number of Calls
Network latency Important here
Stored Procedure Weigh In
Dynamic Design Patterns
Performance Slow Interpreted TSQL
Security Application Agility
Plan Cache Developer Agility
Maintainability The ORM Debate
Chunky Chatty
2. Clustered vs Heap
Best Practise: Cluster ALL Tables ?
Ever Increasing, Narrow, Unique, Static
Always use an Identity Column
When do Clustered Tables go Bad ?
Harder to Scale, especially for some key choices
Large Table Scan Workloads
Non sequential Clustering Keys Cause Fragmentation
Why is Fragmentation the Achilles Heel of table Scans
More Pages => More IO
Kills Read Ahead and disk performance
Heavy NCI Requirement
Best Practise “Clustered Index are Better for Seeks”
Well it depends on if the seek is on a NCI or not!
Clustered Index v Heap
Most Logging Tables
Ascending Keys Insert and Scan heavy Tables
Range Scans OLTP Transaction Tables (banking)
Lots of Deletes Bulk Loading
Tables with Heavy Primary
Seek
3 Always use Identity for Primary/CX Key
Best Practise
Always Use an Identity Column as Primary Key
Extension: Always add a new Surrogate Key
This may shoot you in the foot on large Fact Tables
The Distributed Database
Choice of Identity will cause a lot of pain!
How common is this Issue?
Very Frequent with replication and new MPP Architectures
The Distributed Write Cache ?
Identity creates a bottleneck on the DB
Serializes new records
What if database offline ?
The Over Zealous Dimensional Modeller
This may go bad if you are not a “single hop” data source
Customer DimCustomer
PK CustomerID PK CustomerKey
DimCustomer
CustomerName CustomerID
CustomerAddress PK CustomerKeyKeyKeyKey
CustomerName
CustomerAddress
CustomerKeyKeyKey
CustomerKeyKey
CustomerKey
CustomerID
CustomerName
CustomerAddress
4. We Don’t Need no Indexes?
Best Practise. Add Indexes..
To Reduce IO on important Queries
Seek rather than scan. SELECT * WHERE CustomerID=2
Narrower Scan. SELECT SUM(Qty)
DateKey Customer RegionKey Sales € Qty Cost
Jan 1 1 1000 100 80
Jan 2 1 1000 100 80
Jan 3 1 1000 100 80
Jan 4 1 1000 100 80
Jan 1 1 1000 100 80
Jan 2 1 1000 100 80
Feb 3 1 1000 100 80
Mar 1 1 1000 100 80
April 1 1 1000 100 80
April 1 1 1000 100 80
May 1 1 1000 100 80
June 1 1 1000 100 80
July 1 1 1000 100 80
Aug 1 1 1000 100 80
When Indexes go bad
OLTP
Small Tables
Larger Results – See “The Tipping Point” by Kimberly Tripp
When upsert is more important then select
When every column Indexed
High Throughput Queueing Design Patterns
DWH
Bad “Tipping Points”
Staging Tables
Tables that we scan
When avoiding bad statistics is very hard
Data Analytics
Where we need guaranteed query performance for varied workloads.
Guaranteed Performance !!?!
We have a 1TB Table. Query SLA is 5 mins… Add indexes?
5 Stop Worrying about Fragmentation
Best Practise – Defragment the hell out of your database
Why could this be bad ?
Takes a long time and may interfere with query performance
Why Could this be not worth the bother ?
More Memory will reduce reliance on contiguous disk blocks
Most SANs only do random IO anyway
Its mainly important if our primary concerns are Scans
6 Naming Conventions
Best Practice – use one!
Goes Bad When prefix is meta data (object type, data type, size)
Naming – Common Sense
Project with following prefix standards on SSIS
DATA Source
Transform Type (LOAD, TRANSFORM, EXTRACT)
Package
Control Flow
Shape
7 Partitioning
Best Practise – Partition when table it too big or too slow
Ordered Queries
Maintenance Operations Serial Queries
Parallel Queries Dynamic Parallel Queries
8 ORMs
Best Practise ? Hotly debated
Good For
Developer Agility
Code First, Database Second
Integrated Debugging
Domain Business Model
Cache Management
Key Management
Portable
Query Plan Nightmares
Source: http://www.scarydba.com/2014/12/19/pretty-plans-vs-performance/
When ORMs go Bad
Can write truly horrible TSQL and Plans
Naïve context
Parameterisation
The Disaster Scenario
Lazy/Eager Loading
Can be used as an excuse of lack of database expertise
Hard to Index for (lots of Select *)
Everything has good and bad Aspects
It Depends ;-)