0% found this document useful (0 votes)
2 views75 pages

Why SQL

Uploaded by

sunny297
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views75 pages

Why SQL

Uploaded by

sunny297
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Why SQL?

Because it’s the natural language for analysis

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Who Am I
• Keith Laker
• Senior Principal Product Manager, Analytical SQL
• Oracle

• Twitter: @BigRedDW
• Facebook: facebook/BigRedDW
• Email: [Link]@[Link]

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 2


Why SQL - Agenda
1 Overview
2 Framework
3 Optimization
4 Evolution
5 Standards
6 Why Oracle SQL?
7 Summary
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 3
Overview

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Overview
As a business your projects:
• NEEDS analytics
• Say NO to Data Silos
• MUST-HAVE cross-functional, cross-
data store analysis
• Analytic code has to be AGILE

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


There are so many languages out there….

Which language will you pick for your next data analysis project?
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 6
Why you need SQL …
It is the most Success is result of a four
unique factors:
successful, most
sophisticated 1. Powerful framework
language for data 2. Transparent optimization
analysis! 3. Continuous evolution
4. Standards based

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Why you need SQL …
It is the most Success is result of a four
unique factors:
successful, most
But don’t [Link]
sophisticated take
Powerful framework
language for data
our word3. for it…
2. Transparent optimization
analysis! Continuous evolution
4. Standards based

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


New generation of tech companies realize value of SQL
"Airbnb actually teaches classes in SQL to
employees so everyone can learn to query the data
warehouses it maintains, and it has also created a
tool called Airpal to make it easier to design SQL
queries and dispatch them to the Presto layer of
the data warehouse. (This tool has also been open
sourced.)
Airpal was launched internally at Airbnb in the
spring of 2014, and within the first year, over a
third of all employees at the company had
launched an SQL query against the data
warehouse.”
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 9
New generation of tech companies realize value of SQL
Tried NoSQL but quickly moved back to SQL, here is
why:
1) Everybody in our company knows SQL. It’s not a
proprietary query language like Mongo’s. SQL
goes beyond the scope of the engineering
department into almost everywhere in the
organization.
2) From an organization scaling perspective, it’s
easier to hire new people who already know
SQL, than looking for people who used
[Link]
nosql/
MongoDB.

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 10


New generation of tech companies realize value of SQL
3) SQL has successfully been there for decades. Any
competing language would need a significant
amount of time to hash out all the conceptual
challenges which encompass the life cycle of
data applications.
MongoDB along with its NoSQL counterparts is
clearly not there yet.

[Link]
nosql/

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 11


New generation of tech companies realize value of SQL
4) The “no schema” concept of NoSQL is a smaller
advantage than we had originally expected. The
schema may quickly change at the product’s
early days, but later on it stabilizes. Not too hard
to take our JSON format and normalize it into
tables, columns, keys, etc.
5) So many databases support SQL/ODBC. If your
application is written on top of SQL/ODBC, then
it’s really not hard to switch between back-end
databases. This was exactly our problem in
[Link]
getting rid of MongoDB.
nosql/

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 12


New generation of tech companies realize value of SQL
• LinkedIn has created yet another SQL-
on-Hadoop product for is business
users
• Home-brew OLAP project is a new way
for their Hadoop users to query both
real-time and historical data
• Project Pinot is a narrow focus SQL
solution designed to support specific
requirements within LinkedIn
[Link]
[Link]

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 13


New generation of tech companies realize value of SQL
Fire up the SQL-Delorean, Google's
going back to the future
• Google built F1 (a new system to
handle data and analysis at scale)
partly because of the frustration its
engineers had found when dealing
with non-relational systems
• No encouraging developers to switch
to SQL "for low-latency OLTP queries,
large OLAP queries, and everything in
[Link]
/2013/08/30/google_f1_deepdive/ between

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 14


Part 1: FRAMEWORK
SQL provides a powerful, robust framework

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


FRAMEWORK
The four simple rules of data analysis
(set theory):

1. Projection
2. Filter
3. Join
4. Aggregate

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


PROJECTION – Specifying columns
Relational model organizes data into
rows and columns
Projections are specified in the first part
of a SQL query’s framework: the SELECT
clause

SELECT ename, job, hiredate


FROM emp;

columns are referred to as projections


Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
PROJECTIONS – Specifying columns
Oracle Database has comprehensive
metadata layer
Supports AUTOMATIC discovery of
columns

SELECT * FROM emp;

Oracle Database internally expands the


query statement to include all relevant
column names

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


EXTENDED PROJECTIONS – Adding new columns
New data columns can be created within
the result set using arithmetic operations

SELECT
ename,
sal,
comm,
comm/sal*100 AS COMM_RATE
FROM emp;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


PROJECTIONS
• This combination of sophisticated
metadata and automatic
statement expansion is a core part
of SQL

• Missing in many data analysis


languages
– requires developers to add additional
bespoke code to cope with these
basic requirements
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
FILTERS – A way to specify rows
Selection of rows requires use of filters
that describe attributes associated with
row sets that are of interest

In SQL language row filtering is part of


the WHERE clause syntax
often called predicates

SQL supports a wide range of filtering


comparison operators
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
FILTERS – Specifying rows

SELECT
ename,
job,
hiredate
FROM emp
WHERE job=’CLERK’;

Limits set of rows returned to just those


associated with the job type ‘CLERK’ :

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


FILTERS – Comparison operators
Possible to test for equal to, not equal
to, greater than, greater than or equal
to, less than, less than or equal to, a
range between and including two values

SELECT
ename,
job,
hiredate
FROM emp
WHERE sal BETWEEN 10000
AND 25000;
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
FILTERS – String searches
Supports use of wildcard symbols or list
of given values
Can check for presence of null values
and either include or exclude

SELECT
ename,
job,
hiredate
FROM emp
WHERE job != ‘CLERK’;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


FILTERS – Using multiple filters
WHERE clause can be extended using
the AND and OR operators

SELECT
ename,
job,
hiredate
FROM emp
WHERE sal BETWEEN 10000
AND 25000
AND job != ‘CLERK’;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


FILTERS
• SQL language has a very rich
set of filtering techniques
• Simple to implement
• Simple to amend as
requirements evolve over time

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


JOINS – a way to combine data sets
A join is used to combine the fields
within two or more data sets in a single
query
Join is implemented using the ANSI
standard syntax in FROM

There are a number of ways that data


sets can be joined . . .

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


JOINS
There are a number of ways that data
sets can be joined:
• Inner - returns all rows where there is
at least one match in both tables
• Full Outer - returns all rows from both
tables
• Outer (- L/R) - returns all rows from one
table, and matched rows the other
table
• Cross - returns a Cartesian product of
source set
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
JOINS
Defining a SQL join is simple:

SELECT
[Link],
[Link],
[Link],
[Link]
FROM dept d
INNER JOIN
emp e
ON ([Link] = [Link]);

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


JOINS – What is missing?
SQL query DOES NOT describe actual
join process

In many procedural languages code is


COMPLICATED by need to explicitly
code each join structure along with join
algorithm

complexity of join process can


quickly escalate when other join
patterns need to be considered

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


JOINS – Managing NULL values
Using an inner join departments with
zero employees are not be returned

What if there are empty departments


containing zero employees?
Query needs to use an outer join to
return count of number of employees
in every department

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


JOINS – Managing NULL values
SELECT
[Link],
count([Link])
FROM dept d
LEFT OUTER JOIN
emp e
ON ([Link] =
[Link])
GROUP BY [Link];

SQL’s join code is easily readable – and


easily code-able
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
JOINS
SQL developer only specifies the
semantic join condition
Leaves processing details - such as the
order of joins – to SQL engine

Developers using procedural languages


must explicitly code each join structure
along with join algorithm – time
consuming and prone to error!

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – summarizing data sets
Industry analysts often state that up to

90%
of all reports contain some level of
aggregate information.

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE
• Ability to simply, efficiently aggregate
data is a key requirement
• Types of aggregation applied to a data
set can vary from simple counts to
sums to moving averages to statistical
analysis such as standard deviations

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – Procedural Languages
• Languages such as the Java based
MapReduce more than capable of
– Aggregating data
– Reducing data
– Providing simple summary result set.
• Approach is adequate for most simple
queries, i.e. basic counts
• More complex aggregations quickly
increases amount of code required

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – SQL
• SQL has rich set of data aggregation
capabilities
– makes it easy to work on all rows in a set
• E.g. it is possible to sum all rows within
a single column as follows:

SELECT
SUM(sal) AS total_salary
FROM emp;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – SQL
• Easy to extend query to accommodate
new requirements such as a counts and
averages:

SELECT
COUNT(empno) AS no_of_employees,
SUM(sal) AS total_salary,
AVG(sal) As average_salary
FROM emp;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – SQL
• Using GROUP BY clause to group rows
into specific categories

SELECT
deptno,
COUNT(empno) AS no_of_employees,
SUM(sal) AS total_salary,
AVG(sal) AS average_salary
FROM emp
GROUP BY deptno;

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – Extending the concept
• ANSI 2003 introduced concept of
analytic functions
• Functions divide a data set into groups
of rows called partitions
• Calculate aggregates for each partition
and for rows within each partition
• Additional keywords define the how
analytical functions will be evaluated
within each partition

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – Extending the concept
SELECT
[Link], [Link],[Link],
[Link] AS sal,
SUM([Link]) OVER (ORDER BY [Link])) AS dept_sal,
ROUND(AVG([Link]) OVER (PARTITION BY [Link] ORDER BY [Link]))
AS moving_avg_sal,
ROUND(AVG([Link]) OVER (ORDER BY [Link])) AS avg_dept_sal
FROM dept d
LEFT OUTER JOIN
emp e ON ([Link] = [Link]);

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


AGGREGATE – Procedural Approach adds COMPLEXITY

[Link]
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
AGGREGATE
• SQL provides simple, convenient and
efficient data aggregation techniques
• Requires significantly less program code
compared to using other analytical
programming languages.
• Simplicity makes SQL easier and faster to
construct, manage, maintain and
incorporate new business requirements

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Part 2: SIMPLIFIED
OPTIMIZATION
SQL provides a powerful, robust framework

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION
• SQL is a declarative language.
• Developer describes “what” the
program, or query, should accomplish
• Does not describe the “how” part
• “how” part of processing is managed
transparently by the underlying
platform

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION
• Procedural approach breaks down task
into a collection of structures
– variables, procedures, functions,
corresponding data structures and
conditional operators
• Tells computer what to do, step-by-step

• Many of today’s big data languages


adopt this procedural approach

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION – Declarative vs. Procedural
SQL’s Declarative Approach Java’s Procedural Approach *
public class Employee implements Comparable<Employee>{
private int empno;
private String ename;
private int hiredate;

public Employee(int empno, String name, int hiredate) {

SELECT }
[Link]=empno;
[Link]=ename;
[Link]=hiredate;

empno, // Sort in descending order by Employee name. If name is same then sort in descending order by age.
// If age is same then sort in descending order by Empid

public int compareTo(Employee o){


if ([Link]([Link]))

ename, if ([Link] == 0)
return [Link] - [Link];
else return [Link];
else return [Link]([Link]);
}

hiredate public int getHiredate(){

}
return [Link];

public int getEmpNo(){


return [Link];

FROM emp }
public String getEname(){

}
return [Link];

ORDER BY empno, hiredate, public class TestEmployeeSort {

public static void main(String[] args) {

ename List coll = [Link]();


[Link](coll); // sort method
printList(coll);
[Link]("---------------------");
[Link](coll,new SortEmployeeByName());
printList(coll);
}

private static void printList(List<Employee> list) {


[Link]("EmpId\tName\tAge");
for (Employee e: list) {
[Link]([Link]() + "\t" + [Link]() + "\t" + [Link]());
}
}
}

public class SortEmployeeByName implements Comparator<Employee>{


public int compare(Employee o1, Employee o2){
return [Link]().compareTo([Link]());
}
}

* Example of how to sort a list in Java using Comparator Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 47
OPTIMIZATION
• Reasons why SQL is becoming go-to
language for big data analysis:

1. Declarative creates code that is


• More compact,
• Less complex
• Easier to maintain and enhance
2. Code does not contain instructions
on how to process the data
3. Code only expresses end result

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION
Oracle query optimizer is at core of
Oracle’s database
• Determines most efficient method for
statement
• Attempts to generate best execution
plan for a given statement
– Uses many internal statistics, tools within
database to generate best plan

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION
• Performs optimizations at source,
where data is located
• Shares tuning improvements across:
– Front-end systems (BI reports, dashboards,
applications etc.)
– Back-end systems (ETL, data quality tools
etc.)
• Possible to optimize for each query and
optimize for overall workload across
entire system

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


OPTIMIZATION
• Best Plan: optimizer examines multiple
access method:
– full table scan vs. index scans
– different join methods such as nested loops
and hash joins

• Optimizer is in better position than


developer to determine best method
for executing a statement!

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Part 3: CONTINUOUS
EVOLUTION
Oracle has a long history of embedding sophisticated SQL-based analytics
within the Oracle Database.

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


SQL is continually evovling • Pattern matching
• Top N clause
• Statistical functions • Approx Count Distinct
• SQL model clause • Approx queries
• Introduction of • Partition Outer Join • JSON support
“window” functions • Data mining

8i 9i 10g 11g 12c

• Enhanced window functions • SQL Pivot


(percentile, etc.) • Recursive WITH
• Rollup, grouping sets, cube • ListAgg, Nth value window

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 53


CONTINUOUS EVOLUTION
Performance Results for
approximate count queries vs. count
distinct queries:
Real world customer workloads

5-50x improvement
• Notes:
– this approach does not use sampling, it uses a hash-based approach
– ignores rows that contain a null value for specified expression
– Supports any scalar data type other than BFILE, BLOB, CLOB, LONG,
LONG RAW, or NCLOB

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 54


CONTINUOUS EVOLUTION
Performance Results for
approximate statistical queries:

Using TPC-H schema and workload

6-13x improvement
• Note that major savings coming from:
– Use of bounded memory regardless of the input size
per group by key
– Reduction in chance of spill to disk

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 55


CONTINUOUS EVOLUTION
SQL has been extended to reach out
across an ever wider range of data
sources:

– XML documents
– JSON documents
– Hive tables
– HDFS files
– spatial objects
– image-style objects (BLOBs and CLOBs)
– semantic networks
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 56
Part 4:
STANDARDS BASED
industry-standard SQL minimizes the risks of those processes becoming
obsolete and maximizes re-use and sharing

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – languages need to be bound by rules
• Many big data languages are not bound by
industry standards
• Creates challenges when integrating queries
across multiple data reservoirs
• Languages support own syntax, analytical
functions and data types
• Lack of consistency typically results in large-
scale data shipping between platforms
• Lack of consistent naming, syntax and
processing conventions

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION - SQL-92 standard
• Some of big data languages have
attempted to become more SQL-
like
• Many languages have adopted
some or all of the SQL 92 standard
• Not providing level of portability
that projects need

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – SQL-92 standard
• Hive is being displaced by newer SQL
implementations, such as Impala
– Both claim to adhere to earlier versions
of ANSI standards
• BUT Impala does not support many
of most common HiveQL features
– aggregate functions, lateral views,
multiple DISTINCT clauses per query etc.

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – Lack of consistent standards
• Extremely difficult to move code
from one project to another
• Forced to invest significant
resources in recoding application
logic.
• Lack of adherence to standards
increases overall level of risk on a
project

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – Benefits of consistent standards
• Implementing industry-standard
SQL provides significant benefits
• Industry-standard SQL maximizes
the re-use and sharing of analytic
processes
• Minimizes risks of code/processes
becoming obsolete

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – Benefits of consistent standards
• Provides application portability
across different database systems
without major code changes
• Tools are able to effectively support
multiple types of SQL databases in
a straightforward manner
• SQL standard has ensured
continuity in application
development

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


STANDARDIZATION – Benefits of consistent standards
• A SQL statement written 30 years
ago continues to run today
– Without modification to the SQL code

• What is the key difference in


today’s SQL statement?
– Code now executes much faster, since
database transparently optimizes
statement to take advantage of latest
performance enhancements

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Part 5:
WHY ORACLE SQL?
Oracle is continuously working with customers and partners to assist them
in exploiting its expanding library of analytic functions

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 66


Why Oracle SQL?
Oracle SQL provides 3 key benefits
1. Allows project teams to work with a
single, rich analytical language
2. Reduces overall complexity, avoids the
creation of data silos
3. Makes it easier for IT teams to
incorporate enhancements and new
requirements from business teams

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Oracle SQL is standards based
• Oracle’s analytical SQL fully supports the
ANSI SQL 2011
– Continuous evolution has delivered SQL
pattern-matching and approximate processing
as part of Database 12c
– Being incorporated into the next iteration of
the standard.
• Broad support, rapid adoption of newly
introduced functionality across apps and
tools
– across partner network and other independent
software vendors
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Oracle SQL is integrated into Engineered Systems
• Exadata platform delivered huge
performance gains for existing SQL code
– running it unchanged, even code written in the
1980’s
• Storage level performance optimizations
are completely transparent to SQL code
– allows every application, every SQL statement
to run unchanged and reap benefits of running
on an engineered and optimized platform

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |


Oracle SQL inherits new optimizations
• In-memory processing transparently
improves the performance of SQL queries
– Management and lifecycle of in-memory data
structures, use of different types of
compression depending on type of operations
(OLTP-centric vs. Analytics-centric) are
transparent to queries

• Developers, DBAs not forced to rewrite


application code to leverage this feature
– As with Exadata, all existing queries can
automatically inherit the benefits
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
SUMMARY
Bringing it all together

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 71


SQL is the most highly evolved language for analytics
Will you lot please stop
following me!

Image courtesy of Highly Scalable Blog: NOSQL DATA MODELING TECHNIQUES

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 72


Looking ahead…SQL will always be #1
The tech world is turning back
toward SQL bringing to a close a
possibly misspent half-decade in
which startups courted developers
with promises of infinite scalability
and the finest imitation-Google tools
available, and companies found
themselves exposed to unstable data
[Link]
and poor guarantees
/2013/08/30/google_f1_deepdive/

Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | 73


Summary
• The flexibility and power of SQL make it a vital tool for all data analysis
projects
• Growing number of IT teams are using SQL as go-to language for analysis:
– It provides a powerful framework
– Processing optimizations are completely transparent
– It continuously evolves to meet today’s demanding requirements
– Adherence to international ANSI standards

• SQL provides business users + big data developers with a simplified way to
support most complex data discovery and business intelligence reporting
requirements
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |

You might also like