SQL The Complete Reference, 3rd Edition

  • 70 278 6
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

SQL The Complete Reference, Third Edition

This page intentionally left blank

SQL The Complete Reference, Third Edition Paul Weinberg James Groff Andrew Oppel

New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto

Copyright © 2010 by The McGraw-Hill Companies. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher. ISBN: 978-0-07-159256-7 MHID: 0-07-159256-3 The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-159255-0, MHID: 0-07-159255-5. All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps. McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representative please e-mail us at [email protected]. Information has been obtained by McGraw-Hill from sources believed to be reliable. However, because of the possibility of human or mechanical error by our sources, McGraw-Hill, or others, McGraw-Hill does not guarantee the accuracy, adequacy, or completeness of any information and is not responsible for any errors or omissions or the results obtained from the use of such information. TERMS OF USE This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms. THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

About the Authors

James R. Groff is CEO of PBworks, whose hosted collaboration software helps teams of people work together more effectively and efficiently. Earlier, Groff was CEO of TimesTen, the leading provider of in-memory SQL databases. He led TimesTen from its early days through eight years of growth and a successful acquisition by Oracle in 2005, where he served as a senior vice president, and Oracle TimesTen became Oracle’s flagship real-time database product. Groff was the cofounder, with Paul Weinberg, of Network Innovations Corporation, a developer of SQL-based networking software, and coauthor with him of Understanding UNIX: A Conceptual Guide as well as this book. Groff has also held senior division management and marketing positions at Apple Computer and Hewlett-Packard. He holds a BS in Mathematics from the Massachusetts Institute of Technology and an MBA from Harvard University. Paul N. Weinberg is a senior vice president at SAP, where he runs core MDM (Master Data Management) development. Prior to working at SAP, Weinberg was president of A2i, Inc., which was acquired by SAP in 2004 for its enterprisewide platform for product content management and catalog publishing. Weinberg was the cofounder, with James Groff, of Network Innovations Corporation, a pioneer in client/server database access that was acquired by Apple Computer in 1988, and coauthor with him of Understanding UNIX: A Conceptual Guide as well as this book. He has also held software development and marketing positions at Bell Laboratories, Hewlett-Packard, and Plexus Computers. In 1981, he collaborated on The Simple Solution to Rubik’s Cube, the number-one best-selling book of that year, with over 6 million copies sold. He holds a BS from the University of Michigan and an MS from Stanford University, both in Computer Science. Andrew J. (Andy) Oppel is lead data modeler at Blue Shield of California. In addition, he has served as a part-time instructor in database technology with the University of California at Berkeley, Extension for more than 20 years. Andy has designed and implemented hundreds of databases for a wide range of applications, including heath care, banking, insurance, apparel manufacturing, telecommunications, wireless communications, and human resources. He is the author of Databases Demystified, SQL Demystified, and Databases: A Beginner’s Guide and is coauthor of SQL: A Beginner’s Guide. He holds a BA in Computer Science from Transylvania University (Lexington, KY).

About the Technical Editor Aaron Davenport has been working with SQL-based RDBMS technologies for over ten years. He is currently a principal at LCS Technologies, Inc., a Sacramento and San Francisco Bay Area database consulting firm specializing in performance tuning, application development, and database architecture. Prior to joining LCS, Aaron had tenures at Yahoo!, Gap Inc., and Blue Shield of California.

This page intentionally left blank

Contents at a Glance Part I An Overview of SQL 1

Introduction

.....................................................

2

A Quick Tour of SQL

3

SQL in Perspective

4

Relational Databases

3

.............................................

13

...............................................

21

.............................................

45

......................................................

63

Part II Retrieving Data 5

SQL Basics

6

Simple Queries

7

Multitable Queries (Joins)

8

Summary Queries

9

Subqueries and Query Expressions

..................................................

85

.........................................

119

................................................

163

.................................

187

................................................

231

....................................................

247

Part III Updating Data 10

Database Updates

11

Data Integrity

12

Transaction Processing

............................................

281

..............................................

315

...........................................................

355

Part IV Database Structure 13

Creating a Database

14

Views

15

SQL Security

16

The System Catalog

..................................................... ..............................................

375 399

vii

viii

SQL: The Complete Reference

Part V Programming with SQL 17

Embedded SQL

..................................................

429

18

Dynamic SQL*

...................................................

477

19

SQL APIs

........................................................

521

Part VI SQL Today and Tomorrow 20

Database Processing and Stored Procedural SQL

.....................

617

21

SQL and Data Warehousing

.......................................

667

22

SQL and Application Servers

......................................

681

23

SQL Networking and Distributed Databases

24

SQL and Objects

25

SQL and XML

26

Specialty Databases

27

The Future of SQL

........................

699

.................................................

735

....................................................

769

..............................................

805

................................................

819

Part VII Appendixes A

The Sample Database

.............................................

835

B

DBMS Vendor Profiles

............................................

841

C

SQL Syntax Reference

............................................

857

...........................................................

865

Index

Contents

Contents Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv

Part I An Overview of SQL 1

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The SQL Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Role of SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Success Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vendor Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Portability Across Computer Systems . . . . . . . . . . . . . . . . . . . . . . . . . . Official SQL Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Early IBM Commitment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Microsoft Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relational Foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . High-Level, English-Like Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . Interactive, Ad Hoc Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programmatic Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multiple Views of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complete Database Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic Data Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Client/Server Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Enterprise Application Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Extensibility and Object Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . Internet Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Java Integration (JDBC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Open Source Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Industry Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3 4 6 7 8 8 9 9 9 9 10 10 10 10 10 10 11 11 11 11 12 12 12

2

A Quick Tour of SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Simple Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Retrieving Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summarizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adding Data to the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deleting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Updating the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Protecting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

13 13 14 16 17 18 18 18 19 20

ix

ix

x

SQL: The Complete Reference

3

SQL in Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and the Evolution of Database Management . . . . . . . . . . . . . . . . . . . . . . A Brief History of SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Early Years . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Early Relational Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IBM Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Commercial Acceptance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The ANSI/ISO Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Early SQL Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ODBC and the SQL Access Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . JDBC and Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and Portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Centralized Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . File Server Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Client/Server Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multitier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Proliferation of SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL on Mainframes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL on Minicomputers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL on UNIX-Based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL on Personal Computers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and Transaction Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and Workgroup Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL, Data Warehousing, and Business Intelligence . . . . . . . . . . . . . . SQL and Internet Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

21 21 22 22 22 24 25 26 26 29 29 30 30 32 32 33 34 35 36 36 36 37 37 38 39 40 42 43

4

Relational Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Early Data Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . File Management Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hierarchical Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Network Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Relational Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Sample Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Primary Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Foreign Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Codd’s 12 Rules for Relational Databases* . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

45 45 45 47 48 50 51 52 53 55 56 57 59

Contents

Part II Retrieving Data 5

SQL Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numeric Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . String Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Date and Time Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Symbolic Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Built-In Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Missing Data (NULL Values) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

63 63 70 70 71 72 77 77 78 78 79 80 80 82 83

6

Simple Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The SELECT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The SELECT Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The FROM Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calculated Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Selecting All Columns (SELECT *) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Duplicate Rows (DISTINCT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Row Selection (WHERE Clause) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Search Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Comparison Test (=, , =) . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Range Test (BETWEEN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Set Membership Test (IN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Pattern Matching Test (LIKE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Null Value Test (IS NULL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compound Search Conditions (AND, OR, and NOT) . . . . . . . . . . . . Sorting Query Results (ORDER BY Clause) . . . . . . . . . . . . . . . . . . . . . . . . . . . Rules for Single-Table Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Combining Query Results (UNION)* . . . . . . . . . . . . . . . . . . . . . . . . . . Unions and Duplicate Rows* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Unions and Sorting* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multiple UNIONs* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

85 85 87 88 88 90 91 93 94 95 97 97 100 102 104 106 107 110 112 113 115 116 117 118

xi

xii

SQL: The Complete Reference

7

Multitable Queries (Joins) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Two-Table Query Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple Joins (Equi-Joins) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parent/Child Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . An Alternative Way to Specify Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . Joins with Row Selection Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multiple Matching Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Natural Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Queries with Three or More Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Equi-Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Non-Equi-Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Considerations for Multitable Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . Qualified Column Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . All-Column Selections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Self-Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multitable Query Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Structure of a Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rules for Multitable Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . Outer Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Left and Right Outer Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Older Outer Join Notation* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joins and the SQL Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inner Joins in Standard SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Outer Joins in Standard SQL* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cross Joins in Standard SQL* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multitable Joins in Standard SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

119 119 121 123 125 126 127 128 129 131 134 134 135 136 137 139 141 142 142 143 144 148 151 153 153 154 155 157 162

8

Summary Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Computing a Column Total (SUM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . Computing a Column Average (AVG) . . . . . . . . . . . . . . . . . . . . . . . . . Finding Extreme Values (MIN and MAX) . . . . . . . . . . . . . . . . . . . . . . Counting Data Values (COUNT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Functions in the Select List . . . . . . . . . . . . . . . . . . . . . . . . . . . NULL Values and Column Functions . . . . . . . . . . . . . . . . . . . . . . . . . . Duplicate Row Elimination (DISTINCT) . . . . . . . . . . . . . . . . . . . . . . . Grouped Queries (GROUP BY Clause) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multiple Grouping Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Restrictions on Grouped Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NULL Values in Grouping Columns . . . . . . . . . . . . . . . . . . . . . . . . . . .

163 163 165 166 166 168 169 171 173 173 176 179 181

Contents

9

Group Search Conditions (HAVING Clause) . . . . . . . . . . . . . . . . . . . . . . . . . . Restrictions on Group Search Conditions . . . . . . . . . . . . . . . . . . . . . . . NULL Values and Group Search Conditions . . . . . . . . . . . . . . . . . . . . HAVING Without GROUP BY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

182 185 186 186 186

Subqueries and Query Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Subqueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Subquery? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subqueries in the WHERE Clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Outer References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subquery Search Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Subquery Comparison Test (=, , =) . . . . . . . . . . . . . The Set Membership Test (IN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Existence Test (EXISTS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Quantified Tests (ANY and ALL)* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subqueries and Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nested Subqueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Correlated Subqueries* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subqueries in the HAVING Clause* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Subquery Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advanced Queries* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Scalar-Valued Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Row-Valued Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table-Valued Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Queries: A Final Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

187 187 188 189 191 192 192 194 196 198 203 204 205 208 209 211 213 218 221 224 227

Part III Updating Data 10

Database Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adding Data to the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Single-Row INSERT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Multirow INSERT Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bulk Load Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deleting Data from the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The DELETE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deleting All Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DELETE with Subquery* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modifying Data in the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The UPDATE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Updating All Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . UPDATE with Subquery* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

231 231 232 235 238 238 239 240 241 242 243 245 245 246

xiii

xiv

SQL: The Complete Reference

11

Data Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is Data Integrity? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Required Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple Validity Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Check Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Entity Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Uniqueness Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Uniqueness and NULL Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Referential Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Referential Integrity Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Delete and Update Rules* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cascaded Deletes and Updates* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Referential Cycles* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Foreign Keys and NULL Values* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advanced Constraint Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Constraint Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deferred Constraint Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Business Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Trigger? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Triggers and Referential Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Trigger Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . Triggers and the SQL Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

247 248 249 250 251 251 253 253 254 255 256 258 262 262 267 269 270 270 271 274 275 277 277 278 279

12

Transaction Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Transaction? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The ANSI/ISO SQL Transaction Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The START TRANSACTION and SET TRANSACTION Statements . . . The SAVEPOINT and RELEASE SAVEPOINT Statements . . . . . . . . The COMMIT and ROLLBACK Statements . . . . . . . . . . . . . . . . . . . . . Transactions: Behind the Scenes* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Transactions and Multiuser Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Lost Update Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Uncommitted Data Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Inconsistent Data Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Phantom Insert Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Concurrent Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Locking* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Locking Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shared and Exclusive Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deadlocks* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advanced Locking Techniques* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

281 282 284 284 286 286 289 290 291 292 293 294 296 297 298 300 300 303

Contents

Versioning* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Versioning in Operation* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Versioning Advantages and Disadvantages* . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

307 308 311 311

Part IV Database Structure 13

Creating a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Data Definition Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a Table (CREATE TABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . Removing a Table (DROP TABLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Changing a Table Definition (ALTER TABLE) . . . . . . . . . . . . . . . . . . . Constraint Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aliases and Synonyms (CREATE/DROP ALIAS) . . . . . . . . . . . . . . . . . . . . . . Indexes (CREATE/DROP INDEX) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Managing Other Database Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Single-Database Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multidatabase Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multilocation Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Databases on Multiple Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Structure and the ANSI/ISO Standard . . . . . . . . . . . . . . . . . . . . . . Catalogs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Schemas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

315 315 317 318 318 327 328 332 332 333 333 335 339 342 343 344 346 348 348 350 351 354

14

View

355 355 357 357 358 358 359 361 361 363 364 366 367 368 368

............................................................ What Is a View? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How the DBMS Handles Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantages of Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Disadvantages of Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a View (CREATE VIEW) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Horizontal Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vertical Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Row/Column Subset Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Grouped Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joined Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Updating a View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . View Updates and the ANSI/ISO Standard . . . . . . . . . . . . . . . . . . . . . View Updates in Commercial SQL Products . . . . . . . . . . . . . . . . . . . . Checking View Updates (CHECK OPTION) . . . . . . . . . . . . . . . . . . . .

xv

xvi

SQL: The Complete Reference

Dropping a View (DROP VIEW) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Materialized Views* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

371 372 374

15

SQL Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Security Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User-Ids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Security Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Views and SQL Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Granting Privileges (GRANT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Passing Privileges (GRANT OPTION) . . . . . . . . . . . . . . . . . . . . . . . . . Revoking Privileges (REVOKE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . REVOKE and the GRANT OPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . REVOKE and the ANSI/ISO Standard . . . . . . . . . . . . . . . . . . . . . . . . . Role-Based Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

375 376 376 381 381 384 386 388 389 391 393 394 396 398

16

The System Catalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is the System Catalog? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Catalog and Query Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Catalog and the ANSI/ISO Standard . . . . . . . . . . . . . . . . . . . . . . Catalog Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Column Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . View Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Relationship Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Privileges Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The SQL Information Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Catalog Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

399 399 400 401 401 403 407 410 412 413 415 417 418 425 426

Part V Programming with SQL 17

Embedded SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Programmatic SQL Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DBMS Statement Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Embedded SQL Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Developing an Embedded SQL Program . . . . . . . . . . . . . . . . . . . . . . . Running an Embedded SQL Program . . . . . . . . . . . . . . . . . . . . . . . . . .

429 429 431 433 434 437

Contents

Simple Embedded SQL Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Declaring Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Host Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Retrieval in Embedded SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Single-Row Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multirow Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cursor-Based Deletes and Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cursors and Transaction Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

439 441 443 451 457 457 464 470 475 476

18

Dynamic SQL* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Limitations of Static SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic SQL Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic Statement Execution (EXECUTE IMMEDIATE) . . . . . . . . . . . . . . . Two-Step Dynamic Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The PREPARE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The EXECUTE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The DESCRIBE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The DECLARE CURSOR Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . The Dynamic OPEN Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Dynamic FETCH Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Dynamic CLOSE Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic SQL Dialects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic SQL in Oracle* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic SQL and the SQL Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Basic Dynamic SQL Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Standard SQLDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The SQL Standard and Dynamic SQL Queries . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

477 477 479 480 483 485 486 493 495 500 500 503 504 504 504 508 508 510 515 518

19

SQL APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . API Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The dblib API (SQL Server) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Basic SQL Server Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Server Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Positioned Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dynamic Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ODBC and the SQL/CLI Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Call-Level Interface Standardization . . . . . . . . . . . . . . . . . . . . . . . CLI Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CLI Statement Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CLI Errors and Diagnostic Information . . . . . . . . . . . . . . . . . . . . . . . . CLI Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CLI Information Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

521 522 523 524 532 539 540 549 549 552 557 575 577 577

xvii

xviii

SQL: The Complete Reference

The ODBC API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Structure of ODBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ODBC and DBMS Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ODBC Catalog Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Extended ODBC Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Oracle Call Interface (OCI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OCI Handles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oracle Server Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Statement Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Results Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptor Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Transaction Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Java Database Connectivity (JDBC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . JDBC History and Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . JDBC Implementations and Driver Types . . . . . . . . . . . . . . . . . . . . . . The JDBC API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

579 580 581 581 582 586 586 588 589 590 590 590 592 592 593 598 614

Part VI SQL Today and Tomorrow 20

Database Processing and Stored Procedural SQL . . . . . . . . . . . . . . . . . . . . . Procedural SQL Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Basic Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Creating a Stored Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calling a Stored Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stored Procedure Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Statement Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Returning Values via Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conditional Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Repeated Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Flow-of-Control Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cursor-Based Repetition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Handling Error Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantages of Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stored Procedure Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . System-Defined Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . External Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantages and Disadvantages of Triggers . . . . . . . . . . . . . . . . . . . . . Triggers in Transact-SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Triggers in Informix SPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Triggers in Oracle PL/SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Trigger Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

617 618 620 621 622 624 625 627 630 631 634 636 638 639 643 645 646 647 647 648 649 649 651 653 655

Contents

Stored Procedures, Functions, Triggers, and the SQL Standard . . . . . . . . . . The SQL/PSM Stored Procedures Standard . . . . . . . . . . . . . . . . . . . . The SQL/PSM Triggers Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

655 656 664 666

21

SQL and Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Warehousing Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Components of a Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Evolution of Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Architecture for Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fact Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Star Schemas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multilevel Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Extensions for Data Warehousing . . . . . . . . . . . . . . . . . . . . . . . . . Warehouse Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Load Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

667 668 669 670 671 672 673 675 676 678 678 679 680

22

SQL and Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL and Web Sites: Early Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . Application Servers and Three-Tier Web Site Architectures . . . . . . . . . . . . . Database Access from Application Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . EJB Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Session Bean Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Entity Bean Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EJB 2.0 Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EJB 3.0 Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Open Source Application Development . . . . . . . . . . . . . . . . . . . . . . . . Application Server Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

681 681 682 684 685 686 689 692 693 695 695 698

23

SQL Networking and Distributed Databases . . . . . . . . . . . . . . . . . . . . . . . . The Challenge of Distributed Data Management . . . . . . . . . . . . . . . . . . . . . . Distributing Data: Practical Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remote Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remote Data Transparency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Extracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Updateable Replicas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Replication Trade-Offs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Typical Replication Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed Database Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remote Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remote Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

699 700 704 705 708 709 711 713 715 715 719 720 721 722 722

xix

xx

SQL: The Complete Reference

The Two-Phase Commit Protocol* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Network Applications and Database Architecture . . . . . . . . . . . . . . . . . . . . . Client/Server Applications and Database Architecture . . . . . . . . . . . Client/Server Applications with Stored Procedures . . . . . . . . . . . . . Enterprise Applications and Data Caching . . . . . . . . . . . . . . . . . . . . . High-Volume Internet Data Management . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

724 727 728 729 730 731 733

24

SQL and Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object-Oriented Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object-Oriented Database Characteristics . . . . . . . . . . . . . . . . . . . . . . Pros and Cons of Object-Oriented Databases . . . . . . . . . . . . . . . . . . . Objects and the Database Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object-Relational Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Large Object Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . LOBs in the Relational Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specialized LOB Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abstract (Structured) Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining Abstract Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Manipulating Abstract Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Table Inheritance: Implementing Object Classes . . . . . . . . . . . . . . . . . Sets, Arrays, and Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Defining Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Querying Collection Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Manipulating Collection Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Collections and Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . User-Defined Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Methods and Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object Support in the SQL Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

735 735 736 737 738 739 740 740 742 744 746 748 749 751 754 755 758 759 760 762 763 766 767

25

SQL and XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is XML? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML for Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML and SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Elements vs. Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using XML with Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Data Exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Storage and Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

769 769 771 773 774 775 777 778 782 784 784

Contents

XML and Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Document Type Definitions (DTDs) . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML and Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XQuery Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Processing in XQuery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XML Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

788 790 791 797 798 800 802 803

26

Specialty Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Very Low Latency and In-Memory Databases . . . . . . . . . . . . . . . . . . . . . . . . . Anatomy of an In-Memory Database . . . . . . . . . . . . . . . . . . . . . . . . . . In-Memory Database Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Caching with In-Memory Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Event-Processing and Stream Databases . . . . . . . . . . . . . . . . . . . . . Continuous Queries in Stream Databases . . . . . . . . . . . . . . . . . . . . . . Stream Database Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stream Database Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Embedded Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Embedded Database Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . Embedded Database Implementations . . . . . . . . . . . . . . . . . . . . . . . . . Mobile Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mobile Database Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mobile Database Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

805 805 806 808 808 810 811 812 813 814 815 815 816 816 817 818

27

The Future of SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Market Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Enterprise Database Market Maturity . . . . . . . . . . . . . . . . . . . . . . . . . . Market Diversity and Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . Packaged Enterprise Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Software-as-a-Service (SaaS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hardware Performance Gains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Database Server Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SQL in the Next Decade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Distributed Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Massive Data Warehousing for Business Optimization . . . . . . . . . . . Ultrahigh-Performance Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Internet and Network Services Integration . . . . . . . . . . . . . . . . . . . . . Embedded Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cloud-Based and Horizontally Scalable Databases . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

819 820 820 821 822 823 823 824 825 826 826 826 827 828 829 829 830 832

xxi

xxii

SQL: The Complete Reference

Part VII Appendixes A

The Sample Database

.............................................

835

B

DBMS Vendor Profiles

............................................

841

C

SQL Syntax Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Definition Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Access Control Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Basic Data Manipulation Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Transaction-Processing Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cursor-Based Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Query Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Search Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Statement Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

857 858 859 859 860 860 860 862 863 863 864

Index

865

...........................................................

Introduction

Acknowledgments

S

pecial thanks to Andy Oppel, our new coauthor for this third edition of SQL: The Complete Reference. His impressive high-level mastery of the subject matter coupled with his meticulous attention to detail made this a better book, and we are fortunate to have had his involvement. —Jim and Paul

It’s an honor to join such an accomplished team of authors for this edition of SQL: The Complete Reference. My thanks for the excellent support of the entire McGraw-Hill team for their tireless support in this effort. In particular I wish to thank technical editor Aaron Davenport and copy editor Jan Jue for their persistence and attention to detail, which contributed so much to the overall quality of this book. —Andy

xxiii

xxiii

This page intentionally left blank

Introduction

S

QL: The Complete Reference, Third Edition provides a comprehensive, in-depth treatment of the SQL language for both technical and nontechnical users, programmers, data processing professionals, and managers who want to understand the impact of SQL in today’s computer industry. This book offers a conceptual framework for understanding and using SQL, describes the history of SQL and SQL standards, and explains the role of SQL in various computer industry segments, from enterprise data processing to data warehousing to web site architectures. This new edition contains new chapters specially focused on the role of SQL in application server architectures, and the integration of SQL with XML and other object-based technologies. This book will show you, step-by-step, how to use SQL features, with many illustrations and realistic examples to clarify SQL concepts. The book also compares SQL products from leading DBMS vendors—describing their advantages, benefits, and trade-offs—to help you select the right product for your application. Most of the examples in this book are based on the sample database described in Appendix A. The sample database contains data that supports a simple order-processing application for a small distribution company. Appendix A also contains instructions for downloading the SQL statements required to create and populate the sample database tables in a DBMS of you choice, such as Oracle, SQL Server, MySQL, and DB2. This allows you to try the examples in the book yourself and gain actual experience writing and running SQL statements. In some of the chapters, the subject matter is explored at two different levels—a fundamental description of the topic, and an advanced discussion intended for computer professionals who need to understand some of the internals behind SQL. The more advanced information is covered in sections marked with an asterisk (*). You do not need to read these sections to obtain an understanding of what SQL is and what it does.

xxv

xxvi

SQL: The Complete Reference

How This Book Is Organized The book is divided into six parts that cover various aspects of the SQL language: • Part I, “An Overview of SQL,” provides an introduction to SQL and a market perspective of its role as a database language. Its four chapters describe the history of SQL, the evolution of SQL standards, and how SQL relates to the relational data model and to earlier database technologies. Part I also contains a quick tour of SQL that briefly illustrates its most important features and provides you with an overview of the entire language early in the book. • Part II, “Retrieving Data,” describes the features of SQL that allow you to perform database queries. The first chapter in this part describes the basic structure of the SQL language. The next four chapters start with the simplest SQL queries and progressively build to more complex queries, including multitable queries, summary queries, and queries that use subqueries. • Part III, “Updating Data,” shows how you can use SQL to add new data to a database, delete data from a database, and modify existing database data. It also describes the database integrity issues that arise when data is updated, and how SQL addresses these issues. The last of the three chapters in this part discusses the SQL transaction concept and SQL support for multiuser transaction processing. • Part IV, “Database Structure,” deals with creating and administering a SQL-based database. Its four chapters tell you how to create the tables, views, and indexes that form the structure of a relational database. It also describes the SQL security scheme that prevents unauthorized access to data, and the SQL system catalog that describes the structure of a database. This part also discusses the significant differences between the database structures supported by various SQL-based DBMS products. • Part V, “Programming with SQL,” describes how application programs use SQL for database access. It discusses the embedded SQL specified by the ANSI standard and used by IBM, Oracle, Ingres, Informix, and many other SQL-based DBMS products. It also describes the dynamic SQL interface that is used to build general-purpose database tables, such as report writers and database browsing programs. Finally, this part describes the popular SQL APIs, including ODBC, the ISO-standard CallLevel Interface, and JDBC, the standard call-level interface for Java, as well as proprietary call-level interfaces such as Oracle’s OCI API. • Part VI, “SQL Today and Tomorrow,” examines the use of SQL in several of today’s “hottest” application areas, and the current state of SQL-based DBMS products. Two chapters describe the use of SQL stored procedures and triggers for online transaction processing, and the contrasting use of SQL for data warehousing. Four additional chapters describe SQL-based distributed databases, the influence of object technologies on SQL, specialty databases, and the integration of SQL with XML technologies. Finally, the last chapter explores the future of SQL and some of the most important trends in SQL-based data management.

Introduction

Conventions Used in This Book SQL: The Complete Reference, Third Edition describes the SQL features and functions available in the most popular SQL-based DBMS products and those described in the ANSI/ISO SQL standards. Whenever possible, the SQL statement syntax described in this book and used in the examples applies to all dialects of SQL. When the dialects differ, the differences are pointed out in the text, and the examples follow the most common practice. In these cases, you may have to modify the SQL statements in the examples slightly to suit your particular brand of DBMS. Throughout the book, technical terms appear in italics the first time they are used and defined. SQL language elements, including SQL keywords, table and column names, and sample SQL statements, appear in an UPPERCASE MONOSPACE font. SQL API function names appear in a lowercase monospace font. Program listings also appear in monospace font and use the normal case conventions for the particular programming language (uppercase for COBOL and FORTRAN, lowercase for C and Java). Note that these conventions are used solely to improve readability; most SQL implementations will accept either uppercase or lowercase statements. Many of the SQL examples include query results, which appear immediately following the SQL statement, as they would in an interactive SQL session. In some cases, long query results are truncated after a few rows; this is indicated by a vertical ellipsis (…) following the last row of query results.

Why This Book Is for You SQL: The Complete Reference, Third Edition is the right book for anyone who wants to understand and learn SQL, including database users, data processing professionals and architects, programmers, students, and managers. It describes—in simple, understandable language liberally illustrated with figures and examples—what SQL is, why it is important, and how you use it. This book is not specific to one particular brand or dialect of SQL. Rather, it describes the standard, central core of the SQL language and then goes on to describe the differences among the most popular SQL products, including Oracle, Microsoft SQL Server, IBM’s DB2 Universal Database and Informix, Sybase, and MySQL. It also explains the importance of SQL-based standards, such as ODBC and JDBC, and the ANSI/ ISO standards for SQL and SQL-related technologies. This third edition contains new chapters and sections that cover the latest SQL innovations, in the areas of object-relational technologies, XML, and application server architectures. If you are new to SQL, this book offers comprehensive, step-by-step treatment of the language, building from simple queries to more advanced concepts. The structure of the book will allow you to quickly start using SQL, but the book will continue to be valuable as you begin to use the more complex features of the language. You can create the sample database using an SQL script available on the McGraw-Hill website (see Appendix A) and use it to try out the examples and build your SQL skills. If you are a data processing professional, architect, or manager, this book will give you a perspective on the impact that SQL is having across the information technology industry—from personal computers to mainframes to data warehousing to Internet web sites and Internet-based distributed applications. The early chapters describe the history of SQL, its role in the market, and its evolution from earlier database technologies. Later chapters describe the future of SQL and the development of new database technologies, such as distributed databases, object-oriented extensions to SQL, business intelligence databases, and database/XML integration.

xxvii

xxviii

SQL: The Complete Reference

If you are a programmer, this book offers a very complete treatment of programming with SQL. Unlike the reference manuals of many DBMS products, it offers a conceptual framework for SQL programming, explaining the why as well as the how of developing a SQL-based application. It contrasts the SQL programming interfaces offered by all of the leading SQL products, including embedded SQL, dynamic SQL, ODBC, JDBC, and proprietary APIs such as the Oracle Call Interface. The description and comparison of programming techniques provides a perspective not found in any other book. If you are selecting a DBMS product, this book offers a comparison of the SQL features, advantages, and benefits offered by the various DBMS vendors. The differences between the leading DBMS products are explained, not only in technical terms, but also in terms of their impact on applications and their evolving competitive position in the marketplace. The “sample database” can be used to try these features in a prototype of your own application. In short, both technical and nontechnical users can benefit from this book. It is the most comprehensive source of information available about the SQL language, SQL features and benefits, popular SQL-based products, the history of SQL, and the impact of SQL on the future direction of the information technology industry.

I

PART

An Overview of SQL

T

he first four chapters of this book provide a perspective and a quick introduction to SQL. Chapter 1 describes what SQL is and explains its major features and benefits. In Chapter 2, a quick tour of SQL shows you many of its capabilities with simple, rapid-fire examples. Chapter 3 offers a market perspective of SQL by tracing its history, describing the SQL standards and the major vendors of SQL-based products, and by identifying the reasons for SQL’s prominence today. Chapter 4 describes the relational data model upon which SQL is based and compares it with earlier data models.

CHAPTER 1 Introduction CHAPTER 2 A Quick Tour of SQL CHAPTER 3 SQL in Perspective CHAPTER 4 Relational Databases

This page intentionally left blank

1

CHAPTER

Introduction

T

he SQL language and relational database systems based on it constitute one of the most important foundation technologies in the computer industry. Over the last three decades, SQL has grown from its first commercial use into a computer product and services market segment worth tens of billions of dollars per year, and SQL stands today as the standard computer database language. Hundreds of database products now support SQL, running on computer systems from mainframes to personal computers. A SQL-based database may even be embedded in your mobile phone or PDA, or in the entertainment system of your car. An official international SQL standard has been adopted and expanded several times. Every major enterprise software product relies on SQL for its data management, and SQL is at the core of the flagship database products from Microsoft, Oracle, and IBM, three of the largest software companies in the world. SQL is also at the heart of open-source database products such as MySQL and Postgres that are helping to fuel the popularity of Linux and the open source movement. From its obscure beginnings as an IBM research project, SQL has grown to become both an important piece of information technology and a powerful market force. What, exactly, is SQL? Why is it important? What can it do, and how does it work? If SQL is really a standard, why do we have so many different versions and dialects? How do popular SQL products like SQL Server, Oracle, MySQL, Sybase, and DB2 compare? How does SQL relate to Microsoft standards such as ODBC and .NET? How does JDBC link SQL to the world of Java and object technology? What role does it play in the Service-Oriented Architecture (SOA) and web services being embraced by enterprise IT organizations? Does SQL really scale from mainframes to handheld devices? Has it really delivered the performance needed for high-volume transaction processing? How will SQL impact the way you use computers, and how can you get the most out of this important data management tool? This book answers those questions by giving you a complete perspective and a solid working knowledge of SQL.

3

4

Part I:

An Overview of SQL

The SQL Language SQL is a tool for organizing, managing, and retrieving data stored by a computer database. The original name given it by IBM was Structured English Query Language, shortened to the acronym SEQUEL. When IBM discovered that SEQUEL was a trademark owned by the Hawker Siddeley Aircraft Company of the United Kingdom, they shortened the acronym to SQL. The word “English” was then dropped from the spelled-out name to match the new acronym. To this day, you will hear the acronym SQL pronounced as either a word (“sequel”) or as a string of letters (“S-Q-L”), and while the latter is generally preferred, both are considered correct. As the name implies, SQL is a computer language that you use to interact with a database. In fact, SQL works with one specific type of database, called a relational database, which has become the mainstream way to organize data across a very broad range of computer applications. Figure 1-1 shows how SQL works. The computer system in the figure has a database that stores important information. If the computer system is in a business, the database might store inventory, production, sales, or payroll data. On a personal computer, the database might store data about the checks you have written, lists of people and their phone numbers, or data extracted from a larger computer system. The computer program that controls the database is called a database management system (DBMS). When you need to retrieve data from a database, you use the SQL to make the request. The DBMS processes the SQL request, retrieves the requested data, and returns it to you. This process of requesting data from a database and receiving the results is called a database query—hence the name Structured Query Language. “Structured Query Language” is actually somewhat of a misnomer. First of all, SQL is far more than a query tool, although that was its original purpose, and retrieving data is still one of its most important functions. SQL is used to control all of the functions that a DBMS provides for its users, including

FIGURE 1-1

Using SQL for database access

Chapter 1:

Introduction

5

• Data definition SQL lets a user define the structure and organization of the stored data and relationships among the stored data items.

• Data manipulation SQL allows a user or an application program to update the database by adding new data, removing old data, and modifying previously stored data. • Access control SQL can be used to restrict a user’s ability to retrieve, add, and modify data, protecting stored data against unauthorized access. • Data sharing SQL is used to coordinate data sharing by concurrent users, ensuring that changes made by one user do not inadvertently wipe out changes made at nearly the same time by another user. • Data integrity SQL defines integrity constraints in the database, protecting it from corruption due to inconsistent updates or system failures. SQL is thus a comprehensive language for controlling and interacting with a database management system. Second, SQL is not really a complete computer language like COBOL, C, C++, or Java. Instead, SQL is a database sublanguage, consisting of about 40 statements specialized for database management tasks. These SQL statements can be embedded into another language such as COBOL or C to extend that language for use in database access. Alternatively, the statements can be explicitly sent to a database management system for processing, via a call-level interface from a language such as C, C++, or Java, or via messages sent over a computer network. SQL also differs from other computer languages because it describes what the user wants the computer to do rather than how the computer should do it. (In more technical terms, SQL is a declarative or descriptive language rather than a procedural one.) SQL contains no IF statement for testing conditions, and no GOTO, DO, or FOR statements for program flow control. Rather, SQL statements describe how a collection of data is to be organized, or what data is to be retrieved or added to the database. The sequence of steps to do those tasks is left for the DBMS to determine. Finally, SQL is not a particularly structured language, especially when compared with highly structured languages such as C, Pascal, or Java. Instead, SQL statements resemble English sentences, complete with “noise words” that don’t add to the meaning of the statement but make it read more naturally. The SQL has quite a few inconsistencies and also some special rules to prevent you from constructing SQL statements that look perfectly legal but that don’t make sense. Despite the inaccuracy of its name, SQL has emerged as the standard language for using relational databases. SQL is both a powerful language and one that is relatively easy to learn. The quick tour of SQL in Chapter 2 will give you a good overview of the language and its capabilities.

PART I

• Data retrieval SQL allows a user or an application program to retrieve stored data from the database and use it.

6

Part I:

An Overview of SQL

The Role of SQL SQL is not itself a database management system, nor is it a stand-alone product. You cannot go to a computer retailer or a web site selling computer software and buy SQL. Instead, SQL is an integral part of a database management system, a language and a tool for communicating with the DBMS. Figure 1-2 shows some of the components of a typical DBMS and how SQL links them together. The database engine is the heart of the DBMS, responsible for actually structuring, storing, and retrieving the data in the database. It accepts SQL requests from other DBMS components (such as a forms facility, report writer, or interactive query facility), from user-written application programs, and even from other computer systems. As the figure shows, SQL plays many different roles: • SQL is an interactive query language. Users type SQL commands into an interactive SQL program to retrieve data and display it on the screen, providing a convenient, easy-to-use tool for ad hoc database queries. • SQL is a database programming language. Programmers embed SQL commands into their application programs to access the data in a database. Both user-written programs and database utility programs (such as report writers and data entry tools) use this technique for database access.

FIGURE 1-2

Components of a typical database management system

Chapter 1:

Introduction

• SQL is a client/server language. Personal computer programs use SQL to communicate over a network with database servers that store shared data. This client/server architecture is used by many popular enterprise-class applications. • SQL is an Internet data access language. Internet web servers that interact with corporate data and Internet application servers all use SQL as a standard language for accessing corporate databases, often by embedding SQL database access within popular scripting languages like PHP or Perl. • SQL is a distributed database language. Distributed database management systems use SQL to help distribute data across many connected computer systems. The DBMS software on each system uses SQL to communicate with the other systems, sending requests for data access. • SQL is a database gateway language. In a computer network with a mix of different DBMS products, SQL is often used in a gateway that allows one brand of DBMS to communicate with another brand. SQL has thus emerged as a useful, powerful tool for linking people, computer programs, and computer systems to the data stored in a relational database.

SQL Success Factors In historical terms, SQL has been an extraordinarily successful information technology. Think about the computer market in the mid-1980s, when SQL first started to become important. Mainframes and minicomputers dominated corporate computing. The IBM personal computer had been introduced only a few years before, and the MS-DOS command line was its user interface. IBM’s mainframe operating systems and minicomputer operating systems from Digital Equipment, Data General, Hewlett-Packard, and others dominated business computing. Proprietary networking schemes like IBM’s SNA or Digital Equipment’s DECnet linked computers together. The Internet was still a tool for collaboration among research labs, and the World Wide Web had not yet appeared on the scene. COBOL, C, and Pascal were dominant computer languages; object-oriented programming was only beginning to emerge; and Java had not been invented. Across all of these areas of computer technology—from computer hardware to operating systems to networking to languages—the important key technologies of the mid-1980s have faded or become obsolete, replaced by significant new ones. But in the world of data management, the relational database and SQL continue to dominate the landscape. They have expanded over the years to support new hardware, operating systems, networks, and languages, but despite many attempts to dethrone them, the core relational model and the SQL have thrived and remain the dominant forces in data management. Here are some of the major features and market forces that have contributed to this success over the past 25 years: • Vendor independence • Portability across computer systems

PART I

• SQL is a database administration language. The database administrator responsible for managing a minicomputer or mainframe database uses SQL to define the database structure and to control access to the stored data.

7

8

Part I:

An Overview of SQL

• Official SQL standards • Early IBM commitment • Microsoft support • Relational foundation • High-level, English-like structure • Interactive, ad hoc queries • Programmatic database access • Multiple views of data • Complete database language • Dynamic data definition • Client/Server architecture • Enterprise application support • Extensibility and object technology • Internet database access • Java integration (JDBC) • Open source support • Industry infrastructure The sections that follow briefly describe each of these and how they contributed to SQL’s success.

Vendor Independence SQL is offered by all of the leading DBMS vendors, and no new database product over the last decade has been highly successful without SQL support. A SQL-based database and the programs that use it can be moved from one DBMS to another vendor’s DBMS with minimal conversion effort and little retraining of personnel. Database tools such as query tools, report writers, and application generators work with many different brands of SQL databases. The vendor independence thus provided by SQL was one of the most important reasons for its early popularity and remains an important feature today.

Portability Across Computer Systems SQL-based database products run on computer systems ranging from mainframes and midrange systems to personal computers, workstations, a wide range of specialized server computers, and even handheld devices. They operate on stand-alone computer systems, in departmental local area networks, and in enterprisewide or Internetwide networks. SQLbased applications that begin on single-user or departmental server systems can be moved to larger server systems as they grow. Data from corporate SQL-based databases can be extracted and downloaded into departmental or personal databases. Finally, economical personal computers can be used to test a prototype of a SQL-based database application before moving it to an expensive multiuser system.

Chapter 1:

Introduction

9

Official SQL Standards

Early IBM Commitment SQL was originally invented by IBM researchers and fairly quickly became a strategic product for IBM based on its flagship DB2 database. SQL support is available on all major IBM product families, from personal computers through midrange systems and UNIX-based servers to IBM mainframes. IBM’s initial work provided a clear signal of IBM’s direction for other database and system vendors to follow early in the development of SQL and relational databases. Later, IBM’s commitment and broad support speeded the market acceptance of SQL. In the 1970s, IBM was the dominant force in business computing, so its early and sustained support as the inventor and champion of SQL ensured its early importance.

Microsoft Support Microsoft has long considered database access a key part of its Windows personal computer software architecture. Both desktop and server versions of Windows provide standardized relational database access through Open Database Connectivity (ODBC), a SQL-based call-level API (application programming interface). Leading Windows software applications (spreadsheets, word processors, databases, etc.) from Microsoft and other vendors support ODBC, and all leading SQL databases provide ODBC access. Microsoft has enhanced ODBC support with higher-level, more object-oriented database access layers over the years, including data management support in .NET today. But these new technologies could always interact with relational databases through the ODBC/SQL layers below. When Microsoft began its effort in the late 1980s to make Windows a viable server operating system, it introduced SQL Server as its own SQL-based offering. SQL Server continues today as a flagship Microsoft product and as a key component of the Microsoft .NET architecture for web services.

Relational Foundation SQL is a language for relational databases, and it has become popular along with the relational database model. The tabular, row/column structure of a relational database is intuitive to users, keeping the SQL simple and easy to understand. The relational model also has a strong theoretical foundation that has guided the evolution and implementation of relational databases. Riding a wave of acceptance brought about by the success of the relational model, SQL has become the database language for relational databases.

PART I

An official standard for SQL was initially published by the American National Standards Institute (ANSI) and the International Standards Organization (ISO) in 1986, and was expanded in 1989 and again in 1992, 1999, 2003, and 2006. SQL is also a U.S. Federal Information Processing Standard (FIPS), making it a key requirement for large government computer contracts. Over the years, other international, government, and vendor groups have pioneered the standardization of new SQL capabilities, such as call-level interfaces or object-based extensions. Many of these new initiatives have been incorporated into the ANSI/ISO standard over time. The evolving standards serve as an official stamp of approval for SQL and have speeded its market acceptance.

10

Part I:

An Overview of SQL

High-Level, English-Like Structure SQL statements look like simple English sentences, making SQL relatively easy to learn and understand. This is in part because SQL statements describe the data to be retrieved, rather than specifying how to find the data. Tables and columns in a SQL database can have long, descriptive names. As a result, most SQL statements “say what they mean” and can be read as clear, natural sentences.

Interactive, Ad Hoc Queries SQL is an interactive query language that gives users ad hoc access to stored data. Using SQL interactively, a user can get answers even to complex questions in minutes or seconds, in sharp contrast to the days or weeks it would take for a programmer to write a custom report program. Because of the SQL ad hoc query power, data is more accessible and can be used to help an organization make better, more informed decisions. SQL’s ad hoc query capability was an important advantage over nonrelational databases early in its evolution and more recently has continued as a key advantage over pure object-based databases.

Programmatic Database Access SQL is also a database language used by programmers to write applications that access a database. The same SQL statements are used for both interactive and programmatic access, so the database access parts of a program can be tested first with interactive SQL and then embedded into the program. In contrast, nonrelational or object-oriented databases provided one set of tools for programmatic access and a separate query facility for ad hoc requests, without any synergy between the two modes of access.

Multiple Views of Data Using SQL, the creator of a database can give different users of the database different views of its structure and contents. For example, the database can be constructed so that each user sees data only for his or her department or sales region. In addition, data from several different parts of the database can be combined and presented to the user as a simple row/ column table. SQL views can thus be used to enhance the security of a database and to tailor it to the particular needs of individual users while preserving the fundamental row/column structure of the data.

Complete Database Language SQL was first developed as an ad hoc query language, but its powers now go far beyond data retrieval. SQL provides a complete, consistent language for creating a database, managing its security, updating its contents, retrieving data, and sharing data among many concurrent users. SQL concepts that are learned in one part of the language can be applied to other SQL commands, making users more productive.

Dynamic Data Definition Using SQL, the structure of a database can be changed and expanded dynamically, even while users are accessing database contents. This is a major advance over static data definition languages, which prevented access to the database while its structure was being changed. SQL thus provides maximum flexibility, allowing a database to adapt to changing requirements while online applications continue uninterrupted.

Chapter 1:

Introduction

11

Client/Server Architecture

Enterprise Application Support The largest enterprise applications that support the daily operation of large companies and organizations all use SQL-based databases to store and organize their data. In the 1990s, driven by the impending deadline for supporting dates in the year 2000 and beyond (the so-called “Y2K” problem), large enterprises moved en masse to abandon their homegrown systems and convert to packaged enterprise applications from vendors like SAP, Oracle, PeopleSoft, Siebel, and others. The data processed by these applications (orders, sales amounts, customers, inventory levels, payment amounts, etc.) tends to have a structured, records-and-fields format, which converts easily into the row/column format of SQL. By constructing their applications to use enterprise-class SQL databases, the major application vendors eliminated the need to develop their own data management software and benefited from existing tools and programming skills. Because every major enterprise application requires a SQL-based database for its operation, new sales of enterprise applications automatically generate “drag-along” demand for new copies of database software.

Extensibility and Object Technology The major challenge to SQL’s continued dominance as a database standard has come from the emergence of object-based programming through languages such as Java and C++, and from the introduction of object-based databases as an extension of the broad market trend toward object-based technology. SQL-based database vendors have responded to this challenge by slowly expanding and enhancing SQL to include object features. These “object/ relational” databases, which continue to be based on SQL, have emerged as a more popular alternative to “pure object” databases and have perpetuated SQL’s dominance through the last decade. The newest wave of object technology, embodied in the XML standard and web services architectures, once again created a crop of “XML databases” and alternative query languages to challenge SQL in the early 2000s. But once again, the major vendors of SQLbased databases responded by adding XML-based extensions, meeting the challenge and securing SQL’s continuing importance. History suggests that this “extend and integrate” approach will be successful in warding off new challenges in the future as well.

Internet Database Access With the exploding popularity of the Internet and the World Wide Web, and their standards-based foundation, SQL found a new role in the late 1990s as an Internet data access standard. Early in the development of the Web, developers needed a way to retrieve and present database information on web pages and used SQL as a common language for database gateways. More recently, the emergence of three-tiered Internet architectures with distinct thin client, application server, and database server layers, has established SQL as the standard link between the application and database tiers. The role of SQL in multitier

PART I

SQL is a natural vehicle for implementing applications using a distributed, client/server architecture. In this role, SQL serves as the link between “front-end” computer systems optimized for user interaction and “back-end” systems specialized for database management, allowing each system to do what it does best. SQL also allows personal computers to function as front-ends to network servers or to larger minicomputer and mainframe databases, providing access to corporate data from personal computer applications.

12

Part I:

An Overview of SQL

architectures is now beginning to extend beyond the back-end database layer, to include data caching and real-time data management in or near the application tier.

Java Integration (JDBC) A major area of SQL development over the last five to ten years has been the integration of SQL with Java. Seeing the need to link the Java language to existing relational databases, Sun Microsystems (the creator of Java) introduced Java Database Connectivity (JDBC), a standard API that allows Java programs to use SQL for database access. JDBC received a further boost when it was adopted as the data access standard within the Java2 Enterprise Edition (J2EE) specification, which defines the operating environment provided by most of the leading Internet application servers. In addition to the role of Java as a programming language from which databases are used, many of the leading database vendors have also announced or implemented Java support within their database systems, allowing Java to be used as a language for stored procedures and business logic within the database itself. This trend toward integration between Java and SQL will ensure the continued importance of SQL in the new era of Java-based programming.

Open Source Support One of the newer important developments in the computer industry is the emergence of an “open source” approach to building complex software systems. With this approach, the source code that defines the operation of a software system is open and freely available, and many different programmers can contribute to it, adding features, fixing bugs, enhancing its functionality, and providing support for its use. This community of programmers, potentially spread across thousands of different organizations and around the globe, with some coordination becomes the engine that drives the further development of the technology. Open source software is generally available at very low prices (or free), adding to its appeal. Several successful open source SQL-based databases have been built in the last decade, and one of these, MySQL, is a standard component of the most popular open source “stack” of software—the LAMP stack—which also includes Linux, the Apache web server, and the PHP scripting language. The widespread availability of free SQL-based open source databases has exposed SQL to an even broader range of programmers, continuing to build its popularity.

Industry Infrastructure Perhaps the most important factor contributing to the growing importance of SQL is the emergence of an entire computer industry infrastructure based on SQL. SQL-based relational database systems are an important part of this infrastructure. Enterprise applications that use SQL and require a SQL-based database are another important part, as are reporting tools, data entry tools, design tools, programming tools, and a host of other tools that simplify the use of SQL. A large pool of experienced SQL programmers is a critical part of the infrastructure. Another important part is the training and support services that surround SQL and help to create and perpetuate SQL expertise. An entire subindustry has emerged around SQL consulting, optimization, and performance-tuning. All parts of this infrastructure tend to reinforce one another and to contribute to the ongoing success of the other parts. Simply stated, to solve data management problems, the easiest, lowest-risk, lowest-cost solution is almost always a solution based on SQL.

2

CHAPTER

A Quick Tour of SQL

B

efore we dive into the details of SQL, it’s a good idea to develop an overall perspective on the language and how it works. This chapter contains a quick tour of SQL that illustrates its major features and functions. The goal of the quick tour is not to make you proficient in writing SQL statements; that’s the goal of Part II of this book. Rather, by the time you’ve finished this chapter, you will have a basic familiarity with the SQL and an overview of its capabilities.

A Simple Database The examples in this quick tour are based on a simple relational database for a small distribution company. The database, shown in Figure 2-1, stores the information needed to implement a small order-processing application. You will find instructions for creating the sample database in Appendix A, so you can try these queries yourself as you read. Specifically, it stores the following information: • The customers who buy the company’s products • The orders placed by those customers • The salespeople who sell the products to customers • The sales offices where those salespeople work This database, like most others, is a model of the “real world.” The data stored in the database represents real entities—customers, orders, salespeople, and offices. Each different kind of entity has a separate table of data. For example, in the SALESREPS table, each salesperson is represented by one row, and each column holds one type of information about salespeople, such as their name or the sales office where they are assigned. Database requests that you make using the SQL parallel real-world activities, as customers place, cancel, and change orders; as you hire and fire salespeople; and so on. Let’s see how you can use SQL to manipulate data.

13

14

Part I:

An Overview of SQL

FIGURE 2-1

A simple relational database

Retrieving Data First, let’s list the sales offices, showing the city where each one is, its office number, and its year-to-date sales. The SQL statement that retrieves data from the database is called SELECT. This SQL statement retrieves the data you want: SELECT CITY, OFFICE, SALES FROM OFFICES; CITY OFFICE SALES ------------ ------- -----------Denver 22 $186,042.00 New York 11 $692,637.00 Chicago 12 $735,042.00 Atlanta 13 $367,911.00 Los Angeles 21 $835,915.00

Chapter 2:

A Q u i c k To u r o f S Q L

SELECT NAME, REP_OFFICE, SALES, QUOTA FROM SALESREPS; NAME REP_OFFICE SALES QUOTA -------------- ----------- ------------ -----------Bill Adams 13 $367,911.00 $350,000.00 Mary Jones 11 $392,725.00 $300,000.00 Sue Smith 21 $474,050.00 $350,000.00 Sam Clark 11 $299,912.00 $275,000.00 Bob Smith 12 $142,594.00 $200,000.00 Dan Roberts 12 $305,673.00 $300,000.00 Tom Snyder NULL $75,985.00 NULL Larry Fitch 21 $361,865.00 $350,000.00 Paul Cruz 12 $286,775.00 $275,000.00 Nancy Angelli 22 $186,042.00 $300,000.00

The NULL values for Tom Snyder represent missing or unknown data. He is new to the company and has not yet been assigned to a sales office or been given a sales quota. However, he has already made some sales. The data in his row of query results shows this clearly. SQL also lets you ask for calculated results. For example, you can ask SQL to calculate the amount by which each salesperson is over or under quota: SELECT NAME, SALES, QUOTA, (SALES – QUOTA) FROM SALESREPS; NAME SALES QUOTA (SALES–QUOTA) -------------- ------------ ------------ -------------Bill Adams $367,911.00 $350,000.00 $17,911.00 Mary Jones $392,725.00 $300,000.00 $92,725.00 Sue Smith $474,050.00 $350,000.00 $124,050.00 Sam Clark $299,912.00 $275,000.00 $24,912.00 Bob Smith $142,594.00 $200,000.00 –$57,406.00 Dan Roberts $305,673.00 $300,000.00 $5,673.00 Tom Snyder $75,985.00 NULL NULL Larry Fitch $361,865.00 $350,000.00 $11,865.00 Paul Cruz $286,775.00 $275,000.00 $11,775.00 Nancy Angelli $186,042.00 $300,000.00 –$113,958.00

The requested data (including the calculated difference between sales and quota for each salesperson) once again appears in a row/column table. Perhaps you would like to focus on the salespeople whose sales are less than their quotas. SQL lets you retrieve that

PART I

The SELECT statement asks for three pieces of data—the city, the office number, and the amount of sales—for each office. It also specifies that all of this data comes from the OFFICES table, which stores data about sales offices. The results of the query appear, in tabular form, immediately after the request. Note that the formatting of the query results will vary from one SQL implementation to another. The SELECT statement is used for all SQL queries. For example, here is a query that lists the names and year-to-date sales for each salesperson in the database. It also shows the quota (sales target) and the office number where each person works. In this case, the data comes from the SALESREPS table.

15

16

Part I:

An Overview of SQL

kind of selective information very easily, by adding a mathematical comparison to the previous request: SELECT NAME, SALES, QUOTA, (SALES – QUOTA) FROM SALESREPS WHERE SALES < QUOTA; NAME SALES QUOTA -------------- ------------ -----------Bob Smith $142,594.00 $200,000.00 Nancy Angelli $186,042.00 $300,000.00

(SALES–QUOTA) ------------–$57,406.00 –$113,958.00

You can use the same technique to list large orders in the database and find out which customer placed the order, which product was ordered, and in what quantity. You can also ask SQL to sort the orders based on the order amount: SELECT FROM WHERE ORDER

ORDER_NUM, CUST, PRODUCT, QTY, AMOUNT ORDERS AMOUNT > 25000.00 BY AMOUNT;

ORDER_NUM CUST PRODUCT QTY AMOUNT ---------- ----- -------- ---- ----------112987 2103 4100Y 11 $27,500.00 113069 2109 775C 22 $31,350.00 112961 2117 2A44L 7 $31,500.00 113045 2112 2A44R 10 $45,000.00

Summarizing Data SQL not only retrieves individual pieces of data from the database, but it also can summarize the database contents. What’s the average size of an order in the database? This request asks SQL to look at all the orders and find the average amount: SELECT AVG(AMOUNT) FROM ORDERS; AVG(AMOUNT) -----------$8,256.37

You could also ask for the average order size for a particular customer: SELECT AVG(AMOUNT) FROM ORDERS WHERE CUST = 2103; AVG(AMOUNT) ----------$8,895.50

Chapter 2:

A Q u i c k To u r o f S Q L

SELECT CUST, SUM(AMOUNT) FROM ORDERS GROUP BY CUST; CUST SUM(AMOUNT) ----- -----------2101 $1,458.00 2102 $3,978.00 2103 $35,582.00 2106 $4,026.00 2107 $23,132.00 2108 $7,255.00 2109 $31,350.00 2111 $6,445.00 2112 $47,925.00 2113 $22,500.00 2114 $22,100.00 2117 $31,500.00 2118 $3,608.00 2120 $3,750.00 2124 $3,082.00

Adding Data to the Database You also use SQL to add new data to the database. For example, suppose you just opened a new Western region sales office in Dallas, with target sales of $275,000. Here’s the INSERT statement that adds the new office to the database, as office number 23: INSERT INTO OFFICES (CITY, REGION, TARGET, SALES, OFFICE) VALUES ('Dallas', 'Western', 275000.00, 0.00, 23); 1 row inserted.

Similarly, if Mary Jones (employee number 109) signs up a new customer, Acme Industries, this INSERT statement adds the customer to the database as customer number 2125 with a $25,000 credit limit: INSERT INTO CUSTOMERS (COMPANY, CUST_REP, CUST_NUM, CREDIT_LIMIT) VALUES ('Acme Industries', 109, 2125, 25000.00); 1 row inserted.

You may have noticed that the SQL engine returned feedback (1 row inserted) to tell you that the statements worked. The exact wording and formatting of this feedback will vary from one SQL implementation to another.

PART I

Finally, let’s find out the total value of the orders placed by each customer. To do this, you can ask SQL to group the orders together by customer number and then total the orders for each customer:

17

18

Part I:

An Overview of SQL

Deleting Data Just as the SQL INSERT statement adds new data to the database, the SQL DELETE statement removes data from the database. If Acme Industries decides a few days later to switch to a competitor, you can delete Acme’s customer information from the database with this statement: DELETE FROM CUSTOMERS WHERE COMPANY = 'Acme Industries'; 1 row deleted.

And if you decide to terminate all salespeople whose sales are less than their quotas, you can remove them from the database with this DELETE statement: DELETE FROM SALESREPS WHERE SALES < QUOTA; 2 rows deleted.

Updating the Database You can use SQL to modify data that is already stored in the database. For example, to increase the credit limit for First Corp. to $75,000, you would use the SQL UPDATE statement: UPDATE CUSTOMERS SET CREDIT_LIMIT = 75000.00 WHERE COMPANY = 'First Corp.'; 1 row updated.

The UPDATE statement can also make many changes in the database at once. For example, this UPDATE statement raises the quota for all salespeople by $15,000: UPDATE SALESREPS SET QUOTA = QUOTA + 15000.00; 8 rows updated.

Protecting Data An important role of a database is to protect the stored data from access or modification by unauthorized users. For example, suppose your assistant, Mary, has a database account but has not been previously authorized to insert data about new customers into the database. This SQL statement grants her that permission: GRANT INSERT ON CUSTOMERS TO MARY; Privilege granted.

Chapter 2:

A Q u i c k To u r o f S Q L

19

Similarly, the following SQL statement gives Mary permission to update data about customers and to retrieve customer data with the SELECT statement:

Privilege granted.

If Mary is no longer allowed to add new customers to the database, this REVOKE statement will disallow it: REVOKE INSERT ON CUSTOMERS FROM MARY; Privilege revoked.

Similarly, this REVOKE statement will revoke all of Mary’s privileges to access or modify customer data in any way: REVOKE ALL ON CUSTOMERS FROM MARY; Privilege revoked.

Creating a Database Before you can store data in a database, you must first define the structure of the data. Suppose you want to expand the sample database by adding a table of data about the products your company sells. For each product, the data to be stored includes the following: • A three-character manufacturer ID code • A five-character product ID code • A product description of up to 30 characters • The price of the product • The quantity currently on hand This SQL CREATE TABLE statement defines a new table to store the products’ data: CREATE TABLE (MFR_ID PRODUCT_ID DESCRIPTION PRICE QTY_ON_HAND

PRODUCTS CHAR(3), CHAR(5), VARCHAR(30), DECIMAL(9,2), INTEGER);

Table created.

PART I

GRANT UPDATE, SELECT ON CUSTOMERS TO MARY;

20

Part I:

An Overview of SQL

Although more cryptic than the previous SQL statement examples, the CREATE TABLE statement is still fairly straightforward. It assigns the name PRODUCTS to the new table and specifies the name and type of data stored in each of its five columns. The manufacturer and product IDs are stored as fixed-length sequences of characters, the product description is a variable-length character string, the price is decimal data (a real number), and the quantity is an integer. Once the table has been created, you can fill it with data. Here’s an INSERT statement for a new shipment of 250 size 7 widgets (product ACI-41007), which cost $225.00 apiece: INSERT INTO PRODUCTS (MFR_ID, PRODUCT_ID, DESCRIPTION, PRICE, QTY_ON_HAND) VALUES ('ACI', '41007', 'Size 7 Widget', 225.00, 250); 1 row inserted.

Finally, if you discover later that you no longer need to store the products’ data in the database, you can erase the table (and all of the data it contains) with the DROP TABLE statement: DROP TABLE PRODUCTS; Table dropped.

Summary This quick tour of SQL showed you what SQL can do and illustrated the style of the SQL by using eight of the most commonly used SQL statements. To summarize: • Use SQL to retrieve data from the database, by using the SELECT statement. You can retrieve all or part of the stored data, sort it, and ask SQL to summarize the data, using totals and averages. • Use SQL to update the database, by adding new data with the INSERT statement, deleting data with the DELETE statement, and modifying existing data with the UPDATE statement. • Use SQL to control access to the database, by granting and revoking specific privileges for specific users with the GRANT and REVOKE statements. • Use SQL to create/modify the database by defining the structure of new tables and dropping tables when they are no longer needed, by using the CREATE and DROP statements.

3

CHAPTER

SQL in Perspective

S

QL is the standard language for database management today. What does it mean for SQL to be a standard? How did it become a standard? What role does the official SQL standard play? How broadly adopted is it, and why are there still dialects of SQL despite the standard? How broad is SQL’s impact on various segments of the IT landscape? To answer these questions, this chapter traces the history of SQL and describes its current role in the computer market.

SQL and the Evolution of Database Management One of the major tasks of a computer system is to store and manage data. To handle this task, specialized computer programs known as database management systems began to appear in the late 1960s and early 1970s. A database management system, or DBMS, helped computer users to organize and structure their data and allowed the computer system to play a more active role in managing the data. Although database management systems were first developed on large mainframe systems, their popularity quickly spread to minicomputers, and then to computer workstations, personal computers, and specialized server computers. Database management has also played a key role in the explosion of computer networking and the Internet. Early database systems ran on large, monolithic computer systems, where the data, the database management software, and the users or application programs accessing the database all operated on the same system. The 1980s and 1990s saw the explosion of a new client/server model for database access, in which a user or an application program running on a personal computer accesses a database on a separate computer system by using a network. In the late 1990s, the increasing popularity of the Internet and the World Wide Web impacted the architecture of data management again. Today, users require little more than a web browser to access and interact with databases, not only within their own organizations, but also around the world. These Internet-based architectures usually involve three or more computer systems—one that runs the web browser and interacts with the user, connected over the Internet to a second system that runs an application program or application server, which is in turn connected to a third system that runs the database management system. Database management has become a very big business. Independent software companies and computer vendors ship billions of dollars’ worth of database management products every year. Virtually all enterprise-class computer applications that support the daily operation

21

22

Part I:

An Overview of SQL

of large companies and other organizations use databases. These applications include some of the fastest-growing application categories, such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Supply Chain Management (SCM), Sales Force Automation (SFA), and financial applications. Specialized high-performance server computers optimized to run the most popular database software constitute a multibillion-dollar market, and low-cost servers used exclusively for data management add billions more. Databases provide the intelligence behind most transaction-oriented web sites, and they are used to capture and analyze user interactions with web sites. Database management thus touches every segment of the computer market. Since the late 1980s, a specific type of DBMS, called a relational database management system (RDBMS), has become so popular that it is the standard database form. Relational databases organize data in a simple, tabular form and provide many advantages over earlier types of databases. SQL is specifically a relational database language used to work with relational databases.

A Brief History of SQL The history of the SQL is intimately intertwined with the development of relational databases. Table 3-1 shows some of the early milestones in its 40-year history. The relational database concept was originally developed by Edgar Frank “Ted” Codd, an IBM researcher. In June 1970, Codd published an article entitled “A Relational Model of Data for Large Shared Data Banks,” which outlined a mathematical theory of how data could be stored and manipulated using a tabular structure. Relational databases and SQL trace their origins to this article, which appeared in the Communications of the Association for Computing Machinery.

The Early Years Codd’s article triggered a flurry of relational database research, including a major research project within IBM. The goal of the project, called System/R, was to prove the workability of the relational concept and to provide some experience in actually implementing a relational DBMS. Work on System/R began in the mid-1970s at IBM’s Santa Teresa laboratories in San Jose, California. In 1974 and 1975, the first phase of the System/R project produced a minimal prototype of a relational DBMS. In addition to the DBMS itself, the System/R project included work on database query languages. One of these languages was called SEQUEL, an acronym for Structured English Query Language. In 1976 and 1977, the System/R research prototype was rewritten from scratch, and the new implementation was distributed to selected IBM customers for evaluation in 1978 and 1979. These early customer sites provided some actual user experience with System/R and its database language, which, for legal reasons, had been renamed SQL, or Structured Query Language. In 1979, the System/R research project came to an end, with IBM concluding that relational databases were not only feasible, but also could be the basis for a useful commercial product.

Early Relational Products The System/R project and its SQL database language were well-chronicled in technical journals during the 1970s. Seminars on database technology featured debates on the merits of the new and heretical relational model. By 1976, it was apparent that IBM was becoming enthusiastic about relational database technology and that it was making a major commitment to SQL.

Chapter 3:

SQL in Perspective

Event

1970

Codd defines relational database model

1974

IBM begins System/R project

1974

First article describing the SEQUEL language is published

1978

System/R customer tests are conducted

1979

Oracle introduces first commercial RDBMS

1981

Relational Technology introduces Ingres

1981

IBM announces SQL/DS

1982

ANSI forms SQL standards committee

1983

IBM announces DB2

1986

ANSI SQL1 standard is ratified

1986

Sybase introduces RDBMS for transaction processing

1987

ISO SQL1 standard is ratified

1988

Ashton-Tate and Microsoft announce SQL Server for OS/2

1989

First TPC benchmark (TPC-A) is published

1990

TPC-B benchmark is published

1991

SQL Access Group database access specification is published

1992

Microsoft publishes ODBC specification

1992

ANSI SQL2 standard (SQL-92) is ratified

1992

TPC-C (OLTP) benchmark is published

1993

Specialized SQL data warehousing systems are shipped for the first time

1993

ODBC products are shipped for the first time

1994

Parallel database server technology is shipped commercially

1995

Open source MySQL first released

1996

Standard API for OLAP database access and OLAP benchmark is published

1997

IBM DB2 UDB unifies DB2 architecture across IBM and other vendor platforms

1997

Major DBMS vendors announce Java integration strategies

1998

Microsoft SQL Server 7 provides enterprise-level database support for Windows NT

1998

Oracle 8i provides database/Internet integration and moves away from client/server model

1998

Commercial in-memory database products are shipped for the first time

1999

J2EE standardizes JDBC database access from application servers

1999

ANSI/ISO SQL:1999 standard ratified, adding object-oriented constructs into the language

2000

Oracle introduces application servers with integrated database caching

2000

Microsoft introduces SQL Server 2000, aimed at enterprise applications

2001

XML integration capabilities appear in mainstream RDBMS products

2001

IBM acquires Informix database business

2002

Gartner ranks IBM as #1 database vendor, passing Oracle

2003

ANSI/ISO SQL:2003 ratified, adding SQL/XML

2006

ANSI/ISO SQL:2006 ratified, significantly expanding SQL/XML and object-oriented constructs

2006

IDC and Gartner studies show Oracle leading in market share

2008

MySQL AB acquired by Sun Microsystems

2008

ANSI/ISO SQL:2008 ratified

TABLE 3-1 Milestones in SQL Development

PART I

Year

23

24

Part I:

An Overview of SQL

The publicity about System/R attracted the attention of a group of engineers in Menlo Park, California, who decided that IBM’s research foreshadowed a commercial market for relational databases. In 1977 they formed a company, Relational Software, Inc., to build a relational DBMS based on SQL. Their product named Oracle, shipped in 1979 and became the first commercially available relational DBMS. Oracle beat IBM’s first product to market by a full two years, and Oracle ran on Digital’s VAX minicomputers, which were less expensive than IBM mainframes. The company aggressively sold the merits of the new relational style of database management and eventually renamed itself after its flagship product. Today, Oracle Corporation is the leading vendor of relational database management systems and a major vendor of enterprise applications based on the Oracle database, with annual sales of tens of billions of dollars. Professors at the University of California’s Berkeley computer laboratories were also researching relational databases in the mid-1970s. Like the IBM research team, they built a prototype of a relational DBMS and called their system Ingres. The Ingres project included a query language named QUEL that, although more structured than SQL, was less Englishlike. Many database pioneers, key database developers, and founders of database startup companies trace their history back to the Berkeley Ingres project. In 1980, several professors left Berkeley and founded Relational Technology, Inc., to build a commercial version of Ingres, which was announced in 1981. Ingres and Oracle quickly became bitter archrivals, but their rivalry helped to call attention to relational database technology in this early stage. Despite its technical superiority in many areas, Ingres became a clear second-place player in the market, competing against the SQL-based capabilities (and the aggressive marketing and sales strategies) of Oracle. The original QUEL query language was effectively replaced by SQL in 1986, a testimony to the market power of the SQL standard. By the mid-1990s, the Ingres technology had been sold to Computer Associates, a leading mainframe software vendor. (Computer Associates sold its interest in Ingres to a private equity company in 2005.)

IBM Products While Oracle and Ingres raced to become commercial products, IBM’s System/R project had also turned into an effort to build a commercial product, named SQL/Data System (SQL/DS). IBM announced SQL/DS in 1981 and began shipping the product in 1982. In 1983, IBM announced a version of SQL/DS for VM/CMS, an operating system that was frequently used on IBM mainframes in corporate information center applications. In 1983, IBM also introduced Database 2 (DB2), another relational DBMS for its mainframe systems. DB2 operated under IBM’s MVS operating system, the workhorse operating system used in large mainframe data centers. The first release of DB2 began shipping in 1985, and IBM officials hailed it as a strategic piece of IBM software technology. DB2 has since become IBM’s flagship relational DBMS, and with IBM’s weight behind it, DB2’s SQL became the de facto standard database language. DB2 technology has now migrated across all IBM product lines, from personal computers to network servers to mainframes. In 1997, IBM took the DB2 cross-platform strategy even further, by announcing DB2 versions for servers from IBM hardware rivals Sun Microsystems and Hewlett-Packard. DB2 on mainframes remains the centerpiece of IBM’s database strategy, however, and is a vital force in enterprise computing.

Chapter 3:

SQL in Perspective

25

Commercial Acceptance

PART I

During the first half of the 1980s, the relational database vendors struggled for commercial acceptance of their products. The relational products had several disadvantages compared with the traditional database architectures. The performance of relational databases was seriously inferior to that of traditional databases. Except for the IBM products, the relational databases came from small upstart vendors. And, except for the IBM products, the relational databases tended to run on minicomputers rather than on IBM mainframes. The relational products did have one major advantage, however. Their relational query languages (SQL, QUEL, and others) allowed users to pose ad hoc queries to the database— and get immediate answers—without writing programs. As a result, relational databases began slowly turning up in information center applications as decision-support tools. By May 1985, Oracle proudly claimed to have over 1000 installations. Ingres was installed in a comparable number of sites. DB2 and SQL/DS were also being slowly accepted and counted their combined installations at slightly over 1000 sites. During the last half of the 1980s, SQL and relational databases were rapidly accepted as the database technology of the future. The performance of the relational database products improved dramatically. Ingres and Oracle, in particular, leapfrogged, with each new version claiming superiority over the competitor and two or three times the performance of the previous release. Improvements in the processing power of the underlying computer hardware also helped to boost performance. Market forces also boosted the popularity of SQL in the late 1980s. IBM stepped up its evangelism of SQL, positioning DB2 as the data management solution for the 1990s. Publication of the first ANSI/ISO standard for SQL (SQL1) in 1986 gave SQL official status as a standard. SQL also emerged as a standard on UNIX-based computer systems, whose popularity accelerated in the 1980s. As personal computers became more powerful and were linked in local area networks (LANs), they needed more sophisticated database management. PC database vendors embraced SQL as the solution to these needs, and minicomputer database vendors moved down market to compete in the emerging PC local area network market. Through the early 1990s, steadily improving SQL implementations and dramatic improvements in processor speeds made SQL a practical solution for transaction processing applications. SQL became a key part of the client/server architecture that used PCs, local area networks, and network servers to build much lower-cost information processing systems. When the Internet and the dot-com boom burst upon the IT landscape, SQL found a new role as the database language for Internet applications and e-commerce. SQL’s supremacy in the database world has not gone unchallenged. Object-oriented programming emerged in the 1990s as the method of choice for applications development, especially for personal computers and their graphical user interfaces. The object model, with its objects, classes, methods, and inheritance, did not fit well with the relational model of tables, rows, and columns of data. Early “object database” products included Servio Logic’s Gemstone, Graphael’s Gbase, and Ontologic’s Vbase. A new generation of venture capital–backed object database companies sprang up in the early to mid-1990s, hoping to make relational databases and their vendors obsolete, just as SQL had done to the earlier, nonrelational vendors. These products included Itasca Systems’ ITASCA, Fujitsu’s Jasmine, Matisse Software’s Matisse, Objectivity’s Objectivity/DB, Ontos, Inc.’s (renamed from Ontologic) ONTOS, O2 Technology’s O2, along with perhaps a half dozen others. However, SQL and the relational model more than withstood the challenge. A few of these products

26

Part I:

An Overview of SQL

remain in the market today, but most have been acquired or simply faded away. For example, O2 Technology merged with several companies, was acquired by Informix, and Informix was later acquired by IBM. Total annual revenues for object-oriented databases are measured in the low millions of dollars, while SQL and relational database systems, tools, and services produce tens of billions of dollars of sales per year. As SQL grew to address an ever-wider variety of data management tasks, the one-sizefits-all approach of the earlier SQL products showed serious strain. Specialized database systems sprang up to support different market needs. One of the fastest-growing segments was data warehousing, where databases were used to search through huge amounts of data to discover underlying trends and patterns. A second major trend was the incorporation of new data types (such as multimedia data) and object-oriented principles into SQL. A third important segment was mobile databases for portable personal computers that could operate when sometimes connected to, and sometimes disconnected from, a centralized database system. Another important application segment was embedded databases for use within intelligent devices such as network equipment. In-memory databases emerged as another segment, designed for very high levels of performance, and stream-oriented databases focused on managing data as it flowed over a network. Despite the emergence of subsegments of the database market, SQL has remained a common denominator across them all. Forty years after it first emerged, SQL has broadened tremendously, and SQL’s dominance as the database standard remains very strong. New challenges continue to emerge—the need to incorporate XML and its hierarchical data model and the need to support massive quantities of data to support data management on the scale of the Internet are two of the most recent. But the history of the past 40 years indicates that SQL and the relational model have a powerful ability to embrace and adapt to new data management needs.

SQL Standards One of the most important developments in the market acceptance of SQL is the emergence of SQL standards. References to “the SQL standard” usually mean the official standard adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). However, there are other important SQL standards, including the de facto standard for some parts of the SQL that have been defined by IBM’s DB2 product family, and Oracle’s SQL dialect, which has a dominant installed-base market share.

The ANSI/ISO Standards Work on the official SQL standard began in 1982, when ANSI charged its X3H2 committee with defining a standard relational database language. At first, the committee debated the merits of various proposed database languages. However, as IBM’s commitment to SQL increased and SQL emerged as a de facto standard in the market, the committee selected SQL as their relational database language and turned their attention to standardizing it.

Chapter 3:

SQL in Perspective

PART I

The resulting ANSI standard for SQL was largely based on DB2 SQL, although it contained a few major differences from DB2. After several revisions, the standard was officially adopted as ANSI standard X3.135 in 1986, and as an ISO standard in 1987. The ANSI/ISO standard was also adopted as a Federal Information Processing Standard (FIPS) by the U. S. government. This early SQL standard, slightly revised and expanded in 1989, is usually called the SQL1 standard, or SQL-89. Many of the ANSI and ISO standards committee members were representatives from database vendors who had existing SQL products, each implementing a slightly different SQL dialect. Like dialects of human languages, the SQL dialects were generally very similar to one another, but were incompatible in their details. In many areas, the committee simply sidestepped these differences by omitting some parts of the language from the standard and specifying others as “implementer-defined.” These decisions allowed existing SQL implementations to claim broad adherence to the resulting ANSI/ISO standard, but made the initial standard relatively weak. To address the holes in the original standard, the ANSI committee continued its work, and drafts for a new, more rigorous “SQL2” standard were circulated. Unlike the 1989 standard, the SQL2 drafts specified features considerably beyond those found in current commercial SQL products. Even more far-reaching changes were proposed for a follow-on SQL3 standard. In addition, the draft standards attempted to officially standardize parts of the SQL where different proprietary standards had long since been set by the various major DBMS brands. As a result, the proposed SQL2 and SQL3 standards were a good deal more controversial than the initial SQL standard. The SQL2 standard weaved its way through the ANSI approval process and was finally approved in October 1992. While the original 1986 standard had less than 100 pages, the revised standard, officially called SQL-92, had nearly 600 pages. The standards committee acknowledged the large step from SQL-89 to SQL-92 by explicitly creating three levels of SQL standards compliance. The lowest compliance level (Entry-Level) required only minimal additional capability beyond the SQL-89 standard. The middle one (Intermediate-Level) was created as an achievable major step beyond SQL-89, but avoided the most complex, system-dependent, and DBMS brand-dependent issues. The third compliance level (Full) required a full implementation of all areas. Throughout the published standard, each description of each feature includes a definition of the specific aspects of that feature that must be supported to achieve various levels of compliance. Today, specialized databases, such as those used in embedded applications or supported by open source efforts, still offer lower levels of SQL standard compliance in some areas, but all of the major enterprise-class database systems have evolved to fully support the SQL-92 standard. After the adoption of SQL-92, SQL standards work took a different direction. The single standards committee broke up into a number of different committees, focused on different extensions to the language. Some of these, such as stored procedure capabilities, were already found in many commercial SQL products and posed the same standardization challenges faced by SQL2. Others, such as proposed object extensions to SQL, were not yet widely available or fully implemented, but the database vendors were under significant pressure to address them as object-oriented technologies and XML surged in importance. New revisions to the standard were produced in 1999 and again in 2003, 2006, and 2008. The 2006 revision included significant extension to the XML parts of the standard.

27

28

Part I:

An Overview of SQL

At this writing, the ANSI/ISO standard had been expanded into 14 defined “Parts.” A few of them were dropped after some initial activity, some were merged back into other parts, and some continue as stand-alone, parallel efforts within the overall standard: • Part 1 – SQL/Framework contains common definitions and serves as a “table of contents” for the other parts. • Part 2 – SQL/Foundation is the largest part and carries the mainstream definition of the SQL statements that define the structure of a database and manipulate data. It is the descendant of the early SQL-89 and SQL-92 versions of the standard. It has been significantly extended to include SQL structures for business analytics. • Part 3 – SQL/CLI (Call Level Interface) describes the procedural call-level interface, better known as Microsoft’s ODBC standard. It debuted in 1995. • Part 4 – SQL/PSM (Persistent Stored Modules) describes procedural extensions to SQL, paralleling the features found in popular SQL procedural languages like Oracle’s PL/SQL. • Part 5 – SQL/Bindings described how SQL is embedded in other procedural languages. This part was merged into Part 2 – SQL/Foundations in the SQL:2003 version of the standard. • Part 6 – SQL/Transaction was focused on extensions to the XA distributed transaction standard, but was dropped. • Part 7 – SQL/Temporal was focused on extending SQL to deal with time-oriented data, but was dropped. • Part 8 – SQL/Objects held the object-oriented extensions to SQL during the development of SQL3. These extensions were merged back into Part2 – Foundation in SQL:1999. • Part 9 – SQL/MED (Management of External Data) adds facilities to allow SQL to process non relational data sources, and appeared in SQL:2003. • Part 10 – SQL/OLB (Object Language Bindings) describes access to SQL from Java. It is related to JDBC and embedded SQL for Java, and appeared in SQL:2003. • Part 11 – SQL/Schemata contains standards for the “database catalog” or system information tables that self-describe a database. This specification was in Part 2 in SQL:1999, but was separated out into its own part in SQL:2003. • Part 12 – SQL/Replication was started to define standards for how replication from one SQL database to another is specified, but was dropped. • Part 13 – SQL/JRT (Java Routines and Types) describes routines and types used in the Java language to access SQL databases, and first appeared in SQL:2003. • Part 14 – SQL/XML describes how Extensible Markup Language (XML) is integrated into the SQL language. It first appeared in SQL:2003 and has been significantly expanded since. From a few hundred pages describing the common core features of the SQL language in 1986, the ANSI/ISO SQL standard has thus grown dramatically in complexity, scope and length. The “real” SQL standard, of course, is the SQL implemented in products that are

Chapter 3:

SQL in Perspective

Other Early SQL Standards Although it was the most widely recognized, the ANSI/ISO standard was not the only standard in the early days of SQL. X/OPEN, a European vendor group, also adopted SQL as part of its suite of standards for a portable application environment based on UNIX. The X/OPEN standards played a major role in the development of the European computer market, where portability among computer systems from different vendors was a key concern. IBM also included SQL in the specification of its bold 1990s Systems Application Architecture (SAA) blueprint, promising that all of its SQL products would eventually move to this SAA SQL dialect. Although SAA failed to achieve its promise of unifying the IBM product line, the momentum toward a unified IBM SQL continued. With its mainframe DB2 database as the flagship, IBM introduced DB2 implementations for OS/2, its personal computer operating system, and for its RS/6000 line of UNIX-based workstations and servers. The expansion of DB2 (not only across hardware systems, but also across many different types of data) was embodied in IBM’s naming one of its later implementations DB2 Universal Database, or UDB.

ODBC and the SQL Access Group An important area of database technology not addressed by the earlier official standards is database interoperability—the methods by which data can be exchanged among different databases, usually over a network. In 1989, a group of vendors formed the SQL Access Group to address this problem. The resulting SQL Access Group specification for Remote Database Access (RDA) was published in 1991. Unfortunately, the RDA specification was closely tied to the OSI networking protocols, which lost the networking battle to the Internet’s TCP/IP suite, so RDA was never widely implemented. A second standard from the SQL Access Group had far more market impact. At Microsoft’s urging and insistence, the SQL Access Group expanded its focus to include a call-level interface for SQL. Based on a draft from Microsoft, the resulting Call-Level Interface (CLI) specification was published in 1992. Microsoft’s own Open Database Connectivity (ODBC) specification, based on the CLI standard, was published the same year. With the market power of Microsoft behind it, and the “open standards” blessing of the SQL Access Group, ODBC emerged as the de facto standard interface for PC access to SQL databases. Apple and Microsoft announced an agreement to support ODBC on Macintosh and Windows in the spring of 1993, giving ODBC industry standard status in both popular graphical user interface environments. ODBC implementations for UNIXbased systems soon followed. In 1995, the ODBC interface effectively became an ANSI/ ISO standard, with the publication of the SQL/Call-Level Interface (CLI) standard. Over the past decade, ODBC has continued to evolve, but at a slower pace. Microsoft still supports ODBC, but has focused major effort into building higher-level, more object-oriented

PART I

broadly accepted by the marketplace. For the most part, programmers and users tend to stick with those parts of the language that are the same across a broad range of products. The innovation of the database vendors continues to drive the invention of most new SQL extensions. Some of these fail to gain traction and fade from the language over time. Others are introduced and remain years later only for backward compatibility; others find commercial success and move into the mainstream and eventually find their way into the official standard.

29

30

Part I:

An Overview of SQL

interfaces for universal database access. However, ODBC still plays a major role providing portability across databases for major enterprise applications and for database tools. It’s quite common for a database tool or an enterprise application to support specific “drivers” that optimize it for direct access to Oracle and DB2 and SQL Server using their proprietary call-level interfaces. The application will typically include an additional driver that uses ODBC as a way to support a broad range of other databases. Because so many applications and tools adopt this approach, nearly all database vendors offer ODBC access, sometimes as their primary call-level interface and sometimes as a supplement to a higher-performance, proprietary interface.

JDBC and Application Servers The explosive popularity of the Internet drove the further development of database access standards to support the accompanying rise of the object-oriented Java programming language. Java eventually became the standard language for building Internet-delivered applications that ran on Java-based application servers. Sun Microsystems, the inventor of Java, led the effort to standardize the use of Java for application servers through the Java2 Enterprise Edition (J2EE) specification. J2EE included Java Database Connectivity (JDBC) as its standard for Java access to relational databases. Unlike database access for the C programming language, where proprietary call-language interfaces predated ODBC by many years, the JDBC standard was developed relatively early in the explosion of Java popularity. As a result, proprietary Java interfaces failed to emerge, and JDBC is the standard for SQL access from the Java language.

SQL and Portability The existence of published SQL standards has spawned quite a few exaggerated claims about SQL and applications portability. Diagrams such as the one in Figure 3-1 are frequently drawn to show how an application using SQL can work interchangeably with any SQL-based database management system. In fact, the differences between SQL dialects are significant enough that an application must often be modified when moved from one

FIGURE 3-1

The SQL portability myth

Chapter 3:

SQL in Perspective

• Data types The SQL standard has evolved to address an ever-broader set of data types, but vendors keep adding new ones. Even older data types can cause portability issues—Oracle’s NUMBER data type, for example, is the most widely used to represent numeric data in an Oracle database, and its peculiarities are completely unique to Oracle. • Backward compatibility It’s not uncommon for enterprise applications to still be in use 10 or 20 years after they were first written, long after the programmers who developed them are gone. These programs tend to become “untouchable,” since the detailed knowledge of how they work has often been lost. Large sections of these programs may depend on older, proprietary SQL features, and database vendors are forced to maintain backward compatibility with them or risk “breaking” the applications. This perpetuates dialect differences that inhibit portability. • System tables The SQL standard addressed the system tables that provide information about the structure of the database itself starting with the SQL-92 standard. By this time, database vendors had built their own proprietary system table structures, and they have continued to evolve them, often containing useful information that goes well beyond the items specified in the standard. Applications that use these proprietary system tables are not portable. • Programmatic interface The early SQL standard specified an abstract technique for using SQL from within an applications program written in COBOL, C, FORTRAN, and other programming languages, which was not widely adopted. The 1995 SQL/ CLI standard finally addressed programmatic SQL access, but by then, commercial DBMS products had popularized proprietary interfaces and deeply embedded them in hundreds of thousands of user applications and application packages. Although standard APIs are now widely supported, most database vendors still maintain proprietary interfaces that offer higher performance and richer functionality, with the side-effect of locking in applications. • Semantic differences Because the standards specify certain details as implementer-defined, it’s possible to run the same query against two different conforming SQL implementations and produce two different sets of query results. Examples of these differences can be found in areas like the handling of NULL values, column functions, and duplicate row elimination. • Replication and data mirroring Many production databases contain tables that are replicated in two or more geographically separated databases, to provide high availability or disaster recovery, to spread out processing workloads, or to reduce network delays. The techniques for specifying and managing these replication schemes are proprietary to each database system, and attempts to standardize replication have been abandoned.

PART I

SQL database to another. Over time, the core of the language has become more standard and has broadened, but at the same time, new capabilities have been added by the database vendors, often with proprietary language extensions. Examples of areas where these differences arise include

31

32

Part I:

An Overview of SQL

• Error codes The SQL-92 standard introduced standard error codes to be returned when SQL detects an error, but all of the popular database systems had long been using their own proprietary error codes by this time. Even when used in a mode with standard error codes, proprietary extensions can generate their own errors that are outside the specified standard codes. • Database structure The SQL-89 standard specified the SQL language to be used once a particular database has been opened and is ready for processing. The details of database naming and how the initial connection to the database is established were already diverse and not portable by the time this initial standard was written. The SQL-92 standard created more uniformity, but the standard cannot completely mask these implementation details. Despite these differences, commercial database tools boasting portability across several different brands of SQL databases began to emerge in the early 1990s and are broadly popular today. In practice, these tools always include specific drivers for communicating with each of the major DBMS brands, which generate the appropriate SQL dialect, handle data type conversion, translate error codes, and so on.

SQL and Networking The dramatic growth of computer networking in the 1990s had a major impact on database management and gave SQL a new prominence. As networks became more common, applications that traditionally ran on a central minicomputer or mainframe moved to local area networks of desktop workstations and servers. In these networks, SQL plays a crucial role as the link between an application running on a desktop workstation with a graphical user interface and the DBMS that manages shared data on a cost-effective server. More recently, the exploding popularity of the Internet and the World Wide Web has reinforced the network role for SQL. In the emerging three-tier Internet architecture, SQL once again provides the link between the application logic (now running in the middle tier, on an application server or web server) and the database residing in the back-end tier. The next few sections discuss the evolution of database network architectures and the role of SQL in each one.

Centralized Architecture Figure 3-2 shows the traditional database architecture used by DB2 and the original minicomputer databases such as Oracle and Ingres. In this architecture, the DBMS and the physical data both reside on a central minicomputer or mainframe system, along with the application program that accepts input from the user’s terminal and displays data on the user’s screen. The application program communicates with the DBMS using SQL.

FIGURE 3-2

Database management in a centralized architecture

Chapter 3:

SQL in Perspective

File Server Architecture The introduction of personal computers and local area networks led to the development of the file server architecture, shown in Figure 3-3. In this architecture, an application running on a personal computer can transparently access data located on a file server, which stores shared files. When a PC application requests data from a shared file, the networking software automatically retrieves the requested block of the file from the server. Early PC databases, such as dBASE and later Microsoft’s Access, supported this file server approach, with each personal computer running its own copy of the DBMS software. For typical queries that retrieve only one row or a few rows from the database, this architecture provides excellent performance, because each user has the full power of a personal computer running its own copy of the DBMS. However, consider the query made in the previous example. Because the query requires a sequential scan of the database, the DBMS repeatedly requests blocks of data from the database, which is physically located across the network on the server. Eventually, every block of the file will be requested and sent across the network. Obviously, this architecture produces very heavy network traffic and slow performance for queries of this type.

FIGURE 3-3

Database management in a file server architecture

PART I

Suppose the user types a query that requires a sequential search of a database, such as a request to find the average order size for all orders. The DBMS receives the query, scans through the database fetching each record of data from the disk (or memory), calculates the average, and displays the result on the terminal screen. Both the application processing and the database processing occur on the central computer, so execution of this type of query (and in fact, all kinds of queries) is very efficient. The disadvantage of the centralized architecture is scalability. As more and more users are added, each of them adds application processing workload to the system. Because the system is shared, each user experiences degraded performance as the system becomes more heavily loaded.

33

34

Part I:

An Overview of SQL

Client/Server Architecture Figure 3-4 shows the next stage of network database evolution—the client/server database architecture. In this scheme, personal computers are combined in a local area network with a database server that stores shared databases. The functions of the DBMS are split into two parts. Database front-ends, such as interactive query tools, report writers, and application programs, run on the personal computer. The back-end database engine that stores and manages the data runs on the server. As the client/server architecture grew in popularity during the 1990s, SQL became the standard database language for communication between the front-end tools and the back-end engine in this architecture. Consider once more the query requesting the average order size. In the client/server architecture, the query travels across the network to the database server as a SQL request. The database engine on the server processes the request and scans the database, which also resides on the server. When the result is calculated, the database engine sends it back across the network as a single reply to the initial request, and the front-end application displays it on the PC screen. The client/server architecture reduces the network traffic and splits the database workload. User-intensive functions, such as handling input and displaying data, are concentrated on the user’s PC. Data-intensive functions, such as file I/O and query processing, are concentrated in the database server. Most importantly, the SQL provides a well-defined interface between the front-end and back-end systems, communicating database access requests in an efficient manner. By the mid-1990s, these advantages made the client/server architecture the most popular scheme for implementing new applications. All of the most popular DBMS products—Oracle, Informix, Sybase, SQL Server, DB2, and many more—offered client/ server capability. The database industry grew to include many companies offering tools for building client/server applications. Some of these came from the database companies themselves; others came from independent companies. Like all architectures, client/server had its disadvantages. The most serious of these was the problem of managing the applications software that was now distributed across hundreds or thousands of desktop PCs instead of running on a central minicomputer or mainframe. To update an application program in a large company, the information systems department had to update thousands of PC systems, one at a time. The situation was even worse if changes to the application program had to be synchronized with changes to other applications, or to the

FIGURE 3-4

Database management in a client/server architecture

Chapter 3:

SQL in Perspective

Multitier Architecture With the emergence of the Internet and especially the World Wide Web, network database architecture took another step in its evolution. At first, the Web was used to access (browse) static documents and evolved outside of the database world. But as the use of web browsers became widespread, it wasn’t long before companies thought about using them as a simple way to provide access to corporate databases as well. For example, suppose a company starts using the Web to provide product information to its customers by making product descriptions and graphics available on its web site. A natural next step is to give customers access to current product availability information through the same web browser interface. This requires linking the web server to the database system that stores the (constantly changing) current product inventory levels. The methods used to link web servers and DBMS systems evolved rapidly in the late 1990s and early 2000s, and have converged on the three-tier network architecture shown in Figure 3-5. The user interface is a web browser running on a PC or some other thin client

FIGURE 3-5

Database management in a three-tier Internet architecture

PART I

DBMS system itself. In addition, with personal computers on user’s desks, users tended to add new personal software of their own or to change the configuration of their systems. Such changes often disrupted existing applications, adding to the support burden. Companies developed strategies to deal with these issues, but by the late 1990s, there was growing concern about the manageability of client/server applications on large, distributed PC networks.

35

36

Part I:

An Overview of SQL

device, such as smart phone, in the front-end tier. It communicates with a web server in the middle tier. When the user request is for something more complex than a simple web page, the web server passes the request to an application server, whose role is to handle the business logic required to process the request. Often the request will involve access to an existing (legacy) application running on a mainframe system or to a corporate database. These systems run in the back-end tier of the architecture. As with the client/server architecture, SQL is solidly entrenched as the standard database language for communicating between the application server and back-end databases. All of the packaged application server products provide a SQL-based callable API for database access. As much of the application server market has converged around the Java2 Enterprise Edition (J2EE) standard, Java Database Connectivity (JDBC) has emerged as the leading standard API for application server access to databases.

The Proliferation of SQL As the standard for relational database access, SQL has had a major impact on all parts of the computer market. IBM’s SQL-based DB2 dominates mainframe data management. Oracle’s SQL-based database dominates the market for UNIX-based computer systems and servers. Microsoft’s SQL Server dominates on server-oriented Windows operating systems for workgroups and departmental applications. MySQL dominates the open-source database market. SQL is accepted as a technology for online transaction processing (OLTP), fully refuting the conventional wisdom of the 1980s that relational databases would never offer performance good enough for transaction processing applications. SQL-based data warehousing and data mining applications are the standard for helping companies to discover customer purchase patterns and to offer better products and services. On the Internet, SQL-based databases are the foundation of more personalized products, services, and information services that are a key benefit of electronic commerce.

SQL on Mainframes Although IBM’s hierarchical IMS database is still offered on IBM mainframes and still runs many high-performance mainframe applications, IBM’s SQL-based DB2 has been its flagship mainframe database for more than two decades. IBM offers DB2 implementations across different computer systems architectures, but the mainframe DB2 is still the “mother ship,” generating the vast majority of IBM’s database revenues. Any new database development on mainframe systems today uses DB2, cementing SQL’s dominant role for mainframe data management.

SQL on Minicomputers Minicomputers were one of the most fertile early markets for SQL-based database systems. Oracle and Ingres were both originally marketed on Digital’s VAX/VMS minicomputer systems. Both products were subsequently ported to many other platforms. Sybase, a later database system specialized for online transaction processing, also targeted the VAX as one of its primary platforms. Through the 1980s, the minicomputer vendors also developed their own proprietary relational databases featuring SQL. Digital considered relational databases so important that it bundled a runtime version of its Rdb/VMS database with every VAX/VMS system.

Chapter 3:

SQL in Perspective

SQL on UNIX-Based Systems SQL is firmly established as the data management solution of choice for UNIX-based computer systems. Originally developed at Bell Laboratories, UNIX became very popular in the 1980s as a vendor-independent, standard operating system. It runs on a wide range of computer systems, from workstations to mainframes, and has become the standard operating system for high-end server systems, including database servers. In the early 1980s, four major databases were already available for UNIX systems. Two of them, Ingres and Oracle, were UNIX versions of the products that ran on DEC’s proprietary minicomputers. The other two, Informix and Unify, were written specifically for UNIX. Neither of them originally offered SQL support, but by 1985, Unify offered a SQL query language, and Informix had been rewritten as Informix-SQL, with full SQL support. Today, the Oracle database dominates the UNIX-based database market and is available on all of the leading UNIX server platforms. Informix was acquired by IBM, which still offers the product for its own and other UNIX-based servers. UNIX-based (and increasingly, Linux-based) database servers are a mainstream building block for both client/server and three-tier Internet architectures. The constant search for higher SQL database performance has driven some of the most important trends in UNIX system hardware. These include the emergence of symmetric multiprocessing (SMP) as a mainstream server architecture, the development of multicore microprocessors which took SMP to the chip level, and the use of RAID (Redundant Array of Independent Disks) technology to boost I/O performance.

SQL on Personal Computers Databases have been popular on personal computers since the early days of the IBM PC. Ashton-Tate’s dBASE product reached an installed base of over 1 million MS-DOS-based PCs. Although these early PC databases often presented data in tabular form, they lacked the full power of a relational DBMS and a relational database language such as SQL. The first SQL-based PC databases were versions of popular minicomputer products that barely fit on personal computers. For example, Professional Oracle for the IBM PC, introduced in 1984, required two megabytes of memory—well above the typical 640KB PC configuration of the day. The real impact of SQL on personal computers began with the announcement of OS/2 by IBM and Microsoft in April 1987. In addition to the standard OS/2 product, IBM announced a proprietary OS/2 Extended Edition (OS/2 EE) with a built-in SQL database

PART I

Hewlett-Packard offered Allbase, a database that supported both its HPSQL dialect and a nonrelational interface. Data General’s DG/SQL database replaced its older nonrelational databases as DG’s strategic data management tool. In addition, many of the minicomputer vendors resold relational databases from the independent database software vendors. These efforts helped to establish SQL as an important technology for midrange computer systems. By the mid-1990s, the minicomputer vendors’ SQL products had largely disappeared, beaten in the marketplace by multiplatform software from Oracle, Informix, Sybase, and others. Oracle acquired Digital’s Rdb, and the other products were gradually dropped. Paralleling this trend, the importance of proprietary minicomputer operating systems faded, replaced by widespread use of UNIX on midrange systems. Yesterday’s minicomputer SQL market has effectively become today’s market for UNIX-based database servers based on SQL.

37

38

Part I:

An Overview of SQL

and communications support. With the introduction, IBM again signaled its strong commitment to SQL, saying in effect that SQL was so important that it belonged in the computer’s operating system. OS/2 Extended Edition presented Microsoft with a problem. As the developer and distributor of standard OS/2 to other personal computer manufacturers, Microsoft needed an alternative to the Extended Edition. Microsoft responded by licensing the Sybase DBMS, which had been developed for VAX, and began porting it to OS/2. In January 1988, in a surprise move, Microsoft and Ashton-Tate (the PC database leader at the time with its dBASE product) announced that they would jointly sell the resulting OS/2-based product, renamed SQL Server. Microsoft would sell SQL Server with OS/2 to computer manufacturers; AshtonTate would sell the product through retail channels to PC users. In September 1989, Lotus Development (the other member of the big three of PC software at the time) added its endorsement of SQL Server by investing in Sybase. Later that year, Ashton-Tate relinquished its exclusive retail distribution rights and sold its investment to Lotus. SQL Server for OS/2 met with only limited success (as did the OS/2 operating system itself). But in typical Microsoft fashion, Microsoft continued to invest heavily in SQL Server development and ported it to its Windows NT operating system. For a while, Microsoft and Sybase remained partners, with Sybase focused on the minicomputer and UNIX-based server markets and Microsoft focused on PC LANs and Windows NT. As Windows NT and UNIX systems became more and more competitive as database server operating system platforms, the relationship became less cooperative and more competitive. Eventually, Sybase and Microsoft went their separate ways. The common heritage of Sybase’s and Microsoft’s SQL products can still be seen in product capabilities and some common SQL extensions (for example, stored procedures), but the product lines have already diverged significantly. Today, SQL Server is a major database system on Windows-based servers. It has had a major new release every two to three years, adding major capabilities in areas as diverse as XML processing, special data, full-text search, data warehousing and analytics, and high availability. While UNIX-based servers and Oracle databases continue to dominate the largest database server installations, server configurations of the Windows operating system and the Intel architecture systems on which it runs have achieved credibility in the midrange market.

SQL and Transaction Processing SQL and relational databases originally had very little impact in online transaction processing (OLTP) applications. With their emphasis on queries, relational databases were confined to decision support and low-volume online applications, where their slower performance was not a disadvantage. For OLTP applications, where hundreds of users needed online access to data and subsecond response times, IBM’s nonrelational Information Management System (IMS) reigned as the dominant DBMS. In 1986, a new DBMS vendor, Sybase, introduced a new SQL-based database especially designed for OLTP applications. The Sybase DBMS ran on VAX/VMS minicomputers and Sun workstations, and focused on maximum online performance. Oracle Corporation and Relational Technology followed shortly with announcements that they, too, would offer OLTP versions of their popular Oracle and Ingres database systems. In the UNIX market, Informix announced an OLTP version of its DBMS, named Informix-Turbo.

Chapter 3:

SQL in Perspective

SQL and Workgroup Databases The dramatic growth of PC LANs through the 1980s and 1990s created a new opportunity for departmental or workgroup database management. The original database systems focused on this market segment ran on IBM’s OS/2 operating system. In fact, SQL Server, now a key part of Microsoft’s Windows strategy, originally made its debut as an OS/2 database product. In the mid-1990s, Novell also made a concentrated effort to make its NetWare operating system an attractive workgroup database server platform. From the earliest days of PC LANs, NetWare had become established as the dominant network operating system for file and print servers. Through deals with Oracle and others, Novell sought to extend this leadership to workgroup database servers as well. The arrival of Windows NT, a specialized version of Windows tuned for server use, was the catalyst that caused the workgroup database market to really take off. While NetWare offered a clear performance advantage over NT as a workgroup file server, NT had a more robust, general-purpose architecture, more like the minicomputer operating systems. Microsoft successfully positioned NT as a more attractive platform for running workgroup applications (as an application server) and workgroup databases. Microsoft’s own SQL Server product was marketed (and often bundled) with NT as a tightly integrated workgroup database platform. Corporate information systems departments were at first very cautious about using relatively new and unproven technology, but the NT/SQL Server combination allowed departments and non-IS executives to undertake smaller-scale, workgroup-level projects on their own, without corporate IS help. This phenomenon, like the grassroots support for personal computers a decade earlier, fueled the early growth of the workgroup database segment.

PART I

In 1988, IBM jumped on the relational OLTP bandwagon with DB2 Version 2, with benchmarks showing the new version operating at over 250 transactions per second on large mainframes. IBM claimed that DB2 performance was now suitable for all but the most demanding OLTP applications and encouraged customers to consider it as a serious alternative to IMS. OLTP benchmarks became a standard sales tool for relational databases, despite serious questions about how well the benchmarks actually measure performance in real applications. The suitability of SQL for OLTP improved dramatically over the next decade, with advances in relational technology and more powerful computer hardware both leading to ever-higher transaction rates. DBMS vendors started to position their products based on their OLTP performance, and for a few years database advertising focused almost entirely on these performance benchmark wars. A vendor-independent organization, the Transaction Processing Council, jumped into the benchmarking fray with a series of vendorindependent benchmarks (TPC-A, TPC-B, and TPC-C), which only served to intensify the performance focus of the vendors. By the early 2000s, SQL-based relational databases on high-end UNIX-based database servers evolved well past the 1000-transactions-per-second mark. Client/server systems using SQL databases have become the accepted architecture for implementing OLTP applications. From a position as “unsuitable for OLTP,” SQL has grown to be the industry standard foundation for building OLTP applications.

39

40

Part I:

An Overview of SQL

Today, SQL is well established as a workgroup database standard. In addition to Microsoft’s newer versions of Windows for servers, Linux has emerged as a very popular platform for workgroup servers. Microsoft SQL Server and Oracle share the largest part of the market, but open source databases like MySQL have emerged as a very strong and costeffective alternative. Postgres, another open source product developed at the University of California at Berkeley as a follow-on to Ingres, has also gained a smaller but loyal following in this segment.

SQL, Data Warehousing, and Business Intelligence For several years, the effort to make SQL a viable technology for OLTP applications shifted the focus away from the original relational database strengths of query processing and decision making. Performance benchmarks and competition among the major DBMS brands focused on simple transactions like adding a new order to the database or determining a customer’s account balance. Because of the power of the relational database model, the databases that companies used to handle daily business operations could also be used to analyze the growing amounts of data that were being accumulated. A frequent theme of conferences and tradeshow speeches for IS managers was that a corporation’s accumulated data (stored in SQL databases, of course) should be treated as a valuable asset and used to help improve the quality of business decision making. Although relational databases could, in theory, easily perform both OLTP and decisionmaking applications, there were some very significant practical problems. OLTP workloads consisted of many short database transactions, and the response time for users was very important. In contrast, decision-support queries could involve sequential scans of large database tables to answer questions like “What is the average order size by sales region?” or “How do inventory trends compare with the same time a year ago?” These queries could take minutes or hours. If a business analyst tried to run one of these queries during a time when business transaction volumes reached their peak, it could cause serious degradation in OLTP performance. Another problem was that the data to answer useful questions about business trends was often spread across many different databases, typically involving different DBMS vendors and different computer platforms. The desire to take advantage of accumulated business data, and the practical performance problems it caused for OLTP applications, led to the concept of a data warehouse, shown in Figure 3-6. Business data is extracted from OLTP systems, reformatted and validated as necessary, and then placed into a separate database that is dedicated to decision-making queries (the “warehouse”). The data extraction and transformation can be scheduled for off-hours batch processing. Ideally, only new or changed data can be extracted, minimizing the amount of data to be processed in the monthly, weekly, or daily warehouse refresh cycle. With this scheme, the time-consuming business analysis queries use the data warehouse, not the OLTP database, as their source of data. SQL-based relational databases were a clear choice for the warehouse data store because of their flexible query processing. A series of new companies was formed to build the data extraction, transformation, and database query tools needed by the data warehouse model. In addition, DBMS vendors started to focus on the kinds of database queries that customers tended to run in the data warehouse. These queries tended to be large and complex—such as analyzing tens or hundreds of millions of individual cash-register receipts looking for product purchase patterns. They often involved time-series data—for example, analyzing

Chapter 3:

SQL in Perspective

41

PART I

FIGURE 3-6

The data warehousing concept

product sales or market share data over time. They also tended to involve statistical summaries of data—total sales, average order volume, percent growth, and so on—rather than the individual data items themselves. To address the specialized needs of data warehousing applications (often called Online Analytical Processing or OLAP), specialized databases began to appear. These databases were optimized for OLAP workloads in several different ways. Their performance was tuned for complex, read-only query access. They supported advanced statistical and other data functions, such as built-in time-series processing. They supported precalculation of database statistical data, so that retrieving averages and totals could be dramatically faster. Some of these specialized databases did not use SQL, but many did (leading to the companion term ROLAP, for Relational Online Analytical Processing). As the market for data warehousing continued to evolve, the tools to tap the warehouse emerged as an important market segment in their own right, often labeled business intelligence. The lines between the vendors that supplied the warehouse databases, the tools to populate them, and the tools to analyze data gradually blurred as the market grew. Three of the largest business intelligence vendors became successful public companies in their own right before

42

Part I:

An Overview of SQL

being acquired by three of the industry giants. Business Objects was acquired by SAP, the leading vendor of enterprise applications. Hyperion was acquired by Oracle, and Cognos was acquired by IBM. As with so many segments of the IT market, SQL’s advantages as a standard proved to be a powerful force, and SQL-based data warehouses and analytic tools are firmly entrenched.

SQL and Internet Applications During the late 1990s, the World Wide Web and the web browsing capability that it enabled were the driving force behind the growth of the Internet. With its focus on delivering content in the form of text and graphics, the early uses of the Web had little to do with data management. By the mid-1990s, however, much of the content delivered from corporate web sites had its origins in SQL-based corporate databases. For example, on a commercial web site for a retailer, web pages that contain information about products available for sale, their prices, product availability, special promotions, and the like are typically created on demand, based on data retrieved from a SQL database. The vast majority of the pages displayed by an online auction site or by an online travel site are similarly based on data retrieved from SQL databases, transformed into the Web’s HTML page format. In the other direction, data entered by a user into browser page forms is almost always captured into SQL databases that form part of the web site architecture. By the early 2000s, industry attention had turned to the next phase of the Internet, and the role that Internet technologies can play in connecting computer applications to one another. These distributed applications architectures received widespread trade press coverage under the banner of web services. In the longstanding tradition of the computer industry, competing camps emerged, championing different sets of standards and languages for implementing them—a Microsoft-led camp under the .NET Framework, and a rival camp focused on Java and J2EE-based application servers. Both architectures embrace a key role for XML, a standard for exchanging structured data like the data that resides in SQL databases. In response to the industry attention on web services, a flurry of products has been announced that link XML-formatted messages to SQL-based databases. Startup database vendors and some of the object database vendors announced XML-based database products, arguing that they provide an ideal, native match for the XML-formatted exchange of data over the Internet. The established relational database players responded with their own XML initiatives, adding XML input/output capabilities, and then native XML data type support, to their products. Tighter integration between XML and SQL remains an active area of investment by all of the major database vendors today. The Internet approach to scalability is also having a major impact on database software products. Many Internet software elements operate at “Internet scale” through a horizontal scaling approach, spreading out their workload across dozens or hundreds of low-cost commodity servers. The Google search engine is one of the most extreme examples of this architecture, where even a single search can be distributed across dozens of servers, and the total search volume is distributed across tens of thousands of servers, all located in the “Internet cloud.” There are major challenges to applying this approach to database management, but providing data management “in the cloud” is an active topic of research and development in the database community.

Chapter 3:

SQL in Perspective

43

Summary • SQL was originally developed by IBM researchers, and IBM’s strong support of SQL was a key reason for its early success. • There is an official ANSI/ISO SQL standard, which has grown tremendously in scope and complexity since its debut in 1986. • Despite the existence of a standard, there are many small variations among commercial SQL dialects; no two SQL implementations are exactly the same. • SQL has become the standard database management language across a broad range of computer systems and applications areas, including mainframes, workstations, personal computers, OLTP systems, client/server systems, data warehousing, and the Internet.

PART I

This chapter described the development of SQL and its role as a standard language for relational database management:

This page intentionally left blank

4

CHAPTER

Relational Databases

D

atabase management systems organize and structure data so that it can be saved and retrieved by users and application programs. The data structures and access techniques provided by a particular DBMS are called its data model. A data model determines both the “personality” of a DBMS and the applications for which it is particularly well-suited. SQL is a database language for databases that use the relational data model. What exactly is a relational database? How is data stored in a relational database? How do relational databases compare with earlier technologies, such as hierarchical and network databases? What are the advantages and disadvantages of the relational model? This chapter describes the relational data model supported by SQL and compares it with earlier strategies for database organization.

Early Data Models As database management became popular during the 1970s and 1980s, a handful of popular data models emerged. Each of these early data models had advantages and disadvantages that played key roles in the development of the relational data model. In many ways, the relational data model represented an attempt to streamline and simplify the earlier data models. To understand the role and contribution of SQL and the relational model, it is useful to briefly examine some data models that preceded the development of SQL, some of which are still in use today.

File Management Systems Before the introduction of database management systems, all data permanently stored on a computer system, such as payroll and accounting records, was stored in individual files. A file management system, usually provided as part of the computer’s operating system, kept track of the names and locations of the files. File management systems are still widely used today—you are probably familiar with the files-and-folders structure provided by the file system on Microsoft Windows or Apple’s Macintosh operating systems. Similar file systems are used by UNIX-based servers and all commercial computer systems. A file management system basically has no data model; it knows nothing about the internal contents of files. At best, the file system might maintain “file type” information

45

46

Part I:

An Overview of SQL

along with the filename, allowing it to distinguish between a word processing document and a file containing payroll data. But knowledge about the internal contents of a file—what individual pieces of data it contains and how that data is organized—is embedded in the application programs that use the file, as shown in Figure 4-1. In this payroll application, each of the COBOL programs that processes the employee master file contains a file description (FD) that describes the layout of the data in the file. If the structure of the data changes—for example, if an additional item of data is to be stored for each employee— every program that accesses the file has to be modified. This isn’t a problem for files containing word processing documents or spreadsheets, which are usually processed by a single program. But in corporate data processing, files are often shared among dozens or even hundreds of programs, as in Figure 4-1. As the number of files and programs grows over time, more and more of a data-processing department’s effort goes into maintaining existing applications rather than developing new ones. The problems of maintaining large file-based systems led in the late 1960s to the development of database management systems. The idea behind these systems was simple: move the definition of a file’s content and structure out of the individual programs, and store it, together with the data, in a database. Using the information in the database, the DBMS that controlled it could take a much more active role in managing both the data and changes to the database structure. Moreover, DBMSs are an extension of file management systems rather than a replacement for them. DBMSs use file management systems (usually the ones supplied with the operating system) to store the database structures. The database user then references the DBMS and the DBMS handles the physical storage details. It is this layer of abstraction that provides physical data independence.

FIGURE 4-1

A payroll application using a file management system

Chapter 4:

Relational Databases

47

Hierarchical Databases

• Find a particular part by number (such as the left door) • Move “down” to the first child (the door handle) • Move “up” to its parent (the body) • Move “sideways” to the next child (the right door) Retrieving the data in a hierarchical database thus required navigating through the records: moving up, down, and sideways one record at a time.

FIGURE 4-2

A hierarchical bill-of-materials database

PART I

One of the most important applications for the earliest database management systems was managing operations for manufacturing companies. If an automobile manufacturer decided to produce 10,000 units of one car model and 5000 units of another model, it needed to know how many parts to order from its suppliers. To answer the question, the product (a car) had to be decomposed into hundreds of assemblies (engine, body, chassis), which were decomposed into thousands of subassemblies (valves, cylinders, spark plugs), and then into sub-subassemblies, and so on. Handling this list of parts, known as a bill of materials was a job tailor-made for computers. The bill of materials for a product has a natural hierarchical structure. To store this data, the hierarchical data model, illustrated in Figure 4-2, was developed. In this model, each record in the database represented a specific part. The records had parent/child relationships, linking each part to its subpart, and so on. To access the data in the database, a program could perform the following tasks:

48

Part I:

An Overview of SQL

One of the most popular hierarchical database management systems was IBM’s Information Management System (IMS), first introduced in 1968. The advantages of IMS and its hierarchical model follow. • Simple structure The organization of an IMS database was easy to understand. The database hierarchy paralleled that of a company organization chart or a family tree. • Parent/child organization An IMS database was excellent for representing parent/child relationships, such as “A is a part of B” or “A is owned by B.” • Performance IMS stored parent/child relationships as physical pointers from one data record to another, so that movement through the database was rapid. IMS is still a widely used DBMS on IBM mainframes. Its raw performance makes it ideal for very high volume transaction-processing applications such as processing credit card transactions or booking airline reservations. Dramatic improvements in relational database performance over the last two decades have narrowed IMS’s performance advantage, but the large amount of corporate data stored in IMS databases and the large number of mature applications that process that data ensure that IMS use will continue for many years to come.

Network Databases The simple structure of a hierarchical database became a disadvantage when the data had a more complex structure. In an order-processing database, for example, a single order might participate in three different parent/child relationships, linking the order to the customer who placed it, the salesperson who took it, and the product ordered, as shown in Figure 4-3. This type of data structure simply didn’t fit the strict hierarchy of IMS. To deal with applications such as order processing, a new network data model was developed. The network data model extended the hierarchical model by allowing a record to participate in multiple parent/child relationships, called sets, as shown in Figure 4-4.

FIGURE 4-3

Multiple parent/child relationships

Chapter 4:

Relational Databases

49

PART I

FIGURE 4-4

A network (CODASYL) order-processing database

In 1971, the Conference on Data Systems Languages published an official standard for network databases, called the CODASYL model. IBM never developed a network DBMS, but during the 1970s, independent software companies rushed to embrace the network model, creating products such as Cullinet’s IDMS, Cincom’s Total, and the Adabas DBMS that became very popular. However, IBM enhanced IMS to provide a workaround to the single-parent rule in classic hierarchical structures, calling the additional parents logical parents. The data model became known as the extended hierarchical model, and it made IMS a direct competitor with the network DBMS products. For a programmer, accessing a network database was very similar to accessing a hierarchical database. An application program could do the following: • Find a specific parent record by key (such as a customer number) • Move down to the first child in a particular set (the first order placed by this customer) • Move sideways from one child to the next in the set (the next order placed by the same customer) • Move up from a child to its parent in another set (the salesperson who took the order) Once again, the programmer had to navigate the database record by record, this time specifying which relationship to navigate as well as the direction. Network databases had several advantages: • Flexibility Multiple parent/child relationships allowed a network database to represent data that did not have a simple hierarchical structure. • Standardization The CODASYL standard boosted the popularity of the network model, making it easier for programmers to move between DBMS products. • Performance Parent/child sets were represented by pointers to physical data records, allowing rapid navigation through these relationships.

50

Part I:

An Overview of SQL

Network databases had their disadvantages, too. Like hierarchical databases, they were very rigid. The set relationships and the structure of the records had to be specified in advance. Changing the database structure typically required rebuilding the entire database. Both hierarchical and network databases were tools for programmers. To answer a question such as “What is the most popular product ordered by Acme Manufacturing?” or “How many orders are there for Size 4 Widgets?” a programmer had to write a program that navigated its way through the database, found the appropriate records, and calculated the result. The backlog of requests for custom reports often stretched to weeks or months, and by the time the program was written, the information it delivered was often worthless. The disadvantages of the hierarchical and network models led to intense interest in the new relational data model when it was first described by Ted Codd in 1970. At first the relational model was little more than an academic curiosity. Network databases continued to be important throughout the 1970s and early 1980s, particularly on the minicomputer systems that were surging in popularity. However, by the mid-1980s, the relational model was clearly emerging as the “new wave” in data management. By the early 1990s, network and hierarchical databases were clearly declining in importance, and today they play only a minor role in the database market.

The Relational Data Model The relational model proposed by Codd was an attempt to simplify database structure. It eliminated the explicit parent/child structures from the database and instead represented all data in the database as simple row/column tables of data values. Figure 4-5 shows a relational version of the network order-processing database in Figure 4-4.

FIGURE 4-5

A relational order-processing database

Chapter 4:

Relational Databases

A relational database is a database where all data visible to the user is organized strictly as tables of data values, and where all database operations work on these tables. The definition is intended specifically to rule out any user-visible structures such as the embedded pointers of a hierarchical or network database. A relational DBMS can represent parent/child relationships, but they are visible only through the data values contained in the database tables.

The Sample Database Figure 4-6 shows a small relational database for an order-processing application. This sample database is used throughout this book and provides the basis for most of the examples. Appendix A contains a complete description of the database structure and its contents. Figure 4-6 shows only a few rows of each table – the complete contents of each table are included in Appendix A.

SALESREPS Table EMPL_NUM

NAME

105

Bill Adams

109

Mary Jones

102

Sue Smith

AGE

REP_OFFICE

TITLE

HIRE_DATE

MANAGER

QUOTA

SALES

37

13

Sales Rep

2006-02-12

104

$350,000.00

$367,911.00

31

11

Sales Rep

2007-10-12

106

$300,000.00

$392.725.00

48

21

Sales Rep

2004-12-10

108

$350,000.00

$474,050.00

PRODUCTS Table MFR_ID PRODUCT_ID

DESCRIPTION

REI ACI

2A45C 4100Y

Ratchet Link Widget Remover

QSA

Xk47

Reducer

PRICE

QTY_ON_HAND

$79.00

210 25

$2,750.00 $355.00

38

ORDERS Table ORDER_NUM ORDER_DATE

CUST

REP

MFR

PRODUCT

2007-12-17 2008-01-11 2008-01-03

2117 2111 2101

106 105 106

REI ACI FEA

2A44L 41003 114X

112961 113012 112989

OFFICES Table OFFICE CITY 22 11

Denver

12

Chicago

New York

REGION

MGR

Western Eastern Eastern

108 106 104

QTY

AMOUNT

7 35 6

$31,500.00 $3,745.00 $1,458.00

TARGET

SALES

$300,000.00 $186,042.00 $575,000.00 $692,637.00 $800,000.00 $735,042.00

CUSTOMERS Table CUST_NUM 2111 2102 2103

FIGURE 4-6

The sample database (partial listing)

COMPANY JCP Inc. First Corp. Acme Mfg.

CUST_REP

CREDIT_LIMIT

103 101

$50,000.00 $65,000.00

105

$50,000.00

PART I

Codd’s work produced a precise, mathematical definition of a relational database, and a theoretical basis for the database operations that could be performed on it. However, a more informal definition of a relational database is useful:

51

52

Part I:

An Overview of SQL

The sample database contains five tables. Each table stores information about one particular kind of entity: • The SALESREPS table stores the employee number, name, age, year-to-date sales, and other data about each salesperson. • The PRODUCTS table stores data about each product available for sale, such as the manufacturer, product number, description, and price. • The ORDERS table keeps track of every order placed by a customer, identifying the salesperson who took the order, the product ordered, the quantity and amount of the order, and so on. For simplicity, each order is for only one product. • The OFFICES table stores data about each of the sales offices, including the city where the office is, the sales region it belongs to, and so on. • The CUSTOMERS table stores data about each customer, such as the company name, credit limit, and the salesperson who calls on the customer.

Tables The organizing principle in a relational database is the table, a rectangular row/column arrangement of data values. Each table in a database has a unique table name that identifies its contents. (Actually, each user can choose his or her own table names without worrying about the names chosen by other users, as explained in Chapter 5.) The row/column structure of a table is shown more clearly in Figure 4-7, which is an enlarged view of the OFFICES table. Each horizontal row of the OFFICES table represents a single physical entity—a single sales office. Together the five rows of the table represent all five of the company’s sales offices. All of the data in a particular row of the table applies to the office represented by that row. Each column of the OFFICES table represents one item of data that is stored in the database for each office. For example, the CITY column holds the location of each office. The SALES column contains each office’s year-to-date sales total. The MGR column shows the employee number of the person who manages the office. Each row of a table contains exactly one data value in each column. In the row representing the New York office, for example, the CITY column contains the value “New York”. The SALES column contains the value “$692,637.00 ”, which is the year-to-date sales total for the New York office. For each column of a table, all of the data values in that column hold the same type of data. For example, all of the CITY column values are text, all of the SALES values are money amounts, and all of the MGR values are integers (representing employee numbers). The set of data values that a column can contain is called the domain of the column. The domain of the CITY column is the set of all names of cities. The domain of the SALES column is any money amount. The domain of the REGION column is just two data values, “Eastern ” and “Western ”, because those are the only two sales regions the company has. Each column in a table has a column name, which is usually written as a heading at the top of the column. The columns of a table must all have different names, but there is no prohibition against two columns in two different tables having identical names. In fact, frequently used column names such as NAME, ADDRESS, QTY, PRICE, and SALES are often found in many different tables of a production database.

Chapter 4:

Relational Databases

53

PART I

FIGURE 4-7

The row/column structure of a relational table

The columns of a table have a left-to-right order, which is defined when the table is first created. A table always has at least one column. The ANSI/ISO SQL standard does not specify a maximum number of columns in a table, but almost all commercial SQL products do impose a limit, which is seldom less than 255 columns. Unlike the columns, the rows in a table do not have any particular order. In fact, if you use two consecutive database queries to display the contents of a table, there is no guarantee that the rows will be listed in the same order twice. Of course you can ask SQL to sort the rows before displaying them, but the sorted order has nothing to do with the actual arrangement of the rows within the table. A table can have any number of rows. A table of zero rows is perfectly legal and is called an empty table (for obvious reasons). An empty table still has a structure, imposed by its columns; it simply contains no data. The ANSI/ISO standard does not limit the number of rows in a table, and many SQL products will allow a table to grow until it exhausts the available storage. Other SQL products impose a limit, but it is always a very generous one— 2 billion rows or more is common.

Primary Keys Because the rows of a relational table are unordered, you cannot select a specific row by its position in the table. There is no “first row,” “last row,” or “13th row” of a table. How, then, can you specify a particular row, such as the row for the Denver sales office? In a well-designed relational database, every table has some column or combination of columns whose values uniquely identify each row in the table. This column (or columns) is called the primary key of the table. Look once again at the OFFICES table in Figure 4-7.

54

Part I:

An Overview of SQL

At first glance, either the OFFICE column or the CITY column could serve as a primary key for the table. But if the company expands and opens two sales offices in the same city, the CITY column could no longer serve as the primary key. In practice, “ID numbers” such as an office number (OFFICE in the OFFICES table), an employee number (EMPL_NUM in the SALESREPS table), and customer numbers (CUST_NUM in the CUSTOMERS table) are often chosen as primary keys. In the case of the ORDERS table, you have no choice—the only thing that uniquely identifies an order is its order number (ORDER_NUM). The PRODUCTS table, part of which is shown in Figure 4-8, is an example of a table where the primary key must be a combination of columns. The MFR_ID column identifies the manufacturer of each product in the table, and the PRODUCT_ID column specifies the manufacturer’s product number. The PRODUCT_ID column might appear to make a good primary key, but there’s nothing to prevent two different manufacturers from using the same number for their products. Therefore, a combination of the MFR_ID and PRODUCT_ID columns must be used as the primary key of the PRODUCTS table. Every product in the table is guaranteed to have a unique combination of data values in these two columns. The primary key has a different unique value for each row in a table, so no two rows of a table with a primary key are exact duplicates of one another. A table where every row is different from all other rows is called a relation in mathematical terms. The name “relational database” comes from this term, because relations (tables with distinct rows) are at the heart of a relational database. Although primary keys are an essential part of the relational data model, early relational database management systems (System/R, DB2, Oracle, and others) did not provide explicit support for primary keys. Database designers usually ensured that all of the tables in their databases had a primary key, but the DBMS itself did not provide a way to identify the primary key of a table. DB2 Version 2, introduced in April 1988, was the first of IBM’s commercial SQL products to support primary keys. The ANSI/ISO standard was subsequently expanded to include a definition of primary key support, and today, nearly all relational database management systems provide it.

FIGURE 4-8

A table with a composite primary key

Chapter 4:

Relational Databases

55

Relationships

FIGURE 4-9

A parent/child relationship in a relational database

PART I

One of the major differences between the relational model and earlier data models is that explicit pointers such as the parent/child relationships of a hierarchical database are banned from relational databases. Yet, obviously, these relationships exist in a relational database. For example, in the sample database, each salesperson is assigned to a particular sales office, so there is an obvious relationship between the rows of the OFFICES table and the rows of the SALESREPS table. Doesn’t the relational model “lose information” by banning these relationships from the database? As shown in Figure 4-9, the answer to the question is “no.” The figure shows a close-up of a few rows of the OFFICES and SALESREPS tables. Note that the REP_OFFICE column of the SALESREPS table contains the office number of the sales office where each salesperson works. The domain of this column (the set of legal values it may contain) is precisely the set of office numbers found in the OFFICE column of the OFFICES table. In fact, you can find the sales office where Mary Jones works by finding the value in Mary’s REP_OFFICE column (11) and finding the row of the OFFICES table that has a matching value in the OFFICE column (in the row for the New York office). Similarly, to find all the salespeople who work in New York, you could note the OFFICE value for the New York row (11) and then scan down the REP_OFFICE column of the SALESREPS table looking for matching values (in the rows for Mary Jones and Sam Clark). The parent/child relationship between a sales office and the people who work there isn’t lost by the relational model; it’s just not represented by an explicit pointer visible to the user. Instead, the relationship is represented by common data values stored in the two tables. All relationships in a relational database are represented this way. One of the main goals of the SQL is to let you retrieve related data from the database by manipulating these relationships in a simple, straightforward way.

56

Part I:

An Overview of SQL

Foreign Keys A column in one table whose value matches the primary key in some other table is called a foreign key. In Figure 4-9, the REP_OFFICE column is a foreign key for the OFFICES table. Although REP_OFFICE is a column in the SALESREPS table, the values that this column contains are office numbers. They match values in the OFFICE column, which is the primary key for the OFFICES table. Together, a primary key and a foreign key create a parent/child relationship between the tables that contain them, just like the parent/child relationships in a hierarchical database. Just as a combination of columns can serve as the primary key of a table, a foreign key can also be a combination of columns. In fact, the foreign key will always be a compound (multicolumn) key when it references a table with a compound primary key. Obviously, the number of columns and the data types of the columns in the foreign key and the primary key must be identical to one another. A table can contain more than one foreign key if it is related to more than one other table. Figure 4-10 shows the three foreign keys in the ORDERS table of the sample database: • The CUST column is a foreign key for the CUSTOMERS table, relating each order to the customer who placed it. • The REP column is a foreign key for the SALESREPS table, relating each order to the salesperson who took it. • The MFR and PRODUCT columns together are a composite foreign key for the PRODUCTS table, relating each order to the product being ordered. The multiple parent/child relationships created by the three foreign keys in the ORDERS table may seem familiar to you, and they should. They are precisely the same relationships as those in the network database of Figure 4-4. As the example shows, the relational data model has all of the power of the network model to express complex relationships.

FIGURE 4-10

Multiple parent/child relationships in a relational database

Chapter 4:

Relational Databases

Codd’s 12 Rules for Relational Databases* As the relational database model started to become very popular in the mid-1980s, every DBMS vendor scrambled to describe their product as “relational.” Some of these products had only a SQL-like query language layered on top of an underlying network or hierarchical database. Some of them implemented only a very rudimentary table structure and no query language at all. Soon the question of “What is a true relational database?” became a topic of debate, and DBMS vendors began claiming that their products were “more relational” than the competition. In 1985, Ted Codd, whose seminal technical article 15 years earlier had defined the relational data model, addressed this question in Computerworld, one of the leading trade publications. In his two-part article, entitled Is Your DBMS Really Relational? (October 14, 1985) and Does Your DBMS Run By the Rules? (October 21, 1985), Codd presented 12 rules that a database must obey if it is to be considered truly relational: 1. Information rule. All information in a relational database is represented explicitly at the logical level and in exactly one way—by values in tables. 2. Guaranteed access rule. Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name. 3. Systematic treatment of NULL values. NULL values (distinct from an empty character string or a string of blank characters and distinct from zero or any other number) are supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of the data type. 4. Dynamic online catalog based on the relational model. The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data. 5. Comprehensive data sublanguage rule. A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following items: • Data definition • View definition • Data manipulation (interactive and by program) • Integrity constraints • Authorization • Transaction boundaries (begin, commit, and rollback)

PART I

Foreign keys are a fundamental part of the relational model because they create relationships among tables in the database. As with primary keys, foreign key support was missing from early relational database management systems. They were added to DB2 Version 2, were subsequently added to the ANSI/ISO standard, and now appear in all of the major commercial products.

57

58

Part I:

An Overview of SQL

6. View updating rule. All views that are theoretically updateable are also updateable by the system. 7. High-level insert, update, and delete. The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update, and deletion of data. 8. Physical data independence. Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods. 9. Logical data independence. Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables. 10. Integrity independence. Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs. 11. Distribution independence. A relational DBMS has distribution independence. 12. Nonsubversion rule. If a relational system has a low-level (single record at a time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple records at a time). Although the controversy has long since died out, the 12 rules are interesting from a historical perspective, because they resolved the issue once and for all, and they do offer a good informal working definition. Rule 1 is basically the single-sentence fundamental definition presented earlier in this chapter; the others provide additional refinement and requirements. Rule 2 stresses the importance of primary keys for locating data in the database. The table name locates the correct table, the column name finds the correct column, and the primary key value finds the row containing an individual data item of interest. Rule 3 requires support for missing data through NULL values, which are described in Chapter 5. Rule 4 requires that a relational database be self-describing, through system tables whose columns describe the structure of the database itself. These tables are described in Chapter 16. Rule 5 mandates using a relational database language, such as SQL, although SQL is not specifically required. The language must be able to support all the central functions of a DBMS, not just database queries. Rule 6 deals with views, which are virtual tables used to give various users of a database different views of its structure. Views are described in Chapter 14. Rule 7 stresses the set-oriented nature of a relational database. It requires that rows be treated as sets in insert, delete, and update operations. It prohibits systems that support only row-at-a-time, navigational modification of the database. Rule 8 and Rule 9 insulate the user or application program from the low-level implementation of the database and even from changes in the structure of the tables. Rule 10 says that the database language should support the ability to define restrictions on the data that can be entered and the database modifications that can be made.

Chapter 4:

Relational Databases

Summary SQL is based on the relational data model that organizes the data in a database as a collection of tables: • Each table has a table name that uniquely identifies it. • Each table has one or more named columns, which are arranged in a specific, left-toright order. • Each table has zero or more rows, each containing a single data value in each column. The rows are unordered. • All data values in a given column have the same data type and are drawn from a set of legal values called the domain of the column. Tables are related to one another by the data they contain. The relational data model uses primary keys and foreign keys to represent these relationships among tables: • A primary key is a column or combination of columns in a table whose value(s) uniquely identify each row of the table. A table has only one primary key. • A foreign key is a column or combination of columns in a table whose value(s) are a primary key value for some other table. A table can contain more than one foreign key, linking it to one or more other tables. • A primary key/foreign key combination creates a parent/child relationship between the tables that contain them.

PART I

Rule 11 says that the database language must be able to manipulate distributed data located on other computer systems if the DBMS supports it. Finally, Rule 12 prevents “other paths” into the database that might subvert its relational structure and integrity.

59

This page intentionally left blank

II

PART

Retrieving Data

Q

ueries are the heart of SQL, and many people use SQL as a database query tool. The next five chapters describe SQL queries in depth. Chapter 5 describes the basic SQL structures that you use to form SQL statements. Chapter 6 discusses simple queries that draw data from a single table of data. Chapter 7 expands the discussion to multitable queries. Queries that summarize data are described in Chapter 8. Finally, Chapter 9 explains the SQL subquery capability that is used to handle complex queries.

CHAPTER 5 SQL Basics CHAPTER 6 Simple Queries CHAPTER 7 Multitable Queries (Joins) CHAPTER 8 Summary Queries CHAPTER 9 Subqueries and Query Expressions

This page intentionally left blank

5

CHAPTER

SQL Basics

T

his chapter begins a detailed description of the features of SQL. It describes the basic structure of a SQL statement and the basic elements of the language, such as keywords, data types, and expressions. How SQL handles missing data through NULL values is also described. Although these are basic features of SQL, they have some subtle differences in the way they are implemented by various popular SQL products, and in many cases, the SQL products provide significant extensions to the capabilities specified in the ANSI/ISO SQL standard. These differences and extensions are also described in this chapter.

Statements The main body of SQL consists of about 40 statements. The most important and frequently used statements are summarized in Table 5-1. (Note that not all SQL implementations support all these statements.) Each statement requests a specific action from the DBMS, such as creating a new table, retrieving data, or inserting new data into the database. All SQL statements have the same basic form, illustrated in Figure 5-1. Every SQL statement begins with a verb, a keyword that describes what the statement does. CREATE, INSERT, DELETE, and COMMIT are typical verbs. The statement continues with one or more clauses. A clause may specify the data to be acted on by the statement or provide more detail about what the statement is supposed to do. Every clause also begins with a keyword, such as WHERE, FROM, INTO, and HAVING. Some clauses are optional; others are required. The specific structure and content vary from one clause to another. Many clauses contain table or column names; some may contain additional keywords, constants, or expressions. The ANSI/ISO SQL standard specifies a set of reserved keywords and nonreserved keywords that are used within SQL statements. According to the standard, reserved keywords cannot be used as the exact name of database objects, such as tables, columns, and users. Many SQL implementations relax this restriction, but it’s generally a good idea

63

64

Part II:

Retrieving Data

Statement

Description

Data Manipulation SELECT

Retrieves data from the database

INSERT

Adds new rows of data to the database

UPDATE

Modifies existing database data

MERGE

Conditionally inserts/updates/deletes new and existing rows

DELETE

Removes rows of data from the database

Data Definition CREATE TABLE

Adds a new table to the database

DROP TABLE

Removes a table from the database

ALTER TABLE

Changes the structure of an existing table

CREATE VIEW

Adds a new view to the database

DROP VIEW

Removes a view from the database

CREATE INDEX

Builds an index for a column

DROP INDEX

Removes the index for a column

CREATE SCHEMA

Adds a new schema to the database

DROP SCHEMA

Removes a schema from the database

CREATE DOMAIN

Adds a new data value domain

ALTER DOMAIN

Changes a domain definition

DROP DOMAIN

Removes a domain from the database

Access Control GRANT

Grants user access privileges

REVOKE

Removes user access privileges

CREATE ROLE

Adds a new role to the database

GRANT ROLE

Grants role containing user access privileges

DROP ROLE

Removes a role from the database

Transaction Control COMMIT

Ends the current transaction

ROLLBACK

Aborts the current transaction

SET TRANSACTION

Defines data access characteristics of the current transaction

START TRANSACTION

Explicitly starts a new transaction

SAVEPOINT

Establishes a recovery point for a transaction

TABLE 5-1

Major SQL Statements

Chapter 5:

Statement

SQL Basics

65

Description

Programmatic SQL Defines a cursor for a query

EXPLAIN

Describes the data access plan for a query

OPEN

Opens a cursor to retrieve query results

FETCH

Retrieves a row of query results

CLOSE

Closes a cursor

PREPARE

Prepares a SQL statement for dynamic execution

EXECUTE

Executes a SQL statement dynamically

DESCRIBE

Describes a prepared query

TABLE 5-1

Major SQL Statements (continued)

to avoid the keywords when you name your tables and columns. Table 5-2 lists the reserved keywords included in the ANSI/ISO SQL:2006 standard. It’s also best to avoid the use of nonreserved keywords in naming database objects, because they are candidates for reserved keywords in future revisions of the standard. The nonreserved keywords in the SQL:2006 standard are listed in Table 5-3. Throughout this book, the acceptable forms of a SQL statement are illustrated by a syntax diagram, such as the one shown in Figure 5-2. A valid SQL statement or clause is constructed by “following the line” through the syntax diagram to the dot that marks the end of the diagram. Keywords in the syntax diagram and in the examples (such as DELETE and FROM in Figure 5-2) are always shown in UPPERCASE, but almost all SQL implementations accept both uppercase and lowercase keywords, and it’s often more convenient to actually type them in lowercase.

FIGURE 5-1

The structure of a SQL statement

PART II

DECLARE

66

Part II:

Retrieving Data

ABS

ALL

ALLOCATE

ALTER

AND

ANY

ARE

ARRAY

AS

ASENSITIVE

ASYMMETRIC

AT

ATOMIC

AUTHORIZATION

AVG

BEGIN

BETWEEN

BIGINT

BINARY

BLOB

BOOLEAN

BOTH

BY

CALL

CALLED

CARDINALITY

CASCADED

CASE

CAST

CEIL

CEILING

CHAR

CHAR_LENGTH

CHARACTER

CHARACTER_LENGTH

CHECK

CLOB

CLOSE

COALESCE

COLLATE

COLLECT

COLUMN

COMMIT

CONDITION

CONNECT

CONSTRAINT

CONVERT

CORR

CORRESPONDING

COUNT

COVAR_POP

COVAR_SAMP

CREATE

CROSS

CUBE

CUME_DIST

CURRENT

CURRENT_DATE

CURRENT_DEFAULT_ TRANSFORM_GROUP

CURRENT_PATH

CURRENT_ROLE

CURRENT_TIME

CURRENT_TIMESTAMP

CURRENT_TRANSFORM_ GROUP_FOR_TYPE

CURRENT_USER

CURSOR

CYCLE

DATE

DAY

DEALLOCATE

DEC

DECIMAL

DECLARE

DEFAULT

DELETE

DENSE_RANK

DEREF

DESCRIBE

DETERMINISTIC

DISCONNECT

DISTINCT

DOUBLE

DROP

DYNAMIC

EACH

ELEMENT

ELSE

END

END-EXEC

ESCAPE

EVERY

EXCEPT

EXEC

EXECUTE

EXISTS

EXP

EXTERNAL

EXTRACT

FALSE

FETCH

FILTER

FLOAT

FLOOR

FOR

FOREIGN

FREE

FROM

FULL

FUNCTION

FUSION

GET

GLOBAL

GRANT

GROUP

GROUPING

HAVING

HOLD

HOUR

IDENTITY

IN

INDICATOR

INNER

INOUT

INSENSITIVE

INSERT

INT

INTEGER

INTERSECT

INTERSECTION

INTERVAL

INTO

IS

JOIN

LANGUAGE

LARGE

LATERAL

LEADING

LEFT

LIKE

LN

TABLE 5-2

SQL:2006 Reserved Keywords

Chapter 5:

SQL Basics

LOCALTIME

LOCALTIMESTAMP

LOWER

MATCH

MAX

MEMBER

MERGE

METHOD

MIN

MINUTE

MOD

MODIFIES

MODULE

MONTH

MULTISET

NATIONAL

NATURAL

NCHAR

NCLOB

NEW

NO

NONE

NORMALIZE

NOT

NULL

NULLIF

NUMERIC

OCTET_LENGTH

OF

OLD

ON

ONLY

OPEN

OR

ORDER

OUT

OUTER

OVER

OVERLAPS

OVERLAY

PARAMETER

PARTITION

PERCENT_RANK

PERCENTILE_CONT

PERCENTILE_DISC

POSITION

POWER

PRECISION

PREPARE

PRIMARY

PROCEDURE

RANGE

RANK

READS

REAL

RECURSIVE

REF

REFERENCES

REFERENCING

REGR_AVGX

REGR_AVGY

REGR_COUNT

REGR_INTERCEPT

REGR_R2

REGR_SLOPE

REGR_SXX

REGR_SXY

REGR_SYY

RELEASE

RESULT

RETURN

RETURNS

REVOKE

RIGHT

ROLLBACK

ROLLUP

ROW

ROW_NUMBER

ROWS

SAVEPOINT

SCOPE

SCROLL

SEARCH

SECOND

SELECT

SENSITIVE

SESSION_USER

SET

SIMILAR

SMALLINT

SOME

SPECIFIC

SPECIFICTYPE

SQL

SQLEXCEPTION

SQLSTATE

SQLWARNING

SQRT

START

STATIC

STDDEV_POP

STDDEV_SAMP

SUBMULTISET

SUBSTRING

SUM

SYMMETRIC

SYSTEM

SYSTEM_USER

TABLE

TABLESAMPLE

THEN

TIME

TIMESTAMP

TIMEZONE_HOUR

TIMEZONE_MINUTE

TO

TRAILING

TRANSLATE

TRANSLATION

TREAT

TRIGGER

TRIM

TRUE

UESCAPE

UNION

UNIQUE

UNKNOWN

UNNEST

UPDATE

UPPER

USER

USING

VALUE

VALUES

VAR_POP

VAR_SAMP

VARCHAR

VARYING

WHEN

WHENEVER

WHERE

WIDTH_BUCKET

WINDOW

WITH

WITHIN

WITHOUT

YEAR

TABLE 5-2

SQL:2006 Reserved Keywords (continued)

PART II

LOCAL

67

68

Part II:

Retrieving Data

ABSOLUTE

ACTION

ADA

ADD

ADMIN

AFTER

ALWAYS

ASC

ASSERTION

ASSIGNMENT

ATTRIBUTE

ATTRIBUTES

BEFORE

BERNOULLI

BREADTH

CASCADE

CATALOG

CATALOG_NAME

CHAIN

CHARACTER_SET_CATALOG

CHARACTER_SET_NAME

CHARACTER_SET_SCHEMA

CHARACTERISTICS

CHARACTERS

CLASS_ORIGIN

COBOL

COLLATION

COLLATION_CATALOG

COLLATION_NAME

COLLATION_SCHEMA

COLUMN_NAME

COMMAND_FUNCTION

COMMAND_FUNCTION_CODE

COMMITTED

CONDITION_NUMBER

CONNECTION

CONNECTION_NAME

CONSTRAINT_CATALOG

CONSTRAINT_NAME

CONSTRAINT_SCHEMA

CONSTRAINTS

CONSTRUCTOR

CONTAINS

CONTINUE

CURSOR_NAME

DATA

DATETIME_INTERVAL_CODE

DATETIME_INTERVAL_PRECISION

DEFAULTS

DEFERRABLE

DEFERRED

DEFINED

DEFINER

DEGREE

DEPTH

DERIVED

DESC

DESCRIPTOR

DIAGNOSTICS

DISPATCH

DOMAIN

DYNAMIC_FUNCTION

DYNAMIC_FUNCTION_CODE

EQUALS

EXCEPTION

EXCLUDE

EXCLUDING

FINAL

FIRST

FOLLOWING

FORTRAN

FOUND

GENERAL

GENERATED

GO

GOTO

GRANTED

IMMEDIATE

IMPLEMENTATION

INCLUDING

INCREMENT

INITIALLY

INPUT

INSTANCE

INSTANTIABLE

INVOKER

ISOLATION

KEY

KEY_MEMBER

KEY_TYPE

LAST

LENGTH

LEVEL

LOCATOR

MAP

MATCHED

MAXVALUE

MESSAGE_LENGTH

MESSAGE_OCTET_LENGTH

MESSAGE_TEXT

MINVALUE

MORE

TABLE 5-3

SQL:2006 Nonreserved Keywords

Chapter 5:

SQL Basics

NAME

NAMES

NESTING

NEXT

NORMALIZED

NULLABLE

NULLS

NUMBER

OBJECT

OCTETS

OPTION

OPTIONS

ORDERING

ORDINALITY

OTHERS

OUTPUT

OVERRIDING

PAD

PARAMETER_MODE

PARAMETER_NAME

PARAMETER_ORDINAL_POSITION

PARAMETER_SPECIFIC_CATALOG

PARAMETER_SPECIFIC_NAME

PARAMETER_SPECIFIC_SCHEMA

PARTIAL

PASCAL

PATH

PLACING

PLI

PRECEDING

PRESERVE

PRIOR

PRIVILEGES

PUBLIC

READ

RELATIVE

REPEATABLE

RESTART

RESTRICT

RETURNED_CARDINALITY

RETURNED_LENGTH

RETURNED_OCTET_LENGTH

RETURNED_SQLSTATE

ROLE

ROUTINE

ROUTINE_CATALOG

ROUTINE_NAME

ROUTINE_SCHEMA

ROW_COUNT

SCALE

SCHEMA

SCHEMA_NAME

SCOPE_CATALOG

SCOPE_NAME

SCOPE_SCHEMA

SECTION

SECURITY

SELF

SEQUENCE

SERIALIZABLE

SERVER_NAME

SESSION

SETS

SIMPLE

SIZE

SOURCE

SPACE

SPECIFIC_NAME

STATE

STATEMENT

STRUCTURE

STYLE

SUBCLASS_ORIGIN

TABLE_NAME

TEMPORARY

TIES

TOP_LEVEL_COUNT

TRANSACTION

TRANSACTION_ACTIVE

TRANSACTIONS_COMMITTED

TRANSACTIONS_ROLLED_BACK

TRANSFORM

TRANSFORMS

TRIGGER_CATALOG

TRIGGER_NAME

TRIGGER_SCHEMA

TYPE

UNBOUNDED

UNCOMMITTED

UNDER

UNNAMED

USAGE

USER_DEFINED_TYPE_CATALOG

USER_DEFINED_TYPE_CODE

USER_DEFINED_TYPE_NAME

USER_DEFINED_TYPE_SCHEMA

VIEW

WORK

WRITE

ZONE

TABLE 5-3

SQL:2006 Nonreserved Keywords (continued)

PART II

MUMPS

69

70

Part II:

Retrieving Data

FIGURE 5-2

A sample syntax diagram

Variable items in a SQL statement (such as the table name and search condition in Figure 5-2) are shown in lowercase italics. It’s up to you to specify the appropriate item value(s) each time the statement is used. Optional clauses and keywords, such as the WHERE clause in Figure 5-2, are indicated by alternate paths through the syntax diagram. When a choice of optional keywords is offered, the default choice (that is, the behavior of the statement if no keyword is specified) is UNDERLINED.

Names The objects in a SQL-based database are identified by assigning them unique names. Names are used in SQL statements to identify the database object on which the statement should act. The most fundamental named objects in a relational database are table names (which identify tables), column names (which identify columns), and user names (which identify users of the database); the original SQL1 standard specified conventions for naming these objects. Subsequent versions of the SQL standard significantly expanded the list of named entities to include schemas (collections of tables), constraints (restrictions on the contents of tables and their relationships), domains (sets of legal values that may be assigned to a column), and several other types of objects. Many SQL implementations support additional named objects such as stored procedures, primary key/foreign key relationships, data entry forms, and data replication schemes. The original ANSI/ISO standard specified that SQL names must contain 1 to 18 characters, must begin with a letter, and may not contain any spaces or special punctuation characters. The SQL2 standard increased the maximum to 127 characters (the standard actually specifies “less than 128”), and this remains unchanged through SQL:2006. In practice, the names supported by SQL-based DBMS products vary significantly. It’s common to see tighter restrictions on names that are connected to other software outside of the database (such as user names, which may correspond to login names used by an operating system), and looser restrictions on names that are private (internal) to the database. The various products also differ in the special characters they permit in table names. For portability, it’s best to keep names relatively short and to avoid the use of special characters, except of course the underscore character (_), which is used to separate words in SQL names.

Table Names When you specify a table name in a SQL statement, SQL assumes that you are referring to one of your own tables (that is, a table that you created). Usually, you will want to choose table names that are short but descriptive.

Chapter 5:

SQL Basics

SAM.BIRTHDAYS

A qualified table name generally can be used in a SQL statement wherever a table name can appear. The ANSI/ISO SQL standard generalizes the notion of a qualified table name even further. It allows you to create a named collection of tables, called a schema. You can refer to a table in a specific schema by using a qualified table name. For example, the BIRTHDAYS table in the EMPLOYEE_INFO schema would be referenced as EMPLOYEE_INFO.BIRTHDAYS

Chapter 13 provides more information about schemas, users, and other aspects of SQL database structure. For now, keep in mind that users and schemas are not the same, and in fact, one user can be the owner of multiple schemas.

Column Names When you specify a column name in a SQL statement, SQL can normally determine from the context which column you intend. However, if the statement involves two columns with the same name from two different tables, you must use a qualified column name to unambiguously identify the column you intend. A qualified column name specifies both the name of the table containing the column and the name of the column, separated by a period (.). For example, the column named SALES in the SALESREPS table has the qualified column name SALESREPS.SALES

If the column comes from a table owned by another user, a qualified table name is used in the qualified column name. For example, the BIRTH_DATE column in the BIRTHDAYS table owned by the user SAM is specified by the fully qualified column name SAM.BIRTHDAYS.BIRTH_DATE

Qualified column names can generally be used in a SQL statement wherever a simple (unqualified) column name can appear; exceptions are noted in the descriptions of the individual SQL statements.

PART II

The table names in the sample database (ORDERS, CUSTOMERS, OFFICES, SALESREPS) are good examples. In a personal or departmental database, the choice of table names is usually up to the database developer or designer. A larger, shared-use corporate database, may have corporate standards for naming tables, to ensure that table names are consistent and do not conflict. In addition, most DBMS brands allow different users to create tables with the same name (that is, both Joe and Sam can create a table named BIRTHDAYS). The DBMS uses the appropriate table, depending on which user is requesting data. With the proper permission, you can also refer to tables owned by other users, by using a qualified table name. A qualified table name specifies both the name of the table’s owner and the name of the table, separated by a period (.). For example, Joe could access the BIRTHDAYS table owned by Sam by using the qualified table name:

71

72

Part II:

Retrieving Data

Data Types The ANSI/ISO SQL standard specifies the various types of data that can be stored in a SQL-based database and manipulated by SQL. The original SQL1 standard specified only a minimal set of data types. Subsequent versions of the standard expanded this list to include variable-length character strings, date and time data, bit strings, Extensible Markup Language (XML), and other types. Today, commercial DBMS products can process a rich variety of data, and there is considerable diversity in the particular data types supported across different DBMS brands. Typical data types include the following: • Integers Columns holding this type of data typically store counts, quantities, ages, and so on. Integer columns are also frequently used to contain ID numbers, such as customer, employee, and order numbers. • Decimal numbers Columns with this data type store numbers that have fractional parts and that must be calculated exactly, such as rates and percentages. They are also frequently used to store money amounts. • Floating point numbers Columns with this data type are used to store scientific numbers that can be calculated approximately, such as weights and distances. Floating point numbers can represent a larger range of values than decimal numbers, but can produce round-off errors in computations. • Fixed-length character strings Columns holding this type of data typically store character strings that are always the same length, such as postal codes, state/ province abbreviations, short descriptions, and so on. Whenever the string to be stored is smaller than the length defined for a fixed-length column, it is padded with spaces so it fits the exact storage length. • Variable-length character strings This data type allows a column to store character strings that vary in length from row to row, up to some maximum length. (The SQL1 standard permitted only fixed-length character strings, which are easier for the DBMS to process but can waste considerable space.) Columns holding this type of data typically store names of people and companies, addresses, descriptions, and so on. Unlike fixed-length character strings, variable-length strings are not padded with spaces—the exact number of characters provided is stored, along with the length of the data string. • Money amounts Some SQL products support a MONEY or CURRENCY type, which is usually stored as a decimal or floating point number. Having a distinct money type allows the DBMS to properly format money amounts when they are displayed. However, the SQL Standard does not specify such a data type. • Dates and times Support for date/time values is also common in SQL products, although the details can vary considerably from one product to another, largely because vendors implemented these data types before the SQL standard was developed. Various combinations of dates, times, timestamps, time intervals, and date/time arithmetic are generally supported. The SQL standard includes an elaborate specification for DATE, TIME, TIMESTAMP, and INTERVAL data types, including support for time zones and time precision (for example, tenths or hundredths of seconds).

Chapter 5:

SQL Basics

73

• Boolean data Some SQL products, such as Microsoft SQL Server, support logical (TRUE or FALSE) values as an explicit type, and some permit logical operations (comparison, AND/OR, and so on) on the stored data within SQL statements.

• Large binary objects The SQL:1999 standard also added the BLOB data type that supports storing unstructured, variable-length sequences of bytes. Columns containing this data are used to store compressed video images, executable code, and other types of unstructured data. Prior to the publication of the standard, vendors implemented their own proprietary solutions, such as SQL Server’s IMAGE and Oracle’s LONG RAW data types, which can store up to 2 gigabytes of data. • Non-Roman characters As databases grew to support global applications, DBMS vendors added support for fixed-length and variable-length strings of multibyte characters used to represent Kanji and other Asian and Arabic characters using proprietary types such as the GRAPHIC and VARGRAPHIC data types in SQL Server. The ANSI/ISO standard now specifies national character set versions of the various character data types (NCHAR, NVARCHAR, and NCLOB). While most modern databases support storing and retrieving such characters (often using the UNICODE convention for representing them), support for searching and sorting on these types varies widely. Table 5-4 lists the data types specified in the ANSI/ISO SQL standard. The differences between the data types offered in various SQL implementations form one of the practical barriers to the portability of SQL-based applications. These differences have come about as a result of innovation as relational databases have evolved to include a broader range of capabilities. This has been the typical pattern: • A DBMS vendor adds a new data type that provides useful new capabilities for a certain group of users. • Other DBMS vendors add the same or similar data types, adding their own innovations to differentiate their products from the others. • Over several years, the popularity of the data type grows, and it becomes a part of the “mainstream” set of data types supported by most SQL implementations. • The standards bodies become involved to try to standardize the new data type and eliminate arbitrary differences between the vendor implementations. The more well-entrenched the data type has become, the more difficult the set of compromises faced by the standards group. Usually, this results in an addition to the standard that does not exactly match any of the current implementations.

PART II

• Large character objects The SQL:1999 standard added the CLOB data type that supports storing large character strings, up to a specified amount with a typical maximum length in the multi-gigabyte range. This allows the database to store entire documents, product descriptions, technical papers, résumés, and similar unstructured text data. Several SQL-based databases support proprietary data types (added before the SQL:1999 standard) capable of storing long text strings (typically up to 32,000 or 65,000 characters, and in some cases even larger). The DBMS usually restricts the use of large character columns in interactive queries and searches.

74

Part II:

Retrieving Data

• DBMS vendors slowly add support for the new standardized data type as an option to their systems, but because they have a large installed base that is using the older (now “proprietary”) version of the data type, they must maintain support for this form of the data type as well. • Over a very long period (typically several major releases of the DBMS product), users migrate to the new, standardized form of the data type, and the DBMS vendor can begin the process of phasing out the proprietary version. Data Type

Abbreviation(s)

Description

CHARACTER(len)

CHAR

Fixed-length character strings

CHARACTER VARYING(len)

CHAR VARYING, VARCHAR

Variable-length character strings

CHARACTER LARGE OBJECT(len)

CLOB

Large fixed-length character strings

NATIONAL CHARACTER(len)

NATIONAL CHAR, NCHAR

Fixed-length national character strings

NATIONAL CHARACTER VARYING(len)

NATIONAL CHAR VARYING, NCHAR

Variable-length national character strings

NATIONAL CHARACTER LARGE OBJECT(len)

NCLOB

Large variable-length national character strings

BIT(len)

Fixed-length bit strings

BIT VARYING(len)

Variable-length bit strings

INTEGER

INT

Integers

SMALLINT

Small integers

NUMERIC(precision, scale)

Decimal numbers

DECIMAL(precision, scale)

DEC

Decimal numbers

FLOAT(precision)

Floating point numbers

REAL

Low-precision floating point numbers

DOUBLE PRECISION

High-precision floating point numbers

DATE

Calendar dates

TIME(precision)

Clock times

TIME WITH TIME ZONE (precision)

Clock times with time zones

TIMESTAMP(precision)

Dates and times

TIMESTAMP WITH TIME ZONE (precision)

Dates and times with time zones

INTERVAL

Time intervals

XML(type modifier [secondary type modifier])

Character data formatted as Extensible Markup Language (XML)

TABLE 5-4

ANSI/ISO SQL Data Types

Chapter 5:

SQL Basics

75

Date/time data provides an excellent example of this phenomenon and the data type variations it creates. DB2 offered early date/time support, with three different date/time data types: • DATE

Stores a date like June 30, 2008

• TIME

Stores a time of day like 12:30:00 P.M.

• TIMESTAMP A specific instant in history, with a precision down to the nanosecond

SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= '05/30/2007' + 15 DAYS;

SQL Server was introduced with a single date/time data type, called DATETIME, which closely resembles the DB2 TIMESTAMP data type. If HIRE_DATE contained DATETIME data, SQL Server could accept this version of the query (without the date arithmetic): SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= '06/14/2007';

Since no specific time on June 14, 2007, is specified in the query, SQL Server defaults to midnight on that date. The SQL Server query thus really means SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= '06/14/2007 12:00AM';

SQL Server also supports date arithmetic through a set of built-in functions. Thus, the DB2-style query can also be specified in this way: SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= DATEADD(DAY, 15, '05/30/2007')

which is considerably different from the DB2 syntax. Oracle has long supported date/time data with a single data type called DATE. (Note, however, that Oracle added support for the SQL Standard DATETIME and TIMESTAMP data types starting with Oracle 9i.) Like SQL Server’s DATETIME type, an Oracle DATE is, in fact, a timestamp. Also as with SQL Server, the time part of an Oracle DATE value defaults to midnight if no time is explicitly specified. The default Oracle date format is different from the DB2 and SQL Server formats, so the Oracle version of the query becomes SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= '14-JUN-07';

PART II

Specific dates and times can be specified as string constants, and date arithmetic is supported. Here is an example of a valid query using DB2 dates, assuming that the HIRE_ DATE column contains DATE data:

76

Part II:

Retrieving Data

Oracle also supports limited date arithmetic, so the DB2-style query can also be specified, but without the DAYS keyword: SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE >= '30-MAY-07' + 15;

Note, however, that this statement requires the DBMS to implicitly convert the string to an appropriate date data type before adding 15 to it, and that not all SQL implementations support such conversion. Oracle, for example, will report an error unless a function such as TO_DATE or CAST converts the character string to an Oracle DATE or DATETIME type before attempting date arithmetic. Fortunately, with the advent of the year 2000 conversion, most DBMS vendors added universal support for dates in SQL statements with four-digit years in a standard YYYYMM-DD format, which we use for most of the examples in this book. In Oracle’s case, the default format is still as shown in the preceding examples, but it can be changed at either the database or user session with a simple command. If you are using Oracle and you try any of the examples in this book, simply enter this command to change your default date format: ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';

Care must be taken when forming queries that search for exact date matches using the equal (=) operator, and the dates have time components stored in them. Consider the following example: SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE = '06/14/2007';

If a salesperson’s hire date were stored in the database as noon on June 14, 2007, the salesperson would not be included in the query results from Oracle or SQL Server databases. The DBMS would assume a time of midnight for the string supplied with the SQL statement, and since midnight is not equal to noon, the row would not be selected. On the other hand, for a DB2 database, where the time is not stored with a DATE data type, the row would appear in the query results. Finally, starting with SQL2, the ANSI/ISO standard added support for date/time data with a set of data types based on, but not identical to, the DB2 types. In addition to the DATE, TIME, and TIMESTAMP data types, the standard specifies an INTERVAL data type, which can be used to store a time interval (for example, a timespan measured in days, or a duration measured in hours, minutes, and seconds). The standard also provides a very elaborate and complex method for dealing with date/time arithmetic, specifying the precision of intervals, adjusting for time zone differences, and so on. Most SQL implementations now have support for these standard types. One notable exception, however, is that SQL Server has long used the TIMESTAMP data type for an entirely different purpose, so supporting the ANSI/ISO specification for it presents a very real challenge. As these examples illustrate, the subtle differences in data types among various SQL products lead to some significant differences in SQL statement syntax.

Chapter 5:

SQL Basics

77

They can even cause the same SQL query to produce slightly different results on different database management systems. The widely praised portability of SQL is thus true but only at a general level. An application can be moved from one SQL database to another, and it can be highly portable if it uses only the most mainstream, basic SQL capabilities. However, the subtle variations in SQL implementations mean that data types and SQL statements must almost always be adjusted somewhat if they are to be moved across DBMS brands. The more complex the application, the more likely it is to become dependent on DBMS-specific features and nuances, and the less portable it will become.

In some SQL statements, a numeric, character, or date data value must be expressed in text form. For example, in this INSERT statement which adds a salesperson to the database: INSERT INTO SALESREPS (EMPL_NUM, NAME, QUOTA, HIRE_DATE, SALES) VALUES (115, 'Dennis Irving', 175000.00, '2008-06-21', 0.00);

the value for each column in the newly inserted row is specified in the VALUES clause. Constant data values are also used in expressions, such as in this SELECT statement: SELECT CITY FROM OFFICES WHERE TARGET > (1.1 * SALES) + 10000.00;

The ANSI/ISO SQL standard specifies the format of numeric and string constants, or literals, which represent specific data values. These conventions are followed by most SQL implementations.

Numeric Constants Integer and decimal constants (also called exact numeric literals) are written as ordinary decimal numbers in SQL statements, with an optional leading plus or minus sign: 21

-375

2000.00

+497500.8778

You must not put a comma between the digits of a numeric constant, and not all SQL dialects allow the leading plus sign, so it’s best to avoid it. For money data, most SQL implementations simply use integer or decimal constants, although some allow the constant to be specified with a currency symbol: $0.75

$5000.00

$-567.89

Floating point constants (also called approximate numeric literals) are specified using the E notation commonly found in programming languages such as C and FORTRAN. Here are some valid SQL floating point constants: 1.5E3

-3.14159E1

2.5E-7

0.783926E21

The E is read “times ten to the power of,” so the first constant becomes “1.5 times ten to the third power,” or 1500.

PART II

Constants

78

Part II:

Retrieving Data

String Constants The ANSI/ISO standard specifies that SQL constants for character data be enclosed in single quotes ('. . . '), as in these examples: 'Jones, John J.'

'New York'

'Western'

If a single quote is to be included in the constant text, it is written within the constant as two consecutive single-quote characters. Thus, this constant value: 'I can''t'

becomes the seven-character string "I can't". Some SQL implementations, such as SQL Server, accept string constants enclosed in double quotes (". . ."): "Jones, John J."

"New York"

"Western"

Unfortunately, the double quotes can pose portability problems with other SQL products. The SQL standard provides the additional capability to specify string constants from a specific national character set (for example, French or German) or from a userdefined character set. The user-defined character set capabilities have typically not been implemented in mainstream SQL products.

Date and Time Constants In SQL products that support date/time data, constant values for dates, times, and time intervals are specified as string constants. The format of these constants varies from one DBMS to the next. Even more variation is introduced by the differences in the way dates and times are written in different countries. IBM’s DB2 supports several different international formats for date, time, and timestamp constants, as shown in Table 5-5. The choice of format is made when the DBMS is installed. DB2 also supports durations specified as special constants, as in this example: HIRE_DATE + 30 DAYS

Note that a duration can’t be stored in the database, however, because DB2 doesn’t have an explicit DURATION data type. SQL Server also supports date/time data and accepts a variety of different formats for date and time constants. The DBMS automatically accepts all of the alternate formats, and you can intermix them if you like. Here are some examples of legal SQL Server date constants: March 15, 2008

Mar 15 2008

3/15/2008

3-15-08

2008 MAR 15

Format Name

Date Format

Date Example

Time Format

Time Example

American

mm/dd/yyyy

5/19/2008

hh:mm am/pm

2:18 PM

European

dd.mm.yyyy

19.5.2008

hh.mm.ss

14.18.08

Japanese

yyyy-mm-dd

2008-5-19

hh:mm:ss

14:18:08

ISO

yyyy-mm-dd

2008-5-19

hh.mm.ss

14.18.08

TABLE 5-5

DB2 SQL Date and Time Formats

Chapter 5:

SQL Basics

79

and here are some legal time constants: 15:30:25

3:30:25 PM

3:30:25 pm

3 PM

Oracle dates and times are also written as string constants, using this format: 15-MAR-90

You can also use Oracle’s built-in TO_DATE() function to convert date constants written in other formats, as in this example:

The SQL2 standard specifies a format for date and time constants, based on the ISO format in Table 5-5, except that time constants are written with colons instead of periods separating the hours, minutes, and seconds. The SQL Standard TIMESTAMP type, not shown in the table, has a format of yyyy-mm-dd-hh.mm.ss.nnnnnn—for example “1960-0519-14.18.08.048632” represents 5/19/60 at roughly 2:18 p.m.

Symbolic Constants In addition to user-supplied constants, the SQL includes special symbolic constants that return data values maintained by the DBMS itself. For example, in some DBMS brands, the symbolic constant CURRENT_DATE yields the value of the current date and can be used in queries such as the following, which lists the salespeople whose hire date is still in the future: SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE > CURRENT_DATE;

The SQL1 standard specified only a single symbolic constant (the USER constant described in Chapter 15), but most SQL products provide many more. Generally, a symbolic constant can appear in a SQL statement anywhere that an ordinary constant of the same data type could appear. The SQL2 standard adopted the most useful symbolic constants from current SQL implementations and provides for CURRENT_DATE, CURRENT_TIME, and CURRENT_TIMESTAMP (note the underscores) as well as USER, SESSION_USER, and SYSTEM_USER. Some SQL products, including SQL Server, provide access to system values through built-in functions rather than symbolic constants. The SQL Server version of the preceding query is SELECT NAME, HIRE_DATE FROM SALESREPS WHERE HIRE_DATE > GETDATE();

Built-in functions are described later in this chapter, in the section “Built-In Functions.”

PART II

SELECT NAME, AGE FROM SALESREPS WHERE HIRE_DATE = TO_DATE('JUN 14 2007', 'MON DD YYYY');

80

Part II:

Retrieving Data

Expressions Expressions are used in the SQL to calculate values that are retrieved from a database and to calculate values used in searching the database. For example, this query calculates the sales of each office as a percentage of its target: SELECT CITY, TARGET, SALES, (SALES/TARGET) * 100 FROM OFFICES;

and this query lists the offices whose sales are more than $50,000 over target: SELECT CITY FROM OFFICES WHERE SALES > TARGET + 50000.00;