PostgreSQL 8.0.0 Documentation
by The PostgreSQL Global Development Group
Copyright © 1996-2005 by The PostgreSQL Global Development Group

Legal Notice

PostgreSQL is Copyright © 1996-2005 by the PostgreSQL Global Development Group and is distributed under the terms of the license of the
University of California below.
Postgres95 is Copyright © 1994-5 by the Regents of the University of California.
Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN “AS-IS” BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Table of Contents
Preface .........................................................................................................................................................i
1. What is PostgreSQL? ......................................................................................................................i
2. A Brief History of PostgreSQL..................................................................................................... ii
2.1. The Berkeley POSTGRES Project ................................................................................... ii
2.2. Postgres95........................................................................................................................ iii
2.3. PostgreSQL...................................................................................................................... iii
3. Conventions.................................................................................................................................. iii
4. Further Information.......................................................................................................................iv
5. Bug Reporting Guidelines.............................................................................................................iv
5.1. Identifying Bugs ................................................................................................................v
5.2. What to report....................................................................................................................v
5.3. Where to report bugs ...................................................................................................... vii
I. Tutorial....................................................................................................................................................1
1. Getting Started ...............................................................................................................................1
1.1. Installation .........................................................................................................................1
1.2. Architectural Fundamentals...............................................................................................1
1.3. Creating a Database...........................................................................................................2
1.4. Accessing a Database ........................................................................................................3
2. The SQL Language ........................................................................................................................6
2.1. Introduction .......................................................................................................................6
2.2. Concepts ............................................................................................................................6
2.3. Creating a New Table ........................................................................................................6
2.4. Populating a Table With Rows ..........................................................................................7
2.5. Querying a Table ...............................................................................................................8
2.6. Joins Between Tables.......................................................................................................10
2.7. Aggregate Functions........................................................................................................12
2.8. Updates ............................................................................................................................14
2.9. Deletions..........................................................................................................................14
3. Advanced Features .......................................................................................................................16
3.1. Introduction .....................................................................................................................16
3.2. Views ...............................................................................................................................16
3.3. Foreign Keys....................................................................................................................16
3.4. Transactions.....................................................................................................................17
3.5. Inheritance .......................................................................................................................19
3.6. Conclusion.......................................................................................................................21
II. The SQL Language.............................................................................................................................22
4. SQL Syntax ..................................................................................................................................24
4.1. Lexical Structure..............................................................................................................24
4.1.1. Identifiers and Key Words...................................................................................24
4.1.2. Constants.............................................................................................................25
4.1.2.1. String Constants .....................................................................................25
4.1.2.2. Dollar-Quoted String Constants .............................................................26
4.1.2.3. Bit-String Constants ...............................................................................27
4.1.2.4. Numeric Constants .................................................................................27
4.1.2.5. Constants of Other Types .......................................................................28
4.1.3. Operators.............................................................................................................28
4.1.4. Special Characters...............................................................................................29
4.1.5. Comments ...........................................................................................................30
4.1.6. Lexical Precedence .............................................................................................30
4.2. Value Expressions............................................................................................................31
4.2.1. Column References.............................................................................................32
4.2.2. Positional Parameters..........................................................................................32
4.2.3. Subscripts............................................................................................................33
4.2.4. Field Selection ....................................................................................................33
4.2.5. Operator Invocations...........................................................................................33
4.2.6. Function Calls .....................................................................................................34
4.2.7. Aggregate Expressions........................................................................................34
4.2.8. Type Casts ...........................................................................................................35
4.2.9. Scalar Subqueries................................................................................................36
4.2.10. Array Constructors............................................................................................36
4.2.11. Row Constructors..............................................................................................37
4.2.12. Expression Evaluation Rules ............................................................................38
5. Data Definition .............................................................................................................................40
5.1. Table Basics.....................................................................................................................40
5.2. Default Values .................................................................................................................41
5.3. Constraints.......................................................................................................................42
5.3.1. Check Constraints ...............................................................................................42
5.3.2. Not-Null Constraints...........................................................................................44
5.3.3. Unique Constraints..............................................................................................45
5.3.4. Primary Keys.......................................................................................................46
5.3.5. Foreign Keys .......................................................................................................47
5.4. System Columns..............................................................................................................49
5.5. Inheritance .......................................................................................................................50
5.6. Modifying Tables.............................................................................................................53
5.6.1. Adding a Column................................................................................................53
5.6.2. Removing a Column ...........................................................................................54
5.6.3. Adding a Constraint ............................................................................................54
5.6.4. Removing a Constraint .......................................................................................54
5.6.5. Changing a Column’s Default Value...................................................................55
5.6.6. Changing a Column’s Data Type ........................................................................55
5.6.7. Renaming a Column ...........................................................................................55
5.6.8. Renaming a Table ...............................................................................................55
5.7. Privileges .........................................................................................................................56
5.8. Schemas...........................................................................................................................56
5.8.1. Creating a Schema ..............................................................................................57
5.8.2. The Public Schema .............................................................................................58
5.8.3. The Schema Search Path.....................................................................................58
5.8.4. Schemas and Privileges.......................................................................................59
5.8.5. The System Catalog Schema ..............................................................................60
5.8.6. Usage Patterns.....................................................................................................60
5.8.7. Portability............................................................................................................61
5.9. Other Database Objects ...................................................................................................61
5.10. Dependency Tracking....................................................................................................61
6. Data Manipulation........................................................................................................................63
6.1. Inserting Data ..................................................................................................................63
6.2. Updating Data..................................................................................................................64
6.3. Deleting Data...................................................................................................................65
7. Queries .........................................................................................................................................66
7.1. Overview .........................................................................................................................66
7.2. Table Expressions ............................................................................................................66
7.2.1. The FROM Clause.................................................................................................67
7.2.1.1. Joined Tables ..........................................................................................67
7.2.1.2. Table and Column Aliases......................................................................70
7.2.1.3. Subqueries ..............................................................................................71
7.2.1.4. Table Functions ......................................................................................72
7.2.2. The WHERE Clause...............................................................................................73
7.2.3. The GROUP BY and HAVING Clauses..................................................................74
7.3. Select Lists.......................................................................................................................76
7.3.1. Select-List Items .................................................................................................76
7.3.2. Column Labels ....................................................................................................77
7.3.3. DISTINCT ...........................................................................................................77
7.4. Combining Queries..........................................................................................................77
7.5. Sorting Rows ...................................................................................................................78
7.6. LIMIT and OFFSET..........................................................................................................79
8. Data Types....................................................................................................................................81
8.1. Numeric Types.................................................................................................................82
8.1.1. Integer Types.......................................................................................................83
8.1.2. Arbitrary Precision Numbers ..............................................................................83
8.1.3. Floating-Point Types ...........................................................................................84
8.1.4. Serial Types.........................................................................................................85
8.2. Monetary Types ...............................................................................................................86
8.3. Character Types ...............................................................................................................86
8.4. Binary Data Types ...........................................................................................................88
8.5. Date/Time Types..............................................................................................................90
8.5.1. Date/Time Input ..................................................................................................91
8.5.1.1. Dates.......................................................................................................92
8.5.1.2. Times ......................................................................................................92
8.5.1.3. Time Stamps...........................................................................................93
8.5.1.4. Intervals ..................................................................................................94
8.5.1.5. Special Values ........................................................................................94
8.5.2. Date/Time Output ...............................................................................................95
8.5.3. Time Zones .........................................................................................................96
8.5.4. Internals...............................................................................................................97
8.6. Boolean Type...................................................................................................................97
8.7. Geometric Types..............................................................................................................98
8.7.1. Points ..................................................................................................................98
8.7.2. Line Segments.....................................................................................................99
8.7.3. Boxes...................................................................................................................99
8.7.4. Paths....................................................................................................................99
8.7.5. Polygons..............................................................................................................99
8.7.6. Circles ...............................................................................................................100
8.8. Network Address Types.................................................................................................100
8.8.1. inet ..................................................................................................................100
8.8.2. cidr ..................................................................................................................101
8.8.3. inet vs. cidr ...................................................................................................101
8.8.4. macaddr ...........................................................................................................102
8.9. Bit String Types .............................................................................................................102
8.10. Arrays ..........................................................................................................................103
8.10.1. Declaration of Array Types.............................................................................103
8.10.2. Array Value Input............................................................................................104
8.10.3. Accessing Arrays ............................................................................................105
8.10.4. Modifying Arrays............................................................................................107
8.10.5. Searching in Arrays.........................................................................................109
8.10.6. Array Input and Output Syntax.......................................................................110
8.11. Composite Types .........................................................................................................111
8.11.1. Declaration of Composite Types.....................................................................111
8.11.2. Composite Value Input....................................................................................112
8.11.3. Accessing Composite Types ...........................................................................113
8.11.4. Modifying Composite Types...........................................................................114
8.11.5. Composite Type Input and Output Syntax......................................................114
8.12. Object Identifier Types ................................................................................................115
8.13. Pseudo-Types...............................................................................................................117
9. Functions and Operators ............................................................................................................119
9.1. Logical Operators ..........................................................................................................119
9.2. Comparison Operators...................................................................................................119
9.3. Mathematical Functions and Operators.........................................................................121
9.4. String Functions and Operators .....................................................................................124
9.5. Binary String Functions and Operators .........................................................................132
9.6. Bit String Functions and Operators ...............................................................................134
9.7. Pattern Matching ...........................................................................................................135
9.7.1. LIKE ..................................................................................................................135
9.7.2. SIMILAR TO Regular Expressions ...................................................................136
9.7.3. POSIX Regular Expressions .............................................................................137
9.7.3.1. Regular Expression Details ..................................................................138
9.7.3.2. Bracket Expressions .............................................................................140
9.7.3.3. Regular Expression Escapes.................................................................141
9.7.3.4. Regular Expression Metasyntax...........................................................144
9.7.3.5. Regular Expression Matching Rules ....................................................145
9.7.3.6. Limits and Compatibility .....................................................................146
9.7.3.7. Basic Regular Expressions ...................................................................147
9.8. Data Type Formatting Functions ...................................................................................147
9.9. Date/Time Functions and Operators..............................................................................153
9.9.1. EXTRACT, date_part ......................................................................................155
9.9.2. date_trunc .....................................................................................................159
9.9.3. AT TIME ZONE.................................................................................................159
9.9.4. Current Date/Time ............................................................................................160
9.10. Geometric Functions and Operators............................................................................162
9.11. Network Address Functions and Operators.................................................................165
9.12. Sequence Manipulation Functions ..............................................................................167
9.13. Conditional Expressions..............................................................................................169
9.13.1. CASE ................................................................................................................169
9.13.2. COALESCE .......................................................................................................170
9.13.3. NULLIF............................................................................................................171
9.14. Array Functions and Operators ...................................................................................171
9.15. Aggregate Functions....................................................................................................172
9.16. Subquery Expressions .................................................................................................175
9.16.1. EXISTS............................................................................................................175
9.16.2. IN ....................................................................................................................175
9.16.3. NOT IN............................................................................................................176
9.16.4. ANY/SOME ........................................................................................................176
9.16.5. ALL ..................................................................................................................177
9.16.6. Row-wise Comparison....................................................................................178
9.17. Row and Array Comparisons ......................................................................................178
9.17.1. IN ....................................................................................................................178
9.17.2. NOT IN............................................................................................................179
9.17.3. ANY/SOME (array) ............................................................................................179
9.17.4. ALL (array) ......................................................................................................179
9.17.5. Row-wise Comparison....................................................................................180
9.18. Set Returning Functions ..............................................................................................180
9.19. System Information Functions ....................................................................................181
9.20. System Administration Functions ...............................................................................186
10. Type Conversion.......................................................................................................................189
10.1. Overview .....................................................................................................................189
10.2. Operators .....................................................................................................................190
10.3. Functions .....................................................................................................................193
10.4. Value Storage...............................................................................................................196
10.5. UNION, CASE, and ARRAY Constructs ..........................................................................196
11. Indexes .....................................................................................................................................199
11.1. Introduction .................................................................................................................199
11.2. Index Types..................................................................................................................200
11.3. Multicolumn Indexes...................................................................................................201
11.4. Unique Indexes ............................................................................................................202
11.5. Indexes on Expressions ...............................................................................................202
11.6. Operator Classes..........................................................................................................203
11.7. Partial Indexes .............................................................................................................204
11.8. Examining Index Usage...............................................................................................206
12. Concurrency Control................................................................................................................208
12.1. Introduction .................................................................................................................208
12.2. Transaction Isolation ...................................................................................................208
12.2.1. Read Committed Isolation Level ....................................................................209
12.2.2. Serializable Isolation Level.............................................................................210
12.2.2.1. Serializable Isolation versus True Serializability ...............................211
12.3. Explicit Locking ..........................................................................................................212
12.3.1. Table-Level Locks...........................................................................................212
12.3.2. Row-Level Locks ............................................................................................213
12.3.3. Deadlocks........................................................................................................214
12.4. Data Consistency Checks at the Application Level.....................................................215

vii
12.5. Locking and Indexes....................................................................................................215
13. Performance Tips .....................................................................................................................217
13.1. Using EXPLAIN ...........................................................................................................217
13.2. Statistics Used by the Planner .....................................................................................220
13.3. Controlling the Planner with Explicit JOIN Clauses...................................................222
13.4. Populating a Database .................................................................................................223
13.4.1. Disable Autocommit .......................................................................................224
13.4.2. Use COPY .........................................................................................................224
13.4.3. Remove Indexes ..............................................................................................224
13.4.4. Increase maintenance_work_mem ...............................................................224
13.4.5. Increase checkpoint_segments .................................................................224
13.4.6. Run ANALYZE Afterwards...............................................................................225
III. Server Administration ....................................................................................................................226
14. Installation Instructions............................................................................................................228
14.1. Short Version ...............................................................................................................228
14.2. Requirements...............................................................................................................228
14.3. Getting The Source......................................................................................................230
14.4. If You Are Upgrading..................................................................................................230
14.5. Installation Procedure..................................................................................................232
14.6. Post-Installation Setup.................................................................................................237
14.6.1. Shared Libraries ..............................................................................................237
14.6.2. Environment Variables....................................................................................238
14.7. Supported Platforms ....................................................................................................239
15. Client-Only Installation on Windows.......................................................................................245
16. Server Run-time Environment .................................................................................................246
16.1. The PostgreSQL User Account ...................................................................................246
16.2. Creating a Database Cluster ........................................................................................246
16.3. Starting the Database Server........................................................................................247
16.3.1. Server Start-up Failures ..................................................................................248
16.3.2. Client Connection Problems ...........................................................................249
16.4. Run-time Configuration...............................................................................................250
16.4.1. File Locations..................................................................................................251
16.4.2. Connections and Authentication .....................................................................252
16.4.2.1. Connection Settings............................................................................252
16.4.2.2. Security and Authentication ...............................................................253
16.4.3. Resource Consumption ...................................................................................254
16.4.3.1. Memory ..............................................................................................254
16.4.3.2. Free Space Map ..................................................................................255
16.4.3.3. Kernel Resource Usage ......................................................................255
16.4.3.4. Cost-Based Vacuum Delay.................................................................256
16.4.3.5. Background Writer .............................................................................257
16.4.4. Write Ahead Log.............................................................................................258
16.4.4.1. Settings ...............................................................................................258
16.4.4.2. Checkpoints........................................................................................259
16.4.4.3. Archiving............................................................................................259
16.4.5. Query Planning ...............................................................................................260
16.4.5.1. Planner Method Configuration ...........................................................260
16.4.5.2. Planner Cost Constants.......................................................................261
16.4.5.3. Genetic Query Optimizer ...................................................................261
16.4.5.4. Other Planner Options ........................................................................262
16.4.6. Error Reporting and Logging..........................................................................263
16.4.6.1. Where to log .......................................................................................263
16.4.6.2. When To Log......................................................................................264
16.4.6.3. What To Log.......................................................................................266
16.4.7. Runtime Statistics ...........................................................................................268
16.4.7.1. Statistics Monitoring ..........................................................................268
16.4.7.2. Query and Index Statistics Collector..................................................268
16.4.8. Client Connection Defaults.............................................................................269
16.4.8.1. Statement Behavior ............................................................................269
16.4.8.2. Locale and Formatting........................................................................270
16.4.8.3. Other Defaults ....................................................................................271
16.4.9. Lock Management ..........................................................................................272
16.4.10. Version and Platform Compatibility .............................................................273
16.4.10.1. Previous PostgreSQL Versions.........................................................273
16.4.10.2. Platform and Client Compatibility ...................................................273
16.4.11. Preset Options ...............................................................................................274
16.4.12. Customized Options......................................................................................275
16.4.13. Developer Options ........................................................................................276
16.4.14. Short Options ................................................................................................277
16.5. Managing Kernel Resources........................................................................................277
16.5.1. Shared Memory and Semaphores ...................................................................277
16.5.2. Resource Limits ..............................................................................................282
16.5.3. Linux Memory Overcommit ...........................................................................283
16.6. Shutting Down the Server............................................................................................284
16.7. Secure TCP/IP Connections with SSL ........................................................................284
16.8. Secure TCP/IP Connections with SSH Tunnels ..........................................................285
17. Database Users and Privileges .................................................................................................287
17.1. Database Users ............................................................................................................287
17.2. User Attributes.............................................................................................................288
17.3. Groups .........................................................................................................................288
17.4. Privileges .....................................................................................................................289
17.5. Functions and Triggers ................................................................................................289
18. Managing Databases ................................................................................................................291
18.1. Overview .....................................................................................................................291
18.2. Creating a Database.....................................................................................................291
18.3. Template Databases .....................................................................................................292
18.4. Database Configuration ...............................................................................................293
18.5. Destroying a Database .................................................................................................294
18.6. Tablespaces..................................................................................................................294
19. Client Authentication ...............................................................................................................297
19.1. The pg_hba.conf file ................................................................................................297
19.2. Authentication methods...............................................................................................302
19.2.1. Trust authentication.........................................................................................302
19.2.2. Password authentication..................................................................................302
19.2.3. Kerberos authentication ..................................................................................302
19.2.4. Ident-based authentication ..............................................................................303
19.2.4.1. Ident Authentication over TCP/IP ......................................................303
19.2.4.2. Ident Authentication over Local Sockets ...........................................304
19.2.4.3. Ident Maps..........................................................................................304
19.2.5. PAM authentication.........................................................................................305
19.3. Authentication problems .............................................................................................305
20. Localization..............................................................................................................................307
20.1. Locale Support.............................................................................................................307
20.1.1. Overview.........................................................................................................307
20.1.2. Behavior ..........................................................................................................308
20.1.3. Problems .........................................................................................................309
20.2. Character Set Support..................................................................................................309
20.2.1. Supported Character Sets................................................................................309
20.2.2. Setting the Character Set.................................................................................310
20.2.3. Automatic Character Set Conversion Between Server and Client..................311
20.2.4. Further Reading ..............................................................................................314
21. Routine Database Maintenance Tasks......................................................................................315
21.1. Routine Vacuuming .....................................................................................................315
21.1.1. Recovering disk space.....................................................................................315
21.1.2. Updating planner statistics..............................................................................316
21.1.3. Preventing transaction ID wraparound failures ..............................................317
21.2. Routine Reindexing .....................................................................................................319
21.3. Log File Maintenance..................................................................................................319
22. Backup and Restore .................................................................................................................321
22.1. SQL Dump...................................................................................................................321
22.1.1. Restoring the dump .........................................................................................321
22.1.2. Using pg_dumpall...........................................................................................322
22.1.3. Handling large databases ................................................................................322
22.1.4. Caveats ............................................................................................................323
22.2. File system level backup..............................................................................................323
22.3. On-line backup and point-in-time recovery (PITR) ....................................................324
22.3.1. Setting up WAL archiving...............................................................................325
22.3.2. Making a Base Backup ...................................................................................327
22.3.3. Recovering with an On-line Backup...............................................................328
22.3.3.1. Recovery Settings...............................................................................330
22.3.4. Timelines.........................................................................................................331
22.3.5. Caveats ............................................................................................................332
22.4. Migration Between Releases .......................................................................................332
23. Monitoring Database Activity..................................................................................................334
23.1. Standard Unix Tools ....................................................................................................334
23.2. The Statistics Collector................................................................................................334
23.2.1. Statistics Collection Configuration .................................................................335
23.2.2. Viewing Collected Statistics ...........................................................................335
23.3. Viewing Locks.............................................................................................................339
24. Monitoring Disk Usage ............................................................................................................341
24.1. Determining Disk Usage .............................................................................................341
24.2. Disk Full Failure..........................................................................................................342
25. Write-Ahead Logging (WAL) ..................................................................................................343
25.1. Benefits of WAL ..........................................................................................................343
25.2. WAL Configuration .....................................................................................................343
25.3. Internals .......................................................................................................................345
26. Regression Tests.......................................................................................................................347
26.1. Running the Tests ........................................................................................................347
26.2. Test Evaluation ............................................................................................................348
26.2.1. Error message differences...............................................................................348
26.2.2. Locale differences ...........................................................................................349
26.2.3. Date and time differences ...............................................................................349
26.2.4. Floating-point differences ...............................................................................349
26.2.5. Row ordering differences................................................................................350
26.2.6. The “random” test ...........................................................................................350
26.3. Platform-specific comparison files ..............................................................................350
IV. Client Interfaces ..............................................................................................................................352
27. libpq - C Library ......................................................................................................................354
27.1. Database Connection Control Functions .....................................................................354
27.2. Connection Status Functions .......................................................................................360
27.3. Command Execution Functions ..................................................................................364
27.3.1. Main Functions ...............................................................................................364
27.3.2. Retrieving Query Result Information .............................................................370
27.3.3. Retrieving Result Information for Other Commands .....................................373
27.3.4. Escaping Strings for Inclusion in SQL Commands ........................................374
27.3.5. Escaping Binary Strings for Inclusion in SQL Commands ............................375
27.4. Asynchronous Command Processing ..........................................................................376
27.5. Cancelling Queries in Progress ...................................................................................380
27.6. The Fast-Path Interface................................................................................................381
27.7. Asynchronous Notification..........................................................................................382
27.8. Functions Associated with the COPY Command .........................................................383
27.8.1. Functions for Sending COPY Data...................................................................384
27.8.2. Functions for Receiving COPY Data................................................................385
27.8.3. Obsolete Functions for COPY ..........................................................................386
27.9. Control Functions ........................................................................................................388
27.10. Notice Processing ......................................................................................................388
27.11. Environment Variables ..............................................................................................390
27.12. The Password File .....................................................................................................391
27.13. SSL Support...............................................................................................................392
27.14. Behavior in Threaded Programs ................................................................................392
27.15. Building libpq Programs............................................................................................392
27.16. Example Programs.....................................................................................................394
28. Large Objects ...........................................................................................................................403
28.1. History .........................................................................................................................403
28.2. Implementation Features .............................................................................................403
28.3. Client Interfaces...........................................................................................................403
28.3.1. Creating a Large Object ..................................................................................403
28.3.2. Importing a Large Object................................................................................404
28.3.3. Exporting a Large Object................................................................................404
28.3.4. Opening an Existing Large Object..................................................................404
28.3.5. Writing Data to a Large Object.......................................................................405
28.3.6. Reading Data from a Large Object .................................................................405
28.3.7. Seeking in a Large Object...............................................................................405
28.3.8. Obtaining the Seek Position of a Large Object...............................................405
28.3.9. Closing a Large Object Descriptor .................................................................405
28.3.10. Removing a Large Object .............................................................................406
28.4. Server-Side Functions..................................................................................................406
28.5. Example Program ........................................................................................................406
29. ECPG - Embedded SQL in C...................................................................................................412
29.1. The Concept.................................................................................................................412
29.2. Connecting to the Database Server..............................................................................412
29.3. Closing a Connection ..................................................................................................413
29.4. Running SQL Commands............................................................................................414
29.5. Choosing a Connection................................................................................................415
29.6. Using Host Variables ...................................................................................................415
29.6.1. Overview.........................................................................................................415
29.6.2. Declare Sections..............................................................................................416
29.6.3. SELECT INTO and FETCH INTO ...................................................................416
29.6.4. Indicators.........................................................................................................417
29.7. Dynamic SQL..............................................................................................................418
29.8. Using SQL Descriptor Areas.......................................................................................419
29.9. Error Handling.............................................................................................................421
29.9.1. Setting Callbacks ............................................................................................421
29.9.2. sqlca ................................................................................................................423
29.9.3. SQLSTATE vs SQLCODE...................................................................................424
29.10. Including Files ...........................................................................................................426
29.11. Processing Embedded SQL Programs.......................................................................427
29.12. Library Functions ......................................................................................................428
29.13. Internals .....................................................................................................................428
30. The Information Schema..........................................................................................................431
30.1. The Schema .................................................................................................................431
30.2. Data Types ...................................................................................................................431
30.3. information_schema_catalog_name ..................................................................432
30.4. applicable_roles...................................................................................................432
30.5. check_constraints ................................................................................................432
30.6. column_domain_usage ............................................................................................433
30.7. column_privileges ................................................................................................433
30.8. column_udt_usage...................................................................................................434
30.9. columns ......................................................................................................................435
30.10. constraint_column_usage .................................................................................439
30.11. constraint_table_usage....................................................................................439
30.12. data_type_privileges ........................................................................................440
30.13. domain_constraints ............................................................................................441
30.14. domain_udt_usage.................................................................................................441
30.15. domains ....................................................................................................................442
30.16. element_types .......................................................................................................445
30.17. enabled_roles .......................................................................................................447
30.18. key_column_usage.................................................................................................448
30.19. parameters..............................................................................................................448
30.20. referential_constraints .................................................................................451
30.21. role_column_grants ............................................................................................452
30.22. role_routine_grants ..........................................................................................452
30.23. role_table_grants ..............................................................................................453
30.24. role_usage_grants ..............................................................................................454
30.25. routine_privileges ............................................................................................454
30.26. routines ..................................................................................................................455
30.27. schemata ..................................................................................................................459
30.28. sql_features .........................................................................................................460
30.29. sql_implementation_info .................................................................................460
30.30. sql_languages .......................................................................................................461
30.31. sql_packages .........................................................................................................462
30.32. sql_sizing..............................................................................................................462
30.33. sql_sizing_profiles ..........................................................................................463
30.34. table_constraints ..............................................................................................463
30.35. table_privileges.................................................................................................464
30.36. tables ......................................................................................................................465
30.37. triggers ..................................................................................................................465
30.38. usage_privileges.................................................................................................467
30.39. view_column_usage ..............................................................................................467
30.40. view_table_usage.................................................................................................468
30.41. views ........................................................................................................................469
V. Server Programming ........................................................................................................................470
31. Extending SQL.........................................................................................................................472
31.1. How Extensibility Works.............................................................................................472
31.2. The PostgreSQL Type System.....................................................................................472
31.2.1. Base Types ......................................................................................................472
31.2.2. Composite Types.............................................................................................472
31.2.3. Domains ..........................................................................................................473
31.2.4. Pseudo-Types ..................................................................................................473
31.2.5. Polymorphic Types .........................................................................................473
31.3. User-Defined Functions...............................................................................................474
31.4. Query Language (SQL) Functions ..............................................................................474
31.4.1. SQL Functions on Base Types ........................................................................475
31.4.2. SQL Functions on Composite Types ..............................................................476
31.4.3. SQL Functions as Table Sources ....................................................................479
31.4.4. SQL Functions Returning Sets .......................................................................480
31.4.5. Polymorphic SQL Functions ..........................................................................481
31.5. Function Overloading ..................................................................................................482
31.6. Function Volatility Categories .....................................................................................483
31.7. Procedural Language Functions ..................................................................................484
31.8. Internal Functions........................................................................................................484
31.9. C-Language Functions.................................................................................................485
31.9.1. Dynamic Loading............................................................................................485
31.9.2. Base Types in C-Language Functions.............................................................486
31.9.3. Calling Conventions Version 0 for C-Language Functions ............................489

xiii
31.9.4. Calling Conventions Version 1 for C-Language Functions ............................491
31.9.5. Writing Code...................................................................................................494
31.9.6. Compiling and Linking Dynamically-Loaded Functions ...............................494
31.9.7. Extension Building Infrastructure...................................................................497
31.9.8. Composite-Type Arguments in C-Language Functions..................................499
31.9.9. Returning Rows (Composite Types) from C-Language Functions.................500
31.9.10. Returning Sets from C-Language Functions.................................................501
31.9.11. Polymorphic Arguments and Return Types ..................................................506
31.10. User-Defined Aggregates ..........................................................................................507
31.11. User-Defined Types ...................................................................................................509
31.12. User-Defined Operators.............................................................................................512
31.13. Operator Optimization Information...........................................................................513
31.13.1. COMMUTATOR .................................................................................................514
31.13.2. NEGATOR .......................................................................................................514
31.13.3. RESTRICT .....................................................................................................515
31.13.4. JOIN ..............................................................................................................516
31.13.5. HASHES..........................................................................................................516
31.13.6. MERGES (SORT1, SORT2, LTCMP, GTCMP).....................................................517
31.14. Interfacing Extensions To Indexes.............................................................................518
31.14.1. Index Methods and Operator Classes ...........................................................518
31.14.2. Index Method Strategies ...............................................................................519
31.14.3. Index Method Support Routines ...................................................................520
31.14.4. An Example ..................................................................................................521
31.14.5. Cross-Data-Type Operator Classes ...............................................................524
31.14.6. System Dependencies on Operator Classes ..................................................525
31.14.7. Special Features of Operator Classes............................................................525
32. Triggers ....................................................................................................................................527
32.1. Overview of Trigger Behavior.....................................................................................527
32.2. Visibility of Data Changes...........................................................................................528
32.3. Writing Trigger Functions in C ...................................................................................529
32.4. A Complete Example ..................................................................................................531
33. The Rule System ......................................................................................................................535
33.1. The Query Tree............................................................................................................535
33.2. Views and the Rule System .........................................................................................537
33.2.1. How SELECT Rules Work ...............................................................................537
33.2.2. View Rules in Non-SELECT Statements .........................................................542
33.2.3. The Power of Views in PostgreSQL ...............................................................543
33.2.4. Updating a View..............................................................................................544
33.3. Rules on INSERT, UPDATE, and DELETE ....................................................................544
33.3.1. How Update Rules Work ................................................................................544
33.3.1.1. A First Rule Step by Step...................................................................545
33.3.2. Cooperation with Views..................................................................................549
33.4. Rules and Privileges ....................................................................................................554
33.5. Rules and Command Status.........................................................................................555
33.6. Rules versus Triggers ..................................................................................................556
34. Procedural Languages ..............................................................................................................559
34.1. Installing Procedural Languages .................................................................................559
35. PL/pgSQL - SQL Procedural Language ..................................................................................561

35.1. Overview .....................................................................................................................561
35.1.1. Advantages of Using PL/pgSQL ....................................................................562
35.1.2. Supported Argument and Result Data Types ..................................................562
35.2. Tips for Developing in PL/pgSQL...............................................................................563
35.2.1. Handling of Quotation Marks .........................................................................563
35.3. Structure of PL/pgSQL................................................................................................565
35.4. Declarations.................................................................................................................566
35.4.1. Aliases for Function Parameters .....................................................................567
35.4.2. Copying Types ................................................................................................568
35.4.3. Row Types.......................................................................................................569
35.4.4. Record Types ..................................................................................................569
35.4.5. RENAME............................................................................................................570
35.5. Expressions..................................................................................................................570
35.6. Basic Statements..........................................................................................................571
35.6.1. Assignment .....................................................................................................571
35.6.2. SELECT INTO .................................................................................................572
35.6.3. Executing an Expression or Query With No Result........................................573
35.6.4. Doing Nothing At All .....................................................................................573
35.6.5. Executing Dynamic Commands .....................................................................574
35.6.6. Obtaining the Result Status.............................................................................575
35.7. Control Structures........................................................................................................576
35.7.1. Returning From a Function.............................................................................576
35.7.1.1. RETURN ...............................................................................................576
35.7.1.2. RETURN NEXT ....................................................................................576
35.7.2. Conditionals ....................................................................................................577
35.7.2.1. IF-THEN .............................................................................................577
35.7.2.2. IF-THEN-ELSE ..................................................................................578
35.7.2.3. IF-THEN-ELSE IF............................................................................578
35.7.2.4. IF-THEN-ELSIF-ELSE .....................................................................579
35.7.2.5. IF-THEN-ELSEIF-ELSE ...................................................................579
35.7.3. Simple Loops ..................................................................................................579
35.7.3.1. LOOP ...................................................................................................579
35.7.3.2. EXIT ...................................................................................................580
35.7.3.3. WHILE .................................................................................................580
35.7.3.4. FOR (integer variant)...........................................................................581
35.7.4. Looping Through Query Results ....................................................................581
35.7.5. Trapping Errors ...............................................................................................582
35.8. Cursors.........................................................................................................................584
35.8.1. Declaring Cursor Variables .............................................................................584
35.8.2. Opening Cursors .............................................................................................584
35.8.2.1. OPEN FOR SELECT............................................................................585
35.8.2.2. OPEN FOR EXECUTE .........................................................................585
35.8.2.3. Opening a Bound Cursor....................................................................585
35.8.3. Using Cursors..................................................................................................586
35.8.3.1. FETCH .................................................................................................586
35.8.3.2. CLOSE .................................................................................................586
35.8.3.3. Returning Cursors ..............................................................................586
35.9. Errors and Messages....................................................................................................588

35.10. Trigger Procedures ....................................................................................................589
35.11. Porting from Oracle PL/SQL.....................................................................................594
35.11.1. Porting Examples ..........................................................................................594
35.11.2. Other Things to Watch For............................................................................600
35.11.2.1. Implicit Rollback after Exceptions...................................................601
35.11.2.2. EXECUTE ...........................................................................................601
35.11.2.3. Optimizing PL/pgSQL Functions.....................................................601
35.11.3. Appendix.......................................................................................................601
36. PL/Tcl - Tcl Procedural Language...........................................................................................605
36.1. Overview .....................................................................................................................605
36.2. PL/Tcl Functions and Arguments................................................................................605
36.3. Data Values in PL/Tcl..................................................................................................606
36.4. Global Data in PL/Tcl .................................................................................................607
36.5. Database Access from PL/Tcl .....................................................................................607
36.6. Trigger Procedures in PL/Tcl ......................................................................................609
36.7. Modules and the unknown command..........................................................................611
36.8. Tcl Procedure Names ..................................................................................................611
37. PL/Perl - Perl Procedural Language.........................................................................................612
37.1. PL/Perl Functions and Arguments...............................................................................612
37.2. Database Access from PL/Perl ....................................................................................614
37.3. Data Values in PL/Perl.................................................................................................615
37.4. Global Values in PL/Perl .............................................................................................616
37.5. Trusted and Untrusted PL/Perl ....................................................................................617
37.6. PL/Perl Triggers ..........................................................................................................617
37.7. Limitations and Missing Features ...............................................................................619
38. PL/Python - Python Procedural Language...............................................................................620
38.1. PL/Python Functions ...................................................................................................620
38.2. Trigger Functions ........................................................................................................621
38.3. Database Access ..........................................................................................................621
39. Server Programming Interface .................................................................................................623
39.1. Interface Functions ......................................................................................................623
SPI_connect ................................................................................................................623
SPI_finish....................................................................................................................625
SPI_push .....................................................................................................................626
SPI_pop.......................................................................................................................627
SPI_execute.................................................................................................................628
SPI_exec......................................................................................................................631
SPI_prepare.................................................................................................................632
SPI_getargcount ..........................................................................................................634
SPI_getargtypeid.........................................................................................................635
SPI_is_cursor_plan .....................................................................................................636
SPI_execute_plan........................................................................................................637
SPI_execp....................................................................................................................639
SPI_cursor_open .........................................................................................................640
SPI_cursor_find...........................................................................................................642
SPI_cursor_fetch.........................................................................................................643
SPI_cursor_move ........................................................................................................644
SPI_cursor_close.........................................................................................................645

SPI_saveplan...............................................................................................................646
39.2. Interface Support Functions ........................................................................................647
SPI_fname...................................................................................................................647
SPI_fnumber ...............................................................................................................648
SPI_getvalue ...............................................................................................................649
SPI_getbinval ..............................................................................................................650
SPI_gettype .................................................................................................................651
SPI_gettypeid..............................................................................................................652
SPI_getrelname ...........................................................................................................653
39.3. Memory Management .................................................................................................654
SPI_palloc ...................................................................................................................654
SPI_repalloc................................................................................................................656
SPI_pfree.....................................................................................................................657
SPI_copytuple .............................................................................................................658
SPI_returntuple ...........................................................................................................659
SPI_modifytuple .........................................................................................................660
SPI_freetuple...............................................................................................................662
SPI_freetuptable..........................................................................................................663
SPI_freeplan................................................................................................................664
39.4. Visibility of Data Changes...........................................................................................665
39.5. Examples .....................................................................................................................665
VI. Reference..........................................................................................................................................669
I. SQL Commands..........................................................................................................................671
ABORT.................................................................................................................................672
ALTER AGGREGATE.........................................................................................................674
ALTER CONVERSION.......................................................................................................676
ALTER DATABASE ............................................................................................................678
ALTER DOMAIN ................................................................................................................680
ALTER FUNCTION ............................................................................................................683
ALTER GROUP ...................................................................................................................685
ALTER INDEX ....................................................................................................................687
ALTER LANGUAGE...........................................................................................................689
ALTER OPERATOR ............................................................................................................690
ALTER OPERATOR CLASS...............................................................................................692
ALTER SCHEMA ................................................................................................................693
ALTER SEQUENCE............................................................................................................694
ALTER TABLE ....................................................................................................................696
ALTER TABLESPACE ........................................................................................................703
ALTER TRIGGER ...............................................................................................................705
ALTER TYPE.......................................................................................................................706
ALTER USER ......................................................................................................................707
ANALYZE............................................................................................................................710
BEGIN..................................................................................................................................712
CHECKPOINT.....................................................................................................................714
CLOSE .................................................................................................................................715
CLUSTER ............................................................................................................................717
COMMENT..........................................................................................................................720

COMMIT..............................................................................................................................723
COPY ...................................................................................................................................725
CREATE AGGREGATE ......................................................................................................733
CREATE CAST....................................................................................................................736
CREATE CONSTRAINT TRIGGER ..................................................................................740
CREATE CONVERSION ....................................................................................................741
CREATE DATABASE..........................................................................................................743
CREATE DOMAIN..............................................................................................................746
CREATE FUNCTION..........................................................................................................749
CREATE GROUP.................................................................................................................754
CREATE INDEX..................................................................................................................756
CREATE LANGUAGE ........................................................................................................759
CREATE OPERATOR .........................................................................................................762
CREATE OPERATOR CLASS ............................................................................................766
CREATE RULE....................................................................................................................769
CREATE SCHEMA .............................................................................................................772
CREATE SEQUENCE .........................................................................................................775
CREATE TABLE .................................................................................................................779
CREATE TABLE AS ...........................................................................................................789
CREATE TABLESPACE......................................................................................................791
CREATE TRIGGER.............................................................................................................793
CREATE TYPE ....................................................................................................................796
CREATE USER....................................................................................................................802
CREATE VIEW....................................................................................................................805
DEALLOCATE ....................................................................................................................808
DECLARE............................................................................................................................809
DELETE ...............................................................................................................................812
DROP AGGREGATE...........................................................................................................814
DROP CAST ........................................................................................................................816
DROP CONVERSION.........................................................................................................818
DROP DATABASE ..............................................................................................................819
DROP DOMAIN ..................................................................................................................820
DROP FUNCTION ..............................................................................................................821
DROP GROUP .....................................................................................................................823
DROP INDEX ......................................................................................................................824
DROP LANGUAGE.............................................................................................................825
DROP OPERATOR ..............................................................................................................826
DROP OPERATOR CLASS.................................................................................................828
DROP RULE ........................................................................................................................830
DROP SCHEMA ..................................................................................................................832
DROP SEQUENCE..............................................................................................................834
DROP TABLE ......................................................................................................................835
DROP TABLESPACE ..........................................................................................................837
DROP TRIGGER .................................................................................................................838
DROP TYPE.........................................................................................................................840
DROP USER ........................................................................................................................841
DROP VIEW ........................................................................................................................843
END......................................................................................................................844
EXECUTE............................................................................................................................846
EXPLAIN .............................................................................................................................848
FETCH .................................................................................................................................851
GRANT ................................................................................................................................855
INSERT ................................................................................................................................860
LISTEN ................................................................................................................................863
LOAD ...................................................................................................................................865
LOCK ...................................................................................................................................866
MOVE...................................................................................................................................869
NOTIFY................................................................................................................................871
PREPARE .............................................................................................................................873
REINDEX.............................................................................................................................875
RELEASE SAVEPOINT......................................................................................................878
RESET..................................................................................................................................880
REVOKE ..............................................................................................................................881
ROLLBACK .........................................................................................................................884
ROLLBACK TO SAVEPOINT ............................................................................................886
SAVEPOINT ........................................................................................................................888
SELECT ...............................................................................................................................890
SELECT INTO .....................................................................................................................902
SET .......................................................................................................................................904
SET CONSTRAINTS ..........................................................................................................907
SET SESSION AUTHORIZATION.....................................................................................908
SET TRANSACTION ..........................................................................................................910
SHOW ..................................................................................................................................912
START TRANSACTION .....................................................................................................915
TRUNCATE .........................................................................................................................916
UNLISTEN...........................................................................................................................917
UPDATE ...............................................................................................................................919
VACUUM .............................................................................................................................922
II. PostgreSQL Client Applications ...............................................................................................925
clusterdb ...............................................................................................................................926
createdb.................................................................................................................................929
createlang..............................................................................................................................932
createuser..............................................................................................................................935
dropdb...................................................................................................................................938
droplang................................................................................................................................941
dropuser ................................................................................................................................944
ecpg.......................................................................................................................................947
pg_config ..............................................................................................................................949
pg_dump ...............................................................................................................................951
pg_dumpall ...........................................................................................................................958
pg_restore .............................................................................................................................962
psql .......................................................................................................................................969
vacuumdb..............................................................................................................................993
III. PostgreSQL Server Applications .............................................................................................996
initdb.....................................................................................................................................997
ipcclean...............................................................................................................1000
pg_controldata ....................................................................................................................1001
pg_ctl ..................................................................................................................................1002
pg_resetxlog .......................................................................................................................1006
postgres...............................................................................................................................1008
postmaster...........................................................................................................................1012
VII. Internals........................................................................................................................................1018
40. Overview of PostgreSQL Internals ........................................................................................1020
40.1. The Path of a Query...................................................................................................1020
40.2. How Connections are Established .............................................................................1020
40.3. The Parser Stage ........................................................................................................1021
40.3.1. Parser.............................................................................................................1021
40.3.2. Transformation Process.................................................................................1022
40.4. The PostgreSQL Rule System ...................................................................................1022
40.5. Planner/Optimizer......................................................................................................1023
40.5.1. Generating Possible Plans.............................................................................1023
40.6. Executor.....................................................................................................................1024
41. System Catalogs .....................................................................................................................1026
41.1. Overview ...................................................................................................................1026
41.2. pg_aggregate .........................................................................................................1027
41.3. pg_am ........................................................................................................................1028
41.4. pg_amop ....................................................................................................................1029
41.5. pg_amproc ................................................................................................................1029
41.6. pg_attrdef..............................................................................................................1030
41.7. pg_attribute .........................................................................................................1030
41.8. pg_cast ....................................................................................................................1033
41.9. pg_class ..................................................................................................................1034
41.10. pg_constraint .....................................................................................................1037
41.11. pg_conversion .....................................................................................................1038
41.12. pg_database .........................................................................................................1039
41.13. pg_depend ..............................................................................................................1040
41.14. pg_description ...................................................................................................1042
41.15. pg_group ................................................................................................................1042
41.16. pg_index ................................................................................................................1043
41.17. pg_inherits .........................................................................................................1045
41.18. pg_language .........................................................................................................1045
41.19. pg_largeobject ...................................................................................................1046
41.20. pg_listener .........................................................................................................1047
41.21. pg_namespace .......................................................................................................1047
41.22. pg_opclass............................................................................................................1048
41.23. pg_operator .........................................................................................................1049
41.24. pg_proc ..................................................................................................................1050
41.25. pg_rewrite............................................................................................................1052
41.26. pg_shadow ..............................................................................................................1053
41.27. pg_statistic .......................................................................................................1053
41.28. pg_tablespace .....................................................................................................1055
41.29. pg_trigger............................................................................................................1056
41.30. pg_type ..................................................................................................1057
41.31. System Views ..........................................................................................................1063
41.32. pg_indexes............................................................................................................1064
41.33. pg_locks ................................................................................................................1065
41.34. pg_rules ................................................................................................................1066
41.35. pg_settings .........................................................................................................1066
41.36. pg_stats ................................................................................................................1067
41.37. pg_tables ..............................................................................................................1069
41.38. pg_user ..................................................................................................................1070
41.39. pg_views ................................................................................................................1071
42. Frontend/Backend Protocol....................................................................................................1072
42.1. Overview ...................................................................................................................1072
42.1.1. Messaging Overview.....................................................................................1072
42.1.2. Extended Query Overview............................................................................1073
42.1.3. Formats and Format Codes ...........................................................................1073
42.2. Message Flow ............................................................................................................1074
42.2.1. Start-Up.........................................................................................................1074
42.2.2. Simple Query ................................................................................................1076
42.2.3. Extended Query ............................................................................................1077
42.2.4. Function Call.................................................................................................1080
42.2.5. COPY Operations .........................................................................................1081
42.2.6. Asynchronous Operations.............................................................................1082
42.2.7. Cancelling Requests in Progress...................................................................1082
42.2.8. Termination ...................................................................................................1083
42.2.9. SSL Session Encryption................................................................................1083
42.3. Message Data Types ..................................................................................................1084
42.4. Message Formats .......................................................................................................1085
42.5. Error and Notice Message Fields ..............................................................................1101
42.6. Summary of Changes since Protocol 2.0...................................................................1103
43. PostgreSQL Coding Conventions ..........................................................................................1105
43.1. Formatting .................................................................................................................1105
43.2. Reporting Errors Within the Server...........................................................................1105
43.3. Error Message Style Guide........................................................................................1107
43.3.1. What goes where...........................................................................................1108
43.3.2. Formatting.....................................................................................................1108
43.3.3. Quotation marks............................................................................................1108
43.3.4. Use of quotes.................................................................................................1109
43.3.5. Grammar and punctuation.............................................................................1109
43.3.6. Upper case vs. lower case .............................................................................1109
43.3.7. Avoid passive voice.......................................................................................1109
43.3.8. Present vs past tense......................................................................................1109
43.3.9. Type of the object..........................................................................................1110
43.3.10. Brackets.......................................................................................................1110
43.3.11. Assembling error messages.........................................................................1110
43.3.12. Reasons for errors .......................................................................................1110
43.3.13. Function names ...........................................................................................1111
43.3.14. Tricky words to avoid .................................................................................1111
43.3.15. Proper spelling ............................................................................................1111
43.3.16. Localization.................................................................................1112
44. Native Language Support.......................................................................................................1113
44.1. For the Translator ......................................................................................................1113
44.1.1. Requirements ................................................................................................1113
44.1.2. Concepts........................................................................................................1113
44.1.3. Creating and maintaining message catalogs .................................................1114
44.1.4. Editing the PO files .......................................................................................1115
44.2. For the Programmer...................................................................................................1116
44.2.1. Mechanics .....................................................................................................1116
44.2.2. Message-writing guidelines ..........................................................................1117
45. Writing A Procedural Language Handler ..............................................................................1119
46. Genetic Query Optimizer .......................................................................................................1121
46.1. Query Handling as a Complex Optimization Problem..............................................1121
46.2. Genetic Algorithms ...................................................................................................1121
46.3. Genetic Query Optimization (GEQO) in PostgreSQL ..............................................1122
46.3.1. Future Implementation Tasks for PostgreSQL GEQO .................................1123
46.4. Further Reading .........................................................................................................1123
47. Index Cost Estimation Functions ...........................................................................................1125
48. GiST Indexes..........................................................................................................................1128
48.1. Introduction ...............................................................................................................1128
48.2. Extensibility...............................................................................................................1128
48.3. Implementation..........................................................................................................1128
48.4. Limitations.................................................................................................................1129
48.5. Examples ...................................................................................................................1129
49. Database Physical Storage .....................................................................................................1131
49.1. Database File Layout.................................................................................................1131
49.2. TOAST ......................................................................................................................1132
49.3. Database Page Layout ...............................................................................................1134
50. BKI Backend Interface...........................................................................................................1137
50.1. BKI File Format ........................................................................................................1137
50.2. BKI Commands .........................................................................................................1137
50.3. Example.....................................................................................................................1138
VIII. Appendixes..................................................................................................................................1139
A. PostgreSQL Error Codes.........................................................................................................1140
B. Date/Time Support ..................................................................................................................1147
B.1. Date/Time Input Interpretation ...................................................................................1147
B.2. Date/Time Key Words.................................................................................................1148
B.3. History of Units ..........................................................................................................1164
C. SQL Key Words.......................................................................................................................1166
D. SQL Conformance ..................................................................................................................1186
D.1. Supported Features .....................................................................................................1187
D.2. Unsupported Features .................................................................................................1198
E. Release Notes ..........................................................................................................................1206
E.1. Release 8.0 ..................................................................................................................1206
E.1.1. Overview ........................................................................................................1206
E.1.2. Migration to version 8.0 .................................................................................1207
E.1.3. Deprecated Features .......................................................................................1208
E.1.4. Changes ..........................................................................................1209
E.1.4.1. Performance Improvements ...............................................................1209
E.1.4.2. Server Changes ..................................................................................1211
E.1.4.3. Query Changes...................................................................................1213
E.1.4.4. Object Manipulation Changes ...........................................................1214
E.1.4.5. Utility Command Changes.................................................................1215
E.1.4.6. Data Type and Function Changes ......................................................1216
E.1.4.7. Server-Side Language Changes .........................................................1218
E.1.4.8. psql Changes ......................................................................................1219
E.1.4.9. pg_dump Changes..............................................................................1220
E.1.4.10. libpq Changes ..................................................................................1221
E.1.4.11. Source Code Changes ......................................................................1221
E.1.4.12. Contrib Changes ..............................................................................1222
E.2. Release 7.4.6 ...............................................................................................................1223
E.2.1. Migration to version 7.4.6 ..............................................................................1223
E.2.2. Changes ..........................................................................................................1223
E.3. Release 7.4.5 ...............................................................................................................1224
E.3.1. Migration to version 7.4.5 ..............................................................................1224
E.3.2. Changes ..........................................................................................................1224
E.4. Release 7.4.4 ...............................................................................................................1225
E.4.1. Migration to version 7.4.4 ..............................................................................1225
E.4.2. Changes ..........................................................................................................1225
E.5. Release 7.4.3 ...............................................................................................................1225
E.5.1. Migration to version 7.4.3 ..............................................................................1226
E.5.2. Changes ..........................................................................................................1226
E.6. Release 7.4.2 ...............................................................................................................1226
E.6.1. Migration to version 7.4.2 ..............................................................................1227
E.6.2. Changes ..........................................................................................................1228
E.7. Release 7.4.1 ...............................................................................................................1229
E.7.1. Migration to version 7.4.1 ..............................................................................1229
E.7.2. Changes ..........................................................................................................1229
E.8. Release 7.4 ..................................................................................................................1230
E.8.1. Overview ........................................................................................................1230
E.8.2. Migration to version 7.4 .................................................................................1232
E.8.3. Changes ..........................................................................................................1233
E.8.3.1. Server Operation Changes .................................................................1233
E.8.3.2. Performance Improvements ...............................................................1235
E.8.3.3. Server Configuration Changes ...........................................................1236
E.8.3.4. Query Changes...................................................................................1238
E.8.3.5. Object Manipulation Changes ...........................................................1238
E.8.3.6. Utility Command Changes.................................................................1240
E.8.3.7. Data Type and Function Changes ......................................................1241
E.8.3.8. Server-Side Language Changes .........................................................1243
E.8.3.9. psql Changes ......................................................................................1244
E.8.3.10. pg_dump Changes............................................................................1244
E.8.3.11. libpq Changes ..................................................................................1245
E.8.3.12. JDBC Changes.................................................................................1246
E.8.3.13. Miscellaneous Interface Changes ....................................................1246
E.8.3.14. Source Code Changes ......................................................1246
E.8.3.15. Contrib Changes ..............................................................................1247
E.9. Release 7.3.8 ...............................................................................................................1248
E.9.1. Migration to version 7.3.8 ..............................................................................1248
E.9.2. Changes ..........................................................................................................1248
E.10. Release 7.3.7 .............................................................................................................1249
E.10.1. Migration to version 7.3.7 ............................................................................1249
E.10.2. Changes ........................................................................................................1249
E.11. Release 7.3.6 .............................................................................................................1249
E.11.1. Migration to version 7.3.6 ............................................................................1249
E.11.2. Changes ........................................................................................................1250
E.12. Release 7.3.5 .............................................................................................................1250
E.12.1. Migration to version 7.3.5 ............................................................................1250
E.12.2. Changes ........................................................................................................1251
E.13. Release 7.3.4 .............................................................................................................1251
E.13.1. Migration to version 7.3.4 ............................................................................1251
E.13.2. Changes ........................................................................................................1252
E.14. Release 7.3.3 .............................................................................................................1252
E.14.1. Migration to version 7.3.3 ............................................................................1252
E.14.2. Changes ........................................................................................................1252
E.15. Release 7.3.2 .............................................................................................................1254
E.15.1. Migration to version 7.3.2 ............................................................................1254
E.15.2. Changes ........................................................................................................1255
E.16. Release 7.3.1 .............................................................................................................1256
E.16.1. Migration to version 7.3.1 ............................................................................1256
E.16.2. Changes ........................................................................................................1256
E.17. Release 7.3 ................................................................................................................1256
E.17.1. Overview ......................................................................................................1256
E.17.2. Migration to version 7.3 ...............................................................................1257
E.17.3. Changes ........................................................................................................1258
E.17.3.1. Server Operation ..............................................................................1258
E.17.3.2. Performance .....................................................................................1258
E.17.3.3. Privileges..........................................................................................1259
E.17.3.4. Server Configuration........................................................................1259
E.17.3.5. Queries .............................................................................................1260
E.17.3.6. Object Manipulation ........................................................................1261
E.17.3.7. Utility Commands............................................................................1261
E.17.3.8. Data Types and Functions................................................................1263
E.17.3.9. Internationalization ..........................................................................1264
E.17.3.10. Server-side Languages ...................................................................1264
E.17.3.11. psql.................................................................................................1265
E.17.3.12. libpq ...............................................................................................1265
E.17.3.13. JDBC..............................................................................................1265
E.17.3.14. Miscellaneous Interfaces................................................................1266
E.17.3.15. Source Code ...................................................................................1266
E.17.3.16. Contrib ...........................................................................................1267
E.18. Release 7.2.6 .............................................................................................................1268
E.18.1. Migration to version 7.2.6 ............................................................................1268
E.18.2. Changes ........................................................................................................1268

E.19. Release 7.2.5 .............................................................................................................1269
E.19.1. Migration to version 7.2.5 ............................................................................1269
E.19.2. Changes ........................................................................................................1269
E.20. Release 7.2.4 .............................................................................................................1270
E.20.1. Migration to version 7.2.4 ............................................................................1270
E.20.2. Changes ........................................................................................................1270
E.21. Release 7.2.3 .............................................................................................................1270
E.21.1. Migration to version 7.2.3 ............................................................................1270
E.21.2. Changes ........................................................................................................1271
E.22. Release 7.2.2 .............................................................................................................1271
E.22.1. Migration to version 7.2.2 ............................................................................1271
E.22.2. Changes ........................................................................................................1271
E.23. Release 7.2.1 .............................................................................................................1272
E.23.1. Migration to version 7.2.1 ............................................................................1272
E.23.2. Changes ........................................................................................................1272
E.24. Release 7.2 ................................................................................................................1272
E.24.1. Overview ......................................................................................................1273
E.24.2. Migration to version 7.2 ...............................................................................1273
E.24.3. Changes ........................................................................................................1274
E.24.3.1. Server Operation ..............................................................................1274
E.24.3.2. Performance .....................................................................................1275
E.24.3.3. Privileges..........................................................................................1275
E.24.3.4. Client Authentication .......................................................................1275
E.24.3.5. Server Configuration........................................................................1275
E.24.3.6. Queries .............................................................................................1276
E.24.3.7. Schema Manipulation ......................................................................1276
E.24.3.8. Utility Commands............................................................................1277
E.24.3.9. Data Types and Functions................................................................1277
E.24.3.10. Internationalization ........................................................................1278
E.24.3.11. PL/pgSQL ......................................................................................1279
E.24.3.12. PL/Perl ...........................................................................................1279
E.24.3.13. PL/Tcl ............................................................................................1279
E.24.3.14. PL/Python ......................................................................................1279
E.24.3.15. psql.................................................................................................1279
E.24.3.16. libpq ...............................................................................................1280
E.24.3.17. JDBC..............................................................................................1280
E.24.3.18. ODBC ............................................................................................1281
E.24.3.19. ECPG .............................................................................................1281
E.24.3.20. Misc. Interfaces..............................................................................1281
E.24.3.21. Build and Install.............................................................................1282
E.24.3.22. Source Code ...................................................................................1282
E.24.3.23. Contrib ...........................................................................................1283
E.25. Release 7.1.3 .............................................................................................................1283
E.25.1. Migration to version 7.1.3 ............................................................................1283
E.25.2. Changes ........................................................................................................1283
E.26. Release 7.1.2 .............................................................................................................1284
E.26.1. Migration to version 7.1.2 ............................................................................1284
E.26.2. Changes ........................................................................................................1284

E.27. Release 7.1.1 .............................................................................................................1284
E.27.1. Migration to version 7.1.1 ............................................................................1284
E.27.2. Changes ........................................................................................................1284
E.28. Release 7.1 ................................................................................................................1285
E.28.1. Migration to version 7.1 ...............................................................................1285
E.28.2. Changes ........................................................................................................1286
E.29. Release 7.0.3 .............................................................................................................1289
E.29.1. Migration to version 7.0.3 ............................................................................1289
E.29.2. Changes ........................................................................................................1289
E.30. Release 7.0.2 .............................................................................................................1290
E.30.1. Migration to version 7.0.2 ............................................................................1291
E.30.2. Changes ........................................................................................................1291
E.31. Release 7.0.1 .............................................................................................................1291
E.31.1. Migration to version 7.0.1 ............................................................................1291
E.31.2. Changes ........................................................................................................1291
E.32. Release 7.0 ................................................................................................................1292
E.32.1. Migration to version 7.0 ...............................................................................1292
E.32.2. Changes ........................................................................................................1293
E.33. Release 6.5.3 .............................................................................................................1299
E.33.1. Migration to version 6.5.3 ............................................................................1299
E.33.2. Changes ........................................................................................................1299
E.34. Release 6.5.2 .............................................................................................................1299
E.34.1. Migration to version 6.5.2 ............................................................................1300
E.34.2. Changes ........................................................................................................1300
E.35. Release 6.5.1 .............................................................................................................1300
E.35.1. Migration to version 6.5.1 ............................................................................1301
E.35.2. Changes ........................................................................................................1301
E.36. Release 6.5 ................................................................................................................1301
E.36.1. Migration to version 6.5 ...............................................................................1302
E.36.1.1. Multiversion Concurrency Control ..................................................1303
E.36.2. Changes ........................................................................................................1303
E.37. Release 6.4.2 .............................................................................................................1306
E.37.1. Migration to version 6.4.2 ............................................................................1307
E.37.2. Changes ........................................................................................................1307
E.38. Release 6.4.1 .............................................................................................................1307
E.38.1. Migration to version 6.4.1 ............................................................................1307
E.38.2. Changes ........................................................................................................1307
E.39. Release 6.4 ................................................................................................................1308
E.39.1. Migration to version 6.4 ...............................................................................1309
E.39.2. Changes ........................................................................................................1309
E.40. Release 6.3.2 .............................................................................................................1312
E.40.1. Changes ........................................................................................................1313
E.41. Release 6.3.1 .............................................................................................................1313
E.41.1. Changes ........................................................................................................1314
E.42. Release 6.3 ................................................................................................................1315
E.42.1. Migration to version 6.3 ...............................................................................1316
E.42.2. Changes ........................................................................................................1316
E.43. Release 6.2.1 .............................................................................................................1319

E.43.1. Migration from version 6.2 to version 6.2.1.................................................1320
E.43.2. Changes ........................................................................................................1320
E.44. Release 6.2 ................................................................................................................1320
E.44.1. Migration from version 6.1 to version 6.2....................................................1320
E.44.2. Migration from version 1.x to version 6.2 ...................................................1321
E.44.3. Changes ........................................................................................................1321
E.45. Release 6.1.1 .............................................................................................................1323
E.45.1. Migration from version 6.1 to version 6.1.1.................................................1323
E.45.2. Changes ........................................................................................................1323
E.46. Release 6.1 ................................................................................................................1324
E.46.1. Migration to version 6.1 ...............................................................................1324
E.46.2. Changes ........................................................................................................1324
E.47. Release 6.0 ................................................................................................................1326
E.47.1. Migration from version 1.09 to version 6.0..................................................1326
E.47.2. Migration from pre-1.09 to version 6.0 ........................................................1327
E.47.3. Changes ........................................................................................................1327
E.48. Release 1.09 ..............................................................................................................1329
E.49. Release 1.02 ..............................................................................................................1329
E.49.1. Migration from version 1.02 to version 1.02.1.............................................1329
E.49.2. Dump/Reload Procedure ..............................................................................1330
E.49.3. Changes ........................................................................................................1330
E.50. Release 1.01 ..............................................................................................................1331
E.50.1. Migration from version 1.0 to version 1.01..................................................1331
E.50.2. Changes ........................................................................................................1332
E.51. Release 1.0 ................................................................................................................1333
E.51.1. Changes ........................................................................................................1333
E.52. Postgres95 Release 0.03............................................................................................1334
E.52.1. Changes ........................................................................................................1334
E.53. Postgres95 Release 0.02............................................................................................1336
E.53.1. Changes ........................................................................................................1337
E.54. Postgres95 Release 0.01............................................................................................1337
F. The CVS Repository ................................................................................................................1339
F.1. Getting The Source Via Anonymous CVS ..................................................................1339
F.2. CVS Tree Organization ...............................................................................................1340
F.3. Getting The Source Via CVSup...................................................................................1342
F.3.1. Preparing A CVSup Client System.................................................................1342
F.3.2. Running a CVSup Client ................................................................................1342
F.3.3. Installing CVSup.............................................................................................1344
F.3.4. Installation from Sources ................................................................................1345
G. Documentation ........................................................................................................................1348
G.1. DocBook.....................................................................................................................1348
G.2. Tool Sets .....................................................................................................................1348
G.2.1. Linux RPM Installation..................................................................................1349
G.2.2. FreeBSD Installation......................................................................................1349
G.2.3. Debian Packages ............................................................................................1350
G.2.4. Manual Installation from Source....................................................................1350
G.2.4.1. Installing OpenJade ...........................................................................1350
G.2.4.2. Installing the DocBook DTD Kit ......................................................1351

G.2.4.3. Installing the DocBook DSSSL Style Sheets ....................................1352
G.2.4.4. Installing JadeTeX .............................................................................1352
G.2.5. Detection by configure ...............................................................................1352
G.3. Building The Documentation .....................................................................................1353
G.3.1. HTML ............................................................................................................1353
G.3.2. Manpages .......................................................................................................1353
G.3.3. Print Output via JadeTeX................................................................................1354
G.3.4. Print Output via RTF......................................................................................1354
G.3.5. Plain Text Files...............................................................................................1356
G.3.6. Syntax Check .................................................................................................1356
G.4. Documentation Authoring ..........................................................................................1356
G.4.1. Emacs/PSGML...............................................................................................1356
G.4.2. Other Emacs modes .......................................................................................1358
G.5. Style Guide .................................................................................................................1358
G.5.1. Reference Pages .............................................................................................1358
H. External Projects .....................................................................................................................1361
H.1. Externally Developed Interfaces.................................................................................1361
H.2. Extensions...................................................................................................................1362
Bibliography .........................................................................................................................................1363
Index......................................................................................................................................................1365

List of Tables
4-1. Operator Precedence (decreasing)......................................................................................................30
8-1. Data Types ..........................................................................................................................................81
8-2. Numeric Types....................................................................................................................................82
8-3. Monetary Types ..................................................................................................................................86
8-4. Character Types ..................................................................................................................................86
8-5. Special Character Types .....................................................................................................................88
8-6. Binary Data Types ..............................................................................................................................88
8-7. bytea Literal Escaped Octets ............................................................................................................89
8-8. bytea Output Escaped Octets............................................................................................................90
8-9. Date/Time Types.................................................................................................................................90
8-10. Date Input .........................................................................................................................................92
8-11. Time Input ........................................................................................................................................92
8-12. Time Zone Input ...............................................................................................................................93
8-13. Special Date/Time Inputs .................................................................................................................94
8-14. Date/Time Output Styles ..................................................................................................................95
8-15. Date Order Conventions ...................................................................................................................95
8-16. Geometric Types...............................................................................................................................98
8-17. Network Address Types .................................................................................................................100
8-18. cidr Type Input Examples ............................................................................................................101
8-19. Object Identifier Types ...................................................................................................................116
8-20. Pseudo-Types..................................................................................................................................117
9-1. Comparison Operators......................................................................................................................119
9-2. Mathematical Operators ...................................................................................................................121
9-3. Mathematical Functions ...................................................................................................................122
9-4. Trigonometric Functions ..................................................................................................................123
9-5. SQL String Functions and Operators ...............................................................................................124
9-6. Other String Functions .....................................................................................................................125
9-7. Built-in Conversions.........................................................................................................................129
9-8. SQL Binary String Functions and Operators ...................................................................................132
9-9. Other Binary String Functions .........................................................................................................133
9-10. Bit String Operators........................................................................................................................134
9-11. Regular Expression Match Operators.............................................................................................137
9-12. Regular Expression Atoms .............................................................................................................138
9-13. Regular Expression Quantifiers......................................................................................................139
9-14. Regular Expression Constraints .....................................................................................................140
9-15. Regular Expression Character-Entry Escapes ................................................................................142
9-16. Regular Expression Class-Shorthand Escapes ...............................................................................143
9-17. Regular Expression Constraint Escapes .........................................................................................143
9-18. Regular Expression Back References.............................................................................................143
9-19. ARE Embedded-Option Letters .....................................................................................................144
9-20. Formatting Functions .....................................................................................................................147
9-21. Template Patterns for Date/Time Formatting .................................................................................148
9-22. Template Pattern Modifiers for Date/Time Formatting ..................................................................150
9-23. Template Patterns for Numeric Formatting ....................................................................................151
9-24. to_char Examples ........................................................................................................................152
9-25. Date/Time Operators ......................................................................................................................153

9-26. Date/Time Functions ......................................................................................................................154
9-27. AT TIME ZONE Variants ................................................................................................................160
9-28. Geometric Operators ......................................................................................................................162
9-29. Geometric Functions ......................................................................................................................163
9-30. Geometric Type Conversion Functions ..........................................................................................164
9-31. cidr and inet Operators ..............................................................................................................166
9-32. cidr and inet Functions ..............................................................................................................166
9-33. macaddr Functions ........................................................................................................................167
9-34. Sequence Functions........................................................................................................................167
9-35. array Operators ............................................................................................................................171
9-36. array Functions ............................................................................................................................172
9-37. Aggregate Functions.......................................................................................................................173
9-38. Series Generating Functions...........................................................................................................180
9-39. Session Information Functions.......................................................................................................181
9-40. Access Privilege Inquiry Functions................................................................................................182
9-41. Schema Visibility Inquiry Functions..............................................................................................184
9-42. System Catalog Information Functions..........................................................................................184
9-43. Comment Information Functions ...................................................................................................186
9-44. Configuration Settings Functions ...................................................................................................186
9-45. Backend Signalling Functions........................................................................................................187
9-46. Backup Control Functions..............................................................................................................187
12-1. SQL Transaction Isolation Levels ..................................................................................................208
16-1. Short option key .............................................................................................................................277
16-2. System V IPC parameters...............................................................................................................278
20-1. Server Character Sets .....................................................................................................................309
20-2. Client/Server Character Set Conversions .......................................................................................312
23-1. Standard Statistics Views ...............................................................................................................336
23-2. Statistics Access Functions ............................................................................................................337
30-1. information_schema_catalog_name Columns......................................................................432
30-2. applicable_roles Columns ......................................................................................................432
30-3. check_constraints Columns....................................................................................................432
30-4. column_domain_usage Columns ...............................................................................................433
30-5. column_privileges Columns....................................................................................................433
30-6. column_udt_usage Columns ......................................................................................................434
30-7. columns Columns .........................................................................................................................435
30-8. constraint_column_usage Columns.......................................................................................439
30-9. constraint_table_usage Columns .........................................................................................439
30-10. data_type_privileges Columns ...........................................................................................440
30-11. domain_constraints Columns................................................................................................441
30-12. domain_udt_usage Columns ....................................................................................................442
30-13. domains Columns .......................................................................................................................442
30-14. element_types Columns ..........................................................................................................445
30-15. enabled_roles Columns ..........................................................................................................447
30-16. key_column_usage Columns ....................................................................................................448
30-17. parameters Columns .................................................................................................................448
30-18. referential_constraints Columns.....................................................................................451
30-19. role_column_grants Columns................................................................................................452
30-20. role_routine_grants Columns .............................................................................................452

30-21. role_table_grants Columns..................................................................................................453
30-22. role_usage_grants Columns..................................................................................................454
30-23. routine_privileges Columns................................................................................................454
30-24. routines Columns .....................................................................................................................455
30-25. schemata Columns .....................................................................................................................459
30-26. sql_features Columns.............................................................................................................460
30-27. sql_implementation_info Columns.....................................................................................461
30-28. sql_languages Columns ..........................................................................................................461
30-29. sql_packages Columns.............................................................................................................462
30-30. sql_sizing Columns .................................................................................................................462
30-31. sql_sizing_profiles Columns .............................................................................................463
30-32. table_constraints Columns..................................................................................................463
30-33. table_privileges Columns ....................................................................................................464
30-34. tables Columns..........................................................................................................................465
30-35. triggers Columns .....................................................................................................................466
30-36. usage_privileges Columns ....................................................................................................467
30-37. view_column_usage Columns..................................................................................................467
30-38. view_table_usage Columns ....................................................................................................468
30-39. views Columns............................................................................................................................469
31-1. Equivalent C Types for Built-In SQL Types ..................................................................................488
31-2. B-tree Strategies .............................................................................................................................519
31-3. Hash Strategies ...............................................................................................................................519
31-4. R-tree Strategies .............................................................................................................................520
31-5. B-tree Support Functions................................................................................................................520
31-6. Hash Support Functions .................................................................................................................521
31-7. R-tree Support Functions................................................................................................................521
31-8. GiST Support Functions.................................................................................................................521
41-1. System Catalogs ...........................................................................................................................1026
41-2. pg_aggregate Columns.............................................................................................................1027
41-3. pg_am Columns............................................................................................................................1028
41-4. pg_amop Columns .......................................................................................................................1029
41-5. pg_amproc Columns ...................................................................................................................1029
41-6. pg_attrdef Columns .................................................................................................................1030
41-7. pg_attribute Columns.............................................................................................................1030
41-8. pg_cast Columns .......................................................................................................................1033
41-9. pg_class Columns .....................................................................................................................1034
41-10. pg_constraint Columns ........................................................................................................1037
41-11. pg_conversion Columns ........................................................................................................1038
41-12. pg_database Columns.............................................................................................................1039
41-13. pg_depend Columns .................................................................................................................1041
41-14. pg_description Columns ......................................................................................................1042
41-15. pg_group Columns ...................................................................................................................1043
41-16. pg_index Columns ...................................................................................................................1043
41-17. pg_inherits Columns.............................................................................................................1045
41-18. pg_language Columns.............................................................................................................1045
41-19. pg_largeobject Columns ......................................................................................................1046
41-20. pg_listener Columns.............................................................................................................1047
41-21. pg_namespace Columns...........................................................................................................1048

41-22. pg_opclass Columns ...............................................................................................................1048
41-23. pg_operator Columns.............................................................................................................1049
41-24. pg_proc Columns .....................................................................................................................1050
41-25. pg_rewrite Columns ...............................................................................................................1052
41-26. pg_shadow Columns .................................................................................................................1053
41-27. pg_statistic Columns...........................................................................................................1054
41-28. pg_tablespace Columns ........................................................................................................1055
41-29. pg_trigger Columns ...............................................................................................................1056
41-30. pg_type Columns .....................................................................................................................1057
41-31. System Views .............................................................................................................................1064
41-32. pg_indexes Columns ...............................................................................................................1064
41-33. pg_locks Columns ...................................................................................................................1065
41-34. pg_rules Columns ...................................................................................................................1066
41-35. pg_settings Columns.............................................................................................................1066
41-36. pg_stats Columns ...................................................................................................................1067
41-37. pg_tables Columns .................................................................................................................1070
41-38. pg_user Columns .....................................................................................................................1070
41-39. pg_views Columns ...................................................................................................................1071
49-1. Contents of PGDATA .....................................................................................................................1131
49-2. Overall Page Layout .....................................................................................................................1134
49-3. PageHeaderData Layout...............................................................................................................1135
49-4. HeapTupleHeaderData Layout .....................................................................................................1136
A-1. PostgreSQL Error Codes ...............................................................................................................1140
B-1. Month Names.................................................................................................................................1148
B-2. Day of the Week Names ................................................................................................................1148
B-3. Date/Time Field Modifiers.............................................................................................................1149
B-4. Time Zone Abbreviations for Input ...............................................................................................1149
B-5. Australian Time Zone Abbreviations for Input .............................................................................1152
B-6. Time Zone Names for Setting timezone .....................................................................................1153
C-1. SQL Key Words.............................................................................................................................1166

List of Figures
46-1. Structured Diagram of a Genetic Algorithm ................................................................................1122

List of Examples
8-1. Using the character types ...................................................................................................................88
8-2. Using the boolean type.....................................................................................................................97
8-3. Using the bit string types..................................................................................................................103
10-1. Exponentiation Operator Type Resolution .....................................................................................192
10-2. String Concatenation Operator Type Resolution............................................................................192
10-3. Absolute-Value and Negation Operator Type Resolution ..............................................................192
10-4. Rounding Function Argument Type Resolution.............................................................................194
10-5. Substring Function Type Resolution ..............................................................................................195
10-6. character Storage Type Conversion ...........................................................................................196

10-7. Type Resolution with Underspecified Types in a Union ................................................................197
10-8. Type Resolution in a Simple Union................................................................................................197
10-9. Type Resolution in a Transposed Union.........................................................................................197
11-1. Setting up a Partial Index to Exclude Common Values..................................................................204
11-2. Setting up a Partial Index to Exclude Uninteresting Values...........................................................205
11-3. Setting up a Partial Unique Index...................................................................................................206
19-1. Example pg_hba.conf entries .....................................................................................................300
19-2. An example pg_ident.conf file .................................................................................................305
27-1. libpq Example Program 1...............................................................................................................394
27-2. libpq Example Program 2...............................................................................................................396
27-3. libpq Example Program 3...............................................................................................................399
28-1. Large Objects with libpq Example Program ..................................................................................407
34-1. Manual Installation of PL/pgSQL ..................................................................................................560
35-1. A PL/pgSQL Trigger Procedure.....................................................................................................590
35-2. A PL/pgSQL Trigger Procedure For Auditing ...............................................................................591
35-3. A PL/pgSQL Trigger Procedure For Maintaining A Summary Table ...........................................592
35-4. Porting a Simple Function from PL/SQL to PL/pgSQL ................................................................595
35-5. Porting a Function that Creates Another Function from PL/SQL to PL/pgSQL ...........................595
35-6. Porting a Procedure With String Manipulation and OUT Parameters from PL/SQL to PL/pgSQL597
35-7. Porting a Procedure from PL/SQL to PL/pgSQL...........................................................................599

Preface
This book is the official documentation of PostgreSQL. It is being written by the PostgreSQL develop-
ers and other volunteers in parallel to the development of the PostgreSQL software. It describes all the
functionality that the current version of PostgreSQL officially supports.
To make the large amount of information about PostgreSQL manageable, this book has been organized
in several parts. Each part is targeted at a different class of users, or at users in different stages of their
PostgreSQL experience:

• Part I is an informal introduction for new users.


• Part II documents the SQL query language environment, including data types and functions, as well as
user-level performance tuning. Every PostgreSQL user should read this.
• Part III describes the installation and administration of the server. Everyone who runs a PostgreSQL
server, be it for private use or for others, should read this part.
• Part IV describes the programming interfaces for PostgreSQL client programs.
• Part V contains information for advanced users about the extensibility capabilities of the server. Topics
are, for instance, user-defined data types and functions.
• Part VI contains reference information about SQL commands, client and server programs. This part
supports the other parts with structured information sorted by command or program.
• Part VII contains assorted information that may be of use to PostgreSQL developers.

1. What is PostgreSQL?
PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES,
Version 4.2 [1], developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much
later.
PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the
SQL:2003 standard and offers many modern features:

• complex queries
• foreign keys
• triggers
• views
• transactional integrity
• multiversion concurrency control
Also, PostgreSQL can be extended by the user in many ways, for example by adding new

• data types
• functions
• operators
• aggregate functions
• index methods
• procedural languages

And because of the liberal license, PostgreSQL can be used, modified, and distributed by everyone free
of charge for any purpose, be it private, commercial, or academic.

1. http://s2k-ftp.CS.Berkeley.EDU:8000/postgres/postgres.html

2. A Brief History of PostgreSQL


The object-relational database management system now known as PostgreSQL is derived from the POST-
GRES package written at the University of California at Berkeley. With over a decade of development
behind it, PostgreSQL is now the most advanced open-source database available anywhere.

2.1. The Berkeley POSTGRES Project


The POSTGRES project, led by Professor Michael Stonebraker, was sponsored by the Defense Advanced
Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation
(NSF), and ESL, Inc. The implementation of POSTGRES began in 1986. The initial concepts for the
system were presented in The design of POSTGRES and the definition of the initial data model appeared
in The POSTGRES data model. The design of the rule system at that time was described in The design of
the POSTGRES rules system. The rationale and architecture of the storage manager were detailed in The
design of the POSTGRES storage system.
POSTGRES has undergone several major releases since then. The first “demoware” system became op-
erational in 1987 and was shown at the 1988 ACM-SIGMOD Conference. Version 1, described in The
implementation of POSTGRES, was released to a few external users in June 1989. In response to a critique
of the first rule system (A commentary on the POSTGRES rules system), the rule system was redesigned
(On Rules, Procedures, Caching and Views in Database Systems) and Version 2 was released in June 1990
with the new rule system. Version 3 appeared in 1991 and added support for multiple storage managers,
an improved query executor, and a rewritten rule system. For the most part, subsequent releases until
Postgres95 (see below) focused on portability and reliability.
POSTGRES has been used to implement many different research and production applications. These in-
clude: a financial data analysis system, a jet engine performance monitoring package, an asteroid tracking
database, a medical information database, and several geographic information systems. POSTGRES has
also been used as an educational tool at several universities. Finally, Illustra Information Technologies
(later merged into Informix [2], which is now owned by IBM [3]) picked up the code and commercialized it.
In late 1992, POSTGRES became the primary data manager for the Sequoia 2000 [4] scientific computing
project.
The size of the external user community nearly doubled during 1993. It became increasingly obvious that
maintenance of the prototype code and support was taking up large amounts of time that should have been
devoted to database research. In an effort to reduce this support burden, the Berkeley POSTGRES project
officially ended with Version 4.2.

2. http://www.informix.com/
3. http://www.ibm.com/
4. http://meteora.ucsd.edu/s2k/s2k_home.html


2.2. Postgres95
In 1994, Andrew Yu and Jolly Chen added a SQL language interpreter to POSTGRES. Under a new
name, Postgres95 was subsequently released to the web to find its own way in the world as an open-
source descendant of the original POSTGRES Berkeley code.
The Postgres95 code was written completely in ANSI C and trimmed in size by 25%. Many internal changes improved
performance and maintainability. Postgres95 release 1.0.x ran about 30-50% faster on the Wisconsin
Benchmark compared to POSTGRES, Version 4.2. Apart from bug fixes, the following were the major
enhancements:

• The query language PostQUEL was replaced with SQL (implemented in the server). Subqueries were
not supported until PostgreSQL (see below), but they could be imitated in Postgres95 with user-defined
SQL functions. Aggregate functions were re-implemented. Support for the GROUP BY query clause was
also added.
• A new program (psql) was provided for interactive SQL queries, which used GNU Readline. This
largely superseded the old monitor program.
• A new front-end library, libpgtcl, supported Tcl-based clients. A sample shell, pgtclsh, provided
new Tcl commands to interface Tcl programs with the Postgres95 server.
• The large-object interface was overhauled. The inversion large objects were the only mechanism for
storing large objects. (The inversion file system was removed.)
• The instance-level rule system was removed. Rules were still available as rewrite rules.
• A short tutorial introducing regular SQL features as well as those of Postgres95 was distributed with
the source code.
• GNU make (instead of BSD make) was used for the build. Also, Postgres95 could be compiled with an
unpatched GCC (data alignment of doubles was fixed).

2.3. PostgreSQL
By 1996, it became clear that the name “Postgres95” would not stand the test of time. We chose a new
name, PostgreSQL, to reflect the relationship between the original POSTGRES and the more recent ver-
sions with SQL capability. At the same time, we set the version numbering to start at 6.0, putting the
numbers back into the sequence originally begun by the Berkeley POSTGRES project.
The emphasis during development of Postgres95 was on identifying and understanding existing problems
in the server code. With PostgreSQL, the emphasis has shifted to augmenting features and capabilities,
although work continues in all areas.
Details about what has happened in PostgreSQL since then can be found in Appendix E.

3. Conventions
This book uses the following typographical conventions to mark certain portions of text: new terms,
foreign phrases, and other important passages are emphasized in italics. Everything that represents input or output of the computer, in particular commands, program code, and screen output, is shown in a
monospaced font (example). Within such passages, italics (example) indicate placeholders; you must
insert an actual value instead of the placeholder. On occasion, parts of program code are emphasized in
bold face (example), if they have been added or changed since the preceding example.
The following conventions are used in the synopsis of a command: brackets ([ and ]) indicate optional
parts. (In the synopsis of a Tcl command, question marks (?) are used instead, as is usual in Tcl.) Braces
({ and }) and vertical lines (|) indicate that you must choose one alternative. Dots (...) mean that the
preceding element can be repeated.
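As an illustration of this notation, consider the following synopsis (the command itself is hypothetical, chosen only so that every notational element appears once):

```
FROBNICATE name [ QUIETLY ] { FORWARD | BACKWARD } ( value [, ...] )
```

Here name and value are placeholders for actual values, QUIETLY may be omitted, exactly one of FORWARD or BACKWARD must be written, and value may be repeated, with the repetitions separated by commas.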
Where it enhances clarity, SQL commands are preceded by the prompt =>, and shell commands are
preceded by the prompt $. Normally, prompts are not shown, though.
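For instance, a session that starts psql from the shell and then issues an SQL command would be rendered along these lines (mydb is a placeholder database name):

```
$ psql mydb
=> SELECT version();
```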
An administrator is generally a person who is in charge of installing and running the server. A user
could be anyone who is using, or wants to use, any part of the PostgreSQL system. These terms should
not be interpreted too narrowly; this book does not have fixed presumptions about system administration
procedures.

4. Further Information
Besides the documentation, that is, this book, there are other resources about PostgreSQL:

FAQs
The FAQ list contains continuously updated answers to frequently asked questions.
READMEs
README files are available for most contributed packages.

Web Site
The PostgreSQL web site [5] carries details on the latest release and other information to make your
work or play with PostgreSQL more productive.
Mailing Lists
The mailing lists are a good place to have your questions answered, to share experiences with other
users, and to contact the developers. Consult the PostgreSQL web site for details.
Yourself!
PostgreSQL is an open-source project. As such, it depends on the user community for ongoing sup-
port. As you begin to use PostgreSQL, you will rely on others for help, either through the documen-
tation or through the mailing lists. Consider contributing your knowledge back. Read the mailing
lists and answer questions. If you learn something which is not in the documentation, write it up and
contribute it. If you add features to the code, contribute them.

5. http://www.postgresql.org


5. Bug Reporting Guidelines


When you find a bug in PostgreSQL we want to hear about it. Your bug reports play an important part
in making PostgreSQL more reliable because even the utmost care cannot guarantee that every part of
PostgreSQL will work on every platform under every circumstance.
The following suggestions are intended to assist you in forming bug reports that can be handled in an
effective fashion. No one is required to follow them but doing so tends to be to everyone’s advantage.
We cannot promise to fix every bug right away. If the bug is obvious, critical, or affects a lot of users,
chances are good that someone will look into it. It could also happen that we tell you to update to a newer
version to see if the bug happens there. Or we might decide that the bug cannot be fixed before some
major rewrite we might be planning is done. Or perhaps it is simply too hard and there are more important
things on the agenda. If you need help immediately, consider obtaining a commercial support contract.

5.1. Identifying Bugs


Before you report a bug, please read and re-read the documentation to verify that you can really do
whatever it is you are trying. If it is not clear from the documentation whether you can do something or
not, please report that too; it is a bug in the documentation. If it turns out that a program does something
different from what the documentation says, that is a bug. That might include, but is not limited to, the
following circumstances:

• A program terminates with a fatal signal or an operating system error message that would point to a
problem in the program. (A counterexample might be a “disk full” message, since you have to fix that
yourself.)
• A program produces the wrong output for any given input.
• A program refuses to accept valid input (as defined in the documentation).
• A program accepts invalid input without a notice or error message. But keep in mind that your idea of
invalid input might be our idea of an extension or compatibility with traditional practice.
• PostgreSQL fails to compile, build, or install according to the instructions on supported platforms.
Here “program” refers to any executable, not only the backend server.
Being slow or resource-hogging is not necessarily a bug. Read the documentation or ask on one of the
mailing lists for help in tuning your applications. Failing to comply with the SQL standard is not necessarily
a bug either, unless compliance for the specific feature is explicitly claimed.
Before you continue, check on the TODO list and in the FAQ to see if your bug is already known. If you
cannot decode the information on the TODO list, report your problem. The least we can do is make the
TODO list clearer.

5.2. What to report


The most important thing to remember about bug reporting is to state all the facts and only facts. Do not
speculate what you think went wrong, what “it seemed to do”, or which part of the program has a fault.
If you are not familiar with the implementation you would probably guess wrong and not help us a bit.
And even if you are, educated explanations are a great supplement to but no substitute for facts. If we are
going to fix the bug we still have to see it happen for ourselves first. Reporting the bare facts is relatively


straightforward (you can probably copy and paste them from the screen) but all too often important details
are left out because someone thought they did not matter or assumed the report would be understood anyway.
The following items should be contained in every bug report:

• The exact sequence of steps from program start-up necessary to reproduce the problem. This should
be self-contained; it is not enough to send in a bare SELECT statement without the preceding CREATE
TABLE and INSERT statements, if the output should depend on the data in the tables. We do not have
the time to reverse-engineer your database schema, and if we are supposed to make up our own data we
would probably miss the problem.
The best format for a test case for SQL-related problems is a file that can be run through the psql
frontend that shows the problem. (Be sure to not have anything in your ~/.psqlrc start-up file.) An
easy start at this file is to use pg_dump to dump out the table declarations and data needed to set the
scene, then add the problem query. You are encouraged to minimize the size of your example, but this
is not absolutely necessary. If the bug is reproducible, we will find it either way.
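For instance, such a self-contained test case might be assembled like this (the table name and query here are only placeholders; substitute the objects actually involved in your problem, and use a scratch database for the final step):

```
$ pg_dump --table=weather mydb > testcase.sql
$ echo 'SELECT * FROM weather WHERE ...;' >> testcase.sql
$ psql -f testcase.sql mytestdb
```

Running the resulting file against a fresh database should then reproduce the misbehavior on its own.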
If your application uses some other client interface, such as PHP, then please try to isolate the offending
queries. We will probably not set up a web server to reproduce your problem. In any case remember
to provide the exact input files; do not guess that the problem happens for “large files” or “midsize
databases”, etc. since this information is too inexact to be of use.

• The output you got. Please do not say that it “didn’t work” or “crashed”. If there is an error message,
show it, even if you do not understand it. If the program terminates with an operating system error,
say which. If nothing at all happens, say so. Even if the result of your test case is a program crash or
otherwise obvious it might not happen on our platform. The easiest thing is to copy the output from the
terminal, if possible.

Note: If you are reporting an error message, please obtain the most verbose form of the message.
In psql, say \set VERBOSITY verbose beforehand. If you are extracting the message from the
server log, set the run-time parameter log_error_verbosity to verbose so that all details are logged.
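For example, in an interactive psql session you might set the verbosity before provoking the error (the failing query here is just an illustration):

```
mydb=> \set VERBOSITY verbose
mydb=> SELECT 1/0;
```

With the verbosity set to verbose, the error report also includes the SQLSTATE error code and the source-code location at which the error was raised, which is exactly the extra detail a bug report benefits from.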

Note: In case of fatal errors, the error message reported by the client might not contain all the
information available. Please also look at the log output of the database server. If you do not keep
your server’s log output, this would be a good time to start doing so.

• The output you expected is very important to state. If you just write “This command gives me that
output.” or “This is not what I expected.”, we might run it ourselves, scan the output, and think it
looks OK and is exactly what we expected. We should not have to spend the time to decode the exact
semantics behind your commands. Especially refrain from merely saying that “This is not what SQL
says/Oracle does.” Digging out the correct behavior from SQL is not a fun undertaking, nor do we all
know how all the other relational databases out there behave. (If your problem is a program crash, you
can obviously omit this item.)


• Any command line options and other start-up options, including any relevant environment variables or
configuration files that you changed from the default. Again, please provide exact information. If you
are using a prepackaged distribution that starts the database server at boot time, you should try to find
out how that is done.
• Anything you did at all differently from the installation instructions.
• The PostgreSQL version. You can run the command SELECT version(); to find out the version of
the server you are connected to. Most executable programs also support a --version option; at least
postmaster --version and psql --version should work. If the function or the options do not
exist then your version is more than old enough to warrant an upgrade. If you run a prepackaged version,
such as RPMs, say so, including any subversion the package may have. If you are talking about a CVS
snapshot, mention that, including its date and time.
If your version is older than 8.0.0 we will almost certainly tell you to upgrade. There are many bug
fixes and improvements in each new release, so it is quite possible that a bug you have encountered in
an older release of PostgreSQL has already been fixed. We can only provide limited support for sites
using older releases of PostgreSQL; if you require more than we can provide, consider acquiring a
commercial support contract.

• Platform information. This includes the kernel name and version, C library, processor, memory infor-
mation, and so on. In most cases it is sufficient to report the vendor and version, but do not assume
everyone knows what exactly “Debian” contains or that everyone runs on Pentiums. If you have instal-
lation problems then information about the toolchain on your machine (compiler, make, and so on) is
also necessary.
Do not be afraid if your bug report becomes rather lengthy. That is a fact of life. It is better to report
everything the first time than for us to have to squeeze the facts out of you. On the other hand, if your input
files are huge, it is fair to ask first whether somebody is interested in looking into it.
Do not spend all your time trying to figure out which changes in the input make the problem go away. This will
probably not help solve it. If it turns out that the bug cannot be fixed right away, you will still have time
to find and share your work-around. Also, once again, do not waste your time guessing why the bug exists.
We will find that out soon enough.
When writing a bug report, please avoid confusing terminology. The software package in total is called
“PostgreSQL”, sometimes “Postgres” for short. If you are specifically talking about the backend server,
mention that, do not just say “PostgreSQL crashes”. A crash of a single backend server process is quite
different from a crash of the parent “postmaster” process; please don’t say “the postmaster crashed” when
you mean a single backend process went down, nor vice versa. Also, client programs such as the interactive
frontend “psql” are completely separate from the backend. Please try to be specific about whether the
problem is on the client or server side.

5.3. Where to report bugs


In general, send bug reports to the bug report mailing list at <[email protected]>. You are
requested to use a descriptive subject for your email message, perhaps parts of the error message.
Another method is to fill in the bug report web-form available at the project’s web site
http://www.postgresql.org/. Entering a bug report this way causes it to be mailed to the


<[email protected]> mailing list.


Do not send bug reports to any of the user mailing lists, such as <[email protected]> or
<[email protected]>. These mailing lists are for answering user questions, and their
subscribers normally do not wish to receive bug reports. More importantly, they are unlikely to fix them.
Also, please do not send reports to the developers’ mailing list <[email protected]>.
This list is for discussing the development of PostgreSQL, and it would be nice if we could keep the bug
reports separate. We might choose to take up a discussion about your bug report on pgsql-hackers, if
the problem needs more review.
If you have a problem with the documentation, the best place to report it is the documentation mailing
list <[email protected]>. Please be specific about what part of the documentation you are
unhappy with.
If your bug is a portability problem on a non-supported platform, send mail to
<[email protected]>, so we (and you) can work on porting PostgreSQL to your
platform.

Note: Due to the unfortunate amount of spam going around, all of the above email addresses are
closed mailing lists. That is, you need to be subscribed to a list to be allowed to post on it. (You need
not be subscribed to use the bug-report web form, however.) If you would like to send mail but do not
want to receive list traffic, you can subscribe and set your subscription option to nomail. For more
information send mail to <[email protected]> with the single word help in the body of the
message.

I. Tutorial
Welcome to the PostgreSQL Tutorial. The following few chapters are intended to give a simple introduc-
tion to PostgreSQL, relational database concepts, and the SQL language to those who are new to any one
of these aspects. We only assume some general knowledge about how to use computers. No particular
Unix or programming experience is required. This part is mainly intended to give you some hands-on
experience with important aspects of the PostgreSQL system. It makes no attempt to be a complete or
thorough treatment of the topics it covers.
After you have worked through this tutorial you might want to move on to reading Part II to gain a more
formal knowledge of the SQL language, or Part IV for information about developing applications for
PostgreSQL. Those who set up and manage their own server should also read Part III.
Chapter 1. Getting Started

1.1. Installation
Before you can use PostgreSQL you need to install it, of course. It is possible that PostgreSQL is already
installed at your site, either because it was included in your operating system distribution or because
the system administrator already installed it. If that is the case, you should obtain information from the
operating system documentation or your system administrator about how to access PostgreSQL.
If you are not sure whether PostgreSQL is already available or whether you can use it for your experimen-
tation then you can install it yourself. Doing so is not hard and it can be a good exercise. PostgreSQL can
be installed by any unprivileged user; no superuser (root) access is required.
If you are installing PostgreSQL yourself, then refer to Chapter 14 for instructions on installation, and
return to this guide when the installation is complete. Be sure to follow closely the section about setting
up the appropriate environment variables.
If your site administrator has not set things up in the default way, you may have some more work to do. For
example, if the database server machine is a remote machine, you will need to set the PGHOST environment
variable to the name of the database server machine. The environment variable PGPORT may also have to
be set. The bottom line is this: if you try to start an application program and it complains that it cannot
connect to the database, you should consult your site administrator or, if that is you, the documentation
to make sure that your environment is properly set up. If you did not understand the preceding paragraph
then read the next section.
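As a sketch, a remote-server setup might look like this in a Bourne-style shell (the host name and port are made-up values for illustration; substitute your site's actual ones):

```shell
# Hypothetical connection settings; use your real server's host and port.
export PGHOST=db.example.com
export PGPORT=5432

# Client programs such as psql and createdb read these variables
# and will attempt to connect to that server by default.
echo "clients will connect to $PGHOST, port $PGPORT"
```

Setting these once in your shell start-up file saves repeating connection options on every command.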

1.2. Architectural Fundamentals


Before we proceed, you should understand the basic PostgreSQL system architecture. Understanding how
the parts of PostgreSQL interact will make this chapter somewhat clearer.
In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the follow-
ing cooperating processes (programs):

• A server process, which manages the database files, accepts connections to the database from client
applications, and performs actions on the database on behalf of the clients. The database server program
is called postmaster.
• The user’s client (frontend) application that wants to perform database operations. Client applications
can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server
that accesses the database to display web pages, or a specialized database maintenance tool. Some client
applications are supplied with the PostgreSQL distribution; most are developed by users.

As is typical of client/server applications, the client and the server can be on different hosts. In that case
they communicate over a TCP/IP network connection. You should keep this in mind, because the files that
can be accessed on a client machine might not be accessible (or might only be accessible using a different
file name) on the database server machine.


The PostgreSQL server can handle multiple concurrent connections from clients. For that purpose it starts
(“forks”) a new process for each connection. From that point on, the client and the new server process
communicate without intervention by the original postmaster process. Thus, the postmaster is always
running, waiting for client connections, whereas client and associated server processes come and go. (All
of this is of course invisible to the user. We only mention it here for completeness.)

1.3. Creating a Database


The first test to see whether you can access the database server is to try to create a database. A running
PostgreSQL server can manage many databases. Typically, a separate database is used for each project or
for each user.
Possibly, your site administrator has already created a database for your use. He should have told you
what the name of your database is. In that case you can omit this step and skip ahead to the next section.
To create a new database, in this example named mydb, you use the following command:

$ createdb mydb

This should produce as response:

CREATE DATABASE

If so, this step was successful and you can skip over the remainder of this section.
If you see a message similar to

createdb: command not found

then PostgreSQL was not installed properly. Either it was not installed at all or the search path was not set
correctly. Try calling the command with an absolute path instead:

$ /usr/local/pgsql/bin/createdb mydb

The path at your site might be different. Contact your site administrator or check back in the installation
instructions to correct the situation.
Another response could be this:

createdb: could not connect to database template1: could not connect to server:
        No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

This means that the server was not started, or it was not started where createdb expected it. Again, check
the installation instructions or consult the administrator.
Another response could be this:

createdb: could not connect to database template1: FATAL: user "joe" does not exist

where your own login name is mentioned. This will happen if the administrator has not created a PostgreSQL user account for you. (PostgreSQL user accounts are distinct from operating system user accounts.) If you are the administrator, see Chapter 17 for help creating accounts. You will need to become
the operating system user under which PostgreSQL was installed (usually postgres) to create the first
user account. It could also be that you were assigned a PostgreSQL user name that is different from your
operating system user name; in that case you need to use the -U switch or set the PGUSER environment
variable to specify your PostgreSQL user name.
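For instance, if your assigned PostgreSQL user name were alice (a made-up name for illustration), either of the following would work:

```
$ createdb -U alice mydb

$ export PGUSER=alice
$ createdb mydb
```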
If you have a user account but it does not have the privileges required to create a database, you will see
the following:

createdb: database creation failed: ERROR: permission denied to create database

Not every user has authorization to create new databases. If PostgreSQL refuses to create databases for
you then the site administrator needs to grant you permission to create databases. Consult your site ad-
ministrator if this occurs. If you installed PostgreSQL yourself then you should log in for the purposes of
this tutorial under the user account that you started the server as. 1
You can also create databases with other names. PostgreSQL allows you to create any number of databases
at a given site. Database names must have an alphabetic first character and are limited to 63 characters in
length. A convenient choice is to create a database with the same name as your current user name. Many
tools assume that database name as the default, so it can save you some typing. To create that database,
simply type

$ createdb

If you do not want to use your database anymore you can remove it. For example, if you are the owner
(creator) of the database mydb, you can destroy it using the following command:

$ dropdb mydb

(For this command, the database name does not default to the user account name. You always need to
specify it.) This action physically removes all files associated with the database and cannot be undone, so
this should only be done with a great deal of forethought.
More about createdb and dropdb may be found in createdb and dropdb respectively.

1.4. Accessing a Database


Once you have created a database, you can access it by:

• Running the PostgreSQL interactive terminal program, called psql, which allows you to interactively
enter, edit, and execute SQL commands.
• Using an existing graphical frontend tool like PgAccess or an office suite with ODBC support to create
and manipulate a database. These possibilities are not covered in this tutorial.
• Writing a custom application, using one of the several available language bindings. These possibilities
are discussed further in Part IV.

1. As an explanation for why this works: PostgreSQL user names are separate from operating system user accounts. If you
connect to a database, you can choose what PostgreSQL user name to connect as; if you don’t, it will default to the same name
as your current operating system account. As it happens, there will always be a PostgreSQL user account that has the same name
as the operating system user that started the server, and it also happens that that user always has permission to create databases.
Instead of logging in as that user you can also specify the -U option everywhere to select a PostgreSQL user name to connect as.
You probably want to start up psql, to try out the examples in this tutorial. It can be activated for the
mydb database by typing the command:

$ psql mydb

If you leave off the database name then it will default to your user account name. You already discovered
this scheme in the previous section.
In psql, you will be greeted with the following message:

Welcome to psql 8.0.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

mydb=>

The last line could also be

mydb=#

That would mean you are a database superuser, which is most likely the case if you installed PostgreSQL
yourself. Being a superuser means that you are not subject to access controls. For the purpose of this
tutorial this is not of importance.
If you encounter problems starting psql then go back to the previous section. The diagnostics of
createdb and psql are similar, and if the former worked the latter should work as well.

The last line printed out by psql is the prompt, and it indicates that psql is listening to you and that you
can type SQL queries into a work space maintained by psql. Try out these commands:

mydb=> SELECT version();
                            version
----------------------------------------------------------------
 PostgreSQL 8.0.0 on i586-pc-linux-gnu, compiled by GCC 2.96
(1 row)

mydb=> SELECT current_date;
    date
------------
 2002-08-31
(1 row)

mydb=> SELECT 2 + 2;
 ?column?
----------
        4
(1 row)


The psql program has a number of internal commands that are not SQL commands. They begin with
the backslash character, “\”. Some of these commands were listed in the welcome message. For example,
you can get help on the syntax of various PostgreSQL SQL commands by typing:

mydb=> \h

To get out of psql, type

mydb=> \q

and psql will quit and return you to your command shell. (For more internal commands, type \? at the
psql prompt.) The full capabilities of psql are documented in psql. If PostgreSQL is installed correctly
you can also type man psql at the operating system shell prompt to see the documentation. In this tutorial
we will not use these features explicitly, but you can use them yourself when you see fit.

Chapter 2. The SQL Language

2.1. Introduction
This chapter provides an overview of how to use SQL to perform simple operations. This tutorial is only
intended to give you an introduction and is in no way a complete tutorial on SQL. Numerous books have
been written on SQL, including Understanding the New SQL and A Guide to the SQL Standard. You
should be aware that some PostgreSQL language features are extensions to the standard.
In the examples that follow, we assume that you have created a database named mydb, as described in the
previous chapter, and have started psql.
Examples in this manual can also be found in the PostgreSQL source distribution in the directory
src/tutorial/. To use those files, first change to that directory and run make:

$ cd ..../src/tutorial
$ make

This creates the scripts and compiles the C files containing user-defined functions and types. (You must
use GNU make for this — it may be named something different on your system, often gmake.) Then, to
start the tutorial, do the following:

$ cd ..../src/tutorial
$ psql -s mydb
...

mydb=> \i basics.sql

The \i command reads in commands from the specified file. The -s option puts you in single step mode
which pauses before sending each statement to the server. The commands used in this section are in the
file basics.sql.

2.2. Concepts
PostgreSQL is a relational database management system (RDBMS). That means it is a system for man-
aging data stored in relations. Relation is essentially a mathematical term for table. The notion of storing
data in tables is so commonplace today that it might seem inherently obvious, but there are a number of
other ways of organizing databases. Files and directories on Unix-like operating systems form an example
of a hierarchical database. A more modern development is the object-oriented database.
Each table is a named collection of rows. Each row of a given table has the same set of named columns,
and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important
to remember that SQL does not guarantee the order of the rows within the table in any way (although they
can be explicitly sorted for display).
Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server
instance constitutes a database cluster.


2.3. Creating a New Table


You can create a new table by specifying the table name, along with all column names and their types:

CREATE TABLE weather (
    city            varchar(80),
    temp_lo         int,           -- low temperature
    temp_hi         int,           -- high temperature
    prcp            real,          -- precipitation
    date            date
);

You can enter this into psql with the line breaks. psql will recognize that the command is not terminated
until the semicolon.
White space (i.e., spaces, tabs, and newlines) may be used freely in SQL commands. That means you can
type the command aligned differently than above, or even all on one line. Two dashes (“--”) introduce
comments. Whatever follows them is ignored up to the end of the line. SQL is case insensitive about key
words and identifiers, except when identifiers are double-quoted to preserve the case (not done above).
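To illustrate the case-folding rule, consider the following (the double-quoted form would only succeed if a table had actually been created with that exact mixed-case name):

```sql
SELECT * FROM WEATHER;     -- folded to lower case: the same table as weather
SELECT * FROM Weather;     -- likewise the same table
SELECT * FROM "Weather";   -- case preserved: refers to a different table
```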
varchar(80) specifies a data type that can store arbitrary character strings up to 80 characters in length.
int is the normal integer type. real is a type for storing single precision floating-point numbers. date
should be self-explanatory. (Yes, the column of type date is also named date. This may be convenient
or confusing — you choose.)
PostgreSQL supports the standard SQL types int, smallint, real, double precision, char(N),
varchar(N), date, time, timestamp, and interval, as well as other types of general utility and a
rich set of geometric types. PostgreSQL can be customized with an arbitrary number of user-defined data
types. Consequently, type names are not syntactical key words, except where required to support special
cases in the SQL standard.
The second example will store cities and their associated geographical location:

CREATE TABLE cities (
    name            varchar(80),
    location        point
);

The point type is an example of a PostgreSQL-specific data type.


Finally, it should be mentioned that if you don’t need a table any longer or want to recreate it differently
you can remove it using the following command:

DROP TABLE tablename;

2.4. Populating a Table With Rows


The INSERT statement is used to populate a table with rows:

INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');


Note that all data types use rather obvious input formats. Constants that are not simple numeric values
usually must be surrounded by single quotes ('), as in the example. The date type is actually quite flexible
in what it accepts, but for this tutorial we will stick to the unambiguous format shown here.
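For example, under the default DateStyle setting these all denote the same day:

```sql
SELECT DATE '1994-11-27';
SELECT DATE 'November 27, 1994';
SELECT DATE '11/27/1994';    -- read as month/day under the default (MDY) setting
```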
The point type requires a coordinate pair as input, as shown here:

INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');

The syntax used so far requires you to remember the order of the columns. An alternative syntax allows
you to list the columns explicitly:

INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
    VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');

You can list the columns in a different order if you wish or even omit some columns, e.g., if the precipi-
tation is unknown:

INSERT INTO weather (date, city, temp_hi, temp_lo)
    VALUES ('1994-11-29', 'Hayward', 54, 37);

Many developers consider explicitly listing the columns better style than relying on the order implicitly.
Please enter all the commands shown above so you have some data to work with in the following sections.
You could also have used COPY to load large amounts of data from flat-text files. This is usually faster
because the COPY command is optimized for this application while allowing less flexibility than INSERT.
An example would be:

COPY weather FROM '/home/user/weather.txt';

where the file name for the source file must be available to the backend server machine, not the client,
since the backend server reads the file directly. You can read more about the COPY command in COPY.

2.5. Querying a Table


To retrieve data from a table, the table is queried. An SQL SELECT statement is used to do this. The
statement is divided into a select list (the part that lists the columns to be returned), a table list (the part
that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies
any restrictions). For example, to retrieve all the rows of table weather, type:

SELECT * FROM weather;

Here * is a shorthand for “all columns”. 1 So the same result would be had with:

SELECT city, temp_lo, temp_hi, prcp, date FROM weather;

The output should be:

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
 San Francisco |      43 |      57 |    0 | 1994-11-29
 Hayward       |      37 |      54 |      | 1994-11-29
(3 rows)

1. While SELECT * is useful for off-the-cuff queries, it is widely considered bad style in production code, since adding a column
to the table would change the results.

You can write expressions, not just simple column references, in the select list. For example, you can do:

SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;

This should give:

     city      | temp_avg |    date
---------------+----------+------------
 San Francisco |       48 | 1994-11-27
 San Francisco |       50 | 1994-11-29
 Hayward       |       45 | 1994-11-29
(3 rows)

Notice how the AS clause is used to relabel the output column. (The AS clause is optional.)
A query can be “qualified” by adding a WHERE clause that specifies which rows are wanted. The WHERE
clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is
true are returned. The usual Boolean operators (AND, OR, and NOT) are allowed in the qualification. For
example, the following retrieves the weather of San Francisco on rainy days:

SELECT * FROM weather
    WHERE city = 'San Francisco' AND prcp > 0.0;

Result:

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
(1 row)

You can request that the results of a query be returned in sorted order:

SELECT * FROM weather
    ORDER BY city;

     city      | temp_lo | temp_hi | prcp |    date
---------------+---------+---------+------+------------
 Hayward       |      37 |      54 |      | 1994-11-29
 San Francisco |      43 |      57 |    0 | 1994-11-29
 San Francisco |      46 |      50 | 0.25 | 1994-11-27
(3 rows)

In this example, the sort order isn’t fully specified, and so you might get the San Francisco rows in either
order. But you’d always get the results shown above if you do

SELECT * FROM weather
    ORDER BY city, temp_lo;

You can request that duplicate rows be removed from the result of a query:

SELECT DISTINCT city
    FROM weather;

city
---------------
Hayward
San Francisco
(2 rows)

Here again, the result row ordering might vary. You can ensure consistent results by using DISTINCT and
ORDER BY together: 2

SELECT DISTINCT city
    FROM weather
    ORDER BY city;

2.6. Joins Between Tables


Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or
access the same table in such a way that multiple rows of the table are being processed at the same time.
A query that accesses multiple rows of the same or different tables at one time is called a join query. As
an example, say you wish to list all the weather records together with the location of the associated city.
To do that, we need to compare the city column of each row of the weather table with the name column
of all rows in the cities table, and select the pairs of rows where these values match.

Note: This is only a conceptual model. The join is usually performed in a more efficient manner than
actually comparing each possible pair of rows, but this is invisible to the user.

This would be accomplished by the following query:

SELECT *
FROM weather, cities
WHERE city = name;

city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(2 rows)

2. In some database systems, including older versions of PostgreSQL, the implementation of DISTINCT automatically orders the
rows and so ORDER BY is redundant. But this is not required by the SQL standard, and current PostgreSQL doesn’t guarantee that
DISTINCT causes the rows to be ordered.


Observe two things about the result set:

• There is no result row for the city of Hayward. This is because there is no matching entry in the cities
table for Hayward, so the join ignores the unmatched rows in the weather table. We will see shortly how
this can be fixed.
• There are two columns containing the city name. This is correct because the lists of columns of the
weather and the cities table are concatenated. In practice this is undesirable, though, so you will
probably want to list the output columns explicitly rather than using *:
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;

Exercise: Attempt to find out the semantics of this query when the WHERE clause is omitted.
Since the columns all had different names, the parser automatically found out which table they belonged to,
but it is good style to fully qualify column names in join queries:

SELECT weather.city, weather.temp_lo, weather.temp_hi,
    weather.prcp, weather.date, cities.location
    FROM weather, cities
    WHERE cities.name = weather.city;

Join queries of the kind seen thus far can also be written in this alternative form:

SELECT *
FROM weather INNER JOIN cities ON (weather.city = cities.name);

This syntax is not as commonly used as the one above, but we show it here to help you understand the
following topics.
Now we will figure out how we can get the Hayward records back in. What we want the query to do is to
scan the weather table and for each row to find the matching cities row. If no matching row is found
we want some “empty values” to be substituted for the cities table’s columns. This kind of query is
called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:

SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);

city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)

This query is called a left outer join because the table mentioned on the left of the join operator will have
each of its rows in the output at least once, whereas the table on the right will only have those rows output
that match some row of the left table. When outputting a left-table row for which there is no right-table
match, empty (null) values are substituted for the right-table columns.


Exercise: There are also right outer joins and full outer joins. Try to find out what those do.
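As a starting point for the exercise, both variants use the same ON syntax as the left join above; a sketch against the same tables (results omitted, so the exercise is still worth running yourself):

```sql
-- Right outer join: every row of cities appears at least once;
-- the weather columns are null where no weather row matches.
SELECT *
    FROM weather RIGHT OUTER JOIN cities ON (weather.city = cities.name);

-- Full outer join: unmatched rows from both tables are kept,
-- padded with nulls on the side that has no match.
SELECT *
    FROM weather FULL OUTER JOIN cities ON (weather.city = cities.name);
```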
We can also join a table against itself. This is called a self join. As an example, suppose we wish to find
all the weather records that are in the temperature range of other weather records. So we need to compare
the temp_lo and temp_hi columns of each weather row to the temp_lo and temp_hi columns of all
other weather rows. We can do this with the following query:

SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
    W2.city, W2.temp_lo AS low, W2.temp_hi AS high
    FROM weather W1, weather W2
    WHERE W1.temp_lo < W2.temp_lo
    AND W1.temp_hi > W2.temp_hi;

city | low | high | city | low | high
---------------+-----+------+---------------+-----+------
San Francisco | 43 | 57 | San Francisco | 46 | 50
Hayward | 37 | 54 | San Francisco | 46 | 50
(2 rows)

Here we have relabeled the weather table as W1 and W2 to be able to distinguish the left and right side of
the join. You can also use these kinds of aliases in other queries to save some typing, e.g.:

SELECT *
FROM weather w, cities c
WHERE w.city = c.name;

You will encounter this style of abbreviating quite frequently.

2.7. Aggregate Functions


Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate
function computes a single result from multiple input rows. For example, there are aggregates to compute
the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.
As an example, we can find the highest low-temperature reading anywhere with

SELECT max(temp_lo) FROM weather;

max
-----
46
(1 row)

If we wanted to know what city (or cities) that reading occurred in, we might try

SELECT city FROM weather WHERE temp_lo = max(temp_lo); WRONG

but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction
exists because the WHERE clause determines the rows that will go into the aggregation stage; so it has to
be evaluated before aggregate functions are computed.) However, as is often the case the query can be
restated to accomplish the intended result, here by using a subquery:


SELECT city FROM weather
    WHERE temp_lo = (SELECT max(temp_lo) FROM weather);

city
---------------
San Francisco
(1 row)

This is OK because the subquery is an independent computation that computes its own aggregate sepa-
rately from what is happening in the outer query.
Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the
maximum low temperature observed in each city with

SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city;

city | max
---------------+-----
Hayward | 37
San Francisco | 46
(2 rows)

which gives us one output row per city. Each aggregate result is computed over the table rows matching
that city. We can filter these grouped rows using HAVING:

SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40;

city | max
---------+-----
Hayward | 37
(1 row)

which gives us the same results for only the cities that have all temp_lo values below 40. Finally, if we
only care about cities whose names begin with “S”, we might do

SELECT city, max(temp_lo)
    FROM weather
    WHERE city LIKE 'S%' ➊
    GROUP BY city
    HAVING max(temp_lo) < 40;

➊ The LIKE operator does pattern matching and is explained in Section 9.7.

It is important to understand the interaction between aggregates and SQL’s WHERE and HAVING clauses.
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups
and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas
HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not
contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will
be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions.
(Strictly speaking, you are allowed to write a HAVING clause that doesn’t use aggregates, but it’s wasteful.
The same condition could be used more efficiently at the WHERE stage.)
In the previous example, we can apply the city name restriction in WHERE, since it needs no aggregate.
This is more efficient than adding the restriction to HAVING, because we avoid doing the grouping and
aggregate calculations for all rows that fail the WHERE check.

2.8. Updates
You can update existing rows using the UPDATE command. Suppose you discover the temperature readings
are all off by 2 degrees as of November 28. You may update the data as follows:

UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';

Look at the new state of the data:

SELECT * FROM weather;

city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
San Francisco | 46 | 50 | 0.25 | 1994-11-27
San Francisco | 41 | 55 | 0 | 1994-11-29
Hayward | 35 | 52 | | 1994-11-29
(3 rows)

2.9. Deletions
Rows can be removed from a table using the DELETE command. Suppose you are no longer interested in
the weather of Hayward. Then you can do the following to delete those rows from the table:

DELETE FROM weather WHERE city = 'Hayward';

All weather records belonging to Hayward are removed.

SELECT * FROM weather;

city | temp_lo | temp_hi | prcp | date
---------------+---------+---------+------+------------
San Francisco | 46 | 50 | 0.25 | 1994-11-27
San Francisco | 41 | 55 | 0 | 1994-11-29
(2 rows)


One should be wary of statements of the form

DELETE FROM tablename;

Without a qualification, DELETE will remove all rows from the given table, leaving it empty. The system
will not request confirmation before doing this!

Chapter 3. Advanced Features

3.1. Introduction
In the previous chapter we have covered the basics of using SQL to store and access your data in Post-
greSQL. We will now discuss some more advanced features of SQL that simplify management and prevent
loss or corruption of your data. Finally, we will look at some PostgreSQL extensions.
This chapter will on occasion refer to examples found in Chapter 2 to change or improve them, so it will
be of advantage if you have read that chapter. Some examples from this chapter can also be found in
advanced.sql in the tutorial directory. This file also contains some example data to load, which is not
repeated here. (Refer to Section 2.1 for how to use the file.)

3.2. Views
Refer back to the queries in Section 2.6. Suppose the combined listing of weather records and city location
is of particular interest to your application, but you do not want to type the query each time you need it.
You can create a view over the query, which gives a name to the query that you can refer to like an ordinary
table.

CREATE VIEW myview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

SELECT * FROM myview;

Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate
the details of the structure of your tables, which may change as your application evolves, behind consistent
interfaces.
Views can be used in almost any place a real table can be used. Building views upon other views is not
uncommon.

3.3. Foreign Keys


Recall the weather and cities tables from Chapter 2. Consider the following problem: You want to
make sure that no one can insert rows in the weather table that do not have a matching entry in the
cities table. This is called maintaining the referential integrity of your data. In simplistic database
systems this would be implemented (if at all) by first looking at the cities table to check if a matching
record exists, and then inserting or rejecting the new weather records. This approach has a number of
problems and is very inconvenient, so PostgreSQL can do this for you.
The new declaration of the tables would look like this:

CREATE TABLE cities (
    city     varchar(80) primary key,
    location point
);

CREATE TABLE weather (
    city    varchar(80) references cities(city),
    temp_lo int,
    temp_hi int,
    prcp    real,
    date    date
);

Now try inserting an invalid record:

INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');

ERROR:  insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
DETAIL: Key (city)=(Berkeley) is not present in table "cities".

The behavior of foreign keys can be finely tuned to your application. We will not go beyond this simple
example in this tutorial, but just refer you to Chapter 5 for more information. Making correct use of foreign
keys will definitely improve the quality of your database applications, so you are strongly encouraged to
learn about them.
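As one taste of that tuning, a referential action can be attached to the constraint. The sketch below is not part of the tutorial schema above; ON DELETE CASCADE is just one of the available actions, chosen to show that deleting a city can be made to delete its weather records as well:

```sql
CREATE TABLE weather (
    -- Deleting a row from cities now also deletes its weather rows.
    city    varchar(80) references cities(city) ON DELETE CASCADE,
    temp_lo int,
    temp_hi int,
    prcp    real,
    date    date
);
```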

3.4. Transactions
Transactions are a fundamental concept of all database systems. The essential point of a transaction is that
it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps
are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction
from completing, then none of the steps affect the database at all.
For example, consider a bank database that contains balances for various customer accounts, as well as
total deposit balances for branches. Suppose that we want to record a payment of $100.00 from Alice’s
account to Bob’s account. Simplifying outrageously, the SQL commands for this might look like

UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00
    WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');

The details of these commands are not important here; the important point is that there are several separate
updates involved to accomplish this rather simple operation. Our bank’s officers will want to be assured
that either all these updates happen, or none of them happen. It would certainly not do for a system failure
to result in Bob receiving $100.00 that was not debited from Alice. Nor would Alice long remain a happy
customer if she was debited without Bob being credited. We need a guarantee that if something goes
wrong partway through the operation, none of the steps executed so far will take effect. Grouping the
updates into a transaction gives us this guarantee. A transaction is said to be atomic: from the point of
view of other transactions, it either happens completely or not at all.
We also want a guarantee that once a transaction is completed and acknowledged by the database system,
it has indeed been permanently recorded and won’t be lost even if a crash ensues shortly thereafter. For
example, if we are recording a cash withdrawal by Bob, we do not want any chance that the debit to
his account will disappear in a crash just after he walks out the bank door. A transactional database
guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before
the transaction is reported complete.
Another important property of transactional databases is closely related to the notion of atomic updates:
when multiple transactions are running concurrently, each one should not be able to see the incomplete
changes made by others. For example, if one transaction is busy totalling all the branch balances, it would
not do for it to include the debit from Alice’s branch but not the credit to Bob’s branch, nor vice versa.
So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but
also in terms of their visibility as they happen. The updates made so far by an open transaction are in-
visible to other transactions until the transaction completes, whereupon all the updates become visible
simultaneously.
In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN
and COMMIT commands. So our banking transaction would actually look like

BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;

If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that
Alice’s balance went negative), we can issue the command ROLLBACK instead of COMMIT, and all our
updates so far will be canceled.
PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not is-
sue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT
wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a trans-
action block.

Note: Some client libraries issue BEGIN and COMMIT commands automatically, so that you may get the
effect of transaction blocks without asking. Check the documentation for the interface you are using.

It’s possible to control the statements in a transaction in a more granular fashion through the use of save-
points. Savepoints allow you to selectively discard parts of the transaction, while committing the rest.
After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK
TO. All the transaction’s database changes between defining the savepoint and rolling back to it are dis-
carded, but changes earlier than the savepoint are kept.


After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times.
Conversely, if you are sure you won’t need to roll back to a particular savepoint again, it can be released,
so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint
will automatically release all savepoints that were defined after it.
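Releasing is done with the RELEASE SAVEPOINT command; a minimal sketch in the style of the banking example:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
-- The credit turned out to be correct, so the savepoint is no
-- longer needed and its resources can be freed.
RELEASE SAVEPOINT my_savepoint;
COMMIT;
```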
All this is happening within the transaction block, so none of it is visible to other database sessions. When
and if you commit the transaction block, the committed actions become visible as a unit to other sessions,
while the rolled-back actions never become visible at all.
Remembering the bank database, suppose we debit $100.00 from Alice’s account, and credit Bob’s ac-
count, only to find later that we should have credited Wally’s account. We could do it using savepoints
like this:

BEGIN;
UPDATE accounts SET balance = balance - 100.00
    WHERE name = 'Alice';
SAVEPOINT my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Bob';
-- oops ... forget that and use Wally's account
ROLLBACK TO my_savepoint;
UPDATE accounts SET balance = balance + 100.00
    WHERE name = 'Wally';
COMMIT;

This example is, of course, oversimplified, but there’s a lot of control to be had over a transaction block
through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction
block that was put in aborted state by the system due to an error, short of rolling it back completely and
starting again.

3.5. Inheritance
Inheritance is a concept from object-oriented databases. It opens up interesting new possibilities of
database design.
Let’s create two tables: A table cities and a table capitals. Naturally, capitals are also cities, so you
want some way to show the capitals implicitly when you list all cities. If you’re really clever you might
invent some scheme like this:

CREATE TABLE capitals (
    name       text,
    population real,
    altitude   int,    -- (in ft)
    state      char(2)
);

CREATE TABLE non_capitals (
    name       text,
    population real,
    altitude   int     -- (in ft)
);

CREATE VIEW cities AS
    SELECT name, population, altitude FROM capitals
        UNION
    SELECT name, population, altitude FROM non_capitals;

This works OK as far as querying goes, but it gets ugly when you need to update several rows, for one
thing.
A better solution is this:

CREATE TABLE cities (
    name       text,
    population real,
    altitude   int     -- (in ft)
);

CREATE TABLE capitals (
    state char(2)
) INHERITS (cities);

In this case, a row of capitals inherits all columns (name, population, and altitude) from its
parent, cities. The type of the column name is text, a native PostgreSQL type for variable length
character strings. State capitals have an extra column, state, that shows their state. In PostgreSQL, a table
can inherit from zero or more other tables.
For example, the following query finds the names of all cities, including state capitals, that are located at
an altitude over 500 ft.:

SELECT name, altitude
    FROM cities
    WHERE altitude > 500;

which returns:

name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)

On the other hand, the following query finds all the cities that are not state capitals and are situated at an
altitude of 500 ft. or higher:

SELECT name, altitude
    FROM ONLY cities
    WHERE altitude > 500;

name | altitude
-----------+----------
Las Vegas | 2174
Mariposa | 1953
(2 rows)

Here the ONLY before cities indicates that the query should be run over only the cities table, and not
tables below cities in the inheritance hierarchy. Many of the commands that we have already discussed
— SELECT, UPDATE, and DELETE — support this ONLY notation.
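For instance, an UPDATE restricted to the parent table alone might be sketched like this (the altitude adjustment is invented purely for illustration):

```sql
-- Affects matching rows of cities itself, but leaves capitals untouched.
UPDATE ONLY cities SET altitude = altitude + 10
    WHERE name = 'Mariposa';
```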

Note: Although inheritance is frequently useful, it has not been integrated with unique constraints or
foreign keys, which limits its usefulness. See Section 5.5 for more detail.

3.6. Conclusion
PostgreSQL has many features not touched upon in this tutorial introduction, which has been oriented
toward newer users of SQL. These features are discussed in more detail in the remainder of this book.
If you feel you need more introductory material, please visit the PostgreSQL web site1 for links to more
resources.

1. http://www.postgresql.org

II. The SQL Language
This part describes the use of the SQL language in PostgreSQL. We start with describing the general
syntax of SQL, then explain how to create the structures to hold data, how to populate the database, and
how to query it. The middle part lists the available data types and functions for use in SQL commands.
The rest treats several aspects that are important for tuning a database for optimal performance.
The information in this part is arranged so that a novice user can follow it start to end to gain a full un-
derstanding of the topics without having to refer forward too many times. The chapters are intended to be
self-contained, so that advanced users can read the chapters individually as they choose. The information
in this part is presented in a narrative fashion in topical units. Readers looking for a complete description
of a particular command should look into Part VI.
Readers of this part should know how to connect to a PostgreSQL database and issue SQL commands.
Readers that are unfamiliar with these issues are encouraged to read Part I first. SQL commands are
typically entered using the PostgreSQL interactive terminal psql, but other programs that have similar
functionality can be used as well.
Chapter 4. SQL Syntax
This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters
which will go into detail about how the SQL commands are applied to define and modify data.
We also advise users who are already familiar with SQL to read this chapter carefully because there are
several rules and concepts that are implemented inconsistently among SQL databases or that are specific
to PostgreSQL.

4.1. Lexical Structure


SQL input consists of a sequence of commands. A command is composed of a sequence of tokens, ter-
minated by a semicolon (“;”). The end of the input stream also terminates a command. Which tokens are
valid depends on the syntax of the particular command.
A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character
symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no
ambiguity (which is generally only the case if a special character is adjacent to some other token type).
Additionally, comments can occur in SQL input. They are not tokens, they are effectively equivalent to
whitespace.
For example, the following is (syntactically) valid SQL input:

SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');

This is a sequence of three commands, one per line (although this is not required; more than one command
can be on a line, and commands can usefully be split across lines).
The SQL syntax is not very consistent regarding what tokens identify commands and which are operands
or parameters. The first few tokens are generally the command name, so in the above example we would
usually speak of a “SELECT”, an “UPDATE”, and an “INSERT” command. But for instance the UPDATE
command always requires a SET token to appear in a certain position, and this particular variation of
INSERT also requires a VALUES in order to be complete. The precise syntax rules for each command are
described in Part VI.

4.1.1. Identifiers and Key Words


Tokens such as SELECT, UPDATE, or VALUES in the example above are examples of key words, that is,
words that have a fixed meaning in the SQL language. The tokens MY_TABLE and A are examples of
identifiers. They identify names of tables, columns, or other database objects, depending on the command
they are used in. Therefore they are sometimes simply called “names”. Key words and identifiers have
the same lexical structure, meaning that one cannot know whether a token is an identifier or a key word
without knowing the language. A complete list of key words can be found in Appendix C.
SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and
non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters,
underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according
to the letter of the SQL standard, so their use may render applications less portable. The SQL standard
will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this
form are safe against possible conflict with future extensions of the standard.
The system uses no more than NAMEDATALEN-1 characters of an identifier; longer names can be written
in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier
length is 63. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in
src/include/postgres_ext.h.

Identifier and key word names are case insensitive. Therefore

UPDATE MY_TABLE SET A = 5;

can equivalently be written as

uPDaTE my_TabLE SeT a = 5;

A convention often used is to write key words in upper case and names in lower case, e.g.,

UPDATE my_table SET a = 5;

There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing
an arbitrary sequence of characters in double-quotes ("). A delimited identifier is always an identifier,
never a key word. So "select" could be used to refer to a column or table named “select”, whereas an
unquoted select would be taken as a key word and would therefore provoke a parse error when used
where a table or column name is expected. The example can be written with quoted identifiers like this:

UPDATE "my_table" SET "a" = 5;

Quoted identifiers can contain any character other than a double quote itself. (To include a double quote,
write two double quotes.) This allows constructing table or column names that would otherwise not be
possible, such as ones containing spaces or ampersands. The length limitation still applies.
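For example, to create and then reference a column whose name itself contains a double quote (a made-up name, shown only to illustrate the doubling rule):

```sql
-- The column's actual name is: size "large"
CREATE TABLE t ("size ""large""" int);
SELECT "size ""large""" FROM t;
```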
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower
case. For example, the identifiers FOO, foo, and "foo" are considered the same by PostgreSQL, but
"Foo" and "FOO" are different from these three and each other. (The folding of unquoted names to lower
case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be
folded to upper case. Thus, foo should be equivalent to "FOO" not "foo" according to the standard. If
you want to write portable applications you are advised to always quote a particular name or never quote
it.)

4.1.2. Constants
There are three kinds of implicitly-typed constants in PostgreSQL: strings, bit strings, and numbers. Con-
stants can also be specified with explicit types, which can enable more accurate representation and more
efficient handling by the system. These alternatives are discussed in the following subsections.


4.1.2.1. String Constants


A string constant in SQL is an arbitrary sequence of characters bounded by single quotes ('), for example
'This is a string'. The standard-compliant way of writing a single-quote character within a string
constant is to write two adjacent single quotes, e.g. 'Dianne''s horse'. PostgreSQL also allows single
quotes to be escaped with a backslash (\), so for example the same string could be written 'Dianne\'s
horse'.

Another PostgreSQL extension is that C-style backslash escapes are available: \b is a backspace, \f is a
form feed, \n is a newline, \r is a carriage return, \t is a tab, and \xxx , where xxx is an octal number,
is a byte with the corresponding code. (It is your responsibility that the byte sequences you create are
valid characters in the server character set encoding.) Any other character following a backslash is taken
literally. Thus, to include a backslash in a string constant, write two backslashes.
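A few of these escapes in use (the strings are made up, chosen only to illustrate the escape syntax):

```sql
SELECT 'first\nsecond';        -- \n embeds a newline in the string
SELECT 'col1\tcol2';           -- \t embeds a tab
SELECT 'one backslash: \\';    -- two backslashes produce one
SELECT '\101';                 -- octal 101 is the byte for the letter A
```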
The character with the code zero cannot be in a string constant.
Two string constants that are only separated by whitespace with at least one newline are concatenated and
effectively treated as if the string had been written in one constant. For example:

SELECT 'foo'
'bar';

is equivalent to

SELECT 'foobar';

but

SELECT 'foo' 'bar';

is not valid syntax. (This slightly bizarre behavior is specified by SQL; PostgreSQL is following the
standard.)

4.1.2.2. Dollar-Quoted String Constants


While the standard syntax for specifying string constants is usually convenient, it can be difficult to un-
derstand when the desired string contains many single quotes or backslashes, since each of those must
be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called
“dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($),
an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that
makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign. For
example, here are two different ways to specify the string “Dianne’s horse” using dollar quoting:

$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$

Notice that inside the dollar-quoted string, single quotes can be used without needing to be escaped.
Indeed, no characters inside a dollar-quoted string are ever escaped: the string content is always written
literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence
matching the opening tag.
It is possible to nest dollar-quoted string constants by choosing different tags at each nesting level. This is
most commonly used in writing function definitions. For example:

$function$
BEGIN
RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
END;
$function$

Here, the sequence $q$[\t\r\n\v\\]$q$ represents a dollar-quoted literal string [\t\r\n\v\\],
which will be recognized when the function body is executed by PostgreSQL. But since the sequence
does not match the outer dollar quoting delimiter $function$, it is just some more characters within the
constant so far as the outer string is concerned.
The tag, if any, of a dollar-quoted string follows the same rules as an unquoted identifier, except that it
cannot contain a dollar sign. Tags are case sensitive, so $tag$String content$tag$ is correct, but
$TAG$String content$tag$ is not.

A dollar-quoted string that follows a keyword or identifier must be separated from it by whitespace;
otherwise the dollar quoting delimiter would be taken as part of the preceding identifier.
Dollar quoting is not part of the SQL standard, but it is often a more convenient way to write complicated
string literals than the standard-compliant single quote syntax. It is particularly useful when representing
string constants inside other constants, as is often needed in procedural function definitions. With single-
quote syntax, each backslash in the above example would have to be written as four backslashes, which
would be reduced to two backslashes in parsing the original string constant, and then to one when the
inner string constant is re-parsed during function execution.
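As an illustration (this rewriting is implied by the text but not shown there), the body of the $function$ example above would read as follows in single-quote syntax. Each embedded quote is doubled, and each backslash has become four; the final regex backslash, itself written \\, becomes eight:

'
BEGIN
RETURN ($1 ~ ''[\\\\t\\\\r\\\\n\\\\v\\\\\\\\]'');
END;
'
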

4.1.2.3. Bit-String Constants


Bit-string constants look like regular string constants with a B (upper or lower case) immediately before
the opening quote (no intervening whitespace), e.g., B'1001'. The only characters allowed within bit-
string constants are 0 and 1.
Alternatively, bit-string constants can be specified in hexadecimal notation, using a leading X (upper or
lower case), e.g., X'1FF'. This notation is equivalent to a bit-string constant with four binary digits for
each hexadecimal digit.
Both forms of bit-string constant can be continued across lines in the same way as regular string constants.
Dollar quoting cannot be used in a bit-string constant.
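A short sketch of both notations (these statements are illustrative, not taken from the text):

SELECT B'1001';    -- a four-bit string
SELECT X'1FF';     -- equivalent to B'000111111111' (twelve bits)
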

4.1.2.4. Numeric Constants


Numeric constants are accepted in these general forms:

digits
digits.[digits][e[+-]digits]
[digits].digits[e[+-]digits]
digitse[+-]digits

where digits is one or more decimal digits (0 through 9). At least one digit must be before or after the
decimal point, if one is used. At least one digit must follow the exponent marker (e), if one is present.
There may not be any spaces or other characters embedded in the constant. Note that any leading plus or
minus sign is not actually considered part of the constant; it is an operator applied to the constant.

These are some examples of valid numeric constants:

42
3.5
4.
.001
5e2
1.925e-3

A numeric constant that contains neither a decimal point nor an exponent is initially presumed to be type
integer if its value fits in type integer (32 bits); otherwise it is presumed to be type bigint if its value
fits in type bigint (64 bits); otherwise it is taken to be type numeric. Constants that contain decimal
points and/or exponents are always initially presumed to be type numeric.
The initially assigned data type of a numeric constant is just a starting point for the type resolution algo-
rithms. In most cases the constant will be automatically coerced to the most appropriate type depending
on context. When necessary, you can force a numeric value to be interpreted as a specific data type by
casting it. For example, you can force a numeric value to be treated as type real (float4) by writing

REAL '1.23'  -- string style
1.23::REAL   -- PostgreSQL (historical) style

These are actually just special cases of the general casting notations discussed next.

4.1.2.5. Constants of Other Types


A constant of an arbitrary type can be entered using any one of the following notations:

type 'string'
'string'::type
CAST ( 'string' AS type )

The string constant’s text is passed to the input conversion routine for the type called type. The result is
a constant of the indicated type. The explicit type cast may be omitted if there is no ambiguity as to the
type the constant must be (for example, when it is assigned directly to a table column), in which case it is
automatically coerced.
The string constant can be written using either regular SQL notation or dollar-quoting.
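Taking the date type as a concrete case (the particular date is arbitrary, not from the text), the three notations read:

DATE '2005-02-14'
'2005-02-14'::date
CAST ( '2005-02-14' AS date )

All three yield the same date constant.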
It is also possible to specify a type coercion using a function-like syntax:

typename ( 'string' )

but not all type names may be used in this way; see Section 4.2.8 for details.
The ::, CAST(), and function-call syntaxes can also be used to specify run-time type conversions of
arbitrary expressions, as discussed in Section 4.2.8. But the form type 'string' can only be used to
specify the type of a literal constant. Another restriction on type 'string' is that it does not work for
array types; use :: or CAST() to specify the type of an array constant.

4.1.3. Operators
An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following
list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on operator names, however:

• -- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a
comment.
• A multiple-character operator name cannot end in + or -, unless the name also contains at least one of
these characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse
SQL-compliant queries without requiring spaces between tokens.

When working with non-SQL-standard operator names, you will usually need to separate adjacent opera-
tors with spaces to avoid ambiguity. For example, if you have defined a left unary operator named @, you
cannot write X*@Y; you must write X* @Y to ensure that PostgreSQL reads it as two operator names not
one.
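This can be seen with the built-in prefix operator @ (absolute value); the statements below are illustrative, not from the original text:

SELECT 3 * @ -4;    -- OK: parsed as 3 * (@ -4)
SELECT 3 *@ -4;     -- error, unless a *@ operator has been defined
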

4.1.4. Special Characters


Some characters that are not alphanumeric have a special meaning that is different from being an operator.
Details on the usage can be found at the location where the respective syntax element is described. This
section only exists to point out the existence of these characters and to summarize their purposes.

• A dollar sign ($) followed by digits is used to represent a positional parameter in the body of a function
definition or a prepared statement. In other contexts the dollar sign may be part of an identifier or a
dollar-quoted string constant.
• Parentheses (()) have their usual meaning to group expressions and enforce precedence. In some cases
parentheses are required as part of the fixed syntax of a particular SQL command.
• Brackets ([]) are used to select the elements of an array. See Section 8.10 for more information on
arrays.
• Commas (,) are used in some syntactical constructs to separate the elements of a list.
• The semicolon (;) terminates an SQL command. It cannot appear anywhere within a command, except
within a string constant or quoted identifier.
• The colon (:) is used to select “slices” from arrays. (See Section 8.10.) In certain SQL dialects (such
as Embedded SQL), the colon is used to prefix variable names.
• The asterisk (*) is used in some contexts to denote all the fields of a table row or composite value. It
also has a special meaning when used as the argument of the COUNT aggregate function.

• The period (.) is used in numeric constants, and to separate schema, table, and column names.

4.1.5. Comments
A comment is an arbitrary sequence of characters beginning with double dashes and extending to the end
of the line, e.g.:

-- This is a standard SQL comment

Alternatively, C-style block comments can be used:

/* multiline comment
* with nesting: /* nested block comment */
*/

where the comment begins with /* and extends to the matching occurrence of */. These block comments
nest, as specified in the SQL standard but unlike C, so that one can comment out larger blocks of code
that may contain existing block comments.
A comment is removed from the input stream before further syntax analysis and is effectively replaced by
whitespace.

4.1.6. Lexical Precedence


Table 4-1 shows the precedence and associativity of the operators in PostgreSQL. Most operators have
the same precedence and are left-associative. The precedence and associativity of the operators is hard-
wired into the parser. This may lead to non-intuitive behavior; for example the Boolean operators < and
> have a different precedence than the Boolean operators <= and >=. Also, you will sometimes need to
add parentheses when using combinations of binary and unary operators. For instance

SELECT 5 ! - 6;

will be parsed as

SELECT 5 ! (- 6);

because the parser has no idea — until it is too late — that ! is defined as a postfix operator, not an infix
one. To get the desired behavior in this case, you must write

SELECT (5 !) - 6;

This is the price one pays for extensibility.

Table 4-1. Operator Precedence (decreasing)

Operator/Element     Associativity   Description
.                    left            table/column name separator
::                   left            PostgreSQL-style typecast
[ ]                  left            array element selection
-                    right           unary minus
^                    left            exponentiation
* / %                left            multiplication, division, modulo
+ -                  left            addition, subtraction
IS                                   IS TRUE, IS FALSE, IS UNKNOWN, IS NULL
ISNULL                               test for null
NOTNULL                              test for not null
(any other)          left            all other native and user-defined operators
IN                                   set membership
BETWEEN                              range containment
OVERLAPS                             time interval overlap
LIKE ILIKE SIMILAR                   string pattern matching
< >                                  less than, greater than
=                    right           equality, assignment
NOT                  right           logical negation
AND                  left            logical conjunction
OR                   left            logical disjunction

Note that the operator precedence rules also apply to user-defined operators that have the same names as
the built-in operators mentioned above. For example, if you define a “+” operator for some custom data
type it will have the same precedence as the built-in “+” operator, no matter what yours does.
When a schema-qualified operator name is used in the OPERATOR syntax, as for example in

SELECT 3 OPERATOR(pg_catalog.+) 4;

the OPERATOR construct is taken to have the default precedence shown in Table 4-1 for “any other” oper-
ator. This is true no matter which specific operator name appears inside OPERATOR().

4.2. Value Expressions


Value expressions are used in a variety of contexts, such as in the target list of the SELECT command,
as new column values in INSERT or UPDATE, or in search conditions in a number of commands. The
result of a value expression is sometimes called a scalar, to distinguish it from the result of a table ex-
pression (which is a table). Value expressions are therefore also called scalar expressions (or even simply
expressions). The expression syntax allows the calculation of values from primitive parts using arithmetic,
logical, set, and other operations.
A value expression is one of the following:

• A constant or literal value.


• A column reference.
• A positional parameter reference, in the body of a function definition or prepared statement.
• A subscripted expression.
• A field selection expression.
• An operator invocation.
• A function call.
• An aggregate expression.
• A type cast.
• A scalar subquery.
• An array constructor.
• A row constructor.
• Another value expression in parentheses, useful to group subexpressions and override precedence.

In addition to this list, there are a number of constructs that can be classified as an expression but do
not follow any general syntax rules. These generally have the semantics of a function or operator and are
explained in the appropriate location in Chapter 9. An example is the IS NULL clause.
We have already discussed constants in Section 4.1.2. The following sections discuss the remaining op-
tions.

4.2.1. Column References


A column can be referenced in the form

correlation.columnname

correlation is the name of a table (possibly qualified with a schema name), or an alias for a table
defined by means of a FROM clause, or one of the key words NEW or OLD. (NEW and OLD can only appear in
rewrite rules, while other correlation names can be used in any SQL statement.) The correlation name and
separating dot may be omitted if the column name is unique across all the tables being used in the current
query. (See also Chapter 7.)

4.2.2. Positional Parameters


A positional parameter reference is used to indicate a value that is supplied externally to an SQL statement.
Parameters are used in SQL function definitions and in prepared queries. Some client libraries also support
specifying data values separately from the SQL command string, in which case parameters are used to
refer to the out-of-line data values. The form of a parameter reference is:

$number

For example, consider the definition of a function, dept, as

CREATE FUNCTION dept(text) RETURNS dept
    AS $$ SELECT * FROM dept WHERE name = $1 $$
    LANGUAGE SQL;

Here the $1 will be replaced by the first function argument when the function is invoked.

4.2.3. Subscripts
If an expression yields a value of an array type, then a specific element of the array value can be extracted
by writing

expression[subscript]

or multiple adjacent elements (an “array slice”) can be extracted by writing

expression[lower_subscript:upper_subscript]

(Here, the brackets [ ] are meant to appear literally.) Each subscript is itself an expression, which
must yield an integer value.
In general the array expression must be parenthesized, but the parentheses may be omitted when the
expression to be subscripted is just a column reference or positional parameter. Also, multiple subscripts
can be concatenated when the original array is multi-dimensional. For example,

mytable.arraycolumn[4]
mytable.two_d_column[17][34]
$1[10:42]
(arrayfunction(a,b))[42]

The parentheses in the last example are required. See Section 8.10 for more about arrays.

4.2.4. Field Selection


If an expression yields a value of a composite type (row type), then a specific field of the row can be
extracted by writing

expression.fieldname

In general the row expression must be parenthesized, but the parentheses may be omitted when the
expression to be selected from is just a table reference or positional parameter. For example,

mytable.mycolumn
$1.somecolumn
(rowfunction(a,b)).col3

(Thus, a qualified column reference is actually just a special case of the field selection syntax.)

4.2.5. Operator Invocations


There are three possible syntaxes for an operator invocation:

expression operator expression (binary infix operator)
operator expression (unary prefix operator)
expression operator (unary postfix operator)

where the operator token follows the syntax rules of Section 4.1.3, or is one of the key words AND, OR,
and NOT, or is a qualified operator name in the form

OPERATOR(schema.operatorname)

Which particular operators exist and whether they are unary or binary depends on what operators have
been defined by the system or the user. Chapter 9 describes the built-in operators.
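A few invocations of built-in operators, as a sketch (not examples from the original text):

SELECT 3 + 4;                         -- binary infix (addition)
SELECT @ -5.0;                        -- unary prefix (absolute value)
SELECT 5 !;                           -- unary postfix (factorial)
SELECT 3 OPERATOR(pg_catalog.+) 4;    -- schema-qualified operator name
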

4.2.6. Function Calls


The syntax for a function call is the name of a function (possibly qualified with a schema name), followed
by its argument list enclosed in parentheses:

function ([expression [, expression ... ]] )

For example, the following computes the square root of 2:

sqrt(2)

The list of built-in functions is in Chapter 9. Other functions may be added by the user.

4.2.7. Aggregate Expressions


An aggregate expression represents the application of an aggregate function across the rows selected by a
query. An aggregate function reduces multiple inputs to a single output value, such as the sum or average
of the inputs. The syntax of an aggregate expression is one of the following:

aggregate_name (expression)
aggregate_name (ALL expression)
aggregate_name (DISTINCT expression)
aggregate_name ( * )

where aggregate_name is a previously defined aggregate (possibly qualified with a schema name),
and expression is any value expression that does not itself contain an aggregate expression.
The first form of aggregate expression invokes the aggregate across all input rows for which the given
expression yields a non-null value. (Actually, it is up to the aggregate function whether to ignore null
values or not — but all the standard ones do.) The second form is the same as the first, since ALL is the
default. The third form invokes the aggregate for all distinct non-null values of the expression found in
the input rows. The last form invokes the aggregate once for each input row regardless of null or non-null
values; since no particular input value is specified, it is generally only useful for the count() aggregate
function.
For example, count(*) yields the total number of input rows; count(f1) yields the number of input
rows in which f1 is non-null; count(distinct f1) yields the number of distinct non-null values of
f1.

The predefined aggregate functions are described in Section 9.15. Other aggregate functions may be added
by the user.
An aggregate expression may only appear in the result list or HAVING clause of a SELECT command. It is
forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results
of aggregates are formed.
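As a sketch, using a hypothetical cities table with state and pop columns (the names are illustrative):

SELECT count(*) FROM cities;                     -- allowed in the result list
SELECT state, count(*) FROM cities
    GROUP BY state HAVING count(*) > 10;         -- allowed in HAVING
-- SELECT * FROM cities WHERE count(*) > 10;     -- error: no aggregates in WHERE
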
When an aggregate expression appears in a subquery (see Section 4.2.9 and Section 9.16), the aggregate
is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate’s argument
contains only outer-level variables: the aggregate then belongs to the nearest such outer level, and is
evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for
the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction
about appearing only in the result list or HAVING clause applies with respect to the query level that the
aggregate belongs to.

4.2.8. Type Casts


A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent
syntaxes for type casts:

CAST ( expression AS type )
expression::type

The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage.
When a cast is applied to a value expression of a known type, it represents a run-time type conversion. The
cast will succeed only if a suitable type conversion operation has been defined. Notice that this is subtly
different from the use of casts with constants, as shown in Section 4.1.2.5. A cast applied to an unadorned
string literal represents the initial assignment of a type to a literal constant value, and so it will succeed
for any type (if the contents of the string literal are acceptable input syntax for the data type).
An explicit type cast may usually be omitted if there is no ambiguity as to the type that a value expression
must produce (for example, when it is assigned to a table column); the system will automatically apply
a type cast in such cases. However, automatic casting is only done for casts that are marked “OK to
apply implicitly” in the system catalogs. Other casts must be invoked with explicit casting syntax. This
restriction is intended to prevent surprising conversions from being applied silently.
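A few casts in both notations (a sketch; the values are arbitrary):

SELECT CAST(42 AS text);      -- run-time conversion of an integer value
SELECT 42::text;              -- the same conversion, historical notation
SELECT '2005-02-14'::date;    -- types an unadorned string literal instead
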
It is also possible to specify a type cast using a function-like syntax:

typename ( expression )

However, this only works for types whose names are also valid as function names. For example, double
precision can’t be used this way, but the equivalent float8 can. Also, the names interval, time,
and timestamp can only be used in this fashion if they are double-quoted, because of syntactic conflicts.
Therefore, the use of the function-like cast syntax leads to inconsistencies and should probably be avoided
in new applications. (The function-like syntax is in fact just a function call. When one of the two standard
cast syntaxes is used to do a run-time conversion, it will internally invoke a registered function to perform
the conversion. By convention, these conversion functions have the same name as their output type, and
thus the “function-like syntax” is nothing more than a direct invocation of the underlying conversion
function. Obviously, this is not something that a portable application should rely on.)

4.2.9. Scalar Subqueries


A scalar subquery is an ordinary SELECT query in parentheses that returns exactly one row with one
column. (See Chapter 7 for information about writing queries.) The SELECT query is executed and the
single returned value is used in the surrounding value expression. It is an error to use a query that returns
more than one row or more than one column as a scalar subquery. (But if, during a particular execution,
the subquery returns no rows, there is no error; the scalar result is taken to be null.) The subquery can
refer to variables from the surrounding query, which will act as constants during any one evaluation of the
subquery. See also Section 9.16 for other expressions involving subqueries.
For example, the following finds the largest city population in each state:

SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
    FROM states;

4.2.10. Array Constructors


An array constructor is an expression that builds an array value from values for its member elements. A
simple array constructor consists of the key word ARRAY, a left square bracket [, one or more expressions
(separated by commas) for the array element values, and finally a right square bracket ]. For example,

SELECT ARRAY[1,2,3+4];
array
---------
{1,2,7}
(1 row)

The array element type is the common type of the member expressions, determined using the same rules
as for UNION or CASE constructs (see Section 10.5).
Multidimensional array values can be built by nesting array constructors. In the inner constructors, the
key word ARRAY may be omitted. For example, these produce the same result:

SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
     array
---------------
 {{1,2},{3,4}}
(1 row)

SELECT ARRAY[[1,2],[3,4]];
array
---------------
{{1,2},{3,4}}
(1 row)

Since multidimensional arrays must be rectangular, inner constructors at the same level must produce
sub-arrays of identical dimensions.
Multidimensional array constructor elements can be anything yielding an array of the proper kind, not
only a sub-ARRAY construct. For example:

CREATE TABLE arr(f1 int[], f2 int[]);

INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);

SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
                     array
------------------------------------------------
 {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
(1 row)

It is also possible to construct an array from the results of a subquery. In this form, the array constructor
is written with the key word ARRAY followed by a parenthesized (not bracketed) subquery. For example:

SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
                          ?column?
-------------------------------------------------------------
 {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}
(1 row)

The subquery must return a single column. The resulting one-dimensional array will have an element for
each row in the subquery result, with an element type matching that of the subquery’s output column.
The subscripts of an array value built with ARRAY always begin with one. For more information about
arrays, see Section 8.10.

4.2.11. Row Constructors


A row constructor is an expression that builds a row value (also called a composite value) from values
for its member fields. A row constructor consists of the key word ROW, a left parenthesis, zero or more
expressions (separated by commas) for the row field values, and finally a right parenthesis. For example,

SELECT ROW(1,2.5,'this is a test');

The key word ROW is optional when there is more than one expression in the list.
By default, the value created by a ROW expression is of an anonymous record type. If necessary, it can be
cast to a named composite type — either the row type of a table, or a composite type created with CREATE
TYPE AS. An explicit cast may be needed to avoid ambiguity. For example:

CREATE TABLE mytable(f1 int, f2 float, f3 text);

CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- No cast needed since only one getf1() exists
SELECT getf1(ROW(1,2.5,'this is a test'));
 getf1
-------
     1
(1 row)

CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);

CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;

-- Now we need a cast to indicate which function to call:
SELECT getf1(ROW(1,2.5,'this is a test'));
ERROR:  function getf1(record) is not unique

SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
 getf1
-------
     1
(1 row)

SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
 getf1
-------
    11
(1 row)

Row constructors can be used to build composite values to be stored in a composite-type table column,
or to be passed to a function that accepts a composite parameter. Also, it is possible to compare two row
values or test a row with IS NULL or IS NOT NULL, for example

SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');

SELECT ROW(a, b, c) IS NOT NULL FROM table;

For more detail see Section 9.17. Row constructors can also be used in connection with subqueries, as
discussed in Section 9.16.

4.2.12. Expression Evaluation Rules


The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function
are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then
other subexpressions might not be evaluated at all. For instance, if one wrote

SELECT true OR somefunc();

then somefunc() would (probably) not be called at all. The same would be the case if one wrote

SELECT somefunc() OR true;

Note that this is not the same as the left-to-right “short-circuiting” of Boolean operators that is found in
some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is
particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since
those clauses are extensively reprocessed as part of developing an execution plan. Boolean expressions
(AND/OR/NOT combinations) in those clauses may be reorganized in any manner allowed by the laws of
Boolean algebra.
When it is essential to force evaluation order, a CASE construct (see Section 9.13) may be used. For
example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:

SELECT ... WHERE x <> 0 AND y/x > 1.5;

But this is safe:

SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;

A CASE construct used in this fashion will defeat optimization attempts, so it should only be done when
necessary. (In this particular example, it would doubtless be best to sidestep the problem by writing y >
1.5*x instead.)

Chapter 5. Data Definition
This chapter covers how one creates the database structures that will hold one’s data. In a relational
database, the raw data is stored in tables, so the majority of this chapter is devoted to explaining how
tables are created and modified and what features are available to control what data is stored in the tables.
Subsequently, we discuss how tables can be organized into schemas, and how privileges can be assigned to
tables. Finally, we will briefly look at other features that affect the data storage, such as views, functions,
and triggers.

5.1. Table Basics


A table in a relational database is much like a table on paper: It consists of rows and columns. The number
and order of the columns is fixed, and each column has a name. The number of rows is variable -- it
reflects how much data is stored at a given moment. SQL does not make any guarantees about the order of
the rows in a table. When a table is read, the rows will appear in random order, unless sorting is explicitly
requested. This is covered in Chapter 7. Furthermore, SQL does not assign unique identifiers to rows, so it
is possible to have several completely identical rows in a table. This is a consequence of the mathematical
model that underlies SQL but is usually not desirable. Later in this chapter we will see how to deal with
this issue.
Each column has a data type. The data type constrains the set of possible values that can be assigned to a
column and assigns semantics to the data stored in the column so that it can be used for computations. For
instance, a column declared to be of a numerical type will not accept arbitrary text strings, and the data
stored in such a column can be used for mathematical computations. By contrast, a column declared to be
of a character string type will accept almost any kind of data but it does not lend itself to mathematical
calculations, although other operations such as string concatenation are available.
PostgreSQL includes a sizable set of built-in data types that fit many applications. Users can also define
their own data types. Most built-in data types have obvious names and semantics, so we defer a detailed ex-
planation to Chapter 8. Some of the frequently used data types are integer for whole numbers, numeric
for possibly fractional numbers, text for character strings, date for dates, time for time-of-day values,
and timestamp for values containing both date and time.
To create a table, you use the aptly named CREATE TABLE command. In this command you specify at
least a name for the new table, the names of the columns and the data type of each column. For example:

CREATE TABLE my_first_table (
    first_column text,
    second_column integer
);

This creates a table named my_first_table with two columns. The first column is named
first_column and has a data type of text; the second column has the name second_column and the
type integer. The table and column names follow the identifier syntax explained in Section 4.1.1.
The type names are usually also identifiers, but there are some exceptions. Note that the column list is
comma-separated and surrounded by parentheses.
Of course, the previous example was heavily contrived. Normally, you would give names to your tables
and columns that convey what kind of data they store. So let’s look at a more realistic example:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

(The numeric type can store fractional components, as would be typical of monetary amounts.)

Tip: When you create many interrelated tables it is wise to choose a consistent naming pattern for the
tables and columns. For instance, there is a choice of using singular or plural nouns for table names,
both of which are favored by some theorist or other.

There is a limit on how many columns a table can contain. Depending on the column types, it is between
250 and 1600. However, defining a table with anywhere near this many columns is highly unusual and
often a questionable design.
If you no longer need a table, you can remove it using the DROP TABLE command. For example:

DROP TABLE my_first_table;
DROP TABLE products;

Attempting to drop a table that does not exist is an error. Nevertheless, it is common in SQL script files to
unconditionally try to drop each table before creating it, ignoring the error messages.
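For example, a script might simply issue both commands in sequence each time it is run (a sketch; the error from the DROP on the first run is ignored):

DROP TABLE products;
CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);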
If you need to modify a table that already exists, look into Section 5.6 later in this chapter.
With the tools discussed so far you can create fully functional tables. The remainder of this chapter is
concerned with adding features to the table definition to ensure data integrity, security, or convenience. If
you are eager to fill your tables with data now, you can skip ahead to Chapter 6 and read the rest of this
chapter later.

5.2. Default Values


A column can be assigned a default value. When a new row is created and no values are specified for
some of the columns, the columns will be filled with their respective default values. A data manipulation
command can also request explicitly that a column be set to its default value, without having to know what
that value is. (Details about data manipulation commands are in Chapter 6.)
If no default value is declared explicitly, the default value is the null value. This usually makes sense
because a null value can be considered to represent unknown data.
In a table definition, default values are listed after the column data type. For example:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric DEFAULT 9.99
);


The default value may be an expression, which will be evaluated whenever the default value is inserted
(not when the table is created). A common example is that a timestamp column may have a default of
now(), so that it gets set to the time of row insertion. Another common example is generating a “serial
number” for each row. In PostgreSQL this is typically done by something like

CREATE TABLE products (
    product_no integer DEFAULT nextval('products_product_no_seq'),
    ...
);

where the nextval() function supplies successive values from a sequence object (see Section 9.12). This
arrangement is sufficiently common that there’s a special shorthand for it:

CREATE TABLE products (
    product_no SERIAL,
    ...
);

The SERIAL shorthand is discussed further in Section 8.1.4.
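Similarly, the timestamp default mentioned above could be written like this (the table and column names here are just for illustration):

CREATE TABLE log_entries (
    message text,
    entered_at timestamp DEFAULT now()
);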

5.3. Constraints
Data types are a way to limit the kind of data that can be stored in a table. For many applications, however,
the constraint they provide is too coarse. For example, a column containing a product price should prob-
ably only accept positive values. But there is no data type that accepts only positive numbers. Another
issue is that you might want to constrain column data with respect to other columns or rows. For example,
in a table containing product information, there should only be one row for each product number.
To that end, SQL allows you to define constraints on columns and tables. Constraints give you as much
control over the data in your tables as you wish. If a user attempts to store data in a column that would
violate a constraint, an error is raised. This applies even if the value came from the default value definition.

5.3.1. Check Constraints


A check constraint is the most generic constraint type. It allows you to specify that the value in a certain
column must satisfy a Boolean (truth-value) expression. For instance, to require positive product prices,
you could use:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0)
);

As you see, the constraint definition comes after the data type, just like default value definitions. Default
values and constraints can be listed in any order. A check constraint consists of the key word CHECK
followed by an expression in parentheses. The check constraint expression should involve the column
thus constrained, otherwise the constraint would not make too much sense.


You can also give the constraint a separate name. This clarifies error messages and allows you to refer to
the constraint when you need to change it. The syntax is:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CONSTRAINT positive_price CHECK (price > 0)
);

So, to specify a named constraint, use the key word CONSTRAINT followed by an identifier followed by
the constraint definition. (If you don’t specify a constraint name in this way, the system chooses a name
for you.)
A check constraint can also refer to several columns. Say you store a regular price and a discounted price,
and you want to ensure that the discounted price is lower than the regular price.

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);

The first two constraints should look familiar. The third one uses a new syntax. It is not attached to a
particular column; instead, it appears as a separate item in the comma-separated column list. Column
definitions and these constraint definitions can be listed in mixed order.
We say that the first two constraints are column constraints, whereas the third one is a table constraint
because it is written separately from any one column definition. Column constraints can also be written
as table constraints, while the reverse is not necessarily possible, since a column constraint is supposed to
refer to only the column it is attached to. (PostgreSQL doesn’t enforce that rule, but you should follow it
if you want your table definitions to work with other database systems.) The above example could also be
written as

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    CHECK (price > 0),
    discounted_price numeric,
    CHECK (discounted_price > 0),
    CHECK (price > discounted_price)
);

or even

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric CHECK (price > 0),
    discounted_price numeric,
    CHECK (discounted_price > 0 AND price > discounted_price)
);

It’s a matter of taste.


Names can be assigned to table constraints in just the same way as for column constraints:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    CHECK (price > 0),
    discounted_price numeric,
    CHECK (discounted_price > 0),
    CONSTRAINT valid_discount CHECK (price > discounted_price)
);

It should be noted that a check constraint is satisfied if the check expression evaluates to true or the null
value. Since most expressions will evaluate to the null value if any operand is null, they will not prevent
null values in the constrained columns. To ensure that a column does not contain null values, the not-null
constraint described in the next section can be used.

5.3.2. Not-Null Constraints


A not-null constraint simply specifies that a column must not assume the null value. A syntax example:

CREATE TABLE products (
    product_no integer NOT NULL,
    name text NOT NULL,
    price numeric
);

A not-null constraint is always written as a column constraint. A not-null constraint is functionally equiv-
alent to creating a check constraint CHECK (column_name IS NOT NULL), but in PostgreSQL creating
an explicit not-null constraint is more efficient. The drawback is that you cannot give explicit names to
not-null constraints created that way.
Of course, a column can have more than one constraint. Just write the constraints one after another:

CREATE TABLE products (
    product_no integer NOT NULL,
    name text NOT NULL,
    price numeric NOT NULL CHECK (price > 0)
);

The order doesn’t matter. It does not necessarily determine in which order the constraints are checked.
The NOT NULL constraint has an inverse: the NULL constraint. This does not mean that the column must
be null, which would surely be useless. Instead, this simply selects the default behavior that the column
may be null. The NULL constraint is not defined in the SQL standard and should not be used in portable


applications. (It was only added to PostgreSQL to be compatible with some other database systems.) Some
users, however, like it because it makes it easy to toggle the constraint in a script file. For example, you
could start with

CREATE TABLE products (
    product_no integer NULL,
    name text NULL,
    price numeric NULL
);

and then insert the NOT key word where desired.

Tip: In most database designs the majority of columns should be marked not null.

5.3.3. Unique Constraints


Unique constraints ensure that the data contained in a column or a group of columns is unique with respect
to all the rows in the table. The syntax is

CREATE TABLE products (
    product_no integer UNIQUE,
    name text,
    price numeric
);

when written as a column constraint, and

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric,
    UNIQUE (product_no)
);

when written as a table constraint.


If a unique constraint refers to a group of columns, the columns are listed separated by commas:

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    UNIQUE (a, c)
);

This specifies that the combination of values in the indicated columns is unique across the whole table,
though any one of the columns need not be (and ordinarily isn’t) unique.
You can assign your own name for a unique constraint, in the usual way:

CREATE TABLE products (
    product_no integer CONSTRAINT must_be_different UNIQUE,
    name text,
    price numeric
);

In general, a unique constraint is violated when there are two or more rows in the table where the values
of all of the columns included in the constraint are equal. However, null values are not considered equal in
this comparison. That means even in the presence of a unique constraint it is possible to store an unlimited
number of rows that contain a null value in at least one of the constrained columns. This behavior conforms
to the SQL standard, but we have heard that other SQL databases may not follow this rule. So be careful
when developing applications that are intended to be portable.
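To illustrate, with the products table above a session might look like this (the data values are made up):

INSERT INTO products VALUES (1, 'cheese', 9.99);  -- accepted
INSERT INTO products VALUES (1, 'bread', 1.99);   -- error: duplicate product_no
INSERT INTO products VALUES (NULL, 'milk', 2.49); -- accepted
INSERT INTO products VALUES (NULL, 'rice', 1.29); -- also accepted: nulls are not considered equal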

5.3.4. Primary Keys


Technically, a primary key constraint is simply a combination of a unique constraint and a not-null con-
straint. So, the following two table definitions accept the same data:

CREATE TABLE products (
    product_no integer UNIQUE NOT NULL,
    name text,
    price numeric
);

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

Primary keys can also constrain more than one column; the syntax is similar to unique constraints:

CREATE TABLE example (
    a integer,
    b integer,
    c integer,
    PRIMARY KEY (a, c)
);

A primary key indicates that a column or group of columns can be used as a unique identifier for rows in
the table. (This is a direct consequence of the definition of a primary key. Note that a unique constraint
does not, by itself, provide a unique identifier because it does not exclude null values.) This is useful
both for documentation purposes and for client applications. For example, a GUI application that allows
modifying row values probably needs to know the primary key of a table to be able to identify rows
uniquely.


A table can have at most one primary key (while it can have many unique and not-null constraints).
Relational database theory dictates that every table must have a primary key. This rule is not enforced by
PostgreSQL, but it is usually best to follow it.

5.3.5. Foreign Keys


A foreign key constraint specifies that the values in a column (or a group of columns) must match the
values appearing in some row of another table. We say this maintains the referential integrity between two
related tables.
Say you have the product table that we have used several times already:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

Let’s also assume you have a table storing orders of those products. We want to ensure that the orders table
only contains orders of products that actually exist. So we define a foreign key constraint in the orders
table that references the products table:

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products (product_no),
    quantity integer
);

Now it is impossible to create orders with product_no entries that do not appear in the products table.
We say that in this situation the orders table is the referencing table and the products table is the referenced
table. Similarly, there are referencing and referenced columns.
You can also shorten the above command to

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer REFERENCES products,
    quantity integer
);

because in absence of a column list the primary key of the referenced table is used as the referenced
column(s).
A foreign key can also constrain and reference a group of columns. As usual, it then needs to be written
in table constraint form. Here is a contrived syntax example:

CREATE TABLE t1 (
    a integer PRIMARY KEY,
    b integer,
    c integer,
    FOREIGN KEY (b, c) REFERENCES other_table (c1, c2)
);


Of course, the number and type of the constrained columns need to match the number and type of the
referenced columns.
You can assign your own name for a foreign key constraint, in the usual way.
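For example, reusing the orders table from above (the constraint name here is hypothetical):

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    product_no integer CONSTRAINT valid_product REFERENCES products,
    quantity integer
);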
A table can contain more than one foreign key constraint. This is used to implement many-to-many rela-
tionships between tables. Say you have tables about products and orders, but now you want to allow one
order to contain possibly many products (which the structure above did not allow). You could use this
table structure:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    shipping_address text,
    ...
);

CREATE TABLE order_items (
    product_no integer REFERENCES products,
    order_id integer REFERENCES orders,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

Notice that the primary key overlaps with the foreign keys in the last table.
We know that the foreign keys disallow creation of orders that do not relate to any products. But what if
a product is removed after an order is created that references it? SQL allows you to handle that as well.
Intuitively, we have a few options:

• Disallow deleting a referenced product
• Delete the orders as well
• Something else?

To illustrate this, let’s implement the following policy on the many-to-many relationship example above:
when someone wants to remove a product that is still referenced by an order (via order_items), we
disallow it. If someone removes an order, the order items are removed as well.

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    shipping_address text,
    ...
);

CREATE TABLE order_items (
    product_no integer REFERENCES products ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON DELETE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of a
referenced row. NO ACTION means that if any referencing rows still exist when the constraint is checked,
an error is raised; this is the default behavior if you do not specify anything. (The essential difference
between these two choices is that NO ACTION allows the check to be deferred until later in the transaction,
whereas RESTRICT does not.) CASCADE specifies that when a referenced row is deleted, row(s) referencing
it should be automatically deleted as well. There are two other options: SET NULL and SET DEFAULT.
These cause the referencing columns to be set to nulls or default values, respectively, when the referenced
row is deleted. Note that these do not excuse you from observing any constraints. For example, if an action
specifies SET DEFAULT but the default value would not satisfy the foreign key, the operation will fail.
Analogous to ON DELETE there is also ON UPDATE which is invoked when a referenced column is
changed (updated). The possible actions are the same.
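As a sketch of the remaining delete actions (the reviews table here is hypothetical), ON DELETE SET NULL clears the referencing column instead of deleting or rejecting:

CREATE TABLE reviews (
    review_id integer PRIMARY KEY,
    product_no integer REFERENCES products ON DELETE SET NULL,
    comment text
);

An ON UPDATE action is written in the same position, e.g. REFERENCES products ON UPDATE CASCADE.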
More information about updating and deleting data is in Chapter 6.
Finally, we should mention that a foreign key must reference columns that either are a primary key or
form a unique constraint. If the foreign key references a unique constraint, there are some additional
possibilities regarding how null values are matched. These are explained in the reference documentation
for CREATE TABLE.

5.4. System Columns


Every table has several system columns that are implicitly defined by the system. Therefore, these names
cannot be used as names of user-defined columns. (Note that these restrictions are separate from whether
the name is a key word or not; quoting a name will not allow you to escape these restrictions.) You do not
really need to be concerned about these columns, just know they exist.

oid

The object identifier (object ID) of a row. This is a serial number that is automatically added by
PostgreSQL to all table rows (unless the table was created using WITHOUT OIDS, in which case this
column is not present). This column is of type oid (same name as the column); see Section 8.12 for
more information about the type.
tableoid

The OID of the table containing this row. This column is particularly handy for queries that select
from inheritance hierarchies, since without it, it’s difficult to tell which individual table a row came
from. The tableoid can be joined against the oid column of pg_class to obtain the table name.


xmin

The identity (transaction ID) of the inserting transaction for this row version. (A row version is an
individual state of a row; each update of a row creates a new row version for the same logical row.)
cmin

The command identifier (starting at zero) within the inserting transaction.


xmax

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It
is possible for this column to be nonzero in a visible row version. That usually indicates that the
deleting transaction hasn’t committed yet, or that an attempted deletion was rolled back.
cmax

The command identifier within the deleting transaction, or zero.


ctid

The physical location of the row version within its table. Note that although the ctid can be used to
locate the row version very quickly, a row’s ctid will change each time it is updated or moved by
VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a
user-defined serial number, should be used to identify logical rows.

OIDs are 32-bit quantities and are assigned from a single cluster-wide counter. In a large or long-lived
database, it is possible for the counter to wrap around. Hence, it is bad practice to assume that OIDs are
unique, unless you take steps to ensure that this is the case. If you need to identify the rows in a table,
using a sequence generator is strongly recommended. However, OIDs can be used as well, provided that
a few additional precautions are taken:

• A unique constraint should be created on the OID column of each table for which the OID will be used
to identify rows.
• OIDs should never be assumed to be unique across tables; use the combination of tableoid and row
OID if you need a database-wide identifier.
• The tables in question should be created using WITH OIDS to ensure forward compatibility with future
releases of PostgreSQL. It is planned that WITHOUT OIDS will become the default.

Transaction identifiers are also 32-bit quantities. In a long-lived database it is possible for transaction IDs
to wrap around. This is not a fatal problem given appropriate maintenance procedures; see Chapter 21 for
details. It is unwise, however, to depend on the uniqueness of transaction IDs over the long term (more
than one billion transactions).
Command identifiers are also 32-bit quantities. This creates a hard limit of 2^32 (4 billion) SQL commands
within a single transaction. In practice this limit is not a problem — note that the limit is on the number of
SQL commands, not the number of rows processed.


5.5. Inheritance
Let’s create two tables. The capitals table contains state capitals which are also cities. Naturally, the
capitals table should inherit from cities.

CREATE TABLE cities (
    name text,
    population float,
    altitude int     -- (in ft)
);

CREATE TABLE capitals (
    state char(2)
) INHERITS (cities);

In this case, a row of capitals inherits all attributes (name, population, and altitude) from its parent, cities.
State capitals have an extra attribute, state, that shows their state. In PostgreSQL, a table can inherit from
zero or more other tables, and a query can reference either all rows of a table or all rows of a table plus all
of its descendants.

Note: The inheritance hierarchy is actually a directed acyclic graph.

For example, the following query finds the names of all cities, including state capitals, that are located at
an altitude over 500ft:

SELECT name, altitude
    FROM cities
    WHERE altitude > 500;

which returns:

   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953
 Madison   |      845

On the other hand, the following query finds all the cities that are not state capitals and are situated at an
altitude over 500ft:

SELECT name, altitude
    FROM ONLY cities
    WHERE altitude > 500;

   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953


Here the “ONLY” before cities indicates that the query should be run over only cities and not tables below
cities in the inheritance hierarchy. Many of the commands that we have already discussed -- SELECT,
UPDATE and DELETE -- support this “ONLY” notation.

Deprecated: In previous versions of PostgreSQL, the default behavior was not to include child tables
in queries. This was found to be error prone and is also in violation of the SQL:1999 standard. Under
the old syntax, to get the sub-tables you append * to the table name. For example

SELECT * from cities*;

You can still explicitly specify scanning child tables by appending *, as well as explicitly specify not
scanning child tables by writing “ONLY”. But beginning in version 7.1, the default behavior for an
undecorated table name is to scan its child tables too, whereas before the default was not to do so.
To get the old default behavior, set the configuration option SQL_Inheritance to off, e.g.,

SET SQL_Inheritance TO OFF;

or add a line in your postgresql.conf file.

In some cases you may wish to know which table a particular row originated from. There is a system
column called tableoid in each table which can tell you the originating table:

SELECT c.tableoid, c.name, c.altitude
    FROM cities c
    WHERE c.altitude > 500;

which returns:

 tableoid |   name    | altitude
----------+-----------+----------
   139793 | Las Vegas |     2174
   139793 | Mariposa  |     1953
   139798 | Madison   |      845

(If you try to reproduce this example, you will probably get different numeric OIDs.) By doing a join with
pg_class you can see the actual table names:

SELECT p.relname, c.name, c.altitude
    FROM cities c, pg_class p
    WHERE c.altitude > 500 and c.tableoid = p.oid;

which returns:

 relname  |   name    | altitude
----------+-----------+----------
 cities   | Las Vegas |     2174
 cities   | Mariposa  |     1953
 capitals | Madison   |      845


A table can inherit from more than one parent table, in which case it has the union of the columns defined
by the parent tables (plus any columns declared specifically for the child table).
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign
key constraints only apply to single tables, not to their inheritance children. This is true on both the
referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:

• If we declared cities.name to be UNIQUE or a PRIMARY KEY, this would not stop the capitals
table from having rows with names duplicating rows in cities. And those duplicate rows would by
default show up in queries from cities. In fact, by default capitals would have no unique constraint
at all, and so could contain multiple rows with the same name. You could add a unique constraint to
capitals, but this would not prevent duplication compared to cities.

• Similarly, if we were to specify that cities.name REFERENCES some other table, this constraint would
not automatically propagate to capitals. In this case you could work around it by manually adding
the same REFERENCES constraint to capitals.
• Specifying that another table’s column REFERENCES cities(name) would allow the other table to
contain city names, but not capital names. There is no good workaround for this case.
These deficiencies will probably be fixed in some future release, but in the meantime considerable care is
needed in deciding whether inheritance is useful for your problem.

5.6. Modifying Tables


When you create a table and you realize that you made a mistake, or the requirements of the application
change, you can drop the table and create it again. But this is not a convenient option if the table
is already filled with data, or if the table is referenced by other database objects (for instance a foreign
key constraint). Therefore PostgreSQL provides a family of commands to make modifications to existing
tables. Note that this is conceptually distinct from altering the data contained in the table: here we are
interested in altering the definition, or structure, of the table.
You can

• Add columns,
• Remove columns,
• Add constraints,
• Remove constraints,
• Change default values,
• Change column data types,
• Rename columns,
• Rename tables.
All these actions are performed using the ALTER TABLE command.

5.6.1. Adding a Column


To add a column, use a command like this:

ALTER TABLE products ADD COLUMN description text;


The new column is initially filled with whatever default value is given (null if you don’t specify a DEFAULT
clause).
You can also define constraints on the column at the same time, using the usual syntax:

ALTER TABLE products ADD COLUMN description text CHECK (description <> '');

In fact all the options that can be applied to a column description in CREATE TABLE can be used here.
Keep in mind however that the default value must satisfy the given constraints, or the ADD will fail.
Alternatively, you can add constraints later (see below) after you’ve filled in the new column correctly.

5.6.2. Removing a Column


To remove a column, use a command like this:

ALTER TABLE products DROP COLUMN description;

Whatever data was in the column disappears. Table constraints involving the column are dropped, too.
However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will not
silently drop that constraint. You can authorize dropping everything that depends on the column by adding
CASCADE:

ALTER TABLE products DROP COLUMN description CASCADE;

See Section 5.10 for a description of the general mechanism behind this.

5.6.3. Adding a Constraint


To add a constraint, the table constraint syntax is used. For example:

ALTER TABLE products ADD CHECK (name <> '');


ALTER TABLE products ADD CONSTRAINT some_name UNIQUE (product_no);
ALTER TABLE products ADD FOREIGN KEY (product_group_id) REFERENCES product_groups;

To add a not-null constraint, which cannot be written as a table constraint, use this syntax:

ALTER TABLE products ALTER COLUMN product_no SET NOT NULL;

The constraint will be checked immediately, so the table data must satisfy the constraint before it can be
added.

5.6.4. Removing a Constraint


To remove a constraint you need to know its name. If you gave it a name then that’s easy. Otherwise the
system assigned a generated name, which you need to find out. The psql command \d tablename can
be helpful here; other interfaces might also provide a way to inspect table details. Then the command is:

ALTER TABLE products DROP CONSTRAINT some_name;


(If you are dealing with a generated constraint name like $2, don’t forget that you’ll need to double-quote
it to make it a valid identifier.)
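For example:

ALTER TABLE products DROP CONSTRAINT "$2";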
As with dropping a column, you need to add CASCADE if you want to drop a constraint that something else
depends on. An example is that a foreign key constraint depends on a unique or primary key constraint on
the referenced column(s).
This works the same for all constraint types except not-null constraints. To drop a not-null constraint use

ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL;

(Recall that not-null constraints do not have names.)

5.6.5. Changing a Column’s Default Value


To set a new default for a column, use a command like this:

ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77;

Note that this doesn’t affect any existing rows in the table; it just changes the default for future INSERT
commands.
To remove any default value, use

ALTER TABLE products ALTER COLUMN price DROP DEFAULT;

This is effectively the same as setting the default to null. As a consequence, it is not an error to drop a
default where one hadn’t been defined, because the default is implicitly the null value.

5.6.6. Changing a Column’s Data Type


To convert a column to a different data type, use a command like this:

ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);

This will succeed only if each existing entry in the column can be converted to the new type by an implicit
cast. If a more complex conversion is needed, you can add a USING clause that specifies how to compute
the new values from the old.
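For example, to convert the price column to integer, the cast can be supplied explicitly (this sketch assumes the existing values are meaningful as integers):

ALTER TABLE products ALTER COLUMN price TYPE integer USING price::integer;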
PostgreSQL will attempt to convert the column’s default value (if any) to the new type, as well as any
constraints that involve the column. But these conversions may fail, or may produce surprising results.
It’s often best to drop any constraints on the column before altering its type, and then add back suitably
modified constraints afterwards.

5.6.7. Renaming a Column


To rename a column:

ALTER TABLE products RENAME COLUMN product_no TO product_number;


5.6.8. Renaming a Table


To rename a table:

ALTER TABLE products RENAME TO items;

5.7. Privileges
When you create a database object, you become its owner. By default, only the owner of an object can
do anything with the object. In order to allow other users to use it, privileges must be granted. (However,
users that have the superuser attribute can always access any object.)
There are several different privileges: SELECT, INSERT, UPDATE, DELETE, RULE, REFERENCES,
TRIGGER, CREATE, TEMPORARY, EXECUTE, and USAGE. The privileges applicable to a particular object
vary depending on the object’s type (table, function, etc). For complete information on the different types
of privileges supported by PostgreSQL, refer to the GRANT reference page. The following sections and
chapters will also show you how those privileges are used.
The right to modify or destroy an object is always the privilege of the owner only.

Note: To change the owner of a table, index, sequence, or view, use the ALTER TABLE command.
There are corresponding ALTER commands for other object types.
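For example, assuming joe is an existing user:

ALTER TABLE accounts OWNER TO joe;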

To assign privileges, the GRANT command is used. For example, if joe is an existing user, and accounts
is an existing table, the privilege to update the table can be granted with

GRANT UPDATE ON accounts TO joe;

To grant a privilege to a group, use this syntax:

GRANT SELECT ON accounts TO GROUP staff;

The special “user” name PUBLIC can be used to grant a privilege to every user on the system. Writing
ALL in place of a specific privilege grants all privileges that are relevant for the object type.
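For example, to give every user all relevant privileges on the accounts table:

GRANT ALL ON accounts TO PUBLIC;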

To revoke a privilege, use the fittingly named REVOKE command:

REVOKE ALL ON accounts FROM PUBLIC;

The special privileges of the object owner (i.e., the right to do DROP, GRANT, REVOKE, etc.) are always
implicit in being the owner, and cannot be granted or revoked. But the object owner can choose to revoke
his own ordinary privileges, for example to make a table read-only for himself as well as others.
Ordinarily, only the object’s owner (or a superuser) can grant or revoke privileges on an object. However,
it is possible to grant a privilege “with grant option”, which gives the recipient the right to grant it in
turn to others. If the grant option is subsequently revoked then all who received the privilege from that
recipient (directly or through a chain of grants) will lose the privilege. For details see the GRANT and
REVOKE reference pages.
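For example, a possible sequence (reusing the user joe and table accounts from above) is:

GRANT SELECT ON accounts TO joe WITH GRANT OPTION;
-- later, take back joe's right to pass the privilege on, also
-- revoking it from anyone who received it from joe:
REVOKE GRANT OPTION FOR SELECT ON accounts FROM joe CASCADE;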


5.8. Schemas
A PostgreSQL database cluster contains one or more named databases. Users and groups of users are
shared across the entire cluster, but no other data is shared across databases. Any given client connection
to the server can access only the data in a single database, the one specified in the connection request.

Note: Users of a cluster do not necessarily have the privilege to access every database in the cluster.
Sharing of user names means that there cannot be different users named, say, joe in two databases in
the same cluster; but the system can be configured to allow joe access to only some of the databases.

A database contains one or more named schemas, which in turn contain tables. Schemas also contain
other kinds of named objects, including data types, functions, and operators. The same object name can
be used in different schemas without conflict; for example, both schema1 and myschema may contain
tables named mytable. Unlike databases, schemas are not rigidly separated: a user may access objects in
any of the schemas in the database he is connected to, if he has privileges to do so.
There are several reasons why one might want to use schemas:

• To allow many users to use one database without interfering with each other.
• To organize database objects into logical groups to make them more manageable.
• Third-party applications can be put into separate schemas so they cannot collide with the names of
other objects.
Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.

5.8.1. Creating a Schema


To create a schema, use the command CREATE SCHEMA. Give the schema a name of your choice. For
example:

CREATE SCHEMA myschema;

To create or access objects in a schema, write a qualified name consisting of the schema name and table
name separated by a dot:

schema.table

This works anywhere a table name is expected, including the table modification commands and the data
access commands discussed in the following chapters. (For brevity we will speak of tables only, but the
same ideas apply to other kinds of named objects, such as types and functions.)
Actually, the even more general syntax

database.schema.table

can be used too, but at present this is just for pro forma compliance with the SQL standard. If you write a
database name, it must be the same as the database you are connected to.
So to create a table in the new schema, use


CREATE TABLE myschema.mytable (
    ...
);

To drop a schema if it’s empty (all objects in it have been dropped), use

DROP SCHEMA myschema;

To drop a schema including all contained objects, use

DROP SCHEMA myschema CASCADE;

See Section 5.10 for a description of the general mechanism behind this.
Often you will want to create a schema owned by someone else (since this is one of the ways to restrict
the activities of your users to well-defined namespaces). The syntax for that is:

CREATE SCHEMA schemaname AUTHORIZATION username;

You can even omit the schema name, in which case the schema name will be the same as the user name.
See Section 5.8.6 for how this can be useful.
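For example, assuming a user named joe already exists, the following creates a schema named joe that is owned by joe:

CREATE SCHEMA AUTHORIZATION joe;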
Schema names beginning with pg_ are reserved for system purposes and may not be created by users.

5.8.2. The Public Schema


In the previous sections we created tables without specifying any schema names. By default, such tables
(and other objects) are automatically put into a schema named “public”. Every new database contains such
a schema. Thus, the following are equivalent:

CREATE TABLE products ( ... );

and

CREATE TABLE public.products ( ... );

5.8.3. The Schema Search Path


Qualified names are tedious to write, and it’s often best not to wire a particular schema name into applica-
tions anyway. Therefore tables are often referred to by unqualified names, which consist of just the table
name. The system determines which table is meant by following a search path, which is a list of schemas
to look in. The first matching table in the search path is taken to be the one wanted. If there is no match in
the search path, an error is reported, even if matching table names exist in other schemas in the database.
The first schema named in the search path is called the current schema. Aside from being the first schema
searched, it is also the schema in which new tables will be created if the CREATE TABLE command does
not specify a schema name.
To show the current search path, use the following command:


SHOW search_path;

In the default setup this returns:

 search_path
--------------
 $user,public

The first element specifies that a schema with the same name as the current user is to be searched. If no
such schema exists, the entry is ignored. The second element refers to the public schema that we have
seen already.
The first schema in the search path that exists is the default location for creating new objects. That is
the reason that by default objects are created in the public schema. When objects are referenced in any
other context without schema qualification (table modification, data modification, or query commands)
the search path is traversed until a matching object is found. Therefore, in the default configuration, any
unqualified access again can only refer to the public schema.
To put our new schema in the path, we use

SET search_path TO myschema,public;

(We omit the $user here because we have no immediate need for it.) And then we can access the table
without schema qualification:

DROP TABLE mytable;

Also, since myschema is the first element in the path, new objects would by default be created in it.
We could also have written

SET search_path TO myschema;

Then we no longer have access to the public schema without explicit qualification. There is nothing special
about the public schema except that it exists by default. It can be dropped, too.
See also Section 9.19 for other ways to manipulate the schema search path.
The search path works in the same way for data type names, function names, and operator names as it
does for table names. Data type and function names can be qualified in exactly the same way as table
names. If you need to write a qualified operator name in an expression, there is a special provision: you
must write

OPERATOR(schema.operator)

This is needed to avoid syntactic ambiguity. An example is

SELECT 3 OPERATOR(pg_catalog.+) 4;

In practice one usually relies on the search path for operators, so as not to have to write anything so ugly
as that.


5.8.4. Schemas and Privileges


By default, users cannot access any objects in schemas they do not own. To allow that, the owner of the
schema needs to grant the USAGE privilege on the schema. To allow users to make use of the objects in the
schema, additional privileges may need to be granted, as appropriate for the object.
A user can also be allowed to create objects in someone else’s schema. To allow that, the CREATE privilege
on the schema needs to be granted. Note that by default, everyone has CREATE and USAGE privileges on
the schema public. This allows all users that are able to connect to a given database to create objects in
its public schema. If you do not want to allow that, you can revoke that privilege:

REVOKE CREATE ON SCHEMA public FROM PUBLIC;

(The first “public” is the schema, the second “public” means “every user”. In the first sense it is an
identifier, in the second sense it is a key word, hence the different capitalization; recall the guidelines
from Section 4.1.1.)

5.8.5. The System Catalog Schema


In addition to public and user-created schemas, each database contains a pg_catalog schema, which
contains the system tables and all the built-in data types, functions, and operators. pg_catalog is always
effectively part of the search path. If it is not named explicitly in the path then it is implicitly searched
before searching the path’s schemas. This ensures that built-in names will always be findable. However,
you may explicitly place pg_catalog at the end of your search path if you prefer to have user-defined
names override built-in names.
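For example, to have user-defined names in myschema and public take precedence over built-in names, you could set:

SET search_path TO myschema, public, pg_catalog;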
In PostgreSQL versions before 7.3, table names beginning with pg_ were reserved. This is no longer true:
you may create such a table name if you wish, in any non-system schema. However, it’s best to continue
to avoid such names, to ensure that you won’t suffer a conflict if some future version defines a system
table named the same as your table. (With the default search path, an unqualified reference to your table
name would be resolved as the system table instead.) System tables will continue to follow the convention
of having names beginning with pg_, so that they will not conflict with unqualified user-table names so
long as users avoid the pg_ prefix.

5.8.6. Usage Patterns


Schemas can be used to organize your data in many ways. There are a few usage patterns that are recom-
mended and are easily supported by the default configuration:

• If you do not create any schemas then all users access the public schema implicitly. This simulates the
situation where schemas are not available at all. This setup is mainly recommended when there is only
a single user or a few cooperating users in a database. This setup also allows smooth transition from the
non-schema-aware world.
• You can create a schema for each user with the same name as that user. Recall that the default search
path starts with $user, which resolves to the user name. Therefore, if each user has a separate schema,
they access their own schemas by default.
If you use this setup then you might also want to revoke access to the public schema (or drop it alto-
gether), so users are truly constrained to their own schemas.
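A minimal version of this setup, sketched here for a hypothetical user joe, could look like:

CREATE USER joe;
CREATE SCHEMA joe AUTHORIZATION joe;
REVOKE ALL ON SCHEMA public FROM PUBLIC;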


• To install shared applications (tables to be used by everyone, additional functions provided by third
parties, etc.), put them into separate schemas. Remember to grant appropriate privileges to allow the
other users to access them. Users can then refer to these additional objects by qualifying the names with
a schema name, or they can put the additional schemas into their search path, as they choose.

5.8.7. Portability
In the SQL standard, the notion of objects in the same schema being owned by different users does not
exist. Moreover, some implementations do not allow you to create schemas that have a different name
than their owner. In fact, the concepts of schema and user are nearly equivalent in a database system
that implements only the basic schema support specified in the standard. Therefore, many users consider
qualified names to really consist of username.tablename. This is how PostgreSQL will effectively
behave if you create a per-user schema for every user.
Also, there is no concept of a public schema in the SQL standard. For maximum conformance to the
standard, you should not use (perhaps even remove) the public schema.
Of course, some SQL database systems might not implement schemas at all, or provide namespace sup-
port by allowing (possibly limited) cross-database access. If you need to work with those systems, then
maximum portability would be achieved by not using schemas at all.

5.9. Other Database Objects


Tables are the central objects in a relational database structure, because they hold your data. But they are
not the only objects that exist in a database. Many other kinds of objects can be created to make the use
and management of the data more efficient or convenient. They are not discussed in this chapter, but we
give you a list here so that you are aware of what is possible.

• Views
• Functions and operators
• Data types and domains
• Triggers and rewrite rules
Detailed information on these topics appears in Part V.

5.10. Dependency Tracking


When you create complex database structures involving many tables with foreign key constraints, views,
triggers, functions, etc. you will implicitly create a net of dependencies between the objects. For instance,
a table with a foreign key constraint depends on the table it references.
To ensure the integrity of the entire database structure, PostgreSQL makes sure that you cannot drop
objects that other objects still depend on. For example, attempting to drop the products table we had


considered in Section 5.3.5, with the orders table depending on it, would result in an error message such
as this:

DROP TABLE products;

NOTICE: constraint orders_product_no_fkey on table orders depends on table products


ERROR: cannot drop table products because other objects depend on it
HINT: Use DROP ... CASCADE to drop the dependent objects too.

The error message contains a useful hint: if you do not want to bother deleting all the dependent objects
individually, you can run

DROP TABLE products CASCADE;

and all the dependent objects will be removed. In this case, it doesn’t remove the orders table, it only
removes the foreign key constraint. (If you want to check what DROP ... CASCADE will do, run DROP
without CASCADE and read the NOTICE messages.)
All drop commands in PostgreSQL support specifying CASCADE. Of course, the nature of the possible
dependencies varies with the type of the object. You can also write RESTRICT instead of CASCADE to get
the default behavior, which is to prevent drops of objects that other objects depend on.
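For example, since RESTRICT is the default in PostgreSQL, the following command behaves the same as plain DROP TABLE products and will fail if any other object depends on the table:

DROP TABLE products RESTRICT;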

Note: According to the SQL standard, specifying either RESTRICT or CASCADE is required. No database
system actually enforces that rule, but whether the default behavior is RESTRICT or CASCADE varies
across systems.

Note: Foreign key constraint dependencies and serial column dependencies from PostgreSQL ver-
sions prior to 7.3 are not maintained or created during the upgrade process. All other dependency
types will be properly created during an upgrade from a pre-7.3 database.

Chapter 6. Data Manipulation
The previous chapter discussed how to create tables and other structures to hold your data. Now it is
time to fill the tables with data. This chapter covers how to insert, update, and delete table data. We also
introduce ways to effect automatic data changes when certain events occur: triggers and rewrite rules. The
chapter after this will finally explain how to extract your long-lost data back out of the database.

6.1. Inserting Data


When a table is created, it contains no data. The first thing to do before a database can be of much use is
to insert data. Data is conceptually inserted one row at a time. Of course you can also insert more than one
row, but there is no way to insert less than one row at a time. Even if you know only some column values,
a complete row must be created.
To create a new row, use the INSERT command. The command requires the table name and a value for
each of the columns of the table. For example, consider the products table from Chapter 5:

CREATE TABLE products (
    product_no integer,
    name text,
    price numeric
);

An example command to insert a row would be:

INSERT INTO products VALUES (1, 'Cheese', 9.99);

The data values are listed in the order in which the columns appear in the table, separated by commas.
Usually, the data values will be literals (constants), but scalar expressions are also allowed.
The above syntax has the drawback that you need to know the order of the columns in the table. To avoid
that you can also list the columns explicitly. For example, both of the following commands have the same
effect as the one above:

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', 9.99);
INSERT INTO products (name, price, product_no) VALUES ('Cheese', 9.99, 1);

Many users consider it good practice to always list the column names.
If you don’t have values for all the columns, you can omit some of them. In that case, the columns will be
filled with their default values. For example,

INSERT INTO products (product_no, name) VALUES (1, 'Cheese');


INSERT INTO products VALUES (1, 'Cheese');

The second form is a PostgreSQL extension. It fills the columns from the left with as many values as are
given, and the rest will be defaulted.
For clarity, you can also request default values explicitly, for individual columns or for the entire row:

INSERT INTO products (product_no, name, price) VALUES (1, 'Cheese', DEFAULT);
INSERT INTO products DEFAULT VALUES;


Tip: To do “bulk loads”, that is, inserting a lot of data, take a look at the COPY command. It is not as
flexible as the INSERT command, but is more efficient.
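For example, a bulk load of the products table could look like this in psql (the data rows follow the command on standard input, with tab-separated columns, and are terminated by a line containing only \.):

COPY products (product_no, name, price) FROM STDIN;
1	Cheese	9.99
\.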

6.2. Updating Data


The modification of data that is already in the database is referred to as updating. You can update individual
rows, all the rows in a table, or a subset of all rows. Each column can be updated separately; the other
columns are not affected.
To perform an update, you need three pieces of information:

1. The name of the table and column to update,
2. The new value of the column,
3. Which row(s) to update.

Recall from Chapter 5 that SQL does not, in general, provide a unique identifier for rows. Therefore it is
not necessarily possible to directly specify which row to update. Instead, you specify which conditions
a row must meet in order to be updated. Only if you have a primary key in the table (no matter whether
you declared it or not) can you reliably address individual rows, by choosing a condition that matches the
primary key. Graphical database access tools rely on this fact to allow you to update rows individually.
For example, this command updates all products that have a price of 5 to have a price of 10:

UPDATE products SET price = 10 WHERE price = 5;

This may cause zero, one, or many rows to be updated. It is not an error to attempt an update that does not
match any rows.
Let’s look at that command in detail. First is the key word UPDATE followed by the table name. As usual,
the table name may be schema-qualified, otherwise it is looked up in the path. Next is the key word SET
followed by the column name, an equals sign and the new column value. The new column value can be
any scalar expression, not just a constant. For example, if you want to raise the price of all products by
10% you could use:

UPDATE products SET price = price * 1.10;

As you see, the expression for the new value can refer to the existing value(s) in the row. We also left
out the WHERE clause. If it is omitted, it means that all rows in the table are updated. If it is present, only
those rows that match the WHERE condition are updated. Note that the equals sign in the SET clause is an
assignment while the one in the WHERE clause is a comparison, but this does not create any ambiguity. Of
course, the WHERE condition does not have to be an equality test. Many other operators are available (see
Chapter 9). But the expression needs to evaluate to a Boolean result.
You can update more than one column in an UPDATE command by listing more than one assignment in
the SET clause. For example:


UPDATE mytable SET a = 5, b = 3, c = 1 WHERE a > 0;

6.3. Deleting Data


So far we have explained how to add data to tables and how to change data. What remains is to discuss how
to remove data that is no longer needed. Just as adding data is only possible in whole rows, you can only
remove entire rows from a table. In the previous section we explained that SQL does not provide a way
to directly address individual rows. Therefore, removing rows can only be done by specifying conditions
that the rows to be removed have to match. If you have a primary key in the table then you can specify the
exact row. But you can also remove groups of rows matching a condition, or you can remove all rows in
the table at once.
You use the DELETE command to remove rows; the syntax is very similar to the UPDATE command. For
instance, to remove all rows from the products table that have a price of 10, use

DELETE FROM products WHERE price = 10;

If you simply write

DELETE FROM products;

then all rows in the table will be deleted! Caveat programmer.

Chapter 7. Queries
The previous chapters explained how to create tables, how to fill them with data, and how to manipulate
that data. Now we finally discuss how to retrieve the data out of the database.

7.1. Overview
The process of retrieving or the command to retrieve data from a database is called a query. In SQL the
SELECT command is used to specify queries. The general syntax of the SELECT command is

SELECT select_list FROM table_expression [sort_specification]

The following sections describe the details of the select list, the table expression, and the sort specification.
The simplest kind of query has the form

SELECT * FROM table1;

Assuming that there is a table called table1, this command would retrieve all rows and all columns from
table1. (The method of retrieval depends on the client application. For example, the psql program will
display an ASCII-art table on the screen, while client libraries will offer functions to extract individual
values from the query result.) The select list specification * means all columns that the table expression
happens to provide. A select list can also select a subset of the available columns or make calculations
using the columns. For example, if table1 has columns named a, b, and c (and perhaps others) you can
make the following query:

SELECT a, b + c FROM table1;

(assuming that b and c are of a numerical data type). See Section 7.3 for more details.
FROM table1 is a particularly simple kind of table expression: it reads just one table. In general, table
expressions can be complex constructs of base tables, joins, and subqueries. But you can also omit the
table expression entirely and use the SELECT command as a calculator:

SELECT 3 * 4;

This is more useful if the expressions in the select list return varying results. For example, you could call
a function this way:

SELECT random();

7.2. Table Expressions


A table expression computes a table. The table expression contains a FROM clause that is optionally fol-
lowed by WHERE, GROUP BY, and HAVING clauses. Trivial table expressions simply refer to a table on
disk, a so-called base table, but more complex expressions can be used to modify or combine base tables
in various ways.


The optional WHERE, GROUP BY, and HAVING clauses in the table expression specify a pipeline of succes-
sive transformations performed on the table derived in the FROM clause. All these transformations produce
a virtual table that provides the rows that are passed to the select list to compute the output rows of the
query.

7.2.1. The FROM Clause


The FROM Clause derives a table from one or more other tables given in a comma-separated table refer-
ence list.

FROM table_reference [, table_reference [, ...]]

A table reference may be a table name (possibly schema-qualified), or a derived table such as a subquery,
a table join, or complex combinations of these. If more than one table reference is listed in the FROM
clause they are cross-joined (see below) to form the intermediate virtual table that may then be subject to
transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table
expression.
When a table reference names a table that is the supertable of a table inheritance hierarchy, the table
reference produces rows of not only that table but all of its subtable successors, unless the key word ONLY
precedes the table name. However, the reference produces only the columns that appear in the named table
— any columns added in subtables are ignored.

7.2.1.1. Joined Tables


A joined table is a table derived from two other (real or derived) tables according to the rules of the
particular join type. Inner, outer, and cross-joins are available.

Join Types
Cross join
T1 CROSS JOIN T2

For each combination of rows from T1 and T2, the derived table will contain a row consisting of
all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the
joined table will have N * M rows.
FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2. It is also equivalent to FROM T1 INNER
JOIN T2 ON TRUE (see below).
Qualified joins
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 ON boolean_expression
T1 { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2 USING ( join column list )
T1 NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN T2

The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL
imply an outer join.
The join condition is specified in the ON or USING clause, or implicitly by the word NATURAL. The
join condition determines which rows from the two source tables are considered to “match”, as
explained in detail below.


The ON clause is the most general kind of join condition: it takes a Boolean value expression of the
same kind as is used in a WHERE clause. A pair of rows from T1 and T2 match if the ON expression
evaluates to true for them.
USING is a shorthand notation: it takes a comma-separated list of column names, which the joined
tables must have in common, and forms a join condition specifying equality of each of these pairs
of columns. Furthermore, the output of a JOIN USING has one column for each of the equated pairs
of input columns, followed by all of the other columns from each table. Thus, USING (a, b, c)
is equivalent to ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c), except that if ON
is used the result will contain two copies each of the columns a, b, and c, whereas with
USING there will be only one of each.
Finally, NATURAL is a shorthand form of USING: it forms a USING list consisting of exactly those
column names that appear in both input tables. As with USING, these columns appear only once in
the output table.
The possible types of qualified join are:

INNER JOIN

For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join
condition with R1.
LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition
with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined
table unconditionally has at least one row for each row in T1.
RIGHT OUTER JOIN

First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition
with any row in T1, a joined row is added with null values in columns of T1. This is the converse
of a left join: the result table will unconditionally have a row for each row in T2.
FULL OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition
with any row in T2, a joined row is added with null values in columns of T2. Also, for each row
of T2 that does not satisfy the join condition with any row in T1, a joined row with null values
in the columns of T1 is added.

Joins of all types can be chained together or nested: either or both of T1 and T2 may be joined tables.
Parentheses may be used around JOIN clauses to control the join order. In the absence of parentheses,
JOIN clauses nest left-to-right.
To put this together, assume we have tables t1

 num | name
-----+------
   1 | a
   2 | b
   3 | c

and t2

 num | value
-----+-------
   1 | xxx
   3 | yyy
   5 | zzz

then we get the following results for the various joins:

=> SELECT * FROM t1 CROSS JOIN t2;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   1 | a    |   3 | yyy
   1 | a    |   5 | zzz
   2 | b    |   1 | xxx
   2 | b    |   3 | yyy
   2 | b    |   5 | zzz
   3 | c    |   1 | xxx
   3 | c    |   3 | yyy
   3 | c    |   5 | zzz
(9 rows)

=> SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
(2 rows)

=> SELECT * FROM t1 INNER JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 NATURAL INNER JOIN t2;
 num | name | value
-----+------+-------
   1 | a    | xxx
   3 | c    | yyy
(2 rows)

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
(3 rows)

=> SELECT * FROM t1 LEFT JOIN t2 USING (num);
 num | name | value
-----+------+-------
   1 | a    | xxx
   2 | b    |
   3 | c    | yyy
(3 rows)

=> SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   3 | c    |   3 | yyy
     |      |   5 | zzz
(3 rows)

=> SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |   3 | yyy
     |      |   5 | zzz
(4 rows)

The join condition specified with ON can also contain conditions that do not relate directly to the join. This
can prove useful for some queries but needs to be thought out carefully. For example:

=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';
 num | name | num | value
-----+------+-----+-------
   1 | a    |   1 | xxx
   2 | b    |     |
   3 | c    |     |
(3 rows)

7.2.1.2. Table and Column Aliases


A temporary name can be given to tables and complex table references to be used for references to the
derived table in the rest of the query. This is called a table alias.
To create a table alias, write

FROM table_reference AS alias

or


FROM table_reference alias

The AS key word is noise. alias can be any identifier.


A typical application of table aliases is to assign short identifiers to long table names to keep the join
clauses readable. For example:

SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;

The alias becomes the new name of the table reference for the current query — it is no longer possible to
refer to the table by the original name. Thus

SELECT * FROM my_table AS m WHERE my_table.a > 5;

is not valid SQL syntax. What will actually happen (this is a PostgreSQL extension to the standard) is that
an implicit table reference is added to the FROM clause, so the query is processed as if it were written as

SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a > 5;

which will result in a cross join, which is usually not what you want.
Table aliases are mainly for notational convenience, but it is necessary to use them when joining a table
to itself, e.g.,

SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...

Additionally, an alias is required if the table reference is a subquery (see Section 7.2.1.3).
Parentheses are used to resolve ambiguities. The following statement will assign the alias b to the result
of the join, unlike the previous example:

SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...

Another form of table aliasing gives temporary names to the columns of the table, as well as the table
itself:

FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )

If fewer column aliases are specified than the actual table has columns, the remaining columns are not
renamed. This syntax is especially useful for self-joins or subqueries.
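For example, to rename the columns of a subquery result (a sketch using the products table from Chapter 5):

SELECT p.id, p.label
    FROM (SELECT product_no, name FROM products) AS p (id, label);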
When an alias is applied to the output of a JOIN clause, using any of these forms, the alias hides the
original names within the JOIN. For example,

SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...

is valid SQL, but

SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c

is not valid: the table alias a is not visible outside the alias c.


7.2.1.3. Subqueries
Subqueries specifying a derived table must be enclosed in parentheses and must be assigned a table alias
name. (See Section 7.2.1.2.) For example:

FROM (SELECT * FROM table1) AS alias_name

This example is equivalent to FROM table1 AS alias_name. More interesting cases, which can’t be
reduced to a plain join, arise when the subquery involves grouping or aggregation.
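For instance (table names hypothetical), a grouped subquery can pre-aggregate rows before being joined like an ordinary table:

SELECT p.name, s.total
    FROM products p
    JOIN (SELECT product_id, sum(units) AS total
          FROM sales
          GROUP BY product_id) AS s USING (product_id);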

7.2.1.4. Table Functions


Table functions are functions that produce a set of rows, made up of either base data types (scalar types)
or composite data types (table rows). They are used like a table, view, or subquery in the FROM clause of
a query. Columns returned by table functions may be included in SELECT, JOIN, or WHERE clauses in the
same manner as a table, view, or subquery column.
If a table function returns a base data type, the single result column is named like the function. If the
function returns a composite type, the result columns get the same names as the individual attributes of
the type.
A table function may be aliased in the FROM clause, but it also may be left unaliased. If a function is used
in the FROM clause with no alias, the function name is used as the resulting table name.
Some examples:

CREATE TABLE foo (fooid int, foosubid int, fooname text);

CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
    SELECT * FROM foo WHERE fooid = $1;
$$ LANGUAGE SQL;

SELECT * FROM getfoo(1) AS t1;

SELECT * FROM foo
    WHERE foosubid IN (select foosubid from getfoo(foo.fooid) z
                       where z.fooid = foo.fooid);

CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);

SELECT * FROM vw_getfoo;

In some cases it is useful to define table functions that can return different column sets depending on how
they are invoked. To support this, the table function can be declared as returning the pseudotype record.
When such a function is used in a query, the expected row structure must be specified in the query itself,
so that the system can know how to parse and plan the query. Consider this example:

SELECT *
    FROM dblink('dbname=mydb', 'select proname, prosrc from pg_proc')
      AS t1(proname name, prosrc text)
    WHERE proname LIKE 'bytea%';

The dblink function executes a remote query (see contrib/dblink). It is declared to return record
since it might be used for any kind of query. The actual column set must be specified in the calling query
so that the parser knows, for example, what * should expand to.

7.2.2. The WHERE Clause


The syntax of the WHERE Clause is

WHERE search_condition

where search_condition is any value expression (see Section 4.2) that returns a value of type
boolean.

After the processing of the FROM clause is done, each row of the derived virtual table is checked against
the search condition. If the result of the condition is true, the row is kept in the output table, otherwise
(that is, if the result is false or null) it is discarded. The search condition typically references at least some
column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will
be fairly useless.

Note: The join condition of an inner join can be written either in the WHERE clause or in the JOIN clause.
For example, these table expressions are equivalent:

FROM a, b WHERE a.id = b.id AND b.val > 5

and

FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5

or perhaps even

FROM a NATURAL JOIN b WHERE b.val > 5

Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is probably
not as portable to other SQL database management systems. For outer joins there is no choice in any
case: they must be done in the FROM clause. An ON/USING clause of an outer join is not equivalent to
a WHERE condition, because it determines the addition of rows (for unmatched input rows) as well as
the removal of rows from the final result.
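To illustrate the difference (using the same hypothetical tables a and b), moving a restriction between the ON clause and the WHERE clause of an outer join changes the result:

-- Rows of a with no matching b row still appear, null-extended;
-- the b.val restriction only limits which b rows can match:
FROM a LEFT JOIN b ON (a.id = b.id AND b.val > 5)

-- Here the WHERE clause discards the null-extended rows afterwards,
-- effectively turning the outer join into an inner join:
FROM a LEFT JOIN b ON (a.id = b.id) WHERE b.val > 5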

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)


SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

fdt is the table derived in the FROM clause. Rows that do not meet the search condition of the WHERE
clause are eliminated from fdt. Notice the use of scalar subqueries as value expressions. Just like any
other query, the subqueries can employ complex table expressions. Notice also how fdt is referenced in
the subqueries. Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived
input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This
example shows how the column naming scope of an outer query extends into its inner queries.

7.2.3. The GROUP BY and HAVING Clauses


After passing the WHERE filter, the derived input table may be subject to grouping, using the GROUP BY
clause, and elimination of group rows using the HAVING clause.

SELECT select_list
FROM ...
[WHERE ...]
GROUP BY grouping_column_reference [, grouping_column_reference]...

The GROUP BY Clause is used to group together those rows in a table that share the same values in all
the columns listed. The order in which the columns are listed does not matter. The effect is to combine
each set of rows sharing common values into one group row that is representative of all rows in the group.
This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.
For instance:

=> SELECT * FROM test1;
 x | y
---+---
 a | 3
 c | 2
 b | 5
 a | 1
(4 rows)

=> SELECT x FROM test1 GROUP BY x;
 x
---
 a
 b
 c
(3 rows)

In the second query, we could not have written SELECT * FROM test1 GROUP BY x, because there is
no single value for the column y that could be associated with each group. The grouped-by columns can
be referenced in the select list since they have a single value in each group.
In general, if a table is grouped, columns that are not used in the grouping cannot be referenced except in
aggregate expressions. An example with aggregate expressions is:


=> SELECT x, sum(y) FROM test1 GROUP BY x;
 x | sum
---+-----
 a |   4
 b |   5
 c |   2
(3 rows)

Here sum is an aggregate function that computes a single value over the entire group. More information
about the available aggregate functions can be found in Section 9.15.

Tip: Grouping without aggregate expressions effectively calculates the set of distinct values in a col-
umn. This can also be achieved using the DISTINCT clause (see Section 7.3.3).

Here is another example: it calculates the total sales for each product (rather than the total sales on all
products).

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
    FROM products p LEFT JOIN sales s USING (product_id)
    GROUP BY product_id, p.name, p.price;

In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since
they are referenced in the query select list. (Depending on how exactly the products table is set up, name
and price may be fully dependent on the product ID, so the additional groupings could theoretically be
unnecessary, but this is not implemented yet.) The column s.units does not have to be in the GROUP BY
list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product.
For each product, the query returns a summary row about all sales of the product.
In strict SQL, GROUP BY can only group by columns of the source table but PostgreSQL extends this to
also allow GROUP BY to group by columns in the select list. Grouping by value expressions instead of
simple column names is also allowed.
If a table has been grouped using a GROUP BY clause, but then only certain groups are of interest, the
HAVING clause can be used, much like a WHERE clause, to eliminate groups from a grouped table. The
syntax is:

SELECT select_list FROM ... [WHERE ...] GROUP BY ... HAVING boolean_expression

Expressions in the HAVING clause can refer both to grouped expressions and to ungrouped expressions
(which necessarily involve an aggregate function).
Example:

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

=> SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';
 x | sum
---+-----
 a |   4
 b |   5
(2 rows)

Again, a more realistic example:

SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
    FROM products p LEFT JOIN sales s USING (product_id)
    WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
    GROUP BY product_id, p.name, p.price, p.cost
    HAVING sum(p.price * s.units) > 5000;

In the example above, the WHERE clause is selecting rows by a column that is not grouped (the expression
is only true for sales during the last four weeks), while the HAVING clause restricts the output to groups
with total gross sales over 5000. Note that the aggregate expressions do not necessarily need to be the
same in all parts of the query.

7.3. Select Lists


As shown in the previous section, the table expression in the SELECT command constructs an intermediate
virtual table by possibly combining tables, views, eliminating rows, grouping, etc. This table is finally
passed on to processing by the select list. The select list determines which columns of the intermediate
table are actually output.

7.3.1. Select-List Items


The simplest kind of select list is * which emits all columns that the table expression produces. Otherwise,
a select list is a comma-separated list of value expressions (as defined in Section 4.2). For instance, it could
be a list of column names:

SELECT a, b, c FROM ...

The column names a, b, and c are either the actual names of the columns of tables referenced in the
FROM clause, or the aliases given to them as explained in Section 7.2.1.2. The name space available in the
select list is the same as in the WHERE clause, unless grouping is used, in which case it is the same as in
the HAVING clause.
If more than one table has a column of the same name, the table name must also be given, as in

SELECT tbl1.a, tbl2.a, tbl1.b FROM ...

When working with multiple tables, it can also be useful to ask for all the columns of a particular table:

SELECT tbl1.*, tbl2.a FROM ...

(See also Section 7.2.2.)


If an arbitrary value expression is used in the select list, it conceptually adds a new virtual column to the
returned table. The value expression is evaluated once for each result row, with the row’s values substituted
for any column references. But the expressions in the select list do not have to reference any columns in the
table expression of the FROM clause; they could be constant arithmetic expressions as well, for instance.
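For example (table name hypothetical), the last two items in this select list do not reference the FROM clause at all:

SELECT a, b * 1.05 AS adjusted, 2 + 2, current_date FROM tbl;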

7.3.2. Column Labels


The entries in the select list can be assigned names for further processing. The “further processing” in
this case is an optional sort specification and the client application (e.g., column headers for display). For
example:

SELECT a AS value, b + c AS sum FROM ...

If no output column name is specified using AS, the system assigns a default name. For simple column
references, this is the name of the referenced column. For function calls, this is the name of the function.
For complex expressions, the system will generate a generic name.
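For example (table and column names hypothetical):

SELECT a, length(b), a + 1 FROM tbl;

Here the output columns are named a, length, and (in PostgreSQL) the generic name ?column?, respectively; an AS clause would override any of these defaults.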

Note: The naming of output columns here is different from that done in the FROM clause (see Section
7.2.1.2). This pipeline will in fact allow you to rename the same column twice, but the name chosen in
the select list is the one that will be passed on.

7.3.3. DISTINCT
After the select list has been processed, the result table may optionally be subject to the elimination of
duplicate rows. The DISTINCT key word is written directly after SELECT to specify this:

SELECT DISTINCT select_list ...

(Instead of DISTINCT the key word ALL can be used to specify the default behavior of retaining all rows.)
Obviously, two rows are considered distinct if they differ in at least one column value. Null values are
considered equal in this comparison.
Alternatively, an arbitrary expression can determine what rows are to be considered distinct:

SELECT DISTINCT ON (expression [, expression ...]) select_list ...

Here expression is an arbitrary value expression that is evaluated for all rows. A set of rows for which
all the expressions are equal are considered duplicates, and only the first row of the set is kept in the
output. Note that the “first row” of a set is unpredictable unless the query is sorted on enough columns
to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing
occurs after ORDER BY sorting.)
The DISTINCT ON clause is not part of the SQL standard and is sometimes considered bad style because
of the potentially indeterminate nature of its results. With judicious use of GROUP BY and subqueries in
FROM the construct can be avoided, but it is often the most convenient alternative.
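For example, assuming a hypothetical table weather_reports, the following retrieves the most recent report for each location:

SELECT DISTINCT ON (location) location, time, report
    FROM weather_reports
    ORDER BY location, time DESC;

Because the rows are sorted by location and then by time in descending order, the "first row" kept for each location is its newest report.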


7.4. Combining Queries


The results of two queries can be combined using the set operations union, intersection, and difference.
The syntax is

query1 UNION [ALL] query2
query1 INTERSECT [ALL] query2
query1 EXCEPT [ALL] query2

query1 and query2 are queries that can use any of the features discussed up to this point. Set operations
can also be nested and chained, for example

query1 UNION query2 UNION query3

which really says

(query1 UNION query2) UNION query3

UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee
that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows
from its result, in the same way as DISTINCT, unless UNION ALL is used.
INTERSECT returns all rows that are both in the result of query1 and in the result of query2. Duplicate
rows are eliminated unless INTERSECT ALL is used.
EXCEPT returns all rows that are in the result of query1 but not in the result of query2. (This is
sometimes called the difference between two queries.) Again, duplicates are eliminated unless EXCEPT
ALL is used.
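For instance, with two hypothetical tables that each have a city column:

SELECT city FROM stores
UNION
SELECT city FROM warehouses;    -- every city, each listed once

SELECT city FROM stores
EXCEPT
SELECT city FROM warehouses;    -- cities with a store but no warehouse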

In order to calculate the union, intersection, or difference of two queries, the two queries must be “union
compatible”, which means that they return the same number of columns and the corresponding columns
have compatible data types, as described in Section 10.5.

7.5. Sorting Rows


After a query has produced an output table (after the select list has been processed) it can optionally be
sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that
case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A
particular output ordering can only be guaranteed if the sort step is explicitly chosen.
The ORDER BY clause specifies the sort order:

SELECT select_list
FROM table_expression
ORDER BY column1 [ASC | DESC] [, column2 [ASC | DESC] ...]

column1, etc., refer to select list columns. These can be either the output name of a column (see Section
7.3.2) or the number of a column. Some examples:

SELECT a, b FROM table1 ORDER BY a;


SELECT a + b AS sum, c FROM table1 ORDER BY sum;


SELECT a, sum(b) FROM table1 GROUP BY a ORDER BY 1;

As an extension to the SQL standard, PostgreSQL also allows ordering by arbitrary expressions:

SELECT a, b FROM table1 ORDER BY a + b;

References to column names of the FROM clause that are not present in the select list are also allowed:

SELECT a FROM table1 ORDER BY b;

But these extensions do not work in queries involving UNION, INTERSECT, or EXCEPT, and are not
portable to other SQL databases.
Each column specification may be followed by an optional ASC or DESC to set the sort direction to ascend-
ing or descending. ASC order is the default. Ascending order puts smaller values first, where “smaller” is
defined in terms of the < operator. Similarly, descending order is determined with the > operator.[1]
If more than one sort column is specified, the later entries are used to sort rows that are equal under the
order imposed by the earlier sort columns.
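For example, the following sorts primarily by a in ascending order and breaks ties using b in descending order:

SELECT a, b FROM table1 ORDER BY a, b DESC;

Note that ASC/DESC applies only to the sort column it follows; a here is still sorted ascending.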

7.6. LIMIT and OFFSET


LIMIT and OFFSET allow you to retrieve just a portion of the rows that are generated by the rest of the
query:

SELECT select_list
FROM table_expression
[LIMIT { number | ALL }] [OFFSET number]

If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself
yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause.
OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 is the same as omitting
the OFFSET clause. If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to
count the LIMIT rows that are returned.
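For example (table name hypothetical), a typical paging query that fetches rows 11 through 20 of a sorted result:

SELECT id, name
    FROM accounts
    ORDER BY id
    LIMIT 10 OFFSET 10;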
When using LIMIT, it is important to use an ORDER BY clause that constrains the result rows into a unique
order. Otherwise you will get an unpredictable subset of the query’s rows. You may be asking for the tenth
through twentieth rows, but tenth through twentieth in what ordering? The ordering is unknown, unless
you specified ORDER BY.
The query optimizer takes LIMIT into account when generating a query plan, so you are very likely to get
different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus,
using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent
results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent
consequence of the fact that SQL does not promise to deliver the results of a query in any particular order
unless ORDER BY is used to constrain the order.

[1] Actually, PostgreSQL uses the default B-tree operator class for the column’s data type to determine the sort ordering for ASC
and DESC. Conventionally, data types will be set up so that the < and > operators correspond to this sort ordering, but a user-defined
data type’s designer could choose to do something different.
The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large
OFFSET can be inefficient.

Chapter 8. Data Types
PostgreSQL has a rich set of native data types available to users. Users may add new types to PostgreSQL
using the CREATE TYPE command.
Table 8-1 shows all the built-in general-purpose data types. Most of the alternative names listed in the
“Aliases” column are the names used internally by PostgreSQL for historical reasons. In addition, some
internally used or deprecated types are available, but they are not listed here.

Table 8-1. Data Types

Name                                    | Aliases            | Description
----------------------------------------+--------------------+-----------------------------------------
bigint                                  | int8               | signed eight-byte integer
bigserial                               | serial8            | autoincrementing eight-byte integer
bit [ (n) ]                             |                    | fixed-length bit string
bit varying [ (n) ]                     | varbit             | variable-length bit string
boolean                                 | bool               | logical Boolean (true/false)
box                                     |                    | rectangular box in the plane
bytea                                   |                    | binary data (“byte array”)
character varying [ (n) ]               | varchar [ (n) ]    | variable-length character string
character [ (n) ]                       | char [ (n) ]       | fixed-length character string
cidr                                    |                    | IPv4 or IPv6 network address
circle                                  |                    | circle in the plane
date                                    |                    | calendar date (year, month, day)
double precision                        | float8             | double precision floating-point number
inet                                    |                    | IPv4 or IPv6 host address
integer                                 | int, int4          | signed four-byte integer
interval [ (p) ]                        |                    | time span
line                                    |                    | infinite line in the plane
lseg                                    |                    | line segment in the plane
macaddr                                 |                    | MAC address
money                                   |                    | currency amount
numeric [ (p, s) ]                      | decimal [ (p, s) ] | exact numeric of selectable precision
path                                    |                    | geometric path in the plane
point                                   |                    | geometric point in the plane
polygon                                 |                    | closed geometric path in the plane
real                                    | float4             | single precision floating-point number
smallint                                | int2               | signed two-byte integer
serial                                  | serial4            | autoincrementing four-byte integer
text                                    |                    | variable-length character string
time [ (p) ] [ without time zone ]      |                    | time of day
time [ (p) ] with time zone             | timetz             | time of day, including time zone
timestamp [ (p) ] [ without time zone ] |                    | date and time
timestamp [ (p) ] with time zone        | timestamptz        | date and time, including time zone

Compatibility: The following types (or spellings thereof) are specified by SQL: bit, bit varying,
boolean, char, character varying, character, varchar, date, double precision, integer,
interval, numeric, decimal, real, smallint, time (with or without time zone), timestamp (with or
without time zone).

Each data type has an external representation determined by its input and output functions. Many of the
built-in types have obvious external formats. However, several types are either unique to PostgreSQL,
such as geometric paths, or have several possibilities for formats, such as the date and time types. Some
of the input and output functions are not invertible. That is, the result of an output function may lose
accuracy when compared to the original input.

8.1. Numeric Types


Numeric types consist of two-, four-, and eight-byte integers, four- and eight-byte floating-point numbers,
and selectable-precision decimals. Table 8-2 lists the available types.

Table 8-2. Numeric Types

Name             | Storage Size | Description                     | Range
-----------------+--------------+---------------------------------+----------------------------------------------
smallint         | 2 bytes      | small-range integer             | -32768 to +32767
integer          | 4 bytes      | usual choice for integer        | -2147483648 to +2147483647
bigint           | 8 bytes      | large-range integer             | -9223372036854775808 to +9223372036854775807
decimal          | variable     | user-specified precision, exact | no limit
numeric          | variable     | user-specified precision, exact | no limit
real             | 4 bytes      | variable-precision, inexact     | 6 decimal digits precision
double precision | 8 bytes      | variable-precision, inexact     | 15 decimal digits precision
serial           | 4 bytes      | autoincrementing integer        | 1 to 2147483647
bigserial        | 8 bytes      | large autoincrementing integer  | 1 to 9223372036854775807

The syntax of constants for the numeric types is described in Section 4.1.2. The numeric types have a full
set of corresponding arithmetic operators and functions. Refer to Chapter 9 for more information. The
following sections describe the types in detail.

8.1.1. Integer Types


The types smallint, integer, and bigint store whole numbers, that is, numbers without fractional
components, of various ranges. Attempts to store values outside of the allowed range will result in an
error.
The type integer is the usual choice, as it offers the best balance between range, storage size, and
performance. The smallint type is generally only used if disk space is at a premium. The bigint type
should only be used if the integer range is not sufficient, because the latter is definitely faster.
The bigint type may not function correctly on all platforms, since it relies on compiler support for eight-
byte integers. On a machine without such support, bigint acts the same as integer (but still takes up
eight bytes of storage). However, we are not aware of any reasonable platform where this is actually the
case.
SQL only specifies the integer types integer (or int) and smallint. The type bigint, and the type
names int2, int4, and int8 are extensions, which are shared with various other SQL database systems.

8.1.2. Arbitrary Precision Numbers


The type numeric can store numbers with up to 1000 digits of precision and perform calculations ex-
actly. It is especially recommended for storing monetary amounts and other quantities where exactness is
required. However, arithmetic on numeric values is very slow compared to the integer types, or to the
floating-point types described in the next section.
In what follows we use these terms: The scale of a numeric is the count of decimal digits in the fractional
part, to the right of the decimal point. The precision of a numeric is the total count of significant digits in
the whole number, that is, the number of digits to both sides of the decimal point. So the number 23.5141
has a precision of 6 and a scale of 4. Integers can be considered to have a scale of zero.
Both the maximum precision and the maximum scale of a numeric column can be configured. To declare
a column of type numeric use the syntax

NUMERIC(precision, scale)

The precision must be positive, the scale zero or positive. Alternatively,


NUMERIC(precision)

selects a scale of 0. Specifying

NUMERIC

without any precision or scale creates a column in which numeric values of any precision and scale can
be stored, up to the implementation limit on precision. A column of this kind will not coerce input values
to any particular scale, whereas numeric columns with a declared scale will coerce input values to that
scale. (The SQL standard requires a default scale of 0, i.e., coercion to integer precision. We find this a bit
useless. If you’re concerned about portability, always specify the precision and scale explicitly.)
If the scale of a value to be stored is greater than the declared scale of the column, the system will round
the value to the specified number of fractional digits. Then, if the number of digits to the left of the decimal
point exceeds the declared precision minus the declared scale, an error is raised.
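For example, in a column declared NUMERIC(5, 2) (at most 5 significant digits, 2 of them fractional, hence at most 3 digits to the left of the decimal point; table name hypothetical):

CREATE TABLE items (price NUMERIC(5, 2));

INSERT INTO items VALUES (123.456);   -- stored as 123.46 (rounded to scale 2)
INSERT INTO items VALUES (12345.6);   -- error: more than 3 digits before the decimal point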
Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared
precision and scale of a column are maximums, not fixed allocations. (In this sense the numeric type is
more akin to varchar(n) than to char(n).)
In addition to ordinary numeric values, the numeric type allows the special value NaN, meaning “not-
a-number”. Any operation on NaN yields another NaN. When writing this value as a constant in a SQL
command, you must put quotes around it, for example UPDATE table SET x = 'NaN'. On input, the
string NaN is recognized in a case-insensitive manner.
The types decimal and numeric are equivalent. Both types are part of the SQL standard.

8.1.3. Floating-Point Types


The data types real and double precision are inexact, variable-precision numeric types. In practice,
these types are usually implementations of IEEE Standard 754 for Binary Floating-Point Arithmetic (sin-
gle and double precision, respectively), to the extent that the underlying processor, operating system, and
compiler support it.
Inexact means that some values cannot be converted exactly to the internal format and are stored as
approximations, so that storing and printing back out a value may show slight discrepancies. Managing
these errors and how they propagate through calculations is the subject of an entire branch of mathematics
and computer science and will not be discussed further here, except for the following points:

• If you require exact storage and calculations (such as for monetary amounts), use the numeric type
instead.
• If you want to do complicated calculations with these types for anything important, especially if you
rely on certain behavior in boundary cases (infinity, underflow), you should evaluate the implementation
carefully.
• Comparing two floating-point values for equality may or may not work as expected.

On most platforms, the real type has a range of at least 1E-37 to 1E+37 with a precision of at least 6
decimal digits. The double precision type typically has a range of around 1E-307 to 1E+308 with a
precision of at least 15 digits. Values that are too large or too small will cause an error. Rounding may take
place if the precision of an input number is too high. Numbers too close to zero that are not representable
as distinct from zero will cause an underflow error.
In addition to ordinary numeric values, the floating-point types have several special values:

Infinity
-Infinity
NaN

These represent the IEEE 754 special values “infinity”, “negative infinity”, and “not-a-number”, respec-
tively. (On a machine whose floating-point arithmetic does not follow IEEE 754, these values will prob-
ably not work as expected.) When writing these values as constants in a SQL command, you must put
quotes around them, for example UPDATE table SET x = 'Infinity'. On input, these strings are
recognized in a case-insensitive manner.
PostgreSQL also supports the SQL-standard notations float and float(p) for specifying inexact nu-
meric types. Here, p specifies the minimum acceptable precision in binary digits. PostgreSQL accepts
float(1) to float(24) as selecting the real type, while float(25) to float(53) select double
precision. Values of p outside the allowed range draw an error. float with no precision specified is
taken to mean double precision.
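For example (table name hypothetical), these declarations resolve as follows:

CREATE TABLE measurements (
    a float(20),   -- 1 to 24 binary digits: becomes real
    b float(40),   -- 25 to 53 binary digits: becomes double precision
    c float        -- no precision given: double precision
);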

Note: Prior to PostgreSQL 7.4, the precision in float(p) was taken to mean so many decimal digits.
This has been corrected to match the SQL standard, which specifies that the precision is measured
in binary digits. The assumption that real and double precision have exactly 24 and 53 bits in
the mantissa respectively is correct for IEEE-standard floating point implementations. On non-IEEE
platforms it may be off a little, but for simplicity the same ranges of p are used on all platforms.

8.1.4. Serial Types


The data types serial and bigserial are not true types, but merely a notational convenience for set-
ting up unique identifier columns (similar to the AUTO_INCREMENT property supported by some other
databases). In the current implementation, specifying

CREATE TABLE tablename (
    colname SERIAL
);

is equivalent to specifying:

CREATE SEQUENCE tablename_colname_seq;

CREATE TABLE tablename (
    colname integer DEFAULT nextval('tablename_colname_seq') NOT NULL
);

Thus, we have created an integer column and arranged for its default values to be assigned from a sequence
generator. A NOT NULL constraint is applied to ensure that a null value cannot be explicitly inserted, either.
In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate
values from being inserted by accident, but this is not automatic.
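For example (table name hypothetical), a typical identifier column combines SERIAL with an explicit PRIMARY KEY constraint:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,   -- uniqueness must be requested explicitly
    customer text
);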


Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a
serial column to be in a unique constraint or a primary key, it must now be specified, same as with any
other data type.

To insert the next value of the sequence into the serial column, specify that the serial column should
be assigned its default value. This can be done either by excluding the column from the list of columns in
the INSERT statement, or through the use of the DEFAULT key word.
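Continuing the hypothetical tablename example above, either of the following assigns the next sequence value to colname:

INSERT INTO tablename (colname) VALUES (DEFAULT);

INSERT INTO tablename DEFAULT VALUES;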
The type names serial and serial4 are equivalent: both create integer columns. The type names
bigserial and serial8 work just the same way, except that they create a bigint column. bigserial
should be used if you anticipate the use of more than 2^31 identifiers over the lifetime of the table.
The sequence created for a serial column is automatically dropped when the owning column is dropped,
and cannot be dropped otherwise. (This was not true in PostgreSQL releases before 7.3. Note that this
automatic drop linkage will not occur for a sequence created by reloading a dump from a pre-7.3 database;
the dump file does not contain the information needed to establish the dependency link.) Furthermore,
this dependency between sequence and column is made only for the serial column itself. If any other
columns reference the sequence (perhaps by manually calling the nextval function), they will be broken
if the sequence is removed. Using a serial column’s sequence in such a fashion is considered bad
form; if you wish to feed several columns from the same sequence generator, create the sequence as an
independent object.
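A sketch of that recommendation (names hypothetical): create the sequence as a free-standing object and reference it from each column’s default:

CREATE SEQUENCE common_id_seq;

CREATE TABLE t1 (id integer DEFAULT nextval('common_id_seq'));
CREATE TABLE t2 (id integer DEFAULT nextval('common_id_seq'));

The sequence is not tied to either column, so dropping one of the tables leaves it intact for the other.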

8.2. Monetary Types

Note: The money type is deprecated. Use numeric or decimal instead, in combination with the
to_char function.

The money type stores a currency amount with a fixed fractional precision; see Table 8-3. Input is ac-
cepted in a variety of formats, including integer and floating-point literals, as well as “typical” currency
formatting, such as '$1,000.00'. Output is generally in the latter form but depends on the locale.

Table 8-3. Monetary Types

Name     Storage Size   Description       Range
money    4 bytes        currency amount   -21474836.48 to +21474836.47
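As the note above suggests, a numeric column formatted on output with to_char can replace money; a minimal sketch (the column name and format string are illustrative):

```sql
CREATE TABLE price (amount numeric(10,2));
INSERT INTO price VALUES (1000.00);

-- Format at output time instead of storing a locale-dependent type:
SELECT to_char(amount, 'L9,999,990.99') FROM price;
```

The L pattern inserts the locale's currency symbol; see the to_char documentation referenced in the note.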

8.3. Character Types

Table 8-4. Character Types

Name                                Description
character varying(n), varchar(n)    variable-length with limit
character(n), char(n)               fixed-length, blank padded
text                                variable unlimited length

Table 8-4 shows the general-purpose character types available in PostgreSQL.


SQL defines two primary character types: character varying(n) and character(n), where n is a
positive integer. Both of these types can store strings up to n characters in length. An attempt to store a
longer string into a column of these types will result in an error, unless the excess characters are all spaces,
in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is
required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type
character will be space-padded; values of type character varying will simply store the shorter
string.
If one explicitly casts a value to character varying(n) or character(n), then an over-length value
will be truncated to n characters without raising an error. (This too is required by the SQL standard.)

Note: Prior to PostgreSQL 7.2, strings that were too long were always truncated without raising an
error, in either explicit or implicit casting contexts.

The notations varchar(n) and char(n) are aliases for character varying(n) and
character(n), respectively. character without length specifier is equivalent to character(1). If
character varying is used without length specifier, the type accepts strings of any size. The latter is
a PostgreSQL extension.
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type
text is not in the SQL standard, several other SQL database management systems have it as well.

Values of type character are physically padded with spaces to the specified width n, and are stored
and displayed that way. However, the padding spaces are treated as semantically insignificant. Trailing
spaces are disregarded when comparing two values of type character, and they will be removed when
converting a character value to one of the other string types. Note that trailing spaces are semantically
significant in character varying and text values.
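The trailing-space behavior described above can be observed directly:

```sql
-- Trailing spaces are insignificant when comparing character values...
SELECT 'abc '::character(4) = 'abc'::character(4);   -- true

-- ...but significant for character varying and text values:
SELECT 'abc '::varchar(4)   = 'abc'::varchar(4);     -- false
```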
The storage requirement for data of these types is 4 bytes plus the actual string, and in case of character
plus the padding. Long strings are compressed by the system automatically, so the physical requirement
on disk may be less. Long values are also stored in background tables so they do not interfere with rapid
access to the shorter column values. In any case, the longest possible character string that can be stored
is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than
that. It wouldn’t be very useful to change this because with multibyte character encodings the number of
characters and bytes can be quite different anyway. If you desire to store long strings with no specific upper
limit, use text or character varying without a length specifier, rather than making up an arbitrary
length limit.)

Tip: There are no performance differences between these three types, apart from the increased stor-
age size when using the blank-padded type. While character(n) has performance advantages in
some other database systems, it has no such advantages in PostgreSQL. In most situations text or
character varying should be used instead.

87
Chapter 8. Data Types

Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for information
about available operators and functions. The database character set determines the character set used to
store textual values; for more information on character set support, refer to Section 20.2.

Example 8-1. Using the character types

CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1; -- ➊

  a   | char_length
------+-------------
 ok   |           2

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5

➊ The char_length function is discussed in Section 9.4.

There are two other fixed-length character types in PostgreSQL, shown in Table 8-5. The name type exists
only for storage of identifiers in the internal system catalogs and is not intended for use by the general user.
Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced
using the constant NAMEDATALEN. The length is set at compile time (and is therefore adjustable for special
uses); the default maximum length may change in a future release. The type "char" (note the quotes) is
different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs
as a poor-man’s enumeration type.

Table 8-5. Special Character Types

Name Storage Size Description


"char" 1 byte single-character internal type
name 64 bytes internal type for object names

8.4. Binary Data Types


The bytea data type allows storage of binary strings; see Table 8-6.


Table 8-6. Binary Data Types

Name     Storage Size                            Description
bytea    4 bytes plus the actual binary string   variable-length binary string

A binary string is a sequence of octets (or bytes). Binary strings are distinguished from character strings
by two characteristics: First, binary strings specifically allow storing octets of value zero and other “non-
printable” octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets,
and also disallow any other octet values and sequences of octet values that are invalid according to the
database’s selected character set encoding. Second, operations on binary strings process the actual bytes,
whereas the processing of character strings depends on locale settings. In short, binary strings are ap-
propriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are
appropriate for storing text.
When entering bytea values, octets of certain values must be escaped (but all octet values may be escaped)
when used as part of a string literal in an SQL statement. In general, to escape an octet, it is converted into
the three-digit octal number equivalent of its decimal octet value, and preceded by two backslashes. Table
8-7 shows the characters that must be escaped, and gives the alternate escape sequences where applicable.

Table 8-7. bytea Literal Escaped Octets

Decimal Octet Value      Description              Escaped Input Representation   Example                    Output Representation
0                        zero octet               '\\000'                        SELECT '\\000'::bytea;     \000
39                       single quote             '\'' or '\\047'                SELECT '\''::bytea;        '
92                       backslash                '\\\\' or '\\134'              SELECT '\\\\'::bytea;      \\
0 to 31 and 127 to 255   “non-printable” octets   '\\xxx' (octal value)          SELECT '\\001'::bytea;     \001

The requirement to escape “non-printable” octets actually varies depending on locale settings. In some
instances you can get away with leaving them unescaped. Note that the result in each of the examples
in Table 8-7 was exactly one octet in length, even though the output representation of the zero octet and
backslash are more than one character.
The reason that you have to write so many backslashes, as shown in Table 8-7, is that an input string written
as a string literal must pass through two parse phases in the PostgreSQL server. The first backslash of each
pair is interpreted as an escape character by the string-literal parser and is therefore consumed, leaving the
second backslash of the pair. The remaining backslash is then recognized by the bytea input function as
starting either a three digit octal value or escaping another backslash. For example, a string literal passed
to the server as '\\001' becomes \001 after passing through the string-literal parser. The \001 is then
sent to the bytea input function, where it is converted to a single octet with a decimal value of 1. Note
that the apostrophe character is not treated specially by bytea, so it follows the normal rules for string
literals. (See also Section 4.1.2.1.)
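To make the two parse phases concrete, a small sketch (the table and column names are illustrative):

```sql
CREATE TABLE images (data bytea);

-- The string-literal parser reduces each \\ pair to a single backslash;
-- the bytea input function then turns \000 and \047 into single octets:
INSERT INTO images VALUES ('\\000\\047abc'::bytea);

-- length() counts stored octets, not characters of the escaped form:
SELECT length(data) FROM images;   -- 5 octets: \000, ', a, b, c
```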
Bytea octets are also escaped in the output. In general, each “non-printable” octet is converted into its


equivalent three-digit octal value and preceded by one backslash. Most “printable” octets are represented
by their standard representation in the client character set. The octet with decimal value 92 (backslash)
has a special alternative output representation. Details are in Table 8-8.

Table 8-8. bytea Output Escaped Octets

Decimal Octet Value      Description              Escaped Output Representation         Example                    Output Result
92                       backslash                \\                                    SELECT '\\134'::bytea;     \\
0 to 31 and 127 to 255   “non-printable” octets   \xxx (octal value)                    SELECT '\\001'::bytea;     \001
32 to 126                “printable” octets       client character set representation   SELECT '\\176'::bytea;     ~

Depending on the front end to PostgreSQL you use, you may have additional work to do in terms of
escaping and unescaping bytea strings. For example, you may also have to escape line feeds and carriage
returns if your interface automatically translates these.
The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The
input format is different from bytea, but the provided functions and operators are mostly the same.

8.5. Date/Time Types


PostgreSQL supports the full set of SQL date and time types, shown in Table 8-9. The operations available
on these data types are described in Section 9.9.

Table 8-9. Date/Time Types

Name                                      Storage Size   Description                          Low Value          High Value        Resolution
timestamp [ (p) ] [ without time zone ]   8 bytes        both date and time                   4713 BC            5874897 AD        1 microsecond / 14 digits
timestamp [ (p) ] with time zone          8 bytes        both date and time, with time zone   4713 BC            5874897 AD        1 microsecond / 14 digits
interval [ (p) ]                          12 bytes       time intervals                       -178000000 years   178000000 years   1 microsecond / 14 digits
date                                      4 bytes        dates only                           4713 BC            32767 AD          1 day
time [ (p) ] [ without time zone ]        8 bytes        times of day only                    00:00:00.00        23:59:59.99       1 microsecond / 14 digits
time [ (p) ] with time zone               12 bytes       times of day only, with time zone    00:00:00.00+12     23:59:59.99-12    1 microsecond / 14 digits


Note: Prior to PostgreSQL 7.3, writing just timestamp was equivalent to timestamp with time
zone. This was changed for SQL compliance.

time, timestamp, and interval accept an optional precision value p which specifies the number of
fractional digits retained in the seconds field. By default, there is no explicit bound on precision. The
allowed range of p is from 0 to 6 for the timestamp and interval types.

Note: When timestamp values are stored as double precision floating-point numbers (currently the
default), the effective limit of precision may be less than 6. timestamp values are stored as seconds
before or after midnight 2000-01-01. Microsecond precision is achieved for dates within a few years of
2000-01-01, but the precision degrades for dates further away. When timestamp values are stored as
eight-byte integers (a compile-time option), microsecond precision is available over the full range of
values. However eight-byte integer timestamps have a more limited range of dates than shown above:
from 4713 BC up to 294276 AD. The same compile-time option also determines whether time and
interval values are stored as floating-point or eight-byte integers. In the floating-point case, large
interval values degrade in precision as the size of the interval increases.

For the time types, the allowed range of p is from 0 to 6 when eight-byte integer storage is used, or from
0 to 10 when floating-point storage is used.
The type time with time zone is defined by the SQL standard, but the definition exhibits properties
which lead to questionable usefulness. In most cases, a combination of date, time, timestamp
without time zone, and timestamp with time zone should provide a complete range of
date/time functionality required by any application.
The types abstime and reltime are lower precision types which are used internally. You are discour-
aged from using these types in new applications and are encouraged to move any old ones over when
appropriate. Any or all of these internal types might disappear in a future release.

8.5.1. Date/Time Input


Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible,
traditional POSTGRES, and others. For some formats, ordering of month, day, and year in date input is
ambiguous and there is support for specifying the expected ordering of these fields. Set the DateStyle
parameter to MDY to select month-day-year interpretation, DMY to select day-month-year interpretation, or
YMD to select year-month-day interpretation.
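The effect of the DateStyle field-order setting can be seen with an ambiguous literal:

```sql
SET datestyle TO MDY;
SELECT '1/8/1999'::date;   -- interpreted as January 8, 1999

SET datestyle TO DMY;
SELECT '1/8/1999'::date;   -- interpreted as August 1, 1999
```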

PostgreSQL is more flexible in handling date/time input than the SQL standard requires. See Appendix B
for the exact parsing rules of date/time input and for the recognized text fields including months, days of
the week, and time zones.
Remember that any date or time literal input needs to be enclosed in single quotes, like text strings. Refer
to Section 4.1.2.5 for more information. SQL requires the following syntax

type [ (p) ] 'value'

where p in the optional precision specification is an integer corresponding to the number of fractional
digits in the seconds field. Precision can be specified for time, timestamp, and interval types. The


allowed values are mentioned above. If no precision is specified in a constant specification, it defaults to
the precision of the literal value.

8.5.1.1. Dates
Table 8-10 shows some possible inputs for the date type.

Table 8-10. Date Input

Example Description
January 8, 1999 unambiguous in any datestyle input mode
1999-01-08 ISO 8601; January 8 in any mode (recommended format)
1/8/1999 January 8 in MDY mode; August 1 in DMY mode
1/18/1999 January 18 in MDY mode; rejected in other modes
01/02/03 January 2, 2003 in MDY mode; February 1, 2003 in
DMY mode; February 3, 2001 in YMD mode
1999-Jan-08 January 8 in any mode
Jan-08-1999 January 8 in any mode
08-Jan-1999 January 8 in any mode
99-Jan-08 January 8 in YMD mode, else error
08-Jan-99 January 8, except error in YMD mode
Jan-08-99 January 8, except error in YMD mode
19990108 ISO 8601; January 8, 1999 in any mode
990108 ISO 8601; January 8, 1999 in any mode
1999.008 year and day of year
J2451187 Julian day
January 8, 99 BC year 99 before the Common Era

8.5.1.2. Times
The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time
zone. Writing just time is equivalent to time without time zone.

Valid input for these types consists of a time of day followed by an optional time zone. (See Table 8-11
and Table 8-12.) If a time zone is specified in the input for time without time zone, it is silently
ignored.

Table 8-11. Time Input

Example Description
04:05:06.789 ISO 8601
04:05:06 ISO 8601
04:05 ISO 8601

040506 ISO 8601
04:05 AM same as 04:05; AM does not affect value
04:05 PM same as 16:05; input hour must be <= 12
04:05:06.789-8 ISO 8601
04:05:06-08:00 ISO 8601
04:05-08:00 ISO 8601
040506-08 ISO 8601
04:05:06 PST time zone specified by name

Table 8-12. Time Zone Input

Example Description
PST Pacific Standard Time
-8:00 ISO-8601 offset for PST
-800 ISO-8601 offset for PST
-8 ISO-8601 offset for PST
zulu Military abbreviation for UTC
z Short form of zulu

Refer to Appendix B for a list of time zone names that are recognized for input.

8.5.1.3. Time Stamps


Valid input for the time stamp types consists of a concatenation of a date and a time, followed by an
optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear before the time
zone, but this is not the preferred ordering.) Thus

1999-01-08 04:05:06

and

1999-01-08 04:05:06 -8:00

are valid values, which follow the ISO 8601 standard. In addition, the widespread format

January 8 04:05:06 1999 PST

is supported.
The SQL standard differentiates timestamp without time zone and timestamp with time
zone literals by the existence of a “+” or “-”. Hence, according to the standard,

TIMESTAMP '2004-10-19 10:23:54'

is a timestamp without time zone, while

TIMESTAMP '2004-10-19 10:23:54+02'

is a timestamp with time zone. PostgreSQL differs from the standard by requiring that timestamp
with time zone literals be explicitly typed:

TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'

If a literal is not explicitly indicated as being of timestamp with time zone, PostgreSQL will silently
ignore any time zone indication in the literal. That is, the resulting date/time value is derived from the
date/time fields in the input value, and is not adjusted for time zone.
For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated
Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone
specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in
the input string, then it is assumed to be in the time zone indicated by the system’s timezone parameter,
and is converted to UTC using the offset for the timezone zone.
When a timestamp with time zone value is output, it is always converted from UTC to the current
timezone zone, and displayed as local time in that zone. To see the time in another time zone, either
change timezone or use the AT TIME ZONE construct (see Section 9.9.3).
Conversions between timestamp without time zone and timestamp with time zone normally
assume that the timestamp without time zone value should be taken or given as timezone local
time. A different zone reference can be specified for the conversion using AT TIME ZONE.
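A sketch of the conversion behavior described above (the session zone setting and zone names are illustrative):

```sql
SET timezone TO 'UTC';

-- The zone indication is honored for timestamp with time zone...
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02';
-- 2004-10-19 08:23:54+00  (stored in UTC, shown in the session zone)

-- ...but silently ignored for timestamp without time zone:
SELECT TIMESTAMP '2004-10-19 10:23:54+02';
-- 2004-10-19 10:23:54

-- View the same stored instant in another zone without changing timezone:
SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' AT TIME ZONE 'PST8PDT';
```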

8.5.1.4. Intervals
interval values can be written with the following syntax:

[@] quantity unit [quantity unit...] [direction]

Where: quantity is a number (possibly signed); unit is second, minute, hour, day, week, month,
year, decade, century, millennium, or abbreviations or plurals of these units; direction can be
ago or empty. The at sign (@) is optional noise. The amounts of different units are implicitly added up
with appropriate sign accounting.
Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For ex-
ample, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec'.
The optional precision p should be between 0 and 6, and defaults to the precision of the input literal.
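For example, the interval syntax above admits several equivalent spellings:

```sql
SELECT INTERVAL '1 year 2 months 3 days';
SELECT INTERVAL '@ 1 year 2 mons 3 days';   -- the at sign is optional noise
SELECT INTERVAL '1 12:59:10';               -- 1 day 12 hours 59 min 10 sec
SELECT INTERVAL '2 hours ago';              -- "ago" negates the interval
```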

8.5.1.5. Special Values


PostgreSQL supports several special date/time input values for convenience, as shown in Table 8-13. The
values infinity and -infinity are specially represented inside the system and will be displayed the
same way; but the others are simply notational shorthands that will be converted to ordinary date/time
values when read. (In particular, now and related strings are converted to a specific time value as soon
as they are read.) All of these values need to be written in single quotes when used as constants in SQL
commands.


Table 8-13. Special Date/Time Inputs

Input String   Valid Types             Description
epoch          date, timestamp         1970-01-01 00:00:00+00 (Unix system time zero)
infinity       timestamp               later than all other time stamps
-infinity      timestamp               earlier than all other time stamps
now            date, time, timestamp   current transaction start time
today          date, timestamp         midnight today
tomorrow       date, timestamp         midnight tomorrow
yesterday      date, timestamp         midnight yesterday
allballs       time                    00:00:00.00 UTC

The following SQL-compatible functions can also be used to obtain the current time value for the
corresponding data type: CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME,
LOCALTIMESTAMP. The latter four accept an optional precision specification. (See Section 9.9.4.) Note
however that these are SQL functions and are not recognized as data input strings.

8.5.2. Date/Time Output


The output format of the date/time types can be set to one of the four styles ISO 8601, SQL (Ingres), tra-
ditional POSTGRES, and German, using the command SET datestyle. The default is the ISO format.
(The SQL standard requires the use of the ISO 8601 format. The name of the “SQL” output format is a
historical accident.) Table 8-14 shows examples of each output style. The output of the date and time
types is of course only the date or time part in accordance with the given examples.

Table 8-14. Date/Time Output Styles

Style Specification Description Example


ISO ISO 8601/SQL standard 1997-12-17 07:37:16-08
SQL traditional style 12/17/1997 07:37:16.00 PST
POSTGRES original style Wed Dec 17 07:37:16 1997 PST
German regional style 17.12.1997 07:37:16.00 PST

In the SQL and POSTGRES styles, day appears before month if DMY field ordering has been specified,
otherwise month appears before day. (See Section 8.5.1 for how this setting also affects interpretation of
input values.) Table 8-15 shows an example.

Table 8-15. Date Order Conventions

datestyle Setting Input Ordering Example Output


SQL, DMY day/month/year 17/12/1997 15:37:16.00 CET
SQL, MDY month/day/year 12/17/1997 07:37:16.00 PST
Postgres, DMY day/month/year Wed 17 Dec 07:37:16 1997 PST


interval output looks like the input format, except that units like century or week are converted to
years and days and ago is converted to an appropriate sign. In ISO mode the output looks like

[ quantity unit [ ... ] ] [ days ] [ hours:minutes:seconds ]

The date/time styles can be selected by the user using the SET datestyle command, the DateStyle
parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on
the server or client. The formatting function to_char (see Section 9.8) is also available as a more flexible
way to format the date/time output.

8.5.3. Time Zones


Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry.
Time zones around the world became somewhat standardized during the 1900’s, but continue to be prone
to arbitrary changes, particularly with respect to daylight-savings rules. PostgreSQL currently supports
daylight-savings rules over the time period 1902 through 2038 (corresponding to the full range of conven-
tional Unix system time). Times outside that range are taken to be in “standard time” for the selected time
zone, no matter what part of the year they fall in.
PostgreSQL endeavors to be compatible with the SQL standard definitions for typical usage. However,
the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are:

• Although the date type does not have an associated time zone, the time type can. Time zones in the
real world have little meaning unless associated with a date as well as a time, since the offset may vary
through the year with daylight-saving time boundaries.
• The default time zone is specified as a constant numeric offset from UTC. It is therefore not possible to
adapt to daylight-saving time when doing date/time arithmetic across DST boundaries.

To address these difficulties, we recommend using date/time types that contain both date and time when
using time zones. We recommend not using the type time with time zone (though it is supported
by PostgreSQL for legacy applications and for compliance with the SQL standard). PostgreSQL assumes
your local time zone for any type containing only date or time.
All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the
zone specified by the timezone configuration parameter before being displayed to the client.
The timezone configuration parameter can be set in the file postgresql.conf, or in any of the other
standard ways described in Section 16.4. There are also several special ways to set it:

• If timezone is not specified in postgresql.conf nor as a postmaster command-line switch, the


server attempts to use the value of the TZ environment variable as the default time zone. If TZ is not
defined or is not any of the time zone names known to PostgreSQL, the server attempts to determine the
operating system’s default time zone by checking the behavior of the C library function localtime().
The default time zone is selected as the closest match among PostgreSQL’s known time zones.
• The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling
of SET TIMEZONE TO with a more SQL-spec-compatible syntax.


• The PGTZ environment variable, if set at the client, is used by libpq applications to send a SET TIME
ZONE command to the server upon connection.

Refer to Appendix B for a list of available time zones.

8.5.4. Internals
PostgreSQL uses Julian dates for all date/time calculations. They have the nice property of correctly
predicting/calculating any date more recent than 4713 BC to far into the future, using the assumption that
the length of the year is 365.2425 days.
Date conventions before the 19th century make for interesting reading, but are not consistent enough to
warrant coding into a date/time handler.

8.6. Boolean Type


PostgreSQL provides the standard SQL type boolean. boolean can have one of only two states: “true”
or “false”. A third state, “unknown”, is represented by the SQL null value.
Valid literal values for the “true” state are:

TRUE
’t’
’true’
’y’
’yes’
’1’

For the “false” state, the following values can be used:

FALSE
’f’
’false’
’n’
’no’
’0’

Using the key words TRUE and FALSE is preferred (and SQL-compliant).

Example 8-2. Using the boolean type

CREATE TABLE test1 (a boolean, b text);
INSERT INTO test1 VALUES (TRUE, 'sic est');
INSERT INTO test1 VALUES (FALSE, 'non est');
SELECT * FROM test1;

 a |    b
---+---------
 t | sic est
 f | non est

SELECT * FROM test1 WHERE a;

 a |    b
---+---------
 t | sic est

Example 8-2 shows that boolean values are output using the letters t and f.

Tip: Values of the boolean type cannot be cast directly to other types (e.g., CAST (boolval AS
integer) does not work). This can be accomplished using the CASE expression: CASE WHEN boolval
THEN 'value if true' ELSE 'value if false' END. See Section 9.13.
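Applied to the test1 table from Example 8-2, the CASE workaround from the tip might look like:

```sql
-- Map true/false to 1/0, since boolean cannot be cast to integer directly:
SELECT b, CASE WHEN a THEN 1 ELSE 0 END AS a_int FROM test1;
```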

boolean uses 1 byte of storage.

8.7. Geometric Types


Geometric data types represent two-dimensional spatial objects. Table 8-16 shows the geometric types
available in PostgreSQL. The most fundamental type, the point, forms the basis for all of the other types.

Table 8-16. Geometric Types

Name      Storage Size   Representation      Description
point     16 bytes       (x,y)               Point on the plane
line      32 bytes       ((x1,y1),(x2,y2))   Infinite line (not fully implemented)
lseg      32 bytes       ((x1,y1),(x2,y2))   Finite line segment
box       32 bytes       ((x1,y1),(x2,y2))   Rectangular box
path      16+16n bytes   ((x1,y1),...)       Closed path (similar to polygon)
path      16+16n bytes   [(x1,y1),...]       Open path
polygon   40+16n bytes   ((x1,y1),...)       Polygon (similar to closed path)
circle    24 bytes       <(x,y),r>           Circle (center and radius)

A rich set of functions and operators is available to perform various geometric operations such as scaling,
translation, rotation, and determining intersections. They are explained in Section 9.10.

8.7.1. Points
Points are the fundamental two-dimensional building block for geometric types. Values of type point are
specified using the following syntax:

( x , y )
x , y

where x and y are the respective coordinates as floating-point numbers.

8.7.2. Line Segments


Line segments (lseg) are represented by pairs of points. Values of type lseg are specified using the
following syntax:

( ( x1 , y1 ) , ( x2 , y2 ) )
( x1 , y1 ) , ( x2 , y2 )
x1 , y1 , x2 , y2

where (x1,y1) and (x2,y2) are the end points of the line segment.

8.7.3. Boxes
Boxes are represented by pairs of points that are opposite corners of the box. Values of type box are
specified using the following syntax:

( ( x1 , y1 ) , ( x2 , y2 ) )
( x1 , y1 ) , ( x2 , y2 )
x1 , y1 , x2 , y2

where (x1,y1) and (x2,y2) are any two opposite corners of the box.
Boxes are output using the first syntax. The corners are reordered on input to store the upper right corner,
then the lower left corner. Other corners of the box can be entered, but the lower left and upper right
corners are determined from the input and stored.

8.7.4. Paths
Paths are represented by lists of connected points. Paths can be open, where the first and last points in the
list are not considered connected, or closed, where the first and last points are considered connected.
Values of type path are specified using the following syntax:

( ( x1 , y1 ) , ... , ( xn , yn ) )
[ ( x1 , y1 ) , ... , ( xn , yn ) ]
( x1 , y1 ) , ... , ( xn , yn )
( x1 , y1 , ... , xn , yn )
x1 , y1 , ... , xn , yn

where the points are the end points of the line segments comprising the path. Square brackets ([]) indicate
an open path, while parentheses (()) indicate a closed path.
Paths are output using the first syntax.


8.7.5. Polygons
Polygons are represented by lists of points (the vertexes of the polygon). Polygons should probably be
considered equivalent to closed paths, but are stored differently and have their own set of support routines.
Values of type polygon are specified using the following syntax:

( ( x1 , y1 ) , ... , ( xn , yn ) )
( x1 , y1 ) , ... , ( xn , yn )
( x1 , y1 , ... , xn , yn )
x1 , y1 , ... , xn , yn

where the points are the end points of the line segments comprising the boundary of the polygon.
Polygons are output using the first syntax.

8.7.6. Circles
Circles are represented by a center point and a radius. Values of type circle are specified using the
following syntax:

< ( x , y ) , r >
( ( x , y ) , r )
( x , y ) , r
x , y , r

where (x,y) is the center and r is the radius of the circle.


Circles are output using the first syntax.

8.8. Network Address Types


PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in Table 8-17. It is
preferable to use these types instead of plain text types to store network addresses, because these types
offer input error checking and several specialized operators and functions (see Section 9.11).

Table 8-17. Network Address Types

Name Storage Size Description


cidr 12 or 24 bytes IPv4 and IPv6 networks
inet 12 or 24 bytes IPv4 and IPv6 hosts and networks
macaddr 6 bytes MAC addresses

When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including
IPv4 addresses encapsulated or mapped into IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2.

8.8.1. inet
The inet type holds an IPv4 or IPv6 host address, and optionally the identity of the subnet it is in, all
in one field. The subnet identity is represented by stating how many bits of the host address represent the
network address (the “netmask”). If the netmask is 32 and the address is IPv4, then the value does not
indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique
host address. Note that if you want to accept networks only, you should use the cidr type rather than
inet.

The input format for this type is address/y where address is an IPv4 or IPv6 address and y is the
number of bits in the netmask. If the /y part is left off, then the netmask is 32 for IPv4 and 128 for IPv6,
so the value represents just a single host. On display, the /y portion is suppressed if the netmask specifies
a single host.
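A sketch of inet input and display (the addresses are illustrative):

```sql
SELECT '192.168.100.128/25'::inet;   -- host with subnet: 192.168.100.128/25
SELECT '192.168.100.128/32'::inet;   -- /32 is suppressed on display: 192.168.100.128
SELECT '192.168.100.128'::inet;      -- no /y given, so treated as /32
```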

8.8.2. cidr
The cidr type holds an IPv4 or IPv6 network specification. Input and output formats follow Classless
Inter-Domain Routing (CIDR) conventions. The format for specifying networks is address/y where
address is the network represented as an IPv4 or IPv6 address, and y is the number of bits in the
netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering
system, except that it will be at least large enough to include all of the octets written in the input. It is an
error to specify a network address that has bits set to the right of the specified netmask.
Table 8-18 shows some examples.

Table 8-18. cidr Type Input Examples

cidr Input cidr Output abbrev(cidr)


192.168.100.128/25 192.168.100.128/25 192.168.100.128/25
192.168/24 192.168.0.0/24 192.168.0/24
192.168/25 192.168.0.0/25 192.168.0.0/25
192.168.1 192.168.1.0/24 192.168.1/24
192.168 192.168.0.0/24 192.168.0/24
128.1 128.1.0.0/16 128.1/16
128 128.0.0.0/16 128.0/16
128.1.2 128.1.2.0/24 128.1.2/24
10.1.2 10.1.2.0/24 10.1.2/24
10.1 10.1.0.0/16 10.1/16
10 10.0.0.0/8 10/8
10.1.2.3/32 10.1.2.3/32 10.1.2.3/32
2001:4f8:3:ba::/64 2001:4f8:3:ba::/64 2001:4f8:3:ba::/64
2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128 2001:4f8:3:ba:2e0:81ff:fe22:d1f1

::ffff:1.2.3.0/120 ::ffff:1.2.3.0/120 ::ffff:1.2.3/120


::ffff:1.2.3.0/128 ::ffff:1.2.3.0/128 ::ffff:1.2.3.0/128


8.8.3. inet vs. cidr


The essential difference between inet and cidr data types is that inet accepts values with nonzero bits
to the right of the netmask, whereas cidr does not.

Tip: If you do not like the output format for inet or cidr values, try the functions host, text, and
abbrev.
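A short sketch of the difference (the literal values here are illustrative):

```sql
SELECT '192.168.0.1/24'::inet;   -- accepted: a host address within a /24 network
SELECT '192.168.0.1/24'::cidr;   -- ERROR: bits are set to the right of the netmask
SELECT abbrev('10.1.0.0/16'::cidr);   -- displays as 10.1/16, as in Table 8-18
```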

8.8.4. macaddr
The macaddr type stores MAC addresses, i.e., Ethernet card hardware addresses (although MAC ad-
dresses are used for other purposes as well). Input is accepted in various customary formats, including

'08002b:010203'
'08002b-010203'
'0800.2b01.0203'
'08-00-2b-01-02-03'
'08:00:2b:01:02:03'

which would all specify the same address. Upper and lower case is accepted for the digits a through f.
Output is always in the last of the forms shown.
The directory contrib/mac in the PostgreSQL source distribution contains tools that can be used to map
MAC addresses to hardware manufacturer names.
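For example, any accepted input form is displayed in the colon-separated form:

```sql
SELECT '08002b:010203'::macaddr;
--       macaddr
-- -------------------
--  08:00:2b:01:02:03
```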

8.9. Bit String Types


Bit strings are strings of 1’s and 0’s. They can be used to store or visualize bit masks. There are two SQL
bit types: bit(n) and bit varying(n), where n is a positive integer.
bit type data must match the length n exactly; it is an error to attempt to store shorter or longer bit
strings. bit varying data is of variable length up to the maximum length n; longer strings will be
rejected. Writing bit without a length is equivalent to bit(1), while bit varying without a length
specification means unlimited length.

Note: If one explicitly casts a bit-string value to bit(n), it will be truncated or zero-padded on the
right to be exactly n bits, without raising an error. Similarly, if one explicitly casts a bit-string value to
bit varying(n), it will be truncated on the right if it is more than n bits.

Note: Prior to PostgreSQL 7.2, bit data was always silently truncated or zero-padded on the right,
with or without an explicit cast. This was changed to comply with the SQL standard.


Refer to Section 4.1.2.3 for information about the syntax of bit string constants. Bit-logical operators and
string manipulation functions are available; see Section 9.6.

Example 8-3. Using the bit string types

CREATE TABLE test (a BIT(3), b BIT VARYING(5));

INSERT INTO test VALUES (B'101', B'00');
INSERT INTO test VALUES (B'10', B'101');
ERROR: bit string length 2 does not match type bit(3)
INSERT INTO test VALUES (B'10'::bit(3), B'101');
SELECT * FROM test;
a | b
-----+-----
101 | 00
100 | 101

8.10. Arrays
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of
any built-in or user-defined base type can be created. (Arrays of composite types or domains are not yet
supported, however.)

8.10.1. Declaration of Array Types


To illustrate the use of array types, we create this table:

CREATE TABLE sal_emp (
    name text,
    pay_by_quarter integer[],
    schedule text[][]
);

As shown, an array data type is named by appending square brackets ([]) to the data type name of the
array elements. The above command will create a table named sal_emp with a column of type text
(name), a one-dimensional array of type integer (pay_by_quarter), which represents the employee’s
salary by quarter, and a two-dimensional array of text (schedule), which represents the employee’s
weekly schedule.
The syntax for CREATE TABLE allows the exact size of arrays to be specified, for example:

CREATE TABLE tictactoe (
    squares integer[3][3]
);

However, the current implementation does not enforce the array size limits — the behavior is the same as
for arrays of unspecified length.
Actually, the current implementation does not enforce the declared number of dimensions either. Arrays
of a particular element type are all considered to be of the same type, regardless of size or number of
dimensions. So, declaring the number of dimensions or sizes in CREATE TABLE is simply documentation;
it does not affect run-time behavior.
An alternative syntax, which conforms to the SQL:1999 standard, may be used for one-dimensional arrays.
pay_by_quarter could have been defined as:

pay_by_quarter integer ARRAY[4],

This syntax requires an integer constant to denote the array size. As before, however, PostgreSQL does
not enforce the size restriction.

8.10.2. Array Value Input


To write an array value as a literal constant, enclose the element values within curly braces and separate
them by commas. (If you know C, this is not unlike the C syntax for initializing structures.) You may put
double quotes around any element value, and must do so if it contains commas or curly braces. (More
details appear below.) Thus, the general format of an array constant is the following:

'{ val1 delim val2 delim ... }'

where delim is the delimiter character for the type, as recorded in its pg_type entry. Among the standard
data types provided in the PostgreSQL distribution, type box uses a semicolon (;) but all the others use
comma (,). Each val is either a constant of the array element type, or a subarray. An example of an array
constant is

'{{1,2,3},{4,5,6},{7,8,9}}'

This constant is a two-dimensional, 3-by-3 array consisting of three subarrays of integers.


(These kinds of array constants are actually only a special case of the generic type constants discussed
in Section 4.1.2.5. The constant is initially treated as a string and passed to the array input conversion
routine. An explicit type specification might be necessary.)
Now we can show some INSERT statements.

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"meeting"}}');
ERROR: multidimensional arrays must have array expressions with matching dimensions

Note that multidimensional arrays must have matching extents for each dimension. A mismatch causes an
error report.

INSERT INTO sal_emp
    VALUES ('Bill',
    '{10000, 10000, 10000, 10000}',
    '{{"meeting", "lunch"}, {"training", "presentation"}}');

INSERT INTO sal_emp
    VALUES ('Carol',
    '{20000, 25000, 25000, 25000}',
    '{{"breakfast", "consulting"}, {"meeting", "lunch"}}');


A limitation of the present array implementation is that individual elements of an array cannot be SQL
null values. The entire array can be set to null, but you can’t have an array with some elements null and
some not.
The result of the previous two inserts looks like this:

SELECT * FROM sal_emp;


name | pay_by_quarter | schedule
-------+---------------------------+-------------------------------------------
Bill | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)

The ARRAY constructor syntax may also be used:

INSERT INTO sal_emp
    VALUES ('Bill',
    ARRAY[10000, 10000, 10000, 10000],
    ARRAY[['meeting', 'lunch'], ['training', 'presentation']]);

INSERT INTO sal_emp
    VALUES ('Carol',
    ARRAY[20000, 25000, 25000, 25000],
    ARRAY[['breakfast', 'consulting'], ['meeting', 'lunch']]);

Notice that the array elements are ordinary SQL constants or expressions; for instance, string literals are
single quoted, instead of double quoted as they would be in an array literal. The ARRAY constructor syntax
is discussed in more detail in Section 4.2.10.

8.10.3. Accessing Arrays


Now, we can run some queries on the table. First, we show how to access a single element of an array at
a time. This query retrieves the names of the employees whose pay changed in the second quarter:

SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];

name
-------
Carol
(1 row)

The array subscript numbers are written within square brackets. By default PostgreSQL uses the one-
based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends
with array[n].
This query retrieves the third quarter pay of all employees:

SELECT pay_by_quarter[3] FROM sal_emp;

pay_by_quarter
----------------
10000
25000
(2 rows)

We can also access arbitrary rectangular slices of an array, or subarrays. An array slice is denoted by writ-
ing lower-bound:upper-bound for one or more array dimensions. For example, this query retrieves
the first item on Bill’s schedule for the first two days of the week:

SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';

schedule
------------------------
{{meeting},{training}}
(1 row)

We could also have written

SELECT schedule[1:2][1] FROM sal_emp WHERE name = 'Bill';

with the same result. An array subscripting operation is always taken to represent an array slice if any of
the subscripts are written in the form lower:upper . A lower bound of 1 is assumed for any subscript
where only one value is specified, as in this example:

SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill';

schedule
-------------------------------------------
{{meeting,lunch},{training,presentation}}
(1 row)

The current dimensions of any array value can be retrieved with the array_dims function:

SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';

array_dims
------------
[1:2][1:1]
(1 row)

array_dims produces a text result, which is convenient for people to read but perhaps not so convenient
for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the
upper and lower bound of a specified array dimension, respectively.

SELECT array_upper(schedule, 1) FROM sal_emp WHERE name = 'Carol';

array_upper
-------------
2
(1 row)


8.10.4. Modifying Arrays


An array value can be replaced completely:

UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
    WHERE name = 'Carol';

or using the ARRAY expression syntax:

UPDATE sal_emp SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
    WHERE name = 'Carol';

An array may also be updated at a single element:

UPDATE sal_emp SET pay_by_quarter[4] = 15000
    WHERE name = 'Bill';

or updated in a slice:

UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
    WHERE name = 'Carol';

A stored array value can be enlarged by assigning to an element adjacent to those already present, or by
assigning to a slice that is adjacent to or overlaps the data already present. For example, if array myarray
currently has 4 elements, it will have five elements after an update that assigns to myarray[5]. Currently,
enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays.
Array slice assignment allows creation of arrays that do not use one-based subscripts. For example one
might assign to myarray[-2:7] to create an array with subscript values running from -2 to 7.
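A sketch of that technique (the table and column names are illustrative):

```sql
CREATE TABLE arrtest (myarray integer[]);
INSERT INTO arrtest VALUES ('{1,2,3}');          -- subscripts [1:3]
-- Assigning to an overlapping slice that starts at 0 enlarges the
-- array downward, giving it a lower bound of 0.
UPDATE arrtest SET myarray[0:3] = '{0,1,2,3}';
SELECT array_dims(myarray) FROM arrtest;         -- [0:3]
```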
New array values can also be constructed by using the concatenation operator, ||.

SELECT ARRAY[1,2] || ARRAY[3,4];


?column?
-----------
{1,2,3,4}
(1 row)

SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]];


?column?
---------------------
{{5,6},{1,2},{3,4}}
(1 row)

The concatenation operator allows a single element to be pushed on to the beginning or end of a
one-dimensional array. It also accepts two N-dimensional arrays, or an N-dimensional and an
N+1-dimensional array.


When a single element is pushed on to the beginning of a one-dimensional array, the result is an array
with a lower bound subscript equal to the right-hand operand’s lower bound subscript, minus one. When
a single element is pushed on to the end of a one-dimensional array, the result is an array retaining the
lower bound of the left-hand operand. For example:

SELECT array_dims(1 || ARRAY[2,3]);


array_dims
------------
[0:2]
(1 row)

SELECT array_dims(ARRAY[1,2] || 3);


array_dims
------------
[1:3]
(1 row)

When two arrays with an equal number of dimensions are concatenated, the result retains the lower bound
subscript of the left-hand operand’s outer dimension. The result is an array comprising every element of
the left-hand operand followed by every element of the right-hand operand. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[3,4,5]);


array_dims
------------
[1:5]
(1 row)

SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]);


array_dims
------------
[1:5][1:2]
(1 row)

When an N-dimensional array is pushed on to the beginning or end of an N+1-dimensional array, the result
is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of
the N+1-dimensional array’s outer dimension. For example:

SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]);


array_dims
------------
[0:2][1:2]
(1 row)

An array can also be constructed by using the functions array_prepend, array_append,


or array_cat. The first two only support one-dimensional arrays, but array_cat supports
multidimensional arrays. Note that the concatenation operator discussed above is preferred over direct
use of these functions. In fact, the functions are primarily for use in implementing the concatenation
operator. However, they may be directly useful in the creation of user-defined aggregates. Some
examples:

SELECT array_prepend(1, ARRAY[2,3]);


array_prepend
---------------
{1,2,3}
(1 row)

SELECT array_append(ARRAY[1,2], 3);


array_append
--------------
{1,2,3}
(1 row)

SELECT array_cat(ARRAY[1,2], ARRAY[3,4]);


array_cat
-----------
{1,2,3,4}
(1 row)

SELECT array_cat(ARRAY[[1,2],[3,4]], ARRAY[5,6]);


array_cat
---------------------
{{1,2},{3,4},{5,6}}
(1 row)

SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]);


array_cat
---------------------
{{5,6},{1,2},{3,4}}

8.10.5. Searching in Arrays


To search for a value in an array, you must check each value of the array. This can be done by hand, if you
know the size of the array. For example:

SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
    pay_by_quarter[2] = 10000 OR
    pay_by_quarter[3] = 10000 OR
    pay_by_quarter[4] = 10000;

However, this quickly becomes tedious for large arrays, and is not helpful if the size of the array is
uncertain. An alternative method is described in Section 9.17. The above query could be replaced by:

SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);

In addition, you could find rows where the array had all values equal to 10000 with:

SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);


Tip: Arrays are not sets; searching for specific array elements may be a sign of database misdesign.
Consider using a separate table with a row for each item that would be an array element. This will be
easier to search, and is likely to scale up better to large numbers of elements.
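For instance, the pay_by_quarter array might instead be normalized into a table such as this (names are illustrative):

```sql
CREATE TABLE emp_pay (
    name text,
    quarter integer,
    amount integer
);

-- The array search above then becomes an ordinary lookup,
-- which can use an index on amount:
SELECT DISTINCT name FROM emp_pay WHERE amount = 10000;
```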

8.10.6. Array Input and Output Syntax


The external text representation of an array value consists of items that are interpreted according to the
I/O conversion rules for the array’s element type, plus decoration that indicates the array structure. The
decoration consists of curly braces ({ and }) around the array value plus delimiter characters between
adjacent items. The delimiter character is usually a comma (,) but can be something else: it is determined
by the typdelim setting for the array’s element type. (Among the standard data types provided in the
PostgreSQL distribution, type box uses a semicolon (;) but all the others use comma.) In a multidimen-
sional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must
be written between adjacent curly-braced entities of the same level.
The array output routine will put double quotes around element values if they are empty strings or contain
curly braces, delimiter characters, double quotes, backslashes, or white space. Double quotes and back-
slashes embedded in element values will be backslash-escaped. For numeric data types it is safe to assume
that double quotes will never appear, but for textual data types one should be prepared to cope with either
presence or absence of quotes. (This is a change in behavior from pre-7.2 PostgreSQL releases.)
By default, the lower bound index value of an array’s dimensions is set to one. If any of an array’s
dimensions has a lower bound index not equal to one, an additional decoration that indicates the actual
array dimensions will precede the array structure decoration. This decoration consists of square brackets
([]) around each array dimension’s lower and upper bounds, with a colon (:) delimiter character in
between. The array dimension decoration is followed by an equal sign (=). For example:

SELECT 1 || ARRAY[2,3] AS array;

array
---------------
[0:2]={1,2,3}
(1 row)

SELECT ARRAY[1,2] || ARRAY[[3,4]] AS array;

array
--------------------------
[0:1][1:2]={{1,2},{3,4}}
(1 row)

This syntax can also be used to specify non-default array subscripts in an array literal. For example:

SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
    FROM (SELECT '[1:1][-2:-1][3:5]={{{1,2,3},{4,5,6}}}'::int[] AS f1) AS ss;

e1 | e2
----+----
1 | 6
(1 row)

As shown previously, when writing an array value you may write double quotes around any individual ar-
ray element. You must do so if the element value would otherwise confuse the array-value parser. For ex-
ample, elements containing curly braces, commas (or whatever the delimiter character is), double quotes,
backslashes, or leading or trailing whitespace must be double-quoted. To put a double quote or backslash
in a quoted array element value, precede it with a backslash. Alternatively, you can use backslash-escaping
to protect all data characters that would otherwise be taken as array syntax.
You may write whitespace before a left brace or after a right brace. You may also write whitespace be-
fore or after any individual item string. In all of these cases the whitespace will be ignored. However,
whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of
an element, is not ignored.

Note: Remember that what you write in an SQL command will first be interpreted as a string literal,
and then as an array. This doubles the number of backslashes you need. For example, to insert a
text array value containing a backslash and a double quote, you’d need to write

INSERT ... VALUES ('{"\\\\","\\""}');

The string-literal processor removes one level of backslashes, so that what arrives at the array-value
parser looks like {"\\","\""}. In turn, the strings fed to the text data type’s input routine become \
and " respectively. (If we were working with a data type whose input routine also treated backslashes
specially, bytea for example, we might need as many as eight backslashes in the command to get
one backslash into the stored array element.) Dollar quoting (see Section 4.1.2.2) may be used to
avoid the need to double backslashes.
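Under the same assumptions as the note above, the insert could be written with dollar quoting as:

```sql
-- The array-value parser still receives {"\\","\""}, but the
-- string-literal level of backslash doubling is no longer needed.
INSERT ... VALUES ($${"\\","\""}$$);
```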

Tip: The ARRAY constructor syntax (see Section 4.2.10) is often easier to work with than the array-
literal syntax when writing array values in SQL commands. In ARRAY, individual element values are
written the same way they would be written when not members of an array.

8.11. Composite Types


A composite type describes the structure of a row or record; it is in essence just a list of field names and
their data types. PostgreSQL allows values of composite types to be used in many of the same ways that
simple types can be used. For example, a column of a table can be declared to be of a composite type.


8.11.1. Declaration of Composite Types


Here are two simple examples of defining composite types:

CREATE TYPE complex AS (
    r double precision,
    i double precision
);

CREATE TYPE inventory_item AS (
    name text,
    supplier_id integer,
    price numeric
);

The syntax is comparable to CREATE TABLE, except that only field names and types can be specified; no
constraints (such as NOT NULL) can presently be included. Note that the AS keyword is essential; without
it, the system will think a quite different kind of CREATE TYPE command is meant, and you’ll get odd
syntax errors.
Having defined the types, we can use them to create tables:

CREATE TABLE on_hand (
    item inventory_item,
    count integer
);

INSERT INTO on_hand VALUES (ROW('fuzzy dice', 42, 1.99), 1000);

or functions:

CREATE FUNCTION price_extension(inventory_item, integer) RETURNS numeric
    AS 'SELECT $1.price * $2' LANGUAGE SQL;

SELECT price_extension(item, 10) FROM on_hand;

Whenever you create a table, a composite type is also automatically created, with the same name as the
table, to represent the table’s row type. For example, had we said

CREATE TABLE inventory_item (
    name text,
    supplier_id integer REFERENCES suppliers,
    price numeric CHECK (price > 0)
);

then the same inventory_item composite type shown above would come into being as a byproduct, and
could be used just as above. Note however an important restriction of the current implementation: since
no constraints are associated with a composite type, the constraints shown in the table definition do not
apply to values of the composite type outside the table. (A partial workaround is to use domain types as
members of composite types.)
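A sketch of that workaround (the domain and type names are illustrative):

```sql
-- A domain carries its CHECK constraint wherever it is used,
-- including as a member of a composite type.
CREATE DOMAIN positive_price AS numeric CHECK (VALUE > 0);

CREATE TYPE checked_item AS (
    name text,
    supplier_id integer,
    price positive_price
);
```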


8.11.2. Composite Value Input


To write a composite value as a literal constant, enclose the field values within parentheses and separate
them by commas. You may put double quotes around any field value, and must do so if it contains commas
or parentheses. (More details appear below.) Thus, the general format of a composite constant is the
following:

'( val1 , val2 , ... )'

An example is

'("fuzzy dice",42,1.99)'

which would be a valid value of the inventory_item type defined above. To make a field be NULL,
write no characters at all in its position in the list. For example, this constant specifies a NULL third field:

'("fuzzy dice",42,)'

If you want an empty string rather than NULL, write double quotes:

'("",42,)'

Here the first field is a non-NULL empty string, the third is NULL.
(These constants are actually only a special case of the generic type constants discussed in Section 4.1.2.5.
The constant is initially treated as a string and passed to the composite-type input conversion routine. An
explicit type specification might be necessary.)
The ROW expression syntax may also be used to construct composite values. In most cases this is consid-
erably simpler to use than the string-literal syntax, since you don’t have to worry about multiple layers of
quoting. We already used this method above:

ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)

The ROW keyword is actually optional as long as you have more than one field in the expression, so these
can simplify to

('fuzzy dice', 42, 1.99)
('', 42, NULL)

The ROW expression syntax is discussed in more detail in Section 4.2.11.

8.11.3. Accessing Composite Types


To access a field of a composite column, one writes a dot and the field name, much like selecting a
field from a table name. In fact, it’s so much like selecting from a table name that you often have to use
parentheses to keep from confusing the parser. For example, you might try to select some subfields from
our on_hand example table with something like:

SELECT item.name FROM on_hand WHERE item.price > 9.99;


This will not work since the name item is taken to be a table name, not a field name, per SQL syntax
rules. You must write it like this:

SELECT (item).name FROM on_hand WHERE (item).price > 9.99;

or if you need to use the table name as well (for instance in a multi-table query), like this:

SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99;

Now the parenthesized object is correctly interpreted as a reference to the item column, and then the
subfield can be selected from it.
Similar syntactic issues apply whenever you select a field from a composite value. For instance, to select
just one field from the result of a function that returns a composite value, you’d need to write something
like

SELECT (my_func(...)).field FROM ...

Without the extra parentheses, this will provoke a syntax error.

8.11.4. Modifying Composite Types


Here are some examples of the proper syntax for inserting and updating composite columns. First, insert-
ing or updating a whole column:

INSERT INTO mytab (complex_col) VALUES((1.1,2.2));

UPDATE mytab SET complex_col = ROW(1.1,2.2) WHERE ...;

The first example omits ROW, the second uses it; we could have done it either way.
We can update an individual subfield of a composite column:

UPDATE mytab SET complex_col.r = (complex_col).r + 1 WHERE ...;

Notice here that we don’t need to (and indeed cannot) put parentheses around the column name appearing
just after SET, but we do need parentheses when referencing the same column in the expression to the
right of the equal sign.
And we can specify subfields as targets for INSERT, too:

INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2);

Had we not supplied values for all the subfields of the column, the remaining subfields would have been
filled with null values.

8.11.5. Composite Type Input and Output Syntax


The external text representation of a composite value consists of items that are interpreted according to the
I/O conversion rules for the individual field types, plus decoration that indicates the composite structure.
The decoration consists of parentheses (( and )) around the whole value, plus commas (,) between
adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered
part of the field value, and may or may not be significant depending on the input conversion rules for the
field data type. For example, in

'( 42)'

the whitespace will be ignored if the field type is integer, but not if it is text.
As shown previously, when writing a composite value you may write double quotes around any individual
field value. You must do so if the field value would otherwise confuse the composite-value parser. In
particular, fields containing parentheses, commas, double quotes, or backslashes must be double-quoted.
To put a double quote or backslash in a quoted composite field value, precede it with a backslash. (Also,
a pair of double quotes within a double-quoted field value is taken to represent a double quote character,
analogously to the rules for single quotes in SQL literal strings.) Alternatively, you can use backslash-
escaping to protect all data characters that would otherwise be taken as composite syntax.
A completely empty field value (no characters at all between the commas or parentheses) represents a
NULL. To write a value that is an empty string rather than NULL, write "".
The composite output routine will put double quotes around field values if they are empty strings or
contain parentheses, commas, double quotes, backslashes, or white space. (Doing so for white space is
not essential, but aids legibility.) Double quotes and backslashes embedded in field values will be doubled.

Note: Remember that what you write in an SQL command will first be interpreted as a string literal,
and then as a composite. This doubles the number of backslashes you need. For example, to insert a
text field containing a double quote and a backslash in a composite value, you’d need to write

INSERT ... VALUES ('("\\"\\\\")');

The string-literal processor removes one level of backslashes, so that what arrives at the composite-
value parser looks like ("\"\\"). In turn, the string fed to the text data type’s input routine becomes
"\. (If we were working with a data type whose input routine also treated backslashes specially, bytea
for example, we might need as many as eight backslashes in the command to get one backslash into
the stored composite field.) Dollar quoting (see Section 4.1.2.2) may be used to avoid the need to
double backslashes.

Tip: The ROW constructor syntax is usually easier to work with than the composite-literal syntax when
writing composite values in SQL commands. In ROW, individual field values are written the same way
they would be written when not members of a composite.

8.12. Object Identifier Types


Object identifiers (OIDs) are used internally by PostgreSQL as primary keys for various system tables.
An OID system column is also added to user-created tables, unless WITHOUT OIDS is specified when
the table is created, or the default_with_oids configuration variable is set to false. Type oid represents
an object identifier. There are also several alias types for oid: regproc, regprocedure, regoper,
regoperator, regclass, and regtype. Table 8-19 shows an overview.


The oid type is currently implemented as an unsigned four-byte integer. Therefore, it is not large enough
to provide database-wide uniqueness in large databases, or even in large individual tables. So, using a
user-created table’s OID column as a primary key is discouraged. OIDs are best used only for references
to system tables.

Note: OIDs are included by default in user-created tables in PostgreSQL 8.0.0. However, this be-
havior is likely to change in a future version of PostgreSQL. Eventually, user-created tables will not
include an OID system column unless WITH OIDS is specified when the table is created, or the
default_with_oids configuration variable is set to true. If your application requires the presence
of an OID system column in a table, it should specify WITH OIDS when that table is created to ensure
compatibility with future releases of PostgreSQL.

The oid type itself has few operations beyond comparison. It can be cast to integer, however, and then
manipulated using the standard integer operators. (Beware of possible signed-versus-unsigned confusion
if you do this.)
The OID alias types have no operations of their own except for specialized input and output routines.
These routines are able to accept and display symbolic names for system objects, rather than the raw
numeric value that type oid would use. The alias types allow simplified lookup of OID values for objects.
For example, to examine the pg_attribute rows related to a table mytable, one could write

SELECT * FROM pg_attribute WHERE attrelid = 'mytable'::regclass;

rather than

SELECT * FROM pg_attribute
    WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'mytable');

While that doesn’t look all that bad by itself, it’s still oversimplified. A far more complicated sub-select
would be needed to select the right OID if there are multiple tables named mytable in different schemas.
The regclass input converter handles the table lookup according to the schema path setting, and so it
does the “right thing” automatically. Similarly, casting a table’s OID to regclass is handy for symbolic
display of a numeric OID.

Table 8-19. Object Identifier Types

Name           References     Description                    Value Example

oid            any            numeric object identifier      564182
regproc        pg_proc        function name                  sum
regprocedure   pg_proc        function with argument types   sum(int4)
regoper        pg_operator    operator name                  +
regoperator    pg_operator    operator with argument types   *(integer,integer) or -(NONE,integer)
regclass       pg_class       relation name                  pg_type
regtype        pg_type        data type name                 integer

All of the OID alias types accept schema-qualified names, and will display schema-qualified names on

116
Chapter 8. Data Types

output if the object would not be found in the current search path without being qualified. The regproc
and regoper alias types will only accept input names that are unique (not overloaded), so they are of
limited use; for most uses regprocedure or regoperator is more appropriate. For regoperator,
unary operators are identified by writing NONE for the unused operand.
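For example, the input routines resolve an overloaded function only when its argument types are spelled out, which is why regprocedure is usually preferable to regproc:

```sql
SELECT 'sum(int4)'::regprocedure;        -- displayed with canonical type names: sum(integer)
SELECT 'sum(int4)'::regprocedure::oid;   -- the underlying numeric OID
```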
Another identifier type used by the system is xid, or transaction (abbreviated xact) identifier. This is the
data type of the system columns xmin and xmax. Transaction identifiers are 32-bit quantities.
A third identifier type used by the system is cid, or command identifier. This is the data type of the system
columns cmin and cmax. Command identifiers are also 32-bit quantities.
A final identifier type used by the system is tid, or tuple identifier (row identifier). This is the data type
of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies
the physical location of the row within its table.
(The system columns are further explained in Section 5.4.)
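These system columns are never included in a SELECT * expansion, so they must be named explicitly. A quick way to inspect them (mytable here stands for any existing table):

```sql
-- ctid gives the physical location; xmin/xmax and cmin/cmax give the
-- creating/deleting transaction and command identifiers.
SELECT ctid, xmin, xmax, cmin, cmax FROM mytable;
```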

8.13. Pseudo-Types
The PostgreSQL type system contains a number of special-purpose entries that are collectively called
pseudo-types. A pseudo-type cannot be used as a column data type, but it can be used to declare a func-
tion’s argument or result type. Each of the available pseudo-types is useful in situations where a function’s
behavior does not correspond to simply taking or returning a value of a specific SQL data type. Table 8-20
lists the existing pseudo-types.

Table 8-20. Pseudo-Types

Name               Description

any                Indicates that a function accepts any input data type whatever.
anyarray           Indicates that a function accepts any array data type (see Section 31.2.5).
anyelement         Indicates that a function accepts any data type (see Section 31.2.5).
cstring            Indicates that a function accepts or returns a null-terminated C string.
internal           Indicates that a function accepts or returns a server-internal data type.
language_handler   A procedural language call handler is declared to return language_handler.
record             Identifies a function returning an unspecified row type.
trigger            A trigger function is declared to return trigger.
void               Indicates that a function returns no value.
opaque             An obsolete type name that formerly served all the above purposes.


Functions coded in C (whether built-in or dynamically loaded) may be declared to accept or return any of
these pseudo data types. It is up to the function author to ensure that the function will behave safely when
a pseudo-type is used as an argument type.
Functions coded in procedural languages may use pseudo-types only as allowed by their implementation
languages. At present the procedural languages all forbid use of a pseudo-type as argument type, and allow
only void and record as a result type (plus trigger when the function is used as a trigger). Some also
support polymorphic functions using the types anyarray and anyelement.
The internal pseudo-type is used to declare functions that are meant only to be called internally by the
database system, and not by direct invocation in a SQL query. If a function has at least one internal-type
argument then it cannot be called from SQL. To preserve the type safety of this restriction it is important
to follow this coding rule: do not create any function that is declared to return internal unless it has at
least one internal argument.
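As a sketch of the polymorphic pseudo-types in use (the function name first_non_null is invented for illustration):

```sql
CREATE FUNCTION first_non_null(anyelement, anyelement)
RETURNS anyelement AS $$
    SELECT CASE WHEN $1 IS NOT NULL THEN $1 ELSE $2 END;
$$ LANGUAGE SQL;

SELECT first_non_null(NULL::integer, 42);   -- 42
SELECT first_non_null('a'::text, 'b');      -- a
```

Because the arguments and result are all declared anyelement, the same definition serves for integers, text, and any other type, so long as both arguments have the same type.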

Chapter 9. Functions and Operators
PostgreSQL provides a large number of functions and operators for the built-in data types. Users can also
define their own functions and operators, as described in Part V. The psql commands \df and \do can be
used to show the list of all actually available functions and operators, respectively.
If you are concerned about portability then take note that most of the functions and operators described
in this chapter, with the exception of the most trivial arithmetic and comparison operators and some
explicitly marked functions, are not specified by the SQL standard. Some of the extended functionality is
present in other SQL database management systems, and in many cases this functionality is compatible
and consistent between the various implementations.

9.1. Logical Operators


The usual logical operators are available:

AND
OR
NOT

SQL uses a three-valued Boolean logic where the null value represents “unknown”. Observe the following
truth tables:

a b a AND b a OR b
TRUE TRUE TRUE TRUE
TRUE FALSE FALSE TRUE
TRUE NULL NULL TRUE
FALSE FALSE FALSE FALSE
FALSE NULL FALSE NULL
NULL NULL NULL NULL

a NOT a
TRUE FALSE
FALSE TRUE
NULL NULL

The operators AND and OR are commutative, that is, you can switch the left and right operand without
affecting the result. But see Section 4.2.12 for more information about the order of evaluation of subex-
pressions.
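The truth tables can be verified directly; note that null propagates only when it could change the outcome:

```sql
SELECT (TRUE AND NULL);    -- null ("unknown")
SELECT (FALSE AND NULL);   -- false: false AND anything is false
SELECT (TRUE OR NULL);     -- true: true OR anything is true
```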

9.2. Comparison Operators


The usual comparison operators are available, shown in Table 9-1.


Table 9-1. Comparison Operators

Operator Description
< less than
> greater than
<= less than or equal to
>= greater than or equal to
= equal
<> or != not equal

Note: The != operator is converted to <> in the parser stage. It is not possible to implement != and
<> operators that do different things.

Comparison operators are available for all data types where this makes sense. All comparison operators are
binary operators that return values of type boolean; expressions like 1 < 2 < 3 are not valid (because
there is no < operator to compare a Boolean value with 3).
In addition to the comparison operators, the special BETWEEN construct is available.

a BETWEEN x AND y

is equivalent to

a >= x AND a <= y

Similarly,

a NOT BETWEEN x AND y

is equivalent to

a < x OR a > y

There is no difference between the two respective forms apart from the CPU cycles required to rewrite the
first one into the second one internally.
To check whether a value is or is not null, use the constructs

expression IS NULL
expression IS NOT NULL

or the equivalent, but nonstandard, constructs

expression ISNULL
expression NOTNULL

Do not write expression = NULL because NULL is not “equal to” NULL. (The null value represents an
unknown value, and it is not known whether two unknown values are equal.) This behavior conforms to
the SQL standard.


Tip: Some applications may expect that expression = NULL returns true if expression evaluates
to the null value. It is highly recommended that these applications be modified to comply with the
SQL standard. However, if that cannot be done the transform_null_equals configuration variable is
available. If it is enabled, PostgreSQL will convert x = NULL clauses to x IS NULL. This was the
default behavior in PostgreSQL releases 6.5 through 7.1.
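The difference between = NULL and IS NULL is easy to demonstrate:

```sql
SELECT 1 = NULL;       -- null, not false
SELECT NULL = NULL;    -- null, not true
SELECT 1 IS NULL;      -- false
SELECT NULL IS NULL;   -- true
```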

The ordinary comparison operators yield null (signifying “unknown”) when either input is null. Another
way to do comparisons is with the IS DISTINCT FROM construct:

expression IS DISTINCT FROM expression

For non-null inputs this is the same as the <> operator. However, when both inputs are null it will return
false, and when just one input is null it will return true. Thus it effectively acts as though null were a
normal data value, rather than “unknown”.
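For example:

```sql
SELECT 1 IS DISTINCT FROM 2;          -- true
SELECT 1 IS DISTINCT FROM NULL;       -- true, where 1 <> NULL would yield null
SELECT NULL IS DISTINCT FROM NULL;    -- false, where NULL <> NULL would yield null
```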
Boolean values can also be tested using the constructs

expression IS TRUE
expression IS NOT TRUE
expression IS FALSE
expression IS NOT FALSE
expression IS UNKNOWN
expression IS NOT UNKNOWN

These will always return true or false, never a null value, even when the operand is null. A null input
is treated as the logical value “unknown”. Notice that IS UNKNOWN and IS NOT UNKNOWN are effec-
tively the same as IS NULL and IS NOT NULL, respectively, except that the input expression must be of
Boolean type.

9.3. Mathematical Functions and Operators


Mathematical operators are provided for many PostgreSQL types. For types without common mathemat-
ical conventions for all possible permutations (e.g., date/time types) we describe the actual behavior in
subsequent sections.
Table 9-2 shows the available mathematical operators.

Table 9-2. Mathematical Operators

Operator   Description                                  Example      Result

+          addition                                     2 + 3        5
-          subtraction                                  2 - 3        -1
*          multiplication                               2 * 3        6
/          division (integer division truncates         4 / 2        2
           the result)
%          modulo (remainder)                           5 % 4        1
^          exponentiation                               2.0 ^ 3.0    8
|/         square root                                  |/ 25.0      5
||/        cube root                                    ||/ 27.0     3
!          factorial                                    5 !          120
!!         factorial (prefix operator)                  !! 5         120
@          absolute value                               @ -5.0       5
&          bitwise AND                                  91 & 15      11
|          bitwise OR                                   32 | 3       35
#          bitwise XOR                                  17 # 5       20
~          bitwise NOT                                  ~1           -2
<<         bitwise shift left                           1 << 4       16
>>         bitwise shift right                          8 >> 2       2

The bitwise operators work only on integral data types, whereas the others are available for all numeric
data types. The bitwise operators are also available for the bit string types bit and bit varying, as
shown in Table 9-10.
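A few of these operators in action; note in particular that division of integers truncates:

```sql
SELECT 5 / 2;      -- 2: integer division truncates
SELECT 5.0 / 2;    -- 2.5 (numeric division does not truncate)
SELECT 5 % 4;      -- 1
SELECT 91 & 15;    -- 11
SELECT 1 << 4;     -- 16
```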
Table 9-3 shows the available mathematical functions. In the table, dp indicates double precision.
Many of these functions are provided in multiple forms with different argument types. Except where
noted, any given form of a function returns the same data type as its argument. The functions working
with double precision data are mostly implemented on top of the host system’s C library; accuracy
and behavior in boundary cases may therefore vary depending on the host system.

Table 9-3. Mathematical Functions

Function                           Return Type        Description                              Example                  Result

abs(x)                             (same as x)        absolute value                           abs(-17.4)               17.4
cbrt(dp)                           dp                 cube root                                cbrt(27.0)               3
ceil(dp or numeric)                (same as input)    smallest integer not less than           ceil(-42.8)              -42
                                                      argument
ceiling(dp or numeric)             (same as input)    smallest integer not less than           ceiling(-95.3)           -95
                                                      argument (alias for ceil)
degrees(dp)                        dp                 radians to degrees                       degrees(0.5)             28.6478897565412
exp(dp or numeric)                 (same as input)    exponential                              exp(1.0)                 2.71828182845905
floor(dp or numeric)               (same as input)    largest integer not greater than         floor(-42.8)             -43
                                                      argument
ln(dp or numeric)                  (same as input)    natural logarithm                        ln(2.0)                  0.693147180559945
log(dp or numeric)                 (same as input)    base 10 logarithm                        log(100.0)               2
log(b numeric, x numeric)          numeric            logarithm to base b                      log(2.0, 64.0)           6.0000000000
mod(y, x)                          (same as argument  remainder of y/x                         mod(9,4)                 1
                                   types)
pi()                               dp                 "π" constant                             pi()                     3.14159265358979
power(a dp, b dp)                  dp                 a raised to the power of b               power(9.0, 3.0)          729
power(a numeric, b numeric)        numeric            a raised to the power of b               power(9.0, 3.0)          729
radians(dp)                        dp                 degrees to radians                       radians(45.0)            0.785398163397448
random()                           dp                 random value between 0.0 and 1.0         random()
round(dp or numeric)               (same as input)    round to nearest integer                 round(42.4)              42
round(v numeric, s integer)        numeric            round to s decimal places                round(42.4382, 2)        42.44
setseed(dp)                        integer            set seed for subsequent random()         setseed(0.54823)         1177314959
                                                      calls
sign(dp or numeric)                (same as input)    sign of the argument (-1, 0, +1)         sign(-8.4)               -1
sqrt(dp or numeric)                (same as input)    square root                              sqrt(2.0)                1.4142135623731
trunc(dp or numeric)               (same as input)    truncate toward zero                     trunc(42.8)              42
trunc(v numeric, s integer)        numeric            truncate to s decimal places             trunc(42.4382, 2)        42.43
width_bucket(op numeric, b1        integer            return the bucket to which operand       width_bucket(5.35,       3
  numeric, b2 numeric, count                          would be assigned in an equidepth         0.024, 10.06, 5)
  integer)                                            histogram with count buckets, a
                                                      lower bound of b1, and an upper
                                                      bound of b2

Finally, Table 9-4 shows the available trigonometric functions. All trigonometric functions take arguments
and return values of type double precision.

Table 9-4. Trigonometric Functions


Function Description
acos(x) inverse cosine
asin(x) inverse sine
atan(x) inverse tangent
atan2(x, y) inverse tangent of x/y
cos(x) cosine
cot(x) cotangent
sin(x) sine
tan(x) tangent

9.4. String Functions and Operators


This section describes functions and operators for examining and manipulating string values. Strings
in this context include values of all the types character, character varying, and text. Unless
otherwise noted, all of the functions listed below work on all of these types, but be wary of potential
effects of the automatic padding when using the character type. Generally, the functions described
here also work on data of non-string types by converting that data to a string representation first. Some
functions also exist natively for the bit-string types.
SQL defines some string functions with a special syntax where certain key words rather than commas are
used to separate the arguments. Details are in Table 9-5. These functions are also implemented using the
regular syntax for function invocation. (See Table 9-6.)
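For instance, each of these pairs is equivalent, the first using the SQL key-word syntax and the second the regular call syntax:

```sql
SELECT substring('Thomas' from 2 for 3);   -- hom
SELECT substr('Thomas', 2, 3);             -- hom

SELECT trim(both 'x' from 'xTomxx');       -- Tom
SELECT btrim('xTomxx', 'x');               -- Tom
```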

Table 9-5. SQL String Functions and Operators

Function                            Return Type  Description                             Example                          Result

string || string                    text         String concatenation                    'Post' || 'greSQL'               PostgreSQL
bit_length(string)                  integer      Number of bits in string                bit_length('jose')               32
char_length(string) or              integer      Number of characters in string          char_length('jose')              4
  character_length(string)
convert(string using                text         Change encoding using the specified     convert('PostgreSQL' using       'PostgreSQL' in Unicode
  conversion_name)                               conversion name. Conversions can be       iso_8859_1_to_utf_8)             (UTF-8) encoding
                                                 defined by CREATE CONVERSION. Also
                                                 there are some pre-defined conversion
                                                 names; see Table 9-7 for the
                                                 available conversion names.
lower(string)                       text         Convert string to lower case            lower('TOM')                     tom
octet_length(string)                integer      Number of bytes in string               octet_length('jose')             4
overlay(string placing string       text         Replace substring                       overlay('Txxxxas' placing        Thomas
  from integer [for integer])                                                              'hom' from 2 for 4)
position(substring in string)       integer      Location of specified substring         position('om' in 'Thomas')       3
substring(string [from integer]     text         Extract substring                       substring('Thomas' from 2        hom
  [for integer])                                                                           for 3)
substring(string from pattern)      text         Extract substring matching POSIX        substring('Thomas' from          mas
                                                 regular expression                        '...$')
substring(string from pattern       text         Extract substring matching SQL          substring('Thomas' from          oma
  for escape)                                    regular expression                        '%#"o_a#"_' for '#')
trim([leading | trailing |          text         Remove the longest string containing    trim(both 'x' from               Tom
  both] [characters] from                        only the characters (a space by           'xTomxx')
  string)                                        default) from the start/end/both
                                                 ends of the string
upper(string)                       text         Convert string to upper case            upper('tom')                     TOM

Additional string manipulation functions are available and are listed in Table 9-6. Some of them are used
internally to implement the SQL-standard string functions listed in Table 9-5.


Table 9-6. Other String Functions

Function                            Return Type  Description                             Example                          Result

ascii(text)                         integer      ASCII code of the first character of    ascii('x')                       120
                                                 the argument
btrim(string text [,                text         Remove the longest string consisting    btrim('xyxtrimyyx',              trim
  characters text])                              only of characters in characters (a       'xy')
                                                 space by default) from the start and
                                                 end of string
chr(integer)                        text         Character with the given ASCII code     chr(65)                          A
convert(string text,                text         Convert string to dest_encoding.        convert('text_in_unicode',       text_in_unicode
  [src_encoding name,]                           The original encoding is specified        'UNICODE', 'LATIN1')             represented in ISO
  dest_encoding name)                            by src_encoding. If src_encoding                                          8859-1 encoding
                                                 is omitted, the database encoding is
                                                 assumed.
decode(string text, type            bytea        Decode binary data from string          decode('MTIzAAE=',               123\000\001
  text)                                          previously encoded with encode.           'base64')
                                                 Parameter type is same as in encode.
encode(data bytea, type             text         Encode binary data to ASCII-only        encode('123\\000\\001',          MTIzAAE=
  text)                                          representation. Supported types are:      'base64')
                                                 base64, hex, escape.
initcap(text)                       text         Convert the first letter of each        initcap('hi THOMAS')             Hi Thomas
                                                 word to uppercase and the rest to
                                                 lowercase. Words are sequences of
                                                 alphanumeric characters separated by
                                                 non-alphanumeric characters.
length(string text)                 integer      Number of characters in string          length('jose')                   4
lpad(string text, length            text         Fill up the string to length            lpad('hi', 5, 'xy')              xyxhi
  integer [, fill text])                         length by prepending the characters
                                                 fill (a space by default). If the
                                                 string is already longer than
                                                 length then it is truncated (on
                                                 the right).
ltrim(string text [,                text         Remove the longest string containing    ltrim('zzzytrim',                trim
  characters text])                              only characters from characters (a        'xyz')
                                                 space by default) from the start of
                                                 string
md5(string text)                    text         Calculates the MD5 hash of string,      md5('abc')                       900150983cd24fb0
                                                 returning the result in hexadecimal                                       d6963f7d28e17f72
pg_client_encoding()                name         Current client encoding name            pg_client_encoding()             SQL_ASCII
quote_ident(string text)            text         Return the given string suitably        quote_ident('Foo bar')           "Foo bar"
                                                 quoted to be used as an identifier
                                                 in an SQL statement string. Quotes
                                                 are added only if necessary (i.e.,
                                                 if the string contains
                                                 non-identifier characters or would
                                                 be case-folded). Embedded quotes
                                                 are properly doubled.
quote_literal(string text)          text         Return the given string suitably        quote_literal(                   'O''Reilly'
                                                 quoted to be used as a string             'O\'Reilly')
                                                 literal in an SQL statement string.
                                                 Embedded quotes and backslashes are
                                                 properly doubled.
repeat(string text, number          text         Repeat string the specified number      repeat('Pg', 4)                  PgPgPgPg
  integer)                                       of times
replace(string text, from           text         Replace all occurrences in string       replace('abcdefabcdef',          abXXefabXXef
  text, to text)                                 of substring from with substring to       'cd', 'XX')
rpad(string text, length            text         Fill up the string to length            rpad('hi', 5, 'xy')              hixyx
  integer [, fill text])                         length by appending the characters
                                                 fill (a space by default). If the
                                                 string is already longer than
                                                 length then it is truncated.
rtrim(string text [,                text         Remove the longest string containing    rtrim('trimxxxx', 'x')           trim
  characters text])                              only characters from characters (a
                                                 space by default) from the end of
                                                 string
split_part(string text,             text         Split string on delimiter and           split_part(                      def
  delimiter text, field                          return the given field (counting          'abc~@~def~@~ghi',
  integer)                                       from one)                                 '~@~', 2)
strpos(string, substring)           integer      Location of specified substring         strpos('high', 'ig')             2
                                                 (same as position(substring in
                                                 string), but note the reversed
                                                 argument order)
substr(string, from [,              text         Extract substring (same as              substr('alphabet', 3,            ph
  count])                                        substring(string from from for            2)
                                                 count))
to_ascii(text [, encoding])         text         Convert text to ASCII from another      to_ascii('Karel')                Karel
                                                 encoding (see note a)
to_hex(number integer or            text         Convert number to its equivalent        to_hex(2147483647)               7fffffff
  bigint)                                        hexadecimal representation
translate(string text, from         text         Any character in string that            translate('12345',               a23x5
  text, to text)                                 matches a character in the from set       '14', 'ax')
                                                 is replaced by the corresponding
                                                 character in the to set
Notes: a. The to_ascii function supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings only.
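The quoting functions are typically used together when assembling SQL statements dynamically (the table name and literal below are arbitrary illustrative values):

```sql
SELECT 'SELECT * FROM ' || quote_ident('My Table')
    || ' WHERE name = ' || quote_literal('O''Reilly');
-- produces: SELECT * FROM "My Table" WHERE name = 'O''Reilly'
```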

Table 9-7. Built-in Conversions

Conversion Name a Source Encoding Destination Encoding


ascii_to_mic SQL_ASCII MULE_INTERNAL
ascii_to_utf_8 SQL_ASCII UNICODE
big5_to_euc_tw BIG5 EUC_TW
big5_to_mic BIG5 MULE_INTERNAL
big5_to_utf_8 BIG5 UNICODE
euc_cn_to_mic EUC_CN MULE_INTERNAL
euc_cn_to_utf_8 EUC_CN UNICODE
euc_jp_to_mic EUC_JP MULE_INTERNAL
euc_jp_to_sjis EUC_JP SJIS
euc_jp_to_utf_8 EUC_JP UNICODE
euc_kr_to_mic EUC_KR MULE_INTERNAL
euc_kr_to_utf_8 EUC_KR UNICODE
euc_tw_to_big5 EUC_TW BIG5
euc_tw_to_mic EUC_TW MULE_INTERNAL
euc_tw_to_utf_8 EUC_TW UNICODE
gb18030_to_utf_8 GB18030 UNICODE
gbk_to_utf_8 GBK UNICODE

iso_8859_10_to_utf_8 LATIN6 UNICODE
iso_8859_13_to_utf_8 LATIN7 UNICODE
iso_8859_14_to_utf_8 LATIN8 UNICODE
iso_8859_15_to_utf_8 LATIN9 UNICODE
iso_8859_16_to_utf_8 LATIN10 UNICODE
iso_8859_1_to_mic LATIN1 MULE_INTERNAL
iso_8859_1_to_utf_8 LATIN1 UNICODE
iso_8859_2_to_mic LATIN2 MULE_INTERNAL
iso_8859_2_to_utf_8 LATIN2 UNICODE
iso_8859_2_to_windows_1250 LATIN2 WIN1250

iso_8859_3_to_mic LATIN3 MULE_INTERNAL


iso_8859_3_to_utf_8 LATIN3 UNICODE
iso_8859_4_to_mic LATIN4 MULE_INTERNAL
iso_8859_4_to_utf_8 LATIN4 UNICODE
iso_8859_5_to_koi8_r ISO_8859_5 KOI8
iso_8859_5_to_mic ISO_8859_5 MULE_INTERNAL
iso_8859_5_to_utf_8 ISO_8859_5 UNICODE
iso_8859_5_to_windows_1251 ISO_8859_5 WIN

iso_8859_5_to_windows_866 ISO_8859_5 ALT


iso_8859_6_to_utf_8 ISO_8859_6 UNICODE
iso_8859_7_to_utf_8 ISO_8859_7 UNICODE
iso_8859_8_to_utf_8 ISO_8859_8 UNICODE
iso_8859_9_to_utf_8 LATIN5 UNICODE
johab_to_utf_8 JOHAB UNICODE
koi8_r_to_iso_8859_5 KOI8 ISO_8859_5
koi8_r_to_mic KOI8 MULE_INTERNAL
koi8_r_to_utf_8 KOI8 UNICODE
koi8_r_to_windows_1251 KOI8 WIN
koi8_r_to_windows_866 KOI8 ALT
mic_to_ascii MULE_INTERNAL SQL_ASCII
mic_to_big5 MULE_INTERNAL BIG5
mic_to_euc_cn MULE_INTERNAL EUC_CN
mic_to_euc_jp MULE_INTERNAL EUC_JP
mic_to_euc_kr MULE_INTERNAL EUC_KR
mic_to_euc_tw MULE_INTERNAL EUC_TW
mic_to_iso_8859_1 MULE_INTERNAL LATIN1
mic_to_iso_8859_2 MULE_INTERNAL LATIN2

mic_to_iso_8859_3 MULE_INTERNAL LATIN3
mic_to_iso_8859_4 MULE_INTERNAL LATIN4
mic_to_iso_8859_5 MULE_INTERNAL ISO_8859_5
mic_to_koi8_r MULE_INTERNAL KOI8
mic_to_sjis MULE_INTERNAL SJIS
mic_to_windows_1250 MULE_INTERNAL WIN1250
mic_to_windows_1251 MULE_INTERNAL WIN
mic_to_windows_866 MULE_INTERNAL ALT
sjis_to_euc_jp SJIS EUC_JP
sjis_to_mic SJIS MULE_INTERNAL
sjis_to_utf_8 SJIS UNICODE
tcvn_to_utf_8 TCVN UNICODE
uhc_to_utf_8 UHC UNICODE
utf_8_to_ascii UNICODE SQL_ASCII
utf_8_to_big5 UNICODE BIG5
utf_8_to_euc_cn UNICODE EUC_CN
utf_8_to_euc_jp UNICODE EUC_JP
utf_8_to_euc_kr UNICODE EUC_KR
utf_8_to_euc_tw UNICODE EUC_TW
utf_8_to_gb18030 UNICODE GB18030
utf_8_to_gbk UNICODE GBK
utf_8_to_iso_8859_1 UNICODE LATIN1
utf_8_to_iso_8859_10 UNICODE LATIN6
utf_8_to_iso_8859_13 UNICODE LATIN7
utf_8_to_iso_8859_14 UNICODE LATIN8
utf_8_to_iso_8859_15 UNICODE LATIN9
utf_8_to_iso_8859_16 UNICODE LATIN10
utf_8_to_iso_8859_2 UNICODE LATIN2
utf_8_to_iso_8859_3 UNICODE LATIN3
utf_8_to_iso_8859_4 UNICODE LATIN4
utf_8_to_iso_8859_5 UNICODE ISO_8859_5
utf_8_to_iso_8859_6 UNICODE ISO_8859_6
utf_8_to_iso_8859_7 UNICODE ISO_8859_7
utf_8_to_iso_8859_8 UNICODE ISO_8859_8
utf_8_to_iso_8859_9 UNICODE LATIN5
utf_8_to_johab UNICODE JOHAB
utf_8_to_koi8_r UNICODE KOI8
utf_8_to_sjis UNICODE SJIS

utf_8_to_tcvn UNICODE TCVN
utf_8_to_uhc UNICODE UHC
utf_8_to_windows_1250 UNICODE WIN1250
utf_8_to_windows_1251 UNICODE WIN
utf_8_to_windows_1256 UNICODE WIN1256
utf_8_to_windows_866 UNICODE ALT
utf_8_to_windows_874 UNICODE WIN874
windows_1250_to_iso_8859_2 WIN1250 LATIN2

windows_1250_to_mic WIN1250 MULE_INTERNAL


windows_1250_to_utf_8 WIN1250 UNICODE
windows_1251_to_iso_8859_5 WIN ISO_8859_5

windows_1251_to_koi8_r WIN KOI8


windows_1251_to_mic WIN MULE_INTERNAL
windows_1251_to_utf_8 WIN UNICODE
windows_1251_to_windows_866 WIN ALT

windows_1256_to_utf_8 WIN1256 UNICODE


windows_866_to_iso_8859_5 ALT ISO_8859_5
windows_866_to_koi8_r ALT KOI8
windows_866_to_mic ALT MULE_INTERNAL
windows_866_to_utf_8 ALT UNICODE
windows_866_to_windows_1251 ALT WIN

windows_874_to_utf_8 WIN874 UNICODE


Notes: a. The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the destination encoding name processed the same way. Hence the names might deviate from the customary encoding names.

9.5. Binary String Functions and Operators


This section describes functions and operators for examining and manipulating values of type bytea.
SQL defines some string functions with a special syntax where certain key words rather than commas are
used to separate the arguments. Details are in Table 9-8. Some functions are also implemented using the
regular syntax for function invocation. (See Table 9-9.)

Table 9-8. SQL Binary String Functions and Operators

Function                            Return Type  Description                             Example                                Result

string || string                    bytea        String concatenation                    '\\\\Post'::bytea ||                   \\Post'gres\000
                                                                                           '\\047gres\\000'::bytea
octet_length(string)                integer      Number of bytes in binary string        octet_length(                          5
                                                                                           'jo\\000se'::bytea)
position(substring in string)       integer      Location of specified substring         position('\\000om'::bytea in           3
                                                                                           'Th\\000omas'::bytea)
substring(string [from integer]     bytea        Extract substring                       substring('Th\\000omas'::bytea         h\000o
  [for integer])                                                                           from 2 for 3)
trim([both] bytes from string)      bytea        Remove the longest string containing    trim('\\000'::bytea from               Tom
                                                 only the bytes in bytes from the          '\\000Tom\\000'::bytea)
                                                 start and end of string
get_byte(string, offset)            integer      Extract byte from string                get_byte(                              109
                                                                                           'Th\\000omas'::bytea, 4)
set_byte(string, offset,            bytea        Set byte in string                      set_byte(                              Th\000o@as
  newvalue)                                                                                'Th\\000omas'::bytea, 4, 64)
get_bit(string, offset)             integer      Extract bit from string                 get_bit(                               1
                                                                                           'Th\\000omas'::bytea, 45)
set_bit(string, offset,             bytea        Set bit in string                       set_bit(                               Th\000omAs
  newvalue)                                                                                'Th\\000omas'::bytea, 45, 0)

Additional binary string manipulation functions are available and are listed in Table 9-9. Some of them
are used internally to implement the SQL-standard string functions listed in Table 9-8.

Table 9-9. Other Binary String Functions

Function                            Return Type  Description                             Example                                Result

btrim(string bytea, bytes           bytea        Remove the longest string consisting    btrim('\\000trim\\000'::bytea,         trim
  bytea)                                         only of bytes in bytes from the           '\\000'::bytea)
                                                 start and end of string
length(string)                      integer      Length of binary string                 length('jo\\000se'::bytea)             5
decode(string text, type            bytea        Decode binary string from string        decode('123\\000456',                  123\000456
  text)                                          previously encoded with encode.           'escape')
                                                 Parameter type is same as in encode.
encode(string bytea, type           text         Encode binary string to ASCII-only      encode('123\\000456'::bytea,           123\000456
  text)                                          representation. Supported types are:      'escape')
                                                 base64, hex, escape.

9.6. Bit String Functions and Operators


This section describes functions and operators for examining and manipulating bit strings, that is values
of the types bit and bit varying. Aside from the usual comparison operators, the operators shown in
Table 9-10 can be used. Bit string operands of &, |, and # must be of equal length. When bit shifting, the
original length of the string is preserved, as shown in the examples.

Table 9-10. Bit String Operators

Operator Description Example Result


|| concatenation B’10001’ || B’011’ 10001011
& bitwise AND B’10001’ & B’01101’ 00001
| bitwise OR B’10001’ | B’01101’ 11101
# bitwise XOR B’10001’ # B’01101’ 11100
~ bitwise NOT ~ B’10001’ 01110
<< bitwise shift left B’10001’ << 3 01000
>> bitwise shift right B’10001’ >> 2 00100

The following SQL-standard functions work on bit strings as well as character strings: length,
bit_length, octet_length, position, substring.

In addition, it is possible to cast integral values to and from type bit. Some examples:

44::bit(10)                   0000101100
44::bit(3)                    100
cast(-44 as bit(12))          111111010100
'1110'::bit(4)::integer       14

Note that casting to just “bit” means casting to bit(1), and so it will deliver only the least significant bit
of the integer.


Note: Prior to PostgreSQL 8.0, casting an integer to bit(n) would copy the leftmost n bits of the
integer, whereas now it copies the rightmost n bits. Also, casting an integer to a bit string width wider
than the integer itself will sign-extend on the left.

9.7. Pattern Matching


There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL
LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular
expressions. Additionally, a pattern matching function, substring, is available, using either SIMILAR
TO-style or POSIX-style regular expressions.

Tip: If you have pattern matching needs that go beyond this, consider writing a user-defined function
in Perl or Tcl.

9.7.1. LIKE
string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]

Every pattern defines a set of strings. The LIKE expression returns true if the string is contained in
the set of strings represented by pattern. (As expected, the NOT LIKE expression returns false if LIKE
returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)
If pattern does not contain percent signs or underscore, then the pattern only represents the string itself;
in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any
single character; a percent sign (%) matches any string of zero or more characters.
Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

LIKE pattern matches always cover the entire string. To match a sequence anywhere within a string, the
pattern must therefore start and end with a percent sign.
To match a literal underscore or percent sign without matching other characters, the respective character
in pattern must be preceded by the escape character. The default escape character is the backslash but
a different one may be selected by using the ESCAPE clause. To match the escape character itself, write
two escape characters.
Note that the backslash already has a special meaning in string literals, so to write a pattern constant that
contains a backslash you must write two backslashes in an SQL statement. Thus, writing a pattern that
actually matches a literal backslash means writing four backslashes in the statement. You can avoid this


by selecting a different escape character with ESCAPE; then a backslash is not special to LIKE anymore.
(But it is still special to the string literal parser, so you still need two of them.)
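For instance (recall that each backslash below is doubled only because of the string-literal rules):

```sql
SELECT 'a_c' LIKE 'a\\_c';             -- true: the underscore is matched literally
SELECT 'abc' LIKE 'a\\_c';             -- false: only a literal _ matches now
SELECT 'a_c' LIKE 'a$_c' ESCAPE '$';   -- true: $ chosen as the escape character
```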
It's also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape
mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in
the pattern.
The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the
active locale. This is not in the SQL standard but is a PostgreSQL extension.
The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~*
operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-
specific.
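For example:

```sql
SELECT 'PostgreSQL' LIKE 'postgres%';    -- false
SELECT 'PostgreSQL' ILIKE 'postgres%';   -- true: case-insensitive
SELECT 'PostgreSQL' ~~ 'Postgre%';       -- true: ~~ is the same as LIKE
```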

9.7.2. SIMILAR TO Regular Expressions


string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string.
It is much like LIKE, except that it interprets the pattern using the SQL standard’s definition of a regular
expression. SQL regular expressions are a curious cross between LIKE notation and common regular
expression notation.
Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike
common regular expression practice, wherein the pattern may match any part of the string. Also like
LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string,
respectively (these are comparable to . and .* in POSIX regular expressions).
In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching
metacharacters borrowed from POSIX regular expressions:

• | denotes alternation (either of two alternatives).
• * denotes repetition of the previous item zero or more times.
• + denotes repetition of the previous item one or more times.
• Parentheses () may be used to group items into a single logical item.
• A bracket expression [...] specifies a character class, just as in POSIX regular expressions.
Notice that bounded repetition (? and {...}) is not provided, though it exists in POSIX. Also, the dot
(.) is not a metacharacter.
As with LIKE, a backslash disables the special meaning of any of these metacharacters; or a different
escape character can be specified with ESCAPE.
Some examples:

'abc' SIMILAR TO 'abc'        true
'abc' SIMILAR TO 'a'          false
'abc' SIMILAR TO '%(b|d)%'    true
'abc' SIMILAR TO '(b|c)%'     false


The substring function with three parameters, substring(string from pattern for
escape-character), provides extraction of a substring that matches an SQL regular expression
pattern. As with SIMILAR TO, the specified pattern must match the entire data string, else the function
fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern
must contain two occurrences of the escape character followed by a double quote ("). The text matching
the portion of the pattern between these markers is returned.
Some examples:

substring('foobar' from '%#"o_b#"%' for '#')    oob
substring('foobar' from '#"o_b#"%' for '#')     NULL

9.7.3. POSIX Regular Expressions


Table 9-11 lists the available operators for pattern matching using POSIX regular expressions.

Table 9-11. Regular Expression Match Operators

Operator   Description                                            Example
~          Matches regular expression, case sensitive             'thomas' ~ '.*thomas.*'
~*         Matches regular expression, case insensitive           'thomas' ~* '.*Thomas.*'
!~         Does not match regular expression, case sensitive      'thomas' !~ '.*Thomas.*'
!~*        Does not match regular expression, case insensitive    'thomas' !~* '.*vadim.*'

POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and
SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language
that is similar to the one described here.
A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular
set). A string is said to match a regular expression if it is a member of the regular set described by the
regular expression. As with LIKE, pattern characters match string characters exactly unless they are special
characters in the regular expression language — but regular expressions use different special characters
than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string,
unless the regular expression is explicitly anchored to the beginning or end of the string.
Some examples:

'abc' ~ 'abc'        true
'abc' ~ '^a'         true
'abc' ~ '(b|d)'      true
'abc' ~ '^(b|c)'     false


The substring function with two parameters, substring(string from pattern), provides extraction
of a substring that matches a POSIX regular expression pattern. It returns null if there is no match,
otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses,
the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis
comes first) is returned. You can put parentheses around the whole expression if you want to use
parentheses within it without triggering this exception. If you need parentheses in the pattern before the
subexpression you want to extract, see the non-capturing parentheses described below.
Some examples:

substring('foobar' from 'o.b')      oob
substring('foobar' from 'o(.)b')    o
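
The non-capturing parentheses mentioned above can be used to group part of the pattern without affecting which subexpression is returned. For example (illustrative, with the results one would expect):

substring('foobar' from '(o)(.b)')      o
substring('foobar' from '(?:o)(.b)')    ob

In the first case the first capturing group, (o), determines the result; in the second, (?:o) is not counted, so the result comes from (.b).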

PostgreSQL’s regular expressions are implemented using a package written by Henry Spencer. Much of
the description of regular expressions below is copied verbatim from his manual entry.

9.7.3.1. Regular Expression Details


Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs
(roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both
forms, and also implements some extensions that are not in the POSIX standard, but have become widely
used anyway due to their availability in programming languages such as Perl and Tcl. REs using these
non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an
exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more
limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then
describe how BREs differ.

Note: The form of regular expressions accepted by PostgreSQL can be chosen by setting the
regex_flavor run-time parameter. The usual setting is advanced, but one might choose extended for
maximum backwards compatibility with pre-7.4 releases of PostgreSQL.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches
one of the branches.
A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first,
followed by a match for the second, etc; an empty branch matches the empty string.
A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a
match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can
be any of the possibilities shown in Table 9-12. The possible quantifiers and their meanings are shown in
Table 9-13.
A constraint matches an empty string, but matches only when specific conditions are met. A constraint
can be used where an atom could be used, except it may not be followed by a quantifier. The simple
constraints are shown in Table 9-14; some more constraints are described later.


Table 9-12. Regular Expression Atoms

Atom       Description
(re)       (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)     as above, but the match is not noted for reporting (a “non-capturing” set of parentheses) (AREs only)
.          matches any single character
[chars]    a bracket expression, matching any one of the chars (see Section 9.7.3.2 for more detail)
\k         (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c         (where c is alphanumeric, possibly followed by other characters) an escape, see Section 9.7.3.3 (AREs only; in EREs and BREs, this matches c)
{          when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below)
x          (where x is a single character with no other significance) matches that character

An RE may not end with \.

Note: Remember that the backslash (\) already has a special meaning in PostgreSQL string literals.
To write a pattern constant that contains a backslash, you must write two backslashes in the statement.

Table 9-13. Regular Expression Quantifiers

Quantifier   Matches
*            a sequence of 0 or more matches of the atom
+            a sequence of 1 or more matches of the atom
?            a sequence of 0 or 1 matches of the atom
{m}          a sequence of exactly m matches of the atom
{m,}         a sequence of m or more matches of the atom
{m,n}        a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*?           non-greedy version of *
+?           non-greedy version of +
??           non-greedy version of ?
{m}?         non-greedy version of {m}
{m,}?        non-greedy version of {m,}
{m,n}?       non-greedy version of {m,n}

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal
integers with permissible values from 0 to 255 inclusive.
Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding
normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches.
See Section 9.7.3.5 for more detail.

Note: A quantifier cannot immediately follow another quantifier. A quantifier cannot begin an
expression or subexpression or follow ^ or |.

Table 9-14. Regular Expression Constraints

Constraint   Description
^            matches at the beginning of the string
$            matches at the end of the string
(?=re)       positive lookahead matches at any point where a substring matching re begins (AREs only)
(?!re)       negative lookahead matches at any point where no substring matching re begins (AREs only)

Lookahead constraints may not contain back references (see Section 9.7.3.3), and all parentheses within
them are considered non-capturing.

9.7.3.2. Bracket Expressions


A bracket expression is a list of characters enclosed in []. It normally matches any single character from
the list (but see below). If the list begins with ^, it matches any single character not from the rest of
the list. If two characters in the list are separated by -, this is shorthand for the full range of characters
between those two (inclusive) in the collating sequence, e.g. [0-9] in ASCII matches any decimal digit. It
is illegal for two ranges to share an endpoint, e.g. a-c-e. Ranges are very collating-sequence-dependent,
so portable programs should avoid relying on them.
To include a literal ] in the list, make it the first character (following a possible ^). To include a literal -,
make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint
of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these
characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special
characters lose their special significance within a bracket expression. In particular, \ is not special when
following ERE or BRE rules, though it is special (as introducing an escape) in AREs.
Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as
if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the
sequence of characters of that collating element. The sequence is a single element of the bracket
expression's list. A bracket expression containing a multiple-character collating element can thus match
more than one character, e.g. if the collating sequence includes a ch collating element, then the RE
[[.ch.]]*c matches the first five characters of chchcc.

Note: PostgreSQL currently has no multi-character collating elements. This information describes
possible future behavior.

Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing
for the sequences of characters of all collating elements equivalent to that one, including itself. (If there
are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and .].)
For example, if o and ^ are the members of an equivalence class, then [[=o=]], [[=^=]], and [o^] are
all synonymous. An equivalence class may not be an endpoint of a range.
Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of
all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl,
digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes
defined in ctype. A locale may provide others. A character class may not be used as an endpoint of a
range.
There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are
constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as
a sequence of word characters that is neither preceded nor followed by word characters. A word character
is an alnum character (as defined by ctype) or an underscore. This is an extension, compatible with but
not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to
other systems. The constraint escapes described below are usually preferable (they are no more standard,
but are certainly easier to type).
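
For example, the word-boundary constraints would be expected to behave as follows (illustrative):

'a word here' ~ '[[:<:]]word[[:>:]]'    true
'wordy stuff' ~ '[[:<:]]word[[:>:]]'    false

In the second case, word is followed by another word character, so there is no end-of-word boundary after it.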

9.7.3.3. Regular Expression Escapes


Escapes are special sequences beginning with \ followed by an alphanumeric character. Escapes come in
several varieties: character entry, class shorthands, constraint escapes, and back references. A \ followed
by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no
escapes: outside a bracket expression, a \ followed by an alphanumeric character merely stands for that
character as an ordinary character, and inside a bracket expression, \ is an ordinary character. (The latter
is the one actual incompatibility between EREs and AREs.)
Character-entry escapes exist to make it easier to specify non-printing and otherwise inconvenient
characters in REs. They are shown in Table 9-15.
Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are
shown in Table 9-16.
A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an
escape. They are shown in Table 9-17.
A back reference (\n) matches the same string matched by the previous parenthesized subexpression
specified by the number n (see Table 9-18). For example, ([bc])\1 matches bb or cc but not bc or cb.
The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in
the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.
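
For example, a back reference in a pattern entered as an SQL string constant (where the backslash must be doubled) would be expected to behave like this:

'bb' ~ '([bc])\\1'    true
'bc' ~ '([bc])\\1'    false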


Note: Keep in mind that an escape’s leading \ will need to be doubled when entering the pattern as
an SQL string constant. For example:

'123' ~ '^\\d{3}' true

Table 9-15. Regular Expression Character-Entry Escapes

Escape        Description
\a            alert (bell) character, as in C
\b            backspace, as in C
\B            synonym for \ to help reduce the need for backslash doubling
\cX           (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero
\e            the character whose collating-sequence name is ESC, or failing that, the character with octal value 033
\f            form feed, as in C
\n            newline, as in C
\r            carriage return, as in C
\t            horizontal tab, as in C
\uwxyz        (where wxyz is exactly four hexadecimal digits) the Unicode character U+wxyz in the local byte ordering
\Ustuvwxyz    (where stuvwxyz is exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode extension to 32 bits
\v            vertical tab, as in C
\xhhh         (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used)
\0            the character whose value is 0
\xy           (where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy
\xyz          (where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz

Hexadecimal digits are 0-9, a-f, and A-F. Octal digits are 0-7.


The character-entry escapes are always taken as ordinary characters. For example, \135 is ] in ASCII,
but \135 does not terminate a bracket expression.

Table 9-16. Regular Expression Class-Shorthand Escapes

Escape Description
\d [[:digit:]]
\s [[:space:]]
\w [[:alnum:]_] (note underscore is included)
\D [^[:digit:]]
\S [^[:space:]]
\W [^[:alnum:]_] (note underscore is included)

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal.
(So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to
[a-c^[:digit:]], is illegal.)
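
For example (illustrative; backslashes doubled for the string-literal parser):

'abc123' ~ '\\d'       true
'abc' ~ '^\\w+$'       true
'ab c' ~ '^\\S+$'      false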

Table 9-17. Regular Expression Constraint Escapes

Escape   Description
\A       matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
\m       matches only at the beginning of a word
\M       matches only at the end of a word
\y       matches only at the beginning or end of a word
\Y       matches only at a point that is not the beginning or end of a word
\Z       matches only at the end of the string (see Section 9.7.3.5 for how this differs from $)

A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal
within bracket expressions.

Table 9-18. Regular Expression Back References

Escape   Description
\m       (where m is a nonzero digit) a back reference to the m'th subexpression
\mnn     (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference to the mnn'th subexpression

Note: There is an inherent historical ambiguity between octal character-entry escapes and back
references, which is resolved by heuristics, as hinted at above. A leading zero always indicates an
octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference.
A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable
subexpression (i.e. the number is in the legal range for a back reference), and otherwise is taken as
octal.

9.7.3.4. Regular Expression Metasyntax


In addition to the main syntax described above, there are some special forms and miscellaneous syntactic
facilities available.
Normally the flavor of RE being used is determined by regex_flavor. However, this can be overridden
by a director prefix. If an RE begins with ***:, the rest of the RE is taken as an ARE regardless of
regex_flavor. If an RE begins with ***=, the rest of the RE is taken to be a literal string, with all
characters considered ordinary characters.
An ARE may begin with embedded options: a sequence (?xyz) (where xyz is one or more alphabetic
characters) specifies options affecting the rest of the RE. These options override any previously determined
options (including both the RE flavor and case sensitivity). The available option letters are shown in Table
9-19.

Table 9-19. ARE Embedded-Option Letters

Option   Description
b        rest of RE is a BRE
c        case-sensitive matching (overrides operator type)
e        rest of RE is an ERE
i        case-insensitive matching (see Section 9.7.3.5) (overrides operator type)
m        historical synonym for n
n        newline-sensitive matching (see Section 9.7.3.5)
p        partial newline-sensitive matching (see Section 9.7.3.5)
q        rest of RE is a literal (“quoted”) string, all ordinary characters
s        non-newline-sensitive matching (default)
t        tight syntax (default; see below)
w        inverse partial newline-sensitive (“weird”) matching (see Section 9.7.3.5)
x        expanded syntax (see below)

Embedded options take effect at the ) terminating the sequence. They may appear only at the start of an
ARE (after the ***: director if any).
In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded
syntax, available by specifying the embedded x option. In the expanded syntax, white-space characters in
the RE are ignored, as are all characters between a # and the following newline (or the end of the RE).


This permits paragraphing and commenting a complex RE. There are three exceptions to that basic rule:

• a white-space character or # preceded by \ is retained
• white space or # within a bracket expression is retained
• white space and comments cannot appear within multi-character symbols, such as (?:

For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the
space character class.
Finally, in an ARE, outside bracket expressions, the sequence (?#ttt) (where ttt is any text not
containing a )) is a comment, completely ignored. Again, this is not allowed between the characters
of multi-character symbols, like (?:. Such comments are more a historical artifact than a useful facility,
and their use is deprecated; use the expanded syntax instead.
None of these metasyntax extensions is available if an initial ***= director has specified that the user’s
input be treated as a literal string rather than as an RE.
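
For example, one would expect directors and embedded options to behave like this (illustrative):

'THOMAS' ~ '(?i)thomas'          true   (case-insensitive despite the ~ operator)
'a.c' ~ '***=a.c'                true   (pattern taken as a literal string)
'abc' ~ '***=a.c'                false
'123' ~ '(?x) [0-9]+  # digits'  true   (white space and comment ignored)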

9.7.3.5. Regular Expression Matching Rules


In the event that an RE could match more than one substring of a given string, the RE matches the one
starting earliest in the string. If the RE could match more than one substring starting at that point, either
the longest possible match or the shortest possible match will be taken, depending on whether the RE is
greedy or non-greedy.
Whether an RE is greedy or not is determined by the following rules:

• Most atoms, and all constraints, have no greediness attribute (because they cannot match variable
amounts of text anyway).
• Adding parentheses around an RE does not change its greediness.
• A quantified atom with a fixed-repetition quantifier ({m} or {m}?) has the same greediness (possibly
none) as the atom itself.
• A quantified atom with other normal quantifiers (including {m,n} with m equal to n) is greedy (prefers
longest match).
• A quantified atom with a non-greedy quantifier (including {m,n}? with m equal to n) is non-greedy
(prefers shortest match).
• A branch — that is, an RE that has no top-level | operator — has the same greediness as the first
quantified atom in it that has a greediness attribute.
• An RE consisting of two or more branches connected by the | operator is always greedy.

The above rules associate greediness attributes not only with individual quantified atoms, but with
branches and entire REs that contain quantified atoms. What that means is that the matching is done in
such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole.
Once the length of the entire match is determined, the part of it that matches any particular subexpression
is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting
earlier in the RE taking priority over ones starting later.


An example of what this means:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
Result: 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
Result: 1

In the first case, the RE as a whole is greedy because Y* is greedy. It can match beginning at the Y, and it
matches the longest possible string starting there, i.e., Y123. The output is the parenthesized part of that,
or 123. In the second case, the RE as a whole is non-greedy because Y*? is non-greedy. It can match
beginning at the Y, and it matches the shortest possible string starting there, i.e., Y1. The subexpression
[0-9]{1,3} is greedy but it cannot change the decision as to the overall match length; so it is forced to
match just 1.
In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is
either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The
attributes assigned to the subexpressions only affect how much of that match they are allowed to “eat”
relative to each other.
The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a
subexpression or a whole RE.
Match lengths are measured in characters, not collating elements. An empty string is considered
longer than no match at all. For example: bb* matches the three middle characters of abbbc;
(week|wee)(night|knights) matches all ten characters of weeknights; when (.*).* is matched
against abc the parenthesized subexpression matches all three characters; and when (a*)* is matched
against bc both the whole RE and the parenthesized subexpression match an empty string.
If case-independent matching is specified, the effect is much as if all case distinctions had vanished from
the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside
a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g. x
becomes [xX]. When it appears inside a bracket expression, all case counterparts of it are added to the
bracket expression, e.g. [x] becomes [xX] and [^x] becomes [^xX].
If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline
character (so that matches will never cross newlines unless the RE explicitly arranges it) and ^ and $ will
match the empty string after and before a newline respectively, in addition to matching at beginning and
end of string respectively. But the ARE escapes \A and \Z continue to match beginning or end of string
only.
If partial newline-sensitive matching is specified, this affects . and bracket expressions as with newline-
sensitive matching, but not ^ and $.
If inverse partial newline-sensitive matching is specified, this affects ^ and $ as with newline-sensitive
matching, but not . and bracket expressions. This isn’t very useful but is provided for symmetry.
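
For example, with a string containing a newline (written '\n' in a string literal), one would expect:

'abc\ndef' ~ 'c$'         false  (default: $ matches only at end of string)
'abc\ndef' ~ '(?n)c$'     true   (newline-sensitive: $ also matches before a newline)
'abc\ndef' ~ 'c.d'        true   (default: . can match the newline)
'abc\ndef' ~ '(?n)c.d'    false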

9.7.3.6. Limits and Compatibility


No particular limit is imposed on the length of REs in this implementation. However, programs intended to
be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation
can refuse to accept such REs.


The only feature of AREs that is actually incompatible with POSIX EREs is that \ does not lose its
special significance inside bracket expressions. All other ARE features use syntax which is illegal or
has undefined or unspecified effects in POSIX EREs; the *** syntax of directors likewise is outside the
POSIX syntax for both BREs and EREs.
Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a
few Perl extensions are not present. Incompatibilities of note include \b, \B, the lack of special treatment
for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-
sensitive matching, the restrictions on parentheses and back references in lookahead constraints, and the
longest/shortest-match (rather than first-match) matching semantics.
Two significant incompatibilities exist between AREs and the ERE syntax recognized by pre-7.4 releases
of PostgreSQL:

• In AREs, \ followed by an alphanumeric character is either an escape or an error, while in previous
releases, it was just another way of writing the alphanumeric. This should not be much of a problem
because there was no reason to write such a sequence in earlier releases.
• In AREs, \ remains a special character within [], so a literal \ within a bracket expression must be
written \\.
While these differences are unlikely to create a problem for most applications, you can avoid them if
necessary by setting regex_flavor to extended.

9.7.3.7. Basic Regular Expressions


BREs differ from EREs in several respects. |, +, and ? are ordinary characters and there is no equivalent
for their functionality. The delimiters for bounds are \{ and \}, with { and } by themselves ordinary
characters. The parentheses for nested subexpressions are \( and \), with ( and ) by themselves ordinary
characters. ^ is an ordinary character except at the beginning of the RE or the beginning of a
parenthesized subexpression, $ is an ordinary character except at the end of the RE or the end of a
parenthesized subexpression, and * is an ordinary character if it appears at the beginning of the RE or the
beginning of a parenthesized subexpression (after a possible leading ^). Finally, single-digit back
references are available, and \< and \> are synonyms for [[:<:]] and [[:>:]] respectively; no other
escapes are available.

9.8. Data Type Formatting Functions


The PostgreSQL formatting functions provide a powerful set of tools for converting various data types
(date/time, integer, floating point, numeric) to formatted strings and for converting from formatted strings
to specific data types. Table 9-20 lists them. These functions all follow a common calling convention: the
first argument is the value to be formatted and the second argument is a template that defines the output
or input format.

Table 9-20. Formatting Functions

Function                         Return Type                Description                               Example
to_char(timestamp, text)         text                       convert time stamp to string              to_char(current_timestamp, 'HH12:MI:SS')
to_char(interval, text)          text                       convert interval to string                to_char(interval '15h 2m 12s', 'HH24:MI:SS')
to_char(int, text)               text                       convert integer to string                 to_char(125, '999')
to_char(double precision, text)  text                       convert real/double precision to string   to_char(125.8::real, '999D9')
to_char(numeric, text)           text                       convert numeric to string                 to_char(-125.8, '999D99S')
to_date(text, text)              date                       convert string to date                    to_date('05 Dec 2000', 'DD Mon YYYY')
to_timestamp(text, text)         timestamp with time zone   convert string to time stamp              to_timestamp('05 Dec 2000', 'DD Mon YYYY')
to_number(text, text)            numeric                    convert string to numeric                 to_number('12,454.8-', '99G999D9S')

Warning: to_char(interval, text) is deprecated and should not be used in newly-written code. It will
be removed in the next version.
In an output template string (for to_char), there are certain patterns that are recognized and replaced
with appropriately-formatted data from the value to be formatted. Any text that is not a template pattern
is simply copied verbatim. Similarly, in an input template string (for anything but to_char), template
patterns identify the parts of the input data string to be looked at and the values to be found there.
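
For instance, template patterns and quoted literal text can be mixed in one template (an illustrative call, which would be expected to produce 13:04:05 on 05 Dec 2000):

to_char(timestamp '2000-12-05 13:04:05', 'HH24:MI:SS "on" DD Mon YYYY')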
Table 9-21 shows the template patterns available for formatting date and time values.

Table 9-21. Template Patterns for Date/Time Formatting

Pattern Description
HH hour of day (01-12)
HH12 hour of day (01-12)
HH24 hour of day (00-23)
MI minute (00-59)
SS second (00-59)
MS millisecond (000-999)
US microsecond (000000-999999)
SSSS seconds past midnight (0-86399)
AM or A.M. or PM or P.M. meridian indicator (uppercase)
am or a.m. or pm or p.m. meridian indicator (lowercase)
Y,YYY year (4 and more digits) with comma
YYYY year (4 and more digits)
YYY last 3 digits of year

148
Chapter 9. Functions and Operators

Pattern Description
YY last 2 digits of year
Y last digit of year
IYYY ISO year (4 and more digits)
IYY last 3 digits of ISO year
IY last 2 digits of ISO year
I last digits of ISO year
BC or B.C. or AD or A.D. era indicator (uppercase)
bc or b.c. or ad or a.d. era indicator (lowercase)
MONTH full uppercase month name (blank-padded to 9
chars)
Month full mixed-case month name (blank-padded to 9
chars)
month full lowercase month name (blank-padded to 9
chars)
MON abbreviated uppercase month name (3 chars)
Mon abbreviated mixed-case month name (3 chars)
mon abbreviated lowercase month name (3 chars)
MM month number (01-12)
DAY full uppercase day name (blank-padded to 9 chars)
Day full mixed-case day name (blank-padded to 9 chars)
day full lowercase day name (blank-padded to 9 chars)
DY abbreviated uppercase day name (3 chars)
Dy abbreviated mixed-case day name (3 chars)
dy abbreviated lowercase day name (3 chars)
DDD day of year (001-366)
DD day of month (01-31)
D day of week (1-7; Sunday is 1)
W week of month (1-5) (The first week starts on the
first day of the month.)
WW week number of year (1-53) (The first week starts
on the first day of the year.)
IW ISO week number of year (The first Thursday of the
new year is in week 1.)
CC century (2 digits)
J Julian Day (days since January 1, 4712 BC)
Q quarter
RM month in Roman numerals (I-XII; I=January)
(uppercase)
rm month in Roman numerals (i-xii; i=January)
(lowercase)

TZ time-zone name (uppercase)
tz time-zone name (lowercase)

Certain modifiers may be applied to any template pattern to alter its behavior. For example, FMMonth is
the Month pattern with the FM modifier. Table 9-22 shows the modifier patterns for date/time formatting.

Table 9-22. Template Pattern Modifiers for Date/Time Formatting

Modifier Description Example


FM prefix fill mode (suppress padding blanks FMMonth
and zeroes)
TH suffix uppercase ordinal number suffix DDTH
th suffix lowercase ordinal number suffix DDth
FX prefix fixed format global option (see FX Month DD Day
usage notes)
SP suffix spell mode (not yet implemented) DDSP

Usage notes for date/time formatting:

• FM suppresses leading zeroes and trailing blanks that would otherwise be added to make the output of a
pattern be fixed-width.
• to_timestamp and to_date skip multiple blank spaces in the input string if the FX option is not used.
FX must be specified as the first item in the template. For example to_timestamp(’2000    JUN’,
’YYYY MON’) is correct, but to_timestamp(’2000    JUN’, ’FXYYYY MON’) returns an error,
because to_timestamp expects one space only.
• Ordinary text is allowed in to_char templates and will be output literally. You can put a substring
in double quotes to force it to be interpreted as literal text even if it contains pattern key words. For
example, in ’"Hello Year "YYYY’, the YYYY will be replaced by the year data, but the single Y in
Year will not be.

• If you want to have a double quote in the output you must precede it with a backslash, for example
’\\"YYYY Month\\"’. (Two backslashes are necessary because the backslash already has a special
meaning in a string constant.)
• The YYYY conversion from string to timestamp or date has a restriction if you use a year with
more than 4 digits. You must use some non-digit character or template after YYYY, otherwise the
year is always interpreted as 4 digits. For example (with the year 20000): to_date(’200001131’,
’YYYYMMDD’) will be interpreted as a 4-digit year; instead use a non-digit separator after the year, like
to_date(’20000-1131’, ’YYYY-MMDD’) or to_date(’20000Nov31’, ’YYYYMonDD’).

• Millisecond (MS) and microsecond (US) values in a conversion from string to timestamp are used as
part of the seconds after the decimal point. For example to_timestamp(’12:3’, ’SS:MS’) is not 3
milliseconds, but 300, because the conversion counts it as 12 + 0.3 seconds. This means for the format
SS:MS, the input values 12:3, 12:30, and 12:300 specify the same number of milliseconds. To get
three milliseconds, one must use 12:003, which the conversion counts as 12 + 0.003 = 12.003 seconds.

Here is a more complex example: to_timestamp(’15:12:02.020.001230’, ’HH:MI:SS.MS.US’)
is 15 hours, 12 minutes, and 2 seconds + 20 milliseconds + 1230 microseconds = 2.021230 seconds.
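As a minimal check of the millisecond behavior described above (a sketch; to_char is used here only
to display the parsed seconds field):

SELECT to_char(to_timestamp(’12:3’, ’SS:MS’), ’SS.MS’);
Result: 12.300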

• to_char’s day of the week numbering (see the ’D’ formatting pattern) is different from that of the
extract function.
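For example, 2001-02-16 is a Friday, so the two numbering schemes give different results for the
same date:

SELECT to_char(date ’2001-02-16’, ’D’);
Result: 6

SELECT extract(dow from date ’2001-02-16’);
Result: 5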

Table 9-23 shows the template patterns available for formatting numeric values.

Table 9-23. Template Patterns for Numeric Formatting

Pattern Description
9 value with the specified number of digits
0 value with leading zeros
. (period) decimal point
, (comma) group (thousand) separator
PR negative value in angle brackets
S sign anchored to number (uses locale)
L currency symbol (uses locale)
D decimal point (uses locale)
G group separator (uses locale)
MI minus sign in specified position (if number < 0)
PL plus sign in specified position (if number > 0)
SG plus/minus sign in specified position
RN roman numeral (input between 1 and 3999)
TH or th ordinal number suffix
V shift specified number of digits (see notes)
EEEE scientific notation (not implemented yet)

Usage notes for numeric formatting:

• A sign formatted using SG, PL, or MI is not anchored to the number; for example, to_char(-12,
’S9999’) produces ’  -12’, but to_char(-12, ’MI9999’) produces ’-  12’. The Oracle
implementation does not allow the use of MI ahead of 9, but rather requires that 9 precede MI.
• 9 results in a value with the same number of digits as there are 9s. If a digit is not available it outputs a
space.
• TH does not convert values less than zero and does not convert fractional numbers.
• PL, SG, and TH are PostgreSQL extensions.
• V effectively multiplies the input values by 10^n, where n is the number of digits following V. to_char
does not support the use of V combined with a decimal point. (E.g., 99.9V99 is not allowed.)


Table 9-24 shows some examples of the use of the to_char function.

Table 9-24. to_char Examples

Expression Result
to_char(current_timestamp, ’Tuesday , 06 05:39:18’
’Day, DD HH12:MI:SS’)
to_char(current_timestamp, ’Tuesday, 6 05:39:18’
’FMDay, FMDD HH12:MI:SS’)
to_char(-0.1, ’99.99’) ’ -.10’
to_char(-0.1, ’FM9.99’) ’-.1’
to_char(0.1, ’0.9’) ’ 0.1’
to_char(12, ’9990999.9’) ’ 0012.0’
to_char(12, ’FM9990999.9’) ’0012.’
to_char(485, ’999’) ’ 485’
to_char(-485, ’999’) ’-485’
to_char(485, ’9 9 9’) ’ 4 8 5’
to_char(1485, ’9,999’) ’ 1,485’
to_char(1485, ’9G999’) ’ 1 485’
to_char(148.5, ’999.999’) ’ 148.500’
to_char(148.5, ’FM999.999’) ’148.5’
to_char(148.5, ’FM999.990’) ’148.500’
to_char(148.5, ’999D999’) ’ 148,500’
to_char(3148.5, ’9G999D999’) ’ 3 148,500’
to_char(-485, ’999S’) ’485-’
to_char(-485, ’999MI’) ’485-’
to_char(485, ’999MI’) ’485 ’
to_char(485, ’FM999MI’) ’485’
to_char(485, ’PL999’) ’+485’
to_char(485, ’SG999’) ’+485’
to_char(-485, ’SG999’) ’-485’
to_char(-485, ’9SG99’) ’4-85’
to_char(-485, ’999PR’) ’<485>’
to_char(485, ’L999’) ’DM 485’
to_char(485, ’RN’) ’ CDLXXXV’
to_char(485, ’FMRN’) ’CDLXXXV’
to_char(5.2, ’FMRN’) ’V’
to_char(482, ’999th’) ’ 482nd’
to_char(485, ’"Good number:"999’) ’Good number: 485’

to_char(485.8, ’Pre: 485 Post: .800’
’"Pre:"999" Post:" .999’)
to_char(12, ’99V999’) ’ 12000’
to_char(12.4, ’99V999’) ’ 12400’
to_char(12.45, ’99V9’) ’ 125’

9.9. Date/Time Functions and Operators


Table 9-26 shows the available functions for date/time value processing, with details appearing in the
following subsections. Table 9-25 illustrates the behaviors of the basic arithmetic operators (+, *, etc.).
For formatting functions, refer to Section 9.8. You should be familiar with the background information on
date/time data types from Section 8.5.
All the functions and operators described below that take time or timestamp inputs actually come in two
variants: one that takes time with time zone or timestamp with time zone, and one that takes
time without time zone or timestamp without time zone. For brevity, these variants are not
shown separately. Also, the + and * operators come in commutative pairs (for example both date + integer
and integer + date); we show only one of each such pair.

Table 9-25. Date/Time Operators

Operator Example Result


+ date ’2001-09-28’ + date ’2001-10-05’
integer ’7’
+ date ’2001-09-28’ + timestamp ’2001-09-28
interval ’1 hour’ 01:00’
+ date ’2001-09-28’ + time timestamp ’2001-09-28
’03:00’ 03:00’
+ interval ’1 day’ + interval ’1 day 01:00’
interval ’1 hour’
+ timestamp ’2001-09-28 timestamp ’2001-09-29
01:00’ + interval ’23 00:00’
hours’
+ time ’01:00’ + interval time ’04:00’
’3 hours’
- - interval ’23 hours’ interval ’-23:00’
- date ’2001-10-01’ - date integer ’3’
’2001-09-28’
- date ’2001-10-01’ - date ’2001-09-24’
integer ’7’
- date ’2001-09-28’ - timestamp ’2001-09-27
interval ’1 hour’ 23:00’

- time ’05:00’ - time interval ’02:00’
’03:00’
- time ’05:00’ - interval time ’03:00’
’2 hours’
- timestamp ’2001-09-28 timestamp ’2001-09-28
23:00’ - interval ’23 00:00’
hours’
- interval ’1 day’ - interval ’23:00’
interval ’1 hour’
- timestamp ’2001-09-29 interval ’1 day 15:00’
03:00’ - timestamp
’2001-09-27 12:00’
* interval ’1 hour’ * interval ’03:30’
double precision ’3.5’
/ interval ’1 hour’ / interval ’00:40’
double precision ’1.5’

Table 9-26. Date/Time Functions

Function Return Type Description Example Result


age(timestamp, interval Subtract arguments, age(timestamp 43 years 9 mons
timestamp) producing a ’2001-04-10’, 27 days
“symbolic” result timestamp
that uses years and ’1957-06-13’)
months
age(timestamp) interval Subtract from age(timestamp 43 years 8 mons
current_date ’1957-06-13’) 3 days
current_date date Today’s date; see
Section 9.9.4
current_time time with time Time of day; see
zone Section 9.9.4
current_timestamp timestamp with Date and time; see
time zone Section 9.9.4
date_part(text, double Get subfield date_part(’hour’,
20
timestamp) precision (equivalent to timestamp
extract); see ’2001-02-16
Section 9.9.1 20:38:40’)
date_part(text, double Get subfield date_part(’month’,
3
interval) precision (equivalent to interval ’2
extract); see years 3
Section 9.9.1 months’)

date_trunc(text, timestamp Truncate to specified date_trunc(’hour’,
2001-02-16
timestamp) precision; see also timestamp 20:00:00
Section 9.9.2 ’2001-02-16
20:38:40’)
extract(field double Get subfield; see extract(hour 20
from timestamp) precision Section 9.9.1 from timestamp
’2001-02-16
20:38:40’)
extract(field double Get subfield; see extract(month 3
from interval) precision Section 9.9.1 from interval
’2 years 3
months’)
isfinite(timestamp) boolean Test for finite time isfinite(timestamp true
stamp (not equal to ’2001-02-16
infinity) 21:28:30’)
isfinite(interval) boolean Test for finite isfinite(interval true
interval ’4 hours’)
localtime time Time of day; see
Section 9.9.4
localtimestamp timestamp Date and time; see
Section 9.9.4
now() timestamp with Current date and
time zone time (equivalent to
current_timestamp);
see Section 9.9.4
timeofday() text Current date and
time; see Section
9.9.4

In addition to these functions, the SQL OVERLAPS operator is supported:

( start1, end1 ) OVERLAPS ( start2, end2 )


( start1, length1 ) OVERLAPS ( start2, length2 )

This expression yields true when two time periods (defined by their endpoints) overlap, false when they
do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time,
or time stamp followed by an interval.

SELECT (DATE ’2001-02-16’, DATE ’2001-12-21’) OVERLAPS


(DATE ’2001-10-30’, DATE ’2002-10-30’);
Result: true
SELECT (DATE ’2001-02-16’, INTERVAL ’100 days’) OVERLAPS
(DATE ’2001-10-30’, DATE ’2002-10-30’);
Result: false


9.9.1. EXTRACT, date_part


EXTRACT (field FROM source)

The extract function retrieves subfields such as year or hour from date/time values. source must be
a value expression of type timestamp, time, or interval. (Expressions of type date will be cast to
timestamp and can therefore be used as well.) field is an identifier or string that selects what field to
extract from the source value. The extract function returns values of type double precision. The
following are valid field names:

century

The century
SELECT EXTRACT(CENTURY FROM TIMESTAMP ’2000-12-16 12:21:13’);
Result: 20
SELECT EXTRACT(CENTURY FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 21

The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This
definition applies to all Gregorian calendar countries. There is no century number 0, you go from -1
to 1. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma,
Vatican.
PostgreSQL releases before 8.0 did not follow the conventional numbering of centuries, but just
returned the year field divided by 100.
day

The day (of the month) field (1 - 31)


SELECT EXTRACT(DAY FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 16

decade

The year field divided by 10


SELECT EXTRACT(DECADE FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 200

dow

The day of the week (0 - 6; Sunday is 0) (for timestamp values only)


SELECT EXTRACT(DOW FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 5

Note that extract’s day of the week numbering is different from that of the to_char function.
doy

The day of the year (1 - 365/366) (for timestamp values only)


SELECT EXTRACT(DOY FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 47


epoch

For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be
negative); for interval values, the total number of seconds in the interval
SELECT EXTRACT(EPOCH FROM TIMESTAMP WITH TIME ZONE ’2001-02-16 20:38:40-08’);
Result: 982384720

SELECT EXTRACT(EPOCH FROM INTERVAL ’5 days 3 hours’);


Result: 442800

Here is how you can convert an epoch value back to a time stamp:
SELECT TIMESTAMP WITH TIME ZONE ’epoch’ + 982384720 * INTERVAL ’1 second’;

hour

The hour field (0 - 23)


SELECT EXTRACT(HOUR FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 20

microseconds

The seconds field, including fractional parts, multiplied by 1 000 000. Note that this includes full
seconds.
SELECT EXTRACT(MICROSECONDS FROM TIME ’17:12:28.5’);
Result: 28500000

millennium

The millennium
SELECT EXTRACT(MILLENNIUM FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 3

Years in the 1900s are in the second millennium. The third millennium starts January 1, 2001.
PostgreSQL releases before 8.0 did not follow the conventional numbering of millennia, but just
returned the year field divided by 1000.
milliseconds

The seconds field, including fractional parts, multiplied by 1000. Note that this includes full seconds.
SELECT EXTRACT(MILLISECONDS FROM TIME ’17:12:28.5’);
Result: 28500

minute

The minutes field (0 - 59)


SELECT EXTRACT(MINUTE FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 38

month

For timestamp values, the number of the month within the year (1 - 12) ; for interval values the
number of months, modulo 12 (0 - 11)
SELECT EXTRACT(MONTH FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 2

SELECT EXTRACT(MONTH FROM INTERVAL ’2 years 3 months’);


Result: 3

SELECT EXTRACT(MONTH FROM INTERVAL ’2 years 13 months’);


Result: 1

quarter

The quarter of the year (1 - 4) that the day is in (for timestamp values only)
SELECT EXTRACT(QUARTER FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 1

second

The seconds field, including fractional parts (0 - 59, or up to 60 if leap seconds are implemented by the operating system)


SELECT EXTRACT(SECOND FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 40

SELECT EXTRACT(SECOND FROM TIME ’17:12:28.5’);


Result: 28.5

timezone

The time zone offset from UTC, measured in seconds. Positive values correspond to time zones east
of UTC, negative values to zones west of UTC.
timezone_hour

The hour component of the time zone offset


timezone_minute

The minute component of the time zone offset


week

The number of the week of the year that the day is in. By definition (ISO 8601), the first week of a
year contains January 4 of that year. (The ISO-8601 week starts on Monday.) In other words, the first
Thursday of a year is in week 1 of that year. (for timestamp values only)
SELECT EXTRACT(WEEK FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 7

year

The year field. Keep in mind there is no 0 AD, so subtracting BC years from AD years should be done
with care.
SELECT EXTRACT(YEAR FROM TIMESTAMP ’2001-02-16 20:38:40’);
Result: 2001

The extract function is primarily intended for computational processing. For formatting date/time values for display, see Section 9.8.
The date_part function is modeled on the traditional Ingres equivalent to the SQL-standard function
extract:



date_part(’field’, source)

Note that here the field parameter needs to be a string value, not a name. The valid field names for
date_part are the same as for extract.

SELECT date_part(’day’, TIMESTAMP ’2001-02-16 20:38:40’);


Result: 16

SELECT date_part(’hour’, INTERVAL ’4 hours 3 minutes’);


Result: 4

9.9.2. date_trunc
The function date_trunc is conceptually similar to the trunc function for numbers.

date_trunc(’field’, source)

source is a value expression of type timestamp or interval. (Values of type date and time are cast
automatically, to timestamp or interval respectively.) field selects to which precision to truncate
the input value. The return value is of type timestamp or interval with all fields that are less significant
than the selected one set to zero (or one, for day and month).
Valid values for field are:

microseconds
milliseconds
second
minute
hour
day
week
month
year
decade
century
millennium

Examples:

SELECT date_trunc(’hour’, TIMESTAMP ’2001-02-16 20:38:40’);


Result: 2001-02-16 20:00:00

SELECT date_trunc(’year’, TIMESTAMP ’2001-02-16 20:38:40’);


Result: 2001-01-01 00:00:00


9.9.3. AT TIME ZONE


The AT TIME ZONE construct allows conversions of time stamps to different time zones. Table 9-27
shows its variants.

Table 9-27. AT TIME ZONE Variants

Expression Return Type Description


timestamp without time zone timestamp with time zone Convert local time in given time
AT TIME ZONE zone zone to UTC
timestamp with time zone AT timestamp without time Convert UTC to local time in
TIME ZONE zone zone given time zone
time with time zone AT TIME time with time zone Convert local time across time
ZONE zone zones

In these expressions, the desired time zone zone can be specified either as a text string (e.g., ’PST’) or
as an interval (e.g., INTERVAL ’-08:00’). In the text case, the available zone names are those shown in
Table B-4. (It would be useful to support the more general names shown in Table B-6, but this is not yet
implemented.)
Examples (supposing that the local time zone is PST8PDT):

SELECT TIMESTAMP ’2001-02-16 20:38:40’ AT TIME ZONE ’MST’;


Result: 2001-02-16 19:38:40-08

SELECT TIMESTAMP WITH TIME ZONE ’2001-02-16 20:38:40-05’ AT TIME ZONE ’MST’;
Result: 2001-02-16 18:38:40

The first example takes a zone-less time stamp and interprets it as MST time (UTC-7) to produce a UTC
time stamp, which is then rotated to PST (UTC-8) for display. The second example takes a time stamp
specified in EST (UTC-5) and converts it to local time in MST (UTC-7).
The function timezone(zone, timestamp) is equivalent to the SQL-conforming construct timestamp
AT TIME ZONE zone.
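For example, continuing the PST8PDT examples above, the function form gives the same result as the
first AT TIME ZONE example:

SELECT timezone(’MST’, TIMESTAMP ’2001-02-16 20:38:40’);
Result: 2001-02-16 19:38:40-08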

9.9.4. Current Date/Time


The following functions are available to obtain the current date and/or time:

CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME ( precision )
CURRENT_TIMESTAMP ( precision )
LOCALTIME
LOCALTIMESTAMP
LOCALTIME ( precision )
LOCALTIMESTAMP ( precision )


CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone; LOCALTIME and
LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can optionally be given a


precision parameter, which causes the result to be rounded to that many fractional digits in the seconds
field. Without a precision parameter, the result is given to the full available precision.

Note: Prior to PostgreSQL 7.2, the precision parameters were unimplemented, and the result was
always given in integer seconds.

Some examples:

SELECT CURRENT_TIME;
Result: 14:39:53.662522-05

SELECT CURRENT_DATE;
Result: 2001-12-23

SELECT CURRENT_TIMESTAMP;
Result: 2001-12-23 14:39:53.662522-05

SELECT CURRENT_TIMESTAMP(2);
Result: 2001-12-23 14:39:53.66-05

SELECT LOCALTIMESTAMP;
Result: 2001-12-23 14:39:53.662522

The function now() is the traditional PostgreSQL equivalent to CURRENT_TIMESTAMP.


There is also the function timeofday(), which for historical reasons returns a text string rather than a
timestamp value:

SELECT timeofday();
Result: Sat Feb 17 19:07:32.000126 2001 EST

It is important to know that CURRENT_TIMESTAMP and related functions return the start time of the current
transaction; their values do not change during the transaction. This is considered a feature: the intent is to
allow a single transaction to have a consistent notion of the “current” time, so that multiple modifications
within the same transaction bear the same time stamp. timeofday() returns the wall-clock time and does
advance during transactions.

Note: Other database systems may advance these values more frequently.

All the date/time data types also accept the special literal value now to specify the current date and time.
Thus, the following three all return the same result:

SELECT CURRENT_TIMESTAMP;


SELECT now();
SELECT TIMESTAMP ’now’;

Tip: You do not want to use the third form when specifying a DEFAULT clause while creating a table.
The system will convert now to a timestamp as soon as the constant is parsed, so that when the
default value is needed, the time of the table creation would be used! The first two forms will not
be evaluated until the default value is used, because they are function calls. Thus they will give the
desired behavior of defaulting to the time of row insertion.
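A minimal sketch of the difference (the table and column names here are hypothetical):

CREATE TABLE log_entries (
    created timestamp with time zone DEFAULT current_timestamp -- evaluated at row insertion
);
-- By contrast, DEFAULT TIMESTAMP ’now’ would be converted to a constant when the
-- table is created, so every row would get the table-creation time.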

9.10. Geometric Functions and Operators


The geometric types point, box, lseg, line, path, polygon, and circle have a large set of native
support functions and operators, shown in Table 9-28, Table 9-29, and Table 9-30.

Table 9-28. Geometric Operators

Operator Description Example


+ Translation box ’((0,0),(1,1))’ +
point ’(2.0,0)’
- Translation box ’((0,0),(1,1))’ -
point ’(2.0,0)’
* Scaling/rotation box ’((0,0),(1,1))’ *
point ’(2.0,0)’
/ Scaling/rotation box ’((0,0),(2,2))’ /
point ’(2.0,0)’
# Point or box of intersection ’((1,-1),(-1,1))’ #
’((1,1),(-1,-1))’
# Number of points in path or # ’((1,0),(0,1),(-1,0))’
polygon
@-@ Length or circumference @-@ path ’((0,0),(1,0))’
@@ Center @@ circle ’((0,0),10)’
## Closest point to first operand on point ’(0,0)’ ## lseg
second operand ’((2,0),(0,2))’
<-> Distance between circle ’((0,0),1)’ <->
circle ’((5,0),1)’
&& Overlaps? box ’((0,0),(1,1))’ &&
box ’((0,0),(2,2))’
&< Does not extend to the right of? box ’((0,0),(1,1))’ &<
box ’((0,0),(2,2))’

&> Does not extend to the left of? box ’((0,0),(3,3))’ &>
box ’((0,0),(2,2))’
<< Is left of? circle ’((0,0),1)’ <<
circle ’((5,0),1)’
>> Is right of? circle ’((5,0),1)’ >>
circle ’((0,0),1)’
<^ Is below? circle ’((0,0),1)’ <^
circle ’((0,5),1)’
>^ Is above? circle ’((0,5),1)’ >^
circle ’((0,0),1)’
?# Intersects? lseg ’((-1,0),(1,0))’ ?#
box ’((-2,-2),(2,2))’
?- Is horizontal? ?- lseg ’((-1,0),(1,0))’
?- Are horizontally aligned? point ’(1,0)’ ?- point
’(0,0)’
?| Is vertical? ?| lseg ’((-1,0),(1,0))’
?| Are vertically aligned? point ’(0,1)’ ?| point
’(0,0)’
?-| Is perpendicular? lseg ’((0,0),(0,1))’ ?-|
lseg ’((0,0),(1,0))’
?|| Are parallel? lseg ’((-1,0),(1,0))’ ?||
lseg ’((-1,2),(1,2))’
~ Contains? circle ’((0,0),2)’ ~
point ’(1,1)’
@ Contained in or on? point ’(1,1)’ @ circle
’((0,0),2)’
~= Same as? polygon ’((0,0),(1,1))’
~= polygon
’((1,1),(0,0))’

Table 9-29. Geometric Functions

Function Return Type Description Example


area(object) double precision area area(box
’((0,0),(1,1))’)
box_intersect(box, box intersection box box_intersect(box
box) ’((0,0),(1,1))’,box
’((0.5,0.5),(2,2))’)

center(object) point center center(box


’((0,0),(1,2))’)

diameter(circle) double precision diameter of circle diameter(circle
’((0,0),2.0)’)
height(box) double precision vertical size of box height(box
’((0,0),(1,1))’)
isclosed(path) boolean a closed path? isclosed(path
’((0,0),(1,1),(2,0))’)

isopen(path) boolean an open path? isopen(path


’[(0,0),(1,1),(2,0)]’)

length(object) double precision length length(path


’((-1,0),(1,0))’)
npoints(path) integer number of points npoints(path
’[(0,0),(1,1),(2,0)]’)

npoints(polygon) integer number of points npoints(polygon


’((1,1),(0,0))’)
pclose(path) path convert path to closed pclose(path
’[(0,0),(1,1),(2,0)]’)

popen(path) path convert path to open popen(path


’((0,0),(1,1),(2,0))’)

radius(circle) double precision radius of circle radius(circle


’((0,0),2.0)’)
width(box) double precision horizontal size of box width(box
’((0,0),(1,1))’)

Table 9-30. Geometric Type Conversion Functions

Function Return Type Description Example


box(circle) box circle to box box(circle
’((0,0),2.0)’)
box(point, point) box points to box box(point ’(0,0)’,
point ’(1,1)’)
box(polygon) box polygon to box box(polygon
’((0,0),(1,1),(2,0))’)

circle(box) circle box to circle circle(box


’((0,0),(1,1))’)
circle(point, double circle point and radius to circle circle(point
precision) ’(0,0)’, 2.0)

lseg(box) lseg box diagonal to line lseg(box
segment ’((-1,0),(1,0))’)
lseg(point, point) lseg points to line segment lseg(point
’(-1,0)’, point
’(1,0)’)
path(polygon) path polygon to path path(polygon
’((0,0),(1,1),(2,0))’)

point(circle) point center of circle point(circle


’((0,0),2.0)’)
point(lseg, lseg) point intersection point(lseg
’((-1,0),(1,0))’,
lseg
’((-2,-2),(2,2))’)
point(polygon) point center of polygon point(polygon
’((0,0),(1,1),(2,0))’)

polygon(box) polygon box to 4-point polygon polygon(box


’((0,0),(1,1))’)
polygon(circle) polygon circle to 12-point polygon polygon(circle
’((0,0),2.0)’)
polygon(npts, polygon circle to npts-point polygon(12, circle
circle) polygon ’((0,0),2.0)’)
polygon(path) polygon path to polygon polygon(path
’((0,0),(1,1),(2,0))’)

It is possible to access the two component numbers of a point as though it were an array with indices
0 and 1. For example, if t.p is a point column then SELECT p[0] FROM t retrieves the X coordinate
and UPDATE t SET p[1] = ... changes the Y coordinate. In the same way, a value of type box or
lseg may be treated as an array of two point values.
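For example (a sketch using a point literal rather than a table column):

SELECT (point ’(1,2)’)[0];
Result: 1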

The area function works for the types box, circle, and path. The area function only
works on the path data type if the points in the path are non-intersecting. For example,
the path ’((0,0),(0,1),(2,1),(2,2),(1,2),(1,0),(0,0))’::PATH
won’t work, however, the following visually identical path
’((0,0),(0,1),(1,1),(1,2),(2,2),(2,1),(1,1),(1,0),(0,0))’::PATH
will work. If the concept of an intersecting versus non-intersecting path is confusing, draw both of the
above paths side by side on a piece of graph paper.

9.11. Network Address Functions and Operators


Table 9-31 shows the operators available for the cidr and inet types. The operators <<, <<=, >>, and
>>= test for subnet inclusion. They consider only the network parts of the two addresses, ignoring any


host part, and determine whether one network part is identical to or a subnet of the other.

Table 9-31. cidr and inet Operators

Operator Description Example


< is less than inet ’192.168.1.5’ < inet
’192.168.1.6’
<= is less than or equal inet ’192.168.1.5’ <=
inet ’192.168.1.5’
= equals inet ’192.168.1.5’ = inet
’192.168.1.5’
>= is greater or equal inet ’192.168.1.5’ >=
inet ’192.168.1.5’
> is greater than inet ’192.168.1.5’ > inet
’192.168.1.4’
<> is not equal inet ’192.168.1.5’ <>
inet ’192.168.1.4’
<< is contained within inet ’192.168.1.5’ <<
inet ’192.168.1/24’
<<= is contained within or equals inet ’192.168.1/24’ <<=
inet ’192.168.1/24’
>> contains inet ’192.168.1/24’ >>
inet ’192.168.1.5’
>>= contains or equals inet ’192.168.1/24’ >>=
inet ’192.168.1/24’

Table 9-32 shows the functions available for use with the cidr and inet types. The host, text, and
abbrev functions are primarily intended to offer alternative display formats. You can cast a text value to
inet using normal casting syntax: inet(expression) or colname::inet.
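For example:

SELECT ’192.168.1.5/24’::inet;
Result: 192.168.1.5/24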

Table 9-32. cidr and inet Functions

Function Return Type Description Example Result


broadcast(inet) inet broadcast address broadcast(’192.168.1.5/24’)
192.168.1.255/24
for network
host(inet) text extract IP address as host(’192.168.1.5/24’)
192.168.1.5
text
masklen(inet) integer extract netmask masklen(’192.168.1.5/24’)
24
length
set_masklen(inet, inet set netmask length set_masklen(’192.168.1.5/24’,
192.168.1.5/16
integer) for inet value 16)
netmask(inet) inet construct netmask netmask(’192.168.1.5/24’)
255.255.255.0
for network

hostmask(inet) inet construct host mask hostmask(’192.168.23.20/30’)
0.0.0.3
for network
network(inet) cidr extract network part network(’192.168.1.5/24’)
192.168.1.0/24
of address
text(inet) text extract IP address text(inet 192.168.1.5/32
and netmask length ’192.168.1.5’)
as text
abbrev(inet) text abbreviated display abbrev(cidr 10.1/16
format as text ’10.1.0.0/16’)
family(inet) integer extract family of family(’::1’) 6
address; 4 for IPv4,
6 for IPv6

Table 9-33 shows the functions available for use with the macaddr type. The function trunc(macaddr)
returns a MAC address with the last 3 bytes set to zero. This can be used to associate the remaining prefix
with a manufacturer. The directory contrib/mac in the source distribution contains some utilities to
create and maintain such an association table.

Table 9-33. macaddr Functions

Function Return Type Description Example Result


trunc(macaddr) macaddr set last 3 bytes to trunc(macaddr 12:34:56:00:00:00
zero ’12:34:56:78:90:ab’)

The macaddr type also supports the standard relational operators (>, <=, etc.) for lexicographical ordering.
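For example, comparison proceeds byte by byte from left to right:

SELECT macaddr ’12:34:56:78:90:ab’ < macaddr ’12:34:56:78:90:ac’;
Result: true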

9.12. Sequence Manipulation Functions


This section describes PostgreSQL’s functions for operating on sequence objects. Sequence objects
(also called sequence generators or just sequences) are special single-row tables created with CREATE
SEQUENCE. A sequence object is usually used to generate unique identifiers for rows of a table. The
sequence functions, listed in Table 9-34, provide simple, multiuser-safe methods for obtaining successive
sequence values from sequence objects.

Table 9-34. Sequence Functions

Function Return Type Description


nextval(text) bigint Advance sequence and return new
value
currval(text) bigint Return value most recently
obtained with nextval
setval(text, bigint) bigint Set sequence’s current value

setval(text, bigint, bigint Set sequence’s current value and
boolean) is_called flag

For largely historical reasons, the sequence to be operated on by a sequence-function call is specified
by a text-string argument. To achieve some compatibility with the handling of ordinary SQL names, the
sequence functions convert their argument to lowercase unless the string is double-quoted. Thus

nextval(’foo’) operates on sequence foo


nextval(’FOO’) operates on sequence foo
nextval(’"Foo"’) operates on sequence Foo

The sequence name can be schema-qualified if necessary:

nextval(’myschema.foo’) operates on myschema.foo


nextval(’"myschema".foo’) same as above
nextval(’foo’) searches search path for foo

Of course, the text argument can be the result of an expression, not only a simple literal, which is occasionally useful.
The available sequence functions are:

nextval

Advance the sequence object to its next value and return that value. This is done atomically: even if
multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
currval

Return the value most recently obtained by nextval for this sequence in the current session. (An
error is reported if nextval has never been called for this sequence in this session.) Notice that
because this is returning a session-local value, it gives a predictable answer whether or not other
sessions have executed nextval since the current session did.
setval

Reset the sequence object’s counter value. The two-parameter form sets the sequence’s last_value
field to the specified value and sets its is_called field to true, meaning that the next nextval
will advance the sequence before returning a value. In the three-parameter form, is_called may
be set either true or false. If it’s set to false, the next nextval will return exactly the specified
value, and sequence advancement commences with the following nextval. For example,
SELECT setval('foo', 42);           -- Next nextval will return 43
SELECT setval('foo', 42, true);     -- Same as above
SELECT setval('foo', 42, false);    -- Next nextval will return 42

The result returned by setval is just the value of its second argument.
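Putting these functions together, a session on a freshly created sequence (here named foo, chosen for illustration) might proceed as follows; the return values follow from the default-parameter and setval behavior described above:

CREATE SEQUENCE foo;
SELECT nextval('foo');    -- returns 1 (first call on a default sequence)
SELECT nextval('foo');    -- returns 2
SELECT currval('foo');    -- returns 2, without advancing the sequence
SELECT setval('foo', 42); -- returns 42
SELECT nextval('foo');    -- returns 43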

Important: To avoid blocking of concurrent transactions that obtain numbers from the same sequence,
a nextval operation is never rolled back; that is, once a value has been fetched it is considered used,
even if the transaction that did the nextval later aborts. This means that aborted transactions may leave unused “holes” in the sequence of assigned values. setval operations are never rolled back, either.

If a sequence object has been created with default parameters, nextval calls on it will return successive
values beginning with 1. Other behaviors can be obtained by using special parameters in the CREATE
SEQUENCE command; see its command reference page for more information.

9.13. Conditional Expressions


This section describes the SQL-compliant conditional expressions available in PostgreSQL.

Tip: If your needs go beyond the capabilities of these conditional expressions you might want to
consider writing a stored procedure in a more expressive programming language.

9.13.1. CASE
The SQL CASE expression is a generic conditional expression, similar to if/else statements in other languages:

CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END

CASE clauses can be used wherever an expression is valid. condition is an expression that returns a
boolean result. If the result is true then the value of the CASE expression is the result that follows the
condition. If the result is false any subsequent WHEN clauses are searched in the same manner. If no WHEN
condition is true then the value of the case expression is the result in the ELSE clause. If the ELSE
clause is omitted and no condition matches, the result is null.
An example:

SELECT * FROM test;

a
---
1
2
3

SELECT a,
       CASE WHEN a=1 THEN 'one'
            WHEN a=2 THEN 'two'
            ELSE 'other'
       END
FROM test;


a | case
---+-------
1 | one
2 | two
3 | other

The data types of all the result expressions must be convertible to a single output type. See Section
10.5 for more detail.
The following “simple” CASE expression is a specialized variant of the general form above:

CASE expression
WHEN value THEN result
[WHEN ...]
[ELSE result]
END

The expression is computed and compared to all the value specifications in the WHEN clauses until
one is found that is equal. If no match is found, the result in the ELSE clause (or a null value) is
returned. This is similar to the switch statement in C.
The example above can be written using the simple CASE syntax:

SELECT a,
       CASE a WHEN 1 THEN 'one'
              WHEN 2 THEN 'two'
              ELSE 'other'
       END
FROM test;

a | case
---+-------
1 | one
2 | two
3 | other

A CASE expression does not evaluate any subexpressions that are not needed to determine the result. For
example, this is a possible way of avoiding a division-by-zero failure:

SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;

9.13.2. COALESCE
COALESCE(value [, ...])


The COALESCE function returns the first of its arguments that is not null. Null is returned only if all
arguments are null. This is often useful to substitute a default value for null values when data is retrieved
for display, for example:

SELECT COALESCE(description, short_description, '(none)') ...

Like a CASE expression, COALESCE will not evaluate arguments that are not needed to determine the
result; that is, arguments to the right of the first non-null argument are not evaluated.

9.13.3. NULLIF
NULLIF(value1, value2)

The NULLIF function returns a null value if and only if value1 and value2 are equal. Otherwise it
returns value1. This can be used to perform the inverse operation of the COALESCE example given
above:

SELECT NULLIF(value, '(none)') ...
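NULLIF is also a compact alternative to the CASE-based guard against division by zero shown in Section 9.13.1: when the divisor is zero, the division yields null instead of raising an error. A sketch (table and column names hypothetical):

SELECT y / NULLIF(x, 0) FROM tab;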

9.14. Array Functions and Operators


Table 9-35 shows the operators available for array types.

Table 9-35. array Operators

Operator  Description                     Example                                     Result
=         equal                           ARRAY[1.1,2.1,3.1]::int[] = ARRAY[1,2,3]    t
<>        not equal                       ARRAY[1,2,3] <> ARRAY[1,2,4]                t
<         less than                       ARRAY[1,2,3] < ARRAY[1,2,4]                 t
>         greater than                    ARRAY[1,4,3] > ARRAY[1,2,4]                 t
<=        less than or equal              ARRAY[1,2,3] <= ARRAY[1,2,3]                t
>=        greater than or equal           ARRAY[1,4,3] >= ARRAY[1,4,3]                t
||        array-to-array concatenation    ARRAY[1,2,3] || ARRAY[4,5,6]                {1,2,3,4,5,6}
||        array-to-array concatenation    ARRAY[1,2,3] || ARRAY[[4,5,6],[7,8,9]]      {{1,2,3},{4,5,6},{7,8,9}}
||        element-to-array concatenation  3 || ARRAY[4,5,6]                           {3,4,5,6}
||        array-to-element concatenation  ARRAY[4,5,6] || 7                           {4,5,6,7}

See Section 8.10 for more details about array operator behavior.
Table 9-36 shows the functions available for use with array types. See Section 8.10 for more discussion
and examples of the use of these functions.

Table 9-36. array Functions

array_cat(anyarray, anyarray)
    Return type: anyarray
    Description: concatenate two arrays
    Example: array_cat(ARRAY[1,2,3], ARRAY[4,5]) yields {1,2,3,4,5}

array_append(anyarray, anyelement)
    Return type: anyarray
    Description: append an element to the end of an array
    Example: array_append(ARRAY[1,2], 3) yields {1,2,3}

array_prepend(anyelement, anyarray)
    Return type: anyarray
    Description: append an element to the beginning of an array
    Example: array_prepend(1, ARRAY[2,3]) yields {1,2,3}

array_dims(anyarray)
    Return type: text
    Description: returns a text representation of array’s dimensions
    Example: array_dims(array[[1,2,3], [4,5,6]]) yields [1:2][1:3]

array_lower(anyarray, integer)
    Return type: integer
    Description: returns lower bound of the requested array dimension
    Example: array_lower(array_prepend(0, ARRAY[1,2,3]), 1) yields 0

array_upper(anyarray, integer)
    Return type: integer
    Description: returns upper bound of the requested array dimension
    Example: array_upper(ARRAY[1,2,3,4], 1) yields 4

array_to_string(anyarray, text)
    Return type: text
    Description: concatenates array elements using provided delimiter
    Example: array_to_string(array[1, 2, 3], '~^~') yields 1~^~2~^~3

string_to_array(text, text)
    Return type: text[]
    Description: splits string into array elements using provided delimiter
    Example: string_to_array('xx~^~yy~^~zz', '~^~') yields {xx,yy,zz}

9.15. Aggregate Functions


Aggregate functions compute a single result value from a set of input values. Table 9-37 shows the built-in aggregate functions. The special syntax considerations for aggregate functions are explained in Section 4.2.7. Consult Section 2.7 for additional introductory information.

Table 9-37. Aggregate Functions

avg(expression)
    Argument type: smallint, integer, bigint, real, double precision, numeric, or interval
    Return type: numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type
    Description: the average (arithmetic mean) of all input values

bit_and(expression)
    Argument type: smallint, integer, bigint, or bit
    Return type: same as argument data type
    Description: the bitwise AND of all non-null input values, or null if none

bit_or(expression)
    Argument type: smallint, integer, bigint, or bit
    Return type: same as argument data type
    Description: the bitwise OR of all non-null input values, or null if none

bool_and(expression)
    Argument type: bool
    Return type: bool
    Description: true if all input values are true, otherwise false

bool_or(expression)
    Argument type: bool
    Return type: bool
    Description: true if at least one input value is true, otherwise false

count(*)
    Return type: bigint
    Description: number of input values

count(expression)
    Argument type: any
    Return type: bigint
    Description: number of input values for which the value of expression is not null

every(expression)
    Argument type: bool
    Return type: bool
    Description: equivalent to bool_and

max(expression)
    Argument type: any numeric, string, or date/time type
    Return type: same as argument type
    Description: maximum value of expression across all input values

min(expression)
    Argument type: any numeric, string, or date/time type
    Return type: same as argument type
    Description: minimum value of expression across all input values

stddev(expression)
    Argument type: smallint, integer, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: sample standard deviation of the input values

sum(expression)
    Argument type: smallint, integer, bigint, real, double precision, numeric, or interval
    Return type: bigint for smallint or integer arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type
    Description: sum of expression across all input values

variance(expression)
    Argument type: smallint, integer, bigint, real, double precision, or numeric
    Return type: double precision for floating-point arguments, otherwise numeric
    Description: sample variance of the input values (square of the sample standard deviation)

It should be noted that except for count, these functions return a null value when no rows are selected.
In particular, sum of no rows returns null, not zero as one might expect. The coalesce function may be
used to substitute zero for null when necessary.

Note: Boolean aggregates bool_and and bool_or correspond to standard SQL aggregates every
and any or some. As for any and some, it seems that there is an ambiguity built into the standard
syntax:

SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;

Here ANY can be considered either as introducing a subquery, or as an aggregate if the select expression returns exactly one row. Thus the standard name cannot be given to these aggregates.

Note: Users accustomed to working with other SQL database management systems may be surprised
by the performance characteristics of certain aggregate functions in PostgreSQL when the aggregate
is applied to the entire table (in other words, no WHERE clause is specified). In particular, a query like

SELECT min(col) FROM sometable;

will be executed by PostgreSQL using a sequential scan of the entire table. Other database systems
may optimize queries of this form to use an index on the column, if one is available. Similarly, the
aggregate functions max() and count() always require a sequential scan if applied to the entire table
in PostgreSQL.
PostgreSQL cannot easily implement this optimization because it also allows for user-defined aggregate queries. Since min(), max(), and count() are defined using a generic API for aggregate functions, there is no provision for special-casing the execution of these functions under certain circumstances.
Fortunately, there is a simple workaround for min() and max(). The query shown below is equivalent
to the query above, except that it can take advantage of a B-tree index if there is one present on the
column in question.

SELECT col FROM sometable ORDER BY col ASC LIMIT 1;

A similar query (obtained by substituting DESC for ASC in the query above) can be used in the place of
max().


Unfortunately, there is no similarly trivial query that can be used to improve the performance of
count() when applied to the entire table.

9.16. Subquery Expressions


This section describes the SQL-compliant subquery expressions available in PostgreSQL. All of the expression forms documented in this section return Boolean (true/false) results.

9.16.1. EXISTS
EXISTS ( subquery )

The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery is evaluated to
determine whether it returns any rows. If it returns at least one row, the result of EXISTS is “true”; if the
subquery returns no rows, the result of EXISTS is “false”.
The subquery can refer to variables from the surrounding query, which will act as constants during any
one evaluation of the subquery.
The subquery will generally only be executed far enough to determine whether at least one row is returned,
not all the way to completion. It is unwise to write a subquery that has any side effects (such as calling
sequence functions); whether the side effects occur or not may be difficult to predict.
Since the result depends only on whether any rows are returned, and not on the contents of those rows, the
output list of the subquery is normally uninteresting. A common coding convention is to write all EXISTS
tests in the form EXISTS(SELECT 1 WHERE ...). There are exceptions to this rule however, such as
subqueries that use INTERSECT.
This simple example is like an inner join on col2, but it produces at most one output row for each tab1
row, even if there are multiple matching tab2 rows:

SELECT col1 FROM tab1
    WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2);

9.16.2. IN
expression IN (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand
expression is evaluated and compared to each row of the subquery result. The result of IN is “true” if any
equal subquery row is found. The result is “false” if no equal row is found (including the special case
where the subquery returns no rows).
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one
right-hand row yields null, the result of the IN construct will be null, not false. This is in accordance with
SQL’s normal rules for Boolean combinations of null values.


As with EXISTS, it’s unwise to assume that the subquery will be evaluated completely.
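Reusing the tables of the EXISTS example above, an essentially equivalent membership test can be written with IN:

SELECT col1 FROM tab1
    WHERE col2 IN (SELECT col2 FROM tab2);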

row_constructor IN (subquery)

The left-hand side of this form of IN is a row constructor, as described in Section 4.2.11. The right-hand
side is a parenthesized subquery, which must return exactly as many columns as there are expressions
in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the
subquery result. The result of IN is “true” if any equal subquery row is found. The result is “false” if no
equal row is found (including the special case where the subquery returns no rows).
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of that row comparison is
unknown (null). If all the row results are either unequal or null, with at least one null, then the result of IN
is null.

9.16.3. NOT IN
expression NOT IN (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand
expression is evaluated and compared to each row of the subquery result. The result of NOT IN is “true”
if only unequal subquery rows are found (including the special case where the subquery returns no rows).
The result is “false” if any equal row is found.
Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one
right-hand row yields null, the result of the NOT IN construct will be null, not true. This is in accordance
with SQL’s normal rules for Boolean combinations of null values.
As with EXISTS, it’s unwise to assume that the subquery will be evaluated completely.

row_constructor NOT IN (subquery)

The left-hand side of this form of NOT IN is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions
in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the
subquery result. The result of NOT IN is “true” if only unequal subquery rows are found (including the
special case where the subquery returns no rows). The result is “false” if any equal row is found.
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of that row comparison is
unknown (null). If all the row results are either unequal or null, with at least one null, then the result of
NOT IN is null.

9.16.4. ANY/SOME
expression operator ANY (subquery)
expression operator SOME (subquery)


The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand
expression is evaluated and compared to each row of the subquery result using the given operator,
which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is
“false” if no true result is found (including the special case where the subquery returns no rows).
SOME is a synonym for ANY. IN is equivalent to = ANY.

Note that if there are no successes and at least one right-hand row yields null for the operator’s result,
the result of the ANY construct will be null, not false. This is in accordance with SQL’s normal rules for
Boolean combinations of null values.
As with EXISTS, it’s unwise to assume that the subquery will be evaluated completely.

row_constructor operator ANY (subquery)
row_constructor operator SOME (subquery)

The left-hand side of this form of ANY is a row constructor, as described in Section 4.2.11. The right-hand
side is a parenthesized subquery, which must return exactly as many columns as there are expressions
in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the
subquery result, using the given operator. Presently, only = and <> operators are allowed in row-wise
ANY constructs. The result of ANY is “true” if any equal or unequal row is found, respectively. The result
is “false” if no such row is found (including the special case where the subquery returns no rows).
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of that row comparison is
unknown (null). If there is at least one null row result, then the result of ANY cannot be false; it will be
true or null.

9.16.5. ALL
expression operator ALL (subquery)

The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand
expression is evaluated and compared to each row of the subquery result using the given operator,
which must yield a Boolean result. The result of ALL is “true” if all rows yield true (including the special
case where the subquery returns no rows). The result is “false” if any false result is found.
NOT IN is equivalent to <> ALL.

Note that if there are no failures but at least one right-hand row yields null for the operator’s result, the
result of the ALL construct will be null, not true. This is in accordance with SQL’s normal rules for Boolean
combinations of null values.
As with EXISTS, it’s unwise to assume that the subquery will be evaluated completely.

row_constructor operator ALL (subquery)

The left-hand side of this form of ALL is a row constructor, as described in Section 4.2.11. The right-hand
side is a parenthesized subquery, which must return exactly as many columns as there are expressions
in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the
subquery result, using the given operator. Presently, only = and <> operators are allowed in row-wise
ALL queries. The result of ALL is “true” if all subquery rows are equal or unequal, respectively (including the special case where the subquery returns no rows). The result is “false” if any row is found to be unequal or equal, respectively.
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of that row comparison is
unknown (null). If there is at least one null row result, then the result of ALL cannot be true; it will be false
or null.

9.16.6. Row-wise Comparison


row_constructor operator (subquery)

The left-hand side is a row constructor, as described in Section 4.2.11. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand
row. Furthermore, the subquery cannot return more than one row. (If it returns zero rows, the result is
taken to be null.) The left-hand side is evaluated and compared row-wise to the single subquery result
row. Presently, only = and <> operators are allowed in row-wise comparisons. The result is “true” if the
two rows are equal or unequal, respectively.
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of the row comparison is
unknown (null).

9.17. Row and Array Comparisons


This section describes several specialized constructs for making multiple comparisons between groups
of values. These forms are syntactically related to the subquery forms of the previous section, but do
not involve subqueries. The forms involving array subexpressions are PostgreSQL extensions; the rest are
SQL-compliant. All of the expression forms documented in this section return Boolean (true/false) results.

9.17.1. IN
expression IN (value[, ...])

The right-hand side is a parenthesized list of scalar expressions. The result is “true” if the left-hand expression’s result is equal to any of the right-hand expressions. This is a shorthand notation for

expression = value1
OR
expression = value2
OR
...


Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one
right-hand expression yields null, the result of the IN construct will be null, not false. This is in accordance
with SQL’s normal rules for Boolean combinations of null values.

9.17.2. NOT IN
expression NOT IN (value[, ...])

The right-hand side is a parenthesized list of scalar expressions. The result is “true” if the left-hand expression’s result is unequal to all of the right-hand expressions. This is a shorthand notation for

expression <> value1
AND
expression <> value2
AND
...

Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one
right-hand expression yields null, the result of the NOT IN construct will be null, not true as one might
naively expect. This is in accordance with SQL’s normal rules for Boolean combinations of null values.

Tip: x NOT IN y is equivalent to NOT (x IN y) in all cases. However, null values are much more
likely to trip up the novice when working with NOT IN than when working with IN. It’s best to express
your condition positively if possible.
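For example, the presence of a single null on the right-hand side is enough to prevent NOT IN from ever returning true:

SELECT 2 NOT IN (1, 3);      -- true
SELECT 2 NOT IN (1, 2);      -- false
SELECT 2 NOT IN (1, NULL);   -- null, not true: 2 <> NULL is unknown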

9.17.3. ANY/SOME (array)


expression operator ANY (array expression)
expression operator SOME (array expression)

The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the special case where the array has zero elements).
SOME is a synonym for ANY.
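Two simple examples:

SELECT 3 = ANY (ARRAY[1,2,3]);   -- true: 3 equals an element
SELECT 5 < ANY (ARRAY[1,2,3]);   -- false: 5 is less than no element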

9.17.4. ALL (array)


expression operator ALL (array expression)

The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ALL is “true” if all comparisons yield true (including the special case where the array has zero elements). The result is “false” if any false result is found.
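For example:

SELECT 5 > ALL (ARRAY[1,2,3]);   -- true: 5 is greater than every element
SELECT 5 > ALL (ARRAY[1,2,7]);   -- false: 5 > 7 fails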


9.17.5. Row-wise Comparison


row_constructor operator row_constructor

Each side is a row constructor, as described in Section 4.2.11. The two row values must have the same
number of fields. Each side is evaluated and they are compared row-wise. Presently, only = and <>
operators are allowed in row-wise comparisons. The result is “true” if the two rows are equal or unequal,
respectively.
As usual, null values in the rows are combined per the normal rules of SQL Boolean expressions. Two
rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal
if any corresponding members are non-null and unequal; otherwise the result of the row comparison is
unknown (null).

row_constructor IS DISTINCT FROM row_constructor

This construct is similar to a <> row comparison, but it does not yield null for null inputs. Instead, any
null value is considered unequal to (distinct from) any non-null value, and any two nulls are considered
equal (not distinct). Thus the result will always be either true or false, never null.
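For example:

SELECT ROW(1, NULL) = ROW(1, NULL);                  -- null
SELECT ROW(1, NULL) IS DISTINCT FROM ROW(1, NULL);   -- false: the rows are not distinct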

row_constructor IS NULL
row_constructor IS NOT NULL

These constructs test a row value for null or not null. A row value is considered not null if it has at least
one field that is not null.

9.18. Set Returning Functions


This section describes functions that possibly return more than one row. Currently the only functions in
this class are series generating functions, as detailed in Table 9-38.

Table 9-38. Series Generating Functions

generate_series(start, stop)
    Argument type: int or bigint
    Return type: setof int or setof bigint (same as argument type)
    Description: Generate a series of values, from start to stop with a step size of one.

generate_series(start, stop, step)
    Argument type: int or bigint
    Return type: setof int or setof bigint (same as argument type)
    Description: Generate a series of values, from start to stop with a step size of step.

When step is positive, zero rows are returned if start is greater than stop. Conversely, when step is
negative, zero rows are returned if start is less than stop. Zero rows are also returned for NULL inputs.
It is an error for step to be zero. Some examples follow:

select * from generate_series(2,4);
 generate_series
-----------------
               2
               3
               4
(3 rows)

select * from generate_series(5,1,-2);
 generate_series
-----------------
               5
               3
               1
(3 rows)

select * from generate_series(4,3);
 generate_series
-----------------
(0 rows)

select current_date + s.a as dates from generate_series(0,14,7) as s(a);
   dates
------------
 2004-02-05
 2004-02-12
 2004-02-19
(3 rows)

9.19. System Information Functions


Table 9-39 shows several functions that extract session and system information.

Table 9-39. Session Information Functions

Name                      Return Type  Description
current_database()        name         name of current database
current_schema()          name         name of current schema
current_schemas(boolean)  name[]       names of schemas in search path, optionally including implicit schemas
current_user              name         user name of current execution context
inet_client_addr()        inet         address of the remote connection
inet_client_port()        int4         port of the remote connection
inet_server_addr()        inet         address of the local connection
inet_server_port()        int4         port of the local connection
session_user              name         session user name
user                      name         equivalent to current_user
version()                 text         PostgreSQL version information

The session_user is normally the user who initiated the current database connection; but superusers
can change this setting with SET SESSION AUTHORIZATION. The current_user is the user identifier
that is applicable for permission checking. Normally, it is equal to the session user, but it changes during
the execution of functions with the attribute SECURITY DEFINER. In Unix parlance, the session user is
the “real user” and the current user is the “effective user”.

Note: current_user, session_user, and user have special syntactic status in SQL: they must be
called without trailing parentheses.

current_schema returns the name of the schema that is at the front of the search path (or a null value
if the search path is empty). This is the schema that will be used for any tables or other named objects
that are created without specifying a target schema. current_schemas(boolean) returns an array of
the names of all schemas presently in the search path. The Boolean option determines whether or not
implicitly included system schemas such as pg_catalog are included in the search path returned.
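For example, under a default configuration the results might look like this (actual output depends on the search_path setting):

SELECT current_schema();        -- e.g. public
SELECT current_schemas(false);  -- e.g. {public}
SELECT current_schemas(true);   -- e.g. {pg_catalog,public}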

Note: The search path may be altered at run time. The command is:

SET search_path TO schema [, schema, ...]

inet_client_addr returns the IP address of the current client, and inet_client_port returns the port number. inet_server_addr returns the IP address on which the server accepted the current connection, and inet_server_port returns the port number. All these functions return NULL if the current connection is via a Unix-domain socket.
version() returns a string describing the PostgreSQL server’s version.

Table 9-40 lists functions that allow the user to query object access privileges programmatically. See
Section 5.7 for more information about privileges.

Table 9-40. Access Privilege Inquiry Functions

Name                                                   Return Type  Description
has_table_privilege(user, table, privilege)            boolean      does user have privilege for table
has_table_privilege(table, privilege)                  boolean      does current user have privilege for table
has_database_privilege(user, database, privilege)      boolean      does user have privilege for database
has_database_privilege(database, privilege)            boolean      does current user have privilege for database
has_function_privilege(user, function, privilege)      boolean      does user have privilege for function
has_function_privilege(function, privilege)            boolean      does current user have privilege for function
has_language_privilege(user, language, privilege)      boolean      does user have privilege for language
has_language_privilege(language, privilege)            boolean      does current user have privilege for language
has_schema_privilege(user, schema, privilege)          boolean      does user have privilege for schema
has_schema_privilege(schema, privilege)                boolean      does current user have privilege for schema
has_tablespace_privilege(user, tablespace, privilege)  boolean      does user have privilege for tablespace
has_tablespace_privilege(tablespace, privilege)        boolean      does current user have privilege for tablespace

has_table_privilege checks whether a user can access a table in a particular way. The user can
be specified by name or by ID (pg_user.usesysid), or if the argument is omitted current_user
is assumed. The table can be specified by name or by OID. (Thus, there are actually six variants of
has_table_privilege, which can be distinguished by the number and types of their arguments.) When
specifying by name, the name can be schema-qualified if necessary. The desired access privilege type is
specified by a text string, which must evaluate to one of the values SELECT, INSERT, UPDATE, DELETE,
RULE, REFERENCES, or TRIGGER. (Case of the string is not significant, however.) An example is:

SELECT has_table_privilege('myschema.mytable', 'select');

has_database_privilege checks whether a user can access a database in a particular way. The pos-
sibilities for its arguments are analogous to has_table_privilege. The desired access privilege type
must evaluate to CREATE, TEMPORARY, or TEMP (which is equivalent to TEMPORARY).
has_function_privilege checks whether a user can access a function in a particular way. The possibilities for its arguments are analogous to has_table_privilege. When specifying a function by a text string rather than by OID, the allowed input is the same as for the regprocedure data type (see Section 8.12). The desired access privilege type must evaluate to EXECUTE. An example is:

SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute');

has_language_privilege checks whether a user can access a procedural language in a particular way.
The possibilities for its arguments are analogous to has_table_privilege. The desired access privilege
type must evaluate to USAGE.
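For example, to check whether a (hypothetical) user joeuser may use the procedural language
plpgsql:

SELECT has_language_privilege('joeuser', 'plpgsql', 'USAGE');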
has_schema_privilege checks whether a user can access a schema in a particular way. The possibili-
ties for its arguments are analogous to has_table_privilege. The desired access privilege type must
evaluate to CREATE or USAGE.
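For example, using the same hypothetical schema name as in the earlier example:

SELECT has_schema_privilege('myschema', 'USAGE');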

Chapter 9. Functions and Operators

has_tablespace_privilege checks whether a user can access a tablespace in a particular way. The
possibilities for its arguments are analogous to has_table_privilege. The desired access privilege
type must evaluate to CREATE.
To test whether a user holds a grant option on the privilege, append WITH GRANT OPTION to the privi-
lege key word; for example ’UPDATE WITH GRANT OPTION’.
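For example, to test whether a (hypothetical) user joeuser holds UPDATE on mytable with grant
option:

SELECT has_table_privilege('joeuser', 'mytable', 'UPDATE WITH GRANT OPTION');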
Table 9-41 shows functions that determine whether a certain object is visible in the current schema search
path. A table is said to be visible if its containing schema is in the search path and no table of the same
name appears earlier in the search path. This is equivalent to the statement that the table can be referenced
by name without explicit schema qualification. For example, to list the names of all visible tables:

SELECT relname FROM pg_class WHERE pg_table_is_visible(oid);

Table 9-41. Schema Visibility Inquiry Functions

Name Return Type Description


pg_table_is_visible(table_oid)             boolean   is table visible in search path
pg_type_is_visible(type_oid)               boolean   is type (or domain) visible in search path
pg_function_is_visible(function_oid)       boolean   is function visible in search path
pg_operator_is_visible(operator_oid)       boolean   is operator visible in search path
pg_opclass_is_visible(opclass_oid)         boolean   is operator class visible in search path
pg_conversion_is_visible(conversion_oid)   boolean   is conversion visible in search path

pg_table_is_visible performs the check for tables (or views, or any other kind of pg_class
entry). pg_type_is_visible, pg_function_is_visible, pg_operator_is_visible,
pg_opclass_is_visible, and pg_conversion_is_visible perform the same sort of visibility
check for types (and domains), functions, operators, operator classes and conversions, respectively. For
functions and operators, an object in the search path is visible if there is no object of the same name and
argument data type(s) earlier in the path. For operator classes, both name and associated index access
method are considered.
All these functions require object OIDs to identify the object to be checked. If you want to test an
object by name, it is convenient to use the OID alias types (regclass, regtype, regprocedure, or
regoperator), for example

SELECT pg_type_is_visible('myschema.widget'::regtype);

Note that it would not make much sense to test an unqualified name in this way — if the name can be
recognized at all, it must be visible.
Table 9-42 lists functions that extract information from the system catalogs.


Table 9-42. System Catalog Information Functions

Name Return Type Description


pg_get_viewdef(view_name) text get CREATE VIEW command for
view (deprecated)
pg_get_viewdef(view_name, text get CREATE VIEW command for
pretty_bool) view (deprecated)
pg_get_viewdef(view_oid) text get CREATE VIEW command for
view
pg_get_viewdef(view_oid, text get CREATE VIEW command for
pretty_bool) view
pg_get_ruledef(rule_oid) text get CREATE RULE command for
rule
pg_get_ruledef(rule_oid, text get CREATE RULE command for
pretty_bool) rule
pg_get_indexdef(index_oid) text get CREATE INDEX command for
index
pg_get_indexdef(index_oid, text get CREATE INDEX command for
column_no, pretty_bool) index, or definition of just one
index column when column_no is
not zero
pg_get_triggerdef(trigger_oid)                      text   get CREATE [ CONSTRAINT ] TRIGGER command for trigger
pg_get_constraintdef(constraint_oid)                text   get definition of a constraint
pg_get_constraintdef(constraint_oid, pretty_bool)   text   get definition of a constraint
pg_get_expr(expr_text, text decompile internal form of an
relation_oid) expression, assuming that any Vars
in it refer to the relation indicated
by the second parameter
pg_get_expr(expr_text, text decompile internal form of an
relation_oid, pretty_bool) expression, assuming that any Vars
in it refer to the relation indicated
by the second parameter
pg_get_userbyid(userid) name get user name with given ID
pg_get_serial_sequence(table_name, column_name)     text        get name of the sequence that a serial or bigserial column uses
pg_tablespace_databases(tablespace_oid)             setof oid   get set of database OIDs that have objects in the tablespace

pg_get_viewdef, pg_get_ruledef, pg_get_indexdef, pg_get_triggerdef, and
pg_get_constraintdef respectively reconstruct the creating command for a view, rule, index, trigger,
or constraint. (Note that this is a decompiled reconstruction, not the original text of the command.)


pg_get_expr decompiles the internal form of an individual expression, such as the default value for a
column. It may be useful when examining the contents of system catalogs. Most of these functions come
in two variants, one of which can optionally “pretty-print” the result. The pretty-printed format is more
readable, but the default format is more likely to be interpreted the same way by future versions of
PostgreSQL; avoid using pretty-printed output for dump purposes. Passing false for the pretty-print
parameter yields the same result as the variant that does not have the parameter at all.
pg_get_userbyid extracts a user’s name given a user ID number. pg_get_serial_sequence fetches
the name of the sequence associated with a serial or bigserial column. The name is suitably formatted
for passing to the sequence functions (see Section 9.12). NULL is returned if the column does not have a
sequence attached.
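For example, this can be combined with the sequence function setval to resynchronize a sequence
after a bulk load (mytable and its serial column id are hypothetical names):

SELECT setval(pg_get_serial_sequence('mytable', 'id'), max(id)) FROM mytable;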
pg_tablespace_databases allows examination of a tablespace's usage. It will return a set of OIDs of
databases that have objects stored in the tablespace. If this function returns any row, the tablespace is not
empty and cannot be dropped. To display the specific objects populating the tablespace, you will need to
connect to the databases identified by pg_tablespace_databases and query their pg_class catalogs.
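For example, to list the names of those databases directly (mytablespace is a hypothetical
tablespace name):

SELECT datname FROM pg_database
    WHERE oid IN (SELECT pg_tablespace_databases(oid)
                  FROM pg_tablespace
                  WHERE spcname = 'mytablespace');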
The functions shown in Table 9-43 extract comments previously stored with the COMMENT command. A
null value is returned if no comment could be found matching the specified parameters.

Table 9-43. Comment Information Functions

Name Return Type Description


obj_description(object_oid, text get comment for a database object
catalog_name)
obj_description(object_oid) text get comment for a database object
(deprecated)
col_description(table_oid, text get comment for a table column
column_number)

The two-parameter form of obj_description returns the comment for a database object specified
by its OID and the name of the containing system catalog. For example,
obj_description(123456, 'pg_class') would retrieve the comment for a table with OID 123456.
The one-parameter form of obj_description requires only the object OID. It is now deprecated
since there is no guarantee that OIDs are unique across different system catalogs; therefore, the
wrong comment could be returned.
col_description returns the comment for a table column, which is specified by the OID of its table
and its column number. obj_description cannot be used for table columns since columns do not have
OIDs of their own.
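For example, to fetch the comment on the first column of a (hypothetical) table mytable, the table
OID can be obtained via the regclass alias type described above:

SELECT col_description('mytable'::regclass, 1);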

9.20. System Administration Functions


Table 9-44 shows the functions available to query and alter run-time configuration parameters.

Table 9-44. Configuration Settings Functions

Name Return Type Description

current_setting(setting_name)                    text   current value of setting
set_config(setting_name, new_value, is_local)    text   set parameter and return new value

The function current_setting yields the current value of the setting setting_name. It corresponds
to the SQL command SHOW. An example:

SELECT current_setting('datestyle');

current_setting
-----------------
ISO, MDY
(1 row)

set_config sets the parameter setting_name to new_value. If is_local is true, the new value
will only apply to the current transaction. If you want the new value to apply for the current session, use
false instead. The function corresponds to the SQL command SET. An example:

SELECT set_config('log_statement_stats', 'off', false);

set_config
------------
off
(1 row)

The function shown in Table 9-45 sends control signals to other server processes. Use of this function is
restricted to superusers.

Table 9-45. Backend Signalling Functions

Name Return Type Description


pg_cancel_backend(pid) int Cancel a backend’s current query

This function returns 1 if successful, 0 if not successful. The process ID (pid) of an active backend can be
found from the procpid column in the pg_stat_activity view, or by listing the postgres processes
on the server with ps.
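For example, a superuser could cancel the current queries of all backends running under a
particular (hypothetical) user name:

SELECT pg_cancel_backend(procpid)
    FROM pg_stat_activity
    WHERE usename = 'joeuser';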
The functions shown in Table 9-46 assist in making on-line backups. Use of these functions is restricted
to superusers.

Table 9-46. Backup Control Functions

Name Return Type Description

pg_start_backup(label_text) text Set up for performing on-line
backup
pg_stop_backup() text Finish performing on-line backup

pg_start_backup accepts a single parameter which is an arbitrary user-defined label for the backup.
(Typically this would be the name under which the backup dump file will be stored.) The function writes
a backup label file into the database cluster’s data directory, and then returns the backup’s starting WAL
offset as text. (The user need not pay any attention to this result value, but it is provided in case it is of
use.)
pg_stop_backup removes the label file created by pg_start_backup, and instead creates a backup
history file in the WAL archive area. The history file includes the label given to pg_start_backup, the
starting and ending WAL offsets for the backup, and the starting and ending times of the backup. The
return value is the backup’s ending WAL offset (which again may be of little interest).
For details about proper usage of these functions, see Section 22.3.
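A minimal on-line backup session might look like this sketch (the label is arbitrary, and the
file-system copy is performed outside the database):

SELECT pg_start_backup('nightly_backup');
-- copy the cluster's data directory here, using an external tool such as tar
SELECT pg_stop_backup();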

Chapter 10. Type Conversion
SQL statements can, intentionally or not, require mixing of different data types in the same expression.
PostgreSQL has extensive facilities for evaluating mixed-type expressions.
In many cases a user will not need to understand the details of the type conversion mechanism. However,
the implicit conversions done by PostgreSQL can affect the results of a query. When necessary, these
results can be tailored by using explicit type conversion.
This chapter introduces the PostgreSQL type conversion mechanisms and conventions. Refer to the rele-
vant sections in Chapter 8 and Chapter 9 for more information on specific data types and allowed functions
and operators.

10.1. Overview
SQL is a strongly typed language. That is, every data item has an associated data type which determines
its behavior and allowed usage. PostgreSQL has an extensible type system that is much more general
and flexible than other SQL implementations. Hence, most type conversion behavior in PostgreSQL is
governed by general rules rather than by ad hoc heuristics. This allows mixed-type expressions to be
meaningful even with user-defined types.
The PostgreSQL scanner/parser divides lexical elements into only five fundamental categories: integers,
non-integer numbers, strings, identifiers, and key words. Constants of most non-numeric types are first
classified as strings. The SQL language definition allows specifying type names with strings, and this
mechanism can be used in PostgreSQL to start the parser down the correct path. For example, the query

SELECT text 'Origin' AS "label", point '(0,0)' AS "value";

label | value
--------+-------
Origin | (0,0)
(1 row)

has two literal constants, of type text and point. If a type is not specified for a string literal, then the
placeholder type unknown is assigned initially, to be resolved in later stages as described below.
There are four fundamental SQL constructs requiring distinct type conversion rules in the PostgreSQL
parser:

Function calls
Much of the PostgreSQL type system is built around a rich set of functions. Functions can have one
or more arguments. Since PostgreSQL permits function overloading, the function name alone does
not uniquely identify the function to be called; the parser must select the right function based on the
data types of the supplied arguments.
Operators
PostgreSQL allows expressions with prefix and postfix unary (one-argument) operators, as well as
binary (two-argument) operators. Like functions, operators can be overloaded, and so the same prob-
lem of selecting the right operator exists.


Value Storage
SQL INSERT and UPDATE statements place the results of expressions into a table. The expressions
in the statement must be matched up with, and perhaps converted to, the types of the target columns.
UNION, CASE, and ARRAY constructs

Since all query results from a unionized SELECT statement must appear in a single set of columns,
the types of the results of each SELECT clause must be matched up and converted to a uniform set.
Similarly, the result expressions of a CASE construct must be converted to a common type so that the
CASE expression as a whole has a known output type. The same holds for ARRAY constructs.

The system catalogs store information about which conversions, called casts, between data types are valid,
and how to perform those conversions. Additional casts can be added by the user with the CREATE CAST
command. (This is usually done in conjunction with defining new data types. The set of casts between the
built-in types has been carefully crafted and is best not altered.)
An additional heuristic is provided in the parser to allow better guesses at proper behavior for SQL stan-
dard types. There are several basic type categories defined: boolean, numeric, string, bitstring,
datetime, timespan, geometric, network, and user-defined. Each category, with the exception of
user-defined, has one or more preferred types which are preferentially selected when there is ambiguity.
In the user-defined category, each type is its own preferred type. Ambiguous expressions (those with mul-
tiple candidate parsing solutions) can therefore often be resolved when there are multiple possible built-in
types, but they will raise an error when there are multiple choices for user-defined types.
All type conversion rules are designed with several principles in mind:

• Implicit conversions should never have surprising or unpredictable outcomes.


• User-defined types, of which the parser has no a priori knowledge, should be “higher” in the type
hierarchy. In mixed-type expressions, native types shall always be converted to a user-defined type (of
course, only if conversion is necessary).
• User-defined types are not related. Currently, PostgreSQL does not have information available to it on
relationships between types, other than hardcoded heuristics for built-in types and implicit relationships
based on available functions and casts.
• There should be no extra overhead from the parser or executor if a query does not need implicit type
conversion. That is, if a query is well formulated and the types already match up, then the query should
proceed without spending extra time in the parser and without introducing unnecessary implicit con-
version calls into the query.
Additionally, if a query usually requires an implicit conversion for a function, and the user then
defines a new function with the correct argument types, the parser should use this new function and
will no longer do the implicit conversion using the old function.


10.2. Operators
The specific operator to be used in an operator invocation is determined by following the procedure below.
Note that this procedure is indirectly affected by the precedence of the involved operators. See Section
4.1.6 for more information.

Operator Type Resolution

1. Select the operators to be considered from the pg_operator system catalog. If an unqualified opera-
tor name was used (the usual case), the operators considered are those of the right name and argument
count that are visible in the current search path (see Section 5.8.3). If a qualified operator name was
given, only operators in the specified schema are considered.

a. If the search path finds multiple operators of identical argument types, only the one ap-
pearing earliest in the path is considered. But operators of different argument types are
considered on an equal footing regardless of search path position.
2. Check for an operator accepting exactly the input argument types. If one exists (there can be only one
exact match in the set of operators considered), use it.

a. If one argument of a binary operator invocation is of the unknown type, then assume it is the
same type as the other argument for this check. Other cases involving unknown will never
find a match at this step.
3. Look for the best match.

a. Discard candidate operators for which the input types do not match and cannot be converted
(using an implicit conversion) to match. unknown literals are assumed to be convertible to
anything for this purpose. If only one candidate remains, use it; else continue to the next
step.
b. Run through all candidates and keep those with the most exact matches on input types.
(Domains are considered the same as their base type for this purpose.) Keep all candidates
if none have any exact matches. If only one candidate remains, use it; else continue to the
next step.
c. Run through all candidates and keep those that accept preferred types (of the input data
type’s type category) at the most positions where type conversion will be required. Keep
all candidates if none accept preferred types. If only one candidate remains, use it; else
continue to the next step.
d. If any input arguments are unknown, check the type categories accepted at those argument
positions by the remaining candidates. At each position, select the string category if any
candidate accepts that category. (This bias towards string is appropriate since an unknown-
type literal does look like a string.) Otherwise, if all the remaining candidates accept the
same type category, select that category; otherwise fail because the correct choice cannot
be deduced without more clues. Now discard candidates that do not accept the selected
type category. Furthermore, if any candidate accepts a preferred type at a given argument
position, discard candidates that accept non-preferred types for that argument.
e. If only one candidate remains, use it. If no candidate or more than one candidate remains,
then fail.


Some examples follow.

Example 10-1. Exponentiation Operator Type Resolution

There is only one exponentiation operator defined in the catalog, and it takes arguments of type double
precision. The scanner assigns an initial type of integer to both arguments of this query expression:
SELECT 2 ^ 3 AS "exp";

exp
-----
8
(1 row)
So the parser does a type conversion on both operands and the query is equivalent to
SELECT CAST(2 AS double precision) ^ CAST(3 AS double precision) AS "exp";

Example 10-2. String Concatenation Operator Type Resolution

A string-like syntax is used for working with string types as well as for working with complex extension
types. Strings with unspecified type are matched with likely operator candidates.
An example with one unspecified argument:
SELECT text 'abc' || 'def' AS "text and unknown";

text and unknown
------------------
abcdef
(1 row)

In this case the parser looks to see if there is an operator taking text for both arguments. Since there is,
it assumes that the second argument should be interpreted as of type text.
Here is a concatenation on unspecified types:
SELECT 'abc' || 'def' AS "unspecified";

unspecified
-------------
abcdef
(1 row)

In this case there is no initial hint for which type to use, since no types are specified in the query. So, the
parser looks for all candidate operators and finds that there are candidates accepting both string-category
and bit-string-category inputs. Since string category is preferred when available, that category is selected,
and then the preferred type for strings, text, is used as the specific type to resolve the unknown literals
to.


Example 10-3. Absolute-Value and Negation Operator Type Resolution

The PostgreSQL operator catalog has several entries for the prefix operator @, all of which implement
absolute-value operations for various numeric data types. One of these entries is for type float8, which
is the preferred type in the numeric category. Therefore, PostgreSQL will use that entry when faced with
a non-numeric input:
SELECT @ '-4.5' AS "abs";
abs
-----
4.5
(1 row)
Here the system has performed an implicit conversion from text to float8 before applying the chosen
operator. We can verify that float8 and not some other type was used:
SELECT @ '-4.5e500' AS "abs";

ERROR: "-4.5e500" is out of range for type double precision

On the other hand, the prefix operator ~ (bitwise negation) is defined only for integer data types, not for
float8. So, if we try a similar case with ~, we get:
SELECT ~ '20' AS "negation";

ERROR: operator is not unique: ~ "unknown"
HINT: Could not choose a best candidate operator. You may need to add explicit
type casts.
This happens because the system can’t decide which of the several possible ~ operators should be pre-
ferred. We can help it out with an explicit cast:
SELECT ~ CAST('20' AS int8) AS "negation";

negation
----------
-21
(1 row)

10.3. Functions
The specific function to be used in a function invocation is determined according to the following steps.

Function Type Resolution

1. Select the functions to be considered from the pg_proc system catalog. If an unqualified function
name was used, the functions considered are those of the right name and argument count that are
visible in the current search path (see Section 5.8.3). If a qualified function name was given, only
functions in the specified schema are considered.

a. If the search path finds multiple functions of identical argument types, only the one ap-
pearing earliest in the path is considered. But functions of different argument types are
considered on an equal footing regardless of search path position.


2. Check for a function accepting exactly the input argument types. If one exists (there can be only one
exact match in the set of functions considered), use it. (Cases involving unknown will never find a
match at this step.)
3. If no exact match is found, see whether the function call appears to be a trivial type conversion request.
This happens if the function call has just one argument and the function name is the same as the
(internal) name of some data type. Furthermore, the function argument must be either an unknown-
type literal or a type that is binary-compatible with the named data type. When these conditions are
met, the function argument is converted to the named data type without any actual function call.
4. Look for the best match.

a. Discard candidate functions for which the input types do not match and cannot be converted
(using an implicit conversion) to match. unknown literals are assumed to be convertible to
anything for this purpose. If only one candidate remains, use it; else continue to the next
step.
b. Run through all candidates and keep those with the most exact matches on input types.
(Domains are considered the same as their base type for this purpose.) Keep all candidates
if none have any exact matches. If only one candidate remains, use it; else continue to the
next step.
c. Run through all candidates and keep those that accept preferred types (of the input data
type’s type category) at the most positions where type conversion will be required. Keep
all candidates if none accept preferred types. If only one candidate remains, use it; else
continue to the next step.
d. If any input arguments are unknown, check the type categories accepted at those argument
positions by the remaining candidates. At each position, select the string category if any
candidate accepts that category. (This bias towards string is appropriate since an unknown-
type literal does look like a string.) Otherwise, if all the remaining candidates accept the
same type category, select that category; otherwise fail because the correct choice cannot
be deduced without more clues. Now discard candidates that do not accept the selected
type category. Furthermore, if any candidate accepts a preferred type at a given argument
position, discard candidates that accept non-preferred types for that argument.
e. If only one candidate remains, use it. If no candidate or more than one candidate remains,
then fail.

Note that the “best match” rules are identical for operator and function type resolution. Some examples
follow.

Example 10-4. Rounding Function Argument Type Resolution

There is only one round function with two arguments. (The first is numeric, the second is integer.)
So the following query automatically converts the first argument of type integer to numeric:
SELECT round(4, 4);

round
--------
4.0000
(1 row)


That query is actually transformed by the parser to
SELECT round(CAST (4 AS numeric), 4);

Since numeric constants with decimal points are initially assigned the type numeric, the following query
will require no type conversion and may therefore be slightly more efficient:
SELECT round(4.0, 4);

Example 10-5. Substring Function Type Resolution

There are several substr functions, one of which takes types text and integer. If called with a string
constant of unspecified type, the system chooses the candidate function that accepts an argument of the
preferred category string (namely of type text).
SELECT substr('1234', 3);

substr
--------
34
(1 row)

If the string is declared to be of type varchar, as might be the case if it comes from a table, then the
parser will try to convert it to become text:
SELECT substr(varchar '1234', 3);

substr
--------
34
(1 row)
This is transformed by the parser to effectively become
SELECT substr(CAST (varchar '1234' AS text), 3);

Note: The parser learns from the pg_cast catalog that text and varchar are binary-compatible,
meaning that one can be passed to a function that accepts the other without doing any physical
conversion. Therefore, no explicit type conversion call is really inserted in this case.

And, if the function is called with an argument of type integer, the parser will try to convert that to
text:
SELECT substr(1234, 3);

substr
--------
34
(1 row)
This actually executes as
SELECT substr(CAST (1234 AS text), 3);


This automatic transformation can succeed because there is an implicitly invocable cast from integer to
text.

10.4. Value Storage


Values to be inserted into a table are converted to the destination column’s data type according to the
following steps.

Value Storage Type Conversion

1. Check for an exact match with the target.


2. Otherwise, try to convert the expression to the target type. This will succeed if there is a registered
cast between the two types. If the expression is an unknown-type literal, the contents of the literal
string will be fed to the input conversion routine for the target type.
3. Check to see if there is a sizing cast for the target type. A sizing cast is a cast from that type to
itself. If one is found in the pg_cast catalog, apply it to the expression before storing into the
destination column. The implementation function for such a cast always takes an extra parameter
of type integer, which receives the destination column’s declared length (actually, its atttypmod
value; the interpretation of atttypmod varies for different datatypes). The cast function is responsible
for applying any length-dependent semantics such as size checking or truncation.

Example 10-6. character Storage Type Conversion

For a target column declared as character(20) the following statement ensures that the stored value is
sized correctly:
CREATE TABLE vv (v character(20));
INSERT INTO vv SELECT 'abc' || 'def';
SELECT v, length(v) FROM vv;

v | length
----------------------+--------
abcdef | 20
(1 row)

What has really happened here is that the two unknown literals are resolved to text by default, allowing
the || operator to be resolved as text concatenation. Then the text result of the operator is converted to
bpchar (“blank-padded char”, the internal name of the character data type) to match the target column
type. (Since the types text and bpchar are binary-compatible, this conversion does not insert any real
function call.) Finally, the sizing function bpchar(bpchar, integer) is found in the system catalog
and applied to the operator’s result and the stored column length. This type-specific function performs the
required length check and addition of padding spaces.


10.5. UNION, CASE, and ARRAY Constructs


SQL UNION constructs must match up possibly dissimilar types to become a single result set. The resolu-
tion algorithm is applied separately to each output column of a union query. The INTERSECT and EXCEPT
constructs resolve dissimilar types in the same way as UNION. The CASE and ARRAY constructs use the
identical algorithm to match up their component expressions and select a result data type.

UNION, CASE, and ARRAY Type Resolution

1. If all inputs are of type unknown, resolve as type text (the preferred type of the string category).
Otherwise, ignore the unknown inputs while choosing the result type.
2. If the non-unknown inputs are not all of the same type category, fail.
3. Choose the first non-unknown input type which is a preferred type in that category or allows all the
non-unknown inputs to be implicitly converted to it.
4. Convert all inputs to the selected type.

Some examples follow.

Example 10-7. Type Resolution with Underspecified Types in a Union

SELECT text 'a' AS "text" UNION SELECT 'b';

text
------
a
b
(2 rows)
Here, the unknown-type literal 'b' will be resolved as type text.

Example 10-8. Type Resolution in a Simple Union

SELECT 1.2 AS "numeric" UNION SELECT 1;

numeric
---------
1
1.2
(2 rows)
The literal 1.2 is of type numeric, and the integer value 1 can be cast implicitly to numeric, so that
type is used.


Example 10-9. Type Resolution in a Transposed Union

SELECT 1 AS "real" UNION SELECT CAST('2.2' AS REAL);

real
------
1
2.2
(2 rows)
Here, since type real cannot be implicitly cast to integer, but integer can be implicitly cast to real,
the union result type is resolved as real.

Chapter 11. Indexes
Indexes are a common way to enhance database performance. An index allows the database server to find
and retrieve specific rows much faster than it could do without an index. But indexes also add overhead to
the database system as a whole, so they should be used sensibly.

11.1. Introduction
Suppose we have a table similar to this:

CREATE TABLE test1 (
    id integer,
    content varchar
);

and the application requires a lot of queries of the form

SELECT content FROM test1 WHERE id = constant;

With no advance preparation, the system would have to scan the entire test1 table, row by row, to find all
matching entries. If there are a lot of rows in test1 and only a few rows (perhaps only zero or one) that
would be returned by such a query, then this is clearly an inefficient method. But if the system has been
instructed to maintain an index on the id column, then it can use a more efficient method for locating
matching rows. For instance, it might only have to walk a few levels deep into a search tree.
A similar approach is used in most books of non-fiction: terms and concepts that are frequently looked
up by readers are collected in an alphabetic index at the end of the book. The interested reader can scan
the index relatively quickly and flip to the appropriate page(s), rather than having to read the entire book
to find the material of interest. Just as it is the task of the author to anticipate the items that the readers
are most likely to look up, it is the task of the database programmer to foresee which indexes would be of
advantage.
The following command would be used to create the index on the id column, as discussed:

CREATE INDEX test1_id_index ON test1 (id);

The name test1_id_index can be chosen freely, but you should pick something that enables you to
remember later what the index was for.
To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at
any time.
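
For example, the index created above could be removed with:

```sql
DROP INDEX test1_id_index;
```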
Once an index is created, no further intervention is required: the system will update the index when the
table is modified, and it will use the index in queries when it thinks this would be more efficient than a
sequential table scan. But you may have to run the ANALYZE command regularly to update statistics to
allow the query planner to make educated decisions. See Chapter 13 for information about how to find out
whether an index is used and when and why the planner may choose not to use an index.
Indexes can also benefit UPDATE and DELETE commands with search conditions. Indexes can moreover
be used in join queries. Thus, an index defined on a column that is part of a join condition can significantly
speed up queries with joins.


When an index is created, the system has to keep it synchronized with the table. This adds overhead to
data manipulation operations. Therefore indexes that are non-essential or do not get used at all should be
removed. Note that a query or data manipulation command can use at most one index per table.

11.2. Index Types


PostgreSQL provides several index types: B-tree, R-tree, Hash, and GiST. Each index type uses a different
algorithm that is best suited to different types of queries. By default, the CREATE INDEX command will
create a B-tree index, which fits the most common situations.
B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular,
the PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved
in a comparison using one of these operators:

<
<=
=
>=
>

Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be imple-
mented with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)
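
For instance, a BETWEEN condition on the indexed column is equivalent to a pair of range comparisons, so either of the following forms can use the test1_id_index index from Section 11.1:

```sql
SELECT content FROM test1 WHERE id BETWEEN 100 AND 200;
-- equivalent to, and indexable like:
SELECT content FROM test1 WHERE id >= 100 AND id <= 200;
```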
The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE,
ILIKE, ~, and ~*, if the pattern is anchored to the beginning of the string, e.g., col LIKE 'foo%' or
col ~ '^foo', but not col LIKE '%bar'. However, if your server does not use the C locale you will
need to create the index with a special operator class to support indexing of pattern-matching queries. See
Section 11.6 below.
R-tree indexes are suited for queries on spatial data. To create an R-tree index, use a command of the form

CREATE INDEX name ON table USING RTREE (column);

The PostgreSQL query planner will consider using an R-tree index whenever an indexed column is in-
volved in a comparison using one of these operators:

<<
&<
&>
>>
@
~=
&&

(See Section 9.10 for the meaning of these operators.)


Hash indexes can only handle simple equality comparisons. The query planner will consider using a
hash index whenever an indexed column is involved in a comparison using the = operator. The following
command is used to create a hash index:

CREATE INDEX name ON table USING HASH (column);


Note: Testing has shown PostgreSQL’s hash indexes to perform no better than B-tree indexes, and
the index size and build time for hash indexes is much worse. For these reasons, hash index use is
presently discouraged.

GiST indexes are not a single kind of index, but rather an infrastructure within which many different in-
dexing strategies can be implemented. Accordingly, the particular operators with which a GiST index can
be used vary depending on the indexing strategy (the operator class). For more information see Chapter
48.
The B-tree index method is an implementation of Lehman-Yao high-concurrency B-trees. The R-tree
index method implements standard R-trees using Guttman’s quadratic split algorithm. The hash index
method is an implementation of Litwin’s linear hashing. We mention the algorithms used solely to indicate
that all of these index methods are fully dynamic and do not have to be optimized periodically (as is the
case with, for example, static hash methods).

11.3. Multicolumn Indexes


An index can be defined on more than one column. For example, if you have a table of this form:

CREATE TABLE test2 (
    major int,
    minor int,
    name varchar
);

(say, you keep your /dev directory in a database...) and you frequently make queries like

SELECT name FROM test2 WHERE major = constant AND minor = constant;

then it may be appropriate to define an index on the columns major and minor together, e.g.,

CREATE INDEX test2_mm_idx ON test2 (major, minor);

Currently, only the B-tree and GiST implementations support multicolumn indexes. Up to 32 columns may
be specified. (This limit can be altered when building PostgreSQL; see the file pg_config_manual.h.)
The query planner can use a multicolumn index for queries that involve the leftmost column in the index
definition plus any number of columns listed to the right of it, without a gap. For example, an index on
(a, b, c) can be used in queries involving all of a, b, and c, or in queries involving both a and b,
or in queries involving only a, but not in other combinations. (In a query involving a and c the planner
could choose to use the index for a, while treating c like an ordinary unindexed column.) Of course, each
column must be used with operators appropriate to the index type; clauses that involve other operators
will not be considered.
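
To illustrate the leftmost-prefix rule, consider a hypothetical table test3 with a three-column index:

```sql
CREATE INDEX test3_abc_idx ON test3 (a, b, c);    -- test3 is hypothetical

SELECT * FROM test3 WHERE a = 5 AND b = 42;  -- can use the index on (a, b)
SELECT * FROM test3 WHERE a = 5;             -- can use the index on a alone
SELECT * FROM test3 WHERE b = 42;            -- cannot use the index: no constraint on a
```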
Multicolumn indexes can only be used if the clauses involving the indexed columns are joined with AND.
For instance,


SELECT name FROM test2 WHERE major = constant OR minor = constant;

cannot make use of the index test2_mm_idx defined above to look up both columns. (It can be used to
look up only the major column, however.)
Multicolumn indexes should be used sparingly. Most of the time, an index on a single column is sufficient
and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the
usage of the table is extremely stylized.

11.4. Unique Indexes


Indexes may also be used to enforce uniqueness of a column’s value, or the uniqueness of the combined
values of more than one column.

CREATE UNIQUE INDEX name ON table (column [, ...]);

Currently, only B-tree indexes can be declared unique.


When an index is declared unique, multiple table rows with equal indexed values will not be allowed.
Null values are not considered equal. A multicolumn unique index will only reject cases where all of the
indexed columns are equal in two rows.
PostgreSQL automatically creates a unique index when a unique constraint or a primary key is defined for
a table. The index covers the columns that make up the primary key or unique columns (a multicolumn
index, if appropriate), and is the mechanism that enforces the constraint.

Note: The preferred way to add a unique constraint to a table is ALTER TABLE ... ADD CONSTRAINT.
The use of indexes to enforce unique constraints could be considered an implementation detail that
should not be accessed directly. One should, however, be aware that there’s no need to manually
create indexes on unique columns; doing so would just duplicate the automatically-created index.
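
For instance, with the test2 table from Section 11.3, the preferred way to enforce uniqueness of (major, minor) is the constraint form, which creates the unique index implicitly (the constraint and index names here are illustrative):

```sql
-- preferred: declare the constraint; the unique index is created automatically
ALTER TABLE test2 ADD CONSTRAINT test2_mm_uniq UNIQUE (major, minor);

-- equivalent low-level form, not recommended
CREATE UNIQUE INDEX test2_mm_uniq_idx ON test2 (major, minor);
```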

11.5. Indexes on Expressions


An index column need not be just a column of the underlying table, but can be a function or scalar
expression computed from one or more columns of the table. This feature is useful to obtain fast access
to tables based on the results of computations.
For example, a common way to do case-insensitive comparisons is to use the lower function:

SELECT * FROM test1 WHERE lower(col1) = 'value';

This query can use an index, if one has been defined on the result of the lower(col1) operation:

CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1));


If we were to declare this index UNIQUE, it would prevent creation of rows whose col1 values differ only
in case, as well as rows whose col1 values are actually identical. Thus, indexes on expressions can be
used to enforce constraints that are not definable as simple unique constraints.
As another example, if one often does queries like this:

SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith';

then it might be worth creating an index like this:

CREATE INDEX people_names ON people ((first_name || ' ' || last_name));

The syntax of the CREATE INDEX command normally requires writing parentheses around index expres-
sions, as shown in the second example. The parentheses may be omitted when the expression is just a
function call, as in the first example.
Index expressions are relatively expensive to maintain, since the derived expression(s) must be computed
for each row upon insertion or whenever it is updated. Therefore they should be used only when queries
that can use the index are very frequent.

11.6. Operator Classes


An index definition may specify an operator class for each column of an index.

CREATE INDEX name ON table (column opclass [, ...]);

The operator class identifies the operators to be used by the index for that column. For example, a B-
tree index on the type int4 would use the int4_ops class; this operator class includes comparison
functions for values of type int4. In practice the default operator class for the column’s data type is
usually sufficient. The main point of having operator classes is that for some data types, there could be
more than one meaningful index behavior. For example, we might want to sort a complex-number data
type either by absolute value or by real part. We could do this by defining two operator classes for the data
type and then selecting the proper class when making an index.
There are also some built-in operator classes besides the default ones:

• The operator classes text_pattern_ops, varchar_pattern_ops, bpchar_pattern_ops, and
name_pattern_ops support B-tree indexes on the types text, varchar, char, and name, respectively.
The difference from the ordinary operator classes is that the values are compared strictly character
by character rather than according to the locale-specific collation rules. This makes these operator
classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular
expressions) if the server does not use the standard “C” locale. As an example, you might index a
varchar column like this:
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);

If you do use the C locale, you may instead create an index with the default operator class, and it
will still be useful for pattern-matching queries. Also note that you should create an index with the
default operator class if you want queries involving ordinary comparisons to use an index. Such queries


cannot use the xxx_pattern_ops operator classes. It is allowed to create multiple indexes on the
same column with different operator classes.

The following query shows all defined operator classes:

SELECT am.amname AS index_method,
       opc.opcname AS opclass_name
FROM pg_am am, pg_opclass opc
WHERE opc.opcamid = am.oid
ORDER BY index_method, opclass_name;

It can be extended to show all the operators included in each class:

SELECT am.amname AS index_method,
       opc.opcname AS opclass_name,
       opr.oprname AS opclass_operator
FROM pg_am am, pg_opclass opc, pg_amop amop, pg_operator opr
WHERE opc.opcamid = am.oid AND
      amop.amopclaid = opc.oid AND
      amop.amopopr = opr.oid
ORDER BY index_method, opclass_name, opclass_operator;

11.7. Partial Indexes


A partial index is an index built over a subset of a table; the subset is defined by a conditional expression
(called the predicate of the partial index). The index contains entries for only those table rows that satisfy
the predicate.
A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a
common value (one that accounts for more than a few percent of all the table rows) will not use the index
anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index,
which will speed up queries that do use the index. It will also speed up many table update operations
because the index does not need to be updated in all cases. Example 11-1 shows a possible application of
this idea.

Example 11-1. Setting up a Partial Index to Exclude Common Values

Suppose you are storing web server access logs in a database. Most accesses originate from the IP address
range of your organization but some are from elsewhere (say, employees on dial-up connections). If your
searches by IP are primarily for outside accesses, you probably do not need to index the IP range that
corresponds to your organization’s subnet.
Assume a table like this:
CREATE TABLE access_log (
url varchar,
client_ip inet,
...
);


To create a partial index that suits our example, use a command such as this:
CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
    WHERE NOT (client_ip > inet '192.168.100.0' AND client_ip < inet '192.168.100.255');

A typical query that can use this index would be:
SELECT * FROM access_log WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';
A query that cannot use this index is:
SELECT * FROM access_log WHERE client_ip = inet '192.168.100.23';

Observe that this kind of partial index requires that the common values be predetermined. If the distribu-
tion of values is inherent (due to the nature of the application) and static (not changing over time), this is
not difficult, but if the common values are merely due to the coincidental data load this can require a lot
of maintenance work.

Another possibility is to exclude values from the index that the typical query workload is not interested
in; this is shown in Example 11-2. This results in the same advantages as listed above, but it prevents the
“uninteresting” values from being accessed via that index at all, even if an index scan might be profitable
in that case. Obviously, setting up partial indexes for this kind of scenario will require a lot of care and
experimentation.

Example 11-2. Setting up a Partial Index to Exclude Uninteresting Values

If you have a table that contains both billed and unbilled orders, where the unbilled orders take up a small
fraction of the total table and yet those are the most-accessed rows, you can improve performance by
creating an index on just the unbilled rows. The command to create the index would look like this:
CREATE INDEX orders_unbilled_index ON orders (order_nr)
WHERE billed is not true;

A possible query to use this index would be:
SELECT * FROM orders WHERE billed is not true AND order_nr < 10000;
However, the index can also be used in queries that do not involve order_nr at all, e.g.,
SELECT * FROM orders WHERE billed is not true AND amount > 5000.00;
This is not as efficient as a partial index on the amount column would be, since the system has to scan the
entire index. Yet, if there are relatively few unbilled orders, using this partial index just to find the unbilled
orders could be a win.
Note that this query cannot use this index:
SELECT * FROM orders WHERE order_nr = 3501;
The order 3501 may be among the billed or among the unbilled orders.

Example 11-2 also illustrates that the indexed column and the column used in the predicate do not need
to match. PostgreSQL supports partial indexes with arbitrary predicates, so long as only columns of the
table being indexed are involved. However, keep in mind that the predicate must match the conditions
used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used
in a query only if the system can recognize that the WHERE condition of the query mathematically implies
the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize
mathematically equivalent expressions that are written in different forms. (Not only is such a general


theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The
system can recognize simple inequality implications, for example “x < 1” implies “x < 2”; otherwise
the predicate condition must exactly match part of the query’s WHERE condition or the index will not be
recognized to be usable.
A third possible use for partial indexes does not require the index to be used in queries at all. The idea
here is to create a unique index over a subset of a table, as in Example 11-3. This enforces uniqueness
among the rows that satisfy the index predicate, without constraining those that do not.

Example 11-3. Setting up a Partial Unique Index

Suppose that we have a table describing test outcomes. We wish to ensure that there is only one “success-
ful” entry for a given subject and target combination, but there might be any number of “unsuccessful”
entries. Here is one way to do it:
CREATE TABLE tests (
subject text,
target text,
success boolean,
...
);

CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
    WHERE success;

This is a particularly efficient way of doing it when there are few successful tests and many
unsuccessful ones.

Finally, a partial index can also be used to override the system’s query plan choices. It may occur that data
sets with peculiar distributions will cause the system to use an index when it really should not. In that case
the index can be set up so that it is not available for the offending query. Normally, PostgreSQL makes
reasonable choices about index usage (e.g., it avoids them when retrieving common values, so the earlier
example really only saves index size, it is not required to avoid index usage), and grossly incorrect plan
choices are cause for a bug report.
Keep in mind that setting up a partial index indicates that you know at least as much as the query planner
knows, in particular you know when an index might be profitable. Forming this knowledge requires ex-
perience and understanding of how indexes in PostgreSQL work. In most cases, the advantage of a partial
index over a regular index will not be much.
More information about partial indexes can be found in The case for partial indexes, Partial indexing in
POSTGRES: research project, and Generalized Partial Indexes.

11.8. Examining Index Usage


Although indexes in PostgreSQL do not need maintenance and tuning, it is still important to check which
indexes are actually used by the real-life query workload. Examining index usage for an individual query
is done with the EXPLAIN command; its application for this purpose is illustrated in Section 13.1. It is
also possible to gather overall statistics about index usage in a running server, as described in Section
23.2.


It is difficult to formulate a general procedure for determining which indexes to set up. There are a number
of typical cases that have been shown in the examples throughout the previous sections. A good deal of
experimentation will be necessary in most cases. The rest of this section gives some tips for that.

• Always run ANALYZE first. This command collects statistics about the distribution of the values in the
table. This information is required to guess the number of rows returned by a query, which is needed by
the planner to assign realistic costs to each possible query plan. In absence of any real statistics, some
default values are assumed, which are almost certain to be inaccurate. Examining an application’s index
usage without having run ANALYZE is therefore a lost cause.
• Use real data for experimentation. Using test data for setting up indexes will tell you what indexes you
need for the test data, but that is all.
It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be
a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows will probably
fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page.
Also be careful when making up test data, which is often unavoidable when the application is not in
production use yet. Values that are very similar, completely random, or inserted in sorted order will
skew the statistics away from the distribution that real data would have.

• When indexes are not used, it can be useful for testing to force their use. There are run-time parameters
that can turn off various plan types (described in Section 16.4). For instance, turning off sequential scans
(enable_seqscan) and nested-loop joins (enable_nestloop), which are the most basic plans, will
force the system to use a different plan. If the system still chooses a sequential scan or nested-loop join
then there is probably a more fundamental reason why the index is not used; for example, the
query condition does not match the index. (What kind of query can use what kind of index is explained
in the previous sections.)
• If forcing index usage does use the index, then there are two possibilities: Either the system is right
and using the index is indeed not appropriate, or the cost estimates of the query plans are not reflecting
reality. So you should time your query with and without indexes. The EXPLAIN ANALYZE command
can be useful here.
• If it turns out that the cost estimates are wrong, there are, again, two possibilities. The total cost is
computed from the per-row costs of each plan node times the selectivity estimate of the plan node.
The costs of the plan nodes can be tuned with run-time parameters (described in Section 16.4). An
inaccurate selectivity estimate is due to insufficient statistics. It may be possible to help this by tuning
the statistics-gathering parameters (see ALTER TABLE).
If you do not succeed in adjusting the costs to be more appropriate, then you may have to resort to
forcing index usage explicitly. You may also want to contact the PostgreSQL developers to examine the
issue.
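
A sketch of such an experiment, using the test1 table from Section 11.1 (the constant in the WHERE clause is illustrative):

```sql
SET enable_seqscan = off;   -- discourage sequential scans for this session
EXPLAIN ANALYZE SELECT content FROM test1 WHERE id = 100;

SET enable_seqscan = on;    -- restore the default and compare the timings
EXPLAIN ANALYZE SELECT content FROM test1 WHERE id = 100;
```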

Chapter 12. Concurrency Control
This chapter describes the behavior of the PostgreSQL database system when two or more sessions try
to access the same data at the same time. The goals in that situation are to allow efficient access for
all sessions while maintaining strict data integrity. Every developer of database applications should be
familiar with the topics covered in this chapter.

12.1. Introduction
Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data
consistency by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that
while querying a database each transaction sees a snapshot of data (a database version) as it was some
time ago, regardless of the current state of the underlying data. This protects the transaction from viewing
inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows,
providing transaction isolation for each database session.
The main advantage to using the MVCC model of concurrency control rather than locking is that in
MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data,
and so reading never blocks writing and writing never blocks reading.
Table- and row-level locking facilities are also available in PostgreSQL for applications that cannot adapt
easily to MVCC behavior. However, proper use of MVCC will generally provide better performance than
locks.

12.2. Transaction Isolation


The SQL standard defines four levels of transaction isolation in terms of three phenomena that must be
prevented between concurrent transactions. These undesirable phenomena are:

dirty read
A transaction reads data written by a concurrent uncommitted transaction.
nonrepeatable read
A transaction re-reads data it has previously read and finds that data has been modified by another
transaction (that committed since the initial read).
phantom read
A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that
the set of rows satisfying the condition has changed due to another recently-committed transaction.

The four transaction isolation levels and the corresponding behaviors are described in Table 12-1.

Table 12-1. SQL Transaction Isolation Levels


Isolation Level     Dirty Read     Nonrepeatable Read   Phantom Read
Read uncommitted    Possible       Possible             Possible
Read committed      Not possible   Possible             Possible
Repeatable read     Not possible   Not possible         Possible
Serializable        Not possible   Not possible         Not possible

In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there
are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable.
When you select the level Read Uncommitted you really get Read Committed, and when you select
Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you
select. This is permitted by the SQL standard: the four isolation levels only define which phenomena
must not happen, they do not define which phenomena must happen. The reason that PostgreSQL only
provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the
multiversion concurrency control architecture. The behavior of the available isolation levels is detailed in
the following subsections.
To set the transaction isolation level of a transaction, use the command SET TRANSACTION.
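
For example, to run one transaction at the Serializable level (note that SET TRANSACTION must be issued before any query or data-modification statement of the transaction):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... commands that must all see the same snapshot of the database ...
COMMIT;
```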

12.2.1. Read Committed Isolation Level


Read Committed is the default isolation level in PostgreSQL. When a transaction runs on this isolation
level, a SELECT query sees only data committed before the query began; it never sees either uncommitted
data or changes committed during query execution by concurrent transactions. (However, the SELECT
does see the effects of previous updates executed within its own transaction, even though they are not yet
committed.) In effect, a SELECT query sees a snapshot of the database as of the instant that that query
begins to run. Notice that two successive SELECT commands can see different data, even though they are
within a single transaction, if other transactions commit changes during execution of the first SELECT.
UPDATE, DELETE, and SELECT FOR UPDATE commands behave the same as SELECT in terms of search-
ing for target rows: they will only find target rows that were committed as of the command start time.
However, such a target row may have already been updated (or deleted or marked for update) by another
concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its
effects are negated and the second updater can proceed with updating the originally found row. If the first
updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row. The search condition of the command (the
WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition.
If so, the second updater proceeds with its operation, starting from the updated version of the row.
Because of the above rule, it is possible for an updating command to see an inconsistent snapshot: it
can see the effects of concurrent updating commands that affected the same rows it is trying to update,
but it does not see effects of those commands on other rows in the database. This behavior makes Read
Committed mode unsuitable for commands that involve complex search conditions. However, it is just
right for simpler cases. For example, consider updating bank balances with transactions like

BEGIN;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
COMMIT;


If two such transactions concurrently try to change the balance of account 12345, we clearly want the
second transaction to start from the updated version of the account’s row. Because each command is
affecting only a predetermined row, letting it see the updated version of the row does not create any
troublesome inconsistency.
Since in Read Committed mode each new command starts with a new snapshot that includes all transac-
tions committed up to that instant, subsequent commands in the same transaction will see the effects of
the committed concurrent transaction in any case. The point at issue here is whether or not within a single
command we see an absolutely consistent view of the database.
The partial transaction isolation provided by Read Committed mode is adequate for many applications,
and this mode is fast and simple to use. However, for applications that do complex queries and updates, it
may be necessary to guarantee a more rigorously consistent view of the database than the Read Committed
mode provides.

12.2.2. Serializable Isolation Level


The level Serializable provides the strictest transaction isolation. This level emulates serial transaction ex-
ecution, as if transactions had been executed one after another, serially, rather than concurrently. However,
applications using this level must be prepared to retry transactions due to serialization failures.
When a transaction is on the serializable level, a SELECT query sees only data committed before the trans-
action began; it never sees either uncommitted data or changes committed during transaction execution by
concurrent transactions. (However, the SELECT does see the effects of previous updates executed within
its own transaction, even though they are not yet committed.) This is different from Read Committed in
that the SELECT sees a snapshot as of the start of the transaction, not as of the start of the current query
within the transaction. Thus, successive SELECT commands within a single transaction always see the
same data.
UPDATE, DELETE, and SELECT FOR UPDATE commands behave the same as SELECT in terms of search-
ing for target rows: they will only find target rows that were committed as of the transaction start time.
However, such a target row may have already been updated (or deleted or marked for update) by another
concurrent transaction by the time it is found. In this case, the serializable transaction will wait for the first
updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then
its effects are negated and the serializable transaction can proceed with updating the originally found row.
But if the first updater commits (and actually updated or deleted the row, not just selected it for update)
then the serializable transaction will be rolled back with the message

ERROR: could not serialize access due to concurrent update

because a serializable transaction cannot modify rows changed by other transactions after the serializable
transaction began.
When the application receives this error message, it should abort the current transaction and then retry
the whole transaction from the beginning. The second time through, the transaction sees the previously-
committed change as part of its initial view of the database, so there is no logical conflict in using the new
version of the row as the starting point for the new transaction’s update.
Note that only updating transactions may need to be retried; read-only transactions will never have serial-
ization conflicts.
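As an illustration of the failure-and-retry cycle, consider two concurrent serializable transactions touching the same row (a sketch; accounts is a hypothetical table):

```sql
-- Session A
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE acctnum = 11111;

-- Session B, concurrently
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance + 100 WHERE acctnum = 11111;
-- ...blocks, waiting for session A's lock on the row...

-- Session A
COMMIT;

-- Session B now receives:
--   ERROR:  could not serialize access due to concurrent update
-- and must roll back, then retry the whole transaction from the beginning.
ROLLBACK;
```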

Chapter 12. Concurrency Control

The Serializable mode provides a rigorous guarantee that each transaction sees a wholly consistent view
of the database. However, the application has to be prepared to retry transactions when concurrent up-
dates make it impossible to sustain the illusion of serial execution. Since the cost of redoing complex
transactions may be significant, this mode is recommended only when updating transactions contain logic
sufficiently complex that they may give wrong answers in Read Committed mode. Most commonly, Se-
rializable mode is necessary when a transaction executes several successive commands that must see
identical views of the database.

12.2.2.1. Serializable Isolation versus True Serializability


The intuitive meaning (and mathematical definition) of “serializable” execution is that any two success-
fully committed concurrent transactions will appear to have executed strictly serially, one after the other
— although which one appeared to occur first may not be predictable in advance. It is important to realize
that forbidding the undesirable behaviors listed in Table 12-1 is not sufficient to guarantee true serializ-
ability, and in fact PostgreSQL’s Serializable mode does not guarantee serializable execution in this sense.
As an example, consider a table mytab, initially containing

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

Suppose that serializable transaction A computes

SELECT SUM(value) FROM mytab WHERE class = 1;

and then inserts the result (30) as the value in a new row with class = 2. Concurrently, serializable
transaction B computes

SELECT SUM(value) FROM mytab WHERE class = 2;

and obtains the result 300, which it inserts in a new row with class = 1. Then both transactions commit.
None of the listed undesirable behaviors have occurred, yet we have a result that could not have occurred
in either order serially. If A had executed before B, B would have computed the sum 330, not 300, and
similarly the other order would have resulted in a different sum computed by A.
To guarantee true mathematical serializability, it is necessary for a database system to enforce predicate
locking, which means that a transaction cannot insert or modify a row that would have matched the WHERE
condition of a query in another concurrent transaction. For example, once transaction A has executed the
query SELECT ... WHERE class = 1, a predicate-locking system would forbid transaction B from
inserting any new row with class 1 until A has committed.[1] Such a locking system is complex to implement
and extremely expensive in execution, since every session must be aware of the details of every query
executed by every concurrent transaction. And this large expense is mostly wasted, since in practice most
applications do not do the sorts of things that could result in problems. (Certainly the example above is
rather contrived and unlikely to represent real software.) Accordingly, PostgreSQL does not implement
predicate locking, and so far as we are aware no other production DBMS does either.

[1] Essentially, a predicate-locking system prevents phantom reads by restricting what is written, whereas MVCC prevents them
by restricting what is read.


In those cases where the possibility of nonserializable execution is a real hazard, problems can be pre-
vented by appropriate use of explicit locking. Further discussion appears in the following sections.

12.3. Explicit Locking


PostgreSQL provides various lock modes to control concurrent access to data in tables. These modes can
be used for application-controlled locking in situations where MVCC does not give the desired behav-
ior. Also, most PostgreSQL commands automatically acquire locks of appropriate modes to ensure that
referenced tables are not dropped or modified in incompatible ways while the command executes. (For
example, ALTER TABLE cannot be executed concurrently with other operations on the same table.)
To examine a list of the currently outstanding locks in a database server, use the pg_locks system view
(Section 41.33). For more information on monitoring the status of the lock manager subsystem, refer to
Chapter 23.
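For example, a query along these lines lists each lock together with the relation it applies to (column names as in the 8.0 pg_locks view):

```sql
SELECT relation::regclass AS locked_table, pid, mode, granted
FROM pg_locks
WHERE relation IS NOT NULL;
```

Rows showing granted = false identify transactions that are still waiting for a conflicting lock to be released.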

12.3.1. Table-Level Locks


The list below shows the available lock modes and the contexts in which they are used automatically by
PostgreSQL. You can also acquire any of these locks explicitly with the command LOCK. Remember
that all of these lock modes are table-level locks, even if the name contains the word “row”; the names
of the lock modes are historical. To some extent the names reflect the typical usage of each lock mode
— but the semantics are all the same. The only real difference between one lock mode and another is
the set of lock modes with which each conflicts. Two transactions cannot hold locks of conflicting modes
on the same table at the same time. (However, a transaction never conflicts with itself. For example, it
may acquire ACCESS EXCLUSIVE lock and later acquire ACCESS SHARE lock on the same table.) Non-
conflicting lock modes may be held concurrently by many transactions. Notice in particular that some
lock modes are self-conflicting (for example, an ACCESS EXCLUSIVE lock cannot be held by more than
one transaction at a time) while others are not self-conflicting (for example, an ACCESS SHARE lock can
be held by multiple transactions). Once acquired, a lock is held until the end of the transaction.
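For example, a transaction that must read a table and then update it, with no other writer intervening, might take a lock explicitly (a sketch; the table and column names are placeholders):

```sql
BEGIN;
-- Block other writers (and other SHARE ROW EXCLUSIVE holders),
-- while still allowing plain readers:
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
SELECT count(*) FROM mytable;
UPDATE mytable SET flag = true;
COMMIT;  -- only here is the lock released
```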

Table-level lock modes


ACCESS SHARE

Conflicts with the ACCESS EXCLUSIVE lock mode only.


The commands SELECT and ANALYZE acquire a lock of this mode on referenced tables. In general,
any query that only reads a table and does not modify it will acquire this lock mode.
ROW SHARE

Conflicts with the EXCLUSIVE and ACCESS EXCLUSIVE lock modes.


The SELECT FOR UPDATE command acquires a lock of this mode on the target table(s) (in addition
to ACCESS SHARE locks on any other tables that are referenced but not selected FOR UPDATE).
ROW EXCLUSIVE

Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock
modes.


The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition
to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired
by any command that modifies the data in a table.
SHARE UPDATE EXCLUSIVE

Conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE,
and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent schema
changes and VACUUM runs.
Acquired by VACUUM (without FULL).
SHARE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE,
EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode protects a table against concurrent
data changes.
Acquired by CREATE INDEX.
SHARE ROW EXCLUSIVE

Conflicts with the ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW
EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.

This lock mode is not automatically acquired by any PostgreSQL command.


EXCLUSIVE

Conflicts with the ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE
ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes. This mode allows only
concurrent ACCESS SHARE locks, i.e., only reads from the table can proceed in parallel with a
transaction holding this lock mode.
This lock mode is not automatically acquired by any PostgreSQL command.
ACCESS EXCLUSIVE

Conflicts with locks of all modes (ACCESS SHARE, ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE
EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE). This mode
guarantees that the holder is the only transaction accessing the table in any way.
Acquired by the ALTER TABLE, DROP TABLE, REINDEX, CLUSTER, and VACUUM FULL commands.
This is also the default lock mode for LOCK TABLE statements that do not specify a mode explicitly.

Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE) statement.

12.3.2. Row-Level Locks


In addition to table-level locks, there are row-level locks. A row-level lock on a specific row is automat-
ically acquired when the row is updated (or deleted or marked for update). The lock is held until the
transaction commits or rolls back. Row-level locks do not affect data querying; they block writers to the
same row only. To acquire a row-level lock on a row without actually modifying the row, select the row


with SELECT FOR UPDATE. Note that once a particular row-level lock is acquired, the transaction may
update the row multiple times without fear of conflicts.
PostgreSQL doesn’t remember any information about modified rows in memory, so it has no limit to the
number of rows locked at one time. However, locking a row may cause a disk write; thus, for example,
SELECT FOR UPDATE will modify selected rows to mark them and so will result in disk writes.
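For example (a sketch; the accounts table is hypothetical):

```sql
BEGIN;
-- Lock the row without modifying it; concurrent UPDATE, DELETE, and
-- SELECT FOR UPDATE of this row will block, but plain SELECTs will not:
SELECT * FROM accounts WHERE acctnum = 11111 FOR UPDATE;

-- Holding the row lock, this transaction may now update the row
-- any number of times without further conflicts:
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
COMMIT;
```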

In addition to table and row locks, page-level share/exclusive locks are used to control read/write access
to table pages in the shared buffer pool. These locks are released immediately after a row is fetched or
updated. Application developers normally need not be concerned with page-level locks, but we mention
them for completeness.

12.3.3. Deadlocks
The use of explicit locking can increase the likelihood of deadlocks, wherein two (or more) transactions
each hold locks that the other wants. For example, if transaction 1 acquires an exclusive lock on table A
and then tries to acquire an exclusive lock on table B, while transaction 2 has already exclusive-locked
table B and now wants an exclusive lock on table A, then neither one can proceed. PostgreSQL automati-
cally detects deadlock situations and resolves them by aborting one of the transactions involved, allowing
the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not
be relied on.)
Note that deadlocks can also occur as the result of row-level locks (and thus, they can occur even if explicit
locking is not used). Consider the case in which there are two concurrent transactions modifying a table.
The first transaction executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;

This acquires a row-level lock on the row with the specified account number. Then, the second transaction
executes:

UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;


UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;

The first UPDATE statement successfully acquires a row-level lock on the specified row, so it succeeds in
updating that row. However, the second UPDATE statement finds that the row it is attempting to update has
already been locked, so it waits for the transaction that acquired the lock to complete. Transaction two is
now waiting on transaction one to complete before it continues execution. Now, transaction one executes:

UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;

Transaction one attempts to acquire a row-level lock on the specified row, but it cannot: transaction two
already holds such a lock. So it waits for transaction two to complete. Thus, transaction one is blocked
on transaction two, and transaction two is blocked on transaction one: a deadlock condition. PostgreSQL
will detect this situation and abort one of the transactions.
The best defense against deadlocks is generally to avoid them by being certain that all applications using a
database acquire locks on multiple objects in a consistent order. In the example above, if both transactions
had updated the rows in the same order, no deadlock would have occurred. One should also ensure that the
first lock acquired on an object in a transaction is the highest mode that will be needed for that object. If it


is not feasible to verify this in advance, then deadlocks may be handled on-the-fly by retrying transactions
that are aborted due to deadlock.
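In the accounts example above, a consistent ordering rule removes the deadlock: if both transactions update the lower-numbered row first, the second simply waits at its first UPDATE until the first commits.

```sql
-- Both transactions follow the rule "lock rows in ascending acctnum order":
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
COMMIT;
```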
So long as no deadlock situation is detected, a transaction seeking either a table-level or row-level lock
will wait indefinitely for conflicting locks to be released. This means it is a bad idea for applications to
hold transactions open for long periods of time (e.g., while waiting for user input).

12.4. Data Consistency Checks at the Application Level


Because readers in PostgreSQL do not lock data, regardless of transaction isolation level, data read by
one transaction can be overwritten by another concurrent transaction. In other words, if a row is returned
by SELECT it doesn’t mean that the row is still current at the instant it is returned (i.e., sometime after the
current query began). The row might have been modified or deleted by an already-committed transaction
that committed after this one started. Even if the row is still valid “now”, it could be changed or deleted
before the current transaction does a commit or rollback.
Another way to think about it is that each transaction sees a snapshot of the database contents, and con-
currently executing transactions may very well see different snapshots. So the whole concept of “now”
is somewhat ill-defined anyway. This is not normally a big problem if the client applications are iso-
lated from each other, but if the clients can communicate via channels outside the database then serious
confusion may ensue.
To ensure the current validity of a row and protect it against concurrent updates one must use SELECT FOR
UPDATE or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE locks just the returned rows
against concurrent updates, while LOCK TABLE locks the whole table.) This should be taken into account
when porting applications to PostgreSQL from other environments. (Before version 6.5 PostgreSQL used
read locks, so the above consideration is also relevant when upgrading from PostgreSQL versions
prior to 6.5.)
Global validity checks require extra thought under MVCC. For example, a banking application might wish
to check that the sum of all credits in one table equals the sum of debits in another table, when both tables
are being actively updated. Comparing the results of two successive SELECT sum(...) commands will
not work reliably under Read Committed mode, since the second query will likely include the results of
transactions not counted by the first. Doing the two sums in a single serializable transaction will give an
accurate picture of the effects of transactions that committed before the serializable transaction started
— but one might legitimately wonder whether the answer is still relevant by the time it is delivered.
If the serializable transaction itself applied some changes before trying to make the consistency check,
the usefulness of the check becomes even more debatable, since now it includes some but not all post-
transaction-start changes. In such cases a careful person might wish to lock all tables needed for the check,
in order to get an indisputable picture of current reality. A SHARE mode (or higher) lock guarantees that
there are no uncommitted changes in the locked table, other than those of the current transaction.
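Such a check might be sketched as follows (credits and debits are illustrative table names):

```sql
BEGIN;
-- SHARE mode blocks concurrent data changes, but not reads,
-- in both tables for the rest of the transaction:
LOCK TABLE credits, debits IN SHARE MODE;
SELECT (SELECT sum(amount) FROM credits) =
       (SELECT sum(amount) FROM debits) AS balanced;
COMMIT;
```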
Note also that if one is relying on explicit locking to prevent concurrent changes, one should use Read
Committed mode, or in Serializable mode be careful to obtain the lock(s) before performing queries. A
lock obtained by a serializable transaction guarantees that no other transactions modifying the table are
still running, but if the snapshot seen by the transaction predates obtaining the lock, it may predate some
now-committed changes in the table. A serializable transaction’s snapshot is actually frozen at the start of
its first query or data-modification command (SELECT, INSERT, UPDATE, or DELETE), so it’s possible to
obtain locks explicitly before the snapshot is frozen.
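Concretely, in a serializable transaction any needed locks should be taken before the first data statement, for example (illustrative table name again):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
LOCK TABLE credits IN SHARE MODE;  -- LOCK does not freeze the snapshot
-- This first SELECT freezes the snapshot; because the lock was already
-- held, the snapshot includes all changes committed by earlier writers:
SELECT sum(amount) FROM credits;
COMMIT;
```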


12.5. Locking and Indexes


Though PostgreSQL provides nonblocking read/write access to table data, nonblocking read/write access
is not currently offered for every index access method implemented in PostgreSQL. The various index
types are handled as follows:

B-tree indexes
Short-term share/exclusive page-level locks are used for read/write access. Locks are released imme-
diately after each index row is fetched or inserted. B-tree indexes provide the highest concurrency
without deadlock conditions.
GiST and R-tree indexes
Share/exclusive index-level locks are used for read/write access. Locks are released after the com-
mand is done.
Hash indexes
Share/exclusive hash-bucket-level locks are used for read/write access. Locks are released after the
whole bucket is processed. Bucket-level locks provide better concurrency than index-level ones, but
deadlock is possible since the locks are held longer than one index operation.

In short, B-tree indexes offer the best performance for concurrent applications; since they also have more
features than hash indexes, they are the recommended index type for concurrent applications that need to
index scalar data. When dealing with non-scalar data, B-trees obviously cannot be used; in that situation,
application developers should be aware of the relatively poor concurrent performance of GiST and R-tree
indexes.

Chapter 13. Performance Tips
Query performance can be affected by many things. Some of these can be manipulated by the user, while
others are fundamental to the underlying design of the system. This chapter provides some hints about
understanding and tuning PostgreSQL performance.

13.1. Using EXPLAIN


PostgreSQL devises a query plan for each query it is given. Choosing the right plan to match the query
structure and the properties of the data is absolutely critical for good performance. You can use the EX-
PLAIN command to see what query plan the system creates for any query. Plan-reading is an art that
deserves an extensive tutorial, which this is not; but here is some basic information.
The numbers that are currently quoted by EXPLAIN are:

• Estimated start-up cost (Time expended before output scan can start, e.g., time to do the sorting in a
sort node.)
• Estimated total cost (If all rows were to be retrieved, which they may not be: a query with a LIMIT
clause will stop short of paying the total cost, for example.)
• Estimated number of rows output by this plan node (Again, only if executed to completion)
• Estimated average width (in bytes) of rows output by this plan node

The costs are measured in units of disk page fetches. (CPU effort estimates are converted into disk-page
units using some fairly arbitrary fudge factors. If you want to experiment with these factors, see the list of
run-time configuration parameters in Section 16.4.5.2.)
It’s important to note that the cost of an upper-level node includes the cost of all its child nodes. It’s also
important to realize that the cost only reflects things that the planner/optimizer cares about. In particular,
the cost does not consider the time spent transmitting result rows to the frontend, which could be a pretty
dominant factor in the true elapsed time; but the planner ignores it because it cannot change it by altering
the plan. (Every correct plan will output the same row set, we trust.)
Rows output is a little tricky because it is not the number of rows processed/scanned by the query; it is
usually less, reflecting the estimated selectivity of any WHERE-clause conditions that are being applied
at this node. Ideally the top-level rows estimate will approximate the number of rows actually returned,
updated, or deleted by the query.
Here are some examples (using the regression test database after a VACUUM ANALYZE, and 7.3 develop-
ment sources):

EXPLAIN SELECT * FROM tenk1;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148)


This is about as straightforward as it gets. If you do

SELECT * FROM pg_class WHERE relname = 'tenk1';

you will find out that tenk1 has 233 disk pages and 10000 rows. So the cost is estimated at 233 page
reads, defined as costing 1.0 apiece, plus 10000 * cpu_tuple_cost, which is currently 0.01 (try SHOW
cpu_tuple_cost): 233 * 1.0 + 10000 * 0.01 = 333.00, exactly the total cost shown in the plan.

Now let’s modify the query to add a WHERE condition:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;

QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..358.00 rows=1033 width=148)
   Filter: (unique1 < 1000)

The estimate of output rows has gone down because of the WHERE clause. However, the scan will still have
to visit all 10000 rows, so the cost hasn’t decreased; in fact it has gone up a bit to reflect the extra CPU
time spent checking the WHERE condition.
The actual number of rows this query would select is 1000, but the estimate is only approximate. If you try
to duplicate this experiment, you will probably get a slightly different estimate; moreover, it will change
after each ANALYZE command, because the statistics produced by ANALYZE are taken from a randomized
sample of the table.
Modify the query to restrict the condition even more:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 50;

QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..179.33 rows=49 width=148)
   Index Cond: (unique1 < 50)

and you will see that if we make the WHERE condition selective enough, the planner will eventually decide
that an index scan is cheaper than a sequential scan. This plan will only have to visit 50 rows because of
the index, so it wins despite the fact that each individual fetch is more expensive than reading a whole
disk page sequentially.
Add another condition to the WHERE clause:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 50 AND stringu1 = 'xxx';

QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..179.45 rows=1 width=148)
   Index Cond: (unique1 < 50)
   Filter: (stringu1 = 'xxx'::name)

The added condition stringu1 = 'xxx' reduces the output-rows estimate, but not the cost because we
still have to visit the same set of rows. Notice that the stringu1 clause cannot be applied as an index
condition (since this index is only on the unique1 column). Instead it is applied as a filter on the rows
retrieved by the index. Thus the cost has actually gone up a little bit to reflect this extra checking.


Let’s try joining two tables, using the columns we have been discussing:

EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;

QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..327.02 rows=49 width=296)
   ->  Index Scan using tenk1_unique1 on tenk1 t1
              (cost=0.00..179.33 rows=49 width=148)
         Index Cond: (unique1 < 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2
              (cost=0.00..3.01 rows=1 width=148)
         Index Cond: ("outer".unique2 = t2.unique2)

In this nested-loop join, the outer scan is the same index scan we had in the example before last, and so
its cost and row count are the same because we are applying the WHERE clause unique1 < 50 at that
node. The t1.unique2 = t2.unique2 clause is not relevant yet, so it doesn’t affect row count of the
outer scan. For the inner scan, the unique2 value of the current outer-scan row is plugged into the inner
index scan to produce an index condition like t2.unique2 = constant. So we get the same inner-scan
plan and costs that we’d get from, say, EXPLAIN SELECT * FROM tenk2 WHERE unique2 = 42. The
costs of the loop node are then set on the basis of the cost of the outer scan, plus one repetition of the inner
scan for each outer row (49 * 3.01, here), plus a little CPU time for join processing.
In this example the join’s output row count is the same as the product of the two scans’ row counts, but
that’s not true in general, because in general you can have WHERE clauses that mention both tables and
so can only be applied at the join point, not to either input scan. For example, if we added WHERE ...
AND t1.hundred < t2.hundred, that would decrease the output row count of the join node, but not
change either input scan.
One way to look at variant plans is to force the planner to disregard whatever strategy it thought was the
winner, using the enable/disable flags for each plan type. (This is a crude tool, but useful. See also Section
13.3.)

SET enable_nestloop = off;


EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;

QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=179.45..563.06 rows=49 width=296)
   Hash Cond: ("outer".unique2 = "inner".unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..333.00 rows=10000 width=148)
   ->  Hash  (cost=179.33..179.33 rows=49 width=148)
         ->  Index Scan using tenk1_unique1 on tenk1 t1
                    (cost=0.00..179.33 rows=49 width=148)
               Index Cond: (unique1 < 50)

This plan proposes to extract the 50 interesting rows of tenk1 using ye same olde index scan, stash them
into an in-memory hash table, and then do a sequential scan of tenk2, probing into the hash table for
possible matches of t1.unique2 = t2.unique2 at each tenk2 row. The cost to read tenk1 and set
up the hash table is entirely start-up cost for the hash join, since we won’t get any rows out until we can
start reading tenk2. The total time estimate for the join also includes a hefty charge for the CPU time to


probe the hash table 10000 times. Note, however, that we are not charging 10000 times 179.33; the hash
table setup is only done once in this plan type.
It is possible to check on the accuracy of the planner’s estimated costs by using EXPLAIN ANALYZE. This
command actually executes the query, and then displays the true run time accumulated within each plan
node along with the same estimated costs that a plain EXPLAIN shows. For example, we might get a result
like this:

EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;

QUERY PLAN
-------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..327.02 rows=49 width=296)
              (actual time=1.181..29.822 rows=50 loops=1)
   ->  Index Scan using tenk1_unique1 on tenk1 t1
              (cost=0.00..179.33 rows=49 width=148)
              (actual time=0.630..8.917 rows=50 loops=1)
         Index Cond: (unique1 < 50)
   ->  Index Scan using tenk2_unique2 on tenk2 t2
              (cost=0.00..3.01 rows=1 width=148)
              (actual time=0.295..0.324 rows=1 loops=50)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 31.604 ms

Note that the “actual time” values are in milliseconds of real time, whereas the “cost” estimates are
expressed in arbitrary units of disk fetches; so they are unlikely to match up. The thing to pay attention to
is the ratios.
In some query plans, it is possible for a subplan node to be executed more than once. For example, the
inner index scan is executed once per outer row in the above nested-loop plan. In such cases, the “loops”
value reports the total number of executions of the node, and the actual time and rows values shown are
averages per-execution. This is done to make the numbers comparable with the way that the cost estimates
are shown. Multiply by the “loops” value to get the total time actually spent in the node.
The Total runtime shown by EXPLAIN ANALYZE includes executor start-up and shut-down time, as
well as time spent processing the result rows. It does not include parsing, rewriting, or planning time. For
a SELECT query, the total run time will normally be just a little larger than the total time reported for the
top-level plan node. For INSERT, UPDATE, and DELETE commands, the total run time may be considerably
larger, because it includes the time spent processing the result rows. In these commands, the time for the
top plan node essentially is the time spent computing the new rows and/or locating the old ones, but it
doesn’t include the time spent making the changes.
It is worth noting that EXPLAIN results should not be extrapolated to situations other than the one you are
actually testing; for example, results on a toy-sized table can’t be assumed to apply to large tables. The
planner’s cost estimates are not linear and so it may well choose a different plan for a larger or smaller
table. An extreme example is that on a table that only occupies one disk page, you’ll nearly always get a
sequential scan plan whether indexes are available or not. The planner realizes that it’s going to take one
disk page read to process the table in any case, so there’s no value in expending additional page reads to
look at an index.


13.2. Statistics Used by the Planner


As we saw in the previous section, the query planner needs to estimate the number of rows retrieved by
a query in order to make good choices of query plans. This section provides a quick look at the statistics
that the system uses for these estimates.
One component of the statistics is the total number of entries in each table and index, as well as the number
of disk blocks occupied by each table and index. This information is kept in the table pg_class in the
columns reltuples and relpages. We can look at it with queries similar to this one:

SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%';

    relname    | relkind | reltuples | relpages
---------------+---------+-----------+----------
 tenk1         | r       |     10000 |      233
 tenk1_hundred | i       |     10000 |       30
 tenk1_unique1 | i       |     10000 |       30
 tenk1_unique2 | i       |     10000 |       30
(4 rows)

Here we can see that tenk1 contains 10000 rows, as do its indexes, but the indexes are (unsurprisingly)
much smaller than the table.
For efficiency reasons, reltuples and relpages are not updated on-the-fly, and so they usually contain
somewhat out-of-date values. They are updated by VACUUM, ANALYZE, and a few DDL commands such
as CREATE INDEX. A stand-alone ANALYZE (that is, one not part of VACUUM) generates an approximate
reltuples value since it does not read every row of the table. The planner will scale the values it finds
in pg_class to match the current physical table size, thus obtaining a closer approximation.
Most queries retrieve only a fraction of the rows in a table, due to having WHERE clauses that restrict the
rows to be examined. The planner thus needs to make an estimate of the selectivity of WHERE clauses, that
is, the fraction of rows that match each condition in the WHERE clause. The information used for this task
is stored in the pg_statistic system catalog. Entries in pg_statistic are updated by ANALYZE and
VACUUM ANALYZE commands and are always approximate even when freshly updated.

Rather than look at pg_statistic directly, it’s better to look at its view pg_stats when examining the
statistics manually. pg_stats is designed to be more easily readable. Furthermore, pg_stats is readable
by all, whereas pg_statistic is only readable by a superuser. (This prevents unprivileged users from
learning something about the contents of other people’s tables from the statistics. The pg_stats view is
restricted to show only rows about tables that the current user can read.) For example, we might do:

SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road';

 attname | n_distinct |                        most_common_vals
---------+------------+-----------------------------------------------------------------
 name    |  -0.467008 | {"I- 580                  Ramp","I- 880
 thepath |         20 | {"[(-122.089,37.71),(-122.0886,37.711)]"}
(2 rows)

pg_stats is described in detail in Section 41.36.

Chapter 13. Performance Tips

The amount of information stored in pg_statistic, in particular the maximum number of entries in
the most_common_vals and histogram_bounds arrays for each column, can be set on a
column-by-column basis using the ALTER TABLE SET STATISTICS command, or globally by setting the
default_statistics_target configuration variable. The default limit is presently 10 entries. Raising the
limit may allow more accurate planner estimates to be made, particularly for columns with irregular data
distributions, at the price of consuming more space in pg_statistic and slightly more time to compute
the estimates. Conversely, a lower limit may be appropriate for columns with simple data distributions.
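As an illustration (the table and column names here are hypothetical), the statistics target could be
raised for a single column, or for the whole session, before re-gathering statistics:

```sql
-- Raise the per-column limit to 100 entries; the next ANALYZE on this
-- table will then gather up to 100 most-common values and histogram buckets.
ALTER TABLE mytable ALTER COLUMN mycol SET STATISTICS 100;
ANALYZE mytable;

-- Alternatively, raise the default for all columns in this session:
SET default_statistics_target = 100;
ANALYZE mytable;
```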

13.3. Controlling the Planner with Explicit JOIN Clauses


It is possible to control the query planner to some extent by using the explicit JOIN syntax. To see why
this matters, we first need some background.
In a simple join query, such as

SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;

the planner is free to join the given tables in any order. For example, it could generate a query plan that
joins A to B, using the WHERE condition a.id = b.id, and then joins C to this joined table, using the
other WHERE condition. Or it could join B to C and then join A to that result. Or it could join A to C and
then join them with B, but that would be inefficient, since the full Cartesian product of A and C would
have to be formed, there being no applicable condition in the WHERE clause to allow optimization of the
join. (All joins in the PostgreSQL executor happen between two input tables, so it’s necessary to build up
the result in one or another of these fashions.) The important point is that these different join possibilities
give semantically equivalent results but may have hugely different execution costs. Therefore, the planner
will explore all of them to try to find the most efficient query plan.
When a query only involves two or three tables, there aren’t many join orders to worry about. But the
number of possible join orders grows exponentially as the number of tables expands. Beyond ten or so
input tables it’s no longer practical to do an exhaustive search of all the possibilities, and even for six
or seven tables planning may take an annoyingly long time. When there are too many input tables, the
PostgreSQL planner will switch from exhaustive search to a genetic probabilistic search through a limited
number of possibilities. (The switch-over threshold is set by the geqo_threshold run-time parameter.) The
genetic search takes less time, but it won’t necessarily find the best possible plan.
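The switch-over point can be adjusted per session. For example (the value 12 is purely illustrative):

```sql
-- Use exhaustive join-order search for queries joining fewer than 12 tables;
-- the genetic (GEQO) search kicks in at or above this threshold.
SET geqo_threshold = 12;
```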
When the query involves outer joins, the planner has much less freedom than it does for plain (inner)
joins. For example, consider

SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

Although this query’s restrictions are superficially similar to the previous example, the semantics are
different because a row must be emitted for each row of A that has no matching row in the join of B and
C. Therefore the planner has no choice of join order here: it must join B to C and then join A to that result.
Accordingly, this query takes less time to plan than the previous query.
Explicit inner join syntax (INNER JOIN, CROSS JOIN, or unadorned JOIN) is semantically the same as
listing the input relations in FROM, so it does not need to constrain the join order. But it is possible to
instruct the PostgreSQL query planner to treat explicit inner JOINs as constraining the join order anyway.
For example, these three queries are logically equivalent:

SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;


SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

But if we tell the planner to honor the JOIN order, the second and third take less time to plan than the first.
This effect is not worth worrying about for only three tables, but it can be a lifesaver with many tables.
To force the planner to follow the JOIN order for inner joins, set the join_collapse_limit run-time
parameter to 1. (Other possible values are discussed below.)
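Using the table names from the queries above, fixing the join order for the current session might look
like this:

```sql
-- Make explicit JOIN syntax constrain the join order for this session:
SET join_collapse_limit = 1;

-- The planner must now join b to c first, then join a to that result:
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
```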
You do not need to constrain the join order completely in order to cut search time, because it’s OK to use
JOIN operators within items of a plain FROM list. For example, consider

SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;

With join_collapse_limit = 1, this forces the planner to join A to B before joining them to other
tables, but doesn’t constrain its choices otherwise. In this example, the number of possible join orders is
reduced by a factor of 5.
Constraining the planner’s search in this way is a useful technique both for reducing planning time and
for directing the planner to a good query plan. If the planner chooses a bad join order by default, you can
force it to choose a better order via JOIN syntax — assuming that you know of a better order, that is.
Experimentation is recommended.
A closely related issue that affects planning time is collapsing of subqueries into their parent query. For
example, consider

SELECT *
FROM x, y,
(SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;

This situation might arise from use of a view that contains a join; the view’s SELECT rule will be inserted
in place of the view reference, yielding a query much like the above. Normally, the planner will try to
collapse the subquery into the parent, yielding

SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

This usually results in a better plan than planning the subquery separately. (For example, the outer WHERE
conditions might be such that joining X to A first eliminates many rows of A, thus avoiding the need
to form the full logical output of the subquery.) But at the same time, we have increased the planning
time; here, we have a five-way join problem replacing two separate three-way join problems. Because
of the exponential growth of the number of possibilities, this makes a big difference. The planner
tries to avoid getting stuck in huge join search problems by not collapsing a subquery if more than
from_collapse_limit FROM items would result in the parent query. You can trade off planning time
against quality of plan by adjusting this run-time parameter up or down.
from_collapse_limit and join_collapse_limit are similarly named because they do almost the same
thing: one controls when the planner will “flatten out” subselects, and the other controls when
it will flatten out explicit inner joins. Typically you would either set join_collapse_limit
equal to from_collapse_limit (so that explicit joins and subselects act similarly) or set
join_collapse_limit to 1 (if you want to control join order with explicit joins). But you might set
them differently if you are trying to fine-tune the trade off between planning time and run time.


13.4. Populating a Database


One may need to insert a large amount of data when first populating a database. This section contains
some suggestions on how to make this process as efficient as possible.

13.4.1. Disable Autocommit


Turn off autocommit and just do one commit at the end. (In plain SQL, this means issuing BEGIN at the
start and COMMIT at the end. Some client libraries may do this behind your back, in which case you need
to make sure the library does it when you want it done.) If you allow each insertion to be committed
separately, PostgreSQL is doing a lot of work for each row that is added. An additional benefit of doing
all insertions in one transaction is that if the insertion of one row were to fail then the insertion of all rows
inserted up to that point would be rolled back, so you won’t be stuck with partially loaded data.
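In plain SQL, a batched load (with a hypothetical table and values) looks like this:

```sql
-- One transaction around all insertions: a single commit at the end,
-- and everything rolls back together if any row fails.
BEGIN;
INSERT INTO mytable VALUES (1, 'one');
INSERT INTO mytable VALUES (2, 'two');
-- ... many more rows ...
COMMIT;
```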

13.4.2. Use COPY


Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The
COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs
significantly less overhead for large data loads. Since COPY is a single command, there is no need to
disable autocommit if you use this method to populate a table.
If you cannot use COPY, it may help to use PREPARE to create a prepared INSERT statement, and then use
EXECUTE as many times as required. This avoids some of the overhead of repeatedly parsing and planning
INSERT.

Note that loading a large number of rows using COPY is almost always faster than using INSERT, even if
PREPARE is used and multiple insertions are batched into a single transaction.
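The two approaches might look like this (the table definition and file path are hypothetical):

```sql
-- Fastest: load all rows with a single COPY command:
COPY mytable FROM '/path/to/data.txt';

-- If COPY cannot be used, a prepared INSERT avoids repeated parsing and planning:
PREPARE ins (integer, text) AS INSERT INTO mytable VALUES ($1, $2);
EXECUTE ins(1, 'one');
EXECUTE ins(2, 'two');
```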

13.4.3. Remove Indexes


If you are loading a freshly created table, the fastest way is to create the table, bulk load the table’s data
using COPY, then create any indexes needed for the table. Creating an index on pre-existing data is quicker
than updating it incrementally as each row is loaded.
If you are augmenting an existing table, you can drop the index, load the table, and then recreate the index.
Of course, the database performance for other users may be adversely affected during the time that the
index is missing. One should also think twice before dropping unique indexes, since the error checking
afforded by the unique constraint will be lost while the index is missing.
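A sketch of the drop-and-recreate approach, with hypothetical index and table names:

```sql
-- Drop the index, load the data, then rebuild the index from scratch;
-- bulk creation is faster than incremental maintenance during the load.
DROP INDEX mytable_col_idx;
COPY mytable FROM '/path/to/data.txt';
CREATE INDEX mytable_col_idx ON mytable (col);
```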

13.4.4. Increase maintenance_work_mem


Temporarily increasing the maintenance_work_mem configuration variable when loading large amounts
of data can lead to improved performance. This is because when a B-tree index is created from scratch,
the existing content of the table needs to be sorted. Allowing the merge sort to use more memory means
that fewer merge passes will be required. A larger setting for maintenance_work_mem may also speed
up validation of foreign-key constraints.
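For example, the setting can be raised for just the session doing the load; the integer value is measured
in kilobytes, and 262144 (256 MB) here is purely illustrative:

```sql
-- Applies only to this session; other backends keep the server default.
SET maintenance_work_mem = 262144;
CREATE INDEX mytable_col_idx ON mytable (col);
```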


13.4.5. Increase checkpoint_segments


Temporarily increasing the checkpoint_segments configuration variable can also make large data loads
faster. This is because loading a large amount of data into PostgreSQL can cause checkpoints to occur
more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration
variable). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. By increasing
checkpoint_segments temporarily during bulk data loads, the number of checkpoints that are required
can be reduced.
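In postgresql.conf this might look like the following (the value 30 is illustrative; the server must be
restarted or signaled to reload for it to take effect, and the extra WAL segments consume disk space):

```
checkpoint_segments = 30        # illustrative value for a bulk load; raise from the small default
```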

13.4.6. Run ANALYZE Afterwards


Whenever you have significantly altered the distribution of data within a table, running ANALYZE is
strongly recommended. This includes bulk loading large amounts of data into the table. Running ANALYZE
(or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. With no statis-
tics or obsolete statistics, the planner may make poor decisions during query planning, leading to poor
performance on any tables with inaccurate or nonexistent statistics.
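For example, after a bulk load (table name and path hypothetical):

```sql
COPY mytable FROM '/path/to/data.txt';
-- Refresh the planner statistics for just this table:
ANALYZE mytable;
```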

III. Server Administration
This part covers topics that are of interest to a PostgreSQL database administrator. This includes
installation of the software, set up and configuration of the server, management of users and databases,
and maintenance tasks. Anyone who runs a PostgreSQL server, even for personal use, but especially in
production, should be familiar with the topics covered in this part.
The information in this part is arranged approximately in the order in which a new user should read it.
But the chapters are self-contained and can be read individually as desired. The information in this part is
presented in a narrative fashion in topical units. Readers looking for a complete description of a particular
command should look into Part VI.
The first few chapters are written so that they can be understood without prerequisite knowledge, so that
new users who need to set up their own server can begin their exploration with this part. The rest of this
part is about tuning and management; that material assumes that the reader is familiar with the general
use of the PostgreSQL database system. Readers are encouraged to look at Part I and Part II for additional
information.
Chapter 14. Installation Instructions
This chapter describes the installation of PostgreSQL from the source code distribution. (If you are
installing a pre-packaged distribution, such as an RPM or Debian package, ignore this chapter and read
the packager’s instructions instead.)

14.1. Short Version

./configure
gmake
su
gmake install
adduser postgres
mkdir /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
/usr/local/pgsql/bin/createdb test
/usr/local/pgsql/bin/psql test

The long version is the rest of this chapter.

14.2. Requirements
In general, a modern Unix-compatible platform should be able to run PostgreSQL. The platforms that had
received specific testing at the time of release are listed in Section 14.7 below. In the doc subdirectory of
the distribution there are several platform-specific FAQ documents you might wish to consult if you are
having trouble.
The following software packages are required for building PostgreSQL:

• GNU make is required; other make programs will not work. GNU make is often installed under the
name gmake; this document will always refer to it by that name. (On some systems GNU make is the
default tool with the name make.) To test for GNU make enter
gmake --version

It is recommended to use version 3.76.1 or later.


• You need an ISO/ANSI C compiler. Recent versions of GCC are recommendable, but PostgreSQL is
known to build with a wide variety of compilers from different vendors.
• gzip is needed to unpack the distribution in the first place.
• The GNU Readline library (for comfortable line editing and command history retrieval) will be used
by default. If you don’t want to use it then you must specify the --without-readline option for
configure. (On NetBSD, the libedit library is Readline-compatible and is used if libreadline
is not found.) If you are using a package-based Linux distribution, be aware that you need both the
readline and readline-devel packages, if those are separate in your distribution.


• Additional software is needed to build PostgreSQL on Windows. You can build PostgreSQL for NT-
based versions of Windows (like Windows XP and 2003) using MinGW; see doc/FAQ_MINGW for
details. You can also build PostgreSQL using Cygwin; see doc/FAQ_CYGWIN. A Cygwin-based build
will work on older versions of Windows, but if you have a choice, we recommend the MinGW approach.
While these are the only tool sets recommended for a complete build, it is possible to build just the C
client library (libpq) and the interactive terminal (psql) using other Windows tool sets. For details of
that see Chapter 15.

The following packages are optional. They are not required in the default configuration, but they are
needed when certain build options are enabled, as explained below.

• To build the server programming language PL/Perl you need a full Perl installation, including the
libperl library and the header files. Since PL/Perl will be a shared library, the libperl library
must be a shared library also on most platforms. This appears to be the default in recent Perl versions,
but it was not in earlier versions, and in any case it is the choice of whomever installed Perl at your site.
If you don’t have the shared library but you need one, a message like this will appear during the build
to point out this fact:
*** Cannot build PL/Perl because libperl is not a shared library.
*** You might have to rebuild your Perl installation. Refer to
*** the documentation for details.

(If you don’t follow the on-screen output you will merely notice that the PL/Perl library object,
plperl.so or similar, will not be installed.) If you see this, you will have to rebuild and install Perl
manually to be able to build PL/Perl. During the configuration process for Perl, request a shared
library.

• To build the PL/Python server programming language, you need a Python installation with the header
files and the distutils module. The distutils module is included by default with Python 1.6 and later;
users of earlier versions of Python will need to install it.
Since PL/Python will be a shared library, the libpython library must be a shared library also on most
platforms. This is not the case in a default Python installation. If after building and installing you have
a file called plpython.so (possibly a different extension), then everything went well. Otherwise you
should have seen a notice like this flying by:
*** Cannot build PL/Python because libpython is not a shared library.
*** You might have to rebuild your Python installation. Refer to
*** the documentation for details.

That means you have to rebuild (part of) your Python installation to supply this shared library.
If you have problems, run Python 2.3 or later’s configure using the --enable-shared flag. On some
operating systems you don’t have to build a shared library, but you will have to convince the PostgreSQL
build system of this. Consult the Makefile in the src/pl/plpython directory for details.

• If you want to build the PL/Tcl procedural language, you of course need a Tcl installation.
• To enable Native Language Support (NLS), that is, the ability to display a program’s messages in a
language other than English, you need an implementation of the Gettext API. Some operating systems


have this built-in (e.g., Linux, NetBSD, Solaris); for other systems you can download an add-on package
from here: http://developer.postgresql.org/~petere/bsd-gettext/. If you are using the Gettext
implementation in the GNU C library then you will additionally need the GNU Gettext package for some utility
programs. For any of the other implementations you will not need it.
• Kerberos, OpenSSL, and/or PAM, if you want to support authentication or encryption using these ser-
vices.

If you are building from a CVS tree instead of using a released source package, or if you want to do
development, you also need the following packages:

• GNU Flex and Bison are needed to build a CVS checkout or if you changed the actual scanner and
parser definition files. If you need them, be sure to get Flex 2.5.4 or later and Bison 1.875 or later. Other
yacc programs can sometimes be used, but doing so requires extra effort and is not recommended. Other
lex programs will definitely not work.

If you need to get a GNU package, you can find it at your local GNU mirror site (see
http://www.gnu.org/order/ftp.html for a list) or at ftp://ftp.gnu.org/gnu/.
Also check that you have sufficient disk space. You will need about 65 MB for the source tree during
compilation and about 15 MB for the installation directory. An empty database cluster takes about 25
MB; databases take about five times the amount of space that a flat text file with the same data would take.
If you are going to run the regression tests you will temporarily need up to an extra 90 MB. Use the df
command to check free disk space.

14.3. Getting The Source


The PostgreSQL 8.0.0 sources can be obtained by anonymous FTP from
ftp://ftp.postgresql.org/pub/source/v8.0.0/postgresql-8.0.0.tar.gz. Use a mirror if possible. After you have
obtained the file, unpack it:

gunzip postgresql-8.0.0.tar.gz
tar xf postgresql-8.0.0.tar

This will create a directory postgresql-8.0.0 under the current directory with the PostgreSQL sources.
Change into that directory for the rest of the installation procedure.

14.4. If You Are Upgrading


The internal data storage format changes with new releases of PostgreSQL. Therefore, if you are
upgrading an existing installation that does not have a version number “8.0.x”, you must back up and
restore your data as shown here. These instructions assume that your existing installation is under the
/usr/local/pgsql directory, and that the data area is in /usr/local/pgsql/data. Substitute your
paths appropriately.


1. Make sure that your database is not updated during or after the backup. This does not affect the
integrity of the backup, but the changed data would of course not be included. If necessary, edit the
permissions in the file /usr/local/pgsql/data/pg_hba.conf (or equivalent) to disallow access
from everyone except you.
2. To back up your database installation, type:
pg_dumpall > outputfile

If you need to preserve OIDs (such as when using them as foreign keys), then use the -o option when
running pg_dumpall.
pg_dumpall does not save large objects. Check Section 22.1.4 if you need to do this.
To make the backup, you can use the pg_dumpall command from the version you are currently
running. For best results, however, try to use the pg_dumpall command from PostgreSQL 8.0.0, since
this version contains bug fixes and improvements over older versions. While this advice might seem
idiosyncratic since you haven’t installed the new version yet, it is advisable to follow it if you plan to
install the new version in parallel with the old version. In that case you can complete the installation
normally and transfer the data later. This will also decrease the downtime.
3. If you are installing the new version at the same location as the old one then shut down the old server,
at the latest before you install the new files:
pg_ctl stop

On systems that have PostgreSQL started at boot time, there is probably a start-up file that will
accomplish the same thing. For example, on a Red Hat Linux system one might find that
/etc/rc.d/init.d/postgresql stop

works.
Very old versions might not have pg_ctl. If you can’t find it or it doesn’t work, find out the process
ID of the old server, for example by typing
ps ax | grep postmaster

and signal it to stop this way:


kill -INT processID

4. If you are installing in the same place as the old version then it is also a good idea to move the old
installation out of the way, in case you have trouble and need to revert to it. Use a command like this:
mv /usr/local/pgsql /usr/local/pgsql.old

After you have installed PostgreSQL 8.0.0, create a new database directory and start the new server.
Remember that you must execute these commands while logged in to the special database user account
(which you already have if you are upgrading).

/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data

Finally, restore your data with

/usr/local/pgsql/bin/psql -d template1 -f outputfile


using the new psql.


Further discussion appears in Section 22.4, which you are encouraged to read in any case.

14.5. Installation Procedure

1. Configuration
The first step of the installation procedure is to configure the source tree for your system and choose
the options you would like. This is done by running the configure script. For a default installation
simply enter
./configure

This script will run a number of tests to guess values for various system dependent variables and
detect some quirks of your operating system, and finally will create several files in the build tree to
record what it found. (You can also run configure in a directory outside the source tree if you want
to keep the build directory separate.)
The default configuration will build the server and utilities, as well as all client applications and
interfaces that require only a C compiler. All files will be installed under /usr/local/pgsql by
default.
You can customize the build and installation process by supplying one or more of the following
command line options to configure:

--prefix=PREFIX

Install all files under the directory PREFIX instead of /usr/local/pgsql. The actual files will
be installed into various subdirectories; no files will ever be installed directly into the PREFIX
directory.
If you have special needs, you can also customize the individual subdirectories with the following
options. However, if you leave these with their defaults, the installation will be relocatable,
meaning you can move the directory after installation. (The man and doc locations are not affected
by this.)
For relocatable installs, you might want to use configure’s --disable-rpath option. Also,
you will need to tell the operating system how to find the shared libraries.
--exec-prefix=EXEC-PREFIX

You can install architecture-dependent files under a different prefix, EXEC-PREFIX, than what
PREFIX was set to. This can be useful to share architecture-independent files between hosts.
If you omit this, then EXEC-PREFIX is set equal to PREFIX and both architecture-dependent
and independent files will be installed under the same tree, which is probably what you want.
--bindir=DIRECTORY

Specifies the directory for executable programs. The default is EXEC-PREFIX/bin, which
normally means /usr/local/pgsql/bin.
--datadir=DIRECTORY

Sets the directory for read-only data files used by the installed programs. The default is
PREFIX/share. Note that this has nothing to do with where your database files will be placed.


--sysconfdir=DIRECTORY

The directory for various configuration files, PREFIX/etc by default.


--libdir=DIRECTORY

The location to install libraries and dynamically loadable modules. The default is
EXEC-PREFIX/lib.

--includedir=DIRECTORY

The directory for installing C and C++ header files. The default is PREFIX/include.
--mandir=DIRECTORY

The man pages that come with PostgreSQL will be installed under this directory, in their
respective manx subdirectories. The default is PREFIX/man.
--with-docdir=DIRECTORY
--without-docdir

Documentation files, except “man” pages, will be installed into this directory. The default is
PREFIX/doc. If the option --without-docdir is specified, the documentation will not be
installed by make install. This is intended for packaging scripts that have special methods
for installing documentation.

Note: Care has been taken to make it possible to install PostgreSQL into shared installation
locations (such as /usr/local/include) without interfering with the namespace of the rest of
the system. First, the string “/postgresql” is automatically appended to datadir, sysconfdir,
and docdir, unless the fully expanded directory name already contains the string “postgres”
or “pgsql”. For example, if you choose /usr/local as prefix, the documentation will be
installed in /usr/local/doc/postgresql, but if the prefix is /opt/postgres, then it will be
in /opt/postgres/doc. The public C header files of the client interfaces are installed into
includedir and are namespace-clean. The internal header files and the server header files are
installed into private directories under includedir. See the documentation of each interface for
information about how to get at its header files. Finally, a private subdirectory will also be
created, if appropriate, under libdir for dynamically loadable modules.

--with-includes=DIRECTORIES

DIRECTORIES is a colon-separated list of directories that will be added to the list the
compiler searches for header files. If you have optional packages (such as GNU Readline) installed
in a non-standard location, you have to use this option and probably also the corresponding
--with-libraries option.
Example: --with-includes=/opt/gnu/include:/usr/sup/include.
--with-libraries=DIRECTORIES

DIRECTORIES is a colon-separated list of directories to search for libraries. You will probably
have to use this option (and the corresponding --with-includes option) if you have packages
installed in non-standard locations.


Example: --with-libraries=/opt/gnu/lib:/usr/sup/lib.
--enable-nls[=LANGUAGES]

Enables Native Language Support (NLS), that is, the ability to display a program’s messages in
a language other than English. LANGUAGES is a space-separated list of codes of the languages
that you want supported, for example --enable-nls='de fr'. (The intersection between
your list and the set of actually provided translations will be computed automatically.) If you do
not specify a list, then all available translations are installed.
To use this option, you will need an implementation of the Gettext API; see above.
--with-pgport=NUMBER

Set NUMBER as the default port number for server and clients. The default is 5432. The port can
always be changed later on, but if you specify it here then both server and clients will have the
same default compiled in, which can be very convenient. Usually the only good reason to select
a non-default value is if you intend to run multiple PostgreSQL servers on the same machine.
--with-perl

Build the PL/Perl server-side language.


--with-python

Build the PL/Python server-side language.


--with-tcl

Build the PL/Tcl server-side language.


--with-tclconfig=DIRECTORY

Tcl installs the file tclConfig.sh, which contains configuration information needed to build
modules interfacing to Tcl. This file is normally found automatically at a well-known location,
but if you want to use a different version of Tcl you can specify the directory in which to look
for it.
--with-krb4
--with-krb5

Build with support for Kerberos authentication. You can use either Kerberos version 4 or 5, but
not both. On many systems, the Kerberos system is not installed in a location that is searched by
default (e.g., /usr/include, /usr/lib), so you must use the options --with-includes and
--with-libraries in addition to this option. configure will check for the required header
files and libraries to make sure that your Kerberos installation is sufficient before proceeding.
--with-krb-srvnam=NAME

The name of the Kerberos service principal. postgres is the default. There’s probably no reason
to change this.
--with-openssl

Build with support for SSL (encrypted) connections. This requires the OpenSSL package to be
installed. configure will check for the required header files and libraries to make sure that
your OpenSSL installation is sufficient before proceeding.
--with-pam

Build with PAM (Pluggable Authentication Modules) support.


--without-readline

Prevents use of the Readline library. This disables command-line editing and history in psql, so
it is not recommended.
--with-rendezvous

Build with Rendezvous support. This requires Rendezvous support in your operating system.
Recommended on Mac OS X.
--disable-spinlocks

Allow the build to succeed even if PostgreSQL has no CPU spinlock support for the platform.
The lack of spinlock support will result in poor performance; therefore, this option should only
be used if the build aborts and informs you that the platform lacks spinlock support. If this option
is required to build PostgreSQL on your platform, please report the problem to the PostgreSQL
developers.
--enable-thread-safety

Make the client libraries thread-safe. This allows concurrent threads in libpq and ECPG
programs to safely control their private connection handles. This option requires adequate threading
support in your operating system.
--without-zlib

Prevents use of the Zlib library. This disables support for compressed archives in pg_dump and
pg_restore. This option is only intended for those rare systems where this library is not available.
--enable-debug

Compiles all programs and libraries with debugging symbols. This means that you can run the
programs through a debugger to analyze problems. This enlarges the size of the installed exe-
cutables considerably, and on non-GCC compilers it usually also disables compiler optimization,
causing slowdowns. However, having the symbols available is extremely helpful for dealing with
any problems that may arise. Currently, this option is recommended for production installations
only if you use GCC. But you should always have it on if you are doing development work or
running a beta version.
--enable-cassert

Enables assertion checks in the server, which test for many “can’t happen” conditions. This is
invaluable for code development purposes, but the tests slow things down a little. Also, having
the tests turned on won’t necessarily enhance the stability of your server! The assertion checks
are not categorized for severity, and so what might be a relatively harmless bug will still lead
to server restarts if it triggers an assertion failure. Currently, this option is not recommended for
production use, but you should have it on for development work or when running a beta version.
--enable-depend

Enables automatic dependency tracking. With this option, the makefiles are set up so that all
affected object files will be rebuilt when any header file is changed. This is useful if you are
doing development work, but is just wasted overhead if you intend only to compile once and
install. At present, this option will work only if you use GCC.
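As an illustration, several of the options described above can be combined in a single configure invocation. The installation prefix and the extra include/library paths below are hypothetical; substitute whatever applies to your site:

```shell
# Hypothetical example combining options described above.
./configure --prefix=/opt/postgresql \
            --with-openssl \
            --with-pam \
            --enable-thread-safety \
            --with-includes=/opt/openssl/include \
            --with-libraries=/opt/openssl/lib
```

As noted above, configure checks for the required header files and libraries of each selected option before proceeding, so a missing dependency is reported at configure time rather than during the build.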


If you prefer a C compiler different from the one configure picks, you can set the environment
variable CC to the program of your choice. By default, configure will pick gcc if available, else the
platform’s default (usually cc). Similarly, you can override the default compiler flags if needed with
the CFLAGS variable.
You can specify environment variables on the configure command line, for example:
./configure CC=/opt/bin/gcc CFLAGS='-O2 -pipe'

2. Build
To start the build, type
gmake

(Remember to use GNU make.) The build may take anywhere from 5 minutes to half an hour de-
pending on your hardware. The last line displayed should be
All of PostgreSQL is successfully made. Ready to install.

3. Regression Tests
If you want to test the newly built server before you install it, you can run the regression tests at this
point. The regression tests are a test suite to verify that PostgreSQL runs on your machine in the way
the developers expected it to. Type
gmake check

(This won’t work as root; do it as an unprivileged user.) Chapter 26 contains detailed information
about interpreting the test results. You can repeat this test at any later time by issuing the same
command.
4. Installing The Files

Note: If you are upgrading an existing system and are going to install the new files over the old
ones, be sure to back up your data and shut down the old server before proceeding, as explained
in Section 14.4 above.

To install PostgreSQL enter

gmake install

This will install files into the directories that were specified in step 1. Make sure that you have appro-
priate permissions to write into that area. Normally you need to do this step as root. Alternatively, you
could create the target directories in advance and arrange for appropriate permissions to be granted.
You can use gmake install-strip instead of gmake install to strip the executable files and
libraries as they are installed. This will save some space. If you built with debugging support, stripping
will effectively remove the debugging support, so it should only be done if debugging is no longer
needed. install-strip tries to do a reasonable job saving space, but it does not have perfect
knowledge of how to strip every unneeded byte from an executable file, so if you want to save all the
disk space you possibly can, you will have to do manual work.


The standard installation provides all the header files needed for client application development as
well as for server-side program development, such as custom functions or data types written in C.
(Prior to PostgreSQL 8.0, a separate gmake install-all-headers command was needed for the
latter, but this step has been folded into the standard install.)
Client-only installation: If you want to install only the client applications and interface libraries,
then you can use these commands:
gmake -C src/bin install
gmake -C src/include install
gmake -C src/interfaces install
gmake -C doc install

Registering eventlog on Windows: To register a Windows eventlog library with the operating system,
issue this command after installation:

regsvr32 pgsql_library_directory/pgevent.dll

This creates registry entries used by the event viewer.


Uninstallation: To undo the installation use the command gmake uninstall. However, this will not
remove any created directories.
Cleaning: After the installation you can make room by removing the built files from the source tree with
the command gmake clean. This will preserve the files made by the configure program, so that you
can rebuild everything with gmake later on. To reset the source tree to the state in which it was distributed,
use gmake distclean. If you are going to build for several platforms within the same source tree you
must do this and re-configure for each build. (Alternatively, use a separate build tree for each platform, so
that the source tree remains unmodified.)
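The separate-build-tree alternative mentioned above can be sketched as follows. The directory names are hypothetical; the point is that configure is invoked from an empty build directory outside the source tree, so the source tree itself is never modified and several platforms can be built from the same sources:

```shell
# Hypothetical paths; requires GNU make.
mkdir /usr/local/src/pg-build
cd /usr/local/src/pg-build
/usr/local/src/postgresql-8.0.0/configure --prefix=/usr/local/pgsql
gmake
gmake install
```

All build products are written under the build directory, so cleaning up is a matter of removing that directory.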
If you perform a build and then discover that your configure options were wrong, or if you change
anything that configure investigates (for example, software upgrades), then it’s a good idea to do gmake
distclean before reconfiguring and rebuilding. Without this, your changes in configuration choices may
not propagate everywhere they need to.

14.6. Post-Installation Setup

14.6.1. Shared Libraries


On some systems that have shared libraries (which most systems do) you need to tell your system how
to find the newly installed shared libraries. The systems on which this is not necessary include BSD/OS,
FreeBSD, HP-UX, IRIX, Linux, NetBSD, OpenBSD, Tru64 UNIX (formerly Digital UNIX), and Solaris.
The method to set the shared library search path varies between platforms, but the most widely usable
method is to set the environment variable LD_LIBRARY_PATH like so: In Bourne shells (sh, ksh, bash,
zsh)

LD_LIBRARY_PATH=/usr/local/pgsql/lib
export LD_LIBRARY_PATH

or in csh or tcsh


setenv LD_LIBRARY_PATH /usr/local/pgsql/lib

Replace /usr/local/pgsql/lib with whatever you set --libdir to in step 1. You should put these
commands into a shell start-up file such as /etc/profile or ~/.bash_profile. Some good informa-
tion about the caveats associated with this method can be found at http://www.visi.com/~barr/ldpath.html.
On some systems it might be preferable to set the environment variable LD_RUN_PATH before building.
On Cygwin, put the library directory in the PATH or move the .dll files into the bin directory.
If in doubt, refer to the manual pages of your system (perhaps ld.so or rld). If you later on get a message
like

psql: error in loading shared libraries
libpq.so.2.1: cannot open shared object file: No such file or directory

then this step was necessary. Simply take care of it then.


If you are on BSD/OS, Linux, or SunOS 4 and you have root access you can run

/sbin/ldconfig /usr/local/pgsql/lib

(or equivalent directory) after installation to enable the run-time linker to find the shared libraries faster.
Refer to the manual page of ldconfig for more information. On FreeBSD, NetBSD, and OpenBSD the
command is

/sbin/ldconfig -m /usr/local/pgsql/lib

instead. Other systems are not known to have an equivalent command.

14.6.2. Environment Variables


If you installed into /usr/local/pgsql or some other location that is not searched for programs by
default, you should add /usr/local/pgsql/bin (or whatever you set --bindir to in step 1) into your
PATH. Strictly speaking, this is not necessary, but it will make the use of PostgreSQL much more conve-
nient.
To do this, add the following to your shell start-up file, such as ~/.bash_profile (or /etc/profile,
if you want it to affect every user):

PATH=/usr/local/pgsql/bin:$PATH
export PATH

If you are using csh or tcsh, then use this command:

set path = ( /usr/local/pgsql/bin $path )

To enable your system to find the man documentation, you need to add lines like the following to a shell
start-up file unless you installed into a location that is searched by default.

MANPATH=/usr/local/pgsql/man:$MANPATH
export MANPATH


The environment variables PGHOST and PGPORT specify to client applications the host and port of the
database server, overriding the compiled-in defaults. If you are going to run client applications remotely
then it is convenient if every user that plans to use the database sets PGHOST. This is not required, however:
the settings can be communicated via command line options to most client programs.
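For example, a user whose server runs on a hypothetical remote host db.example.com, listening on the non-default port 5433, might put the following in a shell start-up file:

```shell
# Hypothetical host and port; adjust for your installation.
PGHOST=db.example.com
PGPORT=5433
export PGHOST PGPORT
```

Thereafter, client programs such as psql will connect to that host and port unless told otherwise on the command line.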

14.7. Supported Platforms


PostgreSQL has been verified by the developer community to work on the platforms listed below. A
supported platform generally means that PostgreSQL builds and installs according to these instructions
and that the regression tests pass. “Build farm” entries refer to builds reported by the PostgreSQL Build
Farm [6]. Platform entries that show an older version of PostgreSQL are those that did not receive explicit
testing at the time of release of version 8.0 but that we still expect to work.

Note: If you are having problems with the installation on a supported platform, please write to
<[email protected]> or <[email protected]>, not to the people listed here.

OS | Processor | Version | Reported | Remarks
AIX | PowerPC | 8.0.0 | Travis P (<[email protected]>), 2004-12-12 | see also doc/FAQ_AIX
AIX | RS6000 | 8.0.0 | Hans-Jürgen Schönig (<[email protected]>), 2004-12-06 | see also doc/FAQ_AIX
BSD/OS | x86 | 8.0.0 | Bruce Momjian (<[email protected]>), 2004-12-07 | 4.3.1
Debian GNU/Linux | Alpha | 7.4 | Noèl Köthe (<[email protected]>), 2003-10-25 |
Debian GNU/Linux | AMD64 | 8.0.0 | Build farm panda, snapshot 2004-12-06 01:20:02 | sid, kernel 2.6
Debian GNU/Linux | ARM | 8.0.0 | Jim Buttafuoco (<[email protected]>), 2005-01-06 |
Debian GNU/Linux | IA64 | 7.4 | Noèl Köthe (<[email protected]>), 2003-10-25 |
Debian GNU/Linux | m68k | 8.0.0 | Noèl Köthe (<[email protected]>), 2004-12-09 | sid
Debian GNU/Linux | MIPS | 8.0.0 | Build farm lionfish, snapshot 2004-12-06 11:00:08 | 3.1 (sarge), kernel 2.4
Debian GNU/Linux | PA-RISC | 8.0.0 | Noèl Köthe (<[email protected]>), 2004-12-07 | sid
Debian GNU/Linux | PowerPC | 8.0.0 | Noèl Köthe (<[email protected]>), 2004-12-15 | sid
Debian GNU/Linux | S/390 | 7.4 | Noèl Köthe (<[email protected]>), 2003-10-25 |
Debian GNU/Linux | Sparc | 8.0.0 | Noèl Köthe (<[email protected]>), 2004-12-09 | sid, 32-bit
Debian GNU/Linux | x86 | 8.0.0 | Peter Eisentraut (<[email protected]>), 2004-12-06 | 3.1 (sarge), kernel
Fedora | AMD64 | 8.0.0 | John Gray (<[email protected]>), 2004-12-12 | FC3
Fedora | x86 | 8.0.0 | Build farm dog, snapshot 2004-12-06 02:06:01 | FC1
FreeBSD | Alpha | 7.4 | Peter Eisentraut (<[email protected]>), 2003-10-25 | 4.8
FreeBSD | x86 | 8.0.0 | Build farm cockatoo, snapshot 2004-12-06 14:10:01 (4.10); Marc Fournier (<[email protected]>), 2004-12-07 (5.3) |
Gentoo Linux | AMD64 | 8.0.0 | Jani Averbach (<[email protected]>), 2005-01-13 |
Gentoo Linux | x86 | 8.0.0 | Paul Bort (<[email protected]>), 2004-12-07 |
HP-UX | IA64 | 8.0.0 | Tom Lane (<[email protected]>), 2005-01-06 | 11.23, gcc and cc; see also doc/FAQ_HPUX
HP-UX | PA-RISC | 8.0.0 | Tom Lane (<[email protected]>), 2005-01-06 | 10.20 and 11.11, gcc and cc; see also doc/FAQ_HPUX
IRIX | MIPS | 7.4 | Robert E. Bruccoleri (<[email protected]>), 2003-11-12 | 6.5.20, cc only
Mac OS X | PowerPC | 8.0.0 | Andrew Rawnsley (<[email protected]>), 2004-12-07 | 10.3.5
Mandrakelinux | x86 | 8.0.0 | Build farm shrew, snapshot 2004-12-06 02:02:01 | 10.0
NetBSD | arm32 | 7.4 | Patrick Welche (<[email protected]>), 2003-11-12 | 1.6ZE/acorn32
NetBSD | m68k | 8.0.0 | Rémi Zara (<[email protected]>), 2004-12-14 | 2.0
NetBSD | Sparc | 7.4.1 | Peter Eisentraut (<[email protected]>), 2003-11-26 | 1.6.1, 32-bit
NetBSD | x86 | 8.0.0 | Build farm canary, snapshot 2004-12-06 03:30:00 | 1.6
OpenBSD | Sparc | 8.0.0 | Chris Mair (<[email protected]>), 2005-01-10 | 3.3
OpenBSD | Sparc64 | 8.0.0 | Build farm spoonbill, snapshot 2005-01-06 00:50:05 | 3.6
OpenBSD | x86 | 8.0.0 | Build farm emu, snapshot 2004-12-06 11:35:03 | 3.6
Red Hat Linux | AMD64 | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | IA64 | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | PowerPC | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | PowerPC 64 | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | S/390 | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | S/390x | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Red Hat Linux | x86 | 8.0.0 | Tom Lane (<[email protected]>), 2004-12-07 | RHEL 3AS
Solaris | Sparc | 8.0.0 | Kenneth Marshall (<[email protected]>), 2004-12-07 | Solaris 8; see also doc/FAQ_Solaris
Solaris | x86 | 8.0.0 | Build farm kudu, snapshot 2004-12-10 02:30:04 (cc); dragonfly, snapshot 2004-12-09 04:30:00 (gcc) | Solaris 9; see also doc/FAQ_Solaris
SUSE Linux | AMD64 | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | 9.0, 9.1, 9.2, SLES 9
SUSE Linux | IA64 | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | SLES 9
SUSE Linux | PowerPC | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | SLES 9
SUSE Linux | PowerPC 64 | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | SLES 9
SUSE Linux | S/390 | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | SLES 9
SUSE Linux | S/390x | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | SLES 9
SUSE Linux | x86 | 8.0.0 | Reinhard Max (<[email protected]>), 2005-01-03 | 9.0, 9.1, 9.2, SLES 9
Tru64 UNIX | Alpha | 8.0.0 | Honda Shigehiro (<[email protected]>), 2005-01-07 | 5.0
UnixWare | x86 | 8.0.0 | Peter Eisentraut (<[email protected]>), 2004-12-14 | cc, 7.1.4; see also doc/FAQ_SCO
Windows | x86 | 8.0.0 | Dave Page (<[email protected]>), 2004-12-07 | XP Pro; see doc/FAQ_MINGW
Windows with Cygwin | x86 | 8.0.0 | Build farm gibbon, snapshot 2004-12-11 01:33:01 | see doc/FAQ_CYGWIN

[6] http://www.pgbuildfarm.org/

Unsupported Platforms: The following platforms are either known not to work, or they used to work
in a fairly distant previous release. We include these here to let you know that these platforms could be
supported if given some attention.

OS | Processor | Version | Reported | Remarks
BeOS | x86 | 7.2 | Cyril Velter (<[email protected]>), 2001-11-29 | needs updates to semaphore code
Linux | PlayStation 2 | 8.0.0 | Chris Mair (<[email protected]>), 2005-01-09 | requires --disable-spinlocks (works, but slow)
NetBSD | Alpha | 7.2 | Thomas Thai (<[email protected]>), 2001-11-20 | 1.5W
NetBSD | MIPS | 7.2.1 | Warwick Hunter (<[email protected]>), 2002-06-13 | 1.5.3
NetBSD | PowerPC | 7.2 | Bill Studenmund (<[email protected]>), 2001-11-28 | 1.5
NetBSD | VAX | 7.1 | Tom I. Helbekkmo (<[email protected]>), 2001-03-30 | 1.5
QNX 4 RTOS | x86 | 7.2 | Bernd Tegge (<[email protected]>), 2001-12-10 | needs updates to semaphore code; see also doc/FAQ_QNX4
QNX RTOS v6 | x86 | 7.2 | Igor Kovalenko (<[email protected]>), 2001-11-20 | patches available in archives, but too late for 7.2
SCO OpenServer | x86 | 7.3.1 | Shibashish Satpathy (<[email protected]>), 2002-12-11 | 5.0.4, gcc; see also doc/FAQ_SCO
SunOS 4 | Sparc | 7.2 | Tatsuo Ishii (<[email protected]>), 2001-12-04 |

Chapter 15. Client-Only Installation on
Windows
Although a complete PostgreSQL installation for Windows can only be built using MinGW or Cygwin,
the C client library (libpq) and the interactive terminal (psql) can be compiled using other Windows tool
sets. Makefiles are included in the source distribution for Microsoft Visual C++ and Borland C++. It
should be possible to compile the libraries manually for other configurations.

Tip: Using MinGW or Cygwin is preferred. If using one of those tool sets, see Chapter 14.

To build everything that you can on Windows using Microsoft Visual C++, change into the src directory
and type the command

nmake /f win32.mak

This assumes that you have Visual C++ in your path.


To build everything using Borland C++, change into the src directory and type the command

make -DCFG=Release /f bcc32.mak

The following files will be built:

interfaces\libpq\Release\libpq.dll

The dynamically linkable frontend library


interfaces\libpq\Release\libpqdll.lib

Import library to link your programs to libpq.dll


interfaces\libpq\Release\libpq.lib

Static version of the frontend library


bin\psql\Release\psql.exe

The PostgreSQL interactive terminal

The only file that really needs to be installed is the libpq.dll library. This file should in most cases
be placed in the WINNT\SYSTEM32 directory (or in WINDOWS\SYSTEM on a Windows 95/98/ME sys-
tem). If this file is installed using a setup program, it should be installed with version checking using the
VERSIONINFO resource included in the file, to ensure that a newer version of the library is not overwritten.

If you plan to do development using libpq on this machine, you will have to add the src\include and
src\interfaces\libpq subdirectories of the source tree to the include path in your compiler’s settings.

To use the library, you must add the libpqdll.lib file to your project. (In Visual C++, just right-click
on the project and choose to add it.)
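As a sketch of what such development looks like, the following minimal program connects to a server and reports success or failure. It uses only documented libpq calls (PQconnectdb, PQstatus, PQerrorMessage, PQfinish); the connection string is a placeholder, and a running server is assumed:

```c
#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

int
main(void)
{
    /* Placeholder connection parameters; adjust host and dbname as needed. */
    PGconn *conn = PQconnectdb("host=localhost dbname=template1");

    /* PQconnectdb always returns a PGconn object; check its status. */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }

    printf("Connection established.\n");
    PQfinish(conn);
    return EXIT_SUCCESS;
}
```

With Visual C++ this would be compiled with the include path pointing at src\include and src\interfaces\libpq and linked against libpqdll.lib, as described above.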

Chapter 16. Server Run-time Environment
This chapter discusses how to set up and run the database server and its interactions with the operating
system.

16.1. The PostgreSQL User Account


As with any other server daemon that is accessible to the outside world, it is advisable to run PostgreSQL
under a separate user account. This user account should only own the data that is managed by the server,
and should not be shared with other daemons. (For example, using the user nobody is a bad idea.) It is
not advisable to install executables owned by this user because compromised systems could then modify
their own binaries.
To add a Unix user account to your system, look for a command useradd or adduser. The user name
postgres is often used, and is assumed throughout this book, but you can use another name if you like.

16.2. Creating a Database Cluster


Before you can do anything, you must initialize a database storage area on disk. We call this a database
cluster. (SQL uses the term catalog cluster.) A database cluster is a collection of databases that is managed
by a single instance of a running database server. After initialization, a database cluster will contain a
database named template1. As the name suggests, this will be used as a template for subsequently
created databases; it should not be used for actual work. (See Chapter 18 for information about creating
new databases within a cluster.)
In file system terms, a database cluster will be a single directory under which all data will be stored. We
call this the data directory or data area. It is completely up to you where you choose to store your data.
There is no default, although locations such as /usr/local/pgsql/data or /var/lib/pgsql/data
are popular. To initialize a database cluster, use the command initdb, which is installed with PostgreSQL.
The desired file system location of your database cluster is indicated by the -D option, for example

$ initdb -D /usr/local/pgsql/data

Note that you must execute this command while logged into the PostgreSQL user account, which is
described in the previous secti