PGSQL NoSQL
This course focuses primarily on teaching RDBMS concepts and SQL with PostgreSQL, and NoSQL concepts
and NoSQL practice with the MongoDB database.
The concepts and commands explained in this document conform to the concepts and commands
described in the official documentation of the PostgreSQL and MongoDB databases. All the examples are
tested and verified.
ABOUT TRAINER
Mohammed Abdul Sami is an experienced System Administrator & DBA with more than 25 years of
experience in managing database servers and Linux and Unix servers.
Since 2014 he has been mostly involved in training and is a frequent instructor at Oracle University for MySQL and
Solaris courses.
DISCLAIMER
This document is provided as reference material to the participants who have attended the SQL & NoSQL
training conducted by Mr. Mohammed Abdul Sami at Valtech (Ahmedabad Campus).
Unauthorized use, duplication or sale of this document or any of its contents without the author’s consent
is strictly prohibited.
This Document is provided for free to the intended audience only and is not for sale under any
circumstances.
All trademarks, logos, or brand names mentioned in the document are the property of their respective
owners.
BIBLIOGRAPHY
• PostgreSQL official documentation available here
• MongoDB official documentation available here
Contents
RDBMS
Lesson 1: Introduction
Lesson 2: Introduction to PostgreSQL
Lesson 3: PostgreSQL Clients
Lesson 4: Structured Query Language
Lesson 5: SELECT Statement
Lesson 6: Creating and Managing Databases
Lesson 7: Advance Querying
Lesson 8: Indexes & Views
Lesson 9: Transactions
Lesson 10: Copying and Moving Data
NoSQL Databases
Lesson 11: Introduction
Lesson 12: Introduction to MongoDB
Lesson 13: CRUD Operations
Lesson 14: Aggregation
Lesson 15: Indexes
Lesson 16: Exporting and Importing Data
RDBMS
(SQL Databases)
Lesson 1: Introduction
RDBMS
RDBMS stands for Relational Database Management System. The RDBMS model recommends that data be logically
organized in Relations (tables), Tuples (rows or records) and Attributes (fields or columns). The robustness and
power of the RDBMS model derive from the use of a simple tabular structure to represent the data. Data is
represented uniformly in a tabular structure because this makes it easy to organize and manipulate.
Most of RDBMS theory is based on mathematical set theory. It was first proposed in 1970 by Dr. Edgar Frank
Codd, a computer scientist working at IBM, and he refined the theory throughout the
1970s and 1980s. Theoretical development of the model continues to this day.
Relations (tables) in RDBMS theory correspond to mathematical sets, and tuples (rows) correspond to
members of a set. Many of the principles of mathematical set theory are applied in RDBMS.
RDBMS Terminology
Relation (Table)
In the relational database model, a table is a collection of data elements organised in terms of rows and
columns. A table is also considered a convenient representation of a relation, though a table can have
duplicate rows of data while a true relation cannot have duplicates.
A single entry in a table is called a Tuple, Record or Row. A tuple in a table represents a set of related
data.
A table consists of several records (rows); each record can be broken down into several smaller parts of
data known as Attributes.
Domain (Datatype)
When an attribute is defined in a relation (table), it is defined to hold only a certain type of values, which is
known as the Attribute Domain.
Keys
A relation key is an attribute which can uniquely identify a particular tuple (row) in a relation (table). It is
also called the Primary Key.
Foreign Key
An attribute which links two relations. For example, a course-id uniquely identifies each offered
course along with its details in the course table, and in the student table it records which student has
opted for which course.
Degree
The number of attributes (columns) in a relation is called its degree.
Cardinality
The number of tuples (rows) in a relation is called its cardinality.
Instance
The data stored in the database at a particular moment in time is called an instance of the database.
Schema
A relation schema describes the structure of the relation: the name of the relation (table name), its
attributes and their names and data types.
Codd's Rules
All information is explicitly and logically represented in exactly one way – by data values in tables.
If an item of data does not reside somewhere in a table in the database, then it does not exist.
Every item of data must be logically addressable through a combination of table name, primary key
value and column name.
The RDBMS handles records that have unknown or inapplicable values in a pre-defined fashion.
The RDBMS distinguishes between zeros, blanks and nulls in records and handles such values in a
consistent manner that produces correct answers, comparisons and calculations. Through the set of rules
for handling nulls, users can distinguish the results of queries that involve nulls, zeros and blanks.
The description of a database and of its contents is stored in database tables and can therefore be queried
online via the data manipulation language.
There must be a Data Dictionary within the RDBMS that is constructed of tables and/or views that can be
examined using SQL.
An RDBMS may support several languages, but at least one of them should allow the user to do all of the
following: data definition, view definition, data manipulation (interactively and by program), integrity
constraints, authorization, and transaction boundaries (begin, commit and rollback).
All views that can be updated in theory can also be updated by the system.
Data consistency is ensured since the changes made in the view are transmitted to the base table and
vice-versa.
This means that data can be retrieved from a relational database in sets constructed of data from multiple
rows and/or multiple tables. This rule states that insert, update, and delete operations should be
supported for any retrievable set rather than just for a single row in a single table.
The execution of ad hoc requests and application programs is not affected by changes in the physical data
access and storage methods.
Database administrators can make changes to the physical access and storage methods which improve
performance without requiring changes in the application programs or requests. Here the user specifies
what he wants and need not worry about how the data is obtained.
Logical changes in tables and views, such as adding/deleting columns or changing field lengths, need not
necessitate modifications in the programs or in the format of ad hoc requests.
The database can change and grow to reflect changes in reality without requiring user intervention or
changes in the applications. For example, adding an attribute or column to a base table should not disrupt
the programs or interactive commands that have no use for the new attribute.
Integrity constraints must be specified separately from application programs and stored in the catalog. It
must be possible to change such constraints as and when appropriate without unnecessarily affecting
existing applications.
No component of a primary key can have missing values – this is the basic rule of Entity Integrity.
For each distinct foreign key value there must exist a matching primary key value in the same domain.
Conformance to this rule ensures what is called Referential Integrity.
Application programs and ad hoc requests are not affected by changes in the distribution of physical data.
This improves system reliability, since application programs will work even if the programs and data are
moved to different sites.
If the RDBMS has a language that accesses information a record at a time, this language should not
be used to bypass the integrity constraints.
Normalization
Database Anomalies
Anomalies in database design are faults in organizing data which may lead to an inefficient way of managing
the database.
Update anomalies
If data items are scattered and are not linked to each other properly, it can lead to strange
situations. For example, when we try to update one data item that has copies scattered over several
places, a few instances get updated properly while others are left with old values. Such instances
leave the database in an inconsistent state.
Deletion anomalies
When we try to delete a record, parts of it may be left undeleted because, without our being aware of it,
the data is also saved somewhere else.
Insert anomalies
We cannot insert certain data unless some other, unrelated data is also present; for example, a new course
cannot be recorded until at least one student has enrolled in it.
Normalization
Normalization is a method to remove all these anomalies and bring the database to a consistent state.
The primary objective of normalization is that the data is structured in such a way that all the attributes are
grouped along with a primary key, which ensures unique identification of all tuples. To achieve this we may
have to decompose the data into many relations.
If a table contains non-atomic values in any row, it is said to be in UNF (unnormalized form). An atomic value
is something that cannot be further decomposed.
Students Data
Name | Address | Phone No | Course | Duration | Course Fee | Fee Paid | Final Marks | Division | Instructor
Niel | 10, Rd-5 | 875642, 987543 | MySQL | 40 | 25000 | 20000 | 97 | Distinction | Sami
Emma | 5, Rd-8 | 870987, 870543 | MySQL | 40 | 25000 | 25000 | 75 | Second | Sami
A relation is said to be in 1NF if every attribute contains no non-atomic values and each row can provide a
unique combination of values.
Name | Address | Phone No | Home Phone | Course | Duration | Course Fee | Fee Paid | Final Marks | Division | Instructor
Niel | 10, Rd-5 | 875642 | 987543 | MySQL | 40 | 25000 | 20000 | 97 | Distinction | Sami
Mia | 2nd Avenue | 982133 | | Postgresql | 32 | 30000 | 30000 | 89 | First | Sami
George | City Center | | | MySQL | 40 | 25000 | 25000 | 91 | First | Sami
Jane | 2nd Avenue | 984321 | | Postgresql | 32 | 30000 | 30000 | 93 | First | Sami
A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends on the
primary key of the relation.
Course Data
Course | Duration | Course Fee
Oracle | 80 | 50000
MySQL | 40 | 25000
Postgresql | 32 | 30000
Student Data
Name | Address | Phone No | Home Phone | Course | Fee Paid | Final Marks | Division | Instructor
A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency in that
relation.
StudentID | Name | Address | Phone No | Home Phone | CourseID | Fee Paid | Final Marks | Instructor
Division | Min Marks | Max Marks
Distinction | 95 | 100
First | 85 | 94
Second | 75 | 84
Passed | 65 | 74
Need Improvement | 0 | 64
StudentID | Name | Address | Phone No | Home Phone | CourseID | Fee Paid | Final Marks | Instructor
4th NF
InstructorID | StudentID
DB02 | 1551
DB01 | 1552
DB02 | 1553
DB01 | 1554
DB02 | 1555
DB02 | 1556
DB02 | 1557
Workbook
Exercise - 1
Normalize the health history data below up to 3rd NF.
PET ID | PET NAME | PET TYPE | PET AGE | OWNER | VISIT DATE | PROCEDURE
Exercise - 2
INVOICE
HILLTOP ANIMAL HOSPITAL DATE: JAN 13/2002
INVOICE # 987
MY CITY, ONTARIO
Z5Z 6G6
TOTAL 54.00
Introduction to PostgreSQL
PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version
4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES
pioneered many concepts that only became available in some commercial database systems much later.
PostgreSQL is an open-source descendant of this original Berkeley code. It supports a large part of the SQL
standard and offers many modern features:
o complex queries
o foreign keys
o triggers
o updatable views
o transactional integrity
o multiversion concurrency control (MVCC)
The object-relational database management system now known as PostgreSQL is derived from the POSTGRES
package written at the University of California at Berkeley. With over two decades of development
behind it, PostgreSQL is now the most advanced open-source database available anywhere.
Database Limits
Limit | Value
Partition Keys | 32
RDBMS Term | PostgreSQL Term
Row | Tuple
Column | Attribute
A client begins a new session by asking for and authenticating a connection to the server.
• The server forks a new system process to handle the connection and work session. The session’s
state is initialized per a combination of server-level, database-level, and user-level configuration
parameters.
• The client does as much work as it needs by executing one or more transactions. Examples
include:
o Execute reads and writes against relations (tables, views, etc.)
o Use the SET command to change the session or transaction state
o Prepare and execute prepared statements
• The session ends when the client disconnects.
• The server destroys the process that handled the session.
A database session consists of all the work done over a single connection's lifetime. Database sessions
are of variable length in time and consume a variable amount of resources on both the client and server.
pgAdmin 4
A central web-server-based pgAdmin 4 runs on a Linux machine, and the DBAs/developers
connect to it through a browser.
DBeaver
Nowadays a very popular GUI-based database client, capable of supporting almost all popular SQL and
NoSQL databases.
PostgreSQL connectors
PostgreSQL connectors enable applications to connect to and interact with a PostgreSQL database. They
allow developers to integrate PostgreSQL database operations within various environments and
programming languages
There are more connectors for various programming languages. Refer to the link below.
PostgreSQL Connectors
SQL was also developed in parallel at Oracle Corporation and for the Ingres database. Later it was standardized
under the ANSI SQL standards.
Sub Languages
DDL
Data Definition Language. Used to design relational schemas and to create and manage other objects.
It consists of 4 statements:
CREATE
ALTER
DROP
TRUNCATE
DML
Data Manipulation Language. Used to save, retrieve and manipulate data. It consists of 4 statements:
INSERT
UPDATE
DELETE
SELECT
TCL
Transaction Control Language. In a multiuser system, transactions ensure that concurrent DML operations
happen without conflicts, and they ensure data consistency. It consists of 5 statements:
START TRANSACTION
COMMIT
ROLLBACK
SAVEPOINT
ROLLBACK TO SAVEPOINT
DCL
Data Control Language. It helps in implementing access restrictions on the data so that data can only be
accessed or manipulated by authorized users. It consists of 2 statements:
GRANT
REVOKE
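For example, a minimal sketch of granting and revoking privileges (the user name trainee is illustrative and must exist as a role):
-- Allow a user to read and insert rows in the emp table
GRANT SELECT, INSERT ON emp TO trainee;
-- Later, take the INSERT privilege back
REVOKE INSERT ON emp FROM trainee;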
Data Types
A data type defines what type of data will be stored in a particular column.
Numeric Types
Name | Storage Size | Description | Range
smallint | 2 bytes | small-range integer | -32768 to +32767
integer, int | 4 bytes | typical choice for integer | -2147483648 to +2147483647
bigint | 8 bytes | large-range integer | -9223372036854775808 to +9223372036854775807
decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real | 4 bytes | variable-precision, inexact | 6 decimal digits precision
double precision | 8 bytes | variable-precision, inexact | 15 decimal digits precision
smallserial | 2 bytes | small autoincrementing integer | 1 to 32767
serial | 4 bytes | autoincrementing integer | 1 to 2147483647
bigserial | 8 bytes | large autoincrementing integer | 1 to 9223372036854775807
Monetary Types
The money type stores a currency amount with a fixed fractional precision. The fractional precision is
determined by the database's lc_monetary setting.
Character Types
Name | Description
character varying(n), varchar(n) | variable-length with limit
character(n), char(n) | fixed-length, blank padded
text | variable unlimited length
Date/Time Types
Name | Storage Size | Description | Low Value | High Value
timestamp [ (p) ] [ without time zone ] | 8 bytes | both date and time (no time zone) | 4713 BC | 294276 AD
timestamp [ (p) ] with time zone | 8 bytes | both date and time, with time zone | 4713 BC | 294276 AD
time [ (p) ] [ without time zone ] | 8 bytes | time of day (no date) | 00:00:00 | 24:00:00
time [ (p) ] with time zone | 12 bytes | time of day (no date), with time zone | 00:00:00+1559 | 24:00:00-1559
Where (p) specifies the number of fractional digits retained in the seconds field and range is 0-6.
The interval type stores a span of time; its fields can be restricted to a subset of:
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND
Binary Data Types
Name | Storage Size | Description
bytea | 1 or 4 bytes plus the actual binary string | variable-length binary string
Enumerated Types
We can create a new data type associated with a specific list of values. When a column is declared with
this type, it will accept only values from the given list.
Syntax:
CREATE TYPE <typename> AS ENUM ('v1', 'v2', 'v3', ..., 'vn');
Example:
CREATE TYPE weekdays AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday');
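Once created, the type can be used like any built-in type. A quick sketch (the table and values are illustrative):
CREATE TABLE class_schedule (topic varchar(25), day weekdays);
INSERT INTO class_schedule VALUES ('Joins', 'Monday');   -- accepted
INSERT INTO class_schedule VALUES ('Backup', 'Sunday');  -- rejected: 'Sunday' is not in the enum list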
Creating a Database
A database is a collection of database objects such as tables etc. It allows us to group the objects for
efficient management.
Syntax:
CREATE DATABASE <DATABASENAME>;
Example:
CREATE DATABASE students;
The following psql meta-commands list all the existing databases and the tables present in the current
database.
postgres=> \l
postgres=> \d
Creating Tables
Syntax:
CREATE TABLE table_name
( column_name data_type [ column_constraint ] [, ... ] );
Example:
CREATE TABLE COURSES (courseid int, course_name varchar(25), duration_in_hrs int);
The table structure can be displayed from the psql command-line client with the \d <table name> meta-command.
ALTER TABLE
Syntax:
ALTER TABLE table_name ADD COLUMN column_name datatype column_constraint;
ALTER TABLE table_name DROP COLUMN column_name;
ALTER TABLE table_name RENAME COLUMN column_name TO new_column_name;
ALTER TABLE table_name ALTER COLUMN column_name [SET DEFAULT value | DROP DEFAULT];
ALTER TABLE table_name ALTER COLUMN column_name [SET NOT NULL| DROP NOT NULL];
ALTER TABLE table_name ADD CHECK expression;
ALTER TABLE table_name ADD CONSTRAINT constraint_name constraint_definition;
ALTER TABLE table_name RENAME TO new_table_name;
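For example, against the COURSES table created above (a sketch; the fee column is illustrative):
ALTER TABLE courses ADD COLUMN fee numeric;
ALTER TABLE courses ALTER COLUMN fee SET DEFAULT 0;
ALTER TABLE courses RENAME COLUMN duration_in_hrs TO duration;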
DROP TABLE
The DROP TABLE statement is used to drop a table. All data and structure are deleted.
TRUNCATE TABLE
Truncate table statement deletes all the rows from the given table and empties the table.
Data Manipulation Language consists of 4 statements: INSERT, UPDATE, DELETE and SELECT. These statements
are used to add, modify or retrieve data from tables.
INSERT
Syntax:
INSERT INTO <table name> VALUES (v1, v2,.....vn);
Note: Values need to be provided for all the columns, in the same order as they are arranged in the table.
The INSERT statement can also be used to add a new row into the table by providing values for only a few columns.
Syntax:
INSERT INTO <table name> (column1, column2, ...) VALUES (v1, v2,...);
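For example, using the COURSES table created earlier (the values are illustrative):
INSERT INTO courses VALUES (1, 'PostgreSQL', 32);
INSERT INTO courses (courseid, course_name) VALUES (2, 'MongoDB');  -- duration_in_hrs is left NULL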
SELECT
Syntax:
SELECT * FROM <table name>;
Syntax:
SELECT column1, column2,... FROM <table name>;
UPDATE
Syntax:
UPDATE <table name> SET <column1> = <value>, <column2> = <value> [WHERE
<condition>];
All the rows matching the given condition are changed. If WHERE is omitted, all rows in the table are
changed.
DELETE
The DELETE statement deletes selected rows or all rows from the table.
Syntax:
DELETE FROM <table name> [WHERE <condition>];
If WHERE clause is omitted then all rows from the table are deleted.
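For example, sketches against the emp table used later in this document:
UPDATE emp SET sal = sal + 500 WHERE deptno = 10;  -- raise only department 10
DELETE FROM emp WHERE ename = 'SMITH';             -- remove a single row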
PostgreSQL Operators
Operators help to perform various data manipulations like arithmetic operations,
comparisons etc.
Arithmetic Operators
Operator | Description | Example | Result
+ | addition | 2 + 3 | 5
- | subtraction | 2 - 3 | -1
* | multiplication | 2 * 3 | 6
% | modulo (remainder) | 5 % 4 | 1
Comparison operators
Operator | Description | Usage
= | equal to | 1 = 1
IN | value matches any value in a list | X IN (v1, v2, ..., vn)
Logical Operators
Operator | Description
AND | True if both operands are true
OR | True if either operand is true
NOT | Negates the operand
Constraints
Constraints are used to enforce data integrity
• NOT NULL
• CHECK
• UNIQUE
• PRIMARY KEY
• FOREIGN KEY
Constraints can be defined on a table at the time of creation, or can also be enforced later using the ALTER
TABLE statement.
Primary Key
Any table can have only one primary key. It restricts duplicate values and NULL values in the column or
combination of columns. Primary key identifies all rows uniquely within the table it is defined. It is
recommended to have a primary key for each table.
Syntax-1
CREATE TABLE <table name> (
<columns name> <datatype> PRIMARY KEY,
<column name2> <datatype>,
..............
);
Syntax-2:
CREATE TABLE <table name> (
<columns name> <datatype> ,
<column name2> <datatype>,
<columns name3> <datatype>,
PRIMARY KEY (column1, column2......)
);
Syntax-3:
ALTER TABLE <table name> ADD CONSTRAINT <constraint name> PRIMARY KEY (col1, col2...);
Note: Creating a primary key also creates a unique index on the columns.
Unique Key
Like the primary key, a unique key also restricts duplicate values in the columns, but it allows NULL values. A
table can have more than one unique key.
Syntax-1
CREATE TABLE <table name> (
<columns name> <datatype> UNIQUE,
<column name2> <datatype>,
..............
);
Syntax-2:
CREATE TABLE <table name> (
<columns name> <datatype> ,
<column name2> <datatype>,
<columns name3> <datatype>,
UNIQUE (column1, column2......)
);
Syntax-3:
ALTER TABLE <table name> ADD CONSTRAINT <constraint name> UNIQUE (col1, col2...);
NOT NULL
Syntax:
CREATE TABLE <table name> (
<columns name> <datatype> NOT NULL,
<column name2> <datatype>,
..............
);
CHECK Constraint
CHECK constraint ensures that the values entered in the column satisfy the given condition.
Syntax-1
CREATE TABLE <table name> (
<columns name> <datatype>,
<column name2> <datatype> CHECK (<condition>),
..............
);
Syntax-2:
CREATE TABLE <table name> (
<columns name> <datatype> ,
<column name2> <datatype>,
<columns name3> <datatype>,
CONSTRAINT <constraint name> CHECK (<condition>)
);
Syntax-3:
ALTER TABLE <table name> ADD CONSTRAINT <constraint name> CHECK (<condition>);
FOREIGN KEY
FOREIGN KEY constraint restricts the values in the columns to the available values in the PRIMARY KEY of
another table. It establishes the link between two tables by referencing the PRIMARY KEY of another table.
Syntax-1
CREATE TABLE <table name> (
<columns name> <datatype>,
<column name2> <datatype> REFERENCES <another table name>(<column>),
..............
);
Syntax-2:
CREATE TABLE <table name> (
<columns name> <datatype> ,
<column name2> <datatype>,
<columns name3> <datatype>,
CONSTRAINT <constraint name> FOREIGN KEY (column) REFERENCES
<another table name>(<column>)
);
Syntax-3:
ALTER TABLE table_name
ADD CONSTRAINT constraint_name
FOREIGN KEY (column_name) REFERENCES parent_table (parent_column);
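For example, a sketch using the emp and dept tables referenced throughout this document (assuming dept(deptno) is its primary key):
ALTER TABLE emp
ADD CONSTRAINT fk_emp_dept
FOREIGN KEY (deptno) REFERENCES dept (deptno);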
Default values
Any column can be assigned a default value, which will replace NULL in case the value for the column is
not provided.
Syntax-1
CREATE TABLE <table name> (
<columns name> <datatype> DEFAULT <value>,
<column name2> <datatype>,
..............
);
Syntax-2:
ALTER TABLE table_name ALTER COLUMN column_name [SET DEFAULT value | DROP DEFAULT];
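For example (the table and column names are illustrative):
CREATE TABLE enrollments (student varchar(25), status varchar(10) DEFAULT 'active');
INSERT INTO enrollments (student) VALUES ('Niel');  -- status is stored as 'active'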
Generated Columns
The value of a generated column is computed from the expression given in the column's definition.
Example 1:
CREATE TABLE t1 (
first_name VARCHAR(10),
last_name VARCHAR(10),
full_name VARCHAR(255) GENERATED ALWAYS AS
(CONCAT(first_name,' ',last_name)) STORED );
Example 2:
CREATE TABLE triangles(
base float,
height float,
area float GENERATED ALWAYS AS (base * height * 0.5) STORED );
Built-In Functions
Numeric Functions
Function | Description | Example | Result
FLOOR | Round a number down to the nearest integer less than or equal to the number | FLOOR(10.6) | 10
LOG | Return the logarithm of a numeric value to a specified base | LOG(2, 64) | 6
MOD | Divide the first parameter by the second one and return the remainder | MOD(10, 4) | 2
ROUND | Round a number to the nearest integer or to the specified decimal places | ROUND(10.3) | 10
SCALE | Return the number of decimal digits in the fractional part | SCALE(1.234) | 3
TRUNC | Truncate a numeric value to a whole number or to the specified decimal places | TRUNC(12.3) | 12
RANDOM | Return a random number that ranges from 0 to 1 | random() | 0.9684356
String Functions
Function | Description | Example | Result
CONCAT | Concatenate two or more strings into one | CONCAT('A','B','C') | ABC
CONCAT_WS | Concatenate strings with a separator | CONCAT_WS(',','A','B','C') | A,B,C
INITCAP | Convert words in a string to title case | INITCAP('hI tHERE') | Hi There
LENGTH | Return the number of characters in a string | LENGTH('ABC') | 3
LOWER | Convert a string to lowercase | LOWER('hI tHERE') | hi there
POSITION | Return the location of a substring in a string | POSITION('B' in 'A B C') | 3
SUBSTRING | Extract a substring from a string | SUBSTRING('ABC',1,1) | A
UPPER | Convert a string to uppercase | UPPER('hI tHERE') | HI THERE
Date Functions
Function | Description | Example
current_timestamp | Returns the current date and time | select current_timestamp;
extract(FIELD from date '<date>') | Returns the FIELD part of the date. Valid FIELDs: YEAR, DAY, MONTH, HOUR, MINUTE, SECOND, MILLISECONDS, DOY, DOW, CENTURY, QUARTER | select extract(day from date '2021-05-24')
age(timestamp, timestamp) | Returns the difference between the given 2 dates, or between the given date and now() | age(timestamp '2001-04-10', timestamp '1957-06-13')
pg_sleep_for('interval') | Sleeps for the given interval | select pg_sleep_for('5 minutes')
Date Arithmetic
Date Addition
select date '2021-02-15' + interval '30 days';
Date Subtraction
select date '2021-02-15' - date '2021-03-15';
select ename, now() - hiredate from emp;
select ename, age(now(), hiredate) from emp;
The to_char() function can be used to format date or numeric values; the output is treated as a
string.
Examples:
to_char(timestamp '2002-04-20 17:31:12.66', 'HH12:MI:SS')
to_char(now(), 'YYYY-MON-DD')
to_char(125, '999') → 125
to_char(125.8::real, '999D9') → 125.8
to_char(-125.8, '999D99S') → 125.80-
to_date() Function
The to_date() function converts a string to a date according to the given format pattern. Common patterns include:
Pattern Description
MI minute (00–59)
SS second (00–59)
MS millisecond (000–999)
MON abbreviated upper case month name (3 chars in English, localized lengths vary)
DY abbreviated upper case day name (3 chars in English, localized lengths vary)
to_number() Function
to_number('12,454.8-', '99G999D9S') → -12454.8
Numeric Formatting
Pattern | Description
9 | digit position (can be dropped if insignificant)
0 | digit position (will not be dropped, even if insignificant)
D | decimal point
G | group (thousands) separator
S | sign anchored to number
Aggregate Functions
sum()
avg()
count()
min()
max()
string_agg()
employees=# select max(sal) from emp;
max
---------
5000.00
(1 row)
employees=# select min(sal) from emp;
min
--------
800.00
(1 row)
employees=# select count(sal) from emp;
count
-------
14
(1 row)
employees=# select avg(sal) from emp;
avg
-----------------------
2073.2142857142857143
(1 row)
employees=# select job, count(*) from emp group by job;
job | count
-----------+-------
CLERK | 4
PRESIDENT | 1
MANAGER | 3
SALESMAN | 4
ANALYST | 2
(5 rows)
employees=# select deptno, count(*) from emp group by deptno;
deptno | count
--------+-------
    30 |     6
    10 |     3
    20 |     5
(3 rows)
employees=#
employees=# select deptno, count(*) from emp group by deptno having count(*) < 4;
deptno | count
--------+-------
    10 |     3
(1 row)
employees=#
employees=# select * from emp limit 3;
employees=# select * from emp order by ename;
employees=# select job, count(*) from emp group by job order by count(*);
job | count
-----------+-------
PRESIDENT | 1
ANALYST | 2
MANAGER | 3
CLERK | 4
SALESMAN | 4
(5 rows)
employees=# select job, count(*) from emp group by job order by count(*) desc;
job | count
-----------+-------
CLERK | 4
SALESMAN | 4
MANAGER | 3
ANALYST | 2
PRESIDENT | 1
(5 rows)
The SELECT statement can be used to retrieve (select) data from a table or an expression. It is one of the
most complex statements in the SQL language, with a lot of clauses to perform data retrieval and
transformation.
Syntax:
SELECT [DISTINCT] [columns selection] [FROM <table>] [WHERE <expression>]
[GROUP BY <Expression> [HAVING <Expression>]] [ORDER BY <sort expression> ASC |
DESC] [LIMIT row_count OFFSET row_to_skip]
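A query exercising several of these clauses against the emp table used in this document (a sketch):
SELECT job, count(*) AS headcount
FROM emp
WHERE sal > 1000
GROUP BY job
HAVING count(*) > 1
ORDER BY headcount DESC
LIMIT 3;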
Aliasing
Alias is a temporary name given to a table or a column during execution of the query.
Column Alias
• Column aliases can be used in the SELECT list of a SQL query in PostgreSQL.
• Like all objects, aliases will be in lowercase by default. If mixed-case letters, special symbols, or
spaces are required, quotes must be used.
• Column aliases can be used for derived columns.
• Column aliases can be used with GROUP BY and ORDER BY clauses.
• We cannot use a column alias with WHERE and HAVING clauses.
Table Alias
• Table aliases can be used in SELECT lists and in the FROM clause to show the complete record or
selective columns from a table.
• Table aliases can be used in WHERE, GROUP BY, HAVING, and ORDER BY clauses.
• When we need data from multiple tables, we need to join those tables by qualifying the columns using
table name/table alias.
• The aliases are mandatory for inline queries (queries embedded within another statement) as a data
source to qualify the columns in a select list.
Example:
SELECT ename employee_name, empno AS employee_id FROM emp;
Quoting
Single Quote / $ (dollar Quote)
• Single quotes and dollar quotes are used to specify non-numeric values
Example:
'hello world'
'2011-07-04 13:36:24' '{1,4,5}'
$$A string "with" various 'quotes' in.$$
$foo$A string with $$ quotes in $foo$
Double Quote
• Double quotes are used for names of database objects which either clash with keywords, contain
mixed case letters, or contain characters other than a-z, 0-9 or underscore
Example:
select * from "select" create table "HelloWorld" ...
select * from "Hi everyone and everything"
Joins
PostgreSQL supports the following types of joins:
• Cross Join
• Inner Join
• Natural Join
• Left Outer Join
• Right Outer Join
• Full Outer Join
Join Condition
The join condition determines which rows from the two source tables are considered to be a “match”. The
join condition is specified with the ON or USING clause or implicitly by the word NATURAL.
Cross Join
Cross Join is a cartesian product which contains all rows from first table are joined with all rows from
second table. For example if the first table has m rows and second table has n rows then resultant cross
joined tables will have m*n rows.
Inner Join
Inner Join, also simply called Join, combines only the matching rows from the two tables and leaves out the
rows without matches.
Syntax:
Select <columns list> from tableA A Inner Join TableB B On <condition>;
Select <column list> from TableA A Join TableB B On <condition>;
Select <column list> from TableA A Join TableB B Using <common column>;
Natural Join
A Natural Join is an Inner Join; the only difference is that there is no need to give any join condition.
Syntax:
Select <column list> from TableA A Natural Join TableB B;
Left Outer Join
Left Outer Join, also called Left Join, combines all matching rows from both tables, plus the rows from the
left-side table which do not find matches.
Syntax:
Select <columns list> from tableA A Left Outer Join TableB B On <condition>;
Right Outer Join
Right Outer Join, also called Right Join, combines all matching rows from both tables, plus the rows from the
right-side table which do not find matches.
Syntax:
Select <columns list> from tableA A Right Outer Join TableB B On
<condition>;
Select <column list> from TableA A Right Join TableB B On <condition>;
Select <column list> from TableA A Right Join TableB B Using <common column>;
Full Outer Join
Full Outer Join, also called Full Join, combines all matching rows from both tables, plus the non-matching
rows from both the left-side and right-side tables.
Syntax
Select <columns list> from tableA A Full Outer Join TableB B On <condition>;
Select <column list> from TableA A Full Join TableB B On <condition>;
Select <column list> from TableA A Full Join TableB B Using <common column>;
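For example, sketches joining the emp and dept tables on their common deptno column:
SELECT e.ename, d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno;       -- inner join
SELECT e.ename, d.dname FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno;  -- keeps employees without a department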
Set Operations
Results of 2 or more queries can be combined using set operations like UNION and INTERSECT. The supported set
operations in PostgreSQL are as follows:
• UNION
• UNION ALL
• INTERSECT
• EXCEPT
UNION
The UNION set operator allows combining the results of 2 queries into a single result set.
The following are requirements for the queries in the syntax below:
o The number and the order of the columns must be the same in both queries.
o The data types of the corresponding columns must be the same or compatible.
Syntax:
select <col1>, <col2>, ... from <table1> UNION select <col1>, <col2> ... from
<table2>
UNION ALL
UNION ALL works the same as UNION, but it keeps all rows from both result sets, including duplicates,
whereas UNION returns duplicate rows only once.
Syntax:
select <col1>, <col2>, ... from <table1> UNION ALL select <col1>, <col2> ... from
<table2>
INTERSECT
INTERSECT allows combining the results of 2 queries and returns only the rows common to both.
Syntax:
select <col1>, <col2>, ... from <table1> INTERSECT select <col1>, <col2>,.....
from <table2>
EXCEPT
EXCEPT compares the result sets of two queries and returns the distinct rows from the first query that are
not output by the second query.
Syntax:
select <col1>, <col2>, ... from <table1> EXCEPT select <col1>, <col2>,..... from
<table2>
Sub Queries
A subquery is a query defined inside another query. The first query is called the outer query or main query,
and the second query is called the inner query or subquery.
The server executes the subquery first; based on the subquery's value, the main query is executed.
If a subquery sends a single value to its nearest main query, then that subquery is called a single-valued
subquery.
If a subquery sends multiple values to its nearest main query, then that subquery is called a multi-valued
subquery.
Note: If a subquery sends multiple values to its nearest main query, then we have to use the IN or NOT IN
operator between the main query and the subquery.
A subquery which returns more than one row to the outer query is called a multi-valued subquery.
Multi-valued subqueries can be handled by IN or NOT IN.
Multivalue SubQuery
select * from tablename where columname in (select statement where condition)
Example:
select * from emp where deptno in (select deptno from dept);
Correlated Sub Queries
In correlated subqueries the outer query and the inner (sub)query are executed in an interleaved manner:
the inner query runs once for every candidate row of the outer query.
Example:
display max salary for each dept and corresponding emp details
select * from emp e where sal=(select max(sal) from emp where deptno=e.deptno)
"First outer query gets executed, extracts 1 row at a time(candidate row) and that row is given to inner
query for processing, inner query will provide its output to outer query and based on the condition outer
query will display the extracted record".
Object Hierarchy
Database
• A running PostgreSQL server can manage multiple databases.
• A database is a named collection of SQL objects. It is a collection of schemas and the schemas
contain the tables, functions, etc.
• Databases are created with CREATE DATABASE command.
• Databases are destroyed with the DROP DATABASE command.
• To determine the set of existing databases:
• SQL: SELECT datname FROM pg_database;
• PSQL META COMMAND: \l (backslash lowercase L)
Creating a Database
There is a program that you can execute from the shell to create new databases, createdb.
Syntax:
$ createdb <dbname>
Switching to a Database:
Syntax:
postgres=# \connect <dbname>
Listing Databases:
We can query the pg_database system table to find the list of databases in the cluster.
Example:
postgres=# select datname from pg_database;
Or we can also use psql meta command to find the list of databases created so far.
postgres=# \l
Dropping Database:
postgres=# DROP DATABASE <dbname>;
What is a Schema
• A database can contain one or more named schemas.
• By default, every database contains a public schema.
• There are several reasons why one might want to use schemas:
✓ To allow many users to use one database without interfering with each other.
✓ To organize database objects into logical groups to make them more manageable.
✓ Third-party applications can be put into separate schemas so they cannot collide with the
names of other objects.
User Owns Schema and Schema contains the database objects like table, views etc.
PUBLIC Schema
Every database has a default schema named public, accessible to the owner of the database.
Creating Schema
A Schema can be created by connecting to the desired database and running the CREATE SCHEMA
command.
Syntax:
CREATE SCHEMA <schema name> AUTHORIZATION <username>;
For example, if we need to create a schema "raj" in the PROD database for user 'raj', then run the following
commands:
Example
postgres=# \c prod
prod=# CREATE SCHEMA raj AUTHORIZATION raj;
Now a new schema called "raj" is created in the database PROD, owned by user "raj".
Listing Schemas
postgres=# \dn
postgres=# select * from pg_namespace;
Database Objects
• Tables
• Sequences
• Views
• Synonyms (only in commercial Postgresql)
• Domains
• Packages (only in commercial Postgresql)
• Functions
• Procedures
CASE Function
The CASE statement in PostgreSQL allows you to add conditional logic to your SQL queries. Here’s a basic
syntax and example:
Syntax:
SELECT
column1,
column2,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE default_result
END AS new_column
FROM
table_name;
Example :
SELECT
empno,
ename,
sal,
CASE
WHEN sal >= 3000 THEN 'High'
WHEN sal >= 1500 THEN 'Medium'
ELSE 'Low'
END AS salary_category
FROM
emp;
COALESCE
The COALESCE function in PostgreSQL is used to return the first non-null value from a list of expressions.
This is particularly useful when dealing with potential NULL values in your data.
Syntax:
COALESCE(value1, value2, ..., valueN)
Example:
SELECT
empno,
ename,
COALESCE(comm, 0) AS commission
FROM emp;
NULLIF
The NULLIF function in PostgreSQL is used to compare two expressions and returns NULL if they are equal.
Otherwise, it returns the first expression.
Examples:
SELECT NULLIF(10, 10); -- Output: NULL
SELECT NULLIF(10, 20); -- Output: 10
Use Cases:
• NULLIF is commonly used to replace empty strings ('') or other default values with NULL in SELECT
statements.
• It is useful for avoiding division-by-zero errors, as shown below.
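For instance, a division guarded with NULLIF (the values are illustrative):
SELECT 100 / NULLIF(0, 0);  -- the divisor becomes NULL, so the result is NULL instead of an error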
Type Casting
Data type casting is helpful to convert a datatype to another type. It can be done by using CAST function
or by using :: operator.
Syntax:
CAST(expression AS target_data_type)
-- OR
expression::target_data_type
Example:
SELECT CAST('2025-01-26' AS DATE);
-- OR
SELECT '2025-01-26'::DATE;
Example:
SELECT CAST(123 AS TEXT);
-- OR
SELECT 123::TEXT;
JSON / JSONB
JSON & JSONB datatype brings NoSQL support in PostgreSQL allowing users to save data in JSON objects.
PostgreSQL validates the syntax any input and rejects if it does not confirm to JSON syntax.
JSON support was included in PostgreSQL in version 9.2 and JSONB support included in version 9.4.
JSONB is much better choice to use as it is faster and supports many features.
Querying
select * from emp_json;
update emp_json set details = details - 'comm' where details ->> 'ename' = 'SMITH';
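A few more query sketches against the same emp_json table, assuming details is a JSONB column:
select details ->> 'ename' as ename from emp_json;
select * from emp_json where (details ->> 'sal')::numeric > 2000;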
Common Table Expressions (WITH Queries)
Syntax:
WITH <cte_name> AS (auxiliary query) primary_query;
Example:
WITH emp_salgrade AS (select ename, job, sal, deptno, grade from emp join salgrade on
sal between losal and hisal) select ename, job, sal, grade from emp_salgrade es join
dept d on es.deptno=d.deptno;
Both the primary and auxiliary queries can be data-modifying queries like INSERT, UPDATE or DELETE.
The temporary result set of a data-modifying auxiliary query can be referenced by the primary query only if
the RETURNING * clause is used; otherwise it cannot.
Example:
WITH del_table AS (delete from visitor_logs where log_date < CURRENT_DATE - interval
'1 month' RETURNING *) insert into visitor_archive select * from del_table;
The above statement deletes the rows from the visitor_logs table and also inserts them into the visitor_archive table.
Window Functions
Window functions perform calculations on a set of rows that are related to the current row.
Commonly used window functions:
o min()
o max()
o avg()
o rank() – ranks the output
o row_number() – generates the row numbers
o lag() – shows the value of the previous row in the current row
o lead() – shows the value of the next row in the current row
o first_value() – shows the first value in the window
o last_value() – shows the last value in the window
min()
select ename, job, sal, min(sal) over (partition by job) from emp;
select ename, job, sal, min(sal) over (partition by job order by sal desc) from emp;
select ename, job, sal, min(sal) over () from emp;
rank()
select ename, job, sal, rank() over (order by sal desc ) from emp;
select ename, job, sal, rank() over (partition by job order by sal desc ) from emp;
row_number()
select ename, job, sal, row_number() over (order by sal desc ) from emp;
lag()
select ename, job, sal, lag(sal,1) over (partition by job order by sal desc ) as
prev_sal from emp;
lead()
select ename, job, sal, lead(sal,1) over (partition by job order by sal desc ) as
next_sal from emp;
last_value()
select ename, job, sal, last_value(sal) over (partition by job order by sal desc )
from emp;
first_value()
select ename, job, sal, first_value(sal) over (partition by job order by sal desc ) from emp;
PostgreSQL Indexes
An index allows the database server to find and retrieve specific rows much faster than it could without
an index.
o Hash Indexes
o B-Tree Indexes
o GIN Indexes
o GiST Indexes
o BRIN Indexes
B-Tree Indexes
This is the default type of index created in PostgreSQL. B-Tree indexes can handle equality searches and
range searches efficiently. If searches are done with operators such as <, <=, =, >= or >, the optimizer
selects a B-Tree index, if present.
However, the LIKE operator is supported only if the search term is anchored at the beginning of the string (e.g.
ename LIKE 'foo%').
Syntax:
CREATE INDEX <indexname> ON <table_name> (<indexed_column> [ASC | DESC] [NULLS
{FIRST | LAST}] [, <indexed_column2> ...]);
You can adjust the ordering of a B-tree index by including the options ASC, DESC, NULLS FIRST, and/or
NULLS LAST when creating the index;
Syntax:
CREATE INDEX <index name> ON <tablename> (<col> NULLS FIRST);
CREATE INDEX <index name> ON <tablename> (<col> DESC NULLS LAST);
If any frequently running query uses multiple columns in its WHERE clause, then a multicolumn index,
which includes all the columns used in the WHERE clause, can be created for such a query.
Syntax:
CREATE INDEX test_idx1 ON test (id_1, id_2);
Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes. Up to 32
columns can be specified.
Unique Indexes
Indexes can also be used to enforce uniqueness of a column's value or the uniqueness of the combined
values of more than one column
Syntax:
CREATE UNIQUE INDEX name ON table (column [, ...]);
PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a
table
Functional Indexes
An index can be created on a value computed from the table columns. Usually, if an expression is used in the
WHERE clause, the optimizer ignores plain column indexes and prefers a full table scan. We can avoid full
table scans by creating the index on the expression instead of the column.
Syntax:
CREATE INDEX <index name> ON <tablename> (lower(col1));
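With such an index in place, a case-insensitive lookup can use it. A sketch on the emp table:
CREATE INDEX emp_lower_ename_idx ON emp (lower(ename));
SELECT * FROM emp WHERE lower(ename) = 'smith';  -- can use the expression index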
Partial Indexes
A partial index is an index built over a subset of a table rather than the whole table, which reduces the
maintenance of the index.
The index contains entries only for those table rows that satisfy the predicate, and the optimizer uses this
index when the query predicate matches the index predicate.
Syntax:
CREATE INDEX <indexname> ON <tablename> (<column>) WHERE <condition>;
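Example (a sketch; assumes the emp table's comm column is NULL for most rows):
CREATE INDEX emp_comm_idx ON emp (comm) WHERE comm IS NOT NULL;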
Views
A view is not a physical object or table. Instead, its query is run every time the view is referenced in a
query.
Normal Views
Creating Views
syntax
CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] VIEW name [ WITH ( view_option_name [=
view_option_value] [, ... ] ) ]
AS query
[ WITH [ CASCADED | LOCAL ] CHECK OPTION ]
Syntax Explanation
TEMPORARY or TEMP
If specified, the view is created as a temporary view. Temporary views are automatically dropped at the
end of the current session.
name
The name (optionally schema-qualified) of the view to be created.
WITH ( view_option_name [= view_option_value] [, ... ] )
This clause specifies optional parameters for a view; the following parameters are supported:
check_option (string)
Local: New rows are only checked against the conditions defined directly in the view itself
CASCADED: New rows are checked against the conditions of the view and all underlying base views
security_barrier (boolean)
This should be used if the view is intended to provide row-level security.
security_invoker (boolean)
This option causes the underlying base relations to be checked against the privileges of the user of the
view rather than the view owner
query
A SELECT command which will provide the columns and rows of the view.
WITH [ CASCADED | LOCAL ] CHECK OPTION
When this option is specified, INSERT and UPDATE commands on the view will be checked to ensure that
new rows satisfy the view-defining condition. If they do not, the update will be rejected.
With CREATE OR REPLACE VIEW, it is possible to change the query of the view only if all the existing columns
are included in the new query; otherwise the change is rejected. The WHERE clause, however, can be changed.
Altering View
A view can be altered to change its OWNER or SCHEMA, to RENAME it, or to change view options.
Syntax
ALTER VIEW [ IF EXISTS ] name OWNER TO { new_owner | CURRENT_USER | SESSION_USER };
ALTER VIEW [ IF EXISTS ] name RENAME TO new_name;
ALTER VIEW [ IF EXISTS ] name SET SCHEMA new_schema;
ALTER VIEW [ IF EXISTS ] name SET ( view_option_name [= view_option_value] [, ... ]
);
ALTER VIEW [ IF EXISTS ] name RESET ( view_option_name [, ... ] );
Drop View
A view can be dropped when it is not referenced by other views; otherwise an error is returned.
Syntax:
DROP VIEW [ IF EXISTS ] <view name>;
Materialized Views
Materialized views are created just like normal views; the only difference is that the results are persisted
in the materialized view until it is refreshed. A materialized view behaves similarly to a table.
Syntax:
CREATE MATERIALIZED VIEW name AS query;
Example
CREATE MATERIALIZED VIEW mymatview AS SELECT * FROM emp;
Changes made to the base table are not reflected in materialized view until it is refreshed.
Syntax
REFRESH MATERIALIZED VIEW <view name>;
Example
REFRESH MATERIALIZED VIEW mymatview;
Indexes
Indexes can be created on materialized views in the same way as they are created on tables.
Dropping Materialized Views
Syntax
DROP MATERIALIZED VIEW [ IF EXISTS ] name;
Example
DROP MATERIALIZED VIEW myview;
Lesson 9: Transactions
Transactions provide a way of grouping DML statements so that, before accepting the changes, we can review
them and either accept them all or discard them all. Another benefit of transactions is that they protect a
transaction from other concurrently running transactions manipulating the same data.
How it works
Every transaction starts with START TRANSACTION statement and ends with either COMMIT or ROLLBACK
statement.
Syntax:
START TRANSACTION;
some DML Statements;
COMMIT; / ROLLBACK;
Example:
START TRANSACTION;
SELECT * FROM EMP;
UPDATE EMP SET SAL = SAL + 100 WHERE ENAME = 'MILLER';
COMMIT;
Syntax:
START TRANSACTION;
some DML Statements;
SAVEPOINT <savepoint name>;
some more DML statements;
ROLLBACK TO SAVEPOINT <savepoint name>;
COMMIT; / ROLLBACK;
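For example, a sketch against the emp table:
START TRANSACTION;
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;
SAVEPOINT raise_done;
DELETE FROM emp WHERE deptno = 10;
ROLLBACK TO SAVEPOINT raise_done;  -- undoes the delete, keeps the raise
COMMIT;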
Exporting
Table data can be exported to a portable CSV or tab-separated text format. PostgreSQL gives the flexibility
of choosing which columns to include; a query can also be specified instead of a table.
Syntax 1:
COPY table_name ( column_name [, ...] ) TO 'filename' WITH options;
Syntax 2:
COPY ( SELECT query ) TO 'filename' WITH options;
Syntax 3:
COPY ( SELECT query ) TO PROGRAM 'command' WITH options;
Syntax 4:
COPY table_name ( column_name [, ...] ) TO PROGRAM 'command' WITH options;
OPTIONS:
CSV
TEXT
DELIMITER <character>
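For example (the file paths are illustrative; the server must be able to write to them):
COPY emp TO '/tmp/emp.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT ename, sal FROM emp WHERE deptno = 10) TO '/tmp/dept10.csv' WITH (FORMAT csv);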
Importing
Any CSV or tab-separated text file can be imported into a table, irrespective of the origin of the file, if the
column count and data types match.
Syntax:
COPY <TableName> (Column1, Column2, ...) FROM '/path/to/CSVfile' WITH options [
WHERE condition ];
The INSERT statement allows inserting data from one table into another using a SELECT query, provided the
data types and constraints are not in conflict.
Syntax:
INSERT INTO target_table SELECT * FROM source_table [WHERE ..];
Syntax:
create table new_table as select * from source_table ;
Syntax:
CREATE TABLE new_table (LIKE existing_table );
NoSQL Databases
NoSQL Database
NoSQL is an approach to database design that can accommodate a wide variety of data models, including
key-value, document, columnar and graph formats. NoSQL, which stands for "not only SQL," is an
alternative to traditional relational databases, in which data is placed in tables and the data schema is carefully
designed before the database is built. NoSQL databases are especially useful for working with large sets of
distributed data.
Evolution of NoSQL
The evolution of NoSQL started in the 1990s with the development and release of BerkeleyDB. Developed at the
University of California, Berkeley, it influenced the development of NoSQL by supporting certain
applications with specific storage needs.
Key-value stores
Key-value stores, or key-value databases, implement a simple data model that pairs a unique key with an
associated value. Because this model is simple, key-value databases can be extremely performant and highly
scalable, and they are well suited for session management and caching in web
applications.
Examples
Aerospike, Berkeley DB, MemcacheDB, Redis and Riak.
Document databases
Document databases, also called document stores, store semi-structured data and descriptions of that
data in document format. They allow developers to create and update programs without needing to
reference a master schema. Use of document databases has increased along with use of JavaScript and
the JavaScript Object Notation (JSON).
Examples
Couchbase Server, CouchDB, DocumentDB, MarkLogic and MongoDB.
Wide-column stores
Wide-column stores organize data tables as columns instead of as rows. Wide-column stores can be found
both in SQL and NoSQL databases. Wide-column stores can query large data volumes faster than
conventional relational databases.
Examples
Cassandra, HBase and Google Bigtable.
Graph stores
Graph data stores organize data as nodes, which are like records in a relational database, and edges,
which represent connections between nodes. Because the graph system stores the relationship between
nodes, it can support richer representations of data relationships.
Examples
AllegroGraph, IBM Graph, Neo4j and Titan
The basic NoSQL database classifications are only guides. Over time, vendors have mixed and matched
elements from different NoSQL database family trees to achieve more generally useful systems. That
evolution is seen, for example, in MarkLogic, which has added a graph store and other elements to its
original document databases. Couchbase Server supports both key-value and document
approaches. Cassandra has combined key-value elements with a wide-column store and a graph
database. Sometimes NoSQL elements are mixed with SQL elements, creating a variety of databases that
are referred to as multimodel databases.
NoSQL vs RDBMS
SQL | NoSQL
These databases have a fixed, static or predefined schema | They have a dynamic schema
These databases are not suited for hierarchical data storage | These databases are best suited for hierarchical data storage
These databases are best suited for complex queries | These databases are not so good for complex queries
CAP Theorem
The CAP Theorem is a fundamental theorem in distributed database systems that states any distributed
system can have at most two of the following three properties.
• Consistency
• Availability
• Partition tolerance
The CAP theorem states that a distributed system cannot simultaneously be consistent, available, and
partition tolerant.
Distributed databases
Distributed databases are usually non-relational databases that enable quick access to data over a large
number of nodes.
Consistency
Any read operation that begins after a write operation completes must return the latest value.
In a consistent system, once a client writes a value to any server and gets a response, it expects to get that
value (or a fresher value) back from any server it reads from.
Availability
Every request receives a (non-error) response, without the guarantee that it contains the most recent
write.
Partition Tolerance
The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by
the network between nodes.
PACELC theorem
PACELC theorem is an extension to the CAP theorem. It states that in case of network partitioning (P) in
a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the
CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has
to choose between latency (L) and consistency (C).
The purpose of PACELC is to address the oversight in CAP theorem pertaining to consistency/latency
tradeoff of replicated systems as it is present at all times during system operation.
Markup Languages
What is JSON?
History
Douglas Crockford is typically credited with discovering the format because he was the first to popularize
and specify its use around the year 2001.
Crockford was developing an AJAX web application framework and needed to find a way to enable
real-time, two-way browser-to-server communication without relying on Flash plugins or Java applets. It
was this need that served as the impetus behind the discovery of JSON, as JSON is used to transfer data
between browser and server.
Because JSON consists of nothing more than commas, curly braces, square brackets, and data, it can be easily
parsed into an array or object by any programming language.
JSON data is formatted into two data structures that are used universally in all modern
programming languages:
JSON Objects:
{ "name": "Sami", "occupation": "Trainer" }
JSON Arrays:
[ "red", "green", "blue", 7 ]
BSON
BSON is a computer data interchange format. The name "BSON" is based on the term JSON and stands for
"Binary JSON". It is a binary form for representing simple or complex data structures including associative
arrays (also known as name-value pairs), integer indexed arrays, and a suite of fundamental scalar types.
BSON originated in 2009 at MongoDB.
MongoDB represents JSON documents in binary-encoded format called BSON behind the scenes. BSON
extends the JSON model to provide additional data types, ordered fields, and to be efficient for encoding
and decoding within different languages
What is YAML?
YAML is a human-readable data-serialization language. It is commonly used for configuration files, but can be
used in many applications where data is being stored or transmitted. YAML targets many of the same
communications applications as XML but has a minimal syntax. It uses both Python-style indentation to
indicate nesting and a more compact format that uses [] for lists and {} for maps, making YAML
a superset of JSON.
employee:
  name: "Sami"
  Occupation: "Trainer"
  Location: "Hyderabad"
  Canteach:
    - mongoDB
    - MySQL
    - PostgreSQL
    - Linux
  LanguageFluency:
    English: "Very Good"
    Hindi: "Very Good"
    Telugu: "Very Good"
    Tamil: "Can Understand"
Introduction to MongoDB
MongoDB is a document-oriented database server developed in the C++ programming language. The word
Mongo is derived from Humongous.
History
The initial development of MongoDB began in 2007 when the company was building a cloud platform.
MongoDB was developed by a New York-based organization named 10gen, which is now known as
MongoDB Inc. It was initially developed as a PaaS (Platform as a Service). Later, in 2009, it was introduced in
the market as an open-source database server, maintained and supported by MongoDB Inc.
The first production-ready version of MongoDB is considered to be version 1.4, which was released in
March 2010.
Features of MongoDB
• Ad hoc queries
In MongoDB, you can search by field and by range query, and it also supports regular-expression
searches.
• Indexing
• Replication
A master can perform reads and writes; a slave copies data from the master and can only be
used for reads or backup (not writes).
• Duplication of data
MongoDB can run over multiple servers. The data is duplicated to keep the system up and
running in case of hardware failure.
• Load balancing
MongoDB Architecture
Client Connection Protocols:
The MongoDB Wire Protocol is a simple socket-based, request-response style protocol. Clients
communicate with the database server through a regular TCP/IP socket.
• TCP/IP Socket
• Port
The default port number for mongod and mongos instances is 27017.
• Default Clients
The mongo shell is an interactive JavaScript interface to MongoDB. You can use the mongo shell to
query and update data as well as perform administrative operations.
• Security
MongoDB provides security by implementing users and roles. Security can be disabled.
MongoDB uses its own query language, which is similar to JavaScript; it does not use standard
SQL.
Object Hierarchy
A MongoDB instance consists of two high-level object types, DATABASES and USERS/ROLES.
A DATABASE in turn may have one or more COLLECTIONS and INDEXES, which actually store the data.
RDBMS | MONGODB
ROW | DOCUMENT
TABLE | COLLECTION
Data Types
String
This is the most commonly used datatype to store the data. String in MongoDB must be UTF-8 valid.
Integer
This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server.
Boolean
This type is used to store a boolean (true/false) value.
Double
This type is used to store floating-point values.
Arrays
This type is used to store arrays or list or multiple values into one key.
Timestamp
This can be handy for recording when a document has been modified or added.
Object
Null
Symbol
This datatype is used identically to a string; however, it's generally reserved for languages that use a
specific symbol type.
Date
This datatype is used to store the current date or time in UNIX time format. You can specify your own date
and time by creating a Date object and passing the day, month, and year into it.
Object ID
Binary data
Code
Regular expression
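As an illustration, a single document can mix several of these types (the collection and field names below are hypothetical):
db.samples.insertOne( {
    name: "Sami",                      // String
    age: NumberInt(45),                // 32-bit Integer
    rating: 4.5,                       // Double
    active: true,                      // Boolean
    skills: [ "mongoDB", "MySQL" ],    // Array
    address: { city: "Hyderabad" },    // Object (embedded document)
    manager: null,                     // Null
    createdOn: new Date()              // Date (an ObjectId is auto-generated for _id)
} )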
CRUD stands for Create, Read, Update, and Delete; these operations are broadly equivalent to the DML
operations in an RDBMS.
MongoDB Operators
Comparison Operators:
Name Description
$gte Matches values that are greater than or equal to a specified value.
$lte Matches values that are less than or equal to a specified value.
$ne Matches all values that are not equal to a specified value.
Logical Operators:
Name Description
$and Joins query clauses with a logical AND and returns all documents that match the conditions
of both clauses.
$not Inverts the effect of a query expression and returns documents that do not match the
query expression.
$nor Joins query clauses with a logical NOR and returns all documents that fail to match both
clauses.
$or Joins query clauses with a logical OR and returns all documents that match the conditions
of either clause.
Element Operators
Name Description
$exists Matches documents that have the specified field.
$type Selects documents if a field is of the specified type.
Inserting Data
insert Method
db.collection.insert([
{doc1},{doc2},....],
{
writeConcern: <document>,
ordered: <boolean>
}
)
Examples:
db.products.insert( { item: "card", qty: 15 } )
db.products.insert(
[
{ _id: 11, item: "pencil", qty: 50, type: "no.2" },
{ item: "pen", qty: 20 },
{ item: "eraser", qty: 25 }
]
)
Unordered Inserts:
With { ordered: false }, MongoDB continues inserting the remaining documents in the array even if one of
the inserts fails.
db.products.insert(
[
{ _id: 20, item: "lamp", qty: 50, type: "desk" },
{ _id: 21, item: "lamp", qty: 20, type: "floor" },
{ _id: 22, item: "bulk", qty: 100 }
],
{ ordered: false }
)
insertOne Method
db.collection.insertOne(
<document>,
{
writeConcern: <document>
}
)
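Example (a sketch against the products collection used in the earlier insert examples):
db.products.insertOne( { item: "stapler", qty: 10 } )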
insertMany Method
db.collection.insertMany(
[ <document 1> , <document 2>, ... ],
{
writeConcern: <document>,
ordered: <boolean>
}
)
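Example (a sketch against the same products collection):
db.products.insertMany( [
    { item: "marker", qty: 30 },
    { item: "whiteboard", qty: 5 }
] )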
find Method
Syntax:
db.collection.find({query}, {projection})
Projection: Selection of the fields to return. A projection cannot mix include and exclude specifications,
except for the _id field.
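For example, the following returns only name and salary and suppresses the default _id field (the employees collection from the examples below is assumed):
db.employees.find( { }, { name: 1, salary: 1, _id: 0 } )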
Query Examples:
db.bios.find( { qty: 23 } )
db.bios.find( { "name.last": "Hopper" } )
db.employees.find(
{ canteach: { $in: [ "mongodb", "MySQL" ] } }
)
db.employees.find(
{ name: { $regex: /^N/ } }
)
Range Queries:
db.employees.find( { salary: { $gte: 10000, $lt: 20000 } } )
db.employees.find( { doj: { $gt: new Date('2015-01-01'), $lt: new Date('2017-12-31') } } )
Querying an Array
db.employees.find( { canteach: { $all: [ "mongodb", "mysql" ] } } )
db.employees.find(
{ canteach: { $in: [ "mongodb", "MySQL" ] } }
)
$elemMatch: The $elemMatch operator matches documents that contain an array field with at
least one element that matches all the specified query criteria.
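A sketch, assuming documents with a numeric scores array:
db.results.find( { scores: { $elemMatch: { $gte: 80, $lt: 85 } } } )
This matches documents where at least one element of scores is between 80 and 84.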
Sorting Result
db.employees.find().sort({name: 1})
Limiting output
db.employees.find().limit(5)
Skip method
db.employees.find().skip(5)
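The three methods can be chained; for example, to fetch the second page of five employees sorted by name:
db.employees.find().sort( { name: 1 } ).skip(5).limit(5)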
Updating Data
Update Method
db.collection.update(query, update, options)
Modifies an existing document or documents in a collection. The method can modify specific fields of an
existing document or documents or replace an existing document entirely, depending on the update
parameter.
Syntax:
db.collection.update(
<query>,
<update>,
{
upsert: <boolean>,
multi: <boolean>,
writeConcern: <document>,
}
)
Upsert Behavior
If upsert is true and no document matches the query criteria, update() inserts a single document. The
update creates the new document.
If upsert is true and there are documents that match the query criteria, update() performs an update.
db.people.update(
{ name: "Andy" },
{
name: "Andy",
rating: 1,
score: 1
},
{ upsert: true }
)
If multi is set to true, the update() method updates all documents that meet the <query> criteria.
db.books.update(
{ stock: { $lte: 10 } },
{ $set: { reorder: true } },
{ multi: true }
)
Changing Data
Changes to documents can be made using the $set or $inc operators.
Example of $set:
db.books.update(
{ _id: 1 },
{
$set: {
item: "ABC123",
"info.publisher": "2222",
tags: [ "software" ],
"ratings.1": { by: "xyz", rating: 3 }
}
}
)
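$inc increments (or, with a negative value, decrements) a numeric field. A sketch against the same books collection, assuming a numeric stock field:
db.books.update(
    { _id: 1 },
    { $inc: { stock: -5 } }
)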
Remove Fields
The following operation uses the $unset operator to remove a field.
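A sketch, assuming we want to drop the tags field set in the $set example above:
db.books.update(
    { _id: 1 },
    { $unset: { tags: "" } }
)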
updateOne Method
db.collection.updateOne(
<filter>,
<update>,
{
upsert: <boolean>,
writeConcern: <document>,
}
)
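Example (a sketch; the filter and fields are illustrative):
db.employees.updateOne(
    { name: "Sami" },
    { $set: { location: "Hyderabad" } }
)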
updateMany Method
Syntax:
db.collection.updateMany(
<filter>,
<update>,
{
upsert: <boolean>,
writeConcern: <document>,
}
)
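Example (a sketch; the filter and fields are illustrative):
db.employees.updateMany(
    { job: "Clerk" },
    { $inc: { salary: 500 } }
)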
Deleting Data
Remove Method
Syntax:
db.collection.remove(
{<query>},
{
justOne: <boolean>,
writeConcern: <document>,
}
)
An empty query deletes all documents from the collection. Examples below.
db.employees.remove( { } )
db.employees.remove({name:"Sami"})
Delete Method
Like the remove method, the delete methods also delete the documents matching a given query. There are
two variations, namely deleteMany and deleteOne.
Syntax:
db.collection.deleteMany(
<filter>,
{
writeConcern: <document>,
collation: <document>
}
)
db.collection.deleteOne(
<filter>,
{
writeConcern: <document>,
collation: <document>,
hint: <Index hint>
}
)
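Examples (the filters are illustrative):
db.employees.deleteOne( { name: "Sami" } )
db.employees.deleteMany( { job: "Clerk" } )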
MongoDB is schemaless, meaning there is no fixed schema or field definitions for collections; however,
MongoDB allows specifying certain validations, such as a size limit for a collection or required fields.
SIZE: The maximum size of the collection in bytes. When the collection reaches this size, the oldest
documents are deleted to make room for new documents. MongoDB ensures that the collection does not
grow above the mentioned size. Specifying SIZE is mandatory for capped collections.
MAX: Specifies the maximum number of documents a collection can have. SIZE takes precedence when the
SIZE limit is reached before MAX.
Example:
db.createCollection("stock", { capped : true, size : 5242880, max : 5000 } )
The above example creates a capped collection named stock with a 5 MB size cap and a limit of 5,000
documents.
Collection Validations
MongoDB does not enforce fixed schema requirements but it provides validations to declare required field
list and field properties like datatype, input validations etc.
Required Fields
db.createCollection( "employees", {
validator: { $jsonSchema: {
bsonType: "object",
required: [ "ename", "job", "salary" ] }
}
}
)
A new collection named "employees" is created where every document is required to have the "ename",
"job", and "salary" fields.
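With this validator in place, an insert that omits a required field is rejected; for example:
db.employees.insert( { ename: "Sami" } )    // fails document validation: "job" and "salary" are missing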
Input Validation
Schema validation allows specifying datatypes and other input validations for documents in a collection.
The properties object, an embedded object within the validator object, facilitates input validations.
properties: {
field1 : {
bsonType: "<type>",
enum : [v1,v2,....],
description: "<some description>"
},
field2: {
bsonType: "<type>",
minimum : <number>,
maximum : <number>,
description: "<some descriptions>"
}
}
Example:
db.createCollection( "employee2", {
validator: { $jsonSchema: {
bsonType: "object",
required: [ "ename", "job", "salary" ],
properties: {
ename : {
bsonType: "string",
description: "Name of the Employee"
},
job: {
bsonType: "string",
enum : ["Manager", "Analyst","Clerk","Salesman"],
description: "Designation of Employee, Choose one of the mentioned jobs"
},
salary: {
bsonType: "decimal",
minimum : 1000,
maximum : 1000000,
description: "Salary of Employee"
},
deptno: {
bsonType: "int",
description: "Department number of Employee"
}
}
}
}
}
)
db.getCollectionInfos()
The db.getCollectionInfos() method returns information on the collections in the current database,
including any validation rules defined for them.
The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing
pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results.
Syntax:
db.collection.aggregate([ {stage1}, {stage2}, {stage3},....])
• Each stage performs an operation on the input documents. For example, a stage can filter
documents, group documents, and calculate values.
• The documents that are output from a stage are passed to the next stage.
• An aggregation pipeline can return results for groups of documents. For example, return the total,
average, maximum, and minimum values.
Aggregation stages
$match stage
Usage:
{ $match: { <query> } }
Example:
db.emp.aggregate([{$match: {job:"CLERK"}}])
$group Stage
The $group stage performs group-level aggregation computations such as sum, count, min, and max. For
these computations it needs accumulators, as shown in the table below.
Name Description
$sum Returns a sum of numerical values. Ignores non-numeric values.
$avg Returns an average of numerical values. Ignores non-numeric values.
$min Returns the lowest expression value for each group.
$max Returns the highest expression value for each group.
$first Returns a value from the first document for each group. Order is only defined if the
documents are in a defined order.
$last Returns a value from the last document for each group. Order is only defined if the
documents are in a defined order.
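The general shape of the stage (each output field is computed by an accumulator; _id holds the group key, and _id: null groups the whole collection):
Syntax:
{ $group: { _id: <expression>, <field1>: { <accumulator1>: <expression1> }, ... } }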
Grouping Examples:
Lowest Salary:
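A sketch, assuming the emp collection with job and sal fields used in the surrounding stage examples:
db.emp.aggregate([ { $group: { _id: "$job", lowest: { $min: "$sal" } } } ])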
Highest Salary:
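The same grouping with the $max accumulator:
db.emp.aggregate([ { $group: { _id: "$job", highest: { $max: "$sal" } } } ])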
$bucket stage
Syntax:
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
...
<outputN>: { <$accumulator expression> }
}
}
}
Let's say we want to categorize products according to salePrice and count the number of products within
each price range.
The price ranges we use are called boundaries, based on which the grouping is done on the salePrice
field. (The examples below use the emp collection and its sal field instead.)
Example:
db.emp.aggregate([
    { $bucket: { groupBy: "$sal", boundaries: [ 0, 1000, 2000, 5000 ],
        default: "Other", output: { count: { $sum: 1 } } } }
])
Example of a bucket with two output fields: the first counts the number of documents in each bucket, and
the second reports the maximum sal in each bucket.
db.emp.aggregate([
    { $bucket: { groupBy: "$sal", boundaries: [ 0, 1000, 2000, 3000, 4000, 5000 ],
        default: "Other", output: { count: { $sum: 1 }, max: { $max: "$sal" } } } }
])
$bucketAuto stage
Syntax:
{
$bucketAuto: {
groupBy: <expression>,
buckets: <number>,
output: {
<output1>: { <$accumulator expression> },
...
}
}
}
Example:
db.emp.aggregate([
    { $bucketAuto: { groupBy: "$sal", buckets: 4,
        output: { count: { $sum: 1 }, max: { $max: "$sal" } } } }
])
$addFields stage
Adds new fields to documents. $addFields outputs documents that contain all existing fields from the
input documents as well as the newly added fields.
Syntax:
{ $addFields: { <newField>: <expression>, ... } }
Let's add a new field called annualSalary, computed as 12 times sal.
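A sketch of that computation using the $multiply expression (the field name annualSalary is illustrative):
db.emp.aggregate([
    { $addFields: { annualSalary: { $multiply: [ "$sal", 12 ] } } }
])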
$project stage
Passes along the documents with the requested fields to the next stage in the pipeline. The specified fields
can be existing fields from the input documents or newly computed fields.
Syntax:
{ $project: { <specification(s)> } }
Example:
db.emp.aggregate([ {$project: {_id:0,ename:1,job:1, sal:1}}])
$lookup stage
Performs a left outer join to an unsharded collection in the same database, to filter in documents from the
"joined" collection for processing.
Syntax:
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
For example, if we have a dept collection with deptno, name, and location fields, and an
employees collection with a matching deptno field, we can join the two collections.
db.dept.aggregate([
    { $lookup: {
        from: "emp",
        localField: "_id",
        foreignField: "deptno",
        as: "employee-details"
    } }
])
In MongoDB, indexes enhance the performance of queries by allowing the database to efficiently locate
documents within collections. Here we learn all about Indexes in MongoDB.
Single Field Indexes
Indexes created on a single field, either ascending or descending, are Single Field Indexes.
Syntax:
db.collection.createIndex( { field: 1 } )
Creates an ascending index
db.collection.createIndex( { field: -1 } )
Creates a descending index
The given field can be an embedded document or a field from an embedded document.
Compound Indexes
MongoDB supports compound indexes, where a single index structure holds references to multiple
fields within a collection’s documents.
Syntax:
db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
Example:
db.employees.createIndex({name:1, designation:-1})
Multikey Indexes
MongoDB creates an index key for each element in the array if the field used for the index is an array.
Syntax:
db.coll.createIndex( { <field>: < 1 or -1 > } )
Example:
db.employees.createIndex({canteach:1})
Text Index
Text indexes are useful for full-text searches on fields containing large text data.
db.collection.createIndex( { <field>: "text" } )
Example :
db.reviews.createIndex( { comments: "text" } )
Hashed Indexes
Hashed indexes are useful when used as a shard key, as they result in a more even distribution of data
across the shards.
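Syntax (the index type is given as "hashed"; the example field name is illustrative):
db.collection.createIndex( { <field>: "hashed" } )
Example:
db.employees.createIndex( { empid: "hashed" } )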
TTL Indexes
TTL indexes are created with a particular expiry period in seconds to automatically remove the
associated documents from the collection. TTL indexes are effective only if created on a date field or an
array containing date values.
Document Deletion:
• A TTL index deletes the document when the specified number of seconds has elapsed from the date
stored in the field value
• If the specified field does not contain a date, the document never expires
• If the document does not contain the specified field, it will also never expire
Syntax:
db.collection.createIndex( { "Date Field": 1 }, { expireAfterSeconds: 3600 } )
Example:
db.collection.createIndex( { "postedOn": 1 }, { expireAfterSeconds: 86400 } )
Unique Indexes
A unique index ensures that the indexed fields do not store duplicate values; i.e. enforces uniqueness for
the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a
collection.
Syntax:
db.collection.createIndex( <key and index type specification>, { unique: true } )
Example:
db.members.createIndex( { "user_id": 1 }, { unique: true } )
Partial Indexes
Partial indexes only index the documents in a collection that meet a specified filter expression. By indexing
a subset of the documents in a collection, partial indexes have lower storage requirements and reduced
performance costs for index creation and maintenance.
Example:
db.products.createIndex(
{ pname: 1 },
{ partialFilterExpression: { stock: { $gt: 50 } } }
)
Sparse Indexes
Sparse indexes only contain entries for documents that have the indexed field, even if the index field
contains a null value. The index skips over any document that is missing the indexed field.
Syntax:
db.collection.createIndex( { "field1": 1 }, { sparse: true } )
Dropping Indexes
db.collection.dropIndex( <index> )
db.collection.dropIndexes()
Example:
db.employees.dropIndex({name:1})
db.employees.dropIndexes()
Reindexing
db.collection.reIndex()
JSON Exports
MongoDB allows exporting a collection in JSON format, and also importing a JSON file into a collection,
using the mongoexport and mongoimport commands.
Syntax:
mongoexport --host <hostname> --db <database> --collection <collection> --out <outputfile.json>
Example:
mongoexport --host localhost --port 27017 --db mydatabase --collection users --out users.json
CSV Exports
MongoDB allows exporting a collection in CSV format, and also importing a CSV file into a collection,
using the mongoexport and mongoimport commands.
Syntax:
mongoexport --host <hostname> --db <database> --collection <collection> --type=csv --fields <field1>,<field2>,... --out <outputfile.csv>
Example:
mongoexport --host localhost --port 27017 --db mydatabase --collection users --type=csv --fields name,email,age --out users.csv
JSON Imports
The mongoimport command allows importing a file containing valid JSON documents into a collection.
Syntax:
mongoimport --host <hostname> --db <database> --collection <collection> --file <inputfile.json> --jsonArray
Example:
mongoimport --host localhost --port 27017 --db mydatabase --collection users --file users.json --jsonArray
CSV Imports
The same mongoimport command allows importing a .csv file into a MongoDB collection.
Syntax:
mongoimport --host <hostname> --db <database> --collection <collection> --type=csv --file <inputfile.csv> --headerline
Example:
mongoimport --host localhost --port 27017 --db mydatabase --collection products --type=csv --file products.csv --headerline