SQL Transcript

This YouTube course, led by Baraa Salkini, offers a comprehensive 30-hour SQL training for beginners, focusing on both writing SQL code and understanding its underlying principles through animated visuals. The course covers topics from basic queries to advanced techniques like data warehousing, optimization, and the use of AI in SQL projects, all provided for free. By the end, participants will be equipped to handle complex SQL tasks and apply their knowledge in real-world projects.

Uploaded by

Shah Fahad

(76) SQL Full Course for Beginners (30 Hours) – From Zero to Hero - YouTube

https://www.youtube.com/watch?v=SSKVgrwhzus&t=6985s&ab_channel=DatawithBaraa

Transcript:
(00:00) Hello and welcome to this unique course to master SQL. My name is
Baraa Salkini and I lead big data projects at Mercedes-Benz, with over a decade of experience
in SQL, data engineering, building data warehouses and data analytics. Now, of
course, the first question is what makes this course so special. Well, not only will you
learn
(00:15) how to write SQL code, but more important than that, you will learn how
exactly SQL works behind the scenes. So I'm going to break down complex concepts in
SQL using hundreds of animated visuals. This makes it much easier to understand
SQL and as well it is more fun than just sharing my screen and showing you code.
Right. The second
(00:35) reason is this course is taught by me. I have industry experience and I will
be sharing with you everything that I know about SQL and how I use it in my real
projects. So I will be sharing with you hundreds of best practices, tips and tricks and
I'm going to show you my decision-making process in SQL. So by the end
(00:51) of this course, you will be ready to solve any complex task like I do using
SQL. So now I designed this course to cover the basics like writing your first SQL
query and then we're going to keep progressing in the course by covering advanced
techniques in SQL like the window functions, stored procedures,
(01:09) indexes and even at the end we're going to build a data warehouse using
SQL. And this course is suitable for anyone: data engineers, data analysts, data
scientists and even students. And by the way, the good news: everything is for free
from the start until the end. I will be sharing with you as well a lot of
(01:26) materials, code, presentations and animations, and there are no hidden costs.
So you don't have to pay for anything. But my friends, in return I would really appreciate it if
you support the channel in order to grow. All right my friends I'm really excited about
it. I don't know about you. If you are motivated join me learning SQL. This is
(01:42) going to be amazing. So let's go. All right. Now I'm going to show you the
road map in order to learn everything about SQL starting from very basics and then
advance step by step until we have very advanced topics. So now at the start we
have to understand a few things like what is SQL, why to learn it, what are databases
and the types of
(02:04) databases and after the theory we're going to prepare your PC with data and
the software. Now once we have everything then we can go to the next chapter.
This is the basics: how to query data using SQL, and here we're going to cover the
basic components in each SQL query like SELECT, FROM, WHERE, those basics. Now once
you understand how to
(02:24) query the data, how to get the data out of the database the next step we're
going to go and learn how to define the structure of the database: how to create a
new table, add a new column, remove a column and as well how to drop a table. So
with that you are defining new stuff in the database and then in the next
(02:41) chapter you have to learn about data manipulation. This time we're going
to go inside the table and we're going to learn how to insert new data, how to
update the data and as well delete a few rows from our database. So with that you
have the basics how to query data, how to define the structure of your tables
(02:57) and how to manipulate your data. And I can say with that you cover the
basics about SQL. Now after that we start with the intermediate phase where we're
going to deep dive into topics like how to filter your data. Here we're going to learn
about the comparison operators, logical operators, BETWEEN and LIKE. So
(03:12) all the operators that you can use in order to build a condition in order to filter
your data. Then after that it's going to be a very interesting topic. You have to learn
how to combine data. And here we have two mechanisms: either using joins or
using the set operators. And oh my god, joining data. It's going to be
(03:29) a very interesting topic. Here we're going to cover a lot of stuff: we're
going to start with the basic joins and then we go to advanced ones and then you have to
learn how to choose the right join and after that you have to learn about the set
operators and here you have four methods: UNION, UNION ALL, EXCEPT and
(03:44) INTERSECT. So with that you learn how to combine multiple tables by
combining the columns or the rows of your tables. So this is very important. Now
moving on in our course. Now using SQL you can do a lot of stuff: cleaning up the
data, a lot of data preparation, and at the end you can do a lot of analytics and
(04:03) aggregations. So there are two families of functions. The first one is the
row-level functions and here we have a lot of stuff: you can transform your string
values, the numbers, date and time, and handle the nulls in SQL, and at the end
the amazing CASE statements. So all of those are transformations for only one
single
(04:21) value. We call them row-level functions. And after you learn how to do data
transformations, then you have to learn about how to do data analytics and
aggregations using SQL functions. So we're going to start with very basics like the
aggregate functions. And then we're going to deep dive into the window
(04:36) functions, the analytical functions. And here we have aggregate, ranking
and value functions. Those are very important tools for any data analyst or data
scientist doing analytics tasks in SQL. So I can say the row-level functions are for data
engineers and the analytical functions are for data analysts. So at
(04:54) chapter 8 we can say you have covered now the intermediate level and
the last four chapters will be the advanced stuff in SQL. So here there are a lot
of techniques that you have to learn about SQL. So the first one is the subquery, a
query inside another query, and the very famous CTE, common table
(05:11) expression. A lot of developers like this one and then you will learn about
how to create views in the database. This technique, if you learn it, you're going to be
really professional in SQL. Then we're going to learn how to create tables using
select, the temporary tables, and then we're going to learn about the
(05:27) stored procedures, how to write a program in SQL, and after that of course
come the triggers. So those are the advanced techniques that you have to learn in
SQL in order to do advanced projects using SQL. So now once you learn all those
concepts and you start writing a lot of SQL code you will notice that some
(05:44) queries are going to be really slow and for that you have to learn how to optimize
the performance of your queries and here there are a lot of techniques. The most
famous one is to create an index in the database or create a partition and at the end
I will be sharing with you the top 10 best practices that I have
(06:00) learned in my projects on how to optimize the performance of your queries.
So this is very important and then we're going to move to a very interesting one. I will
be sharing with you how I use AI like ChatGPT or Copilot as I'm using SQL in my
projects. So here you have to learn how to write correct prompts to get assistance
from AI as you
(06:19) are using SQL. And finally, my favorite one: it will be about SQL projects.
So my friends here you have to bring everything that you have learned about SQL into
hands-on projects. With real projects you will get challenges and struggle and here
the magic and the real learning are going to happen, and here there are three types of
projects. The
(06:39) first one is a data warehousing project. This is a very data-engineering-focused
project where you're going to learn how to build a real data warehouse where you're
going to take the data from the raw formats and then process it in different layers.
Once you build it then you jump to another project. Here you're
(06:53) going to start exploring the data and start getting the first insights about the
business. And the last project that you can do is the advanced data analytics project.
So this is very important section where you do SQL projects. So my friends this is the
road map on how to learn SQL. So as you can see it takes you step by step from
(07:12) basics to intermediate and you will end up having advanced topics and with
that I can tell you you will learn everything about SQL. Okay. So now let's start with
the first chapter, the introduction to SQL, and here we're going to cover a few topics. So
we have to understand first what exactly is SQL, why we have to
(07:26) learn it, what are databases and the different commands that we have
in SQL. So it is the basics, the theory about SQL. So what exactly is SQL? Let's go.
So what exactly is SQL? Everything generates data and data is everywhere. Your first
name is data, your mobile and everything inside the mobile is data.
(07:50) Your car is as well generating a lot of data. Your bank, your finance statements,
everything is data. And now of course the question is where do we store our data?
Personally we store a lot of our data in Excel spreadsheets or in a text file. So you
store a lot of your data in different files. Now how about
(08:05) companies? They have a lot of things that generate a lot of data: the
products that they produce, their customers as well generating a lot of data, sales
information and a lot of things. So companies generate massive amounts of data. So
now the big question is how they handle the data how they
(08:21) store it. Of course, they cannot go and use simple files. They need
something bigger, stronger and smarter. And here is where the database comes in. So
think about the database. It's like a container for storing data. But instead of just
dumping files into folders, the database organizes the data. So it is
(08:38) easy to access, to manage and to search. So a database simply is a
container that stores data. So now you might ask why we are using database. Can't
we just use files like I do it personally? Well, let me tell you why we use databases.
Imagine that someone asks the following question. Go and find the total spending
(08:55) in your data. So now, in order for Mike to find the total spending and the
costs, he will be opening each of those files one by one, searching for the costs
trying to combine the data and it's going to be a very long and messy process. But now
on the other side, if your data is in a database and you want to
(09:13) ask a question, it's going to be very easy. So all you have to do is to talk
to the database to ask a question and the database can answer your question with a
result. And now comes of course the question how do we talk to a database? Well
we use SQL. SQL is the language that you use in order to talk
(09:30) to the database. It stands for structured query language SQL. And here you
have people that call it SQL like me and others that call it SQL. There is no right and
wrong but if you follow me through the course I think you will start saying SQL. So by
using SQL you can ask the database, you can ask your
(09:50) data, and the database is going to answer your question by sending you a
result. So this process is very easy, simple and fast, and this is way better than having
your data stored in different files. Another reason why we use databases is that they
can handle really huge amounts of data. So sometimes we have like
(10:07) millions of records inside our database, but on the other side, if you are storing
your data inside spreadsheets and you have a massive amount of data, what can
happen? Your spreadsheets are just going to break. They simply can't handle big data. And
another reason why we use databases is that they are just secure. It
(10:22) is safer to store important and critical data inside the database than just
storing it in spreadsheets and files. So the databases are secure and you can control
who is accessing what. So it is just more professional to store the data inside a
database. All right my friends so far what we have learned most of the
(10:40) companies store their data inside a container called a database and for you
in order to ask questions and to talk to your database you have to speak the
language of SQL. Now I'm going to show you how it usually looks in companies.
So we have our data inside the database and then you will have multiple people with
(11:01) multiple roles that are just writing different SQL queries in order to talk to the data.
But now not only employees and people interact with the database. You could build a
website or an application that as well interacts with the database by sending different
SQL queries. And of course, depending on how many people are
(11:18) interacting with the application and the website, it might generate a really
massive amount of SQL queries that are sent to the database. And not only that, you might have
as well tools in order to do data visualizations where you have a dashboard or
reports maybe created using Power BI or Tableau and used by
(11:34) stakeholders and managers in order to make decisions and as well those
tools will be connected to the database and creating SQL queries. So now as you can see
we have a lot of interactions with the database from people, applications, tools; a lot of
things are generating SQL queries and interacting with the database, but the
(11:54) database is just a container and storage, right? So we need something, a
software that manages all those requests, and that's why we have something called a
database management system, DBMS. So it is a software that is going to manage all
those different requests to our database and it is going to decide the priority, which
(12:11) SQL must be executed first. This software can as well manage the security,
whether the SQL is allowed to be executed in the first place. So my friends, the
DBMS is the software that is going to manage the database. And now we are not done
yet. There is something missing. So we have our data, we have the software. What
is missing here is
(12:30) the hardware. So in real companies, we cannot run that on our PC because
first our PC is weak and as well it goes offline. That's why we need a server. A server
is like a very powerful PC and as well it runs 24/7 so it is always available and here we
can decide whether we're going to have a server inside the
(12:48) company or we can use cloud services in order to run our database. So my
friends, so far what we have learned: the database is a container to store the data,
SQL is the language in order to talk to the database, the DBMS is the manager that
manages the database and the server is the physical machine where
(13:06) the database lives. So this is how it looks. And now my friends there are
different types of databases. So let's see what we have. The first and the most
famous one is the relational database. It is very simple. It is like spreadsheets; we call
them tables, where we have columns and rows, and then there is
(13:26) a relationship between those tables to describe how they relate to each
other and that's why we call it a relational database. So if people hear "database"
they're going to think about this one. Now we have another type of database called
key-value. This time the data is organized completely differently, where you have pairs
of keys
(13:43) and values. Think about it like a big dictionary where you have a word,
the key, and the definition of the word, the value. And now moving on to the
next one. This is as well important: column-based. So now instead of grouping the
data by rows, this type of database groups the data into
(13:59) columns. That's why it's called column-based. And this is a very advanced
database in order to handle huge amounts of data where the main purpose is to
search for data. Moving on to another database called the graph database. The main
focus here is the relationship between objects. So the main idea here is how to
(14:15) connect my data points. And now finally we have the document database.
The data is stored as entire documents where the structure of the data is not that
important. What is more important is to fit everything in one page, in one document.
And now if you look at those five types, we can group the document,
(14:32) graph, column-based and key-value databases: all of those are called NoSQL
databases, and the relational database is the SQL database. And in this course, we will be
focusing of course on the relational database. And I'm sure you have heard about
Microsoft SQL Server, MySQL and Postgres: they are SQL relational
(14:52) databases. And for key-value you have Redis and Amazon
DynamoDB, and for column-based we have Cassandra and Redshift. For
the graph database we have Neo4j, and the very famous database
MongoDB as a document database. Now my friends for this course we're going to be
focusing
(15:11) on the SQL relational databases because they are the most famous and the
most used in companies and I will be focusing on Microsoft SQL Server. So
those are the different types of databases. Now the databases are very structured
and organized. They have the following hierarchy. The starting point is the
(15:32) server; as we learned, it is a powerful PC and it is where the database lives, and
inside it we can have multiple databases. So maybe you have a database for
sales and another one for HR. So the server can host multiple databases and as
we learned a database is a container of your data. Now moving
(15:49) on to the next level. In each database we can have multiple schemas. A
schema is like a category, or you can call it a logical container that we can use in
order to group related objects. Let's say you have hundreds of tables. So you
can put all the tables that have to do with the orders in one schema and
(16:06) then another group of tables in the schema customers and so on. So it
helps you to organize your tables and your objects in the database. And now if you go
inside a schema you can have multiple objects like tables. So now of course the
question is what is a table? It is like a spreadsheet. It organizes your data
(16:23) into columns. The column defines the data that you store inside it. So you
have one column about the customer ID, another column about the names, the
scores, the birthday. So each column is about one type of data and sometimes we
call the columns fields. Now the other thing that we have in tables is
(16:39) the rows, or sometimes we call them records. This is where the data is actually
stored. Now in this example each record represents one customer, one person. So we
have one record for Maria, John and Peter. Those we call rows. Now in each
table there is one very important column called the primary key.
(16:57) It is always very important to have one unique identifier for each
customer, for each row, and we use it for different purposes: in order to combine it with
another table, in order to identify quickly one customer. So it is unique. It's like a
fingerprint and no two customers have the same ID. Now
(17:14) at the intersection of the columns and the rows we have a single value, a
cell, and each value, each column stores a specific data type. A data type is what
kind of data we are storing, like an integer 1, 2, 30 or a decimal where you have a
decimal point, 3.14. Now if you want to store characters we have
(17:34) different data types for that, like when you want to store the name or the
description. So here we can use char or varchar. So you store inside them like
the first name Maria or something. Now you might ask what is a char or varchar. So
char is always a fixed length. So if you define it as five characters
(17:50) it's always going to go and reserve five characters of space. But if you
want things more dynamic then you go with varchar. And now moving on we have
other data types called date and time. So if you want to store a date like the
birth date you can use the date data type, and if you want to store the time information you can use
(18:05) the time data type. So we call those int, decimal, char, date, time: they
are data types. So my friends, as you can see, SQL databases are very organized
and structured. Okay. So now let's focus more on SQL itself. We have in SQL
different types of commands. So let's say that we have a database and this
(18:27) database is empty. So we have nothing inside it. Now, of course, the first
thing that you have to do is to write an SQL statement with the command CREATE in order to
create a brand new table in the database. So, once you execute it, the database is going
to go and build one, but this table is empty. So, we have nothing inside it. So
(18:43) now what you have done here is you have defined something new, right?
And we call this type of commands the data definition language, the DDL. We have
CREATE to create something new, ALTER in order to edit something that already exists
and DROP in order to delete something, for example to drop a table.
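
The three DDL commands described above can be tried in a few lines. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for the SQL Server used in the course, and the table and column names (customers, first_name, country, score, birth_date, email) are assumptions made up for illustration. SQLite accepts type names such as CHAR and VARCHAR but does not enforce their lengths the way SQL Server does.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's SQL Server.
conn = sqlite3.connect(":memory:")

# CREATE: define a brand-new, empty table (hypothetical table and columns).
conn.execute("""
    CREATE TABLE customers (
        id         INTEGER PRIMARY KEY,   -- unique identifier for each row
        first_name VARCHAR(50) NOT NULL,  -- variable-length text
        country    CHAR(2),               -- fixed-length text
        score      DECIMAL(5, 2),         -- number with a decimal point
        birth_date DATE
    )
""")


def table_names(connection):
    """Return the names of all tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_names(connection, table):
    """Return the column names of one table (SQLite-specific PRAGMA)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


tables_after_create = table_names(conn)
row_count_after_create = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# ALTER: edit something that already exists, here by adding a column.
conn.execute("ALTER TABLE customers ADD COLUMN email VARCHAR(100)")
columns_after_alter = column_names(conn, "customers")

# DROP: delete the table definition together with everything in it.
conn.execute("DROP TABLE customers")
tables_after_drop = table_names(conn)

print(tables_after_create, row_count_after_create)
print(columns_after_alter)
print(tables_after_drop)
```

Right after CREATE the table exists but holds zero rows, which matches the "empty table" described in the transcript. The same CREATE, ALTER and DROP keywords exist in SQL Server, although details such as data type behavior differ between database systems.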
(19:00) So this is the first family of commands. Now if you look at our table, it is
empty. What do we need? We need data. So let's say that we have a website or an
application. Now this application is generating a lot of data. Now in order for this
application to move the data inside our new table, it must use the
(19:15) SQL command INSERT. So if you execute INSERT, you can add new data
inside your table. This type of commands we call data manipulation language. And
here we have three commands: INSERT in order to insert new data, UPDATE in order
to update already existing data and DELETE in order to go and delete
(19:33) data from your table, and that's why we call it data manipulation language
because you are manipulating your data. So what do we have now? We have a table,
we have data inside the table. Now what we can do is start asking questions.
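
As a rough illustration of the three data manipulation commands, here is a minimal sketch, again assuming Python's built-in sqlite3 module in place of SQL Server. The names Maria, John and Peter come from the example earlier in the transcript; the table layout and the score values are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, score INTEGER)"
)

# INSERT: add new rows to the empty table (score values are made up).
conn.executemany(
    "INSERT INTO customers (id, first_name, score) VALUES (?, ?, ?)",
    [(1, "Maria", 350), (2, "John", 900), (3, "Peter", 0)],
)

# UPDATE: change data that already exists, here one customer's score.
conn.execute("UPDATE customers SET score = 500 WHERE id = 1")

# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM customers WHERE id = 3")
conn.commit()

# Read the table back to see the effect of the three commands.
remaining = conn.execute(
    "SELECT id, first_name, score FROM customers ORDER BY id"
).fetchall()
print(remaining)
```

Note the WHERE clause on UPDATE and DELETE: without it, the statement would apply to every row in the table.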
So let's say that you have an analytical question about your data. Now all
(19:48) you have to do is to write something called an SQL query and inside it you use
the command SELECT, but the whole thing we call a query. So you send a query to
the database, you have a question and the database can return for you the result,
the data answering your query, your question, and we call this type of
(20:05) activity using SQL the data query language. And here we have only one command
and it is very famous. We have SELECT. We can use it in order to query our data.
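
To make the idea of "asking the database a question" concrete, here is a small sketch of two SELECT queries, including a "total spending" question like the one mentioned earlier. It assumes Python's built-in sqlite3 module instead of SQL Server, and the orders table, its columns and its amounts are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "Maria", 35), (2, "John", 15), (3, "Maria", 20)],
)

# Question 1: what is the total spending across all orders?
total_spending = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Question 2: which orders belong to Maria?
maria_orders = conn.execute(
    "SELECT order_id, amount FROM orders WHERE customer = 'Maria' ORDER BY order_id"
).fetchall()

print(total_spending)
print(maria_orders)
```

Each query is a single question sent to the database, and the database does the searching and adding up, instead of someone opening files one by one.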
So those are the three different families of commands in SQL. And of course, we're going to
learn all of them, but we will spend most of our time learning how to
(20:22) write the correct query for the correct answer. And now you might ask me,
Baraa, why do we have to learn SQL? And if time went back, would you learn
SQL again? Well, for sure, of course. And here are the top three reasons that I have.
The first one, you have to learn it in order to talk to the data. You
(20:43) know, most companies store their data in databases, and this is the
standard way. This is how they do it. And if you want to work at a company in the
data field and you want to talk to their data, then you have to use SQL. It's like you
move to another country where they speak another language and
(20:58) you want to live there for a long time, you have to speak their language. The
same thing here. If you want to work with data, you have to learn the language in
order to speak to the database, the SQL. So this is for me the most important reason
why we have to learn SQL. And SQL is in high demand. If you go now and check
the job
(21:14) description of the software developer, data analyst, data engineer, data
scientist, I promise you, you will find there that they are going to demand SQL. So you
will find they are going to ask for SQL skills in almost every job description. So if you
check any data-related jobs, you will find that they are going to ask for SQL skills.
Now
(21:31) another reason that I have is it is the industry standard. So if you go and check
multiple modern data platforms and tools like Power BI, Tableau, Kafka, Spark,
Synapse, you will understand that there will always be a section where you have to
enter SQL code. So most of those vendors adopt SQL because it is the
(21:49) standard. It is widely used. It is like a selling point that their tools are easy. So
those are my top three reasons why SQL is still relevant and why you have to learn
it. Okay, my friends. So with that we have now a clear understanding of what SQL is,
why we need it, what databases are and their different types, why we have DBMS and
(22:08) servers, and as well now you have an understanding of how things are very
organized and structured inside databases. So that's all, this is SQL. All right, so
with that we have covered the basics about what is SQL and databases now in the
next step we're going to go and set up our environments so that means we're going
to prepare your PC
(22:27) with the data with the databases and all the tools that you need in order to
learn SQL. Okay. So now go to the link in the description and you will land here in
my newsletter website and you can subscribe if you want to get weekly news about
my content. I make as well posts about data and many other projects. So once you do
(22:48) that what we're going to do now, we're going to go to the downloads over
here and you will find here all the materials of different courses and the one that we
want is SQL ultimate course. Let's go over here. Now once you do that you will land
on this page where I have listed all the important links. So the first
(23:02) one and the most important one is to go and download the course materials.
Here you can find everything: code, the slides, the presentations, the whole course, or if
you don't want that you can go to my Git repository and there you will find exactly the
same materials. So let's go and download everything. Okay. So now go
(23:17) and put the downloaded folder somewhere safe and let's go inside it. And
here you can find three things. The first one is the data sets. Here if you go inside it
you will find the data for the course the databases that we will be using in order to
practice SQL. So everything is available here. Now in the second folder
(23:35) you can find all the documentation. So that means all the visuals, the
presentation slides, everything that I present during the course. It is available here as
documentation notes for you. Now moving on to the third one, we have the scripts.
So during the course we will be writing a lot of SQL code and all that code is
here
(23:52) available. So that means this is all the code that is used in the course.
Okay. So with that you have now all the course materials. All right. So now the next
step is that we have to go and download the SQL Server Express and you can find
the link as well over here. So let's go there SQL Server Express. And
(24:07) now we're going to land on the Microsoft page where we can see the
different offerings from Microsoft for SQL Server. So either we have it on
Azure or we can download it on premises. But we don't want those. Just
scroll down to see those two options. So the first option on the left
(24:22) side, we have the developer edition. You will get all the features and services
that Microsoft offers with SQL Server. It is as well free but the installation here is a
little bit complicated. But in the second option on the right side we have the express
edition. Installation here is going to be really fast and very easy. You will get
(24:40) as well all the stuff that you need for practicing and learning SQL. So both
of the options are free. It's just a matter of the installation. We will go now for the
express edition. So go and click download now, and it's a very small file. So let's go
and start it. And now the installation is going to start. So we have
(24:57) basic, custom and download media. So download media means download
now and later we're going to do the installation. Custom means we have more
control on how to download and install the stuff. The basic is the easiest one and the
quickest one. So let's go with the basic and click on that. And let's
(25:11) go and accept all of that. And now let's click on install. So now it's going
to install the applications, drivers and so on. It may take a little bit of time. So in order to
do that, let's go and click on install SSMS. So let's click on that and as well we can
find the link over here. So let's go to SQL
(25:32) Server Management Studio. So let's click on that. You can find of course this
link as well with the other links that I have collected. So now we are again at
the Microsoft page. Let's go scroll down and now we will see the following link: free
download for SQL Server Management Studio, SSMS. So let's go and click on
(25:48) that and then it's going to go and download it. Let's go and start it. So the first
thing that we have to define is the location. I will go with the default. So let's click
on install. Okay. Setup completed. We just installed SSMS. So let's go and
close it. So now let's go and start it. If you go to your menu over here, search
(26:11) for SQL Server and you will find it here. SQL Server Management Studio.
Let's go and start it. Okay, so now we're going to get this window in order to connect
to our server. So again, what is our server? It is the one we installed in the first
step, SQL Server Express. And that's why you're going to
(26:28) see in the server name your PC name; of course it's not going to be my
PC name. But here we have something called SQLEXPRESS. This is the server we
just installed. So in the first option, we have database engine, we have reporting
services. Those are different services from Microsoft. We're going to leave it as
(26:44) database engine. And it should be like this: SQLEXPRESS. Now, how to
access this database? We have the following options. We can do that using Windows
Authentication or SQL Server Authentication. I'm going to say let's stick with
Windows Authentication. And the username is going to be the PC name and as well
the Windows
(27:01) user. If for some reason you don't have this information, you can go to
your search, search for cmd and then here you can type whoami. And with that you
will get the PC name and as well the user that you are currently logged in as. And this is
exactly what I'm seeing over here. One more thing: if you're having issues connecting
(27:22) to your database, make sure to check the encryption. It should be mandatory,
and click on trust server certificate. So once you do that you will be able to
connect. Okay. So with that we have the server, we have the client. And now for the last
step we have to go and create the database. We want to
(27:37) insert our data. So now if you look to the object explorer and open the
databases you can see that we don't have any database. So now let's do something
about it. Go back to the course materials inside the data sets you will find the
following. You will find we have here three folders: MySQL, Postgres,
(27:52) and SQL Server. So if you want to follow this course using a different
database like MySQL or Postgres, you can find the exact same data for the
database that you are using. But now in this course we are using the SQL server. So
if you follow me with that go inside the SQL server folder and here you will
(28:10) find four files with different extensions. So what is going on here? Now for
this course we have two databases. One that is very simple called my database and
second one that has more tables called sales DB. And now in SQL server there are
multiple ways on how to create databases. I will show you now two methods on how
to create the
(28:28) database. Now the first option we want to create the database from a script.
And if you look to those files, we have here two files with the extension SQL. Those
are files with SQL code. So let's start with the first one, the init SQL server my
database.SQL. Go inside it. And now here we have the SQL code. Copy everything.
(28:47) And now let's go back to our studio and then go to the menu and click on
new query. And here in the middle you can paste the code. So now we have the
code for the first database. And all what you have to do is to go and execute it. So
once we executed you will see we will not get any error. And now on the left
(29:03) side we don't see yet our database because we have to refresh. So right
click on the databases and click refresh. And now you can see it my database. So
now let's see the content. Go extend it and then go extend the tables. And now you
see here our two tables customers and orders. Inside those tables we can find our
data. In
(29:20) order to see the data right click for example on the customers and let's go
with the option select top 1,000 rows. Once you do that you can see now in the
results we have here five customers. This is our data inside the table customers. So
here again about the interface on the left side we have the object explorer where you
can see the
(29:38) whole structure of the database from server to databases to tables. So you
can see the whole structure on the top we have a menu with a lot of icons and then
in the middle this place here we call it the SQL editor. There we're going to go and write
our SQL code, and then once you execute it at the bottom you
(29:55) will get the result and messages and below the SQL editor we have the
output. So here you can see for example the data the results or different messages
from the database. So the interface is very simple. Now we have to go and get our
second database. So if you go back to our files you can find a second SQL file
(30:11) the init SQL server sales db.sql. Open that and let's go and copy everything here
and let's go back to our studio. Same thing you have to go and create a new query
then paste the whole code and this database is about the sales DB. So let's go and
execute it and with that we will not get any errors and now we go to
(30:31) the left side and we do the same thing refresh and we can see the second
database sales DB. Now we can go and explore it. So extend it go to the tables and
here you can see five tables customers employees orders products. So here this is
the intermediate database for our course. So now let's go and check our data. For
example, let's go to
(30:49) the orders, right click on it and select top 1,000 rows. And those are the orders of
our database. Perfect. So everything is working. So those are the main two
databases that we will be working through the whole course. And of course if you
want to go and practice using another database, it's totally fine. For
(31:05) example, from Microsoft, there is a database called Adventure Works. It is
really amazing. And I'm going to show you now how to import it. We can go over
here the adventure works. So let's click on this link. So now we are again in Microsoft
page. If you scroll down you can see here three different types of
(31:21) databases: the OLTP, data warehouse and lightweight. So they are like
different databases. The OLTP is the most complicated one. A lot of tables and
transactions and so on. The data warehouse is a really nice one in order to do
data analysis and stuff. The lightweight is the simplest one. So
(31:38) let's go for example and get the data warehouse. So click on that and now as
you can see the extension of this file is .bak and now I'm going to show you the
second way on how to create databases in SQL server. So now all what you have to
do is to go to the following path. It really depends where you have installed
(31:55) the SQL server. So for me I have installed it in the program files Microsoft
SQL Server MSSQL SQL Express then MSSQL backup. You have to go there. So
here what you can do, you can place all the files with the extension .bak. For example,
the adventure works that we just downloaded. This is a backup file
(32:13) for the database and we want to go and restore it and with that you are
creating like a database. So this is the second method on how to create databases in
SQL server by restoring the database. If for some reason the script didn't work for
you. Now let me show you quickly how we can do that. Let's go
(32:30) back to our studio. Right click on the database and then here we have an
option called restore database. Click on that. And now here we have two options
under the source database and device. The default going to be database but we
have to switch to a device because we want to import it from files. And then we go to
(32:46) these three dots. Click on that. And now we have to go to the option add. And
now it's going to take you to the place where the SQL server creates backups. So
here we can find our files and what we want to restore is the adventure works.
Select that. Then okay, one more okay and one final okay. So now the
(33:03) database will be restored, and it is successful. So now on the left side we
can see our third database. If you don't see it go and refresh of course and here you
will find a lot of tables in the adventure works. And as usual we can go and explore
the data by selecting top thousand rows. So my friends now you
(33:20) have three databases but of course our focus is only the first two that we
have done my database and sales DB. And with that you have learned two ways on
how to import databases into SQL server. So with that my friends we have prepared
everything. We have the SQL Server Express running on your local PC. We
(33:36) have the studio the clients where we're going to use it in order to interact with
the database and we have created our two databases that we will be using in order
to practice SQL. So we are ready. All right my friends. So with that we are done with
the first chapter. We have our introduction to SQL and now
(33:52) we're going to start learning the first thing in SQL and that is how to query
our data. So let's go and start with that. Okay, so now we can understand exactly
what is an SQL query. Now normally your data is inside the table and your table is
inside the database and now you might have a question from the business like
(34:13) what is the total sales? What is the total number of customers? So any
question that you have in your mind and you want to go and ask your data you want
to go and retrieve data from the database and in order to do that you have to talk to
the database using its language the SQL. So in order to do that
(34:29) you're going to go and write a query where you write inside the query
something called select statement and with that you are asking the database for
data. So once you execute your query the database going to go and fetch your data
and then it prepares a result to be sent back to you. So with that you are
(34:47) asking the database a question by writing a query and the database going to
process your query and answer your question by sending back data and with that we
are like reading our data from the database and the queries will not modify anything
will not change the data inside your tables or even change the
(35:04) structure of the database. So you use select statement only in order to read
something from the database. You just want to retrieve data from the database. So
this is what we mean with a query. And now my friends, each SQL query has usually
different sections, different components. We call them clauses. And this is amazing
because
(35:27) you're going to have enough tools to write a query that matches any question
that you have about your data. So what we're going to do, we're going to cover all
those clauses step by step in order to write any query that you need. So now we're
going to start with the two clauses that make the simplest query in SQL:
(35:43) the select and from. So let's start with that. All right. So now it's really
important for me that you understand how SQL works with the code with the queries.
So now what I'm going to do, I'm going to show you on the right side the syntax of
the query in SQL and then on the left side I'm going to show you
(36:03) exactly step by step how SQL going to go and execute your query. So now
we have the table customers inside our database and we will start with the easiest
form where we're going to select everything. Select the star. So the select star is
going to go and retrieve all the columns from your table. So everything and the
(36:20) from clause it's going to tell SQL where to find your data. So with the select
we select the columns that we want and the from you specify the table where your
data come from. So the syntax going to be very simple. In each query we start
always with the select. And now since we want all the columns we're going to
(36:37) write star and with that SQL going to understand I want to see everything.
And then after that comes the keyword from. And now we want to tell SQL where the
data come from. So we have to specify the table name. And that's it. This is all what
you need to do. So once you execute it what's going to happen? SQL
(36:52) going to go and execute first the from clause. So it's going to go and retrieve
all the data from the database to the results. And then in the next step going to go
and check the select statement. So which columns we have to keep in the result
since you are saying star then the SQL going to keep everything all the
(37:08) columns and with that you will see in the result everything all the columns
and all the rows. So that's it. This is how it works. Now let's go back to SQL in order
to select some data from our database. Okay. So back to our studio. Let's go and start
a new query and let's go and find our database just to expand
(37:24) it and our tables. Now it is very important to make sure that you are
connected to the correct database. So go to the top left in the menu over here and
make sure to select your database. So my database like this or we have a command
for that called use and then just write the database name like this.
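As a minimal sketch of the command just described: the transcript only gives the database name as "my database", so the spelling `MyDatabase` below is an assumption. Use whatever name appears in the Object Explorer.

```sql
-- Switch the current session to the course database.
-- MyDatabase is an assumed spelling; match the name shown in Object Explorer.
USE MyDatabase;
```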
(37:40) So I'm telling SQL just use my database like this and with that SQL going to
switch to your database. Now if you are learning any new programming language, it
is very important to understand about the comments. So comments are like notes
that you add to your code in order to understand what is going on. And of
(37:56) course the engine, the database will not go and execute it. it's going to go
and ignore everything inside it. And there is like two ways on how to do that. Either
you make inline comments by typing two dashes like this and then you write anything
this is a comment. So now in SQL if you see it is green that means
(38:13) it is a comment. Now the other type: you can have multi-line comments,
and in order to do that you can write slash and then star, and then
you can write anything: "this", and then start a new line, "is a comment". So as you can
see all the lines after the slash star are getting green, that means it is a comment.
(38:32) And now let's say that you are at the end. So in order to close it you write
again star and then slash, and with that you are telling SQL I'm done with my comments.
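The two comment styles can be sketched as follows. The `SELECT 1` statement is only a placeholder added here so that there is something for the engine to execute.

```sql
-- This is an inline comment: everything after the two dashes is ignored.

/* This is a multi-line comment.
   It starts with slash-star
   and ends with star-slash. */
SELECT 1 AS demo;  -- only this statement is executed
```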
So those are the two types of writing comments in SQL. Now back to our query. Let's
say that we have the following task: retrieve all customer data. So I would like to
see in
(38:49) the results all the data of my customers everything all the rows and all the
columns. So currently our data is stored inside the table called customers and I need
to see all the data in the output. In order to do that we're going to write a query and
all our queries always start with a select and since I need
(39:05) everything all the columns we write star and then a new line. Let's go and
specify for SQL from where it's going to go and get the data. So it's going to be from
and then we going to write the name of the table. It must be exactly like it is in the
database. So it's called customers and you have to have it here
(39:21) as a customers. So that's it. Let's go and execute it. And now if you look to
the results, you can see we have four columns and five rows. So with that you are
seeing everything inside the table customers. You can see we have five customers
and you can see all the columns about the customers. So this is
(39:37) very simple. We have asked the database a question using an SQL query and
the database should answer our question by returning our data in the results. All
right. So now let's move to another task. I'm going to go and create a new query and
this time we're going to retrieve all the order data. So that means I would like to see
all the data
(39:54) inside the orders. So let's go and write a very simple query. We start as usual
with select and since we want everything. So it is select star from our table orders.
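Written out, the two queries from these tasks might look like the sketch below. It assumes the session is already connected to the first course database and that the tables are named `customers` and `orders`, as spoken in the transcript.

```sql
-- Retrieve all customer data: every column and every row.
SELECT *
FROM customers;

-- Retrieve all order data.
SELECT *
FROM orders;
```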
So that's it. Let's go and execute. And with that you can see in the output we have
again four columns but this time we have only four rows. So
(40:11) that means in this table we have four orders and we can see all the data
inside this table. So with that we can understand we have five customers inside our
database and these customers did generate four orders. So as you can see we are
now talking to our database and this is the simplest form of query in
(40:28) SQL. All right. So now let's move to the next step in our query where you say
you know what I don't want to see all the columns from the database. I want to be
more specific. So I would like to select exactly the columns that I need. So now we
want to select few columns from the database where we select only the
(40:49) columns that we need instead of everything. Now about the syntax we're
going to go and change a little thing. So instead of using star we're going to go and
make a list of columns that we want to see in the output. So we're going to select
column one column two and we're going to separate them using a
(41:04) comma. So we are just writing a list of columns exactly after the select. And
for the from it's going to stay as it is. So from a table. Now if you execute this what
going to happen as usual SQL going to start with the from. So it's going to go and get
the data from the database and then the next step is going
(41:19) to go and check the select. So what going to happen? SQL going to go and
keep only two columns like for example the name and the country and all the
columns that are not mentioned in the select statements will be excluded. So SQL
going to go and remove it from the results and keeps only the columns that
(41:35) we mentioned in our query. So this time instead of having four columns in the
output we can have only two. So with that you are like filtering the columns and you
are selecting exactly what you need. So now let's go back to SQL in order to practice
this. All right. So now we have the following task and it
(41:51) says retrieve each customer's name, country and score. So that means I
don't want to see everything from the table customers. I need only to see the three
columns. So let's see how we can do that. As usual we start with select and I'm
going to go with a star in order to see the whole table first from the table
(42:08) customers. So it's exactly like before. Let's go and execute it. And now I can
see everything inside the table customers. But the task says I need only three
columns. So now what we're going to do instead of the star, we're going to make a
list of columns. So we start a new line and then we write the name of
(42:23) the first column. So the first name, then a comma, and a new line for the second column,
the country, and then again a comma and then we write score. So with that we
have the three columns. Now what I usually do, I go and select them and give them
a push using tab. This just looks nicer and is easier to read. So with that
(42:40) we have now between the select and from a list of columns. Now there is a
mistake that happens a lot where we go and type a comma after the last column. So
if you do that and execute it you will get an error because SQL going to expect from
you a column after the comma and since there is no column and
(42:56) immediately you have a from you will get an error. So there is no need for a
comma after the last column. Now let's remove it and execute. And now you can
see in the output we don't have four columns we have only three. the first name, the
country and the score. And by the way, they are ordered exactly like
(43:11) you selected in your query. So first we have the first name and then the
country and then the last one the score. So that means if I go and now change the
order. So let's get the country at the end and execute. You will see the country at the
end. I'm going to go and put it back in between to match exactly like the task
(43:30) and remove the last comma. So execute again. And with that we have
selected few columns from our table. So we are more specific to what we need.
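A sketch of the finished query for this task. The column names `first_name`, `country` and `score` are assumptions based on how they are spoken in the transcript; check the actual names under the table in the Object Explorer.

```sql
-- Retrieve each customer's name, country and score.
SELECT
    first_name,
    country,
    score        -- no comma after the last column, or SQL raises a syntax error
FROM customers;
```

The columns come back in the order they are listed, so reordering the list reorders the output.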
Okay. So with that we have covered the two clauses, select and from. Next we're going to talk
about the where clause that you can use in order to filter your data. So let's
(43:48) go. So what is exactly where? We use where in order to filter our data based
on a condition and any data that fulfill the condition going to stay in the output in the
result and the data that don't meet the condition will be filtered out of the results.
Condition could be anything like for example we
(44:09) say the score must be higher than 500 or you can say the country must be
equal to Germany. So any condition that you have in your question. Now let's see the
syntax in SQL. As usual we start with a select. We select the columns that we need.
Then we write from where the data come from and then after the from we're
(44:26) going to write the where and exactly after that you specify your condition. So
now let's see how SQL going to execute this. First SQL start as usual from the from.
So it's going to go and get your data from the database and after that SQL going to
go and execute the where clause. So let's say that the
(44:42) condition is that the score should be higher than 500. And now what going to happen? SQL
going to check each row whether it meets this condition or not. So for example for
Maria she doesn't fulfill the condition because her score the 350 is not higher than
500. So she doesn't fulfill the condition and SQL going to go and remove
(45:00) completely this row this record from the results. Now SQL going to go to the
second record. So John is fulfilling the condition. So he going to stay in the result.
The same thing for George. Now moving on to the fourth one Martin. So this
customer is not fulfilling the condition and SQL going to go and remove
(45:18) it from the results. The same things happen for the last customer. The score
is zero and not fulfilling the condition. So that means if we apply this filter, SQL going
to return only two customers out of five. So with that we are filtering the rows based
on a condition using the where clause. Now as you can see in the result we are getting
(45:36) all the columns but if you specify in the query like for example only two
columns like the name and the country then SQL going to start removing as well the
columns of the results. And this means in the output we will get only two columns
and two rows. So with that you are filtering the columns and the rows
(45:53) of your results. So now let's go back to SQL in order to practice this. All
right. So let's have the following task and it says retrieve customers with a score not
equal to zero. So now if you are looking to our task you see we have like here a
condition. The condition says the score must not be equal to
(46:09) zero. So I don't want to see all the customers. I want to see only the
customers that fulfill this condition. So it's like we have to filter the data. So let's go
and solve the task. Let's start as usual. Select star. There's no specifications about
the columns from our table customers. Okay. So I'm going
(46:25) to start with this. Let's go and execute it. Now if you look at the result, you
can see like almost all the customers are fulfilling the condition. Their scores are not
equal to zero. Only one. The last customer his score is zero. So this customer does
not fulfill our condition. Now let's go and build a filter
(46:42) for that. So we're going to say where. And now there will be a section that is
only focusing on how to build conditions and filtering in SQL. So don't worry a lot
about the syntax of the conditions. We're going to cover that later of course but it is
very simple. Now for the condition we need a column. So in
(46:57) which column is our condition based on it's going to be on the score. So
we're going to write here score and since we are saying not equal there is like an
operator in SQL called not equal and then we have to write a value after that. It's
going to be a zero. So again the condition is like this. The score
(47:11) must not be equal to zero. It's very simple, right? And with that we have our
condition and we are using the where in order to filter the data. So let's go and
execute it. And now as you can see SQL did remove the last customer because he is
not fulfilling this condition. And we have now only the rows that fulfill
(47:28) our condition. So as you can see it is very simple how to filter the data. All
what you have to do is to write where clause after the from and then write a condition
after that. Now let's have another task like for example it says retrieve customers
from Germany. So I don't want to see all customers from
(47:45) different countries. I just want to see the customers that come from Germany.
So that means we have a condition here. Country of the customer must be equal to
Germany. So let's go and remove the current condition. It is not the one that we need
and execute. If you are looking to the results, we have two
(48:00) customers that come from Germany and we are interested only to show
those two customers. So let's go and make a filter for that. We're going to write
where clause and after that we need a column. The column going to be the country.
So we're going to write here country and this time the country must be equal to
(48:15) Germany. So we're going to write an equal operator. So we're going to write
Germany like this exactly like the value inside our data. But now as you can see we
are getting like an error here. And that's because in SQL if you want to write a value
that contains characters then you have to put it between two
(48:33) single quotes. So at the start you put a single quote and as well at the end.
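The two conditions from this section, side by side, show the quoting rule. This is a sketch that assumes the columns are named `score` and `country`. `!=` is used for "not equal" here; SQL Server also accepts the equivalent `<>`.

```sql
-- Numeric value: written without quotes.
SELECT *
FROM customers
WHERE score != 0;

-- String value: wrapped in single quotes.
SELECT *
FROM customers
WHERE country = 'Germany';
```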
And now as you can see the red line is away and the value now is red and that's
because it is a string value. It is a value that contains characters and with that you
will not get an error. So if your columns contains only numbers you
(48:49) can write it without single quotes. But if your values contains characters then
you have to write it between two single quotes. Okay. So now back to our condition
the country must be equal to Germany. Let's go and execute it. And it is working. So
as you can see now we are seeing in the output only the customers
(49:04) that fulfill my condition where the country is equal to Germany. So this is
exactly how we work with the where clause in order to filter our data. So my friends
this is how you filter your rows. And now let's say that I would like to filter the rows
together with the columns. So I just want to keep the
(49:20) first name and the country and not interested to see the scores and the ids.
So in order to do that we're going to go to the select and list the columns that we
want to see. So the first name and after that a comma then the country and that's it.
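Filtering rows and columns in one query can be sketched like this, again assuming the column names `first_name` and `country`:

```sql
-- Keep only two columns, and only the rows where the country is Germany.
SELECT
    first_name,
    country
FROM customers
WHERE country = 'Germany';
```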
So let's go and give it a push and execute it. So we have two rows
(49:37) and two columns. So guys as you can see SQL is very simple. All right. So
with that you have learned how to filter your data using the where clause. Next we're
going to talk about how to sort your data using the order by. So let's go. Okay. So
what is exactly order by? You can use this type of clause in order
(49:59) to sort your data. And of course, in order to sort your data, you have to
decide between two mechanisms. Either you want to sort your data ascending from the
lowest value to the highest value or exactly the opposite way using descending from
the highest value to the lowest. And the syntax kind of looks like this. So as usual,
we start with
(50:17) the select and then from and after the from you can specify order by and with
that you are telling SQL we have to sort the data and you have to specify two things.
First you have to specify for SQL the column that should be used in order to sort the
results. So for example you can say score and after the
(50:33) column name you have to specify the mechanism. So for example you say
ascending from the lowest to the highest. And in SQL if you don't specify the
mechanism the default going to be ascending. So you will not get an error if you
don't specify anything after the column name. But my advice here is always to
specify something after the
(50:51) column, because it's just straightforward and easier to understand, and
if someone reads it they can understand immediately it's going to be ascending, because
maybe not everyone knows what is the default in SQL. So always specify a value
even if it's like easier to skip it. And if you want to sort the data
(51:09) from the highest to the lowest then you can specify descending. So as usual
SQL going to go and start from the from it's going to go and grab your data from
database. Then the second step is SQL going to go and sort the result. So the order
by going to be executed and SQL going to see okay I'm going to sort it
(51:24) by the score and using the descending mechanism, and SQL going to go and start
like moving around your rows where the first row going to be the customer with the
highest score and in this example John has the highest score the 900. So John
going to appear as the first row in the result and that's because of his
(51:41) score, and after that the second highest is going to be George with 750 and
SQL going to go and keep sorting the data and then we have 500 then 350 and the
last row going to be the customer with the lowest score the zero. So this is how SQL
executes your order by. Now let's go back to SQL in order to
(52:00) practice. All right. So now we have the following task and it says retrieve all
customers and sort the result by the highest score first. So now by looking at the task
we need all the customers. So there is like no conditions or anything to filter but we
have to sort the results. So let's go and do that.
(52:16) We're going to start as usual by selecting all the columns from the table
customers. So now if you go and execute it you will get all your customers and you
are now seeing the data exactly like stored in the database. And you can see the
result is not sorted by the scores. So we have here a low score then high
(52:32) score then low and so on. Now the task says we have to sort the results. So
we have to go and use the order by and now you have to understand from which
column and we can get that from the task. So it says it should be sorted by the
score. So we're going to go and define the score here. And the final thing that you
(52:49) have to define is the mechanism descending or ascending. And you can get
it as well from the task. So we have to sort the data by the highest score first. So the
highest first and then the lowest. So that means we're going to go and use the
descending. So that's all. Let's go and execute it. Now as you can
(53:04) see in the results, the first customer has the highest score. Then we have the
second one with the second highest until the last one with the lowest score. That's it.
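A sketch of the query for this task, assuming the column is named `score`:

```sql
-- Retrieve all customers, highest score first.
SELECT *
FROM customers
ORDER BY score DESC;
```

Replacing `DESC` with `ASC` reverses the order, so the lowest score comes first.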
This is how you sort your data. And with that we have solved the task. Now let's do
exactly the opposite. So we want to sort the results by the
(53:20) lowest score first. So that means we want to see first the customers with the
lowest score like here in this example we should see the ID number five as the first
because he has the lowest score the zero. Now in order to do that all what you have
to do is to switch the mechanism: instead of descending you
(53:35) can use ascending. Let's go and execute it. And that's it. As you can see now
we have the lowest score then the second lowest score until the last row. It's going to
be the customer with the highest score. So the lowest score comes first. So it is very
simple. This is how you sort your data using SQL. And now I'm going to show you
one
(53:56) more thing that you can do with the order by. You can sort your data using
multiple columns. And we call it nested sorting. So now let's take this very simple
example where you want to sort your data using country. So we are saying order by
the column country and the mechanism going to be ascending. So
(54:13) from the lowest to the highest. Now if you do that, SQL going to go and sort the
data this time based on the country. So we're going to have like the first two
customers from Germany. It is sorting it alphabetically. Then we have the UK and the
last two going to be from USA. Now if you are checking the final results
(54:28) you might say you know what there is like something wrong. The data is not
completely sorted correctly. So if you are looking to the first two customers that come
from country Germany. You can see the scores are sorted in ascending way from the
lowest to the highest. So first we have 350 then 500. Then UK it's
(54:44) fine because we have only one customer. Now if you look to the customers
from USA you see that it is like sorted the other way around. It is sorted descending from
the highest to the lowest. So first we have the score 900 then zero. So there is like
no clean way on how the data is sorted and the result is not really
(55:02) clean and this issue happens usually if you are sorting your data based on a
column that has repetition like here the country we have twice Germany and twice
USA. So now in order to refine the sorting and make it more correct, we can include
in the sorting another column in this scenario for example the score. So
(55:20) we can make a list of columns in the order by and we can separate them
using the comma. And of course you can have different mechanism for each column
like for the country we are saying it is ascending but for the score we say you know
what let's make it descending. It will not be only one for all columns. So
(55:34) now what can happen is we're going to start sorting the data for each
section. So for the two customers from Germany the sorting going to be from the
highest to the lowest. So it's going to go and switch the two customers. So Martin
going to be first because he has higher score than Maria. And with that we are
(55:50) refining the scores based on the same value of course the country. Now for
the UK nothing going to happen because we have only one value and for the USA as
well nothing going to happen because it is already sorted in the correct way from the
highest to the lowest. So as you can see if you are including a
(56:06) second column you are refining your sorting and as well my friends the order
is very important. So this is how you can do nested sorting in SQL. Let's go back to
our SQL and start practicing. All right so now we have the following task and it says
retrieve all customers and sort the results by the country and
(56:23) then by the highest score. So again we need all customers. So select
everything from customers table. And now the task says we have to sort the result by
the country. So we're going to start with the order by and since it says by the country.
We're going to go with the country and we're going to sort it
(56:40) alphabetically. So it's going to be ascending. So let's go execute it. Now you
can see the data is sorted completely differently by the country. So we have first
Germany, UK and then USA. But that's not all; the task says then by the highest score.
So we have to go and include another column in the sorting
(56:57) and we can go and add that by adding a comma and then mention another
column the score and now we have to specify the mechanism. It says by the highest
score. So the highest must come first and with that we are using descending. Now
what is the current situation in that? If you look to the results for example for
(57:13) those two customers we have 350 and then 500. So that means the scores
are sorted ascending right the same thing for USA. So from the lowest to the highest.
Now if you go and do it like this what going to happen it's going to go and switch it.
So you can see over here now for Germany first comes the highest the 500
(57:31) and then the 350 and for USA as well they switched. So we have the highest
and then the lowest and with that we have solved the task. Now again the order of
those columns is very important. So since the score comes after the country we
will not get the highest scores first in the results. So we will not get the 900 as the first
row.
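As a rough sketch, the query built in this task could look like the following in SQL Server syntax. The table name customers and the column names country and score are taken from the narration; their exact spelling in the course database is an assumption.

```sql
-- Retrieve all customers, sorted by country (A to Z) and then by highest score.
-- Assumes a table named customers with columns country and score.
SELECT *
FROM customers
ORDER BY
    country ASC,   -- first priority: alphabetical by country
    score DESC;    -- second priority: highest score first within each country

-- Order of (country, score) with the five customers described in the course:
--   Germany 500, Germany 350, UK 750, USA 900, USA 0
```

The columns in the order by are applied from left to right, so the score only decides the order among rows that share the same country.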
(57:49) And that's because the scores must be sorted after the country. So the
country has more priority. Now let's go and flip that. So let's go over here and say
sort first by the score and then by the country. So let's go and execute it. SQL has first
to sort the scores. So with that you will get the
(58:07) 900 first, right? And then the countries. And since there are like no duplicates
in the scores, adding the country as a second column makes no sense at all. So you can go and skip it. So nested
sorting only makes sense if you have repetition in your results and you can use the
help of a second column in order to make the sorting perfect. So
(58:25) that's it and with that of course we have solved the task. All right. So with that
you have learned how to sort your data using order by. Now in the next step we're
going to talk about how to aggregate and group up your data using group by and
we're going to put it between the where and the order by
(58:41) because in the order of the query the group by comes between the where
and the order by. So let's go. Okay. So what is exactly group by? It's going to go and
combine the rows with the same value. So it's going to go and combine, smash and
compress your rows to make them aggregated and more combined. So all that group by
does is aggregate a
(59:03) column by another column. Like for example, if you want to find the total
score by country. So you aggregate all the score values for one country. If you have
this kind of task, then you can use the group by. Let's see the syntax of that. We will
start as usual with the select. And now what we want to see in
(59:21) the result is two columns. So we have to specify like a category like the
country. This is the value that you want to group the data by. and another one where
you are doing the aggregations. So for example you are saying I would like to see
the total score. So we use the function sum in order to summarize the
(59:37) values of the score. After that as usual we use the from in order to select the
data from a specific table. And now comes the magic: after the from we use group by.
And now SQL understands, okay, I have now to combine the data. I have to group up the
data by something. And this time we are saying you have to group up the data
(59:54) by the country. So that means each value of the country must be presented
in the output only once and for each country we want to see the aggregation and that
is the total score. So let's see how SQL is going to execute it. So it's going to first start
with the from, it's going to go and get the data from the database
(1:00:10) and then SQL is going to execute the group by and now SQL understands
okay I have to group up now the data by the country and it understands it has to
aggregate the scores for that. So it's going to go and identify the rows that are
sharing the same value. Like for example here we have two rows for
(1:00:26) Germany and it's going to bring it to the results. So now we have two rows
for the same country but since we are saying group by country SQL going to try and
combine them smash them together in only one row. So each value of the country
must exist at maximum once. We cannot leave it like this. So now what are we going
(1:00:43) to do with the scores? We have two scores. Now SQL going to check the
aggregate function. It is the sum. So it's going to go and add those
values 350 + 500. And with that, we're going to get the total score of 850. And with
that, as you can see, SQL is combining those two rows into one. So, in the output,
Germany will
(1:01:01) exist only once. And about the scores, we will get the total score. And the
same thing going to happen for the next value. In the country, we have the USA. We
have it twice. So, we're going to get two rows. And SQL going to combine those two
rows in one because USA must exist only once. And with the scores we
(1:01:17) will have the total scores. So 900 plus zero we will get 900. And with that
SQL converted those two rows into one. And for the last value in the countries we
have the UK. It's going to stay as it is. There is no need to smash and combine
anything because it's already one value. So my friends if you are
(1:01:35) looking to the output you can see we grouped the original data by the
country. And that means we're going to get one row for each value inside the country
column. So my friends, in the original data you have five rows; in the output, if you are
using group by like this, you will get only three rows. So this is exactly how the group
by works.
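A minimal sketch of the syntax just described, in SQL Server syntax. It assumes the customers table and the country and score columns named in the narration.

```sql
-- Total score per country: one output row for each distinct country value.
-- Assumes a table named customers with columns country and score.
SELECT
    country,      -- the category the data is grouped by
    SUM(score)    -- the aggregation applied within each group
FROM customers
GROUP BY country;

-- With the five customers described in the course this gives three rows:
--   Germany 850 (350 + 500), UK 750, USA 900 (900 + 0)
-- The row order is not guaranteed without an order by.
```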
(1:01:53) Let's go back to SQL and practice. Okay. So we have the following task
and it says find the total score for each country. So from reading this you can
understand we have to do aggregations and we have to combine the data by a
column. So now usually I start like this. I start selecting the columns that
(1:02:09) I need in order to solve this task. So what do we need? We need the
country and score from our table customers. So let's start like this. Now you can see
we have the countries and the scores. And the task says we have to group up the
data by the country. So that means this is the column where we're going to do the
(1:02:27) group by and the total scores will be aggregated. So what we have to do?
We're going to use the group by since it says for each country. We're going to use it
over here. Group by country. And now we have to go and aggregate the scores. We
cannot leave it like this. So we're going to say the sum of the score. So
(1:02:42) let's go and execute it. And with that, as you can see, we are getting the
total scores for each country. So now instead of having five customers, we have only
three rows now. And that's because the country column has three distinct values. And now if you
check the result, you can see something weird. It says no column name. And
(1:03:00) that's because we have changed the scores. It's not anymore the original
score. It is the total score. We have summed up those values. So SQL doesn't
know what to call it. So those values don't come directly from the
database. It is a manipulation that you have done here. Now in order to give
(1:03:18) a nice name for that we can go and add aliases. An alias is only like a
name that lives inside your query. So we can do it like this: as, and then you can specify
any name you want like for example total score. And now SQL can understand okay
this is the name for this column and if you go and execute it you will see the
(1:03:36) new name in the results. But you have to understand this name exists only
in this query. You are not renaming anything inside your database and you cannot
use it in any other queries. It is just something that is known inside this query and
only for your results. And of course you can rename anything any
(1:03:53) column like for example here you can say this is the customer country and
if you execute it you are just renaming the column in the output. So this is really nice
in SQL. Okay. So now there is like one more thing about the group by: the
non-aggregated columns that you are adding in the select must be as well
(1:04:10) mentioned in the group by. So now for example let's say that okay I'm seeing
now the countries the total scores I would like to see as well the first name. So you
go over here and say you know what let's get the first name. So country first name
the total scores and execute. You will get an error because
(1:04:26) it's going to tell you I need only the columns that you want to group the
data by or columns that are aggregated. So now the first name is not aggregated and as
well not used for the group by. So it is just here to confuse SQL and it will not work. So
if you bring a column either it should be in the aggregation or it
(1:04:45) should be part of the group by. So in order to fix this, if you really want to
see the first name you can go over here and say you know what let's add it to the
group by and execute. This time it going to work because all the columns that are
mentioned here are as well part of the group by. So now as you can see we
(1:05:01) have the countries the first name and the total scores and you can see
again we have five rows we don't have three rows and that's because now you are
combining the data by the country and as well the first name and now you can see in
the output we are getting five rows we are not getting anymore the three
(1:05:18) rows the three countries and that's because SQL is now grouping the data by
two columns, the combination of the country and the first name, and those two
columns give five combinations and that means you will get five rows. So that means
you have to be really careful what you are defining in the group by, and the number
(1:05:34) of unique combinations that those columns generate going to define the number of rows in the
output, the results. So if you go and remove the first name from the select and from the group by as well you
are grouping by only one column and this column has only three values and that's
why you are getting three rows and with that of course we have
(1:05:51) solved the task and now let's extend the task and say find the total score
and total number of customers for each country. So that means we need two
aggregations. We have the total score and as well we need the total number of
customers. So from reading this you can understand we still want to group up the
(1:06:07) data by the country but this time we need two types of aggregations. We
need the total number of customers and the total scores. So we have almost
everything but what is missing is the second aggregation. Now what you can do you
can go over here and add another aggregate function called the count. And
(1:06:23) what we want to count is the number of customers. So we can go and add
the ID over here and call it total customers. So now if
you go and execute it, you will get as well the total customers by the country. And
now as you can see SQL has no problem with the ID and that's because
(1:06:41) you are aggregating the ID. So SQL knows what to do with it and how to
combine it. So that means you don't have to mention the ID in the group by because
you are aggregating it. So that's all, with that we have solved the task as well. All
right. So with this you have learned how to group up your data
(1:06:56) using the group by. Next we're going to talk about another technique on
how to filter your data but this time using the having clause. So let's go. All right. So
what is exactly having? You can use it in order to filter your data but after the
aggregation. So that means we can use the having only after using the group by.
(1:07:18) So let's see the syntax of that. So again like the previous example we are
finding the total score by country. So we have our select, from, group by and now you
say you know what I would like to filter the end results and in order to do that we use
the having after the group by and now like the where clause you
(1:07:36) have to specify a condition. So we have the following condition where we
want to see in the results only the countries if their total score is higher than 800. So
this going to be our condition. So now you might be noticing something: with the group
by we are using the country, the column where we are grouping the data by
(1:07:54) its value, but with the having we are using the aggregated column, the sum
of the score. So this is how the syntax works and now let's see how SQL is going to
execute it. So as usual SQL starts with the from, we are getting our data and then the
second step is going to go and aggregate the data by the country. So
(1:08:11) it's like before going to group the rows with the same value of the country.
So we're going to have one row for each country and this is what going to happen if
you use group by and with that we have now aggregated values right and after the
group by, SQL going to go and execute the having. So having is like a
(1:08:28) filter. Now we have a nice condition: the total score must be higher than 800
and SQL going to go and check the new results after the aggregation. So for
Germany we have the total score of 850. So it meets the condition and it going to
stay in the results. The same thing for USA, it is higher as well with 900,
(1:08:44) but for UK it is not meeting the condition, 750 is not higher than 800, and
SQL going to go and filter out this row. So that means after applying the having we
will get only two countries because they have values that are fulfilling the condition,
and that is what happens if you are using having: it is simply filtering the data.
(1:09:05) But now you might be confused, you say you know what, we have used the
where clause to filter the data so why do we have in SQL another clause to filter my
data? Can't we just use the where? Well, in SQL there are like different ways on how
to filter your data based on the scenario. So now let's go and add both
(1:09:21) of the filters in my query. We are already using the having after the group by
and now let's go and add the where. Usually the where comes between the from and
the group by so directly after the from. And here we are saying the score must be
higher than 400. So now we are filtering based on the scores twice,
(1:09:37) right? Once we are saying the score higher than 400 and by having we are
saying the sum of score must be higher than 800. So what is the big difference? It is
when the filter is happening. If you want to filter the data before the aggregation you
want to filter the original data then you can go and use
(1:09:54) the where clause. But if you want to filter the data after the aggregations
after the group by then you can go and use having. So it's really all about when the
filter is happening. So let's see how SQL is going to execute this. So as usual first the
from going to be executed to get the data. Then after
(1:10:11) that the second step the where going to be executed. This is our first filter.
So SQL going to filter the data using where before doing any aggregations and
based on our condition the first customer will be filtered out because score is less
than 400 and the same thing for the last customer. Now after
(1:10:28) applying the where clause we will get only three rows, only three
customers. And now next SQL going to go and execute the group by. So SQL is
going to go and group the data by the country. So now we have less data to be
combined. So the values will not be summed up because we have only one row for
each
(1:10:44) country. Now after the data is aggregated by the group by then SQL going
to activate the second filter having. So the next step is going to execute the having
and here SQL going to filter the new results based on the total scores and SQL going
to check one by one. So, USA is meeting the condition. UK going to be filtered out
(1:11:02) because it is not higher than 800. And this time Germany as well will be
filtered out because this time it is not fulfilling the condition. In the previous example
without the where, we had a higher total score for Germany. That's why it passed the test.
But this time since we filtered a lot of customers using the
(1:11:18) where, Germany will not have a high enough total score to pass the second filter. So with
that in the output we will get only one row and that's because we are filtering a lot of
data. So it is very simple: where going to be executed before the group by, before the
aggregations; having going to be executed after the group by,
(1:11:34) after the aggregations. So now let's go back to SQL in order to practice.
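The walkthrough above can be sketched as one query in SQL Server syntax. The customers table, its country and score columns, and the alias total_score are assumptions based on the narration.

```sql
-- Both filters in one query: where filters rows, having filters groups.
-- Assumes a table named customers with columns country and score.
SELECT
    country,
    SUM(score) AS total_score
FROM customers
WHERE score > 400           -- runs before the aggregation, on the original rows
GROUP BY country            -- combines the remaining rows into one row per country
HAVING SUM(score) > 800;    -- runs after the aggregation, on the grouped rows

-- Step by step with the five customers described in the course:
--   after where:    Germany 500, UK 750, USA 900   (scores 350 and 0 are removed)
--   after group by: Germany 500, UK 750, USA 900
--   after having:   USA 900
```

Removing the where line gives the earlier result, where Germany (850) and USA (900) both pass the having condition.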
Okay. So now we have a very interesting task: find the average score for each country
considering only customers with a score not equal to zero. So it sounds like condition
and return only those countries with an average score greater
(1:11:51) than 430. So this is again another condition. So I know there is a lot of
things that's going on. Let's do it step by step. Usually I start by doing a very simple
select statement with the columns and data that I need. So let's start with a simple
select. So what do we need over here? We need a score. We need a
(1:12:08) country. Again we need a score country. So all what we need is two
columns. Now I'm going to go and select the ID just to see the customer ID. Then
let's go and get the country score from our table customers. So let's go and query
that. So now as you can see I start with the basics. Query the data and then build up
(1:12:25) on top of it the second step. Now what do we have in the task? We have to
find the average score for each country. That means we have to do some
aggregations. And here we have two conditions. The first condition says we need
only the customers with a score not equal to zero. And the second one we need only
(1:12:40) the countries with an average score greater than 430. Now you have to
decide for each condition whether you're going to use the where or having. Now for
the first one we want to filter based on the scores. So that means we want to filter
before the aggregations. It's not saying the average score. It's saying the score
(1:12:59) itself. So that means we can use for this a where condition. Now about the
second one it says countries with an average score greater than 430. That means
we want to filter the data after aggregating the score. So that means for this
condition we have to use the having. Now what I would like to do is
(1:13:16) to implement the first condition. It's very simple. We're going to say where
after the from the score is not equal to zero. So let's go and execute it. And with that
we don't have any customers where the score is equal to zero. So with that we have
solved this part. But now for the second condition first we
(1:13:34) have to do the aggregations. So we're going to start with the average
score. We're going to go over here and say average and we're going to call it
average score. Now we don't want to see only the average score. We want to see
the average score for each country. So that means we have to aggregate by the
(1:13:49) country and for that we use the group by. Group by comes always after the
where clause. So group by and which column? It's going to be the country. So country.
Now there is like an issue here. You cannot execute it like this. We have to go and
get rid of the ID. We don't need it at all. So let's go and
(1:14:06) execute it. So with that we have the average score for each country and we
have solved the first part. So that means the first and the second part they are
completed. Now we're going to talk about the last part. The average score must be
higher than 430. And for that we're going to use the having and having
(1:14:23) comes after the group by. Now we need to specify the condition. It must be
the aggregated column. So we're going to take the average score from here and put
it after the having and it should be greater than 430. So that's it. With that we have
the last part as well. Let's go and execute it now. And with
(1:14:40) that my friends we have filtered the data after the aggregation. So this is
how I decide between the where and having. It is very simple. All right. So with that
you have learned how to filter the aggregated data using the having. And now next
we're going to go back to the top where we can use there the
(1:14:56) keyword distinct exactly after the select. So let's go now and learn about
the distinct. Okay. So what is exactly distinct? If you use it in SQL, it's going to go
and remove duplicates in your data. Duplicates are like repeated values in your data
and it's going to make sure that each value appears only
(1:15:16) once in the results. So it sounds very simple and as well the syntax is easy.
So as usual we start always with a select but directly after the select we use the
keyword distinct. So there is nothing between them and then the normal stuff we
specify the columns and then the from in order to get the data from
(1:15:34) table. Let's say that I would like to get a list of unique values of the country.
So the first thing that SQL going to do of course is to get the data from the database
using the from. And now the second step is the select. So SQL going to execute it
and going to select only one column the country. All
(1:15:48) other columns going to be excluded and removed from the results. And
now SQL going to go to the third step. It's going to go and apply the distinct on the
country values. So it acts like a filter where it going to make sure each value
happens only once. So it's going to start with the first value Germany.
(1:16:04) Now it's going to look to the results. Do we have Germany? Well, we don't
have anything yet. So that's why it's going to include it in the results. Then the next
value is going to be USA. The same thing. We don't have USA in the results. So it's
going to go and include it. And this happens as well for the UK. We
(1:16:18) don't have UK in the final results. That's why it's going to be included as
well. Now comes Germany again. Now it's going to say wait, we have it already.
So it will not go and add it again in the output because it must appear only once. So
we will not have Germany twice. And as well for the last value the USA we have it
already in the
(1:16:36) results that's why it will not appear again and with that we have removed
the duplicates or the repetition inside our data. So each value is unique. Now let's go
back to SQL. Okay, the task is very simple. It says return a unique list of all countries.
So let's go and do that. It's going to be fun. So select and
(1:16:54) now let's get the column country from our table customers like this. Now
you can see we have a list of all countries but the task says we need a unique list.
So that means I cannot have here repetitions inside it. And with that we're going to
use the very nice distinct. So if you do it like this let's go and execute. You will see
there
(1:17:13) will be no duplicates in your results and all the values in the result going to
be unique. So with that we have solved the task. It's very simple. Now there is
like one thing about the distinct: I see a lot of people using it a lot in cases where
it's not really necessary. So for example, let's
(1:17:27) go and get the ID. Now if you go and execute it, you can see here we have
a list of all ids and there are no duplicates. But now if I go and remove the distinct
and execute it, we will get the same results because the ids are usually unique. So it
really makes no sense to go and say distinct because as
(1:17:44) you can see the database has to go and make sure each value happens
only once. So there's like extra work for the SQL and it is usually an expensive
operation. So if your data is already unique, don't go and apply distinct. Only if you
see repetitions and duplicates and you don't want to see that only in this scenario, go
and apply
(1:18:01) the distinct. Don't go blindly for each query applying distinct just in case
there are duplicates. This is usually bad practice. Okay. So that's all for distinct. Okay
my friends. So with that you have learned how to remove the duplicates using the
distinct. In the next step we're going to talk about
(1:18:17) another keyword that you can use together with the select. You can use top
in order to limit your data. So now let's go and understand what this means. Okay.
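For reference, a short sketch of the distinct examples from this section in SQL Server syntax. The customers table and the country and id columns follow the narration; that id holds unique values is an assumption stated in the course, not something the query checks.

```sql
-- Unique list of countries: each country value appears only once.
-- Assumes a table named customers with columns id and country.
SELECT DISTINCT country
FROM customers;
-- With the five customers described in the course: Germany, UK, USA
-- (three rows; the row order is not guaranteed without an order by)

-- When a column is already unique, such as id, distinct returns the same rows
-- as the plain query and only adds extra work for the database.
SELECT id
FROM customers;
```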
So what is exactly top or in other databases we call it limit. So it is again some kind
of filtering in SQL. If you use it, it's going to go and
(1:18:36) restrict the number of rows returned in the results. So you have a control
on how many rows you want to see in the results. The syntax is very simple as well.
Directly after the select you're going to use the keyword top and then you specify
the number of rows you want to see in the results. So for example
(1:18:53) three and then only after that you specify the columns that you want and
then from which table. Now let's see how SQL going to execute it. So as usual the from
going to be executed, we will get our data and then the second step is going to go
and select the columns. In this case all the columns going to stay
(1:19:08) and then after that it's going to execute the top. So how does it work? It's very
simple. For each row in the database, we have a row number. It has nothing to do with
your data with the ids. For example, here like in the current result, we have row
number 1 2 3 4 5. Those numbers are not your actual data.
(1:19:24) It is something technical from the database. So it is not equal to the ids.
For example, the ids are actually your content, your data. So here we are not filtering
based on the data but based on the row numbers. So since here we have defined three
SQL going to count: okay, row number one, two, three and that's it. So
(1:19:41) it's going to make a cut and all the rows after number three they will be
excluded from the results and you will get only the three rows at the results. So now
as you can see this type of filtering is not based on a condition or something it's just
based on the row numbers. So whatever results you have in
your data it will go and make a cut at a specific row. So let's go to SQL and
practice that. Okay. So now we have a very simple task. It says retrieve only three
customers. So let's go and do that. We're going to go and select star from our table
customers and execute it. Now as you can see in the output we have
(1:20:16) five customers. But the task says we want only three. And there is no
specifications at all about any condition. So I don't have to go and make a where
clause where we write a condition based on our data. We just want three customers.
So we can do that very simply by just adding top exactly after the select and then
specify the
(1:20:34) number of rows you want to see from the output. So select top three and
then the star. Let's go and execute it. And with that we are getting three customers.
That's it. It's very simple. All right. Now moving on to another task. It says retrieve the
top three customers with the highest scores. Now of course this
(1:20:51) is like a mix between ordering the data and filtering the data. Right? So we
usually sort the data by the scores from the highest to the lowest. But now it's like we
are doing both together. So let's do it again step by step. I will just go back to the select
star from customers. Now what we can do we can go
(1:21:07) and sort the data by the score from the highest to the lowest using the
order by so order by score and then descending. So let's go and execute it. And now
you can see the first customer is with the highest score and then the second highest
and so on. Now I think you already got it in order to get the top
(1:21:23) three customers with the highest scores. What you have to do is to just go
over here and say top three and execute it. And with that you have now a really nice
analysis on your data. It's like a report where we are finding the top customers with
the highest score. So this is really amazing and very easy. So
(1:21:41) as you can see, mixing the top with sorting the data you can make top-N
analyses or bottom-N analyses. So let's have this task: retrieve the lowest two
customers based on the score. So now we want to get the lowest scores in our table.
And in order to do that is very simple. What we're going to do we're
(1:21:58) going to flip that. So we're going to sort our data based on the scores
ascending from the lowest to the highest. And since we want only the lowest two
customers, we're going to replace the three with a two and execute it. And with that,
we're going to get the lowest two customers. It is Peter and Maria. They have the
lowest scores.
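The two tasks above can be sketched like this in SQL Server syntax, assuming the customers table with a score column as described in the narration.

```sql
-- Top three customers with the highest scores.
SELECT TOP 3 *
FROM customers
ORDER BY score DESC;   -- highest score first, then keep the first three rows
-- Scores returned with the course's data: 900, 750, 500

-- Lowest two customers based on the score.
SELECT TOP 2 *
FROM customers
ORDER BY score ASC;    -- lowest score first, then keep the first two rows
-- Rows returned with the course's data: Peter (0) and Maria (350)
```

The order by decides which rows count as "first", and top then cuts the sorted result at the given row number.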
(1:22:12) Again, it's very easy. Okay, this is fun. Let's go to the next one. Get the two
most recent orders. Well, this time we are speaking about another table. Let's go and
select everything from the table orders like this. So now, as you can see, we have
here four orders and we want the two most recent orders. So most
(1:22:30) recent means we have to deal with the order date and we can build that
by sorting the data by the order date. So order by order date and since we are
saying the most recent orders so from the highest date to the lowest that means
descending right let's go and execute it and as you can see based on
(1:22:49) our data, now we can look to our result: this is the last order in our
business based on the order date and this one is one of the earliest orders. So with
that we have sorted the data and since we want the two most recent orders we go
over here and say we go exactly after the select and say top two and
(1:23:05) execute and with that we have now the last two orders in our business. So
as you can see combining the top with the order by you can do amazing analysis.
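A sketch of the query for the two most recent orders in SQL Server syntax. The table name orders comes from the narration; the column name order_date is an assumption, since the exact spelling is not given in the transcript.

```sql
-- Two most recent orders: sort by date, newest first, and keep two rows.
-- order_date is an assumed column name for the order date.
SELECT TOP 2 *
FROM orders
ORDER BY order_date DESC;
```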
All right so this is how you limit your data using top and with that you have learned
the basics everything that you can learn and with that you have learned
(1:23:22) all the clauses the sections that you can use in any query in SQL. Now
next what we're going to do we're going to put everything together in one query in
order to learn how SQL going to go and deal with all those clauses and how SQL
going to go and execute it. So let's go and do that. Okay. So now I'm going to show
you
(1:23:42) the coding order of a query compared to the execution order that happens
in the database. So the coding order of a query starts always with a select and then
exactly after that you can put a distinct and then after the distinct you can put a top.
So this is the order of all those keywords and then you can go
(1:24:00) and select like few columns and after you specify the columns separated
with a comma you tell SQL from which table your data come from using the from
clause. Now after that if you want to filter the data before the aggregation you can
use the where clause and this always comes directly after the from. And if you want
(1:24:17) to group the data then you have to do it after the where clause using the
group by, and after the group by comes the having if you want to filter the aggregated data.
And the last thing that you can specify in a query is always the order by. So this is
the order of all those components of the query. And if you
(1:24:33) don't follow this order you will get an error from the database. Now if you
look to this query there are a lot of things that's going to filter your data. So let's
check them one by one. The first thing that you can do is to filter the columns. If you
don't want to see all the columns, you want to see only
(1:24:47) specific columns, you use the select and of course you must use it. So the
columns that you specify will be shown in the results. So it's like filtering the columns.
Now there is another type of filter where you filter out the duplicates if you want to
see unique results and that's using the distinct.
(1:25:02) So this is another type of filter. Moving on, we can filter the result based on
the row numbers. So we can limit the result using the top. But this type of filter
doesn't need any conditions. It's purely based on the row number in the results. Now
moving on, if you want to filter your data based on conditions based on your data,
you can
(1:25:20) filter the rows before the aggregation using the where clause. And the last
type of filtering: you can filter your rows after the aggregation using the having. So as
you can see, we have like five different ways to filter the results in SQL. So
now let's see the execution order. As we learned, the first
(1:25:38) thing that's going to happen is that SQL is going to execute the from clause.
So SQL is going to go and find your data in the database, and all the next steps are going
to be based on this data. Now the next step is that it's going to go
and filter the data using the where clause. This has to happen
(1:25:55) before anything else. So before any aggregations and so on, we have to
narrow the scope of the data. So once SQL applies it, maybe some of the rows are going to be
removed, and once the data is filtered, in the third step SQL is going to execute the group by,
so it's going to take the results and start combining the similar values in
(1:26:12) one row and start aggregating the data based on the aggregate function
that you have specified. So now after the group by after aggregating the data what is
going to do now it's going to go and apply the second type of filter the having. So
based on the condition, SQL is going to go and start removing some of the
(1:26:27) aggregated rows and keep the rest. Now moving on to step
number five. Finally it's going to go and execute the select distinct. So SQL going to
go and start selecting the columns that we need to see in the results and remove the
other stuff. And once the columns are selected SQL going to go and execute the
(1:26:44) order by. So SQL going to start sorting the data based on the column that
you have specified and the mechanism as well. So the data will be sorted differently.
And my friends, the last step that's going to happen in your query will always be the top
statement. So based on the final result, SQL is going to go and execute the top.
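
To make the two orders concrete, here is a small sketch of one query that uses every clause discussed. The clauses appear top to bottom in coding order, and the numbers in the comments give the execution order described above. The customers columns (country, score) and the filter values are assumptions chosen only for illustration.

```sql
-- Coding order: read top to bottom. Execution order: numbers in the comments.
-- Assumed schema: customers(id, first_name, country, score); values are illustrative.
SELECT DISTINCT TOP 2           -- 5: select columns and remove duplicates; 7: keep first 2 rows
    country,
    SUM(score) AS total_score
FROM customers                  -- 1: find the data
WHERE score <> 0                -- 2: filter rows before aggregation
GROUP BY country                -- 3: combine rows and aggregate
HAVING SUM(score) > 800         -- 4: filter the aggregated rows
ORDER BY total_score DESC;      -- 6: sort the result
```
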
So here
(1:27:04) we are saying top two that means we want to keep only the first two rows
without any conditions. So SQL is going to count, okay, row number one, two, and after
that it's going to make a cut and remove anything after that. So this is the last filter
that's going to happen and as well the last step. So now if you sit
(1:27:20) back and look at this, the coding order is completely different from the
execution order. In the coding we have to specify the select first, but the select is actually
going to be executed almost at the end, at step number five. And once you
understand how SQL executes your query, you can understand how to
(1:27:36) build correct queries. So now the first thing that we have learned is that we
can go and have like one query right something like this select star from customers.
Now this is one query and in the output we have one results but did you know that in
SQL we can have like multiple queries and multiple results in one go. So we can do
(1:27:59) everything together like for example let's say I'm selecting as well the data
from orders. So that means we have two queries, and now if you go and execute,
what happens is you will get two result grids. The first result grid is for the first
query and the second one is for the second query. So with that you can
(1:28:18) run multiple queries in the same window, and the results will be
split into multiple grids depending on how many queries you have. And usually in SQL
you might find that by the end of each query there is a semicolon like this. So at the
end of the first query we have semicolon and for the second query we
(1:28:35) have as well at the end another semicolon. For the SQL server it is not a
must but for other databases if you have multiple queries in one execution you must
separate them with a semicolon and with that the database can understand okay this
is the end of the first query and this is the end of the second query. So you have like
(1:28:54) separations between queries. Okay. Now moving on to another cool thing
in SQL. Now what if we don't want to query the data inside our tables, but we would like
to show a static value from us, from the one who is writing the query? And this is very
practical if you are practicing and you want to check something using a
(1:29:15) value from you, not from the tables. So how can we do that? It is very
simple. We're going to write select, and after that, instead of having a
column name, you can go and add any value like 123. So it is just a number, and we
do not specify any table after that. So we leave it like this: select 123, and we
(1:29:34) don't need to use the from clause. So now if you go and execute it you will
get 123. So this is a static value. And of course you can go and rename the column
like static number. So execute it again. So with that we have a static value. And you
can go and add anything, like a string as well. So let's say hello as static
(1:29:53) string, for example. So let's go and execute. Now we have two queries. In the
second one you can see our static value, hello. So in queries we can add values
from us, not only select data from the tables. But of course you can go and mix
stuff. So we can have, in one query, data from the database and static
(1:30:10) data from us. So let me show you what I mean. Let's go over here and say
select and let's go and get for example the ID the first name from the table
customers like this. So with that we can see we are getting data from the database.
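
The queries from this part of the lesson can be sketched as below, including the mixed query that the following lines build up. The aliases and the column names id and first_name are assumptions for illustration. The semicolons are optional in SQL Server, as mentioned, but they make the end of each statement explicit.

```sql
-- Two queries in one execution: each one produces its own result grid.
SELECT * FROM customers;
SELECT * FROM orders;

-- Static values: no from clause is needed.
SELECT 123 AS static_number;
SELECT 'Hello' AS static_string;

-- Mixing columns from the table with a static value.
-- id and first_name are assumed column names; every row gets the same customer_type.
SELECT
    id,
    first_name,
    'New Customer' AS customer_type
FROM customers;
```
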
But now I can go and add something from me new customer and we can call it
customer
(1:30:29) type. So now what is going on here? Two columns from the database and
one column from us. It is the static one. So if you go and execute it, you can see for
the ID and the first name those data comes from the database. But for each record
we are always getting the same static value new customer, new customer and so
(1:30:47) on. So this piece of information comes from the query. It is not stored
inside the database and those two informations come from the stored data inside the
database. So this is really cool thing. You can add few informations from you and you
can get the data from the database. This is the static values. Okay. One more cool
thing that I
(1:31:08) want to show you that if you have a query like this you are selecting from
table and filtering the data and now you would like not to execute the whole thing.
You would like to execute only a part of this query. So now sometimes as you are
writing a query, you don't want to execute the whole thing. You want to
(1:31:23) execute only part of the query. Like for example, I would like to see all the
customers again in this query without this filter. So instead of removing it and then
querying and then adding it again, what you can do is highlight what you want,
now without the filter, and execute. So with that the database is going
(1:31:39) to execute exactly what you highlighted. And now as you can see I'm
getting all the customers without the filter. And if you don't highlight anything and
execute, what's going to happen? It's still going to execute the whole thing inside the
editor. And this is really nice if you want to query another table
(1:31:53) quickly in the same editor. Like we want to select everything from the
orders just quickly. So you can highlight only this query and execute. And with that
SQL is ignoring everything else and only executing what I'm highlighting. And this is
really nice. It gives us speed and flexibility. And you're going to
(1:32:08) find me doing that a lot in the course. So this is really nice. Okay. My
friends. So with that we have learned the basics about SQL query. the basic
components of the select statements and with that you can talk to our database in
order to get data. Now in the next chapter we're going to learn how to
(1:32:24) define the structure of our database. So we're going to learn the data
definition language DDL. So let's go. Okay. So usually if you have like an empty
database what you want to do is to go and define the structure of your data. So one
of the first things that we usually do is go and create new
(1:32:43) tables. So here we have a command called create, and if you use it you can
create a new object inside the database like for example a table. So once you
execute it you're going to get brand new table and usually the table going to be
empty without any data. So it is very simple. This is what the create command does.
(1:33:00) And now let's go to SQL in order to create a new table. So my friends we
have the following task. Create a new table called persons with columns ID person
name birth date and phone. Okay. So this time we will not start by select we will start
with the command create table. So we are telling SQL to create a
(1:33:18) table and after that we have to define the name of the table. So in this task
we have to call it persons. Now we have to go and open two parenthesis like this and
in between we have to define the columns. So what do we need? First we need an
ID. So this is the first column name. And next we have to define the
(1:33:35) data type for this column. It's going to be an int. So it is a number; it does not
contain any characters. And next we can define some constraints, and we
cannot have a person without an ID. So it should not be null. So not null. This is
the first column. So we have defined the name of the column, the data
(1:33:50) type and the constraint. Okay. So let's go to the second column and here
we're going to have a comma, and the next column name is going to be person name. So
this is the column name. And now the data type
for this column is going to be a varchar because the person name contains
characters. So
(1:34:06) varchar. And now we have to define the length. So I'm going to go with 50
characters. And now I would say this is a must. So each person should have a name.
So we're going to say not null as well. So with that we have the name, the type and the
constraint. Now let's move to the third column. It's going to be birth
(1:34:22) date. Now which type of information do we have inside the birth date? So it's
going to be a date, not a number, not characters. So we're going to go with the data
type date. And now about the constraint, well, it depends. I would say in our application it is
optional, because this is very personal information and maybe some persons will
(1:34:40) not provide their birth dates. So this is optional and I will not say it is not
null. So nulls are allowed. Now let's move on to the next one. It's going to be the
phone. So now what is the data type of a phone? Well, sometimes we have numbers,
we have characters, special characters. So we could have
(1:34:56) anything. So that's why I'm going to go with the varchar. And here you can
specify the length that you think is okay. I'm going to go with 15. Now of course it
depends on the system that you are building. I would say the phones are very
important in order to validate whether this is a real person. So we're
(1:35:09) going to say not null. So we are not allowing nulls in this field. Perfect. So
with that we have covered all the columns that are required. We have defined the
data types and as well the constraints. Now the last thing: each database table
should have a primary key in order to make sure this table has
(1:35:27) integrity and maybe as well can be connected to other tables. So now what
we're going to do, we're going to go and add the primary key constraint. So a comma after
the last column, and then we're going to say constraint. Now we have to give the
primary key a name. This is only going to be visible to the database. So
(1:35:42) I'm going to call it PK for primary key, and here persons, and then after that
we're going to say primary key. And between two parentheses, we're going to go and
pick which one is the primary key. And of course, it's going to be the ID. So we're
going to go over here and say ID. So again, we are saying there is
(1:35:58) a new constraint. This is the name of it. It's only internal for the database.
And then we are saying this one is a primary key on the field ID. So that's it. With that,
we have defined a primary key for our table. Let's go and execute it. So as you can
see it is successful. Let's go and check our database for our
(1:36:14) new table. So if you don't see it already, you have to right click on the
database and then go and refresh. So let's go to tables and now we have a brand
new table called persons. So with that we have created our new table. Now of
course for the DDL commands you will not get results or data. All you're
(1:36:30) getting is a message from the database, and the message says here the
command completed successfully and then we have a date when this is completed.
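
Putting the steps above together, the create statement looks roughly like this. The data types, lengths and constraints follow the lesson; the exact spellings person_name, birth_date and pk_persons are assumptions, since the transcript only gives them as spoken words.

```sql
-- Sketch of the table built in this task (identifier spellings are assumed).
CREATE TABLE persons (
    id          INT         NOT NULL,               -- every person needs an id
    person_name VARCHAR(50) NOT NULL,               -- name is mandatory
    birth_date  DATE,                               -- optional, so nulls are allowed
    phone       VARCHAR(15) NOT NULL,               -- digits and special characters
    CONSTRAINT pk_persons PRIMARY KEY (id)          -- named primary key on id
);
```
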
So that means the DDL command will never return data. It is changing the structure
of your database. It's not about retrieving any data and so on. So this command did
(1:36:48) change something in our database and in this scenario it created a new
table and that's why we call this data definition language DDL because we are
defining the database. Now of course if you go and say select star from our new
table persons. So let's go highlight it and then execute it. You will see we are
(1:37:07) getting of course the columns. So the ID, the person name, birth date, the
phone, but we don't have any rows. That means our table is empty. Now what is very
important is that you go and save this information in an SQL script, because
maybe later you have to redefine this table. But let's say that you have
(1:37:25) created different queries and you have lost the script, and now you would like
to see the create statement for this table again. Well, there is a trick for that. If you go to
the left side, you see the persons table right here. Right click on it, and then you have here
script table as, and now we have here different options that
(1:37:41) you can run on the table, and the first one says create to. Then let's go to
new query editor. So now what happened? The database read the metadata
information about the persons table and created your DDL query with a lot of extra stuff that
we haven't written. But this is the template that the database uses. So
(1:37:59) now we can see a lot of stuff. But what is interesting is this create table. So
we can see create table, the schema dbo, the default one, then persons, and then
we have our columns, the data types and as well the constraints. So with that you got
back your DDL statement and a lot of other stuff about the table
(1:38:17) which is not interesting right now. What I really need is to see the create
statement for this table. So this is how you can get back your DDL command. But
of course what I recommend is to always put your code inside a Git repository and
always keep it up to date, so that you can always check your
(1:38:33) work and extend it. Okay. So now what else can you do with the structure
of your database? If you have already a table, what you can do, you can go and edit
and change the definition of the table. So for example, let's say I would like to add a
new column. In order to do that, we can use the command alter. Alter means you
want
(1:38:54) to edit the definition of your table and you want to change it like adding new
column or maybe changing the data type and anything in the definition of the table.
So the alter command, you can use it in order to change the definition of your table.
And now let's go back to SQL and try to change something. All
(1:39:10) right. Now the task says add a new column called email to the persons
table. So it is very simple. We can use the alter table command. So
we are not creating a new table. We want to edit an already existing table. So which table
do we want to modify? It's going to be persons. So we are telling SQL
(1:39:27) we want to change something in the table persons. And of course we have
to tell SQL what we want to change. Are we removing a column? Are we adding
column? In this scenario we want to add new column. So let's go and add the email
information. So this is the column name, and just like when you are creating a table, you have
(1:39:43) to define the column name, the data type and the constraint. So now for the
emails we're going to have characters, numbers, special characters. So we're
going to go with the varchar, and the length is going to be, let's say, 50, and I'm
going to say each person has to have an email. So it's going to be not
(1:39:58) null. So with that we are adding a completely new column. So that's it. Let's
go and execute it. Now again this is not a query. This is a DDL command and in the
output we will not get data. We will get a message whether everything went correctly.
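
The alter statement described here can be sketched as follows. One assumption worth flagging in SQL Server: adding a not null column without a default only works while the table is still empty, which is the case for persons at this point in the lesson.

```sql
-- Add a new column at the end of the existing table.
-- NOT NULL without a DEFAULT is accepted here because persons has no rows yet.
ALTER TABLE persons
ADD email VARCHAR(50) NOT NULL;

-- Quick check: the new column appears as the last column.
SELECT * FROM persons;
```
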
So it says command completed successfully and the time when
(1:40:15) this is completed. Now we can go and do a simple query just to
check the table. And now you can see we have our columns, and at the end we
have a new column called email. This is very important: if you are adding a new
column, it's always going to be at the end of the table. But now you might say, you
(1:40:31) know what, I would like to have the email somewhere in the middle,
maybe after the person name. Well, in order to do that, you have to completely
drop the table and create it from scratch using the create command, which
might be bad if you have data inside the table. So if you are fine with adding your
(1:40:47) new column at the end, you can use the alter table. But if you say I would
like it in the middle, then sadly you have to go and drop everything and start from
scratch. Okay. So now let's have another task, and it says remove the column phone
from the persons table. So now we're going to do exactly the
(1:41:02) opposite. We're going to go and remove it completely, with its data, from the
table. So we're still going to say alter table persons. We are saying we want to
edit the definition of the table persons. And now instead of adding we will be
dropping a column. And then after that we have to specify as well
(1:41:18) the column name. It's going to be the phone. But we don't have to mention
the data type and the constraint again. And that's because the database already
knows that information. We only need that information if we are creating
something new. That's why we can leave it out. We just need the column name
(1:41:33) and the database is going to do the rest. So let's go and do that. Now you
can see successful. And now let's go and check our table. And now as you can see
we have the ID, person name, birth date, email, and we don't have the column
phone. Be careful. If you are deleting column, you will be losing as well all
(1:41:48) the data inside this column. So as you can see, this is very simple. This is
how we can edit the definition of our table by adding and removing columns. Okay,
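
The removal described in this task can be sketched as below, assuming the column is spelled phone as in the earlier create statement. Only the column name is needed, not its type or constraint.

```sql
-- Remove the column and all the data stored in it. This cannot be undone
-- without a backup or the original data.
ALTER TABLE persons
DROP COLUMN phone;

-- Quick check: phone is gone, the other columns remain.
SELECT * FROM persons;
```
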
now moving on to the last one in this group of commands. So now so far what we
have done, we have created something new in the database. We have
(1:42:08) changed the definition of something inside our database. And now the last
one, you can go and drop something from the database. Let's say we have another
table and we don't need it anymore. So we can go and use the drop command in
order to remove the table completely from the database. And this means as
(1:42:26) well removing everything the table and the data inside it. So now let's go to
SQL and let's drop something from our database. Okay. So now our task says delete
the table persons from the database. This is the simplest form of command in SQL
but yet the most risky one. So what we need? We have to delete
(1:42:43) and drop the whole table persons. We don't need it anymore. We're going
to say drop table and then all what we have to do is to give the name of the table
persons. So three words. You don't have to specify anything. Just destroy the table
persons. Let's go and execute it. It is successful. So as you can see it
(1:43:01) is very simple. Now on the left side to your database go refresh and go to
the tables and you will not see the table persons. So the drop command it is very
simple but yet very risky. So if you compare now create table with a drop table you
can see destroying things is way easier than building them. Those are
(1:43:18) the commands create, alter, drop. Those commands we use in order to
define the structure of our database: the DDL commands. That was very simple. All
right, so that's all about the data definition language, DDL, and with that you have
learned how to define new stuff in your database. Now moving on to the next one:
(1:43:36) we're going to learn about the data manipulation language, and here we're
going to learn how to manipulate our data inside the database. Let's go. All right, so
now what we're going to do, we're going to go and modify and manipulate your data
inside the database. So now sometimes what happens you have a table inside your
database
(1:43:56) and the table is empty. You don't have any rows any data inside the table.
Now in order to add your data to the table what you can do you can use the
command insert. So insert going to go and add new rows to your table and of course
not always the table must be empty to add your data. You can add new rows to
(1:44:13) already existing data and SQL going to go and append it at the end of the
table. Now my friends in order to insert new data to the target table there are two
methods. The first and the classical way in order to insert new data we can use the
insert command and manually specifying the values that should be
(1:44:31) inserted to the table. So you're going to start specifying in the script the
values and then they're going to be inserted as new rows into the target table. So in
this process you are manually inserting new values to the table using like an SQL
scripts. So now we're going to focus on this scenario on
(1:44:47) how to insert data. All right. Now let's quickly check the syntax of the insert
command. It starts with the keywords insert into, and after that we have to specify the
table name, so where we want to insert. And then we specify
a list of the columns that we're
(1:45:03) going to insert values into. And after that we say values. And finally
we're going to go and specify the data that should be inserted into the table, and
we write it as a list as well, like we have done for the columns. Now in the insert
statement, specifying those columns is totally optional. So
(1:45:19) if you don't specify the columns of the table, then SQL is going to expect you
to insert a value into each column. Sometimes, of course, we don't want to
insert a value for each column, so you can list only some columns and skip the rest. But if you want to
insert a value for each column, either you go and specify them all as a list
(1:45:37) or you can skip the list. Now for the insert statement there is a very important
rule. The number of columns and values must match. So if you specify here three
columns then you must insert as well exactly three values. So this must be matching.
And one last thing about the syntax: you can insert multiple rows in
(1:45:55) one go. So for each row you can specify a list of values that must be
inserted. So that's all about the syntax. Let's go back to SQL in order to practice the
insert command. Okay. So now let's go and insert new customers. So it's very
simple. It starts with insert into. So we are saying we want to insert data into.
(1:46:13) So we have to go and specify the table name, customers. Now after that we
have to specify list of columns where we want to insert data into it. And what we can
do we can go and check which columns do we have inside our table. So we can see
we have ID, first name, country, score. And we can go and make a list of that.
(1:46:29) So we can say ID, first name, country and score. So we just have a list of
all columns inside our table customers. Now what we need? We need the values. So
which data should be inserted. So we can go and open two parenthesis. And now we
have to specify an ID. We know the last customer was five. So we're going to go
(1:46:49) with the customer six. Now we have to give the name of the customer. Let's
go for Anna. And then a country. Let's go for USA. And this customer has no scores.
So what we can do? We can say null. So we don't know the score of this customer.
Null means nothing, we don't know. So with that you can go and insert
(1:47:07) one row. But now let's say that I would like to go and insert like a second
row one more customer. What we can do we can separate this with a comma and
then we can go and repeat the whole thing again. So the ID is seven. The next one
let's call this customer Sam and we don't know the country of this customer. So we're
(1:47:25) going to say it's null. But the score we know it already. It is 100. So as you
can see we are adding a value for each of those columns. And if you don't know the
answer then make it null, if the database allows it to be null. Some columns are
not allowed to be null, like the primary key. So if you go and
(1:47:40) say null over here, the database will not allow it. Well, actually we can go
and test it. Let's execute. And you can see: cannot insert the value null into the
column ID. So this is not allowed. It's going to stay a seven. But for the other columns it
is allowed. You can go and check the definition of the table. Now
(1:47:55) we go and execute. Now the output of a modification command is always going
to indicate what happened to the data. So it says two rows affected. Affected
might mean inserted, updated or deleted. So you're going to get a general statement from
the database, but you are told how many records are affected. So we got two
(1:48:13) because we have inserted two records. So now as you can see it's not like
the query. We are not getting any data in the output. We are just getting a message.
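
The insert from this step can be sketched as follows. The values come from the lesson; the column spellings id, first_name, country and score are assumed.

```sql
-- Insert two rows in one statement; the column list and each value list
-- must have the same number of entries, in the same order.
INSERT INTO customers (id, first_name, country, score)
VALUES
    (6, 'Anna', 'USA', NULL),   -- score unknown
    (7, 'Sam',  NULL,  100);    -- country unknown

-- Replacing 7 with NULL would be rejected, because the primary key column
-- id does not allow nulls.
```

Instead of a result grid, the database reports the number of affected rows, two in this case.
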
So this is a big difference between querying the data using the selects and modifying
the data using inserts. We are doing now direct modifications to the data inside our
(1:48:30) database. Of course, if you want to see the data in the customers, what we
can do, we can go and query the data, right? So, let's go and do that. Select star
from customers. I would like to see the whole table. So, mark it and execute it. Now,
you can see we have seven customers. So, we just manipulated our
(1:48:47) data. We have here Anna and Sam. This is how you can insert data to the
database. Now, there are a few rules you have to be careful about as you are inserting new
data into your tables. You have to pay attention that the order of the columns that you
have defined in the insert matches the values that you are
(1:49:03) inserting over here. Let's have an example. I'm going to go and remove this
over here and let's say that we are inserting a new one number eight and now in the
first name, instead of the name of the customer, we have inserted the country, like
USA, and in the country we have inserted the name. It's just a mistake,
(1:49:19) and we are all human, right? So let's have a name like this, Max. Now if you
go and execute it, the database will accept it, because it is really hard for the
database to understand that you have made an error here. Both of them are varchar, and
the database doesn't care about the content of the data as long as you are
(1:49:34) following the rules of the data type. So now if you go and select the data
from the customers you can see now we have a customer called USA from the
country Max. So SQL is going to do it blindly, exactly as you insert the data, as long as you
are following the data type rules and the constraints. So for example, if you
(1:49:50) made this error over here and you say the ID is Max and, let's say, the first
name is nine, and you execute it, here the database is smart enough to say,
you know what, there is something wrong: the ID should not be a string. So the
database is going to reject your insert. Be careful of the order of your
(1:50:07) columns. Now let's go and query our table again. Now if, in the insert
command, you are defining all the columns exactly like the table, so as you can see we have
here a complete match, ID, first name, country, score, we have all the columns and as
well the correct order, there is a lazy way: you can go and remove the whole column list
over here, and
(1:50:24) with that the database can understand, okay, we are inserting values into all
of the columns. So it's going to understand you are inserting something into each column
in the correct order. So let's go and do that correctly: nine, and here let's say we
have someone from Germany. So if you go and execute it, it will work even
(1:50:44) though we didn't define the columns, and that's because the values that we
are inserting match exactly the number of columns of the table and follow the
rules as well. Now moving on to the next one, you can go and add only two columns
in the definition. If you know already always the country and the score
(1:51:00) is null. We know only two informations, the ID and the name. Then you
don't have always to go and say null null null and so on. We can go and skip that.
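
The three cases just discussed can be sketched like this. It assumes id is an int and first_name and country are varchar, as the lesson implies. The score values and the name in the last statement are placeholders, since the transcript does not state them.

```sql
-- Case 1: values swapped by mistake. Both columns are varchar, so the
-- database accepts it and stores a customer named 'USA' from country 'Max'.
INSERT INTO customers (id, first_name, country, score)
VALUES (8, 'USA', 'Max', NULL);

-- Case 2: type mismatch. 'Max' cannot be converted to the int column id,
-- so this statement is rejected (left commented out so the script still runs).
-- INSERT INTO customers (id, first_name, country, score)
-- VALUES ('Max', 9, NULL, NULL);

-- Case 3: no column list. One value per column, in the table's column order.
-- 'Andreas' is a hypothetical name used only for illustration.
INSERT INTO customers
VALUES (9, 'Andreas', 'Germany', NULL);
```
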
Okay. So now let me show you what I mean. We're going to go after the table name
and we're going to define only two columns, the ID and the first name. So that means
(1:51:15) we are telling SQL we want to insert only two columns. And now you have
to be careful. If you define here two columns then the values should be as well two
columns. So we're going to remove the country and the score. And we can go and
add only two informations. So 10. And we can go and add here for example Sara. So
(1:51:33) if you go and execute it, it will work. And now what is SQL doing
with the other two columns? They're going to be nulls. So let's go and select again from
our table. You can see here Sara has null in the country and as well in the score
because we didn't provide that information. But be careful, you
(1:51:48) cannot skip a column here that is not allowed to be null. So you
always have to have in your list all the columns that are not null. So for example, I cannot go
and insert only the first name. I will get an error because the database will try to
insert a null in the ID and this is not allowed. So you can skip
(1:52:04) only nullable columns. All right, my friends. So that was the first method on
how to insert data into your target table: as you saw, by manually typing the values
inside an insert command using values. And now let's move to another method.
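
Before moving on, the partial column list from the last example can be sketched as below. It assumes country and score are nullable and have no default values, so the skipped columns are filled with null.

```sql
-- Only two columns are listed, so only two values are supplied.
-- country and score are not listed and become NULL.
INSERT INTO customers (id, first_name)
VALUES (10, 'Sara');

-- Skipping a column that does not allow nulls fails: id would receive NULL.
-- (Left commented out so the script still runs.)
-- INSERT INTO customers (first_name)
-- VALUES ('Sara');
```
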
We're going to insert data but this time not manually. We're going to insert data
(1:52:27) using another table. So imagine we have the following scenario. We have
an already existing table with data and this going to be the source table, the source
of your data and we have another table. This table is empty and we want to insert a
new data to this target table. Now what we can do, we can take
(1:52:43) the data from the source table and insert it into the target table without
manually writing the script for the values. So we are moving the data from one table
to another. Now in order to do that we need to do two steps. The first step we have
to write an SQL query using select from and so on in order to select
(1:53:01) the data that we need from the source table. And once you do that you will
get a result. So this is like you are doing a normal query. You write select and you
will get an answer with the results. And now what we can do in the next step we can
take this results and use an insert command in order to insert this results
(1:53:20) into the target table. And with that we have moved the data from the source
table to the target table. So first write the query on the source table. And the second
step use an insert to move these results to the target table. So let's go back to
SQL in order to do that. So now we have the following
(1:53:37) task and it says insert data from the table customers into the table persons.
So that means the source table is the customers and the target table is persons.
Now how I usually do it that I keep my eye on the target table to understand the
structure of this table and I start writing the query from the
(1:53:55) source table. If you go to the left side, we can see okay, we have here an
ID. We have here person name, birth date and phone. And you can see only the birth
date accepts nulls and for the rest we always have to provide information. So with that I
have now understanding about the table persons. Now next I'm going to
(1:54:12) go and start writing the query from the source. So we start like this. Select
star from our table customers just to have an overview of our table. Now the next
step we're going to go and design a perfect result from this query that is matching
the target table. So in the output we need ID and we have it from
(1:54:30) the customer from the original table. We're going to go and select ID. Okay.
So now next we need a person name and here we have from the original table
something called first name. So this is a perfect match. So we're going to go and
select this column as the second column. So we have covered the first
(1:54:46) two. Then the third one is going to be the birth date. Well, my friends, we
don't have birth dates, but the database can accept it as a null. So, I'm going to go
and write a null because I don't have such information from the source table. And
now the next one going to be the phone as well. We don't have phone
(1:55:00) informations. But we cannot have it as a null because it says here not null.
So, what we're going to do, we're going to go and add a static value, a default value.
So, we're going to have two single quotes and in between we're going to say
unknown. Since it is a varchar, it can accept this word. So, now let's go and
(1:55:16) just query. So we have the ID, we have the first name, the birth date is
empty, and the phones is unknown. Now you might say, but the column name is not
matching with the column name of the persons. Well, the database does not care
about that. As long as the result of the data is matching the table, it can go and
(1:55:35) insert it. So the database will never compare the column names together.
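
The two-step approach described above (query the source table, then insert the result into the target) can be sketched as follows. This is a minimal illustration using SQLite through Python's sqlite3 module, which is an assumption; the table definitions and the two sample rows are made up for the example. Note that the SELECT output is matched to the target columns by position, not by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER NOT NULL, first_name TEXT NOT NULL, country TEXT, score INTEGER);
CREATE TABLE persons (
    id INTEGER NOT NULL, person_name TEXT NOT NULL, birth_date TEXT, phone TEXT NOT NULL);
INSERT INTO customers VALUES (1, 'Maria', 'Germany', 350), (2, 'John', 'USA', 900);
""")

# first_name lands in person_name because it is the second column in both lists.
# birth_date is nullable, so NULL is fine; phone is NOT NULL, so a static value is used.
conn.execute("""
INSERT INTO persons (id, person_name, birth_date, phone)
SELECT id, first_name, NULL, 'Unknown'
FROM customers
""")

persons = conn.execute("SELECT * FROM persons ORDER BY id").fetchall()
print(persons)
```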
But if you like and go and add here like the aliases exactly like the target table it will
not hurt but it has no effect on the results. All right. Okay. So now we have like query
select and we have a results but this is not an insert. So
(1:55:52) how we going to insert the result of this into the table persons. Well for that
we need the insert into command. So insert into and now we have to specify the
target table going to be the persons. And of course you can go and list all the column
names but if you have like exact match you can skip it
(1:56:11) but for me I would like always to add it just to make sure that we don't have
any issue. So the ID, person name, birth date and the phone. So that's it. Let's go
and execute. So it is working now. We can see 10 rows affected. Well that means 10
rows are inserted from the table customers into the target persons. And
(1:56:33) now what we can do we can go and query the table persons just to check
that everything is working perfectly. Select star from persons and let's go and
execute. And with that you can see our 10 persons that we have added from the
customers. So with that we have moved the data from one table and inserted
(1:56:51) into another table. And as you can see it was very simple. First you have to
write a query from the source table in order to collect the data that you need. and
then you go and insert it into the target table. So this is really nice and easy and this
is another way on how to insert data into your database. Okay, so with that we have
(1:57:14) learned how to insert data to our tables. Now let's say that I don't have
something new. I don't have any rows to be added to my table but I have an update. I
would like to go and change the content of the already existing rows. So what you
can do? We can use the command update in order to change the
(1:57:31) content of already existing rows. So again my friends insert going to go and
insert completely new rows but update going to go and change the data of already
existing row. Now let's have a quick look at the syntax of the update. It starts with
the keyword UPDATE and then we have to specify the table name and after that we're
going to
(1:57:51) use SET in order to specify what are the new values for the columns. So you
have to write down for each column that you want to update a new value and you
separate the columns of course using a comma. Now after that we have to specify
as well a WHERE condition. So it's like the queries: you say where and then you
(1:58:07) write a condition. And if you don't do that and you don't use the WHERE clause,
what's going to happen? You will end up updating all the rows inside your table. So
that's why we always need the WHERE clause. All right. So that's all about the syntax.
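
The syntax just described, and the effect of leaving out the WHERE clause, can be sketched like this. It is an illustration under assumptions: SQLite through Python's sqlite3 module stands in for the course's database, and the three sample rows are made up. The unfiltered update is rolled back so the sample data survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER NOT NULL, first_name TEXT NOT NULL, score INTEGER);
INSERT INTO customers VALUES (5, 'Peter', 0), (6, 'Anna', NULL), (7, 'Sam', 100);
""")

# UPDATE <table> SET <column> = <value> WHERE <condition>
cur = conn.execute("UPDATE customers SET score = 0 WHERE id = 6")
filtered_count = cur.rowcount  # rows touched by the filtered update
conn.commit()

# The same statement without WHERE touches every row in the table.
cur = conn.execute("UPDATE customers SET score = 0")
unfiltered_count = cur.rowcount
conn.rollback()  # undo the unfiltered update

scores = dict(conn.execute("SELECT id, score FROM customers"))
print(filtered_count, unfiltered_count, scores)
```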
Let's go back to SQL in order to update our data. Okay. So let's
(1:58:24) have the following task and it says change the score of customer 6 to zero.
So that means we have to go and modify the data of the customer ID equal to six. So
now first I would like to go and have a look to our data. So select star from
customers and now the task is targeting this customer over here and we
(1:58:41) would like to replace the null to zero. Now how we can go and update this
information inside the table? We can use the update command. So what we going to
do? We're going to start writing update and after that we have to specify the table
name. So what we are updating? We are updating the customers and then
(1:58:58) we're going to tell the database to set the value of the score to a zero. So
we would like to update and change the value from null to a zero. And now here
comes something very risky. Don't execute this query yet. If you do that, what's going
to happen? The database going to go to the table customers and
(1:59:14) replace all those values of all customers to zero. So it's going to go and
update the whole table and this is of course very risky. That's why in the update
command we have to give a WHERE condition, a filter, in order to target only the specific row
or the rows that you really want to modify. In this case we
(1:59:31) want to change only one row. So what we have to do is to go and specify
the WHERE condition like we have done in the select query. Nothing new, right? So
we're going to say where the customer ID is equal to six. And with that SQL will not
go and update everything. First it's going to filter the data and then
(1:59:49) updates. And now before I execute just to make sure I go and check which
data going to be affected. So it's very simple you go and select star from table
customers and then I go and take the exact where and put it in my query and then I
select the whole thing and execute. And now if this query gives me
(2:00:06) the data that should be modified then I'm doing the update command
correctly. And in this case we are targeting only one customer. This is the customer
number six. And with that I feel really confident with my update. So what we can do
since I'm going to use this later I'm going to put the whole thing in a
(2:00:23) comment and if I execute now only the update going to be executed. So
let's go and do that. Now very important to check the message you can see one row
is affected which is really good because if I see here 10 rows is affected that means
everything is updated. Now let's go and check the data. I'm going to go
(2:00:40) and remove the WHERE here and check the whole table. Now you can see we
still have the old scores only Anna has now score zero instead of null. So this is how
I usually update the data. You have to do it very carefully. Now let's move to another
task. It's going to say change the score of the customer number
(2:00:57) 10 to zero and update the country to UK. So now this time we are targeting
the user number 10. As you can see she doesn't have the country and score. And
the task wants us to change the score to a zero and the country to UK. So now how
we going to do it? We're going to use the exact same command but with
(2:01:14) different condition. So the ID this times is equal to 10 and the score is to
zero. But now we have to change as well the country. Now if you want to do multiple
updates, you're going to have here a comma after the score and the new line and
let's say country equal and then we're going to add UK. So select
(2:01:32) the whole thing and let's go and execute. So again it is affecting only one
row. This is really good. And if you go and check the table search for Sara, you can
see in one update we have updated two columns the country and as well the score.
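
As a rough sketch of that two-column update, assuming SQLite through Python's sqlite3 module and a made-up single-row table (the course's database and full table may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER NOT NULL, first_name TEXT NOT NULL, country TEXT, score INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (10, 'Sara', NULL, NULL)")

# Several columns are set in one statement, separated by commas.
affected = conn.execute(
    "UPDATE customers SET score = 0, country = 'UK' WHERE id = 10"
).rowcount

sara = conn.execute(
    "SELECT id, first_name, country, score FROM customers WHERE id = 10"
).fetchone()
print(affected, sara)
```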
So with that we have solved the task. It's very simple. Now
(2:01:50) moving on to the second task. It says update all customers with a null
score by setting their score to a zero. So this time we are not speaking about one
specific customer. We are talking about updating the data for a subset of customers.
So now imagine you have like hundreds of customers and you are making
(2:02:07) one update command for each customer. It's going to be really wasting of
time. Now instead of that we can specify a condition that targets multiple customers
and we're going to do the update for those customers in one go. So now let's see
how we're going to do it. We are talking only about replacing the
(2:02:22) nulls with a zero. So we don't need the country. So set score equal to zero.
But now we will not be specific for the ids. Now we have to make a new condition. It's
going to say like this where score is null. Now of course in the course we have a full
dedicated chapter about the nulls and here all what we are doing is
(2:02:40) we are searching for scores that is equal to null. But we cannot write an
equal we have to write it like this is null. Of course before we update anything we
have to go and test it in a query. So select star from customers where score is null.
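
The point about IS NULL can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module and four made-up rows; the names and ids are hypothetical. A comparison written with an equals sign against NULL matches no rows, while IS NULL finds them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER NOT NULL, first_name TEXT NOT NULL, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Maria", 350), (2, "John", 900), (7, "Sam", None), (8, "Lee", None)],
)

# "= NULL" never evaluates to true, so it matches nothing.
equals_matches = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE score = NULL"
).fetchone()[0]

# Preview the rows that IS NULL targets, then update them in one statement.
targeted = [r[0] for r in conn.execute(
    "SELECT id FROM customers WHERE score IS NULL ORDER BY id")]
updated = conn.execute(
    "UPDATE customers SET score = 0 WHERE score IS NULL"
).rowcount

nulls_left = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE score IS NULL"
).fetchone()[0]
print(equals_matches, targeted, updated, nulls_left)
```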
Let's go and execute. Now as you can see we have two customers where the score
is null. So
(2:02:59) that means this condition is targeting a subset of customers and we're
going to do now the updates for multiple rows for this subset. So that means we can
run this query. Let's go and execute it. Now you can see two rows are affected. So
that means multiple rows got affected got updated. So now if you go and query
(2:03:19) our table customers you can see we don't have any nulls inside the scores
and we have replaced all the nulls with a zero. And of course you can do the same
thing. you can go and make an update command in order to replace all the nulls in
the country to maybe something unknown or any default value that you want. So this
(2:03:37) is how you can update multiple rows in one go. All right my friends. So with
that we have learned how to insert new rows to our tables and as well how to update
the content of already existing row. Now the last thing or command that we can do to
the data inside the table that we can go and remove rows from our table and we
(2:03:59) can do that using the command delete. So if you use delete SQL going to
go and start removing already existing rows inside your table. All right. Now for the
syntax of the delete it's going to be very simple. We're going to say delete from and
then we're going to write the table name. And here comes
(2:04:16) something very important. We have to add a WHERE condition. And it's like
the update. If you don't do that, if you don't include a WHERE condition, what's going to
happen? You will end up deleting all the rows inside the table. So the syntax is very
simple. Let's go back to SQL in order to delete some data. Okay. So now we have
the following
(2:04:33) task. Delete all customers with an ID greater than five. So now we have to
go and delete all the customers that we recently added. So how we going to do it?
It's very simple. We're going to say delete from. So that means I want to delete
something from a table. And we have to specify the table name. It's
(2:04:49) going to be the customers. So the syntax is very simple. Now my friends,
this is more risky than updates because if you execute it like this, don't do that yet.
Wait, what's going to happen? All the data of the customers going to be deleted. So
you will get an empty table and we will not do that. So now we're
(2:05:05) going to do exactly like the update command. We're going to specify the
WHERE clause. So it says the ID should be greater than five. So that means ID higher
than five. So with that we are defining a subset of the data that should be deleted,
not everything. And like we did in the updates, we have here
(2:05:21) to do a double check before deleting anything. So again what we do, we
select star from table customers and we're going to go and copy the WHERE condition
in order to test what going to be deleted. So it's going to be all the customers that is
higher than five. And with that I'm making sure that my delete
(2:05:37) command is correct which is from what I see here is correct. So those five
customers should be deleted. So now let's go and delete those customers. And now
very important to read the message. It says five rows affected. So that means five
customers got deleted. And this is better than 10 of course. So
(2:05:54) let's go and check what customers left. So we have 1 2 3 4 5. Those are
the original customers. And everything else got deleted. And with that we have
solved the task. And this is how we can delete data from tables. Be very careful.
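
The preview-then-delete routine from this task can be sketched as follows, assuming SQLite through Python's sqlite3 module and ten made-up rows. The same condition string is reused for the preview query and for the delete, so both target the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER NOT NULL, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer_{i}") for i in range(1, 11)],  # ten hypothetical customers
)

condition = "id > 5"

# Step 1: preview which rows the condition targets.
to_delete = [r[0] for r in conn.execute(
    f"SELECT id FROM customers WHERE {condition} ORDER BY id")]

# Step 2: run the delete and read the affected-row count.
deleted = conn.execute(f"DELETE FROM customers WHERE {condition}").rowcount

remaining = [r[0] for r in conn.execute("SELECT id FROM customers ORDER BY id")]
print(to_delete, deleted, remaining)
```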
Always test before doing the delete command. Okay. So now we have the
(2:06:11) following task. And it says delete all data from table persons. So that
means we have to go and drop everything from the table persons. But we don't want
to delete the table. We just want to delete the data inside the table now. So now what
we're going to do, we're going to write delete from. And now we have to
(2:06:27) specify the table persons. And if you execute it, what's going to happen?
SQL going to go and drop all the data in the persons. But in SQL, we have more
interesting command. If you want to delete everything from the table persons, we
have that truncate. Truncate. It is exactly like delete from persons. It's going to go
and make the
(2:06:44) whole table empty. But why I like to use truncate because it is way faster
than deletes. If you have large tables, the delete command going to be really slow
because with the delete there is like a lot of things happening behind the scenes.
There is like logs and protocols. But if you are using truncate,
(2:06:59) the database is going to skip all that extra stuff and it's going to be very fast.
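
A small sketch of emptying a table while keeping its definition. One loud assumption: the sketch runs on SQLite through Python's sqlite3 module, and SQLite has no TRUNCATE TABLE statement, so the unfiltered DELETE is what actually executes here. The TRUNCATE form, as accepted by databases such as SQL Server, MySQL and PostgreSQL, is only kept as a string for reference. The table and rows are made up.

```python
import sqlite3

# Statement form on databases that support it (not executed in this sketch).
TRUNCATE_STATEMENT = "TRUNCATE TABLE persons"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER NOT NULL, person_name TEXT NOT NULL)")
conn.executemany("INSERT INTO persons VALUES (?, ?)", [(1, "Maria"), (2, "John")])

# SQLite stand-in: a DELETE without WHERE removes every row.
conn.execute("DELETE FROM persons")

row_count = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]

# The table itself is still listed in the schema catalog.
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'persons'"
).fetchone()[0] == 1
print(row_count, table_exists)
```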
So if you want to delete all the data from table, you can do it like this if it's like small
table. But what I usually do, I go and write truncate and then table. we're going to get
the same effect and with that I'm saying reset
(2:07:16) everything make the table empty. So let's go and execute it and now with
that you will not get the number of deleted rows, and that's why truncate is way
faster. It is not logging every single deleted row, it just goes and removes all
the data without the extra steps. So this is how we can delete all the data from a
(2:07:34) table but the table still exists. Okay my friends, so with that you have
learned the basics on how to manipulate your data inside the database the data
manipulation language DML and with that I can tell you we have covered the basics
of SQL. So with that we have covered the beginner level. Now in the
(2:07:50) next chapters we will be in the intermediate level and the first thing that
you're going to learn in the intermediate level you will learn how to filter your data
and we're going to cover many operators that you can use inside the WHERE clause. So
let's go. All right. So now let's have an overview about all different operators in SQL.
So
(2:08:09) the first group of operators we have the comparison operators. They are
the easiest one where all what we have to do is to compare two values and we have
like six different variants on how to do that. Now to the next one we have the logical
operators. We use it in order to combine multiple operators. And moving
(2:08:25) on to the next one we have the range operator. Here we have only one, the
between. We're going to use it in order to check whether a value falls within a
specific range. Now moving on to the next one, we have the membership operator.
And here we have two things. We have the in operator or not in. Here
(2:08:40) all what you have to do is to check whether a value is in a list or not. And
the last category that we have is the search operator. And here as well we have only
one operator, LIKE. We use it in order to search for a specific thing in a text. So my
friends, we're going to go through all those operators
(2:08:56) one by one. Okay. So now let's go and deep dive into the first category the
comparison operators and we're going to cover all of them. So what exactly is a
comparison operator? It is very
simple. We want to compare two things and there is a lot of things that we can
compare
(2:09:16) in SQL. But the formula for that going to be always like this. So we have
the first expression and then operator and then we have another expression and this
going to form something called a condition. So here we have a lot of variants. We can
compare one column to another column. So for example, you can
(2:09:32) go and compare the first name with the last name. So both of the
expressions are columns here. Another scenario, you want to compare a column
with a value, a static value. Like for example, you say the first name must be equal to
a value like John. So now we are comparing a column with a value. It's not anymore
(2:09:51) two columns. Now we have another scenario where we want to apply a
function to a column and then compare the results to maybe a value. So for
example, we apply the upper function to the first name and then this must be equal
to a value like John with all the letters in the uppercase. And one more thing that you
can compare you can write
(2:10:10) an expression in one of the sides like for example you can say if we
multiply price with the quantity it must be equal to 1,000 for example. So here we
have an expression. We have multiple columns included in one sides and the output
of this expression must be equal to 1,000. And now the last one is going to be a
(2:10:29) little bit more advanced and we're going to cover that of course in other
chapter. We can include a whole query the complete query to one of the sides and
we call this a subquery. So in one of the sides you're going to write a whole query
select from where whatever you want and you go and compare the
(2:10:45) result of this query to for example a value or a column. So as you can see
in SQL we can compare a lot of things together. Either comparing the columns
together or a column with a value or we use a function or an expression or even a
whole query. So this is how we build conditions in SQL. Okay my friends. So
(2:11:02) let's see how the conditions works in SQL. So we have our data the name
the country the score and let's say that we have built a condition where it says the
country must be equal to the USA. So this is very simple comparison operator and
this is the condition that we are using inside the WHERE clause. So once
(2:11:20) you apply this filter to your data what going to happen? SQL going to go
row by row evaluating whether it is meeting the condition. If it's not fulfilling the
condition then SQL going to remove it from the results. But if it is fulfilling the
condition it's going to keep it. So now we are comparing the
(2:11:35) values of column together with a static value the USA. So we're going to
compare whatever value we get from the country together with the USA. So now let's
see how SQL is going to apply this filter to our data for the first customer Maria. Now you
can see the value inside the country is Germany. So SQL is now going to go and
(2:11:51) compare Germany to USA. Since it is not equal, SQL is going to understand
okay, Maria is not fulfilling the condition. So it is false and SQL is going to go and remove
this customer from the results. So she is not fulfilling the condition. Moving on to the
next one, to John. Now SQL is going to take the value inside the
(2:12:10) country, the USA; it is equal to USA. So that means John is fulfilling the
condition and SQL is going to be happy about it. So it is true and this means SQL is going to
keep John in the final results. Now moving on to George, the value is UK, not equal to
USA. He is not fulfilling the condition. SQL is going to go
(2:12:27) and remove him from the final result. Same thing for Martin. Germany is
not equal to USA. SQL is going to remove this customer as well. And to the last one,
Peter, you can see the value is USA. So USA equal USA. The condition is fulfilled.
SQL is happy about it and going to leave the customer in the output. So now if you
go and apply this
(2:12:45) condition using the comparison operator to your data only two customers
going to be left in the output. This is exactly how the conditions and the comparison
operators works in SQL. Okay. So now let's start with the first operator. It's very
simple. We have the equal. It checks if the two values
(2:13:02) are equal. That's very simple. Let's have an example. Okay. So now we
have this task. It says retrieve all customers from Germany. So this is very basic.
We're going to go and select and we're going to select all the columns since we don't
have any specifications from the table customers. And if you go
(2:13:15) and execute it, you will get all the customers. But we don't need that only
the customers that comes from Germany. So we have to go and apply a condition
using the WHERE clause country equal to the value Germany. So make sure you are
writing it exactly like in the database otherwise it will not work. So let's go
(2:13:33) and execute and with that we are getting only the customers from
Germany. So it is very simple and this is why we use the equal operator. Okay. So
now moving on to the next one again very simple. If you want to check if two values
are not equal we can use the not equal operator. So let's have an example. Okay. So
now
(2:13:50) let's have the opposite task. It says retrieve all customers who are not
from Germany. So this is very simple. We are saying here who are not they are not
equal to Germany. So we can use the not equal operator in order to get these
customers. So with that as you can see after executing we are getting all the
(2:14:09) customers whose country is not equal to Germany. And there's another way
to write the not equal operator; doing it like this, we'll get the same results. All right my
friends moving on to the next one. We can check if a value is greater than another
value. So we use the greater operator. Let's have an example.
(2:14:26) Okay. So now the next task it says retrieve all customers with a score
greater than 500. Now we want to filter the data based on the score. So we're going
to say where score and now the task says greater than 500. We're going to use the
operator greater than 500. It's very simple. So with that we will
(2:14:46) get only the customers where the score is higher than 500. So for example
Maria is not fulfilling the condition. The same thing for Peter and as well for
Martin; it must be greater than 500. So if you go and execute it you will get only those two
customers because they are greater than 500. Okay, moving on to the
(2:15:05) next one. This time we're going to check if a value is greater than or equal
to another value. So it is like mix between the greater than and the equal. If one of
them is fulfilled then the value going to meet the condition. So let's have an example
for that. Now, if the task says retrieve all customers with a
(2:15:22) score of 500 or more, this time we're going to go and include the customers
where their score is equal as well to 500 or higher. So, we're going to have a similar
condition based on the score and the 500's value, but this time we're going to say
greater or equal to 500. So, if you go now and execute it, this
(2:15:41) time we're going to see the customer Martin with the score of 500. So, in
this scenario, we're going to use greater or equal. All right. So now let's keep
moving. The next one is as well very simple. We're going to check this time if a value
is less than another value. So we're going to use the
(2:15:57) less operator. Let's have an example. Now moving on to another simple
task. Retrieve all customers with a score less than 500. So this time we want all the
customers with a lower score. And we're going to use exactly the opposite. It's going
to be the score is less than 500. And again here it is not equal, right?
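
The six comparison operators mentioned in the overview can be compared side by side in a short sketch. It assumes SQLite through Python's sqlite3 module; the five rows follow the names, countries and scores mentioned in the walkthrough, while the id values and the helper function names() are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 350),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

def names(condition):
    """Return the first names of the rows that satisfy the WHERE condition."""
    sql = f"SELECT first_name FROM customers WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

results = {
    "=": names("country = 'Germany'"),
    "!=": names("country != 'Germany'"),
    "<>": names("country <> 'Germany'"),  # alternative spelling of not equal
    ">": names("score > 500"),
    ">=": names("score >= 500"),
    "<": names("score < 500"),
    "<=": names("score <= 500"),
}
for operator, matched in results.items():
    print(operator, matched)
```

Martin, with a score of exactly 500, only shows up for the operators that include the equal part.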
(2:16:14) So if you go and execute, you will get all the customers with low scores.
You will not get Martin because Martin is equal to 500. So with that we have solved
the task. We have all the customers with the score less than 500. Okay my friends,
now moving on to the last one. I think you already got it. So
(2:16:31) we're going to check whether a value is less than or equal to another value.
So you can go and combine the less operator together with the equal and if one of
them is fulfilled then the value going to meet the condition. So let's have an example
for that. This time we are retrieving all customers with a score of
(2:16:48) 500 or less. So the query going to be very similar but we are saying it is
less or equal to 500. So we are including the value in our condition. And with that as
you can see we still have our two customers where we have the score less than 500
but we have now as well Martin with a score of 500. Okay my
(2:17:06) friends. So with that we have covered the first group the comparison
operators. Now we're going to move on to the next group. We're going to speak
about the logical operators and here we have three: AND, OR, NOT. So let's start with the
first one. What exactly is the AND operator? Okay. So now for the definition of AND,
it says all
(2:17:28) conditions must be true. So all the conditions that you have in the WHERE
clause must be true in order to keep the row in the results. So let's understand what
this means. things going to get more complicated where you can have not only one
condition but you might have multiple conditions in your query. So
(2:17:45) here we're going to add a second condition where we're going to say not
only the country must be equal to USA but also the score must be higher than 500.
So now you have two conditions and you have to put them in the WHERE clause. Now
you have to combine those conditions using the logical operator and here we
(2:18:02) have two options two operators the and operator and the or operator. In
this scenario, if you say and then SQL is very restrictive. Both of the conditions must
be true in order to keep the row in the results. So now let's see how this going to
work. Now for the first row and for the first condition you can see the
(2:18:20) country is Germany and it is not fulfilling the first condition. So this going to
be false. And as well if you check the second condition for the first row you can see
the score is 350. So that means this customer is as well not fulfilling even the second
condition. So both of the conditions is false and it's
(2:18:37) going to go and remove this customer from the results. Now to the next one
John you can see John is fulfilling the first condition because the country is equal to
USA and as well fulfilling the second condition. His score is 900 and this is higher
than 500. So now SQL going to be very happy about it because both of them
(2:18:54) is true and this is the only way in order to keep the row in the output
because we are using the AND operator. So John is going to stay in the output. Now
moving on to George. He is not fulfilling the first condition. But now the second
condition is fulfilled. His score is 750 and this is higher than 500. So now it's like
50/50 right. In
(2:19:12) one side it's false but the other side is true. But this is not enough for the
AND operator. Both of them should be true in order to keep the result in the output.
That's why SQL is going to remove this row. Now moving on to Martin. He is not
fulfilling either of the conditions. So SQL is going to go and remove him from the
(2:19:28) results. And now for the last one. Peter is fulfilling the first condition, the
country is equal to USA, but the second condition is sadly not fulfilled. So we have the
score zero, not higher than 500. Again we have the same scenario, it's 50/50, and this
is not enough for the AND operator. That's why SQL is going to go and
(2:19:46) remove it. So as you can see, if you use an AND operator a lot of rows are going
to be removed if one of the conditions is not met. So the AND operator is very restrictive:
both of the conditions must be fulfilled to keep the row in the results. So this is exactly
how the and operator works. Okay. So now we have the
(2:20:04) following task. Retrieve all customers who are from USA and have a score
greater than 500. So here we are like combining multiple conditions and let's go and
do it step by step. So the first thing that we have to go and select the data from the
correct table. So select star from customers and with that we are
(2:20:20) getting all the customers from the table. Now the first condition we need
the customers that come from USA. So we need only those two customers and in
order to do that as we learned we can go and use the WHERE clause and the condition
going to be country equal to USA. So if you go and execute we will
(2:20:37) get those two customers. Nothing is new. We have used the comparison
operator equal. But we are not done yet. We have another condition. From those two
customers, we need only the customers where their score is higher than 500. So
now by looking at those two customers you can see that Peter here
(2:20:54) does not have a score higher than 500 and we don't want to see that in the
results. So now what we have to do we have to go and write a condition for this one
over here. So this is based this time on the scores not on the country. So the score
should be greater than 500. Now as you can see we have the
(2:21:11) first condition for the first one here and the second condition for the second
requirement. Now the question is how to connect those two conditions. So here we
have two options, AND or OR, and to be honest this is very simple. The task says the
customer should fulfill both of the conditions: should be from USA and as
(2:21:29) well at the same time have a score greater than 500. So it is very simple, we use AND. So with
that we have connected both of those conditions, and if you go and query it you will
get only one customer that is fulfilling our conditions. So from all customers we have
only one customer that fulfills this condition, that
(2:21:48) comes from USA and at the same time the score of this customer is higher
than 500. So this is how we use the AND operator in order to connect two conditions.
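
As a rough sketch of the task just solved, the query can be tried against an in-memory SQLite database from Python. The table layout, the column names (id, first_name, country, score) and the five sample rows are assumptions reconstructed from the walkthrough, and the course itself may run on a different database engine.

```python
import sqlite3

# Sample rows reconstructed from the walkthrough (an assumption, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

# AND keeps a row only when both conditions are true.
and_rows = conn.execute(
    "SELECT first_name, country, score FROM customers "
    "WHERE country = 'USA' AND score > 500 ORDER BY id"
).fetchall()
print(and_rows)
```

With these sample rows only John passes both conditions; Peter is from USA but fails the score check.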
Okay my friends. So that's all for the AND operator. Let's speak now about the OR
operator. All right. Now the OR operator says at least one condition must be
(2:22:11) true. So it is less restrictive than the AND: it is enough to have one condition
true in order to keep the row in the results. Let's understand exactly what this means.
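
The rule just stated, at least one condition must be true, can be sketched with the same two conditions as before (country is USA, score above 500). This is only an illustration in Python with SQLite; the table, column names and sample rows are assumptions based on the walkthrough.

```python
import sqlite3

# Sample rows reconstructed from the walkthrough (an assumption, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

# OR keeps a row as soon as one of the two conditions is true.
or_rows = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE country = 'USA' OR score > 500 ORDER BY id"
).fetchall()
or_names = [row[0] for row in or_rows]
print(or_names)
```

Only rows where both conditions are false (Maria and Martin in this sample) drop out.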
Okay. So now we have the same scenario. We have two conditions, and in SQL you
have to connect them using either the AND operator or the OR
(2:22:27) operator. In this scenario we're going to talk about the OR operator. And as
we said, at least one of the conditions must be fulfilled in order to leave the record in
the results. So let's see what's going to happen here. Now the first customer, Maria:
she is not fulfilling the first condition and as well not the second condition. So both of
(2:22:46) them are false, and this is the only scenario where SQL is going to remove the
record from the results, because it is not fulfilling the minimum: at least one of them
should be true. Both of them are false, so SQL is going to go and remove this row. Now
moving on to the next one, to John. John is from USA and has a higher
(2:23:02) score than 500. Both of the conditions are green. So both of them are true, and
this is more than enough to keep the row in the output. That's why we will see John
in the output. Now moving on to the third one, George. George is not fulfilling the
first condition because UK is not equal to USA. But George this
(2:23:20) time is fulfilling the second condition. So we have here true, and since we
have at least one true, this is good enough to keep the record in the output. So you
will see George in the results. Now moving on to Martin. He is not fulfilling the first
condition and as well not fulfilling the second condition.
(2:23:37) Both of them are false and this is not enough to keep the row in the output.
So that's why SQL is going to go and remove it. Now moving on to the last one.
Peter: he is fulfilling the first condition but not the second condition, but still everything
is fine because he is fulfilling at least one condition. So
(2:23:52) we have the minimum and SQL is going to leave it in the output. So as you
can see, the OR operator is not restrictive like the AND operator. It's enough to have
one true in order to keep the data in the output. And this is exactly how the OR
operator works. Now let's see the second task. Retrieve all customers who
(2:24:08) are either from USA or have a score greater than 500. So it is a very similar
task. We have two conditions. So we need the customers that are either from USA.
So it is based on this: country equal to USA. And the second condition is the score is
greater than 500. But this time we are very relaxed: either
(2:24:28) this condition is fulfilled or the second one. So instead of having AND we will
be using the operator OR. So it is enough to fulfill one of those conditions. And if you
go and execute now, as you can see, we are getting more results because it is easier
to fulfill the conditions. So we can see those three customers either fulfilling the
(2:24:46) first condition or the second one. All right my friends. So that's all for the OR
operator and we're going to move to the last one in this group, the NOT. So what do we
mean with the NOT operator? Okay. So now what is this operator NOT? It is a reverse
operator. It's going to go and exclude the matching values. So
(2:25:07) what does this exactly mean? Let's have a very simple example. All right. So
now the NOT operator is not like the OR and the AND. This operator will not go and
combine two conditions. So you can use it with only one condition. And let's say that
our current condition is like this. The country must be equal to USA.
(2:25:24) So this is like a comparison operator. And if you apply it to your data, as we
learned, it's going to leave only two customers, John and Peter, because they fulfill
the conditions and all other customers will be removed because they don't fulfill the
condition. So nothing crazy so far. But now if you go and
(2:25:39) apply the NOT operator to the condition, what is going to happen? You're going
to reverse the whole truth. So you are saying if this condition is fulfilled, the row must be
removed from the final results. So it is switching everything. We want to see the
customers that are not fulfilling the condition. So now let's
(2:25:57) see what happens if you apply the NOT operator together with the
condition. We can see that the first customer is not fulfilling the condition, which is a
great thing. This is exactly what we want. We want the customer that is not fulfilling
the condition. That's why SQL is going to be happy about it, and SQL is going to make it
(2:26:12) true and leave it in the output. So Maria is fulfilling the whole thing. She is
not meeting the condition. So SQL is going to leave her in the output. Now for the next
one. So this customer is fulfilling the condition and that is not a good thing. So SQL is
going to go and this time remove John from the results
(2:26:29) because he is fulfilling the condition. And moving on to George. So George
is not fulfilling the condition, which is amazing. So that's why SQL is going to keep
George in the output this time. The same thing for Martin. Martin is not fulfilling the
condition. So SQL is going to keep the customer. And Peter, he is
(2:26:46) fulfilling the condition. So SQL is going to go and remove this customer from
the output. So as you can see, we have reversed everything, right? The NOT operator is
going to make the true false and the false true. Okay. So this is how it works. Now
let's go back to SQL in order to practice. Okay. The next task
(2:27:02) says retrieve all customers with a score not less than 500. So this sounds
really funny. As usual we're going to go and SELECT * FROM customers. And now we
have to filter the data based on this condition. So the score is not less than 500.
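
A small sketch of this condition, assuming the same sample table as in the walkthrough (table, column names and rows are illustrative assumptions): "score not less than 500" can be written either with a plain comparison or with NOT in front of the opposite comparison, and both forms return the same rows here.

```python
import sqlite3

# Sample rows reconstructed from the walkthrough (an assumption, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

# Variant 1: state the condition directly.
ge_names = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE score >= 500 ORDER BY id"
)]

# Variant 2: write the opposite condition and invert it with NOT.
not_names = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE NOT score < 500 ORDER BY id"
)]

print(ge_names, not_names)
```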
Well, you can go and say, well, the score is greater than or equal to
(2:27:23) 500, right? And with that it is not less than 500. So if you go and execute it,
we just solved the task, right? We get all the customers that are not less than 500. Or
you can go and use the NOT operator to make things funnier. So you go over
here and say it is NOT, and then you switch it. So you make it like
(2:27:44) this. So the score is less than 500. But as we use NOT here, we twisted
everything. So we are saying the score is not less than 500. And if you execute it you
will get the exact same results. NOT inverts the truth. If you remove it and execute, you
will get everything that is less than 500. But if you put the NOT,
(2:28:03) you will invert the whole logic. So if you go and execute, you are not
getting the scores that are less than 500. So this is really nice. This is how you use
the NOT operator. Okay my friends. So with that we have covered everything about
the logical operators. Now we're going to move to the third group. We're
(2:28:19) going to talk about the range operator. And here we have only one, the
BETWEEN. So what exactly is the BETWEEN operator? Okay. So what is BETWEEN? It's going
to go and check if a value falls within a specific range. So you have a range and you
are checking whether your value is in the range or outside the
(2:28:39) range. So let's understand exactly what this means. Okay. So now in order
to build a range you need two things. You need the lower boundary for the range and
you need as well the upper boundary. Once you have two boundaries then you have
a range, and everything between those two boundaries is going to be true
(2:28:56) and everything outside those boundaries is going to be false. So now for
example let's say that we have the lower boundary 100 and the upper boundary 500.
And there is one thing that you have to understand about the BETWEEN: the
boundaries are inclusive. So that means if a value is exactly 100 or exactly 500
(2:29:12) then it's going to be considered as true. So it is considered to be inside the
range. Now if you apply this filter to our data, where we say the score must be
between 100 and 500, SQL is going to go and do the following. So for the first customer,
Maria, it is going to go and check whether her score is inside the boundaries. So
(2:29:32) as you can see 300 is between 100 and 500. So she is in the green area
and that's why SQL is going to be happy about it and leave the customer in the
outputs. Now moving on to John. John has 900. As you can see 900 is greater than
500. So this value is going to be outside the boundaries on the right side
(2:29:50) and this means the score of John is not in the range. That's why he is not
fulfilling the condition and SQL is going to go and remove this customer from the
results. Now moving on to George, 750. The same thing: outside the range. SQL will
not accept it and will remove this customer from the final results. Now
(2:30:07) moving on to Martin: his score is 500 and this is exactly at the boundary. So
if it's like 501, it's going to be outside. So since BETWEEN is inclusive, SQL is going
to accept it, and Martin is considered to be in the range and fulfilling the condition. So
SQL is going to keep him in the final result. Now here we are speaking
(2:30:25) about Peter: he has a score of zero and this is less than 100. So he is on the left side,
not in the range. So not fulfilling the condition, and SQL is going to go and remove him.
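
The range check walked through above can be sketched as follows. The sample table and rows are assumptions taken from the walkthrough. BETWEEN includes both boundaries, so it behaves like a pair of comparisons joined by AND, and Martin's score of exactly 500 stays in the result.

```python
import sqlite3

# Sample rows reconstructed from the walkthrough (an assumption, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

# BETWEEN with inclusive lower and upper boundaries.
between_names = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE score BETWEEN 100 AND 500 ORDER BY id"
)]

# The same range spelled out with comparison operators and AND.
compare_names = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE score >= 100 AND score <= 500 ORDER BY id"
)]

print(between_names, compare_names)
```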
This is exactly how between works in SQL. It's very simple. Okay. So now we have
the following task and it says retrieve all customers whose score falls
(2:30:42) in range between 100 and 500. So let's start as usual by selecting all data
from customers and execute it. Now the task says everything. We need all
customers in a range. So we have a lower value and a higher value. So in order to
do that as usual we're going to use the WHERE, and then we're going to specify
(2:31:02) the column that we want to filter on. So it's going to be the score, and since
we have like two boundaries we can go and use the BETWEEN operator, and we start
with the first boundary, the lowest boundary. So it is the 100, and then 500, the high
boundary, the upper boundary. So BETWEEN 100 AND 500. So now let's go and
(2:31:19) execute it. And with that we get only those two customers because they are
inside this window. Now there is another way to solve this task without using
BETWEEN. We can go and use the comparison operators together with the logical
operator AND. So let me show you how we can do that. I'm going to go and
(2:31:38) copy the whole thing. And now we're going to write two conditions. So first
the score should be greater than or equal to 100, because the boundaries are inclusive, and
the other one: the score is less than or equal to 500. So this is the upper boundary. So
with that we have the two conditions and we can go and connect
(2:31:56) them using the AND operator. So it's very similar to the BETWEEN: we
have an AND between the upper and the lower boundaries, but we are using the
comparison operators. So it is greater than or equal to 100 and less than or equal to 500. If
you go and run this query you will get exactly the same results. Now if you ask
(2:32:14) me which method is my favorite I'm going to go with this method and I will
skip the BETWEEN, because to be honest each time I forget
whether the BETWEEN boundaries are inclusive or exclusive. But if I read the script I am going
to see exactly that those boundaries are inclusive because we have
(2:32:30) here the equals. So I really prefer using the comparison operators together
with the AND rather than using BETWEEN. So it's up to you: if you memorize it then go with the
BETWEEN. But for me I'm going to go with the comparison operators. Okay my
friends. So that's all about the BETWEEN and the range operator. Now
(2:32:45) let's move to another group. We have the membership operators. So here
we have like two. We have the IN and the NOT IN. So let's understand what this exactly
means. Okay. So what is the IN operator? It's going to go and check if a value exists in a
list. So you have a list of values and you are checking whether your
(2:33:06) value is a member of your list. So let's have a very simple example in order
to understand what this means. Okay. So now how does this work exactly? What you have
to do is to go and make a list of values. So let's say that I have a list and there I have
specified two values, Germany and USA. So those two are the
(2:33:21) members of this list. Now if you use the IN operator, it's going to go and
check the value of country, whether it is in the list or not. So let's do it one by one.
For the first customer, Maria, her country is Germany and Germany is a member of the
list. So SQL is going to be happy and going to leave Maria in the final
(2:33:39) results. Now moving on to John. John comes from USA. USA is a member of
the list. So he is fulfilling the condition as well and you're going to see John in the
final results. Now we come to George. George comes from UK and UK is not a
member of our list. So SQL is going to go and remove this customer from the
(2:33:56) final results: not fulfilling the condition. Now for the last two, Martin and
Peter, their countries are members of the list and SQL is going to go and leave those
customers in the final results. So as you can see it's very simple. All you have
to do is to define the members of a list and use the IN operator, and if
(2:34:10) the value is a member of this list it's going to be true, otherwise it's going to
be false. Now of course the other operator is going to be exactly the opposite, where we
say NOT IN the list. So we are searching for values that are not in this list. So as we
are using NOT, it's going to go and completely reverse
(2:34:27) the truth. And if you apply this you will get in the result only one customer.
You will get George in the result because the country is UK and UK is not a
member of the list. So if you use NOT together with the IN operator you will get exactly
the opposite effect. So this is how the IN and the NOT IN operators work in SQL. Let's
go
(2:34:46) back to SQL in order to practice that. Okay. So now we have this task and
it says retrieve all customers from either Germany or USA. Okay. So let's try to solve
this task. This is going to be a little bit tricky. So SELECT * FROM customers as usual
and execute it. So now we need in the results only the customers
(2:35:04) that come either from Germany or USA. So that means this customer over
here should be excluded from the result because he comes from UK. So how are we
going to write it? It's going to be like this maybe. So the first one is going to be the
country is equal to Germany, OR the country is equal to USA, right? Something
(2:35:22) like this. So if you go and execute it, you will get in the output only the
customers that are either from Germany or USA. And with that we have solved the
task, right? Well, there is another way to solve this task which is clearer
and shorter, using the IN operator. So now how are we going to do it?
(2:35:38) Let's go and get the whole thing in another query. And now instead of
having equals and ORs and so on, we're going to use the IN operator, and then we're
going to have like two parentheses and inside them we're going to have a list of
values. So it's going to be Germany and then the second value is going to be
(2:35:55) USA, like this. So we are saying country should be in this list, Germany or
USA, and if it is one of those values then the condition is fulfilled. So now if you
go and execute this one over here you will get the exact same results. So my friends,
if you notice that you are repeating yourself in the WHERE condition
(2:36:13) and you are just changing the value of the condition, it is based on the
same column, and you are connecting them using the OR, then there is something
wrong, and always think in this scenario to use the IN operator, because this can be
really ugly once you have a lot of values. So imagine in our database we have a lot
of
(2:36:30) countries and your query is going to be something like this. So you
keep repeating country equal, or country equal, and so on. Instead of that you're
going to have a really nice list of countries in one go. So as you can see here,
it is easier to extend and as well can have better performance. So as
(2:36:47) you can see, we are repeating the same thing but we are just changing the
value and we are connecting all those conditions using the OR. In this scenario go and
use the IN operator. All right my friends. So that's all for the membership operators.
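
The membership operators can be sketched on the same assumed sample table (table, column names and rows are illustrative, reconstructed from the walkthrough). The repeated OR form and the IN list return the same customers, and NOT IN returns the complement.

```python
import sqlite3

# Sample rows reconstructed from the walkthrough (an assumption, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Maria", "Germany", 300),
        (2, "John", "USA", 900),
        (3, "George", "UK", 750),
        (4, "Martin", "Germany", 500),
        (5, "Peter", "USA", 0),
    ],
)

def names(where_clause: str) -> list:
    """Run the customers query with the given WHERE clause and return first names."""
    sql = f"SELECT first_name FROM customers WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

or_names = names("country = 'Germany' OR country = 'USA'")   # repeated equality checks
in_names = names("country IN ('Germany', 'USA')")            # same filter as a list
not_in_names = names("country NOT IN ('Germany', 'USA')")    # everything outside the list

print(or_names, in_names, not_in_names)
```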
Now we're going to speak about the last one, the search
(2:37:02) operator. And here we have only one, the LIKE. And each time we're going to
say like, I'm going to remind you to like this course. So let's go. Okay. So now what is the
LIKE operator? You can use it in order to search for a pattern in your text. So you
have like a text or characters and you are searching for a specific pattern inside
(2:37:23) the text. So let's have an example in order to understand exactly what this
means. Okay. So now if you don't have coffee yet, go grab one because you have to
focus for this one. Now what we have to do is to define a pattern in SQL. In order
to build a pattern we have like two special characters. If you
(2:37:38) use a percent sign you are saying anything. So I'm going to accept anything.
So it could be no characters at all or only one character or many characters. So I'm
saying anything. Now if you use an underscore you are expecting to have exactly
one thing, like one character or one number. So it is exactly one. I know this sounds
(2:37:57) complicated but with an example you can understand this. And I can tell
you the percent sign is way more common than the underscore. I really rarely use the
underscore. So now let's say that I build the pattern like this. I say the first character
must be M and then a percent sign. So here I'm saying in my
(2:38:13) text the first character must be an M and after the first character I really
don't care. It could be any character, any number whatever. So this is the pattern and
now let's have a few values in order to say whether it's true or false. So now if you
have the value Maria, you can see the first character is
(2:38:29) an M which is perfect. This is exactly our pattern. The first character must
be an M. And then after the M we got like four characters. So whatever, it is totally
fine. We can say Maria is fulfilling our pattern. And this is exactly what we are
searching for. This value is fulfilling the condition. Okay.
(2:38:47) Now moving on to the next value, we have Ma. So here again the first
character is an M which is perfect. And after that we have only one character, a. Well,
we said percent sign. So it could be anything: one character, multiple characters, a
number or whatever. So that's why this value can match our pattern and we will see
it in the
(2:39:04) outputs. Now moving on to the next value, we have only one M, which is as
well totally fine because we are saying the first character must be an M and then
followed by anything. Now moving on to the last scenario, we have Emma. Now this
is problematic because the first character is an E and in our pattern we
(2:39:20) say it must start with M. So we don't have that in this word. The first
character is an E. That's why this value is not fulfilling our pattern and SQL is going to
remove this value from the final results. So this is exactly what is going to happen if
you have this pattern and those values. Now let's have another
(2:39:37) scenario where you say, you know what, it could start with anything but for
me the last two characters are very important: they must be an I and an N. So it could
start with anything but the last two must be an I and an N. So let's take this value, Martin. SQL is
going to go and check immediately the last two characters. So
(2:39:54) you can see we have an I and an N, and the first part, Mart, is fine. It could
be anything. So this value is fulfilling the condition because the last two characters are
an I and an N. Now moving on to the next one, we have Vin. So V, I, N: the last two
characters are as well exactly what we are searching for. It is
(2:40:12) fulfilling the condition and we have before it only a V. So we say anything
with a percent sign. Right? Now one more, we have In. So it is as well fulfilling the
condition because before it we don't have anything. So In is fulfilling the
condition as well. The percent sign is always saying anything. Now moving on to the last
scenario, we
(2:40:30) have Jasmine. I and N are not the last two characters. The last two
characters are an N and an E and this is not matching our pattern, and this is why this value
is not fulfilling our pattern and you will not see it in the results. So with that you can
understand how we can search for something in a text using the LIKE
(2:40:46) operator. Let's keep going. Now let's say that I have a percent sign at the
start and a percent sign at the end and in between I have only one character, an R. If
you define it like this you are saying if there is an R anywhere it is good enough:
whether it's at the beginning or at the end or in between, the condition
(2:41:03) is fulfilled. So if you have Maria you can see we have an R in the middle.
So on the left side we have two characters, on the right side we have two characters,
it doesn't matter, the main thing is we have an R somewhere. So this is going to fulfill
the condition. Now moving on to Peter, we have an R at the end and
(2:41:18) that is totally fine because we say on the right side it could be anything. So we
have an R somewhere, that's why it's going to fulfill the condition. Now we have
another case where we say Ryan: we have an R at the start. So we don't have
anything before and we have after that like three characters, which is totally
(2:41:34) fine. So we don't really care about the position of the R. It is totally
acceptable to have an R anywhere. And if you have only an R that is as well good
enough. You don't have anything before. you don't have anything after and that's
okay. But if you have a word like Alice, we don't have any R inside it. So that's
(2:41:50) why this is the only case where we don't have an R here and SQL is
going to remove this value from the results. And this way of searching for something
is very common. You don't care about the text before this word and after the word,
right? So if you are searching for any word, you're going to put a percent sign
(2:42:06) before and a percent sign after. Now I know that we want to practice with the
underscore. So let's say that I have two underscores and then the character B and
then a percent sign. So here what I'm saying is there should be something in the first
position. There should be as well something in the second position. Then
(2:42:22) the third position should be the character B, it must be exactly at this position,
and after that it could be anything. So we really don't care. I know this is a little bit
complicated. Let's have an example. So we have the value Albert. Now we can see
in the first position we have something, the A. Then in the second position we have as well
(2:42:39) something, the L. So so far we are good with the pattern, and then in the third
position we have B. So we have a complete match, and the rest, the ERT, whatever. So
with that Albert is matching our pattern. Moving on to the next one, Rob. You can see
in the first character we have something, which is good. We have the R.
(2:42:54) Then the second character, we have an O. So it's not empty. We have
something. And then the third one, we have exactly B. And after that we don't have
anything, which is fine. So again this value is going to fulfill the condition. So moving on
to the next one. So it starts with an A. So we have something in the first
(2:43:11) position. In the second position we have as well something, the B. But now
the third character is a problem. It is not B. We have an E. So that's why it is not
following our pattern. And SQL is going to go and remove it. Now moving on to the last
example, we have an A and an N. So in the first position we have something. The
(2:43:26) second one as well. But in the third one we don't have anything. We don't
have a B. So that's why it's going to be removed. So my friends I know that was a lot.
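
The four patterns discussed above can be checked value by value with a small sketch. It uses SQLite from Python, which is an assumption about tooling; the helper name matches is hypothetical, and the value "Abel" is only a guess at the unnamed example described as A, B, E. SQLite's LIKE ignores case for ASCII letters by default, while other engines depend on their collation settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def matches(value: str, pattern: str) -> bool:
    """Return True when SQLite evaluates `value LIKE pattern` as true."""
    row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(row[0])

# Pattern -> example values from the walkthrough ("Abel" is a hypothetical stand-in).
cases = {
    "M%": ["Maria", "Ma", "M", "Emma"],               # must start with M
    "%in": ["Martin", "Vin", "In", "Jasmine"],        # must end with "in"
    "%r%": ["Maria", "Peter", "Ryan", "R", "Alice"],  # an R anywhere
    "__b%": ["Albert", "Rob", "Abel", "An"],          # a B exactly in position three
}

results = {
    pattern: [matches(value, pattern) for value in values]
    for pattern, values in cases.items()
}

for pattern, values in cases.items():
    print(pattern, dict(zip(values, results[pattern])))
```

In each group only the last value (and also "Abel" for the underscore pattern) fails to match, which mirrors the walkthrough.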
This is exactly how you build a pattern for the LIKE operator using the percent sign and
the underscore. But the percent sign is more common. So this is
(2:43:41) exactly how it works. Let's go back to SQL in order to have some
examples. All right, let's start with this task. Find all customers whose first name
starts with a capital M. So let's go and start searching for that information. We're
going to start as usual: SELECT * FROM customers. And now we have to
(2:43:56) go and build the filter logic. So we're going to say WHERE. Now we are
searching for something in the first name. So we're going to say first name. So that
means it is very important to start with an M and then the rest doesn't matter. So
we're going to use the LIKE operator in order to search. And we're going to have
(2:44:13) our single quotes and we're going to start with the M. And it doesn't matter
what comes after that, so we add a percent sign. So for us it is very important that the first character is an M.
Let's go and execute it. And with that we got our two customers Maria and Martin.
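
A minimal sketch of this first task, assuming a customers table with a first_name column and the five names from the walkthrough (both are assumptions). Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so here 'M%' would also match a lowercase m; other engines depend on collation.

```python
import sqlite3

# Names taken from the walkthrough; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Maria"), (2, "John"), (3, "George"), (4, "Martin"), (5, "Peter")],
)

# First character must be M, anything (or nothing) may follow.
starts_with_m = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE 'M%' ORDER BY id"
)]
print(starts_with_m)
```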
And both of them starts with an M. So with that we have solved the
(2:44:30) task. It is very simple. Now we have the following task. Find all customers
whose first name ends with an N. So let's go first and select all the customers here.
And we need all those customers that have an N at the end. So we
have John and as well Martin. So how are we going to do it? The same thing: WHERE
(2:44:49) first name LIKE, since we are searching, but here we're going to change the
expression. So it must end with an N as the last character. So before that it doesn't
matter, whatever the first characters are. So it could be anything but the last character
of the word should be an N. So that's it. Let's go and
(2:45:05) execute. And with that we got John and Martin because the last character
is an N. It is very simple, right? It is all about where we're going to place this
percent sign. Okay. So now we have the next task. Find all customers whose first
name contains an R. So here we don't have like specifications whether
(2:45:22) it is at the start or at the end. Somewhere there should be an R. So if you
go and execute first without any WHERE condition you can see here for example Maria,
we have in the middle somewhere an R, George as well, Martin, and Peter at
the end. So we have a lot of names with an R. So how can we search for that? We're
going to stick
(2:45:40) with the WHERE first name LIKE, and here our character is going to be an R and
we're going to put before it and after it a percent sign. So it doesn't matter what is
before it or after it, somewhere there should be an R. So let's go and execute it. And
with that we got all our customers where somewhere we have an R.
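
The last two tasks differ only in where the percent sign is placed, which a short sketch can make visible. The table, the first_name column and the sample names are assumptions based on the walkthrough.

```python
import sqlite3

# Names taken from the walkthrough; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Maria"), (2, "John"), (3, "George"), (4, "Martin"), (5, "Peter")],
)

# Percent sign only in front: the name must end with n.
ends_with_n = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE '%n' ORDER BY id"
)]

# Percent sign on both sides: an r anywhere in the name is enough.
contains_r = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE '%r%' ORDER BY id"
)]

print(ends_with_n, contains_r)
```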
(2:45:57) As you can see it is very simple. If you put it before and after then you are
open for more results. And this is usually used a lot in order to search for a value
inside your database. All right. Now we're going to move to a funny one. It
says find all customers whose first name has an R in the third position, for some
reason. I
(2:46:15) don't know why. So let's go and query our customers here without any
filter. So it is very important for us to find the customers where in the third position we
have an R. Like here for example Maria: the third character is an R, which is okay. But
with Peter over here it is not the third character, so it is
(2:46:31) not fulfilling the condition. So how are we going to write that? It's going to be
like this: WHERE the first name LIKE, but we have to write it now from the start. So the
first position is going to be an underscore, the second position is going to be as well an
underscore, and now in the third position we're going to have an R. So
(2:46:47) with that we make sure the third position is an R and before it we have
two positions, and then a percent sign, because it doesn't matter what comes after that: it could be
nothing or more characters. So if you go and execute it like this we will get Maria and
Martin and we will not get Peter because the R is not in the third
(2:47:04) position. So now if you don't do it correctly with the underscores, let's go
and remove one of them and execute. You will get nothing because we don't have
any first name where the second position is an R. So you have to be very careful
with this. All right my friends. So this is how you search inside your values.
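
The effect of counting underscores can be sketched like this, again on an assumed customers table with a first_name column and the names from the walkthrough. Two underscores pin the R to the third position; with only one underscore the R would have to be the second character, and no sample name has that.

```python
import sqlite3

# Names taken from the walkthrough; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Maria"), (2, "John"), (3, "George"), (4, "Martin"), (5, "Peter")],
)

# Two placeholders, then r, then anything: r is the third character.
third_is_r = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE '__r%' ORDER BY id"
)]

# One underscore too few: r would have to be the second character.
second_is_r = [r[0] for r in conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE '_r%' ORDER BY id"
)]

print(third_is_r, second_is_r)
```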
(2:47:20) And with that we have covered all the different groups of operators that you
can use inside a WHERE clause. So with that you have learned how to filter your data
using multiple operators that you can use inside the WHERE clause. So you can filter
anything now in SQL. Now we will move to a very interesting topic. You
(2:47:37) will learn how to combine your data from multiple tables. And here we have
two main methods. The first one is SQL joins and the second is set operators. And they
are really big topics. So we're going to first focus on the SQL joins. And here we
have a lot of things to cover. So now we are talking about the core of SQL. So
(2:47:54) let's go. All right. So now we have two tables, table A and table B. And the
big question here is how to combine those two tables. What do we want exactly? Do
you want to combine the rows or the columns? And now if you say I would like to
combine the columns then we are talking about joining tables. So we're
(2:48:16) going to use joins in SQL. So now let's say that we are joining the table A
with the table B and we start from the table A. So SQL is going to take the columns and
the rows of the table A and SQL is going to call it the left table because we started from
there, and then we join it with the table B and SQL is going to call
(2:48:34) the second table the right table. And here what's going to happen?
SQL is going to take the columns and the rows from the right table and put them side by
side with the columns and rows of the table A. So we are like combining the columns,
we are putting them side by side. And now if you say, you know what, I
(2:48:50) don't want to do that, I would like to combine the rows, both of the tables
having the same columns, I just want to stack them, then we are now talking about
another method. It is called the set operators. So here there is like no left and right.
So since we started with the table A, SQL is going to take the
(2:49:06) columns and the rows of the table A and put them in the results. And then it's
going to go to the second table, table B, and it's going to take only the rows and put them
below the rows of the table A. So we are putting the rows beneath each other.
We are doing like appending. So that means as we are using
(2:49:21) the set operators, we are combining the rows. Our table is going to be longer,
but with the joins we are combining the columns side by side and we are getting a
wider table. But now for each method there are different types. So now for example
in order to do the joins we have four very common types. We can do
(2:49:38) an inner join, full join, left join, right join. But of course there are more than
that, but those are the basics. And for the set operators we have as well types. We
have the UNION, UNION ALL, EXCEPT and INTERSECT. And for each method there are like
different rules. In order to join the tables we have to
(2:49:55) define the key columns between the two tables. Don't worry we're going to
learn about that later. This is the requirement in order to join tables and the
requirement of combining tables using the set operators: the tables in your query
should have the exact same number of columns, but here you don't need any like key
in order to combine
(2:50:12) the tables. So guys if you look at this in order to combine two tables first
you have to decide do I want to combine the columns or the rows. So first you have
to decide on the method and after that you have different types on how exactly
you're going to go and combine the data and of course there are rules that you
(2:50:28) have to follow. Now, of course, we're going to go and cover everything in
the course, but now in this section, we're going to learn how we're going to combine
the tables using the SQL joins. So, we're going to go and dive into this world. All right.
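The difference between the two methods described above (joins make the result wider, set operators make it longer) can be sketched as follows. This is only an illustration: the tables a and b, their columns, and the sample values are invented for the example, and Python's built-in sqlite3 module is used just to make the SQL runnable.

```python
import sqlite3

# Hypothetical tables a and b with the same two columns (values are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
""")

# Join: the columns of b are placed side by side with the columns of a.
# A key column (here id) is required to connect the rows.
joined = conn.execute("""
    SELECT a.id, a.val, b.id, b.val
    FROM a
    INNER JOIN b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Set operator: the rows of b are stacked below the rows of a.
# No key is needed, but both SELECTs must return the same number of columns.
stacked = conn.execute("""
    SELECT id, val FROM a
    UNION ALL
    SELECT id, val FROM b
""").fetchall()

print(len(joined), "rows x", len(joined[0]), "columns")    # wider result
print(len(stacked), "rows x", len(stacked[0]), "columns")  # longer result
```

With this sample data the join returns 2 rows of 4 columns, while UNION ALL returns 4 rows of 2 columns.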
So, now what is exactly SQL joins? Now, let's say that we have
(2:50:46) two tables. On the left table, we have the customer name. So, we have four
customers. And on the right table, we have the country information about the
customer. And now we would like to query both of those pieces of information, the names and
the countries. Now in order to query those two tables in one query first we
(2:51:02) have to connect them. And in order to connect those two tables we need a
key, a column that exists on the left and on the right sides. And by looking to this the
common column here is the ID of the customer. Now once we connect those ids
together we will be able to query those tables together and SQL going to start
(2:51:21) matching those ids. So for the ID number one, we will get the name Maria
and the country Germany. And the ID2 is connecting John to USA. And now you can
see the ID3 is not connectable. So we cannot connect it to the right side. But for the
ID4, we can use it in order to connect Martin to Germany. So this is
(2:51:40) exactly what happens if you join two tables. You connect those two tables
using a common column, a key like the ID. And once we have matching value, we
can connect the two rows together. So this is what we mean with SQL joins. Now you
might ask why do we need actually joins? Well, the first and very
(2:52:01) important reason is to recombine your data. So now usually in databases
the data about something like the customers could be spread into multiple tables.
Like we could have table called customers, another one where we have the
customer addresses and a third table where you can find the orders of the
(2:52:18) customers and maybe another one where you can find the reviews of the
customers. So as you can see the data of the customers is spread into like four
tables. Now how about I would like to see all the data about the customers in one
results. So I would like to see the complete big picture about our
(2:52:34) customers. What we can do, we can go and connect those four tables
using the SQL joins. And once we do that in one query, I will be able to combine all
those tables in one big results. And this is the most important reason why we use
SQL joins in order to combine all the data about specific topic in order to see the
(2:52:54) big picture. Now, another reason why we use SQL joins is to do data
enrichment. It is where I want to get an extra data and extra information. So let's say
that you are querying the table customers and this is your main table the master
table. So you are able to see all the data that you need but sometimes what
(2:53:12) happens you would like to get an extra information from another table like
for example the zip codes of the countries. So you would like the help of another
table we call it a reference table or sometimes lookup table where there is like one
extra information that you would like to add it to your master
(2:53:28) table to the primary source of your data. So now what we can do we can
join those two tables in order to enhance our table. So we are getting one extra
relevant informations for the customers and this process we call it data enrichments.
I'm getting an extra data for my main table. So this is another
(2:53:44) reason why we use joins. All right. So now so far we have used joins in
order to get the data from two tables. But now there is another use case for the SQL
joins. We use it in order to check the existence of your data in another table or
maybe as well the not existence. So let's say that I have a table called
(2:54:03) customers and I'm working with this table and doing queries. But now I
would like to check something. I would like to check whether our customers did order
something. Now in order to check that I need the help of another table for example
the table orders. So that means I'm using the table orders only for my
(2:54:19) check. So I don't want to get any extra data from the orders in my final
results. I'm just using the table orders and we call this table a lookup. So now what
we can do we can connect those two tables together. And now based on the
existence of the customers inside the second table the orders either the
(2:54:36) customer going to stay in the final results or going to be removed. So that
means I'm filtering the data based on the join. And of course I can check as well the
non-existence. I would like to see in the final results all the customers that didn't order
anything. So it is the same scenario. So my friends,
(2:54:53) those are the main three reasons why you use SQL joins. First, if you want
to combine the data from multiple tables in one big picture. So I use join in order to
get the data from different tables. The second use case, you are working with one
table but you would like to get an extra information from another table.
(2:55:10) So you are doing it like something called data enrichments. And in the third
scenario, we don't want to combine the data. We want just to join it with another
table in order to do a check to check the existence of your records in another table.
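The third use case, checking existence in another table, can be sketched like this. The table layout and sample rows are assumptions made up for the illustration (column names id, first_name, order_id, customer_id are not given in the transcript at this point). The non-existence query uses a left join with a NULL filter, which is one common pattern and not necessarily the exact form the course shows later.

```python
import sqlite3

# Hypothetical sample data: five customers, three of whom have an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES (1001, 1), (1002, 2), (1003, 3);
""")

# Existence check: orders is used only as a filter (a lookup),
# so no column from orders appears in the SELECT list.
# Note: a customer with several orders would appear once per order.
ordered = conn.execute("""
    SELECT customers.id, customers.first_name
    FROM customers
    INNER JOIN orders ON orders.customer_id = customers.id
    ORDER BY customers.id
""").fetchall()

# Non-existence check: keep every customer, then keep only the rows
# where no matching order was found (the order columns are NULL).
not_ordered = conn.execute("""
    SELECT customers.id, customers.first_name
    FROM customers
    LEFT JOIN orders ON orders.customer_id = customers.id
    WHERE orders.customer_id IS NULL
    ORDER BY customers.id
""").fetchall()

print(ordered)      # [(1, 'Maria'), (2, 'John'), (3, 'George')]
print(not_ordered)  # [(4, 'Martin'), (5, 'Peter')]
```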
So this is why we need joins in SQL. Now there is like a lot of
(2:55:31) different possibilities on how to join tables, how to join the data. Now in
order to make it easy to understand, we're going to visualize it as two circles. So we
have the table A and a table B. The table A is on the left side. We call it the left table.
And the table B going to be on the right side
(2:55:48) and we call it the right table. The side of the tables is very important. Now if
you combine those two circles, you will get three different possibilities. The circles
going to overlap. And here exactly where we can have the matching data between
the two tables. So the data is available on the left and on the
(2:56:04) right. Or another possibility you want to get all the data from one of the
tables. So you can get all the rows from one circle. And the third possibility you want
to get only the unmatching data from one table. So if something exists in one table
but not in the other table then we call it unmatching data. So
(2:56:21) those are the three scenarios that you have to ask yourself once you are
combining tables and this can generate a lot of join types. So here we have like
basic SQL joins those are the classical one and here depends on the scenario
whether you want only matching all or all the rows from either left or right
(2:56:38) and we have advanced SQL joins where we focus on the unmatching data.
Now we're going to go and cover all those types one by one. So we're going to start
first with the basics and the first option that you have is to get all the data without
joining tables. So let's see what this means. So what do we mean with no join? Well,
(2:56:58) we want to return the data from two tables without combining them. So
actually this is not a join type because we are not combining anything. We just want
to query the data from two tables. So that means from the table A we want to see all
the rows everything and from the table B we want to see everything as well all the
rows. So that
(2:57:17) means we want to see two results and there is no need to combine them.
So let's see the syntax of that. So all what you have to do is very simple: SELECT *
FROM table A, then a semicolon, and then start another query, SELECT * FROM table
B. So that's it. And of course since we are not combining
(2:57:34) the data there will be no join in the syntax. So that's it. Let's go to SQL in
order to do that. Okay. So now we have the following task. It says retrieve all data
from customers and orders in two different results. So that sounds like we don't have
to go and combine the tables together. And all what we can do
(2:57:50) is the following. We can go and select the data from the first table like this
and then we make another query for the second table the orders and we don't have
to go and combine them in one big query. We just use a very simple select
statements in order to retrieve the data. So if you go and execute it since
(2:58:09) you have two separate queries you will get two results and with that in one
result you will get all the customers and in the other result you will get all the orders
and the data is not combined at all. So this is how you query two tables without
combining them. So with that we are getting all the data without
(2:58:25) joining the tables. Now we're going to start talking about the first type of
join the inner join where we start combining the data from two tables. So let's go.
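Before moving on to the inner join, the "no join" task from the previous passage (retrieve all data from customers and orders in two different results) can be sketched as two independent queries. The column names and the sample rows below are assumptions for illustration only; the transcript only mentions five customers and four orders.

```python
import sqlite3

# Hypothetical sample data loosely modelled on the tables described in the course.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria', 'Germany', 350), (2, 'John', 'USA', 900),
        (3, 'George', 'UK', 750), (4, 'Martin', 'Germany', 500),
        (5, 'Peter', 'USA', 0);
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Two separate SELECT statements: no JOIN keyword, nothing is combined.
customers = conn.execute("SELECT * FROM customers").fetchall()
orders = conn.execute("SELECT * FROM orders").fetchall()

print(len(customers), "customer rows")  # first result set
print(len(orders), "order rows")        # second result set
```

Each query produces its own result set, so the customers and the orders are never matched against each other.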
Okay. So now what is exactly an inner join? So this type going to return only the
matching rows from both tables. So that means we will see in the output
(2:58:46) only matching rows. So now what do we need from the left table? We want
only the matching data. So we will not get the whole circle of A. We will get only
where we have an overlapping with the table B. So we want to see the data from A
only if it exists in the table B. And now what do we need from the table B?
(2:59:04) Exactly the same thing only the matching data. So that means I don't want
to see all the data from B. I want to see only the data in B that has a match from the
table A from the left side. And with that you will get only the matching data from both
tables. Now let's see how we can write that in SQL. So it is a usual
(2:59:21) query and always we start with a select. So we select for example all the
columns from and here we specify the table name. So it's going to be a. So so far
nothing new. But now we want to add as well the table B in the same query. In order
to do that we use the keyword join and then we say table B the name of the table.
(2:59:39) And since we have like different types of joins in SQL, you can specify the
type of the join before the keyword join. And if you don't specify anything, the default
type is inner join. But my friends, the best practices is always mention the type. I
don't like to skip the defaults because in projects maybe
(2:59:57) not everyone is aware of the defaults. So don't skip that. Always specify the
type. So now what we're going to do, we're going to put the keyword inner before the
join. And with that SQL going to know how to deal with the rows between two tables.
But still we are not done there. We have to tell SQL how to
(3:00:13) combine the tables. And for that we use the keyword ON. And after that
you specify the join condition. And as we learned in order to join two tables we have
to find out a common column in order to match the data. Right? And usually in SQL
they are the keys or IDs. So the condition can be like this:
(3:00:31) the key from the table A must be equal to the key from the table B. So this
is the join condition and using this join SQL can go and start matching the data from
the left table and the right table. And there is one thing that is very important while
you are joining the tables you have to understand about the
(3:00:49) order of the tables in your query. Now in the inner join the order of the
tables doesn't really matter. So whether you start from A or you start from B it doesn't
matter because you will get the same results. Both of the tables has the same
priority and it doesn't matter where we start whether we say from A
(3:01:05) join B or we say from B join A we will get the exact same results. So in the
inner join you don't have to worry about the order of the tables. So that's all about the
inner join. Now let's go back to SQL in order to practice. Okay. So now we have the
following task and it says get all customers along with their
(3:01:22) orders but only for customers who have placed an order. So my friends that
means we need the data from the customers and from the orders from two tables
and we have to put everything in one results. That means we have to join two tables.
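Before working through the task, the general inner join syntax described earlier can be sketched on two tiny tables. The tables a and b, the key column id, and the values are hypothetical. The sketch also checks two claims from the explanation: leaving out the keyword INNER gives the same result, and swapping the order of the tables gives the same rows.

```python
import sqlite3

# Hypothetical tables: ids 1 and 3 exist on both sides, 2 only in a, 4 only in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_val TEXT);
    CREATE TABLE b (id INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (1, 'b1'), (3, 'b3'), (4, 'b4');
""")

# Explicit join type, as recommended in the course.
explicit = conn.execute("""
    SELECT a.id, a.a_val, b.b_val
    FROM a
    INNER JOIN b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# No join type given: the default is an inner join.
implicit = conn.execute("""
    SELECT a.id, a.a_val, b.b_val
    FROM a
    JOIN b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# Tables swapped: for an inner join the order does not change the rows.
swapped = conn.execute("""
    SELECT a.id, a.a_val, b.b_val
    FROM b
    INNER JOIN a ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(explicit)                         # [(1, 'a1', 'b1'), (3, 'a3', 'b3')]
print(explicit == implicit == swapped)  # True
```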
Now let's go and do it step by step. So we're going to go and say
(3:01:38) select star from customers and then we have to go and join it with the
orders. We're going to say join orders. Now you have to go and specify the join type.
Is it inner, left, full and so on. Well that's depend on the task. It says we want all
customers but only for customers who have placed an order. So
(3:01:57) there is like condition right here. We don't want to see everything from the
customer. We just want to see only the matching data only if the customers has an
order in the orders table. And for that we can go and use the inner join. Of course
you can leave it like this and you will get the same effect, but I'm
(3:02:14) going to go and specify it like this inner join just to make it clear. We are
speaking about the inner join. And after that we have to go and specify the join
condition. So we have to go and find a common column between the customers and
the orders. So how I usually do it I go and explore both of the tables. So I'm
(3:02:29) going to go and select everything from customers and as well everything
from the orders. So let's go and execute. Now we're going to start searching where
do we have a common column between those two tables. So we have from the
first table first name, country, score and you don't find any of that information in the
second
(3:02:48) table. The only one is the ID. So the ID of the customer and the ID of the
customer you can find it in the orders the second column here. So this is the
common column between those two tables. And usually in databases we create ids
exactly for this in order to connect tables. So it's really rare that we're
(3:03:05) going to use like a country or score or first name in order to join tables. We
usually use the ids. So let's go back to our query and use those two columns. So it's
going to be the ID from the customers equal to the customer ID. So that's it. With that
we have the condition we have decided on the type
(3:03:23) and we can go and execute it. Now you can see we are getting only three
customers. Right? If you don't apply the inner join we can see that we have five
customers. So that means actually we have two customers without any orders any
matching data from the other table. And as well you can see very nicely we
(3:03:41) have now not only the columns from the customers but as well all the
columns from the orders side by side. So with that we have combined the data and
as well with that we have solved the task but we will not leave our query like this
because it is not really good practices. What we have to do is to go
(3:03:56) and select only the columns that really make sense in our query because in
many cases in your tables you will have a lot of columns that is not needed like for
example if you check here you see we have the customer ID here and as well the
customer ID over here. So it's like repetition and it's enough to see it
(3:04:12) only once. So what you have to do is to go and pick few columns that we
want. For example, I'm going to start with the ID maybe the first name and that's all
from the first table. Let's go and get the order ID and I don't want the customer ID
again. So from the second table I'll add the sales. So let's
(3:04:28) go and execute it. And with that you can see very nicely the customer's
name and their orders with the sales. And now comes something very important.
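The query built up to this point (inner join, with only the needed columns selected) might look like the sketch below. The column names id, first_name, order_id, customer_id, sales and all sample values are assumptions; only the customer names and the counts of five customers and four orders come from the walkthrough. The column names are left unqualified here, which works only because no name appears in both tables.

```python
import sqlite3

# Hypothetical sample data: customers 1-3 have one order each, 4 and 5 have none,
# and order 1004 points to a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT, country TEXT, score INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria', 'Germany', 350), (2, 'John', 'USA', 900),
        (3, 'George', 'UK', 750), (4, 'Martin', 'Germany', 500),
        (5, 'Peter', 'USA', 0);
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Only matching rows survive the inner join; customer_id is not selected
# because it would just repeat the customer's id.
rows = conn.execute("""
    SELECT id, first_name, order_id, sales
    FROM customers
    INNER JOIN orders ON id = customer_id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Maria', 1001, 35)
# (2, 'John', 1002, 15)
# (3, 'George', 1003, 20)
```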
Sometimes if you have two tables you might have columns that having the same
names. Like imagine the order ID in the table orders it's called ID. So that
(3:04:44) means we have the same name in both tables and this kind of makes SQL
very confused. And here you will get an error that tells you I really don't know what do
you mean with the ID. Is it from the table customers or from the orders? So we have
to tell SQL exactly from which table did this column come from. So in
(3:05:00) SQL in order to do that what we do before the column name you write
again the table name the customers and then you make a dot and now we are telling
SQL this column the ID it comes from the table customers and SQL will not be
confused about it and it's going to go and get the ID from the customers. And
(3:05:18) for the second id you can go over here and as well before it you say
orders.id so that SQL knows okay this ID comes from the orders and the other one comes from
the customers and it is always good practice especially if you are joining tables to
always assign for each column a table because after a while if you
(3:05:36) open your query and you see okay the sales does the sales come from the
customers or the orders and if you have a long list of columns it's going to be really
confusing so that's why we consider it best practices if you always assign for each
column the table name especially if you are doing joins. So
(3:05:52) it's going to be like this. But of course if you have like only one table it's
clear that all the columns in the select comes from this table. But since here we are
dealing with multiple tables it is good to show it like this. And of course here we don't
have the ID. We have the order ID and the same thing for
(3:06:05) the join condition. So the ID from here comes from the customers and the
customer ID come from the orders. So now it is clear for everyone which column
come from which table. But now you might say you know what each time I have to
write the customers this is very long name and sometimes in real projects
(3:06:23) you're going to see tables that has really long name and it's going to be
really annoying to add it each time before each column right so instead of that we
can go and assign aliases for the tables, not only for the columns. So usually we go
over here and say as and maybe you can go and use only one character like the first
character C.
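Table aliases as described here can be sketched as follows, with c standing for customers and o for orders. The schema and the sample rows are assumptions for illustration. The sketch compares the aliased query with the same query written with full table names to show that the alias only shortens the text.

```python
import sqlite3

# Hypothetical sample data (column names and values are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# Every column is prefixed with its full table name.
full_names = conn.execute("""
    SELECT customers.id, customers.first_name, orders.order_id, orders.sales
    FROM customers
    INNER JOIN orders ON customers.id = orders.customer_id
    ORDER BY customers.id
""").fetchall()

# Same query with table aliases: AS c and AS o are defined once
# and then used everywhere in the query.
aliased = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM customers AS c
    INNER JOIN orders AS o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

print(aliased == full_names)  # True
```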
(3:06:41) And now instead of saying customers you can go over here and say C. The
same thing for the second column and as well over here. And you can use now the C
in everywhere in your query. The same thing for the orders. You can go over here
and say AS O. And now instead of orders you say O here. And now it is very
easily to
(3:06:59) see those two columns comes from the C that means the customers and
those two columns comes from the O the orders. Those are the best practices as
you are joining tables together in SQL. And of course with that we have solved the
task. And about the order of the tables, it doesn't matter where do you start. So
(3:07:15) for example, if you take the orders here and put it in the join and get the
orders in the from. So I just switch the tables and execute it, you will get the exact
same results. So if you are doing inner join between two tables, don't worry about the
order of the tables. Okay. So now let's go and understand
(3:07:32) exactly how SQL executes the inner join. Okay. So now again here we have our
query. Then we have the two tables customers and orders. And here we have the ID
where we are joining the data. So this is the ID from the table customers and this is
the customer ID that we have in the orders. Now let's see how SQL can
(3:07:48) execute this. So we are saying I would like to see the ID and the first name.
So we will get the ID, the first name from the table customers and we would like to
get the order ID and as well the sales from the table orders. So our result going to
focus on those four columns. Now the data should be joined
(3:08:06) between those two tables using the inner join and SQL going to start from
the left table from the customers because we say from customers. So it's going to
start matching the ID from the left table with the right table. So it's going to say okay
is there a match from the first record from the first order?
(3:08:21) Well yes it is the same ID and then SQL going to say okay that condition is
fulfilled and we are allowed to see the data. So the data will be presented in the
output. So we're going to have the ID Maria and the order ID from Maria and the
sales of this order. So there is a match. Then SQL going to go to the
(3:08:41) second record. Well, we don't have a match. The third we don't have
match. And so on for the last one. So we have only one match for this ID. Then SQL
going to go again to the customers and pick the second one and start matching
again with the first order. Do we have a match? Well, no. Then it's going to go
(3:08:57) to the second. Well, now we have a match. So SQL going to be happy. the
condition is fulfilled and we will see the results. So we're going to see the first name
and as well the order information for this customer in the output. It's going to keep
searching. So we don't have a match as well here. So that's it. Now for the third
customer as
(3:09:15) well from the start: is there a match? No. To the second, to the third, and here we
have a match. So it's going to go and show this informations since there is a match.
So the customer three George with the order from this customer order ID and the
sales as well in the output. Now it's going to go and keep continuing the
(3:09:33) search. Well, we don't have any match. Then SQL going to go to the
fourth customer and start matching. Do we have here an ID? Do we have here a
match? Well, no. Then the second, third, and fourth. We don't have any order for this
ID. There is no match at all. And since we are saying inner join then SQL
(3:09:50) will not allow to show the data of this customer in the results. There is no
match and SQL going to totally ignore this customer. Then we're going to go to the
last one and start as well matching this ID with the orders. Well, there is no match as
well. SQL going to go and exclude this user from the results. So
(3:10:07) this is exactly how the inner join works. it start from the left side and start
matching the data on the right side and only if there is match the result going to be
presented in the output and this is exactly why we are getting this results and how
the inner join works. So now if you look again to the reasons why we are joining
tables we
(3:10:25) can say we can use the inner join in order to recombine the multiple tables
into one big picture. So the first use case and as well we can use the inner join in
order to filter the data. So since we are saying only the matching data that means we
are filtering the data we are checking the existence of
(3:10:42) the records in another table. So you can use inner join either to combine
data from multiple tables or you can use it as well only for filtering purposes only to
check the existence of your rows. So this is usually the two use cases of inner. All
right. So that's all about the first type the inner join. Next
(3:10:59) we're going to talk about the left join. So we're going to focus on the left
side. So let's go. Okay. So now what is exactly left join? This type going to return all
the rows from the left table and only the matching from the right table. So now if you
look again to our two circles A and B. What do we need from the left table?
(3:11:20) We want to see everything all the rows all the data. So that means we will
get a full circle. And now from the right table we want to get only the matching data.
So that means we don't want to see everything from the table B. We want to see only
the records that has match to the table A. So that means my friends
(3:11:38) the left table has here more priority. This is the primary source of your data.
The main source we cannot miss anything. This is very important. We want to see all
the data. But from the table B, it is a secondary source of data and we are joining it
only to get an additional data. So I don't want everything. I want
only the data that has a match to the left table. So this is what we mean
with a left join. Now if you look to the syntax it's going to be very similar to the inner
join. So we start from the left table the A. Then we say left join the right table B and
then the same condition using keys. So here we just
(3:12:12) switch the type. Instead of inner we have now left. But now here with the
syntax we need to be very careful. The order of the tables now is very important. You
have to start from the correct table. So you have to mention the left table exactly in
the from clause and then you join it with the right table. So in the join you have to
(3:12:31) specify the right table. If you don't do it like this then you will not get all the
data from a and you will not get the results that you are expecting. So this is what we
mean with the left join. Let's go back to SQL in order to practice. All right. So now
we have the following task. It says get all
(3:12:46) customers along with their orders including those without orders. So again
here we need the data from two tables the customers and orders and we want
everything in one result. So that means we have to go and join the data. And now the
task says includes those without orders. So that means I want to see
(3:13:03) everything the matching data and the unmatching data from the table
customers. And by looking to our query this is not working because we are not
getting everything right. We are getting only the customers that has match in the
table orders. And this is not of course fulfilling the task. So now if you read
(3:13:19) the task you can understand the main table here is the customers. We are
not speaking about to see all the orders and not missing any order and the orders
here is only for additional informations. So now in order to not lose any data for the
customers we make sure we start from the table customers. So that means now the
customers on the
(3:13:37) left side and now after that instead of inner join this is not good thing for
this task. We're going to say left join and with that we guarantee we will get all the
data from the customers. Now we say left join orders and of course the condition
going to stay like this. This is how we are connecting the two tables.
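The left join for this task can be sketched as below. The schema and the sample values are assumptions made for the illustration; the transcript only tells us that there are five customers, that Martin and Peter have no orders, and that unmatched rows show NULLs. Python prints SQL NULL as None.

```python
import sqlite3

# Hypothetical sample data: customers 4 and 5 have no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, sales INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'George'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES
        (1001, 1, 35), (1002, 2, 15), (1003, 3, 20), (1004, 6, 10);
""")

# customers is the left table (it is in the FROM clause), so all of its rows
# are kept; order columns are NULL where no matching order exists.
rows = conn.execute("""
    SELECT c.id, c.first_name, o.order_id, o.sales
    FROM customers AS c
    LEFT JOIN orders AS o ON c.id = o.customer_id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Maria', 1001, 35)
# (2, 'John', 1002, 15)
# (3, 'George', 1003, 20)
# (4, 'Martin', None, None)
# (5, 'Peter', None, None)
```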
(3:13:54) So actually that's it. Let's go and execute it. And now by looking to the
result you can see that we have now five customers even the customers that didn't
place any orders. So you can see Martin and Peter they don't have any order ID. So
that means they didn't order anything. And as you can see SQL is showing
(3:14:11) us nulls when there is no match. So with that we have solved the task. Now
my friends one more thing as I told you the order of the tables is very important
because the customer is now the left table because you start from it and the second
table the orders is the right table. Now if you go and switch them
(3:14:28) like this. So we start from the orders and then join it with the customers and
you go execute it you will not get all the customers and of course the task is now not
solved. So as you can see you are getting now completely different result if you go
and switch the tables. So be careful where you start and how
(3:14:44) you join the tables in order to get the effects that you want. All right. So
now I'm going to put everything back like before. Now let's go and understand how
exactly SQL executes this query. Okay. So now again we have the data from customers
and orders and this time we are doing the left join. So now let's
(3:15:00) see how SQL is going to do it. So SQL going to say okay we need the ID and the first
name and we will get that as well in the results and from the right table we need only
those two informations the order ID and the sales in the output. So those are the
columns that we need. So now SQL in the left join going to do it a little
(3:15:16) bit differently. It's going to start as well from the left table from the
customers. But this time SQL going to go and immediately put the result in the output
without like trying to match anything and to check whether the data exist or not
because it doesn't matter. SQL is not doing any validation whether the customer
(3:15:33) exists in the orders. Since it's a left join, SQL is still going to show all the data from
the left table. So there will be like no check. But now as a next step in order to get the
order ID and the sales SQL will start searching. So SQL going to go over here and
start searching where do we have a customer with this
(3:15:49) ID? Well, it's going to be the first order. We're going to get the order ID and
as well the sales informations and we will see that in the output. So that's it for the
first one. Now it's going to go to the second row and the same thing going to happen
immediately. The SQL going to go and put the result
(3:16:04) in the output without checking anything. And then in order to get the order
data, it will start searching for this ID. So we have it here in the second row. We have
the order ID and the sales. And SQL going to put those results to the output. So
the search for the third one immediately going to put everything
(3:16:22) in the output. And then start searching for orders with this ID. We have it
over here. So this order belongs to the user ID number three. So far we are getting
the same result as the inner join. But we are not done yet. Now exactly comes the
difference. SQL going to go and get Martin and put it immediately in the
(3:16:40) output and start searching for an order with this ID. So do we have any
order with the ID number four? Well, we don't have anything this time. SQL of course
will not go and exclude the ID number four. It's going to leave it. But in SQL if there is
no match, we still have to have something in the output. So SQL
(3:16:58) going to go and say the output going to be null like this. We don't know it is
unknown. And the same thing for the sales. So in the left join if there is no match you
will see nulls. The same thing for the next customer, for Peter. So SQL will go and
put the result immediately in the output and then start
(3:17:16) searching the orders. So do we have anything for the ID number five? We
don't have anything. That's why SQL going to go and present nulls as well in the
output. And that's why you saw nulls in the output because those customers don't
have any orders. So this is exactly the effect of the left join. You
(3:17:34) will get everything from the left table and only the matching stuff on the right
side and if there is something not matching you will get nulls. So that's it. This is
how SQL executes the left join. Okay, so now back to the use cases of joins. If I think
about left join I can use it in order to recombine data in
(3:17:50) order to build this big picture and as well in the second use case where we
use it in order to get an extra information from another table. So we have a main
table and secondary table. So we use it for both use cases and as well in the third
use case only with a twist that we're going to learn later. So that's
(3:18:07) all about the left join. Now we have another type that is exactly the
opposite of the left join. We have the right join. So now let's understand what this
means. Okay. So now what is exactly right join? This is the total opposite of the left
join. So this type going to return all the rows from the right
(3:18:27) table and only the matching from the left table. So here the main table the
main focus is the right table. So SQL is going to get you all the rows, everything from
table B, the right table, but from the left side we will get only the matching data. So
that means on the left side you will get only the data that
(3:18:45) has a match on the right side, and with that the right table is going to be the
primary, the main source of your data. So it is a very important table, but the left table is
not that important. You are just joining it in order to get additional data. So again,
about the syntax, it's not that crazy. All you
(3:19:02) have to do is to change the join type. So instead of left you say right join
and again here the order of the tables is very important because the side here
makes a difference. So we start from the left table A and then right join it to the table
B. So it sounds very similar to the left join. We are just switching
(3:19:20) things. Now let's go back to SQL in order to practice. Okay my friends, so
now we have the following task and it says get all customers along with their orders
including orders without matching customers. So again we have the customers and
the orders and we are doing the join but here the condition is
(3:19:36) different. We want to see all the orders even if they don't have a matching
customer. So that means I would like to see everything from the table orders and the
customers table here is only like supporting and helping. So the main table that we
are focusing on is the orders. We want to see everything and
(3:19:53) from the customers only the matching rows. And if you look at
the results, you can see we are seeing only three orders, right? But in the original table,
if you go back over here, you can see that we have four orders. So with the current
query we are not seeing all the orders. So now how are we going to
(3:20:11) solve it? If you start from the table customers you can say you know what
instead of left join we're going to say right join. And with that you're going to
guarantee you will get everything from the table orders. But now the left table, the
customers, is not that important and you will see the data of
(3:20:26) the customers only if there is a match. So doing the right join like this
guarantees you see every order whether there is a match or not. Now if you go and
execute it you can see on the right side the order ID and the sales and we can see
now all the orders and on the left side the ID and the first name.
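As a rough sketch, the right join described here might look like the following. The table and column names (customers.id, customers.first_name, orders.order_id, orders.customer_id, orders.sales) are assumptions based on the columns mentioned in the video, not a confirmed schema.

```sql
-- Task: get all customers along with their orders,
-- including orders without matching customers.
-- Table and column names are assumed from the transcript.
SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.sales
FROM customers AS c        -- left table: shown only where a match exists
RIGHT JOIN orders AS o     -- right table: every row is returned
    ON c.id = o.customer_id;
```

For an order whose customer_id has no match in customers, the id and first_name columns come back as NULL.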
(3:20:42) We are seeing only the customers if they did order something. And for the
orders without a known customer, we are getting nulls. So with that, you have solved
the task using the right join. So now my friends, you have to go and solve this task to
get the exact same results. But you are allowed to use only the left
(3:20:59) join. So you are not allowed to use the right join. So now go pause the
video, solve the task, and I'll meet you soon. Now my friends, in SQL there are
always alternatives on how to solve a task. So now if you want to get all the data
from B and only the matching from A, you can do it like we have done using
(3:21:20) the right join. But if you go and switch the sides and you make table B
the left table and table A the right table, you can do that of course in SQL. But
you have to switch the join type. So instead of right, we have to use left now, since
the B table is now on the left side, and as well you have to switch the
(3:21:38) order. So you start from the B table and then you say left join the A table.
and of course the same join condition. And if you do that, you will get the exact same
result as the left query. So if you just switch the tables and as well switch the join
type, you can get the same results. And to be honest, my
(3:21:56) friends, I don't like the right join. In the last 10 years, I have always tended
to start from a table and then use a left join. And from my point of view, the left join is
way more common than the right join. And I think I have never written a query where I'm
using a right join. So my advice for you: always try to skip the
(3:22:14) right join and stick with the left join. Just get the order of the tables in the
query correct and you will get the same results. So with that you know an alternative
for the right join. Now, is it enough to just go and switch the right to left? No,
this is not enough if I go and execute it. So now
(3:22:29) all I have to do is to go and switch the tables like this. So we start
from the table orders because I want to see everything from the orders and then left
join it with the customers. And of course we don't have to change anything in the join
condition. The order there doesn't matter because we have an equal operator here.
(3:22:45) What is very important here is where you start from which table and what is
the table that you are joining with. So if you go and execute it, you will get the exact
same results. So now I'm seeing all the orders. I'm not missing anything and only the
matching customers. And I prefer this way of solving this task
(3:23:03) instead of using the right join. All right. So that's all about the right join.
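A minimal sketch of this left join alternative, again assuming the table and column names mentioned in the video (customers.id, customers.first_name, orders.order_id, orders.customer_id, orders.sales):

```sql
-- Same result as the right join version: every order is returned,
-- and customer data appears only where a match exists.
-- Table and column names are assumed from the transcript.
SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.sales
FROM orders AS o             -- start from the table we want in full
LEFT JOIN customers AS c     -- supporting table: matching rows only
    ON c.id = o.customer_id; -- same join condition as before
```

Only the starting table and the join type changed; the join condition stays the same.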
Next we're going to combine everything. We're going to talk about the full join. So
let's go. Okay. So now what is exactly a full join? If you use it, SQL returns everything
all the rows from both tables. So now if you check again our
(3:23:23) circles from the left table, we want to get everything all the rows. So you
will get the whole circle and as well from the right table you want to get everything all
the rows the whole circle. So that you want to get everything the matching the
unmatching all the data from left and right. Now let's check the syntax. It's going to
be
(3:23:41) very simple. The join type here is going to be a full join. And the full join is
very similar to the inner join: you remember, the order of the tables is not important at
all. So there is here no main table and secondary table. Both of the tables are
important and it doesn't matter in your query where you start.
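To illustrate, here is a sketch of the full join written in both directions. It uses the course's customers and orders tables, with column names assumed from what is mentioned in the video rather than taken from a confirmed schema.

```sql
-- Version 1: start from customers.
-- Table and column names are assumed from the transcript.
SELECT c.id, c.first_name, o.order_id, o.sales
FROM customers AS c
FULL JOIN orders AS o
    ON c.id = o.customer_id;

-- Version 2: start from orders. Returns the same set of rows.
SELECT c.id, c.first_name, o.order_id, o.sales
FROM orders AS o
FULL JOIN customers AS c
    ON c.id = o.customer_id;
```

Both versions return all customers and all orders, with NULLs on whichever side has no match. The order in which the rows are displayed is not guaranteed without an ORDER BY.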
(3:23:58) You can start from A full join B or you can start from B then full join A. you
will get the exact same results. It sounds simple. Let's go to SQL and practice the full
join. All right. So now we have the following task and it says get all customers and all
orders even if there is no match. So now again
(3:24:14) we need the data from customers and orders. But now of course which
type we're going to use? It says even if there is no match but it didn't say no match
from orders or customers. So you can understand from this task we are not focusing
only on the orders or the customers. Both of them are equally important and we need
all the data. So
(3:24:32) that means we need all the data from left, all the data from right and we
can go and use the full join. So now we have this query over here. We are starting
from customers and then joining to orders. But now instead of having left, we're
going to say full join. So now let's go and just execute it. Now if you
(3:24:48) are looking to the left side, you can see we are getting all the customers,
right? So we have our five customers and if you are looking to the right, you can see
all our orders. So with that we have everything from left and everything from right
and the matching data is just side by side in the results and if there is
(3:25:05) no match we are getting nulls. So actually with that we have solved the
task and again it doesn't matter how you start. You can start from the orders and
then join it to the customers and you will get the exact same results. So you are
getting exactly the same data. Now let's go and understand exactly how the full join is
(3:25:20) executed. Okay, again we have the data of the customers and
the orders and our full join. So now SQL is going to identify those columns that we
want to see in the results. So the ID and the first name, the order ID and the sales
information go to the output. Now SQL is going to start
(3:25:36) from the left table since we started with the customers. SQL is going to
take simply everything from the left table and present it in the output. Since it is full
join, we want to see all the data from the left side. And now start searching for
matches from the right table. So let's start with the
(3:25:50) first customer. And as usual, we will get the order for customer
number one. And the same thing for the second customer, we have as well here a
match. So we will get it as well. It's like the left join. And for the third one, we have as
well a match. And we're going to have it like this. And since we don't have
(3:26:07) orders for those two customers, we will get nulls as well in the output. So
SQL is going to mark it with null. The same thing over here. And as well for the last
customer. So we will get nulls for those two customers. And now of course SQL will
not stop here, otherwise we will get a left join effect. Now SQL
(3:26:24) is going to start looking at the right side to find any order that is not in the
output. So SQL is going to see: okay, the first order is in the output. The second one is
as well in the output. The third as well, but the fourth one is not in the results. So SQL is going
to take this row and put it in the output. So this
(3:26:41) order has no match at all from the left side. And with that if you are looking
to the right side you can see SQL is going to be happy because we have all the orders
from the right table. And of course SQL will not leave it like this. Instead of that SQL
is going to show nulls on the left side. So there is no ID and
(3:26:58) there is no first name. So this is exactly why we got these results. And this is
how SQL executed the full join. Okay. So now if you are looking at the use
cases, I can say you can use the full join in order as well to recombine the data from
multiple tables if you don't want to miss anything from all
(3:27:15) your tables, all data, the matching and unmatching data. But I don't use it
usually for data enrichment for the second use case and where we can use the full
join is in the last use case as well but with a little twist that we're going to learn later.
So this is mainly where we can use the full join. All
(3:27:31) right. So with that we have covered the basic types of joins inner, left, right
and full join. Those are the classical joins on how to combine two tables. Now we're
going to start talking about the advanced SQL joins. And now we're going to cover
the first part, the left anti-join. So let's see what this
(3:27:50) means. Okay. So now what exactly is a left anti-join? Now in this
mechanism we want to return rows from the left side, the left table, that have no match
in the right table. So now by looking to our two circles from the left table we want to
see only the unmatching rows. So only rows that exist in table A but
(3:28:13) don't exist in table B. So if there is matching data, we don't want to
see it. And now from the right table we don't want anything. We don't want any data.
So that means the only source of your data going to be the left table. And from the
right table we don't need any data. We are just joining the tables
(3:28:29) to do a check to filter the data. So now for the syntax this can be
interesting. We don't have a special type called left anti-join, at least in SQL
Server, but we can still create this effect. Since we are saying left, we can use the type left
join and then as usual the join condition with the keys. But now if you
(3:28:46) leave it like this you will get the effect of the left join. And we don't want that
because with the left join you will get the complete circle from the left table. But now in
order to remove the matching data, this overlap in the middle, what we can do is
use a filter, and in order to filter the data
(3:29:01) we use the WHERE clause. So now in order to get rid of the matching data we
can take the key from the right table and we say the key must be null. So if the key is
null so that means there is no match on the right side. And if you do it like this you
will get the effect of the left anti-join only the data in the left that
(3:29:21) has no match on the right. So now let's go to SQL and create this effect.
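The pattern just described, a left join plus an IS NULL filter on the right table's key, can be sketched like this. The customers and orders tables and their column names are assumptions based on what the video mentions.

```sql
-- Left anti-join: customers that have no order at all.
-- Table and column names are assumed from the transcript.
SELECT
    c.id,
    c.first_name
FROM customers AS c
LEFT JOIN orders AS o
    ON c.id = o.customer_id
WHERE o.customer_id IS NULL;  -- keep only rows where no order matched
```

The filter works because an unmatched left row gets NULL in every column of the right table, including the join key.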
Okay. So now we have the following task and it says get all customers who haven't
placed any order. So now by looking to this query clearly we are focusing on the
table customers but we want to see the customers that didn't order anything. So they
are in our
(3:29:39) database but the customers are inactive. Now there are like different ways
on how to solve this task but we're going to solve it using the joins. Now let's go and
start by just writing a very simple query where we are selecting everything from the
table customers. Now you can see this is our five customers. And now
(3:29:56) I want to check which of those customers didn't order anything yet. Now
since we are talking about the orders, we can go and join it with the table orders. So
we're going to say left join the table orders as o, and then we're going to go and
connect the tables using the IDs, with the customer ID. So now if you go
(3:30:12) and execute it now we are still seeing all the customers because we are
using the left join, and now we can see the order information of each customer and
you can see immediately those two customers didn't order anything because we are
seeing here nulls right so they are empty there is no orders now we can
(3:30:30) use this information in order to filter the data I just want to see Martin and
Peter. So what we can do: we can go and say WHERE, and all you have to do is to
take the key that we are using in order to join the tables, this one over here,
and say this must be null, so IS NULL. So if you do it like this,
(3:30:48) that means you want to see the data only if the customer ID is null. So let's go
and execute it. Perfect. Now you are getting the customers who haven't ordered anything,
and this is exactly the effect that we wanted, the left anti-join. We are getting the data
from the left side where there is no match on the right side. So you
(3:31:08) always have to do it in two steps. First, join the data as you normally do using
the classical joins, the left join, and then in the second step you go and use a filter using
the WHERE clause. If you do it like this you can check for non-existence, and with that we
are getting the effect of the left anti-join. So
(3:31:25) that's it. Okay, so now if you are looking at this picture, I think you already
know where we use the left anti-join. We're going to use it only in the last use case
where we are checking the existence. So if you use the left join together with the
WHERE, you can check for the non-existence of your data in another
(3:31:40) table. So this is exactly for this scenario. All right. So that's all about the left
anti-join. Now we're going to speak about the exact opposite of that. We will cover
the right anti-join. So it's going to be very similar but we are just switching sides. So
let's go. Okay. So now what exactly is the
(3:31:59) right anti-join? Well, it is the opposite of the left anti-join. So we want to
return the rows from the right table that have no match in the left table. So again if you
are looking to our two circles. Now what is important is the right table. We want to
see only the unmatching rows from the right table. So only the rows that exist in B
(3:32:20) but not in A. And from the left table we don't need anything. So no data is
needed and that means the only source of data comes from the right table and you
are using the left table as a filter as a lookup just in order to check the existence. So
now the syntax of that going to be very similar to the left
(3:32:38) anti-join. So we don't have a special type called right anti-join. We have to
use the classical one the right join. But if you do that you will get everything from the
right table. And now in order to get rid of the matching data in the middle we use a
filter. We use the WHERE clause where we say we are
(3:32:53) interested only in the unmatching data. So we take the key from the left
table and we say the key from left is null. And if you do that you will get rid of any
matching data. Is null means there is no match. And again here the same thing the
order of the tables is very important since here we are talking
(3:33:10) about sides and you have to do it correctly. Okay. So now the task says get
all orders without matching customers. So now it is exactly the opposite. We want to
see all the orders that don't have a valid customer. So this is a really bad scenario: you
have orders in your business without valid customers. So let's see how we can
(3:33:29) discover that using SQL joins. Now as you can see we are focusing
completely on the orders. It's not the customers anymore. And we want to see only
the orders where there is no match with the customers. So now again here we have
two steps. The first step we're going to go and do the normal join. So using either
(3:33:43) the left or the right join. Now by looking to this query you can leave it like
this where you can start from the customers. But if you want to fully focus on the
orders you have to switch this from left to right. And with that you will get all the
orders and only the matching customers. And let's go and
(3:33:58) remove this WHERE clause from here first. So I'm just commenting it out, and
with that SQL is going to totally ignore this line of code. So let's go and execute it. Now
you can see we are getting all the orders right and data from customers only if there
is a match. And now of course this is not the task. We don't
(3:34:14) want to see all the orders. We want to see only the orders where we don't
have a match from the customers. So if you look to this those three orders they are
okay. They are totally fine. We are finding customers for them. So they have valid
customers. But this order here is really bad. So there is no valid
(3:34:30) customer for this order, and now our task is to show only this type of order in
the result. Now what we have to do: we have to use the WHERE clause in order to get
exactly this effect. So this time,
here we're going to say the ID of the customer from the table customers
(3:34:46) must be null. So we're going to remove this here and take the join key from
the customers, and we are saying this ID must be null. So let's go and execute it.
Perfect. With that we have solved the task and we are getting the effect of the right
anti-join, and we are getting now those orders that don't have any
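As a sketch, the query at this step might look like the following, assuming the table and column names mentioned in the video (customers.id, orders.order_id, orders.customer_id, orders.sales):

```sql
-- Right anti-join: orders that have no matching customer.
-- Table and column names are assumed from the transcript.
SELECT
    o.order_id,
    o.customer_id,
    o.sales
FROM customers AS c
RIGHT JOIN orders AS o
    ON c.id = o.customer_id
WHERE c.id IS NULL;  -- key from the LEFT table is null, so no customer matched
```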
(3:35:04) customers. So we have solved the task. Now my friends you have to go
and solve this task without using the right join but still you have to get the same
effect. You want to get exactly those orders without customers. So pause the video
and go solve the task. Now again, as you know me, I don't like
(3:35:26) right joins. We can create the same effect if you switch the sides of the
tables. So if you put the B table now on the left side and A on the right side, then
we will get the same effect if you go and switch the type of join from right to left and
you go and switch the tables. So you start from the B table
(3:35:43) since it's on the left side and then join it with the A. And we still say of
course in our WHERE condition that the key from A is null. So there is no match. So if
you do this you will get the exact same results as the left query by using the left join
and just switching the tables. So you will get
(3:35:59) the same results, and with that you know that in SQL we always have
alternatives. I hope that you are done. So it's very simple what you're going to do.
We're going to go and switch the joins and since the orders is the main table we're
going to start first from the table orders. So we are putting it
(3:36:14) on the left side and then the right table is going to be the customers. And of
course the condition is going to stay as it is. We want to see the orders where there is
no customer. So we don't have to switch anything here or in the join key. So let's go
and execute it. With that you are getting the same exact
(3:36:29) results. Since we are using the star here, it always starts from the left
table and then shows the data from the right table. But still the result is valid. We are
getting this type of orders without matching customers. And I prefer this way. All right.
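The left join version of this task can be sketched as follows, with table and column names assumed from the video:

```sql
-- Orders without a matching customer, written with a LEFT JOIN.
-- Table and column names are assumed from the transcript.
SELECT *
FROM orders AS o             -- main table, now on the left
LEFT JOIN customers AS c
    ON c.id = o.customer_id
WHERE c.id IS NULL;          -- same filter: no customer was found
```

With SELECT *, the columns of orders appear first and the customers columns (all NULL here) follow, which is the column order difference mentioned above.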
So now with that we have the left, the right and now of course
(3:36:45) what is next? We will get the full. So let's speak now about the full anti-join
in SQL. Let's go. Okay. So now what exactly is a full anti-join? Well, this time we
don't have sides. We want to return only the rows that don't match in either table.
So what does this mean? If you are looking at the left circle, we want only the
(3:37:08) unmatching rows. So we don't want the whole circle. We want only the data
that exists in A but doesn't exist in B, the right table. Sounds like the left anti-join, but
since we are saying full then you have to do the same thing on the right side as well.
So on the right table we want only the unmatching rows.
(3:37:24) So we want to see in the result the data that is in B but don't have a match
from A. So it's exactly the opposite. And if you look to this then that means we want
to see only the unmatching data and this is exactly the opposite effect of the inner
join. In the inner join we were interested only in the matching data,
(3:37:43) only where there is overlap. But now with the full anti-join it is
exactly the opposite. We don't want to see the matching data. We want to see
everything else, the unmatching data. So how are we going to write this query? Again
here we don't have a special type called full anti-join. We will use the help of
(3:37:58) the classical full join. So the basic one. So you start from a full join b and
then the same key. But now what is interesting is about the where condition. Now we
have like two conditions right? So now in order to get all data from A that has no
match in B, you have to make a filter where you say the key from the B table must be
null.
(3:38:16) And now since we want the exact same thing from the right table, we want
all the data in B that has no match in A. You have to say as well the key from the A
table must be null. So now we have here like two conditions. And in SQL if you have
two conditions in the WHERE clause, you have two options: either
(3:38:33) use the AND operator or the OR operator. So now the one that we're going to
use here is the OR operator. So either the key from right is empty or the key from left
is empty. If you do it like this, you will get the effect of the full anti- join. And of course
since here both sides are equal then the order of
(3:38:50) the tables as well here is not that important. So you can say from A full join
B or from B full join A. It doesn't matter. So now let's go back to SQL in order to
create this effect. Okay. So now we have the following task and it says find
customers without orders and orders without customers. So if you
(3:39:08) are looking to this this means we want to see only the unmatching data
from customers and as well from orders. There is no main table and secondary table.
Both of them are equally important. So now since we are talking about the
unmatching data and the anti-join we have to do it in two steps. The first
(3:39:24) step, we're going to do the classical join, and then we focus on the WHERE
clause. So let me remove the WHERE clause by making it a comment. Now since we
want the data from left and right, we're going to go and use the full join. So let's go
and execute it. Now you can see we are getting the effect of the full
(3:39:38) join. We are getting all the orders and as well all the customers. But now
we are interested only in the strange cases where there are orders without
customers like this one here and as well customers without orders. So that means
the first three rows they are not really interesting for us because it is boring.
(3:39:55) We have here matching data and this is totally fine but we are not focusing
on that now. We are focusing only if there is like missing data from left or from right.
As you notice I'm saying or and this is very important because we're going to use the
or operator. So now let's focus on getting this scenario
(3:40:11) over here. We want to get an order without a customer. So that means the
ID from the customers table must be null. And we have it already here. So we are saying
where the ID of the customer is null. So if I go and execute it, I will get only one record, only
this one over here. But as well I want to get the opposite
(3:40:29) scenario, a customer without an order. So in this scenario, the customer ID
from the orders must be null. So we're going to say OR the customer ID in the orders
is null, or we can write it side by side like
this. Either the right side is null or the left side is null. So if you go and execute it, you
will get the effect of the full anti-join. And with that we are finding
(3:40:50) the customers without orders and orders without customers. I think this is
really fun and as well really easy. So this is how we do the full anti- join. All right. So
now if you are looking to the use cases we use the full anti- join again exactly for the
last use case in order to check the existence. So if you
(3:41:07) combine the full join with the WHERE, you can check the existence or the
non-existence of your data in another table. So this is exactly the scenario for that.
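Putting the two steps together, the full anti-join from this task can be sketched as below. The table and column names are assumptions based on the columns mentioned in the video.

```sql
-- Full anti-join: customers without orders AND orders without customers.
-- Table and column names are assumed from the transcript.
SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.customer_id,
    o.sales
FROM customers AS c
FULL JOIN orders AS o
    ON c.id = o.customer_id
WHERE o.customer_id IS NULL   -- customer with no order
   OR c.id IS NULL;           -- order with no customer
```

Using AND instead of OR here would require both keys to be null in the same row, which never happens for a row that comes from a customer, so the OR is what makes both unmatched sides show up.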
Okay, my friends, now we have a bonus section where I'm going to challenge you to
solve the following task without using an inner join. So, it says, "Get
(3:41:27) all customers along with their orders, but only for customers who have
placed an order, but without using an inner join." So, pause the video now and go
and solve this task. Okay, so now let's see how we're going to solve this. We
want the customers, the orders, blah blah blah. But we want only the customers who
have
(3:41:48) placed an order. Previously, we have used the inner join in order to solve
this task. But this time, we are not allowed to use it. So, let's go and solve it. This is
how I'm going to do it. Select star from the table customers, and give it an alias. So,
now I'm getting all the customers, but I am interested only in the customers who have
(3:42:05) placed an order. So, as we know from before, there are two customers who didn't
order anything, and we don't want to see them in the final results. Now how we will
get that? Well, we can use the help of the table orders in order to check the
existence of our customers there. And of course, I'm not allowed to use the inner
(3:42:21) join. So I'm going to go and use a left join with a table orders and then
combine them as usual. Nothing new with the customer ID. So now let's go and
execute it. As you can see, we are doing it step by step. You don't have to rush
everything in one go. So you start simple, check the results and decide on
(3:42:37) the next step. So now by looking at these results I want to get those three
customers because they have ordered something and we are seeing data about their
orders and I don't want to get in the result the last two. So again we still can use the
customer ID from the right table in order to decide which
(3:42:53) data going to stay in the result and which data should be filtered. We're
going to go and use the WHERE clause and then the key from the orders, and this time
we're going to say IS NOT NULL. I know we didn't learn yet about NOT and the logical
operators, but using IS NOT NULL means there should be data
(3:43:11) inside the column; it must not be null. If you do it like this and execute, you
will get the exact same effect as the inner join. So as you can see, as you are joining the
tables using the left join you can control what you want to see using the WHERE clause,
using the filter, and this is how you can solve this task without
(3:43:29) using an inner join. Okay, so with that we have covered all those three
scenarios in order to find the unmatching data: left, right, and full anti-joins. Now we can
speak about one crazy join. We call it the cross join. This one is totally different from
all other types that we have learned. So let's understand exactly what is the cross
(3:43:47) join. Let's go. So now what exactly is a cross join? Now in some scenarios
we want to combine every row from the left with every row from the right. So that means
I want to see all the possible combinations from both tables. So we are doing
something called a Cartesian join. So now if you look at our two circles, we want
everything
(3:44:10) from A and as well everything from B. So that means I want to see
everything from A combined with everything from B. So in this example, we have two
rows in A and three rows in B. If you do a cross join, you will get six possible
combinations by just multiplying the number of rows between A and B. So be careful
using the
cross join. If you use it, you can get a huge number of rows in the results
and you're going to make the database really busy finding out the result for you. So
now about the syntax, it's going to be the easiest. So you start as usual from one of
those tables, the A for example, and then you say cross join B.
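As a small self-contained sketch of the two-rows-times-three-rows example, the following uses inline VALUES lists instead of real tables. The names a, b, a_id and b_id are made up for illustration, and the derived-table VALUES syntax shown works in SQL Server and PostgreSQL.

```sql
-- Cross join: every row of A combined with every row of B.
-- a and b are hypothetical inline tables, not part of the course database.
SELECT
    a.a_id,
    b.b_id
FROM (VALUES (1), (2)) AS a(a_id)               -- 2 rows
CROSS JOIN (VALUES (10), (20), (30)) AS b(b_id); -- 3 rows
-- Result: 2 x 3 = 6 rows, one per possible pair.
```

Note there is no ON clause, because a cross join does not match rows on any condition.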
(3:44:44) So now my friends, if you look at this, you can see it's not like the previous
joins that we have done. We have always before talked about unmatching rows,
matching rows and so on. But here we don't care at all about whether the data is
matching or not. I just want to see all the possible combinations
(3:45:00) everything. So since we don't care about matching the two tables, we don't
have to specify any condition. So there is no need to use the keyword on because
we don't need any condition. So that's it. You just say cross join B and the magic can
happen. So this is a cross join. Let's go to SQL to try that. Okay. So
(3:45:16) now we have the following task. It says generate all possible combinations
of customers and orders. So that means we want everything with everything using
the cross join and this going to be very simple. So we're going to start with select star
from whatever table. So you can start from the customers and then
(3:45:33) you say cross join orders. That's it. Very simple. Let's go and execute it. So
now as you know we have five customers and four orders. And if you multiply them
you will get in the results 20 rows. So now we are getting everything with everything.
even if the data is not matching at all. So you can see for
(3:45:51) example the orders here. So this is one order that belongs only to one
customer the customer ID one. So it is an order from actually Maria but still we are
seeing this same order with the other customers since we want to combine
everything with everything. So there are no rules. The same thing for the next
(3:46:07) set. So this is the second order actually belongs to John but we are seeing
this order with all customers. So that's it. This is how the cross join works. And now
you might ask me why we have this. It makes no sense, right? Well, my friends, I
rarely use it. But sometimes if I want to generate like test data or maybe if you have
like for
(3:46:25) example a table called colors and a table called products and you would like to
see all the combinations between the products and the colors. So in some scenarios
it really makes sense to see all your products together with all the colors without any
matching conditions or whatever. So there are a few
(3:46:41) scenarios for the cross join, if you are doing simulations or testing. So
this is how we do the cross join. Okay. So that's all about the cross join. And with that
we have covered the four advanced types of joins. Now if you look at this you might
ask okay how I'm going to choose between all those types. So
(3:46:57) you might ask me, okay Baraa, how do you do it? Well I'm going to show you now
my decision tree that I usually follow in order to choose the correct type. So now if
I'm combining two tables and I want to see in the results only the matching data
between two tables then I go and use the inner join. We don't have
(3:47:17) any other type for that. So that's simple. But now if I want to see everything,
all the data, and I don't want to miss anything after joining two tables, then I take a different
path, and here I ask myself: is there one side more important than the other? Am I
interested in all data from one table, from one side?
(3:47:33) Like here we have a main table or a master table; then I go and use the
left join. But if I want to see all the data from all tables in my query, everything, so there
is no one table more important than the other, then I go with the full join. So this is another
path. And now the third path, if I'm interested to see only
(3:47:52) the unmatching data. So I'm doing some kind of checkups and so on. And
here again the same thing: do I want to see the unmatching data from only one side?
If there is one table that is important, then I go and use the left anti join. So I want
to see the unmatching data from one table, and I'm using the other table only for the
(3:48:11) check. But if in my query both of the tables are important, so there is no main
table and secondary table, then I go and use the full anti join. So
actually that's it. This is the decision tree that I usually follow as I'm writing a query.
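The first three branches of this decision tree can be sketched as runnable code. This is only a minimal illustration, assuming an in-memory SQLite database through Python's sqlite3 module instead of the SQL Server used in the course; the table layout and the rows are made up for the example. The full join and full anti join branches are left out because older SQLite versions do not support FULL JOIN.

```python
import sqlite3

# Illustrative stand-in data (assumed): 5 customers, 4 orders, and one order
# that points at a customer id that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 'Maria'), (2, 'John'), (3, 'Georg'), (4, 'Martin'), (5, 'Peter');
    INSERT INTO orders VALUES (1001, 1), (1002, 2), (1003, 3), (1004, 6);
""")

# Branch 1: only the matching data -> inner join.
inner_rows = conn.execute("""
    SELECT c.first_name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Branch 2: all data, one main table -> left join (customers is the main table).
left_rows = conn.execute("""
    SELECT c.first_name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Branch 3: only the unmatching data from the main table -> left anti join,
# written as a left join plus a filter on the missing right side.
anti_rows = conn.execute("""
    SELECT c.first_name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
    ORDER BY c.id
""").fetchall()

print(len(inner_rows), len(left_rows), anti_rows)
```

With this sample data the inner join keeps three customers, the left join keeps all five, and the left anti join returns only the two customers without an order.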
And you might ask me how about the right join.
(3:48:27) Well, as you know me, I don't have it at all in my decision tree. So I don't use
it at all. Now, looking at this, I can tell you that if I check most of the queries that I write,
very often I use the left join. So I can tell you this is my favorite way to join
tables. So let me show you exactly why. Usually I write queries in order to
(3:48:49) do data analysis. So in data analytics you always have a starting point.
You have a topic that you are analyzing, like the customer. So you always have
a master table. So I always start with the main table of my analysis. So in my
query I start from this table, from table A, the main table.
(3:49:06) And then what happens? The data in this table is not enough. I need some
extra data that comes from another table, like table B. So table B is here only
as additional data for the master table. So I go and use the left join in order to
connect table B, and then I find other interesting information in
(3:49:25) another table, in table C. So the same thing happens. I go and join the tables
using the left join, and so on. So I keep connecting multiple tables to this main table in
the middle. And my query is going to look like this: always doing left joins with multiple
tables. Now, of course, you might say, "Yeah, but
(3:49:42) sometimes you would like to see only the matching data and so on. So, it
makes sense to use only the inner join." Well, in order to do that, I can control
everything that I want to see in the final results using the WHERE clause. So, in the
WHERE clause, I define exactly what I want to see in the final result.
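As a rough sketch of this pattern, the example below starts from a main table a, left joins two more tables, and then uses the WHERE clause to keep only the rows that have a match in both. It assumes an in-memory SQLite database via Python's sqlite3 module; the tables a, b and c and their contents are hypothetical and only mirror the "table A, table B, table C" wording of the transcript.

```python
import sqlite3

# Hypothetical tables: a is the main table, b and c only add extra data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (a_id INTEGER, info TEXT);
    CREATE TABLE c (a_id INTEGER, extra TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    INSERT INTO c VALUES (2, 'c2'), (3, 'c3'), (5, 'c5');
""")

# Every extra table is left joined to the main table a.
BASE_QUERY = """
    SELECT a.id, b.info, c.extra
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.a_id = a.id
"""

# Without a filter, every row of the main table survives.
all_rows = conn.execute(BASE_QUERY + " ORDER BY a.id").fetchall()

# The WHERE clause decides what stays: here only rows matched in b AND c,
# which gives the same rows an inner join of all three tables would give.
matching_rows = conn.execute(
    BASE_QUERY
    + " WHERE b.a_id IS NOT NULL AND c.a_id IS NOT NULL ORDER BY a.id"
).fetchall()

print(all_rows)
print(matching_rows)
```

Swapping the filter to IS NULL would instead keep the unmatching rows, so the joins stay the same and only the WHERE clause changes.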
(3:49:57) So, with that, I get more flexibility on whether I want to see the
matching data, the unmatching data, and so on, like we did in the left anti join, right? So as
I'm analyzing data, I very frequently have this setup where I start from the main
table and I left join all other tables, and with the WHERE
(3:50:13) conditions I control the final results. So this is how I connect multiple tables
together. So now if I want to visualize this in circles, it's going to look like this. We
have the circle A. So this is the master table the starting point. I want to see all the
data from table A, and then I left join it with another
(3:50:30) table B, and from table B I want to see only the matching data. So it's like
the left join. Now what's going to happen? I'm going to go and add another table. So
another circle, the circle C. And from the circle C, we want to see only the matching
data. And of course you can keep adding circles to this. But it's
(3:50:46) always going to be the same thing: each new circle is going to have only the
matching data. So now, as we learned, we can use joins in order to combine multiple
tables to get a complete big picture about a topic like the customers. I would like to see
everything about the customers in the final results. So
(3:51:01) either you're going to do it like me, where you start from the main table and
then go and left join all other tables, or maybe you say, you know what, there is no
main table for the customers' data, all the tables are equally important. Then you
can go and join all those tables using the inner join if you are
(3:51:16) interested only in the matching data. So what can happen, if you have again
those circles: from A you need only the matching data, from B you need as well
only the matching data, and as well from the third circle. So you are interested only in the
overlap between all three tables. So you will get only this
(3:51:32) section where all three tables overlap. So this is of
course another way to join multiple tables. Okay. So now my friends, let's go
back to SQL in order to practice how to join multiple tables. Okay. So now let's have
a task. This is going to be a little bit challenging. We
(3:51:48) will be doing multi-joins using the SalesDB. Retrieve a list of all orders
along with the related customer, product, and employee details. And for each order
display the following: we want to see the order ID, the customer name, the product
name, sales, price, and salesperson name. So there are a lot of things that
(3:52:07) are going on. And the first thing that you're going to notice is that now we
are using a different database. We will not be using MyDatabase; we're going to
go and use the SalesDB. So this is the first thing that we have to do. So instead of
using MyDatabase, we say USE SalesDB and then execute it. We are
(3:52:23) now connected to the SalesDB. So this is the first thing. So now if you are
reading this task, there are a lot of tables involved. We need the orders, we
need the customers, products, and employees. So there are four tables needed
in this task, and we need different stuff from each table. So now
(3:52:41) how do I think about it? Well, it is mainly focusing on the table orders, right? So
we need all the orders; we cannot miss any order here. So to me this sounds like
the main table. And then it says along with that we need other information. So that
means the other tables are not as important as the
(3:52:58) orders. So this gives me a feeling about what the main table is, and this is going
to be my starting point. So let's start from that, from the table orders. So select star
from, and here you have to pay attention that this database always has a schema. It's
called, if you look to the left side, Sales dot the table
(3:53:15) name. So we have to write that now in our query. So we're going to write it
over here: Sales dot and then the table name Orders. Let's go and execute it. Now I
know this is the first time that you are querying this table. We have a lot of
information here and as well we have a lot of IDs. Those IDs are going to
(3:53:31) help us of course on joining our data with the other tables. So what do we
need from here? We need the order ID. So we have it over here. We're going to get
the order ID. This time the naming convention is different. We don't have like
underscores and comm. We have different type of namings. So be careful
(3:53:48) with that. So what else do we need? We need the sales. So if you go to the
right side over here, we have a column called Sales, and we're going to go and include it
in the results. Now all the other information is actually not needed, but I need
those IDs in order to join it with the other tables. So now
(3:54:04) what I'm going to do, I'm going to go and give it an alias, O. So now I'm
going to go and assign it to each column. This comes from the orders, and
the same thing for the sales. So that's it for now. And if I go and execute it, I will get
the orders and the sales. All right, so that's all for
(3:54:19) the first table. Let's go now and see what we need. We need the
customer's name. Well, actually we don't have this piece of information in the orders.
So all you have to do is go and explore the other tables in order to find this
column. So how I usually do it, I go and explore the tables like this. So
(3:54:34) I write a simple select from each table. So the customers. So now I go
and repeat this for each table inside the database. So we have the customers,
employees, we have orders, the orders archive, and as well the products. So now
I start exploring the tables. So if I go to the customers over here, we can see
(3:54:58) we have here five customers and we can see the names of the customers.
So we see the first name and the last name and this is exactly what I need for my
query. Now of course we have to go and connect this table with the orders. So we
need a common column. Usually it's going to be the ID. So here we have the
(3:55:13) customer ID and if you go and query the orders you can find here as well
the customer ID. Now if you are working in big projects, you're going to have a lot of
tables, and exploring each one of them is going to be really hard. So now of course if
you have hundreds of tables in the project, it's going to be
(3:55:28) really hard to explore each table. So instead of that, a good project, a good
database, usually has an entity relationship model, an ER model, like the one that we have
for the course. And here you can easily find the tables that you have inside your
database and as well the relationships between them, and this
(3:55:44) is very important, especially if you want to join tables. So now by just
looking quickly at this diagram I can understand, okay, there is an ID called customer
ID inside the table orders, and it is a foreign key to the primary key, the customer
ID. So that means if I want to connect the orders with the customers I
(3:56:03) have to use that customer ID. So as you can see this is really nice
documentation, and I can quickly understand how to join the tables. So now back to
our query. Now what I'm going to do, I'm going to say left join. So with that I guarantee
all the orders are going to be presented in the output and I will always see 10 orders. So
now
(3:56:19) let's join it with the table customers, Sales dot Customers, and let's give it an
alias like this. And now we're going to build the joining condition. So it's going to be
the customer ID from the table orders equal to the customer ID from the table
customers, so that SQL understands how to match the two tables.
(3:56:37) And now the two tables are connected and I can now get the information
from the customers. So let's go and get the first name and as well the last name.
So now let's go and execute it. So now as you can see we have customers for each
order which is really nice. So with that we got the customer name and the
(3:56:56) order ID. Now the next one we need the product name. So either you're
going to go here and start exploring. I think it is inside the table products. And here
you can see we have the product. This is the name of the products. And if you check
our ER diagram, you can see we can connect the table orders with the
(3:57:13) products using the product ID. So we have the product ID in the left and as
well in the right. And now we can go and build this join as well over here. So again I
go with a left join. I don't want to lose anything from the table orders. Sales dot Products,
and we give it an alias P. Now the condition for that, here
(3:57:31) you have to be very focused. You want to get the product from the orders.
So you say O dot product id equal to the product ID from the table products. So as
you can see in the joins we are always joining with the table orders. Right? We are
not trying to join for example the customers with the products.
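The pattern described here, where every extra table is joined against the main table orders, might look roughly like the sketch below. It assumes an in-memory SQLite database via Python's sqlite3 module, so the Sales schema prefix is dropped; the column names follow the spoken names in the transcript, and the rows are invented. Order 3 deliberately points at a customer that does not exist, to show that the left join keeps it.

```python
import sqlite3

# Assumed, simplified versions of the course tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                           Product TEXT, Price INTEGER);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         ProductID INTEGER, Sales INTEGER);
    INSERT INTO Customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Adams');
    INSERT INTO Products VALUES (101, 'Bottle', 10), (102, 'Tire', 15);
    INSERT INTO Orders VALUES (1, 1, 101, 10), (2, 2, 102, 30), (3, 9, 101, 20);
""")

# Orders is the main table: both join conditions compare against o, never
# customers against products.
rows = conn.execute("""
    SELECT o.OrderID, o.Sales, c.FirstName, c.LastName,
           p.Product AS ProductName, p.Price
    FROM Orders AS o
    LEFT JOIN Customers AS c ON c.CustomerID = o.CustomerID
    LEFT JOIN Products AS p ON p.ProductID = o.ProductID
    ORDER BY o.OrderID
""").fetchall()

for row in rows:
    print(row)
```

All three orders appear in the result; the order with the unknown customer simply has empty customer columns while its product columns are still filled.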
(3:57:51) Always we are joining with the main table. So with that we have connected
the third table and we can get the information that we need. So we need the
product, and I'm going to go and rename it product name. So let's go and execute it.
And with that, my friends, I'm getting now the product information
(3:58:09) from the table products. So we have the sales as well, and we need the
price. So if you go to the products you can see we have price information as well. I
forgot about it. So let's go and get it as well from the same table: price. So let's go
and execute it. And with that we have the prices as well. Now the last column, it
(3:58:26) says we want to get the salesperson name. So the name of the employee,
right? Now if you go and explore as well, we have here an employees table. Execute it.
You can see we have here the first name and the last name of the employees and we
have an ID. So now we need this ID as well in the orders. So you can see we have
the
(3:58:44) product ID, the customer ID. We already used those two. But we have here
one more extra ID called the salesperson ID. Of course, it is not called employee ID.
So here you might be a little bit skeptical about it. That's why we have to go and
check again our ER diagram. And as you can see the employee ID from
(3:59:00) the employees, it is connected to the salesperson ID. So now I have a better
feeling about it and I understand, okay, I can connect the orders with the employees
using the salesperson ID. So let's go and do that. I'm going to say left join. So as you
can see I'm just doing left joins. Sales dot Employees as
(3:59:17) E, and the condition again, very important: the first table is always included in
the join condition, and here we're going to say the salesperson ID is equal to the
employee ID. So with that we have connected the employees as well, and we will get
the first name and the last name as well. So perfect, that's it. Let's
(3:59:39) go and execute it. And as you can see guys, now we are getting the name
of the salesperson. Now here comes an issue. As you are joining multiple tables and
you are getting columns from different tables, what can happen? You might
encounter this scenario where you have the same names in multiple tables. So
(3:59:55) now as you can see we have the first name last name from the employees
and as well we have the first name last name from the customers and it's going to be
really hard from the result to understand what are we talking about? Is it the
customers? Is it the employee? That's why in this scenario if you have
(4:00:09) the same names we have to go and start giving aliases. So for the first one
we're going to say customer first name and as well for the last name we're going to
say customer last name. Same thing for the employee. So let's say employee first
name, or we can call it the salesperson, whatever, and employee last
(4:00:28) name. So if you go and execute it now, it's going to be clearer. Here we
are talking about the name of the customer and here we are talking about the name
of the employee. And again one more thing: if you are not using aliases, it's going to
be an issue. So for example if you go over here and you don't use the
(4:00:45) table name before the column, so if I go and remove it and execute it, you
will see I'm getting an error. Now SQL can't understand what you are talking about. Is
it the first name of the customer or of the employee? Because you are not specific
about it. So you have to tell SQL which table this column belongs to.
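A small sketch of both points, the ambiguous column error and the aliases that fix it, is shown below. It assumes an in-memory SQLite database via Python's sqlite3 module rather than SQL Server, so the exact error text differs from what the video shows; table contents are invented, and output names such as CustomerFirstName are just one possible choice.

```python
import sqlite3

# Assumed, simplified tables: Customers and Employees both have FirstName
# and LastName columns, which is what causes the name clash.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         SalesPersonID INTEGER, Sales INTEGER);
    INSERT INTO Customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Adams');
    INSERT INTO Employees VALUES (1, 'Frank', 'Lee'), (2, 'Carol', 'Baker');
    INSERT INTO Orders VALUES (1, 1, 2, 10), (2, 2, 1, 30);
""")

JOINS = """
    FROM Orders AS o
    LEFT JOIN Customers AS c ON c.CustomerID = o.CustomerID
    LEFT JOIN Employees AS e ON e.EmployeeID = o.SalesPersonID
"""

# Without a table alias, the database cannot tell which FirstName is meant.
error_message = None
try:
    conn.execute("SELECT o.OrderID, FirstName" + JOINS)
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# With table aliases on every column and column aliases for the clashing
# names, the query runs and the output is readable.
cursor = conn.execute("""
    SELECT o.OrderID,
           c.FirstName AS CustomerFirstName,
           c.LastName  AS CustomerLastName,
           o.Sales,
           e.FirstName AS EmployeeFirstName,
           e.LastName  AS EmployeeLastName
""" + JOINS + " ORDER BY o.OrderID")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(error_message)
print(column_names)
print(rows)
```

The first statement fails before returning any rows, while the second returns one row per order with clearly separated customer and employee names.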
(4:01:02) It's very important to use a table name or the alias before the column
name, especially if you have the same column name in multiple tables. So now we will not get an error. And
with that we have solved the task. You really have to pay attention to the join
keys. You have to get the condition right, because as you can see now we
(4:01:19) have a lot of tables and a lot of columns, and sometimes an issue happens
where you specify the wrong columns or the wrong joins, and the result can make no
sense at all. So always double check that you are using the correct keys in order to join the
tables. So with that you have solved the task and this is exactly how
(4:01:35) I join tables. I always have a starting point from an important table, and
everything else is going to be left joined, and in my results if I want to remove any
scenario then I go and use the WHERE clause. So this is how I join multiple tables.
Okay my friends. So with that you have learned now everything about
(4:01:53) how to join the tables in SQL and this is very important to understand. Now
moving on to the second method on how to combine your data from multiple tables.
We have the set operators. So we're going to go and cover how to combine the rows
from multiple tables. So let's go. All right, my friends. So now as we
(4:02:12) learned before, in order to combine two tables we have two methods. If you
want to combine the columns, we use the joins. And we have learned all those
different types on how to combine data using join. So we have covered this section.
But now if we want to combine the rows of two tables, we can use the
(4:02:28) set operators. And here we have four different types. We have union, union
all, except, and intersect. So now we're going to go and deep dive into this world of
how to combine the rows of tables using the set operators. And now of course in this
course we're going to cover everything. So let's go. All right. So now let's have a look
(4:02:49) at the syntax of the set operators. Okay. So now let's say that we have the
following query: we are selecting the data from the customers. So this is our first
query or our first select statement, and we have another one which is very similar
where we are selecting the information from the employees, and
(4:03:05) this is our second select statement. So now what we can do, we can put
between those two queries a set operator like, for example, the union. We can use of
course any other set operator like the union all, intersect, except, and so on. So as
you can see the syntax is very simple. We have two different queries
(4:03:24) and we just put between them the set operator. So this is what the syntax of
the set operators looks like. All right friends. So now we're going to talk about the
rules of the set operators. And we're going to start with rule number one, the SQL
clauses. In each individual select statement or query,
(4:03:39) we can use almost all the SQL clauses like WHERE, JOIN, GROUP BY, HAVING.
But there is only one exception: the ORDER BY. You can use ORDER BY only once
and only at the end of the entire query. So that means we cannot use ORDER BY in
each select statement or in each query. We can use it only once and only
(4:03:59) at the end of the entire query. All right. So about the syntax again, here we
have our two select statements and in between them we have the set operator. So
now in each query we can go and use multiple things like JOIN, WHERE, GROUP BY,
HAVING. So we can make each query as complex as we want. So everything is
(4:04:18) allowed but not the ORDER BY. The ORDER BY must always be placed at the end
of the entire query. So if you want to sort the result by the first name, you have to use
the ORDER BY exactly at the end. So we are not allowed to use ORDER BY in each query.
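The ORDER BY rule can be tried out with a short sketch. It assumes an in-memory SQLite database via Python's sqlite3 module as a stand-in for SQL Server (the error wording differs between the two), and the names in the tables are invented for the example.

```python
import sqlite3

# Invented sample rows; Kevin Brown appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    INSERT INTO Customers VALUES
        (1, 'Kevin', 'Brown'), (2, 'Mary', 'Hill'), (3, 'Anna', 'Adams');
    INSERT INTO Employees VALUES
        (1, 'Frank', 'Lee'), (2, 'Kevin', 'Brown'), (3, 'Carol', 'Baker');
""")

# Not allowed: ORDER BY inside the first select statement.
order_by_error = None
try:
    conn.execute("""
        SELECT FirstName, LastName FROM Customers ORDER BY FirstName
        UNION
        SELECT FirstName, LastName FROM Employees
    """)
except sqlite3.OperationalError as exc:
    order_by_error = str(exc)

# Allowed: one ORDER BY at the very end, sorting the combined result.
rows = conn.execute("""
    SELECT FirstName, LastName FROM Customers
    UNION
    SELECT FirstName, LastName FROM Employees
    ORDER BY FirstName
""").fetchall()
first_names = [row[0] for row in rows]

print(order_by_error)
print(first_names)
```

The first statement is rejected, while the second sorts the whole combined result once at the end.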
Okay. Moving on to the rule number two. The number of columns. The
(4:04:38) number of columns in each query must be the same. Okay. So now
in order to understand this rule, let's have this very simple example. We're going to
go and select the first name and the last name from the table Sales.Customers. So
this is our first query, our first select statement, and let's say that I
(4:04:55) have another one and we want to select the first name last name but this
time from another table, the employees. So with that we have our two queries and I
would like now to go and combine them into one result. So we're going to go and use
the set operator union. Let's go and execute it. So now as you can see in
(4:05:13) the result we will get the first name and last name from two tables the
customers and employees. And it is working because we are fulfilling the rule where
it says the number of columns must be the same in both queries. So how many
columns do we have in the first query? We have two right and as well in
(4:05:28) the second query we have two columns. So that's why everything is
working. So now let's go and break the rule by adding another column to the first
query. So let's say that I would like to have the customer ID as well in the first query
and with that as you can see in the first query we have three columns but in
(4:05:44) the second we have only two. So let's go and execute it. Now as you can
see in the result we will get an error where it says if you are using union, intersect,
and all those set operators, you must have an equal number of columns between
queries. So this is the rule: you have to have the same number of columns. In order
(4:06:02) to repair it, I'm just going to remove the customer ID.
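A minimal sketch of breaking and repairing this rule follows, again assuming an in-memory SQLite database via Python's sqlite3 module instead of SQL Server, with invented rows. SQLite reports the problem with its own wording, not the SQL Server message read out in the video.

```python
import sqlite3

# Invented sample rows; Kevin Brown appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    INSERT INTO Customers VALUES
        (1, 'Kevin', 'Brown'), (2, 'Mary', 'Hill'), (3, 'Anna', 'Adams');
    INSERT INTO Employees VALUES
        (1, 'Frank', 'Lee'), (2, 'Kevin', 'Brown'), (3, 'Carol', 'Baker');
""")

# Breaking the rule: three columns in the first query, two in the second.
column_count_error = None
try:
    conn.execute("""
        SELECT CustomerID, FirstName, LastName FROM Customers
        UNION
        SELECT FirstName, LastName FROM Employees
    """)
except sqlite3.OperationalError as exc:
    column_count_error = str(exc)

# Repairing it: remove the extra column so both queries return two columns.
rows = conn.execute("""
    SELECT FirstName, LastName FROM Customers
    UNION
    SELECT FirstName, LastName FROM Employees
""").fetchall()

print(column_count_error)
print(len(rows))
```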
Okay. So here again we have two columns and in the second one as well two columns,
and everything is going to work. Okay. Moving on to rule number three. The
data types of columns in each query must match; they must be
(4:06:15) compatible. In order to check that, what we're going to do, we're going to go to
the object explorer on the left side. Let's go and browse the customers and the
columns. And as you can see we have here the first name and last name with the
same data type. We have varchar. And if you go to the employees, you can
(4:06:32) see as well the first name and last name having varchar. So the first column is
varchar in the first query and as well for the employees, and the last
name from the customers has the same data type as the last name from
employees. So the data types are matching. Now let's go and break this rule.
(4:06:50) Instead of having the first name, I would like to go and use the customer
ID. So now let's check the customer ID on the left side. It is an int, an integer. But the
first name is a varchar. So here we have a mismatch between data types. Let's go
and try to execute it. So now we are getting an error where it says SQL is trying to
(4:07:09) convert the value Frank to an integer. So what does this mean? The first query is
always controlling everything, the names and as well the data types. So here we
have an integer, and now SQL is trying to convert the first name values as well to an
integer, and of course it will not work because we have characters here
(4:07:26) inside and it cannot convert characters to an integer. So we have a
mismatch between data types between the customer ID and the first name and that's
why we will get an error. The second column we don't have an issue because it is
varchar in the first table and as well for the second table. So now in order to
(4:07:42) repair it either select a first name in the first query or we can go over here
and say employee ID and with that if I execute it we will not get any errors because
the employee ID is as well an integer and we have a match in the data types. So as
you can see it's not enough to have the same number of columns. You
(4:08:00) have to have as well matching data types between those two queries.
Okay, let's move to the next rule. Rule number four, the order of columns. The order
of columns in each query must be as well the same. Okay, so let's understand what
this means. Now we have here again the same example where we are selecting the
(4:08:16) ID and last name from customers and we are combining it using union with
the employee ID and last name from the employees. And as you can see everything
is working because we have the same number of columns and we have a matching
data types. So now let's go and break it. What I'm going to do I'm just going
(4:08:31) to switch between those two columns. So first I'm selecting the last name
and then the customer ID. So again I have the same number of columns and the ID
is integer matching the ID of the employee and the last name having the same data
type. So let's go and execute it. So here again SQL going to throw an error
(4:08:49) and says SQL is trying to convert the value Goldberg to an integer. So it's
like character to integer. It will not work. So what happened here? I have here the
same information. I have an ID and last name, and an ID and last name. Well, SQL
doesn't work like this. SQL is going to go and map the first column from the
(4:09:04) first query with the first column of the second query. So it's going to go
and map last name to employee ID. And since they have different data types, SQL
is going to throw an error. So SQL doesn't understand or doesn't know how to map, let's
say, the ID with the ID, and since they have different data types SQL
(4:09:20) is going to go and throw an error. So as you can see here we have the same
information in customers and employees, but it is not in the same order.
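The mapping by position can be made visible with a short sketch. One important assumption: it uses an in-memory SQLite database via Python's sqlite3 module, and SQLite is loosely typed, so it does not raise the conversion error that SQL Server raises in the video. Instead the query runs and the mixed-up columns show directly in the result. The rows are invented.

```python
import sqlite3

# Invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    INSERT INTO Customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Adams');
    INSERT INTO Employees VALUES (1, 'Frank', 'Lee'), (2, 'Carol', 'Baker');
""")

# The first query lists LastName first, the second lists EmployeeID first.
# Columns are paired by position, not by name.
cursor = conn.execute("""
    SELECT LastName, CustomerID FROM Customers
    UNION
    SELECT EmployeeID, LastName FROM Employees
""")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

The column labelled LastName now holds employee IDs next to customer last names, which is exactly the position-based pairing described above; a strictly typed database such as SQL Server stops with an error at this point instead.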
So SQL cannot go and map the information by the names of the columns.
It's going to go and simply map the columns like this: the first column from
the first query with
(4:09:38) the first column from the second query. So as you can see in this rule you
must have the same order of the columns. First the ID and then the last name, and
with that it's going to work again. All right, moving on to rule number five: the
column aliases. The column names that we see in the output, in the result, are
(4:09:56) defined and determined by the column names of the first query, the first
select statement. So that means the first query is responsible for naming the
columns in the output. Okay. So let's understand what this rule means. Again we
have the same example. The customer ID, last name from customers, union,
(4:10:13) employee ID, last name from employees. So if you check the output
closely, you can see that in the output we have the customer ID and not the
employee ID, even though we have the IDs from the employees as well. But as you can
see the first query is controlling the naming of the output. So since the first column
(4:10:31) is called the customer ID, you will see it in the output as customer ID. So
the naming in the next queries will be totally ignored. So that's why if you
want to give aliases to the output, you're going to go and do it only for the first query.
So for example, I go over here and say instead of having
(4:10:49) customer ID, I would like to call it as an ID. So now if I go and execute it, as
you can see in the output, we will get an ID. So I don't have to go and in each query
give this alias. So I don't have to go over here and say yeah you are as well the ID
because it's enough to define it from the first query. So
(4:11:07) there's no need to give the same names in the next queries. Let's take
another example where we would like to have an alias for the last name. So I would
like to have it like this, Last_Name, and let's go and do it in the second query. So
Last_Name, let's go and execute it. So now as you can see in the output, we
(4:11:27) still have LastName and there's no underscore, because this is totally
ignored by SQL. This is not the first query. The first query says you are LastName
without underscore. So again if you want to do that we go over here. Let me just get
it and put it in the first query. Let's go and execute it. So my
(4:11:45) friends, the first query is very important in order to give the names for the
output. So if you want to do aliases and to rename stuff, do it only on the first query.
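A quick sketch of the alias rule: the example below puts one alias in the first query and another in the second, then reads the column names of the result. It assumes an in-memory SQLite database via Python's sqlite3 module, which follows the same rule of naming the output after the first select; the rows are invented.

```python
import sqlite3

# Invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    INSERT INTO Customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Adams');
    INSERT INTO Employees VALUES (1, 'Frank', 'Lee'), (2, 'Carol', 'Baker');
""")

# The alias ID is set in the first query and shows up in the output.
# The alias Last_Name is set only in the second query and is ignored.
cursor = conn.execute("""
    SELECT CustomerID AS ID, LastName FROM Customers
    UNION
    SELECT EmployeeID, LastName AS Last_Name FROM Employees
""")
column_names = [col[0] for col in cursor.description]

print(column_names)
```

Moving the Last_Name alias into the first select statement is what makes it appear in the output.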
And as well the first query controls the data types. All right. Now to the last rule:
matching the correct information. If in your query you
(4:12:02) fulfill all other rules and you don't have an error in SQL, that doesn't
mean that your result is accurate and correct. You are the only one that is
responsible for mapping the information between queries correctly, because SQL
doesn't understand the content and the information of your tables, of your
(4:12:19) queries. And if you don't match the information correctly between the
queries, you will get inaccurate and wrong results in the output. Okay. So now back
to our example. Let's say I would like to get the first name and as well the last name
from the customers and the same informations from the employees. Let's go and
execute it. Now
(4:12:38) as you can see it's very nice where we are getting the first name, last name
from both tables in one result and we are fulfilling all the requirements in SQL. Same
numbers, same data types and so on. Now let's go and make incorrect results. So
what I'm going to do, I'm just going to swap the first name and
(4:12:53) last name in the second query. So first last name and then the first name.
So let's go and execute it. So now as you can see we will get results because we are
fulfilling all other rules because we have the same number of columns and as well
we have matching data types. So the first one is character the first
(4:13:11) name and the last name is as well character. So SQL will just present the
result as you define it. But the result is completely wrong because now we have if
you check the first column here the first name. So here we can see last names inside
the first names. For example, Brown and Baker those are last
(4:13:28) names, but we can see them inside the first name. And the same thing in
the last name. Now we can see first names inside it. Mary, Carol, they are all first
names. So as you can see the result has really bad data quality. We are now mixing
stuff and it doesn't make any sense. But SQL will not know
(4:13:44) that, because SQL doesn't know the information, the content of your data.
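This last rule is easy to reproduce: the sketch below swaps the two name columns in the second query and still runs without any error. It assumes an in-memory SQLite database via Python's sqlite3 module with invented rows; the same silent mix-up happens in any database where both columns are text.

```python
import sqlite3

# Invented sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT);
    INSERT INTO Customers VALUES (1, 'Kevin', 'Brown'), (2, 'Anna', 'Adams');
    INSERT INTO Employees VALUES (1, 'Frank', 'Lee'), (2, 'Carol', 'Baker');
""")

# Same number of columns and same data types, so no error is raised,
# but the second query lists LastName before FirstName.
cursor = conn.execute("""
    SELECT FirstName, LastName FROM Customers
    UNION
    SELECT LastName, FirstName FROM Employees
""")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
first_name_column = [row[0] for row in rows]

print(column_names)
print(first_name_column)
```

The output column is still called FirstName, yet it now contains the employees' last names, so only a manual check of the mapping catches the problem.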
It's just mapping the data types. So the first name is varchar, the last name as well
varchar. Everything is fine and you will get the results. So my friends, you are
responsible for having the same information mapped between the two queries, and
not having an error from
(4:14:05) SQL doesn't mean that we now have correct results. So pay attention to the
information that you are mapping between the two queries. All right. So those are
the rules of the set operators. So the first one is that the ORDER BY can only be used
once, at the end of the entire query, and all queries must have the same number of
columns,
(4:14:24) matching data types, and the same order of columns, and the first query
always controls the names and the aliases of the result set and as well the data types.
And the last rule is to make sure that you are mapping the correct information to
each other between queries. So those are the rules of the set
(4:14:43) operators. Okay. So what is union? Union is going to go and return all distinct,
unique rows from both queries. So that means it's going to go and combine
everything, and all the rows are going to be presented in the output. So since it says all
distinct unique rows, that means union is going to go and remove all
(4:15:05) duplicates from the combined result set. So union is going to make sure that
each row is going to appear only once. All right. So now let's have this very simple
example. We have two sets of data. We have the customers, where we have five
customers with the first names, and as well we have another set called
(4:15:22) employees, and we have as well the first names of the employees, and we
have five employees. And now if you take a look at the first names you can see that
we have the same persons as customers and as well as employees. We have
Kevin and Mary in both sets of data. So now how is SQL going to execute union? It's
going
(4:15:38) to go and return everyone from customers and everyone from the
employees. But now since we have Kevin and Mary twice, in the output we're going
to have them only once. So this is how the union works. It is going to go and return
everyone from the two sets but without duplicates. All right. So now we have the
following task and it
(4:15:55) says combine the data from employees and customers into one table. So
that means in one table we want to combine all information from employees and
customers. So which information do we need? This is the first question that I
usually ask myself. So in order to do that first we have to explore the data.
(4:16:11) So select star from Sales.Customers and then semicolon. Then I'm going to
write another query, select star from Sales.Employees and semicolon. So now
why am I using two different semicolons? Because I'm telling SQL we now have two
separate queries. They have nothing to do with each other. And if you go and
execute
(4:16:32) it like this. And now in the output you can see we got two result grids. The
first result grid is for the first query and the second one for the second query. So they
have nothing to do with each other. I just want to explore those two tables in order
to understand how I'm going to map the information. So now
(4:16:48) if we check those two tables you can see that both of them have ids. So we
can map that information, right? Both of them have as well first name and last name. So
that means I can go and map the first name and last name together. Now in the
customers we have country but we don't have this information in the employees.
(4:17:07) So we have to go and ignore it. And we have as well here score, where we
don't have a score for the employees. That means I can go and map three
pieces of information between the customers and employees. Now of course we can go and
think: do we really need the ids? Because it doesn't really make any sense to have
(4:17:23) the ids in the combined table. They are not unique anymore because we have here the
customer ID one and employee ID one. So I think we can go and ignore it. So the only
two pieces of information that are really useful to map are the first name and last name. So now
let's go and add those two columns. So we need the first name, last name and
(4:17:39) the same information as well from the employees. But now we want
everything to be in one query. That's why I'm going to go and remove the
semicolons. And now we have to go and use set operators between those two
queries. And now in order to combine the data we have two options, either union or
union all. In this
(4:17:56) example the task doesn't mention anything about duplicates and so on. I would
like to go with the union in order to remove the duplicates if there are any. So that's it.
Let's go and execute it. Now as you can see in the output we have only one result
because we have only one big query. And now we have the first
(4:18:12) names and last names from the customers and employees. And now one
more thing about the order of the queries. It doesn't matter whether we start with the
employees or with the customers. We will get the exact same results, but pay
attention to the naming of the columns. The first query always controls the
(4:18:27) names, but since they now have the same naming it should not be a
problem. So if I go and switch those two tables and run it again we will get the exact
same results. So now let's understand how SQL did combine the data using the
union. Okay. So now we have here the results from the first query and the
(4:18:44) second query employees and customers and we are combining the data
using union. The first step in SQL is that it's going to go and take the columns from
the first query which is from the employees. So it's going to take the first name last
name as column names for the results. And now the next step is that it's going
(4:19:00) to go and start combining the rows between those two tables. So first it's going
to go and take the rows from employees and as well check whether there are
duplicates in the data. So as you can see we don't have any duplicates here. So
we're going to have the five employees. And now in the next step it's
(4:19:15) going to start adding rows from the second query, from the customers, very
carefully without generating any duplicates. The first customer, Joseph, we don't have in the output. That's why
SQL is going to go and add it to the result. Append it. And then the next customer we
have is Kevin Brown. As you can see, we have it already in the results. That's why SQL will
(4:19:31) not go and add it to the result. Otherwise, it's going to go and generate
duplicates. So SQL is going to ignore this customer. The same thing for Mary. We
have Mary as well in the results. So it's going to skip it. And then we're going to go to
Mark. As you can see, we don't have Mark in the
(4:19:45) results. That's why SQL going to go and take this customer and put it in the
output. And then the last one, we have Anna. We don't have Anna in the results.
That's why SQL can go and as well add it to the results. And now with this, SQL did
combine the rows between those two tables. And we have here eight persons.
(4:20:01) So as you can see, SQL is combining the data, but very carefully not
generating any duplicates. All right. So that's it. This is how the union operator works.
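The union walk-through above can be reproduced with a small script. This is only a sketch under some assumptions: it uses Python's built-in sqlite3 module with an in-memory database instead of the SQL Server database from the course, the table names are simplified to customers and employees, and every last name except Brown is a placeholder that does not come from the course data.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT);
    -- Placeholder last names, except Brown, which the walk-through mentions.
    INSERT INTO customers VALUES
        ('Joseph', 'Stone'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Mark', 'Wood'), ('Anna', 'Frost');
    INSERT INTO employees VALUES
        ('Frank', 'Lake'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Michael', 'Ford'), ('Carol', 'Nash');
""")

# The first query supplies the column names; ORDER BY appears once, at the end.
union_rows = conn.execute("""
    SELECT first_name, last_name FROM employees
    UNION
    SELECT first_name, last_name FROM customers
    ORDER BY first_name
""").fetchall()

# 5 employees + 5 customers, minus the 2 persons that exist in both tables.
print(len(union_rows))
for first_name, last_name in union_rows:
    print(first_name, last_name)
```

Kevin Brown and Mary appear only once in the printed list, which matches the eight persons counted in the walk-through.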
Okay. So now union all. Union all is going to go and return all rows from both
queries. So it's very similar to union. It's going to go and combine all
(4:20:21) the rows and everything is going to be presented in the combined result set.
But the big difference to the union is that union all will not remove any duplicates. It is the only set
operator that doesn't remove duplicates and it's going to show all the rows as they are. So
if you have a row 10 times from the query, you will
(4:20:40) find it as well in the output 10 times. Now you might ask me when to use
union and when to use union all. I'm going to say that there is one big difference
between them is that union all has way better performance and it's faster than the
union. And that's because union all doesn't perform additional steps like
(4:20:58) removing duplicates. So my friends, that means if you already know that in
your queries there are no duplicates, you know your tables, you know your queries, there are no
duplicates, then don't use union; use union all, because you will get better
performance. Another scenario for the union all is that I would like to
(4:21:15) see the duplicate. I'm doing data quality checks and I would like to see
whether there is duplicate after I combine multiple queries. So in this situation I go
and use as well the union all. Now we have again the same example. We have the
customers and employees and we have as well the same persons Kevin
(4:21:30) and Mary as customers and as well as employees. So now if you want to
combine the data using union all it going to return all rows including duplicates. So
that means SQL going to go and execute union all like this it going to return
everything from customers and everything from employees and Kevin and Mary
going
(4:21:47) to be presented twice in the output. So as you can see union all is returning
all the rows as they are from the two result sets, and if there are duplicates in the sets we will
get as well duplicates in the output. So Kevin is going to exist twice in the output
and Mary as well twice. So this is how the union
(4:22:05) all works. All right. So now we have very similar SQL task and it says
combine the data from employees and customers into one table including duplicates.
So it's exactly like the last task but this time in the task we are saying include
duplicates. So we cannot go and use union. We have now to go and use union all.
We will have the
(4:22:23) exact same query. So we are selecting the employees first last name and
as well customers first last name. And now instead of using union, we're going to go
and use union all. So all what we have to do is that to go over here and say union all.
So now pay attention to this. As you can see in the union
(4:22:39) previously, we got eight records or eight persons from the output. So now
let's go and execute it and check the results. Now as you can see we got now 10
persons instead of eight. And that's because we have five customers and five
employees and we have duplicates inside the data. We have two duplicates. Now if
(4:22:56) you check, we have here Mary and as well over here we have Mary, and the
same goes for Kevin: we have Kevin over here and as well here. So we have
duplicates inside the data and SQL just combines the two tables. Okay. So now we're
going to understand how SQL execute union all in order to combine data. All right.
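The result just described (10 rows, with Kevin and Mary each appearing twice) can be checked with a short script. It is a sketch under the same assumptions as any stand-in example: Python's sqlite3 with an in-memory database instead of SQL Server, simplified table names, and placeholder last names except Brown. It also shows the data quality use mentioned earlier, where union all is used to make duplicates visible.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT);
    -- Placeholder last names, except Brown.
    INSERT INTO customers VALUES
        ('Joseph', 'Stone'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Mark', 'Wood'), ('Anna', 'Frost');
    INSERT INTO employees VALUES
        ('Frank', 'Lake'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Michael', 'Ford'), ('Carol', 'Nash');
""")

# UNION ALL appends the second result to the first without any duplicate check.
all_rows = conn.execute("""
    SELECT first_name, last_name FROM employees
    UNION ALL
    SELECT first_name, last_name FROM customers
""").fetchall()

# Data quality check: which rows occur more than once after combining?
duplicates = sorted({row for row in all_rows if all_rows.count(row) > 1})

print(len(all_rows))
print(duplicates)
```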
Again
(4:23:10) we have the two results from queries. We have the employees and
customers and SQL is going to do the same steps. First it's going to go and get the column
names from the first query and put them in the output. Then SQL is going to go and take all the
employees and put them in the output without checking anything. So that means
(4:23:25) if there is duplicates in the data, it's going to be presented as well in the
output. It's very simple. Now it's going to go to the second step and as well take all
the customers and append it into the output like this. So that's it. It's very fast. It's
going to go and just combine all the rows from the
(4:23:42) employees and all the rows from the customers. And with that, we're going
to get the 10 persons. And as you can see, we have duplicates in the data. So we
have Mary twice and Kevin as well twice. And that's why union all is the fastest. It
doesn't have any extra steps or checks. Just taking all rows from all
(4:23:56) queries and put it in the output. All right. So as you can see it's very simple,
right? So that's all for the union all. Okay. So what is except? Sometimes we call it
minus in other databases, but in SQL Server we call it except. So it's going to go and
return all distinct rows from the first query that are not found
(4:24:17) in the second query. So from this definition we can understand that the
order of the queries can affect the final result. There is a first query and a second
query. So it is the only set operator where you have to pay attention to the order of
the queries. And as well it's like the others: it's going to go and
(4:24:35) remove the duplicates from the result set. All right. Again we have this very
simple example. We have two sets, five customers, five employees and there is the
same persons as customers and as employees, Kevin and Mary. So now we're
going to go and combine those two sets using the except, or sometimes we call it
(4:24:50) minus. So it says it's going to return unique rows in the first table that are
not in the second table. So what is going to happen? What is the first table? Let's say
the customers on the left side. So here we have five persons: Joseph, Mark, Anna,
Kevin and Mary. So now the rule is we need the customers
(4:25:06) that are not employees. So it's safe for Joseph, Mark and Anna because
they do not exist in the second set. That's why SQL is going to return those three
values. But now for the two customers Kevin and Mary there is an issue. Kevin
and Mary are members of the second set, the second table, the
(4:25:24) employees. That's why SQL is going to go and exclude them from the output,
because they are not fulfilling the rule. So in the output we will get only three
customers, and all the values from employees and the common values between
customers and employees will be excluded from the output. So this is how the
(4:25:42) except works. All right. So let's have a very simple SQL task and it says find
the employees who are not customers at the same time. Okay. So let's see how
we're going to solve that. We're going to stay with the same queries as usual. We
have the employees and the customers but instead of having union all we're
(4:25:59) going to use the set operator except. So now since we are using except we
have to make sure that the order of the queries are correct. So the first query is the
employees which is correct because we have to find the employees who are not
customers at the same time. So we are focusing on the employees. The first
(4:26:15) table is correct and the second table is customers. If the task says find the
customers who are not employees at the same time then we have to go and switch
it. We have first to query the customers. So now everything is correct. Let's go and
execute it. And now in the output we see three employees who are
(4:26:32) not customers at the same time. So we have Carol, Frank and Michael. But
as we know we have five employees. Kevin and Mary are not here in the result
because they are customers as well. So now let me show you what can happen if I
just switch those two tables. So we start with customers and then with
(4:26:48) employees. Let's go and execute it. As you can see, we're going to get
completely different results. Now we are getting customer information. And now in
the output, we got three customers who are not employees at the same time. This is
not what we want from this task. So if you do it like this, it's going to
(4:27:03) be incorrect. So always pay attention here to the order of the queries. So
now let's go and correct it. So we're going to have first employees and then
customers. Let's execute it. And now let's go and understand how SQL execute the
except operator. All right. So again we have the results from the two queries
(4:27:19) or from two tables and now we are doing except between them. So let's
see how SQL is going to execute it. It's going to take as usual first the names from the first
query, from the employees, and put them in the output. And now SQL is going to present
data only from the first query in the output. And it's going to go and use the
(4:27:36) customers only as a check. So SQL will not put any data or rows from the
customers. It will just use the second query as a lookup in order to check the data.
So, it's going to start with the first employee, Frank. Do we have Frank in the
customers? Well, no, we don't have it. That's why it's going to
(4:27:54) accept it and put it in the output. And then in the next step, SQL is going to
go to the second employee and check. As you can see, we have it already in the
customers. So, SQL is going to go and ignore it. It's not allowed to be in the output. The
same thing for Mary. We have it as well in the customers. That's why
(4:28:10) it will not be presented in the output. Then Michael: we don't have a Michael
in customers. That's why it can be presented in the output. And as well for Carol, the
same thing. We don't have Carol as a customer and we're going to have it in the
output. So as you can see, we will get data only from the
(4:28:28) first table and the second table is only going to be used in order to check the
information from it. So we don't have any customers in the output, it's only
employees. So now let's check quickly what going to happen if we switch the tables.
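To see the effect of the query order side by side, here is a small sketch that runs except in both directions. As before, it relies on assumptions: Python's sqlite3 with an in-memory database instead of SQL Server, simplified table names, placeholder last names except Brown, and a hypothetical helper function except_names that exists only for this illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT);
    -- Placeholder last names, except Brown.
    INSERT INTO customers VALUES
        ('Joseph', 'Stone'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Mark', 'Wood'), ('Anna', 'Frost');
    INSERT INTO employees VALUES
        ('Frank', 'Lake'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Michael', 'Ford'), ('Carol', 'Nash');
""")


def except_names(first_table, second_table):
    """Return first names that are in first_table but not in second_table."""
    # The table names are fixed literals in this sketch, so formatting is safe here.
    query = f"""
        SELECT first_name, last_name FROM {first_table}
        EXCEPT
        SELECT first_name, last_name FROM {second_table}
        ORDER BY first_name
    """
    return [row[0] for row in conn.execute(query)]


# Rows come only from the first query; the second one is used as a lookup.
employees_only = except_names("employees", "customers")
customers_only = except_names("customers", "employees")

print(employees_only)
print(customers_only)
```

Switching the two arguments changes which table supplies the rows, which is why the two printed lists share no names.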
So now we have the customers as the first table. SQL is going to take the
(4:28:41) columns from the first table and it's going to start presenting the customer
information in the output, and it's going to go and use the employees only as a lookup.
So do we have Joseph? We don't have him in the employees. And then Kevin and Mary
we have already in the employees, and Mark and Anna are not part
(4:28:58) of the employees. That's why SQL can go and present the results in the output
like this. So now as you can see SQL is focusing on the table customers and we are
getting data from the customers, not from the employees. Employees is used only as a
check. So with that we understand the order of the queries is very important
(4:29:15) for the except. We will get different results if we have a different order. All
right. So that's all for the except operator. Okay. So what is intersect? Intersect is going
to go and return only the rows that are common in both queries. It's something very
similar to the inner join, and as well here it's going to go
(4:29:36) and remove duplicates. So there will be no duplicates in the output. All
right. Again we have this very simple example where we have five customers and
five employees, and now we're going to combine them using the intersect. So what
intersect does is go and return the common rows between two tables. So how is
(4:29:50) SQL going to execute it? It's very simple. SQL is going to go and search for
the common values. So what are the common values? It's Kevin and Mary, and SQL
is going to return only those two values, Kevin and Mary, and all others are going to be
excluded from the results. It's very simple, right? It's going to go and return only the
common values and
(4:30:10) this is how the intersect works in SQL. Okay, let's have this simple task and
it says find the employees who are also customers. So we're going to have the same
queries employees and customers but instead of having except we're going to go
and use intersect. Since we are finding the common information between
(4:30:27) the employees and customers, it's very simple and straightforward. Let's go
and execute it. And with that we're going to get Kevin and Mary. These are the two
persons that are at the same time employees and customers. And of course here we
don't have to pay attention to the order of the queries. It's going to
(4:30:42) be the same if we say find the customers who are also employees. So if
you go and just switch for example the customers with employees you will see that
we will get the exact same results. So it doesn't matter which query is first. Again, pay
attention to the first query, which defines the names. So now let's
(4:30:58) understand how SQL executes intersect behind the scenes. Okay, again
our two tables, and now we are doing intersect. So as usual SQL is going to go and
take the columns from the first query, and now it's going to go and find the
common data between those two results. So it's going to do it row by row. So we
have the employee Frank. Do
(4:31:15) we have it as a customer? No. So it will not be in the output. Kevin Brown,
we have it in the employees and as well as a customer over here. So that's why we
will get it in the output. The same thing for Mary. So we have Mary as employee and
as well as customer. So we're going to have it in the output.
(4:31:31) Michael and Carol, they are not customers. They are only employees.
That's why we will not get it in the output. The same thing goes for the customers.
Joseph, Mark and Anna we don't have in the output, because they are not employees.
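The row-by-row check for intersect can be tried out with the sketch below. It runs the intersect in both directions to show that the order does not change the rows. The assumptions are the usual ones for this stand-in: Python's sqlite3 with an in-memory database instead of SQL Server, simplified table names, and placeholder last names except Brown.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (first_name TEXT, last_name TEXT);
    CREATE TABLE employees (first_name TEXT, last_name TEXT);
    -- Placeholder last names, except Brown.
    INSERT INTO customers VALUES
        ('Joseph', 'Stone'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Mark', 'Wood'), ('Anna', 'Frost');
    INSERT INTO employees VALUES
        ('Frank', 'Lake'), ('Kevin', 'Brown'), ('Mary', 'Hill'),
        ('Michael', 'Ford'), ('Carol', 'Nash');
""")

# Employees who are also customers.
employees_first = conn.execute("""
    SELECT first_name, last_name FROM employees
    INTERSECT
    SELECT first_name, last_name FROM customers
    ORDER BY first_name
""").fetchall()

# Same operator with the queries switched: customers who are also employees.
customers_first = conn.execute("""
    SELECT first_name, last_name FROM customers
    INTERSECT
    SELECT first_name, last_name FROM employees
    ORDER BY first_name
""").fetchall()

print(employees_first)
print(customers_first)
```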
So with that we're going to get only the common information between the two tables
or two queries, and it
(4:31:48) doesn't matter whether we start with customers or with employees, we will
get at the end the same information. All right, so that's all. It's very simple, right? This is
how the intersect works in SQL. All right friends, so now we come to the part where
I'm going to show you how I usually use the set operators in my
(4:32:07) projects for data analysis or for data engineering. So here are the most
important use cases for the set operators. All right, the first use case is combining
similar tables before doing data analysis. In some scenarios, we want to generate a
report and we end up writing similar queries on top of similar tables and we go at the
end and
(4:32:27) join all the results from the queries in order to present the final report. And
now instead of doing that, what we can do first is combine all the similar
information into one table, and then on top of it we can do a query, a data analysis,
in order to generate a report, and we can do that using the
(4:32:44) union or union all. Let's have a few examples. So let's say that we have four
tables: employees, customers, suppliers and students. So as you can see all of them
are sharing the same information. They hold data about persons. So now let's say
that you are generating a report that requires all the individuals
(4:33:01) in the organization in the database. So what you're going to end up doing
is writing SQL query for the employees, another one for customers and as well for
the suppliers and the students. And then you're going to go and merge all the results
from those queries into the final report. Now the issue with this
(4:33:19) setup is that you are having a lot of queries, a lot of similar queries. So you
have it here four times. And now what might happen is that you go and change the
logic of the first two queries and you forget later to do it for the other two and you will
get really inconsistent data in the reports. So instead of that what we can do we can
(4:33:37) go and use the set operators in order to combine first all those tables in
one big table. So what we're going to do we're going to go and use a union in order
to combine those four tables into the table persons. So we're going to have it like
this. So we will get all the rows from the employees and put it
(4:33:54) in the persons all the rows from the customers from the suppliers and as
well from the students and put everything in one big table that holds all the
information about the individuals that we have inside our database. And now the
next step, after we combine the data, is that we write an SQL query in order to
(4:34:12) analyze this new big table, and the result is going to be presented in the
reports. And now of course the advantage here is that we have only one SQL query
for the data analysis on top of this table instead of having it four times. And now if
you go and change the logic of the SQL query, it's going to be applied
(4:34:30) automatically to all the data that we have in the database. And we have
already done this example where we combined the data between the
employees and customers. Another scenario where we have to combine data before
doing any reporting is that sometimes the database developers tend to divide
one
(4:34:45) big table into multiple small tables in order to optimize the performance.
For example, here splitting the orders by the year. We have orders 2022 2023. Now
again here if you want to generate a report in order to analyze the orders over the
years over the time either you're going to go and make a query for
(4:35:04) each of those tables or you're going to go and first combine all those tables
into one table called orders. So what we're going to do is use a union
between all those tables in order to generate one central table called orders. So
all the rows from the first table and all rows from the next table,
(4:35:21) the next one and the last one. So, we're going to put everything in one big
table and once we have the orders, we're going to go and write an analytical SQL query
on top of the orders in order to generate the report. So, as you can see, it's a very
important step in order to prepare the data before doing data
(4:35:40) analysis. Okay. So now let's have the following SQL task and it says the
orders are stored in separate tables. We have the orders and orders archive. Now
combine all orders data into one report without duplicates. Okay. So by looking to the
task we have to combine two tables orders and orders archive. So
(4:35:58) either union or union all. But since the task says without duplicates that
means we have to go with the union. But now before we combine any data we have
first to understand the content of the orders and the orders archive in order to map
the columns correctly. So first we have to go and explore the two tables. So
(4:36:15) let's start with selecting the data from orders, everything, semicolon, and as
well from the second table, Sales.OrdersArchive, and as well semicolon. So let's go
and execute it. So now in the output we get two results because we have two
separate queries. The first result is for the orders and the second one is for
(4:36:37) the orders archive. Let me just make it a little bit bigger. And now as you
can see we have almost identical tables. So as you can see we have the order ID,
product ID, customer ID. So everything looks like identical and of course we can go
and check that using the object explorer on the left side. So we have
(4:36:55) here the orders and those are the columns. And if you go to the orders
archive, you can see that we have the exact same columns. So that means we can
go and map all columns from orders with the all columns of orders archive. So let's
go and do that. So I'm just going to remove all semicolons and then we're
(4:37:14) going to go and use the union. So now we have everything in one query.
Let's go and execute it. Now we will get in the output one single result, one single
table with all information from orders and orders archive. So we have all orders now
in one table and everything currently is matching. So with that we
(4:37:33) have solved the task. We have one result with all orders. We don't have
any duplicates since we are using union, and we have combined the data. But now
we have one issue with that. This solution, this query, is quick and dirty and actually
it's not following the best practices. So now the best practice
(4:37:49) here is to list clearly all the columns in each query without using star. All
right. So now let's go and do that. Now we need a list of all columns from the table
orders and the table orders archive. And since we have a lot of columns, what we're
going to do, we go to object explorer, right click on the
(4:38:04) table name, and then let's go to Select Top 1000 Rows. So let's click on
that. And now we're going to get a very simple select statement where we have all
the column names from the table orders. This is what I usually do if I need all the
columns in my select statements. So let's go and copy it and
(4:38:21) go back to our query. Then let's go replace the first star with those
columns. And we're going to do the same thing as well for the orders archive since
they have the same names. So let's go and do that as well. So let me just make this
smaller in order to see the query. So now we have a select for the
(4:38:39) table orders with all columns and as well a select with all columns for the
table orders archive. So let's go and execute it. And of course now we're going to go
and get the same results. Now you might ask why we are doing this. Why didn't we
stick with the star? It's quick. It's simple. Well for the
(4:38:56) following reason. So now currently the status is that everything is matching.
We have 100% identical tables. But what happens over time is that we do
development in our solution and we might go and change the schema of the table
orders. So we might rename stuff, we might add new columns or maybe switch
(4:39:14) the columns. So this means the table orders will over time no longer be
identical with the archive. And this is of course a problem if you are
mapping the data blindly using the star. So now let me show you what I mean. Let's
say that in this table we are developing the orders and we just switch those two
(4:39:32) columns in the schema for some reason. So now we have the product ID
first and then the order ID. So let's go and execute it. Now if you are using star you
will not notice this. But if you are listing the columns in the script you're going to see
immediately that here we have first the order ID and then product ID.
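The column-order problem can be reproduced in isolation. The sketch below is an illustration under assumptions, not the course's script: it uses Python's sqlite3 with an in-memory database, two hypothetical two-column tables, and made-up ID values, and it models the schema change by creating the archive table with its columns in the opposite order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER);
    -- Hypothetical drift: same two columns, stored in the opposite order.
    CREATE TABLE orders_archive (product_id INTEGER, order_id INTEGER);
    INSERT INTO orders (order_id, product_id) VALUES (1, 101);
    INSERT INTO orders_archive (order_id, product_id) VALUES (2, 102);
""")

# Star maps columns by position, so the archive's product_id lands under order_id.
star_rows = conn.execute("""
    SELECT * FROM orders
    UNION
    SELECT * FROM orders_archive
    ORDER BY 1
""").fetchall()

# Listing the columns makes the mapping explicit and independent of table layout.
explicit_rows = conn.execute("""
    SELECT order_id, product_id FROM orders
    UNION
    SELECT order_id, product_id FROM orders_archive
    ORDER BY order_id
""").fetchall()

print(star_rows)
print(explicit_rows)
```

Neither query raises an error, which is the point: the star version runs fine and silently returns a product ID in the order ID column.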
(4:39:49) And here we have the opposite. So listing the columns is clearer than
using the star. And now as you can see in the output we have a
problem: here we have order ids and then suddenly we have something like the
product ID. So we're going to have incorrect data, which leads to incorrect
(4:40:08) analysis. So here the best practice is to not use the star and to clearly list
all the columns. Now one more technique that I usually use once I'm combining data
is that I add the source of the data inside the query. So what I mean with that: now
you can see that we have here two orders with the order ID one. They are not
(4:40:26) duplicates, they are completely different information, and that's because
they come from different tables. So what I usually do is go and add the source of each
record. It's really nice information for the analytics, for the users, to understand where
these records come from. So how are we going to do that? We're
(4:40:42) going to have, for example, as the first column the following word, let's say
orders, and we're going to call it, let's say, source table, and we're going to do the
same thing as well in the second query. Right? So the source table here is not the
orders, it's the orders archive. So I'm just adding a static
(4:41:02) column to my query in order to see the source of the table. So now we
have here two different values. And let's go and execute it. And now you see we
have created a new column called source table where it has only two values. We
have the orders and the orders archive. Let's go and sort the data by the order ID.
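The static source column can be sketched as follows. The assumptions: Python's sqlite3 with an in-memory database instead of SQL Server, simplified tables with only two columns, the column alias source_table, and made-up ID values. The sort by order ID discussed in the transcript is included so the two rows with order ID 1 sit next to each other.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the course's SQL Server database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER);
    CREATE TABLE orders_archive (order_id INTEGER, product_id INTEGER);
    -- Made-up values: order_id 1 exists in both tables but is a different order.
    INSERT INTO orders VALUES (1, 101), (2, 102);
    INSERT INTO orders_archive VALUES (1, 205);
""")

# A literal in each SELECT becomes a static column that records the row's origin.
rows = conn.execute("""
    SELECT 'orders' AS source_table, order_id, product_id FROM orders
    UNION
    SELECT 'orders_archive' AS source_table, order_id, product_id FROM orders_archive
    ORDER BY order_id, source_table
""").fetchall()

for source_table, order_id, product_id in rows:
    print(source_table, order_id, product_id)
```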
So
(4:41:19) order by order ID. So let's go and execute it. And now you can see it very
clearly. The first order order ID one comes from the table orders and the second one
comes from the orders archive. So this is really nice information that you can add to
your data once you are combining multiple tables. So that's all about this use
(4:41:38) case on how to combine data between different tables. All right. Now we
have another use case for the set operators. It's more for data engineers. We can
use the except in order to find the delta between two batches of data. For example,
data engineers build data pipelines in order to load daily new
(4:41:59) data from the source systems to a data warehouse or a data lake. Now, in
those data pipelines, we have to build a logic in order to identify what are the new
data that is generated from the source system in order to insert it in the data
warehouse. One way to do it is to use the set operator except in order to
(4:42:17) compare the current data with the previous load. Let's have a very simple
example. So in the day number one we have two customers one and two. So what
going to happen in this day we're going to go and load those two customers into the
data warehouse. So in the data warehouse we will get as well one and
(4:42:33) two. So this is for the first day nothing is crazy. We just load the data as it
is. Now for the second day we will get the new data from the source system and it's
going to look like this. So now if you check the second day you can see that we have
again the customer number one that we have already loaded to the data
(4:42:47) warehouse. So we have it from the previous day, but we have a new
customer ID, number three. So now in order to load only the new data we don't need
to load the customer number one again. What can we do? We can do an except
between the day number two and the previous load, the day number one. So
now if we simply do an
(4:43:07) except between those two sets we're going to go and identify the new data
that exists in the source system, which is only the record number three. So now
what is going to happen if we do except between day two and day one? We will get one
record, the new record, that we're going to go and insert inside
(4:43:24) our data warehouse. So as you can see this set operator except is very
powerful in order to compare two sets and not only for data analysis we can use it as
you can see for data engineering in order to identify what is the new data that is
generated from the sources in order to insert it inside our data warehouse.
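The delta detection described above can be sketched with three small tables. Everything here is an assumption made for illustration: Python's sqlite3 with an in-memory database, the hypothetical table names source_day1, source_day2 and warehouse, and a day-two batch containing customers 1 and 3 as described in the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the source system and the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_day1 (customer_id INTEGER);
    CREATE TABLE source_day2 (customer_id INTEGER);
    CREATE TABLE warehouse (customer_id INTEGER);
    INSERT INTO source_day1 VALUES (1), (2);
    INSERT INTO source_day2 VALUES (1), (3);
    -- Day one: there is no previous load, so everything is loaded as it is.
    INSERT INTO warehouse SELECT customer_id FROM source_day1;
""")

# Day two: the current batch EXCEPT the previous load leaves only the new records.
delta_query = """
    SELECT customer_id FROM source_day2
    EXCEPT
    SELECT customer_id FROM source_day1
"""
new_rows = conn.execute(delta_query).fetchall()

# Insert only the delta into the warehouse.
conn.execute("INSERT INTO warehouse " + delta_query)
warehouse_ids = [
    row[0]
    for row in conn.execute("SELECT customer_id FROM warehouse ORDER BY customer_id")
]

print(new_rows)
print(warehouse_ids)
```

Customer 1 is not loaded a second time, because it already exists in the previous load and is removed by the except.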
(4:43:43) Okay, one more use case for the set operators that I personally use a lot in
my projects is that if you are doing data migrations, you can use the except in order to
check the data quality and more specifically we can use it in order to check the data
completeness. Okay, so we have the following scenario where we are
(4:44:03) doing data migrations between two databases. So let's say that we would
like to move this table from database A to database B. So we're going to go and load
the table to the new database. And now what is very important after you move the
data is to check whether all the records did move from database A
(4:44:18) to database B, that we are not missing anything, not even one record. So we want to
do a data completeness test, and there are many methods on how to do this test. One
of them is to use the set operator except. So how are we going to do it? We're going to
do an except between the table from database A and the table from
(4:44:36) database B in order to find any record that is still in database A which is not
migrated to the database B. And of course the best result is that we will not get
anything. The result should be empty. If we get an empty that means all the rows
from database A exists in the database B. And now of course we are not
(4:44:55) done yet. We want to do the comparison the other way around as well. We want to
find any rows that are in database B that we don't find in database A. Those two
tables must be identical. So now what we're going to do, we're going to do an except
but the first table going to be from the database B. And then we're
(4:45:12) going to compare it with the database A. And we have the same
expectation. The output should be as well empty. And now after doing the except
twice for both sides and we are getting empty in the results. That means those two
tables are identical and we are not missing anything. So this is another amazing use
(4:45:29) case for the set operators in order to improve the quality of your data
migrations and in order to do data completeness test. Okay. So now let's have a
quick summary about the set operators. So the set operator is going to go and
combine the rows of multiple queries, multiple tables into one single result. And we
(4:45:49) have four different types of set operators. The first one is the union
where it's going to go and combine all the rows but without including any duplicates.
The second one is the union all. It's very similar, but it keeps the duplicates. And the third one is the
except: it's going to show all the rows from the first
(4:46:05) query that cannot be found in the second query. And the fourth one we
have the intersect where it's going to show the common rows between two queries.
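The four operators summarized above can be compared side by side in a small sketch, assuming SQL Server syntax. The two inline sets a and b are made up for illustration, and the row order of each result is not guaranteed.

```sql
-- Hypothetical sets: a = {1, 2, 2, 3} and b = {2, 3, 4}

SELECT n FROM (VALUES (1), (2), (2), (3)) AS a(n)
UNION
SELECT n FROM (VALUES (2), (3), (4)) AS b(n);
-- All rows without duplicates: 1, 2, 3, 4

SELECT n FROM (VALUES (1), (2), (2), (3)) AS a(n)
UNION ALL
SELECT n FROM (VALUES (2), (3), (4)) AS b(n);
-- All rows including duplicates: 7 rows in total

SELECT n FROM (VALUES (1), (2), (2), (3)) AS a(n)
EXCEPT
SELECT n FROM (VALUES (2), (3), (4)) AS b(n);
-- Rows of the first query that are not in the second: 1

SELECT n FROM (VALUES (1), (2), (2), (3)) AS a(n)
INTERSECT
SELECT n FROM (VALUES (2), (3), (4)) AS b(n);
-- Rows common to both queries: 2, 3
```

For the migration completeness check described earlier, the EXCEPT query would be run twice, once in each direction (a EXCEPT b, then b EXCEPT a), and both results are expected to be empty. With the sample sets here they are not empty (1 and 4 respectively), which is how a mismatch would show up.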
And of course we have SQL rules in order to use the set operators. Both of the
queries should have the same number of columns, the same data types and the same
order of
(4:46:21) columns. And the last rule, don't forget that the first query controls the
aliases, the name of the columns and the data types of the entire result. And we
have found amazing use cases for the set operators. Like for example, using union
and union all in order to combine similar information into one big table.
(4:46:39) Or we can go and use the amazing except operator in order to compare
two different results in order to find the differences between them. And I usually use it
in order to do data quality checks to test the data completeness. And another use
case as a data engineer you can go and implement the except in
(4:46:56) your logic in your data pipelines in order to identify what are the new data
that must be inserted in your system. Okay my friends. So with that we have learned
all the set operators that we have inside SQL. And with that you have learned how to
combine your data from multiple tables using SQL. So we are
(4:47:12) done with this chapter. Now we're going to go to the right side. So now
we're going to start talking about the functions in SQL. And here we have two big
families. The first one is the row-level or single-value functions. And the second
one is the aggregate and analytical functions. So let's start
(4:47:27) with the first one, the row-level functions. And here we can group them into
multiple categories. And we will start now with the string functions. But first let's
understand what exactly functions are and why we need them in SQL. So let's go.
Okay. So what exactly is a function and why do we need it? Now again we have
(4:47:48) our data inside the table. Now there is like a lot of stuff that you can do with
your data. So sometimes you have to change the values of your data like doing data
manipulation or you want to do some aggregations and analyses. So maybe you
want to analyze your data and find insights and maybe build reports
(4:48:03) and sometimes you might find bad data inside your tables and you want to
clean that up. So you want to do data cleansing and sometimes you have to do data
transformations and data manipulation on our data in order to solve some SQL tasks
and in SQL in order to solve those tasks we have functions. So again what is exactly
a function? It
(4:48:22) is a built-in code block that accepts an input value. Then the function is going
to go and process this value and return a result, an output value. So you
give an input value do some transformations and give an output. And we can group
the functions into two big categories. The first one we call it
(4:48:40) single row functions. So you give the function only one value and in
return you will get one value as well. So the input for the function is going to be only
one single value like Maria, and the output of the function is going to be a single
value as well. So one value in, one value out. And now the other
(4:48:56) category of functions we call multirow functions. So for example, if you
have the function sum, this function accepts multiple rows, multiple values. Let's say it gets
30, 10, 20, 40. The function is then going to go and add up all those rows and return
only one value in the output. The sum of all those
(4:49:14) values is going to be 100. So the input is multiple rows and the output is one
single value. So those are the two main categories of functions in SQL. Now my
friends you have to understand something about the functions that you can go and
nest functions together. So you can use multiple functions together
(4:49:34) in order to manipulate one value. And this technique is not only in SQL but in
any programming language. So let's have this example. We have the function left.
It's going to go and extract a few characters, let's say two characters. So the input
for this function, let's say it's Maria. This value is going to enter
(4:49:49) the function. The function is going to go and extract the first two characters.
And in the output we will get only two characters, Ma. So this is one function. We
have an input and an output. Now you might say, you know what, we have multiple steps
on this value. So in the first step we want to extract the first two
(4:50:04) characters using the left function. But we have a second step. So we want
to transform this output into lowercase characters. So we have another function,
lower, and the input for this second function will be the output of the first function. So
Ma is at the same time an output and the input for another function.
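The two steps described so far can be sketched as a single query, assuming SQL Server syntax. The column aliases step_1 and step_2 are made up for illustration.

```sql
SELECT
    LEFT('Maria', 2)        AS step_1,  -- first two characters: 'Ma'
    LOWER(LEFT('Maria', 2)) AS step_2;  -- output of LEFT fed into LOWER: 'ma'
```

The inner call is evaluated first, so LOWER never sees 'Maria', only the two characters that LEFT returns.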
(4:50:20) So the lower function is going to take this value and convert it into lowercase
characters. So it's like inside a factory: the materials are going to be processed at
multiple stations, and the output of one station is going to be the input for the next
station. And this is exactly what we can do with the functions. So now how we going
to build
(4:50:36) that? The first step is to start with the first function. So this is simple one
function. Now for the next step what you're going to do on the left side you're going
to write lower and put the whole thing in parenthesis. So now the whole thing the first
function going to be inside another function and with that
(4:50:51) you have nested one function in another. And of course if you need a third
function, like for example the length, what you're going to do is put the
whole thing again between two parentheses. So now that means the output of the left
is going to go to the lower and the output of the lower is going
(4:51:05) to go to the length. So it is very simple, and the order of execution for
this always starts with the innermost function. So the left function is going to be executed
first, then the outer function, the lower, and the last function that's going to be
executed is the length. This is how nested functions
(4:51:20) work in SQL or in any programming language. Now my friends in SQL we
have a lot of functions that's why we have to group them as well into subcategories.
Like if you are talking about the single row functions, we have functions for the string
values and as well for the numeric, the date and time and as well
(4:51:40) functions in order to handle the nulls. And if you are talking about the
multirow functions, here we have basically two groups. The first one is the simple
aggregate functions. Those are the basics in order to aggregate your data. And we
have another advanced one. We call it the window functions or sometime we call it
analytical
(4:51:57) functions. Now my friends, it
is very important to understand those functions, because using them you can do
whatever you want with your data. And if I'm looking at those two groups, the single
row functions are functions used to manipulate and prepare
the data for the
(4:52:13) second group. So if you are thinking about data engineers and data
analysts, the data engineers are going to go and prepare the data in SQL using the single
row functions. So you're going to use them in order to clean up, transform and
manipulate your data in order to prepare it for analysis. And if you are a data
(4:52:29) analyst, you will be mostly using the aggregate functions in almost every
task. So I really see it like this. The single row functions for data engineers and
multirow functions for data analysts. And my friends, what we're going to do in this
course, we're going to visit each of those subgroups one by
(4:52:44) one, exploring the functions, understanding how they work and when we're
going to use them. So let's start with the first group, the string functions. And here
we're going to learn how to manipulate the string values. So let's go. Okay. So now
since we have a lot of string functions, I'm going to go and
(4:53:04) divide them into categories based on the purpose. So for example, we
have a group of functions that's going to go and manipulate the string values. So we
have concatenation, upper, lower, replace, and so on. And another group where we
have only one function. It is where we can do calculations on the string
(4:53:19) values. And the last group, it is all about how to extract something from a
string value. And here we have three functions left, right, substring. So now let's go
and start with the first group about the data manipulation. And the first function we
have here concat. All right. So what is exactly concat or concatenation? It's going to
(4:53:39) go and combine multiple string values into one value. So if you have
multiple things you can put everything in one value. So let's have a very simple
example. Okay. So now let's say that you have one value called Michael. So here
you have a first name and you have totally separated value for the last
(4:53:54) name another column where you have a value like Scott. And now you say
you know what it makes no sense to have the first name separated from the last
name. I would like to go and combine them in one value. So you can go and use the
concat in order to combine those two values or multiple values into one
(4:54:09) single value like Michael Scott. I think that pretty much sums it up. So it is
nicer to see the full name in one value instead of having like two columns for that. So
that's it. This is why we need concatenation. Now let's go back to SQL in order
to try that out. Okay. So now we have the following task.
(4:54:27) Show a list of customers first names together with their country in one
column. So that means we have to make a list of customers and we have to combine
two columns in one. So let's start writing the query. Select. We need the first name,
the country from the table customers. So first let's go and execute
(4:54:44) this. Now as you can see we have a list of customers, but the issue here is that the
first name and the country, those two pieces of information, are in different columns, but the
task says they should be in one column. So now in order to combine those two
things we have to use the concatenate function, concat. So I'm
(4:54:59) going to start with the first argument. It's going to be the first name and
then the country like this. And we're going to give it a name. Let's call it like this
name country. Now let's go ahead and execute it. Now in the output you can see we
have a new column. It's called name country and we have both of
(4:55:15) the pieces of information in one column. So we have MariaGermany, JohnUSA. But
it doesn't really look good because there's no spacing between them. Now we
can go and make some separation between them by just adding one more thing in
between like for example maybe a space. So now we are concatenating the first
name
(4:55:33) together with a space this over here and then the country. So let's go and
execute it. Now as you can see we have nice separations between the first name
and the country. And of course you can go and add different separators like maybe
a minus or an underscore and you will get the same effect. So with that
(4:55:48) we have a list of customers where we have the first name together with the
country in one column. As you can see it's very simple. This is how you combine two
columns in one. It is really nice and easy transformation. Okay. So that's all about the
concatenation in SQL. Next we're going to talk about
(4:56:03) two functions. The upper and the lower. Okay. So what is upper function?
It's going to go and convert all the characters of a string to uppercase. It's going
to make everything capitalized. And the lower function is exactly the opposite. It's
going to go and convert everything to a lower case. So let's have very simple
example for
(4:56:24) those two functions. Okay. So now we have like three values with different
cases. The first one where you have only the first character capitalized and the rest
is lowered and then the same value but everything is lowered and a third one where
you have everything with an uppercase. Now if you go and apply the
(4:56:39) function upper to those three values, what's going to happen? For the first
value, SQL is going to go and turn it into uppercase. So everything is going to be capitalized,
not only the first character. And now for the second value, it's going to turn it as well to
completely capitalized. So all the characters are going to change. And for the last value it
is
(4:56:56) already capitalized. So in the output you will get the same value. So
actually nothing going to happen for that. So this is simply the uppercase. Now let's
see what can happen if you use the lower case. For the first value only the first
character going to be changed and then you will have everything in lower case.
(4:57:10) The second value is already a lowercase value. So if you apply lower,
nothing is going to happen. You will get the same value. But for the last one everything
here is capitalized, and if you apply lower all the characters are going to convert to
lowercase. So my friends this is very simple. Let's go
(4:57:25) back to SQL in order to practice that. Okay. So we have the following
task and it says transform the customer's first name to lowercase. So now as you can
see in the first names here, the first character is a capital and the rest is lowercase. So now in
this task we have to convert the whole thing into
(4:57:41) lower case. So let's go and do that. It's very simple. We're going to say
lower first name and let's go and call it low name. So that's it. Let's go and execute it.
Now if you go and compare the low name with the first name, you can see all the
characters now in the lower case. So that's it for the task.
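The manipulation functions covered so far (concat, lower and upper) can be sketched together as follows, assuming SQL Server syntax. The inline customers table, its two rows and the column names first_name and country are assumptions for illustration; the course database may differ.

```sql
-- Hypothetical customers table built inline so the query is self-contained.
WITH customers AS (
    SELECT * FROM (VALUES
        ('Maria', 'Germany'),
        ('John',  'USA')
    ) AS c(first_name, country)
)
SELECT
    first_name,
    country,
    CONCAT(first_name, ' ', country) AS name_country, -- 'Maria Germany', 'John USA'
    LOWER(first_name)                AS low_name,     -- 'maria', 'john'
    UPPER(first_name)                AS up_name       -- 'MARIA', 'JOHN'
FROM customers;
```

The middle argument of CONCAT is the separator; swapping the space for '-' or '_' changes only the character between the two values.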
(4:57:59) We have transformed the first name to lower case. All right. The next task
is exactly the opposite. Transform the customer's first name to uppercase. So let's go
and have a new column. We're going to say upper then the first name as up name.
So that's it. It's very simple. Let's go and execute. Now
(4:58:19) you can see in the output we have a new column called up name and
inside it we have the first name but now all the characters in upper case. So this is
how you convert the case to lower or to upper in SQL. Okay. So that's all about the
upper and the lower. Next we're going to talk about very interesting
(4:58:35) function. It is the trim. So the trim function going to go and remove the
leading and trailing spaces in your string values. So it's going to go and get rid of the
empty spaces at the start and at the end of a string value. Let's have very simple
example. Okay. So now we're going to have different scenarios. The first one
(4:58:55) you can have like a value John where you don't have any spaces and this is
the normal case. But sometimes you might have it like this where at the start you
have a leading space. You have an empty space or sometimes we call it white
space. In another scenario the space might be at the end of the word. So here
(4:59:10) we call it trailing space and in another scenario you might have both of
them. This is really bad. where at the start you have the leading space and at the
end you have the trailing space. And of course you might not have only one space,
you might have multiple spaces, depending on how long the user pressed
(4:59:25) the space, right? So of course my friends, spaces are really evil and it
makes no sense to have them in your data. Now what you have to do is data
cleansing. We have to clean up this mess, and you have the best function in order to
clean up the data. You have the trim. So if you apply trim to the first
(4:59:40) value, nothing going to happen because everything is clean and we don't
have any spaces. Now if you apply it for the second case where you have a leading
space if you do that SQL going to go and remove this space. The same thing for the
trailing space. So if you have space at the end the trim function going to
(4:59:55) find it and clean that up. And if you have it at the start and at the end then
it's as well no problem. It's going to go and clean that up. And as well the trim
function can go and clean multiple spaces. So if you have like five spaces 10 spaces
at the end or at the start the trim function going to go and clean that
(5:00:11) up. So this is how the trim works. And now let's go back to SQL in
order to find out whether we have any spaces. Okay. So now we have a very tricky
and interesting task. It says find the customers whose first name contains leading or
trailing spaces. So now by looking to those values we have to find
(5:00:29) any spaces inside the customer's name. Now by just looking to this results
you will not find any white spaces because it's really hard to see especially if it is like
trailing spaces. Now we have to write a query in order to detect any spaces in the names.
So how we can do that? Okay. So now think about it a little bit
(5:00:48) and I can give you a hint. You can use the function trim in order to remove
any white spaces and you have to use it inside a where clause. So what we're going
to do, we're going to say where. So now we have to build a condition to detect any
spaces. So we are saying: the first name is not equal to itself, the
(5:01:04) first name, after applying a trim. So after trimming the first name, if it is not
equal to the first name, that means there were spaces. So again what is going on
here? Let's go for Maria. If Maria has no spaces and you trim this value, nothing is going to
happen. The value is going to stay exactly like before
(5:01:26) because there are no white spaces. But if there is any space in Maria,
the trimmed value will not be equal to the first name. So
if the column is not equal to the same column after trimming it, that means there are
spaces. So let's go and execute it. And now we can see in the
(5:01:45) output we have one customer John where we have this situation. Now if
you don't believe me or you don't follow me here we can have another easier check.
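The check just described can be sketched like this, assuming SQL Server 2017 or later (where TRIM is available). The inline customers table and its values are hypothetical; ' John' carries one leading space on purpose.

```sql
-- Hypothetical customers; ' John' starts with a space.
WITH customers AS (
    SELECT * FROM (VALUES ('Maria'), (' John'), ('Peter')) AS c(first_name)
)
SELECT first_name
FROM customers
WHERE first_name != TRIM(first_name);
-- Returns only ' John', because trimming changes its value.
```

One caveat that is worth verifying in your own environment: SQL Server ignores trailing spaces when it compares strings with = or !=, so on that system this comparison on its own flags leading spaces only. A value such as 'John ' would need a different test, for example comparing DATALENGTH before and after trimming.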
So let's go and comment this out and let's have a look to our first names. Now we
can go and calculate the length of the first name like we have done before. So
(5:02:02) length name and let's go and execute it. Now if you can see here Maria we
have five characters but John we have here four characters but the length is five and
that's because we have somewhere space and the space going to count as a
character. So here there is like something wrong right and you can check
(5:02:20) the others as well everything is matching but only John we have here an
issue and now in order to see this more clearly we're going to use two functions the
trim and the length. So first let's go and trim the first name. And after trimming the
values, I'm going to calculate the length. So we are nesting together the trim and the
(5:02:38) length. And I'm going to call it length. Trim name. So let's go and execute it.
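The length comparison can be sketched the same way, again assuming SQL Server 2017 or later and the same hypothetical inline customers table; the aliases len_name and len_trim_name are illustrative.

```sql
-- Hypothetical customers; ' John' starts with a space.
WITH customers AS (
    SELECT * FROM (VALUES ('Maria'), (' John'), ('Peter')) AS c(first_name)
)
SELECT
    first_name,
    LEN(first_name)       AS len_name,      -- 5 for 'Maria', 5 for ' John'
    LEN(TRIM(first_name)) AS len_trim_name  -- 5 for 'Maria', 4 for ' John'
FROM customers;
```

Note that LEN in SQL Server does not count trailing spaces, so a difference between the two columns points to leading spaces. DATALENGTH, which returns the size in bytes, does include trailing spaces.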
Now we can see the length before trimming any value. And we can see the length
after trimming the values. So you can see over here that John before trimming is five
and after trimming is four. So we have an issue here. Now we
(5:02:59) can make things clearer: we can go and subtract the length of
the trimmed first name from the length of the first name. So here
we can call it maybe a flag or something. So let's go and execute it. Now by looking
to the flag it is really easy to now to see if we have a zero then
(5:03:17) everything is fine. We don't have any white spaces. But if we have higher
than zero like here one then this is an indicator that we have a white space. Either
you do it like this where the first name is not equal the first name after trimming or
you use more complicated solution where you say where and I'm going to remove
this from here
(5:03:36) the length of the first name is not equal to the length after trimming so not
equal. So if you go and execute it you will get exactly John again. So this is how we
detect any empty spaces inside our data using the trim function or maybe as well
using the length but I really prefer the first solution it is
(5:03:55) way easier using one function. All right, so that's all about how to remove
the empty spaces using the trim. Next, we're going to talk about very important
function called replace. Now the replace function going to go and replace a specific
character. So that means we have something old and we want to replace it with
something
(5:04:15) new. Let's have a very simple example to understand it. All right. So now
imagine we have a phone number where the digits are separated by dashes. Now let's say
that I don't like to have the dash in my data. I would like to have a slash or any other
special character. Now in order to replace the dash, we can use the
(5:04:29) function replace. So we have to specify two things for SQL: the old value,
the dash, and the new value, the slash. So if you do that, in the output it's going to go
and remove all those dashes between the numbers and the replacement is going to be
the slash between them. So it's very simple, right? All what you are doing is
(5:04:47) replacing an old value with a new value and that's why we call it replace.
But we can use this function as well in order to remove something not only we
replace and you can do that by not specifying anything in the new value like just the
single quotes and with that it's going to be nothing a blank.
(5:05:03) So now what's going to happen: SQL is going to go and replace the dash
with a blank, and that means I'm just removing the dashes from the output. So if you
do it you will remove the dashes and you will get only numbers. So if the replacement
is a blank, that means this function will be removing any value
(5:05:19) that you specify. So this is exactly how it works and this is why we use the
replace function in SQL. Now let's go back in order to practice. So let's do the same
example. This time we're going to go and select from a static value. So we're going
to get 123-456-7890. So if you go and execute it, you can see
(5:05:37) we are getting the phone number. Now let's go and remove the dashes
from this value. So let's have a new line and we start with replace. The first thing that
you have to specify for SQL the value itself. So let's go and get the value. This is the
first argument. The second argument going to be the old value. So
(5:05:53) the old value going to be the dash. And now the third argument will be the
replacement. And since we want to remove it, we don't want to replace it with
anything. We will have just single quotes and nothing between them. So there's no
space between those single quotes. Now we can go and rename stuff
(5:06:09) like this is the phone. And this is a clean phone. Let's go and execute it.
Now, as you can see in the output of the function, we don't have any dashes
between the numbers. And you can go and test stuff. Like for example, I can go and
add a slash and execute it. You will see slashes between them. So you can go
(5:06:27) and try multiple stuff. So this is one nice use case for the replace function.
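A minimal sketch of this phone-number example, assuming SQL Server syntax. The literal '123-456-7890' and the aliases are assumptions chosen to match the description, not necessarily the exact value used on screen.

```sql
SELECT
    '123-456-7890'                    AS phone,
    REPLACE('123-456-7890', '-', '')  AS clean_phone,  -- dashes removed: '1234567890'
    REPLACE('123-456-7890', '-', '/') AS slash_phone;  -- dashes swapped: '123/456/7890'
```

The three arguments are the value, the old text to look for and the replacement. Two single quotes with nothing between them mean an empty replacement, which is what turns a replace into a removal.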
Now there is another use case for the replace function: sometimes in my data,
file names are going to be stored, like for example, let's say report.txt. And now let's say
that I would like to change the file format from .txt to .csv.
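As a rough sketch of this file-extension change, assuming SQL Server syntax and a hypothetical file name 'report.txt':

```sql
SELECT
    'report.txt'                          AS old_filename,
    REPLACE('report.txt', '.txt', '.csv') AS new_filename;  -- 'report.csv'
```

The dot is included in both the old and the new value here as a small precaution, because REPLACE changes every occurrence of the old text, not only the one at the end of the name.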
(5:06:44) Now how we're going to do that we're going to go with a new line say
replace and then the first argument going to be the value. So let's take our value
from here and now what is the old value it's going to be the txt and I want to replace
it with another format with another extension. So it's going to be
(5:07:03) the CSV. So we're going to say this is the new file name and this is the old
file name. So let's go and execute it. And now as you can see in the output SQL did
replace the txt with csv. This is as well where I use the replace function in my
projects. So my friends the replace function is really fun and those
(5:07:22) are two nice use cases for the replace. All right. So that's all about the
replace function in SQL and with that we have covered all the data manipulations. Now
in the next group we're going to talk about the calculations. And here we have only
one function the length. Now the length function it's very simple. It's going to go and
count
(5:07:42) how many characters you have in one value. So you are calculating the
length of a value. Let's have very simple example to understand it. Okay. So now
let's say that we have the value Maria. If you apply the length function for that what's
going to happen? It's going to go and start counting how many
(5:07:56) characters we have inside this value. So the M is 1, the A is 2, then 3, 4, 5. In the output
you will get the number five. So five is the length or the total number of characters in
this value. Now let's say that you have a number like 350. If you go and apply the
length function, SQL is going to go and count how many digits we
(5:08:14) have. The 3 is one, the 5 is two, the 0 is three. So the total length for that is going to be three. So
you can apply it even to numbers, and not only that, you can go and apply it to a
date value. So let's say that you have the following date, 2026-01-23. So SQL is going
to go and count each digit, each character, even the dashes, not only
(5:08:32) the numbers. A dash is a character as well, right? So the total length of this
date is going to be 10. So you can pass any data type to the length function and in
the output you will always get a number. That's it. This is how you can count the
number of characters in any value. Let's go back to SQL in
(5:08:47) order to practice that. Okay. So now we have the task: calculate the length
of each customer's first name. So it is very simple. We're going to go and apply the
function length len to the column first name and we're going to call it length name. So
let's go and execute it. And with that as you can see we are
(5:09:06) getting in the output numbers and these numbers are the number of
characters of each name of our customers. So this is how we calculate the length
and that's it for this group. Now moving on to the next one. It's going to be very
interesting. Now we're going to talk about how to extract something from a
(5:09:20) string value. And here we're going to cover now two functions the left and
the right. Now the left function is going to go and extract a specific number of characters
from the start of a string value. So if you want to get a few characters at the beginning
of a value, you can use the left. But now the right
(5:09:40) function is exactly the opposite. It's going to go and extract specific number
of characters from the end of string value. So if you want few characters from the
end of your value, you can use right. Now in order to apply the left or the right
function, you have to give SQL two things. The value where you want to
(5:09:57) extract a part from it and the number of characters, how many characters
you want to extract and this is the same for the left and the right. Now let's say that
we have again this value Maria. And now if the task says I would like to extract the
first two characters, since we are talking about the starting position,
(5:10:12) we're going to use the left function. And since it says two characters, we're
going to go with the two. So it's going to start counting M is 1, A is two and after that
it's going to stop and make a cut and it's going to go and return the two characters M
A. So we are counting from the left side going to the right
(5:10:27) side. Right now if your task says extract the last two characters here we
are talking about the end position of your value and for that we're going to use the
right function since we are approaching from the right side and since we want only
two characters the number of characters going to be two. So
(5:10:43) this time going to start counting from the right side moving to the left side.
So A is one, I is two and that's it. Then SQL going to stop and extract only those two
characters. I A. So if you want to extract data at the starting position, you use the left.
But if you want to extract characters from the end
(5:10:59) position of your value, then you use the right function. Now let's go back to
SQL in order to practice. Okay. So now we have the following task. Retrieve the
first two characters of each first name. So we just need the first two characters.
Since we are coming from the left side, we can go and use the
(5:11:15) function left. So it's very simple. First name and we need only two
characters. So two. So we're going to call it first 2 char. Let's go ahead and
execute it. And now you can see in the output we have two characters, Ma. Now with
John we have only J because we have a leading space. Well, you can
(5:11:32) leave it like this or you can transform it. And then for George we have Ge and
so on. So with that we are getting the first two characters. Now in order to fix it for
John, what we're going to do is say trim first and then apply the left. So
with that we are getting rid of all white spaces and then we
(5:11:48) apply the left. So with that everything looks perfect. So for John we have Jo.
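The left and right extraction described above can be sketched as follows, assuming SQL Server 2017 or later. The inline customers table, its values and the aliases are hypothetical; ' John' again has one leading space.

```sql
-- Hypothetical customers; ' John' starts with a space.
WITH customers AS (
    SELECT * FROM (VALUES ('Maria'), (' John'), ('Peter')) AS c(first_name)
)
SELECT
    first_name,
    LEFT(first_name, 2)       AS first_2_raw,   -- 'Ma', ' J', 'Pe' (the space counts)
    LEFT(TRIM(first_name), 2) AS first_2_char,  -- 'Ma', 'Jo', 'Pe'
    RIGHT(first_name, 2)      AS last_2_char    -- 'ia', 'hn', 'er'
FROM customers;
```

Trimming before extracting is the nested-function pattern from earlier: TRIM runs first and LEFT works on its cleaned output.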
So this is how we can get the first two characters of a column. Now let's move to the
next one. The task says retrieve the last two characters of each first name. So this
time we need the last two. So we are coming from the right side. So
(5:12:05) we're going to do it like this. We're going to say right, first name, and then
two as well. So last 2 char. Let's go and execute it. And now as you can see in
the output we have a new column where we have the last two characters from the first
name. So we have here ia, er, and for John it's working as well, and that's
(5:12:25) because we don't have any trailing spaces but if you have any trailing
spaces then go and use the trim function. All right, so that's all for the left and right,
and now we're going to go to the last function. We have the substring. So the
substring going to go and extract a part of a string at a specified position. So this
time we
(5:12:45) don't want something from the beginning or the end. We want something
like in the middle. So we want to specify the starting position and we want to extract
few characters from there. So let's have very simple example to understand it. Now
in order to use the substring you need three things. The first one is the
(5:13:00) value itself where you want to extract a specific part from it and then you
have to specify the starting position where SQL going to start extracting the
characters that you want and as well SQL needs the length, how many characters we
have to extract. So now let's say that we have the following task after the
(5:13:17) second character extract two characters. So from reading this you can see
we specified the starting position this is the second character and the length going to
be the two characters. So let's have this example. Well, if you have Maria, so now
we have to specify the starting position. Now we are saying
(5:13:33) after the second character. So the first character m is one. Then a is two.
After two, we got the position number three, right? So starting from R. So that means
we have to specify for SQL three because the starting position going to be number
three. This is after the two. Now we want only two characters. So we want the
(5:13:51) R and the I. If you give this to SQL Maria starting position three and the
length two, SQL can go and extract the two characters the R I. And this is exactly
what we want. We want two characters after the second position, the second
character. So with that, we didn't extract something from the left or from the right. We
extracted at
(5:14:10) specific position. And this is exactly why we need the substring. Now let's
make it a little bit more difficult where we're going to say after the second character
extract everything, all the characters. So not only R I, I would like R I A. So now
nothing's changed about the starting position. It's going
(5:14:26) to stay at three. But now if you are looking to this value and you want to
extract everything starting from R. That means you have to specify the length of
three. But this is not really good because let's have another value in the same
column. So we have Martin. So the starting position going to be as well R.
(5:14:41) And now the length going to be different. So we have here four
characters. So now the length is not anymore three. It is four. But you have to specify
something at the end for SQL. You can go for four. That's fine for Maria as well. But if
you have a lot of values, it's going to be really hard to specify exactly the correct
length.
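To make the problem with a static length concrete, here is a small sketch in SQL Server syntax. The two names come from the example above; the inline VALUES list and the alias names are assumptions added so the statement is self-contained.

```sql
-- SUBSTRING(value, start, length): start is 1-based.
SELECT
    first_name,
    SUBSTRING(first_name, 3, 2) AS two_after_second,  -- 'ri',  'rt'
    SUBSTRING(first_name, 3, 3) AS static_length_3,   -- 'ria', 'rti'  (Martin is cut short)
    SUBSTRING(first_name, 3, 4) AS static_length_4    -- 'ria', 'rtin' (Maria has only 3 left, no error)
FROM (VALUES ('Maria'), ('Martin')) AS c(first_name);
```

A length of 3 is right for Maria but too short for Martin, and any fixed number has the same weakness once a longer name shows up.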
(5:14:58) That's why instead of specifying a static number like three or four, we can
use another function. So now my friends, if you use the length function, you will get
the total number of characters, right? So for Maria, you will get five. For Martin, you
will get six. And those numbers are okay to use in the length
(5:15:15) because they are more than what we need. And that's totally fine. So if you
are saying okay for Maria start from the third position and cut for me five characters
SQL going to find only three but you will not get an error. So you are extracting more
than you need and you will always get all the characters
(5:15:31) after the starting position. So this is a little trick that we use in order to
make the length dynamic where we cannot find one value that we can use in all
scenarios. And now let's go back to SQL in order to practice the substring. Okay. So
now we have the following task and it says retrieve a list of customers
(5:15:47) first names after removing the first character. So now don't ask me why but
for some reason we don't want to see the first character of the first names. We want
to remove it. So how we can do that? We cannot use the left or the right. We have to
go with the substring because it is little bit more complicated. So substring and let's
go
(5:16:05) and get and the first argument going to be the value. So it comes from the
first name and then the second argument is the starting position. So where we want
to start since it is saying I want all the characters after the first character. So that
means we will be starting from the position number two. So for example
(5:16:23) Maria here the first character M position number one and we want to start
our substring from the position number two. So that was the easy part.
Now the next one the question is how much characters we want to leave. So do we
leave here like four characters like in Maria we have four characters
(5:16:41) but in John we have only three then the next one is four and so on. So if
you go for example with four and let's call it sub name. So we make it static. What
can happen? It's going to work for some scenarios like Maria. We have here ARIA, and
for Peter we are getting it. But for Martin it is not working. We are not
(5:17:01) getting the last N because it has like five characters after the first one. And
by just looking to the result as you can see we have here one issue with John and
that's because the first character is a blank space. So this is really annoying. So
that's why we use the trim first just to get rid of all those white
(5:17:18) spaces. And now you can see it's working fine. So we are not getting the J.
We have everything after the first character. So now instead of having this static
what we're going to do we're going to make it variable. So we're going to go and use
the length of the first name. So with that we make sure we
(5:17:35) have enough length to extract. And this can work for any value inside the
first name even if the name is like 20 characters. So let's go and execute. And now
you can see for Martin it is now working. So we have here like five characters after
the M. And here we have four characters after the M as well. And
(5:17:53) here we have three characters after the J. So it is working completely and
it is fully dynamic. So this is the trick by using the length together with the substring.
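A sketch of that final query, under a few assumptions: SQL Server syntax, where the length function is spelled LEN (MySQL and PostgreSQL call it LENGTH), illustrative names in an inline VALUES list, and made-up alias names.

```sql
-- Remove the first character of each name.
SELECT
    first_name,
    SUBSTRING(TRIM(first_name), 2, 4)               AS static_sub_name,   -- 'aria', 'ohn', 'arti'
    SUBSTRING(TRIM(first_name), 2, LEN(first_name)) AS dynamic_sub_name   -- 'aria', 'ohn', 'artin'
FROM (VALUES ('Maria'), (' John'), ('Martin')) AS c(first_name);
```

LEN(first_name) is always at least as large as the number of characters left after position 2, so SUBSTRING simply stops at the end of the string instead of raising an error.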
And as you can see now we are using three functions in one go. We have the length,
we have the trim and we have the substring. And this is what happens
(5:18:11) in SQL. We use multiple functions together in order to solve like complex
tasks. So this is how you can extract a substring from a string. All right. So that's all
about the substring and with that we have covered a lot of very important string
functions in SQL and now you have enough tools in order to
(5:18:28) manipulate the string values in your data. Okay my friends. So with that we
have learned how to manipulate your string values inside SQL using the string
functions. Now we will move to the second one. you will learn how to manipulate the
numbers, the numeric values. So let's go. Okay. So now let's have this example
(5:18:48) 3.516. Now let's say that you want to apply the function round and you are
using two decimal places. So what going to happen? It's going to go and keep only
two digits after the decimal point. So five and one and the third digit after the decimal
six. It will decide whether the number going to round up or
(5:19:06) stay as it is. And now since six is higher than five. So that means SQL
going to go and round the number up. So instead of having 51 we will get 52. And after
that the third digit going to reset to zero. So in the output you will get 3.52. Now let's say
that you have done round but only for one decimal place.
(5:19:25) Now it's still going to go and keep only one decimal place and that is the
five. And the second digit this time going to decide whether we round up or not. And
now since one is less than five, there is no need to round up and the five going to
stay as it is. It will not turn to six. So there is no round up and the
(5:19:42) digits after the five going to reset to zero. So we're going to get 3.5. Now
let's say that you say round zero. So that means I don't want to see any digits after
the decimal point. So now SQL going to go and check the first digit after the decimal
point, the five. This one going to decide whether the
(5:19:59) three going to turn to four or not. And now since we have five it is good
enough to round the number because either five or above five going to round the
numbers. So that's why it's going to be a round up and SQL going to return at the
end four and all the digits after the decimal points going to be reset to
(5:20:16) zero. So this is exactly how the round function works in SQL. So now let's
see how we can do that in SQL. Okay. So now let's go and practice about the
number functions. So what we're going to do we're going to write SQL select but this
time we will not select any data from the database. We going to practice using
(5:20:31) our static value like for example the value 3.516. So let's go and
execute it. So with that I have this decimal number. Now let's go and start practicing
the round function. So now let's go and round this number 3.516 and this time we are
rounding to two decimals. So let's go and call it round two and let's go and execute it. So
as
(5:20:54) you can see in the output we are rounding to two decimal places and we have
the two because as we learned the six going to go and round it up. Now let's go and
do the same thing for one. So let's round one execute. And as you can see in the
output we are rounding to one decimal. So we have the five and
(5:21:12) everything is zero. And we don't have six here because the one is lower
than five and it will not round up the numbers. And let's round by zero. It is
rounding it to an integer, to the four, and all the decimal digits are zero and we have
four because we have five and five going to round up the
(5:21:31) number. So as you can see it is really nice and this is how we round
numbers in SQL. Now there is another number function which is really cool called
ABS, or the absolute. What it's going to do, it's going to go and convert any negative
number to a positive. So let me show you what I mean. Let's go and say we have
(5:21:51) like minus 10. So this is a negative number. But if I say ABS, so the
absolute of the minus 10, what I will get? I will get a positive number. So it's like
giving us the absolute of any number or in other words, it is like converting the
negative to a positive. And if the number is already positive,
(5:22:08) nothing going to happen. So if I say the absolute of the 10, I will get as well
a 10. So this is really nice and cool function that is really important in order to
transform numbers in many scenarios like if you have mistakes on your database
like let's say minus sales makes no sense to have sales that is
(5:22:26) minus. So in order to correct the data we can use the ABS in order to
convert all the negative numbers to a positive. So this is really nice cool and easy
function to learn. All right my friends. So that's all for the numeric functions. We have
covered two very simple functions and now in the next topic we
(5:22:41) have a lot of functions about how to manipulate the date and time in SQL.
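As a recap of the two numeric functions, the statements from this part can be sketched in one SELECT without a FROM clause. SQL Server syntax is assumed, and the alias names are made up for illustration. SQL Server keeps the scale of the input, which is why the rounded-away digits show up as zeros.

```sql
SELECT
    3.516           AS original_value,
    ROUND(3.516, 2) AS round_2,       -- 3.520 (6 rounds the 1 up to 2)
    ROUND(3.516, 1) AS round_1,       -- 3.500 (1 is below 5, so no round up)
    ROUND(3.516, 0) AS round_0,       -- 4.000 (5 rounds the 3 up to 4)
    ABS(-10)        AS abs_negative,  -- 10
    ABS(10)         AS abs_positive;  -- 10
```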
So let's go. So what is a date? If you take a look at calendar and you pick any date,
for example, August 20th, 2025, this date could represent an event like a birth date.
Happy birthday. Or a project deadline at your work, and
(5:23:10) mainly it has three components. The first part is a four-digit number
indicating the year. Then the next component it is the month. So normally we
represent the month with a number between 1 and 12. And the last component is the
day. This is a number between 1 and 31 depending on the month. Now in database
we call this structure of those
(5:23:31) three components a date. So this is what we mean with dates in SQL. All
right. So now let's move to the next one. What is time? Time refers to a
specific point within a day. Like for example, we have 18 hours, 55 minutes, and 45
seconds. So this structure has as well three components. The first one we
(5:23:50) call it the hours. It is as well a number between 0 and 23 indicating the
hour of the day. Then the next one, it is the minutes. This is a number between 0 and
59. Moving on to the last component, we have the seconds. This is again the same
thing a number between 0 and 59. So now this structure with those
(5:24:08) three components we call it in databases and SQL a time. So this is what
we mean with the time. Now to the last type if you go and combine both the date
together with the time and you put them side by side you will get a new structure and
a new name in the databases and we call it usually timestamp. This name is used in
many
(5:24:29) databases like Oracle, Postgres and MySQL. But in SQL Server, we
have another name for that. We call it datetime. So again, it's very simple. The
datetime or timestamp has the date information together with the time information. So
here in this example, we have six components from left to right
(5:24:45) and here we have like a hierarchy in this structure. So we start with the
highest which is the year. Then we have the month, the day and then we continue to
the hour, minutes and seconds. So those are the three different types about date and
time informations in SQL. We have the date alone or the time alone
(5:25:02) or together in the date time. All right, let's explore now the data that we
have inside our database searching for date and time informations. Now let's go to
the table orders and if you go and expand it, you will find here two columns having
the data type date. So we have the order date with the date
(5:25:20) and as well the shipping date with the data type date. And if you check the
last column, the creation time, this one is datetime2. So now let's go and query
those informations in order to understand the structure. I'm just going to select the
order ID, the order date, and the ship date and the creation
(5:25:39) time from sales orders and from is big. So let's go and execute it. Now if
you go and check both order date and ship date, you can find that here we have only
the structure or the informations about the date and we have nothing about the time.
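To try this without the course database, the structure can be imitated with a table variable. This is only a sketch: the column names (OrderID, OrderDate, ShipDate, CreationTime) are guesses based on how the columns are spoken about, and the two rows are invented sample values.

```sql
-- Stand-in for the orders table; names and rows are assumptions.
DECLARE @Orders TABLE (
    OrderID      int,
    OrderDate    date,       -- year, month, day only
    ShipDate     date,
    CreationTime datetime2   -- date plus time, including fractional seconds
);

INSERT INTO @Orders (OrderID, OrderDate, ShipDate, CreationTime) VALUES
    (1, '2025-01-01', '2025-01-05', '2025-01-01 12:34:56'),
    (2, '2025-01-05', '2025-01-10', '2025-01-05 23:22:04');

SELECT OrderID, OrderDate, ShipDate, CreationTime
FROM @Orders;
```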
So again here we have a year, month and day and that's why they have
(5:25:59) the data type date. Now let's go and check the creation time. Not only we
have the date information but as well we have the time information. So it start with
the date information year, month, day and then we have hour, minute and seconds
and then we have fractions of the seconds, milliseconds and so on. So
(5:26:16) this is how the date time or time stamp looks like in databases and this is
how the date looks like. All right my friends now in SQL I can say that we have three
different sources in order to query the dates. The first one is dates that are stored
inside our database like we saw here in those columns like the order date,
(5:26:40) shipping date, creation time. All those are columns that holds this
informations and they are stored inside our database. So this is the first source of
dates that we can get inside our queries. Let me just remove those stuff and let's
stick with the creation time. So let's just execute it. So those are date and
(5:26:56) time informations stored inside our database. The second type is a hard-
coded date string that we can use inside our queries. Let me show you an example.
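A hard-coded date might be added to a query like this. The sketch assumes SQL Server syntax, and the three order IDs are placeholder rows in an inline VALUES list so the statement runs on its own.

```sql
SELECT
    o.OrderID,
    '2025-08-20' AS HardCoded   -- the same literal is repeated on every row
FROM (VALUES (1), (2), (3)) AS o(OrderID);
```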
So now if we go to a new line, I can go and define a date like this. So 2025 August
20th. So in this string we have hardcoded a date that is static
(5:27:17) for all rows. Let me just call it hardcoded and let's go and execute it. Now
we can see in the output we're going to get a static date for all rows. So this going to
be the same for all rows inside our table. So this value is not stored inside our
database. This value I just added to our query and hardcoded
(5:27:37) it. So sometimes in queries we define our dates that's going to be used
maybe later in calculations and so on. Now the third source of getting dates inside
our query is using the function GETDATE. GETDATE is the first and the most important
function that we use in SQL. It's going to go and return the current
(5:27:55) date and time at the moment of executing the query. So let's try that out.
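In SQL Server the function is written GETDATE() with empty parentheses. A minimal sketch, again with placeholder rows in an inline VALUES list and an assumed alias name:

```sql
SELECT
    o.OrderID,
    GETDATE() AS Today   -- current date and time of the server when the query runs
FROM (VALUES (1), (2), (3)) AS o(OrderID);
```

Every row shows the same value, and that value changes each time the query is executed.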
I'm going to go and get a new line. So GETDATE. It's very simple. It doesn't accept
any values inside the function. So it's going to be empty. So let's call it today. All
right. Let's go and execute it. And of course, we're going
(5:28:12) to get different results because the get date now is the date and the time
that I'm recording this video. So currently it is July 18, 2024. And I'm recording this
around 20 p.m. So as you can see, this going to be as well repeated for each row.
We're going to get always the same value. So again, this depend on the
(5:28:30) execution of that query. So during the tutorial, you're going to learn a lot
about the get date and we're going to use it in a lot of functions. So those are the
three different sources of getting date information inside your query either from a
column inside our database or hardcoded using a string.
(5:28:44) And the third one is using the get date in order to get the current date and
time informations at the moment of the query execution. Nice. Now we have a clear
understanding what is date and time in SQL. The next question is how to manipulate
those informations using SQL functions. Okay. Now we have our date
(5:29:07) August 20th, 2025. One of the things that we can do with the date is we
can go and extract different parts of the date. For example, we are interested only on
the year. So we can go and extract only the year part. Or if you are interested in the
month, you can go and extract the month and you will get
(5:29:23) August. And of course, we can go and extract the day and we will get the
20. So this is the first thing that we can do. We can extract the parts of the dates.
Now another thing that we can do is we can go and change the date format. So
instead of having like a small minus between those date parts, we can go and
(5:29:39) split them using slash. We can even start first with the month August then
20 the day and then the year but having only the short form of the year 25 or we can
go and change the format where we say we don't need any special character we just
leave it as a space. So as you can see we are changing and manipulating
(5:29:56) the format of the date. Another category or task we can go and do date
calculations. So we can go and take our date and add to it for example 3 years or we
can go and find the differences between two dates like we are doing a subtraction or
let's say minus and we will get for example 30 days. So we can
(5:30:13) go and add stuff subtract stuff or find differences between two dates. It's
like we are doing calculations on the date. Now to the last thing that we can do with
this date is we can go and test this date or validate it whether it is a real date that
SQL understands. So we can put it on the test and at the output
(5:30:31) we're going to get true or false or zero and one. So as you can see here we
have different ways or let's say categories on how to manipulate our dates in SQL.
Now we're going to go and group up the different date and time functions under four
categories. The first category and the most important one we have the part
(5:30:47) extraction and here we have around seven different functions that we can
use in order to do this task. Another category we have the format and casting. And
here we have three different functions. Underneath this category we have the FORMAT,
CONVERT and CAST. And then the third category we have the calculations
(5:31:04) of the dates. We have two functions, DATEADD and DATEDIFF. And the last
category, the validation. We have here only one function called ISDATE. So as you
can see we have a lot of SQL functions. We have 13 date and time functions that
we're going to cover in this tutorial on how to manipulate the date and time
(5:31:20) informations in SQL. And this is how we can group them into four different
categories. Let's start now with the biggest category. We have the part extraction.
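For orientation, here is one call from each of the four categories. It is a rough sketch only: SQL Server syntax, a variable holding the example date, and made-up alias names. The format string and the second date are assumptions chosen to match the examples mentioned above, and the results of FORMAT and ISDATE depend on the session's language settings.

```sql
DECLARE @d date = '2025-08-20';

SELECT
    YEAR(@d)                        AS part_extraction,  -- 2025
    FORMAT(@d, 'MM/dd/yy')          AS formatting,       -- month, day, two-digit year
    DATEADD(year, 3, @d)            AS add_three_years,  -- 2028-08-20
    DATEDIFF(day, '2025-07-21', @d) AS days_between,     -- 30
    ISDATE('2025-08-20')            AS is_valid_date;    -- 1 under default US English settings
```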
We're going to cover all those seven functions in details on how to extract parts. All
right friends, now we're going to cover three very easy quick
(5:31:39) functions in SQL to extract the parts of the dates. So they are very simple.
The day function going to return a day from a date and in the same way the month
going to return the month from a date and guess what the year going to return a year
from a date. Okay. So now in order to understand how they work we
(5:31:58) have a date like this one 2025 August 20th. Sometimes you are not
interested in the whole date. You would like to get only a part from this date. So you
go and use the function day in order to extract the two digit 20. Now in other scenario
you might be interested in the month information. So you would like to
(5:32:17) get those two digits 08. So we can use the function month in order to
extract the month information in order to get the August. So 08 and one more
situation where you want to have only the year information. So you are interested in
the four digits 2025. So you can go and use the function year in order to
(5:32:37) extract it. So in the output if you apply it you will get 2025. So it's very
simple. This is how those three functions work. All right. Now let's check the syntax
of those three functions. It's pretty easy. So we have it always like this. A keyword
called day. This is the function name. And then it accept only one parameter. It is the
(5:32:54) date. The same things for the others. We have a function called month and
it accept as well only one parameter the date and as well for the year the same thing.
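The three calls can be sketched as follows, assuming SQL Server syntax and using a variable in place of a table column. The square brackets around the aliases are only there because Year, Month and Day are also function names.

```sql
DECLARE @d date = '2025-08-20';

SELECT
    YEAR(@d)  AS [Year],    -- 2025
    MONTH(@d) AS [Month],   -- 8 (an integer, so no leading zero)
    DAY(@d)   AS [Day];     -- 20
```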
So the syntax is very straightforward. It accept only one value the date and we have
the function name like the name of the part that we want to extract. All right. So now
let's
(5:33:12) try out those functions. I will be working with the column creation time. So
let's try for example extracting the year from the creation time using the year
function. So it's going to be very simple. It's going to be year and then creation time
like this. And let's call it year. That's it. Let's go and execute it. Now
(5:33:32) as you can see it's very simple. We have only one year 2025 from the
creation time. So with that as you can see we got a new column where we have only
the year informations inside it. And this information come from the creation date. So
we have only 2025. Now let's go and do the same for the month. So we're
(5:33:48) going to have the same thing month creation time and let's call it month. So
let's execute it. Now as you can see in the output we got as well the number of the
month. So we have here January, February and March and those information as well
are extracted from the creation time and the same thing using the day
(5:34:07) function. So let's go and use that. So creation time and we call it day. So
now as you can see in the output we have the day part from the creation time. So
here we have 1, 5, 10 and so on and all those informations come from the creation
time. So as you can see those three functions are very simple and quick in
(5:34:28) order to extract parts from a date or datetime. All right. So what is
DATEPART? DATEPART going to go and return a specific part of the date as a number. All
right. So now back to our example. We have learned how to extract the day, month
and year. But of course now in a date we have more informations that we
(5:34:50) could extract. Not only those three we could extract for example the week
right the quarter so all those informations are as well stored in this dates we cannot
see it like as a value but inside the SQL you can extract the week and quarter but we
don't have a function dedicated for those stuff because they
(5:35:09) are not commonly used like the year and month and day but still we can
extract those information using the date parts for example we can say date part and
we can specify the part as a week and with that SQL going to return for this example
34 and maybe in other situation you are interested in the quarter right
(5:35:28) so you can specify it like this date part quarter so we are interested in the
part of quarter and in the output you will get three so this is exactly the power of the
date part you can go and extract way more parts that is available in these dates and
one more thing to notice about DATEPART, YEAR and DAY:
(5:35:47) all of them always generate an integer, a number, as the output. So we
have for the quarter 3, for the week 34, the day 20, the year 2025 and so on. So all of those
informations are integers. So integer is the data type of the output of these functions.
Okay. So let's have a look at the syntax of DATEPART.
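The numbers from the example can be reproduced with a sketch like this one. SQL Server syntax is assumed and the alias names are made up. The week number depends on the server's first-day-of-week setting; 34 is what the default US English configuration gives for this date.

```sql
DECLARE @d date = '2025-08-20';

SELECT
    DATEPART(year,    @d) AS year_dp,     -- 2025
    DATEPART(month,   @d) AS month_dp,    -- 8
    DATEPART(day,     @d) AS day_dp,      -- 20
    DATEPART(quarter, @d) AS quarter_dp,  -- 3
    DATEPART(week,    @d) AS week_dp;     -- 34 with default US English settings
```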
(5:36:08) It starts with the function name DATEPART and it accepts two parameters. The
first one is the part that we want to extract. So we want to define what do we want.
We want the month, the day, the year and so on. And the second parameter is the
date itself. So let's have an example. We can say date part and we
(5:36:25) would like to extract the month from the order dates. So the part is the
month and the order date is the date that we want to extract from. So with that we
are specifying the part as a month. Now in SQL there is another way on how to
specify the parts. We can go and use like an abbreviation of the month. So if
(5:36:42) you specify instead of month instead of writing the whole thing you write
mm you will get the same results. So it's like abbreviation and shortcut in order to
write scripts. But I rarely see that in the implementations. I always tend to write it
completely like this month because it's more like standards if you
(5:36:58) are switching between different databases. So as you can see it's very
simple. You have to give SQL two things which part you want to extract and the date
that you want to extract from. Okay. So now we're going to go and extract different
parts from the creation time using the date part. Let's start for example by extracting
the year
(5:37:14) again. So let's go and do that. DATEPART, and then we have to specify
which part we need. So we're going to write year like this and then the next one
going to be the value. So it's going to be the creation time. So let's call it year and
let's say date parts. Let's go and execute it. So now at the output you
(5:37:36) can see we got as well again the years that is extracted from the creation
time. So it's going to be identical to the year function. So there is no differences
between them. Both of them are integer and it holds the year informations. Now we
can go and try different parts. For example, let's copy the whole thing and let's
extract for
(5:37:54) example the month. So you can go over here and change it to month and
let's rename it execute. So at the output you see we got as well the months is
identical as well to the function month. And the same thing for the day. So we are just
changing the parts and in the output we are getting the parts. So here we have as
well the
(5:38:17) days it is identical to the day function. So so far we don't have something
new from the date part because we have it already from the other functions. But now
we're going to go and extract other parts that are not year month and day. So for
example let's go and get the hours. So we have the date part and here as a part you
say hour and
(5:38:36) let's call it here as well hour. Let's go and execute it. Now you can see in
the output we have a new dedicated column that shows only the information from the
hour. So we have here 12 23 and so on. And those informations comes from the time
and the same thing you can define minutes and so on. But now let's
(5:38:56) go and get something interesting like the quarter. So let's go and duplicate
it and instead of hour let's get quarter. So this information it's not displayed in the
creation time but SQL can go and extract it. So let's call it quarter and let's go and
execute it. Now as you can see in the output we have one
(5:39:18) new field called quarter and inside it everywhere we have a one because
all those dates are in the range of the quarter one. So as you can see this is amazing
of course for reporting and analysis. Let's go and have something else like the week
day. So we are over here quarter and let's call it week day
(5:39:36) and rename as well this to week day. So let's go and execute it. All right.
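Put together, the DATEPART calls tried so far might look like this. It is a sketch under assumptions: SQL Server syntax, a CreationTime column imitated with two invented datetime2 values, and alias names chosen for illustration. The weekday number depends on the first-day-of-week setting, so no value is given for it.

```sql
SELECT
    o.CreationTime,
    DATEPART(year,    o.CreationTime) AS year_dp,     -- 2025, 2025
    DATEPART(mm,      o.CreationTime) AS month_dp,    -- mm is the abbreviation of month
    DATEPART(day,     o.CreationTime) AS day_dp,      -- 1, 5
    DATEPART(hour,    o.CreationTime) AS hour_dp,     -- 12, 23
    DATEPART(quarter, o.CreationTime) AS quarter_dp,  -- 1, 1
    DATEPART(weekday, o.CreationTime) AS weekday_dp   -- a number, not a day name
FROM (VALUES
    (CAST('2025-01-01 12:34:56' AS datetime2)),
    (CAST('2025-01-05 23:22:04' AS datetime2))
) AS o(CreationTime);
```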
So now let's go and get something else like for example the week. So I just
duplicated over here instead of quarter let's write week. So I would like to get the
week number. So let's go and execute it. So now in the output as you can see
(5:39:58) we got a dedicated field that show us the week number from the creation
time. So we can see this dates come from the week number one. Those two come
from week number two and so on. So that's it. As you can see guys all those
informations that you are getting from the date part are numbers. And now we
(5:40:14) can extract way more informations than only the year, month and day. And
even if those informations are not displayed directly in the field itself like the quarter,
weeks and so on. All right. So now we have a very similar function to the
DATEPART. We have the DATENAME. So the only difference here is that it returns the
(5:40:36) name of the date part. All right. So now back to our example. We have
learned we can extract different types of parts from one date. But we learned as well
that all of them are numbers. How about we would like to extract the name of the
month. So instead of eight, I would like to get the name of the month like
(5:40:53) August. Or instead of the 20, I would like to get the day name like here in
this example, it going to be Wednesday. So in order to get the name of the parts, we
have to use the function date name. So for example, if you use the function date
name using the part month, you will not get eight in the output.
(5:41:11) You will get the full name of the month August. So as you can see we are
getting a string a full name and as well the same thing if you use date name for the
week day you will not get 20 like the day function you will get the name of the day
Wednesday and as well here the output is string so as you can see it's
(5:41:28) very simple we are using the date name in order to get the name of the
parts and the data type of the output here is a string it is not an integer so as you can
see here we have different types of functions that all of them are doing the same job
we are extracting parts from one date. Okay. So now by checking
(5:41:46) the DATENAME syntax, it's going to be identical to the DATEPART. So we are
just switching the function name. It needs from you to define the part and as well the
dates. The only difference here is that we are getting different data type at the
output. So here we are getting a string instead of integer. All
(5:42:03) right. So now let's check the date name. It is very similar to the date part.
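Side by side, the difference can be sketched like this, assuming SQL Server syntax, a variable holding the example date, and a session language of English (the names come back in the session's language).

```sql
DECLARE @d date = '2025-08-20';

SELECT
    DATEPART(month,   @d) AS month_number,  -- 8, an integer
    DATENAME(month,   @d) AS month_name,    -- August, a string
    DATENAME(weekday, @d) AS weekday_name;  -- Wednesday, a string
```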
So we're going to have it like this. We're going to work as well with the creation time.
So we're going to say date name and then after that we have to define the parts. So
let's go for example with the month and our field is as usual the
(5:42:18) creation time and let's call it month date name like this. So that's it. Let's go
and execute it. Now if you go to the output over here you can see we have the month
but this time we don't have numbers. We have the full name of the month. So we
have January, February, March instead of having 1 2 3. So this
(5:42:39) is the big difference between the date name and date part. Date part you
get numbers. Date name you get the name of the part. So let's do the same thing for
the day. We would like to get the name of the day. So I'm just duplicating it. But now
in order to get the full name of the day, we cannot go with the day.
(5:42:57) We're going to go with the week day as a part. So that's it. I will call it week
day. So let's execute it. Now as you can see in the output, we have here a new
column called week day. And inside it we have the name of the day instead of a
number. So here we have Wednesday, Sunday, Friday and so on. So the full
(5:43:16) name of the day go of course with the day. Let's go and try that out. So this
is the day of the month and of course the day of the month has no name and SQL of
course going to return the numbers again. So you can see 1 5 10 20 and so on. But
still there is a difference between the day from the day name and the day from the
date parts. In
(5:43:37) the date parts we are getting integers. So if you store this information in a
new table it's going to be stored as an integer. But in the date that you are getting
from the date name it is a number but still it can be stored as a string value. So the
data type of those numbers is a string and the data types
(5:43:56) of the day from the date part is an integer. And the same thing can happen
if you extract for example a year. So you don't have like a full text of the year. So let
me just do it like this. So if we say a year, you will not get the name of the year.
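As a rough sketch, the DATENAME calls tried out in this part could look like the query below. The table name Sales.Orders, the columns OrderID and CreationTime, and the column aliases are assumptions for illustration; the transcript only speaks of "sales orders" and "creation time".

```sql
-- Assumed schema: Sales.Orders(OrderID, CreationTime). Names are illustrative.
SELECT
    OrderID,
    CreationTime,
    DATENAME(month,   CreationTime) AS Month_dn,    -- full month name, e.g. January
    DATENAME(weekday, CreationTime) AS Weekday_dn,  -- full day name, e.g. Wednesday
    DATENAME(day,     CreationTime) AS Day_dn,      -- digits, but returned as a string
    DATENAME(year,    CreationTime) AS Year_dn,     -- digits, but returned as a string
    DATEPART(day,     CreationTime) AS Day_dp       -- the same digits as an integer
FROM Sales.Orders;
```

Day_dn and Day_dp show the same digits side by side; only the data type differs.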
You're still getting the numbers, the digits, but the data
(5:44:15) type here is a string. So that's it. This is the difference between DATENAME
and DATEPART. For the month and weekday, you will get the full name. For
the other parts, you will get numbers but with the string data type. So the most
important thing about DATENAME is to present easy-to-read,
(5:44:30) human-readable information to the users. So imagine you are building a
report called sales by month, and then you show the user the months as numbers,
1, 2, 3 until 12. This is of course okay, but it is way nicer if you present that
information as full text. So you go with DATENAME in order to show, instead of
1, 2, 3,
(5:44:50) January, February, March: the full name of the month. And this is going to
look way nicer in reporting for the users. So this is the core use case of
DATENAME. So what is DATETRUNC? DATETRUNC is going to go and truncate the date to a
specific part. So let's understand what this means. Okay. Now let's check the
(5:45:13) syntax of DATETRUNC. It's going to be exactly the same as DATEPART and
DATENAME. So you have to define the part and the date that you want to truncate.
So the only thing that is different here is that we are giving a different function
name. So as you can see, all those three functions
(5:45:29) have the same structure: you have to provide which part you want,
like a month, day, week, hour, minute and so on, and the date or date and
time that you want to work on. And of course with DATETRUNC we are
getting at the output a date or datetime. Okay. So now let's understand exactly
(5:45:47) how DATETRUNC works. We have the following datetime, and as we
learned we have like a hierarchy where we start with the highest part, the year, then
we move to the month, day, hours, minutes and seconds. And by looking at this
information, it is very precise. We know the exact second for this information, right?
(5:46:04) So the level of detail here is very high. We know the seconds of this
event. So now DATETRUNC is going to allow us to change the level of detail of this
information by specifying the level we want. Let's take for example
DATETRUNC minute. So we are saying we are interested only in the
(5:46:23) minutes level. We are not interested in the seconds. So what happens?
Everything between the year and the minutes is going to be kept. That means
all that information will not be changed; only the seconds are going to be reset.
We are not interested anymore in the seconds. This is too detailed
(5:46:40) for us. So it's going to go and reset the seconds to 00. So we are saying
the minimum level is the minutes and we are not interested in anything below it, the
seconds. Let's say now we say, you know what, the minutes are too detailed, I would
like to be at the hours level. So we specify DATETRUNC hour. So
(5:46:58) here things change. We're going to keep the information now between the
year and the hours, and anything after that is going to be reset. So now minutes and
seconds are in the range of the reset, and SQL is going to go and reset the 55
to 00. So now the level of detail is a little bit lower. Now we know only the
(5:47:19) information down to the hours, and we are not interested in the minutes
and the seconds. And I think you already get it: if you say DATETRUNC day, what's going
to happen? It's going to keep everything between year and day, and the whole time
is going to be reset. So the hours, minutes and seconds, all that information is going
(5:47:38) to reset to 00. So now by looking at this, we don't know anything about the
time. We know only information about the date. And now we can go one more step
and say, you know what, I'm not interested in the days, I'm doing analysis at
the month level. So what is kept here is only two pieces of information, year
(5:47:54) and month, and everything below that, the day and the time, is going to be
reset. But this time SQL will not reset the day to 00, because there is no day
called 00. It always starts with the first day, so it's going to reset to 01. So the
parts in the date are going to reset to 01, and the parts in the time
(5:48:15) are going to reset to 00. So now we are at the level of the month. Now you can
go to the last step and say, you know what, I'm interested only in the years and
I'm doing analysis only at this level, the highest level. So you can go and say
DATETRUNC year. And now what's going to happen? It's going to keep
(5:48:32) only the year, and everything below that is going to be reset. So between
month and seconds everything is going to reset. So here SQL is going to reset as
well the August 20 to 01-01. So the only value that is kept is the year and everything
else is reset. So this is the 1st of January and the time is completely reset. So now we
are at the
(5:48:52) lowest level of detail. We know only information about the year and we
don't care about any other parts. So as you can see, DATETRUNC here is not really
extracting a part. DATETRUNC is resetting parts. So we are navigating through
the hierarchy of the date and time and we are controlling at
(5:49:09) which level we are doing the analysis. So as you can see, in the end it's
not very complicated once you understand how it works, and it is very useful in
analysis. So this is how DATETRUNC works in SQL. Okay, let's have a few
examples of DATETRUNC together with the creation time. So as you can
(5:49:24) see, the level of the creation time is the seconds. So we have seconds
information in the creation time. Now I would like to move it to the minutes. So let's
go and do this: DATETRUNC, and we're going to say let's truncate it at the minute level
for the creation time. So let's call it minute DATETRUNC. So
(5:49:45) let's go and execute it. Now if you go and check the output over here and
compare it to the creation time, you can see here we have zeros at the seconds. So
as you can see, we have the seconds completely reset compared to the creation
time. Now let's say that I'm not interested in the time information
(5:50:04) inside the creation time. I would like only to get the date. So in order to do
that, we can use DATETRUNC where we reset to the level of the day. So let's go and
duplicate it. I'm going to put it over here, and instead of minute, let's say we have
day, and let's go and check the output. Now if you go and check the
(5:50:23) result over here, you can see all the time information is reset to zeros
and we have here only information about the date. So we have year, month and day,
and everything else is reset to zero. Now of course we can go to the maximum
where we say I just need the year. So I don't need anything else. So let's try
(5:50:42) that out. We're going to take DATETRUNC and say year, and let's call it year.
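A minimal sketch of the three truncation levels built up here, followed by the same idea on a single literal value. Sales.Orders, OrderID, CreationTime and the aliases are assumed names; the literal's hour and seconds are made up for illustration (the walkthrough only mentions the 20th of August and the 55 minutes). DATETRUNC needs SQL Server 2022 or later.

```sql
-- Assumed schema: Sales.Orders(OrderID, CreationTime). Names are illustrative.
SELECT
    OrderID,
    CreationTime,
    DATETRUNC(minute, CreationTime) AS Minute_dt,  -- seconds reset to 00
    DATETRUNC(day,    CreationTime) AS Day_dt,     -- whole time reset to 00:00:00
    DATETRUNC(year,   CreationTime) AS Year_dt     -- reset to 1 January, time 00:00:00
FROM Sales.Orders;

-- The same hierarchy on one hypothetical value.
DECLARE @d DATETIME2(0) = '2025-08-20 18:55:45';
SELECT
    DATETRUNC(minute, @d) AS ToMinute,  -- 2025-08-20 18:55:00
    DATETRUNC(hour,   @d) AS ToHour,    -- 2025-08-20 18:00:00
    DATETRUNC(day,    @d) AS ToDay,     -- 2025-08-20 00:00:00
    DATETRUNC(month,  @d) AS ToMonth,   -- 2025-08-01 00:00:00
    DATETRUNC(year,   @d) AS ToYear;    -- 2025-01-01 00:00:00
```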
So let's go and execute it. Now if you check the output over here, you can see that
everything is reset besides the year. So we have only the year information, but
everything else is reset to the first of January, and the time is reset as well. So
as you
(5:51:03) can see, the output of DATETRUNC is always a datetime, and it helps us
as well to navigate through the hierarchy of the datetime, and we can truncate at the
level that we want. All right. So now we're going to check why DATETRUNC is an amazing
function for data analysis. So let's have this example. We are
(5:51:21) saying select the creation time, and we want to count the number of orders
based on the creation time from our table sales orders, and we're going to use
GROUP BY in order to group the data by the creation time. So let's go and execute it.
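The query just described might look like the first statement below; the second is the month-level roll-up that the walkthrough turns to next, shown here for comparison. Table, column and alias names are assumptions.

```sql
-- Grouping by the raw timestamp: one group per distinct second.
SELECT
    CreationTime,
    COUNT(*) AS OrderCount
FROM Sales.Orders
GROUP BY CreationTime;

-- Rolling up to the month: the same expression goes into SELECT and GROUP BY.
SELECT
    DATETRUNC(month, CreationTime) AS Creation,
    COUNT(*) AS OrderCount
FROM Sales.Orders
GROUP BY DATETRUNC(month, CreationTime);
```

Swapping month for year in both places changes the granularity without touching the rest of the query.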
Now as you can see, we're going to get 1 everywhere, because the level of detail,
(5:51:42) the granularity of the creation time, is very high. That's because here we
have the seconds, and since our data is small, we will not get two orders in the
same second. Now in data analytics you would like to quickly aggregate the data at
a different granularity, for example at the month level. So you can
(5:51:59) do that very quickly using DATETRUNC, and you say, you know what, let's
truncate at the month, and let's call it creation, and we're going to have the same
thing in the GROUP BY. So let's go and execute it. So now as you can see in the output,
we have only three rows, we don't have 10 rows, and that's
(5:52:18) because we have three months. So that means we just rolled up to the
month level instead of the seconds. And we can see now in the month of January we
have four orders, February four as well, and in March we have only two. So now we are
talking about a different level of detail and granularity in the output. And now
(5:52:33) you might say let's go and aggregate the data at a different level, the year
level. So you can just change it over here to year and execute it. And with that, now
we are at the highest level of aggregation. We are at the year level, and since in our
data we have only 2025, we will get the total number of
(5:52:51) orders inside the table, and that is 10. And this is really amazing in data
analytics. You can go and quickly change the granularity and the level of aggregation
or detail by simply defining the level inside DATETRUNC. So this is why DATETRUNC is
amazing. It allows us to do analysis and aggregations by zooming in and zooming
(5:53:09) out. Okay. So now we're going to talk about the last function in the part
extraction category. We have the end of the month, EOMONTH. As the name says, it's
going to go and return the last day of a month. So let's see how EOMONTH works. This is
very simple. So let's take our date, 20th August 2025. If you
(5:53:29) go now and apply this function to it, what's going to happen? It's going to
go and change only the day information. So instead of 20, it's going to go to the last
day of the month. So it's going to go and change the 20 to 31, the last day of the
month August in 2025. Let's take another example, the 1st of February
(5:53:50) 2025. If you apply EOMONTH, it's going to go and change the
day from the 1st to the 28th, the last day of the month February. So as you can see,
it's very simple. Let's take another example where it is already the last day of the
month. So we have the 31st of March. If you apply EOMONTH here, what happens?
(5:54:11) Nothing is going to happen. You're going to get the same value in return. So
this is how it works. And as you can see, the output of EOMONTH is
always a date as well. So this is how EOMONTH works. It is very simple. All
right. Now quickly about the syntax of EOMONTH. It's going to have
(5:54:27) the exact same syntax as DAY, MONTH, YEAR. It needs only one
parameter, the date. So we have to pass here a date in order to find out the end
of the month. So let's go and find the end of the month of our creation time. So
EOMONTH like this, and let's have our creation time. So let's call it end of month.
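As a sketch, the EOMONTH call on the creation time and the three slide examples could be written like this. Sales.Orders, OrderID, CreationTime and the aliases are assumed names.

```sql
-- End of month for every order (assumed schema: Sales.Orders(OrderID, CreationTime)).
SELECT
    OrderID,
    CreationTime,
    EOMONTH(CreationTime) AS EndOfMonth  -- result has the DATE data type
FROM Sales.Orders;

-- The three examples from the explanation, as literals.
SELECT
    EOMONTH('2025-08-20') AS AugustExample,    -- 2025-08-31
    EOMONTH('2025-02-01') AS FebruaryExample,  -- 2025-02-28
    EOMONTH('2025-03-31') AS MarchExample;     -- 2025-03-31 (already the last day)
```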
Let's go
(5:54:49) and execute it. And now in the output you can see we have a new column,
a date column, and inside it we have the end-of-month values. So for
example, here we have January, January, January and so on, so you will always see
here the end of January, and the same thing for February and March. So
(5:55:05) that's it. This is a really nice function in case you need the end of the month
for each date. Maybe you're creating a report or an analysis where you need this
information. And now you might ask me, how about getting the first day of the month?
Is there any function for it? Well, no. But there is a trick in
(5:55:19) order to get the first day of the month using another function that we just
learned. Think about it: how do we get the day as 1 everywhere? So we have to get
here the 1st of January, the 1st of February, and the 1st of March. So how can we do
that? Well, using DATETRUNC. So let me show you how we're
(5:55:38) going to do this. So DATETRUNC, and we're going to reset at the level of the
month. So we don't need the days; they are going to reset to the first. So our field is
the creation time, and this is going to be the start of month. So let's go and execute
it. So now as you can see in the output, we have the start of month, and you can see
we have everywhere here
(5:56:01) a 1, since we reset at the level of the month, and this is going to give us the
first day of the month. And now you might say, you know what, here we have a lot of
zeros, how do we get it exactly like the end of the month? That's because DATETRUNC
gives us date and time here. So that means we have to change
(5:56:15) the data type, and that we're going to learn later using the CAST function, but
we can go and do it right now. So we can say CAST and we want to change the whole
DATETRUNC expression to DATE. Now we have changed the data type from datetime to
date, and in the output, as you can see, we have only the date information. So now it's
(5:56:32) really amazing that you got two dates. The first one is the start of the
month and the second is the end of the month. And that information might be
helpful if you are generating reports and you need the start and the end of the
month. So now we come to the part where we ask the question: why do we
need those
need those
(5:56:49) parts? Why do we need to extract the date parts from a date? So let's have
the following use cases. The first use case of extracting the part is doing data
aggregations and reporting. Sometimes we are building like reports based on our
data and sometimes we have to aggregate our data by a specific time
(5:57:06) unit like for example we are building a reports in order to show the sales by
year. So we have different years and we are aggregating the data based on the year
or you want to drill down to more details where you want to aggregate the data by
the quarter. So in this report we are showing the sales by quarter Q1 2
(5:57:24) 3 4 or you decide to go in more details where you show a report says sales
by month and then you start aggregating your data by the month. So you have
January, February, March and so on. So as you can see we can use those different
parts in order to aggregate the data based on it and these different
(5:57:42) parts can offer us different analyzes with different details. So now we have
the following task and it says how many orders were placed each year. So that
means we have to group up our data by the year and we have to count the number
of orders. Let's go and solve it. So let's go with the select. And now what
(5:57:59) do we need? We need the order date. This going to indicate when the
order is placed. So and we have to go and count the star. So this going to be number
of orders. and from our table sales orders and we have to group up by the order
dates. So that's it. Let's go a and execute it. So now in the output we are
(5:58:21) getting the number of orders but by the order date. So we are still not
there. We have to have it as a year. So we don't need the whole date information.
We need only the year information. So that means we have to go and extract the
part year. In order to do that we can do it like this. So we can go with the year
(5:58:40) and we have it as well in the group I. So that's it. Let's go and execute it.
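A minimal sketch of the finished query for this task, assuming the table is Sales.Orders with an OrderDate column; the aliases are illustrative.

```sql
-- How many orders were placed each year?
SELECT
    YEAR(OrderDate) AS OrderYear,
    COUNT(*)        AS NrOfOrders
FROM Sales.Orders
GROUP BY YEAR(OrderDate);
```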
And with that, as you can see, we got the number of orders for each year. And since in
our data we have only 2025, we will get only one row. So with that the task is solved.
We are now aggregating the data at the level of the year. Now
(5:58:57) let's have another task, which is the same but with a different part: how
many orders were placed each month? So we have to go and change it to a month.
It's very simple. We're going to use the function MONTH, and as well in the GROUP BY.
So let's go and execute it. And now as you can see in the output, we don't
(5:59:15) have one row. Now we have three rows, and that's because we have three
months inside our data. And for each month we will get the total number of orders.
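The month version can be sketched as below, again assuming Sales.Orders and OrderDate. The second statement is a variant with month names; grouping by the month number as well and sorting on it is an addition of this sketch (not something shown in the walkthrough), so the names come out in calendar order rather than alphabetically.

```sql
-- How many orders were placed each month? (month as a number)
SELECT
    MONTH(OrderDate) AS OrderMonth,
    COUNT(*)         AS NrOfOrders
FROM Sales.Orders
GROUP BY MONTH(OrderDate);

-- Variant with the full month name, kept in calendar order.
SELECT
    DATENAME(month, OrderDate) AS OrderMonth,
    COUNT(*)                   AS NrOfOrders
FROM Sales.Orders
GROUP BY DATENAME(month, OrderDate), MONTH(OrderDate)
ORDER BY MONTH(OrderDate);
```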
So for January we have four, for February we have four, and for March we have two
orders. Now you might say, you know what, I don't want the months as numbers. I
(5:59:31) would like to have the full name of the month. So in order to do that, we're
going to go and use the function DATENAME. So let's go and use DATENAME, and then
we have to specify the date part. It's going to be the month, and the value is going
to be the order date, and we have to have the same thing as well in the
(5:59:47) GROUP BY. So let's go and execute it. Now you can see in the output we are
getting the full name of the month, which is easier to read. So this is one of the use
cases why we need to extract parts from a date: in order to aggregate the data at a
specific level. So now let's have the following task, and it says: show all orders that
were placed
(6:00:09) during the month of February. So that means we don't need all the orders.
We need only a subset of the orders based on the order date. Now let's go and
check the data. So SELECT star first from sales orders, and let's go and execute it. So
now with that we have our 10 orders. Now if you check the order
(6:00:28) date over here, you can see that we have orders in January, February and
March. Now we are interested only in the orders that were placed in February, so
only this subset. So that means we now have to filter the data based on the
month information. So what we're going to do, we're going to have a WHERE clause.
And
(6:00:45) now we don't need the whole order date. We need only the part month. So
we're going to go with MONTH of the order date, and this is going to be equal to 2.
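The filter described here, as a short sketch with the assumed names Sales.Orders and OrderDate:

```sql
-- Show all orders that were placed during the month of February.
SELECT *
FROM Sales.Orders
WHERE MONTH(OrderDate) = 2;  -- MONTH returns an integer, so compare with a number
```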
We compare with 2 since the output is going to be a number. So let's go and execute it.
Now as you can see, SQL did filter the data, and in the output we have only the orders
that were
(6:01:04) placed in the month of February. So this is as well a very common use case.
Why do we need the parts? We use them in order to filter the data based on a specific
part of the date. So as you can see, it's very quick and easy. And here my
recommendation is that if you are filtering the data, always use the numbers. So
always use a date function
(6:01:24) that gives you a number, because it's generally faster to search for integers
than to search for characters or strings. So don't use the DATENAME function
in order to search or filter the data. It's better to use DATEPART or MONTH, YEAR
and DAY, since you can work with numbers, and numbers
(6:01:42) are generally faster for retrieving and filtering your data. Okay. So
now we have a lot of functions, and I would like now to do a quick recap of the
data types of their results. So as we learned, we have functions like DAY, MONTH, YEAR,
DATEPART, and the output of all those functions is going to be an integer. It's
(6:02:04) going to be a number. Now we have another function, DATENAME. If you
use it, the output of this function is going to be a string, because here we are
extracting the name of the date part. And if you go and use DATETRUNC, you will get
in the output DATETIME2. So you are getting both the date
(6:02:20) and time. And the last function that we learned, EOMONTH: if you use it, in
the results you will get the data type DATE. So it is really important to understand
the data type of the output so that you don't get any unexpected results. All right. So
now you might say, you know what, those are a lot of
(6:02:36) functions and, like I'm saying, they are doing the same stuff. We are
extracting the parts of the dates. So now you might ask me, how do you decide
when to use which function? This is how I usually do it. First I ask myself which part I
want to extract. If I want to extract a day or a month, then I ask the question: do I
(6:02:52) need it as an integer, as a number? If yes, then I go and use the DAY
function or the MONTH function, because they are quick and I will get exactly what I
need. But if I need the full name of the month or the day, then I go with the
function DATENAME. Now moving back, if I'm interested in the part year,
(6:03:11) here we don't have a year name or something, so I'm going to go
immediately with the function YEAR. But now let's say that I don't need the day, month
or year. I'm interested in other parts like the week, the quarter and so on. Only for
this scenario I go with the function DATEPART. So this is my decision process. This is
how I decide
(6:03:31) when to use which SQL function in order to extract the parts of the dates.
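One way to picture that decision process is a single query with one column per branch. This is only an illustration with assumed names (Sales.Orders, OrderDate, and all aliases), not a query from the course files.

```sql
SELECT
    OrderDate,
    DAY(OrderDate)               AS DayNumber,    -- day or month needed as a number
    MONTH(OrderDate)             AS MonthNumber,
    DATENAME(month,   OrderDate) AS MonthName,    -- full name needed
    DATENAME(weekday, OrderDate) AS WeekdayName,
    YEAR(OrderDate)              AS YearNumber,   -- year: no name exists, so YEAR
    DATEPART(quarter, OrderDate) AS QuarterNr,    -- any other part: DATEPART
    DATEPART(week,    OrderDate) AS WeekNr
FROM Sales.Orders;
```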
All right. So now I have prepared for you here a list of all parts that we can
use inside those three functions, DATEPART, DATENAME and DATETRUNC. And you can
see in this table the different outputs of those
(6:03:51) three functions. So for example, if you go and use the month with
DATEPART you will get 8, but for DATENAME you will get August, and for DATETRUNC
you will get a truncated datetime at the level of the month, where you reset the days
and times. So this is a full list of all examples; you can go and
(6:04:08) check it. And one more thing that I have prepared for you in order to
practice with all those different parts: I have made one big query with all the different
parts. So if you go and download the queries of this chapter, you will find the
following files, and let's go now and open all date parts. So we're going to
(6:04:23) go inside it, and here we have a long query. So what we're going to do,
we're going to select everything and copy it, and let's go back to SQL and paste
it. So let me just zoom out, and then let's go and execute the whole thing. So now in
my code I have just done a union for each possible part. For example, for
(6:04:43) the year we have DATEPART, DATENAME and DATETRUNC, and I'm
currently using GETDATE. So we are manipulating this one, and then the output is
presented over here. So you can see it like this. So if you use the part year with
DATEPART you will get 2024. The same thing for DATENAME, and this is for
(6:05:01) DATETRUNC. And with that you have all possible parts that you can use in
SQL in one query. So with that you can learn what the outputs are for the different parts.
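The downloadable file itself is not reproduced in the transcript. As a reduced stand-in, a query in the same spirit (one row per part, all three functions applied to GETDATE) might look like this; the column aliases and the choice of parts are assumptions, and DATETRUNC needs SQL Server 2022 or later.

```sql
SELECT 'Year' AS PartName,
       DATEPART(year, GETDATE())  AS DatePart_Output,
       DATENAME(year, GETDATE())  AS DateName_Output,
       DATETRUNC(year, GETDATE()) AS DateTrunc_Output
UNION ALL
SELECT 'Quarter',
       DATEPART(quarter, GETDATE()),
       DATENAME(quarter, GETDATE()),
       DATETRUNC(quarter, GETDATE())
UNION ALL
SELECT 'Month',
       DATEPART(month, GETDATE()),
       DATENAME(month, GETDATE()),
       DATETRUNC(month, GETDATE())
UNION ALL
SELECT 'Day',
       DATEPART(day, GETDATE()),
       DATENAME(day, GETDATE()),
       DATETRUNC(day, GETDATE())
UNION ALL
SELECT 'Hour',
       DATEPART(hour, GETDATE()),
       DATENAME(hour, GETDATE()),
       DATETRUNC(hour, GETDATE());
```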
All right. So with that we have learned all those functions for extracting the parts
of dates. All right. Moving to the second category. We're
(6:05:18) going to learn how to do formatting and casting for date information in
SQL using three functions. So now before we deep dive into formatting and casting,
I would like you to understand what a date format is. So back to our example, we have
here the date and time information, and we understood there are components: year,
(6:05:39) month, day and so on. Now if you check the datetime, there is a combination
of numbers and characters. For example, 2025 is a number, but between the
year and the month there is a minus, and this is a character. So
now this is a very specific format, and in SQL we can have a code for this
(6:05:57) format. So for example, let's start with the year. We have here four digits and
we can represent it with four y, so yyyy, and we call those characters format
specifiers. So this is how we represent the year. Then between the year and the
month there is this small minus, and then the month is two digits and we're
(6:06:18) going to represent it with two capital M, so MM. Then between the month and
the day there is a minus, so we have a minus as well, and then the day is going to be
represented with two small d, dd. Then we have a space between the date and
the time, and then we start with the time. So it starts with the hour, capital H and
capital H, because here we have the
(6:06:37) 24-hour system, and then we have a colon, then small m small m. So as you
can see here, the formats are case sensitive. There is a big difference between
small m and capital M. A small m indicates a minute and a capital M indicates a
month. So as you can see here, the format is case sensitive.
(6:06:55) So two small m means minutes, but two capital M means month. Then
a colon and two small s. So now the whole code is called the date format. So this
is the date format representation of this value. Now in the world there are different
ways to represent a date. So for example, in SQL
representations on how to represent a date. So for example in SQL
(6:07:16) we have the international standard ISO6801 and the date format is like we
have learned first it start with the year. So four digit for the years minus two digit for
the month minus two digit for the day. So year month day but in the USA we have
different standards. So first it start with the month. So we
(6:07:36) have mm and then after that it is followed with the day. So we have then
the day and after that at the end we have the year. So this is the sentence format
that is used in USA and in Europe we have different representations of the day. So it
start first with the small. So it starts with the day then the month
(6:07:53) and then the year. So this is exactly the opposite of the international
standards. So as you can see we don't have one standard. We have different ways
on how we represent dates. But in SQL the SQL server is following the format of the
international standards. So SQL server start always with the year
(6:08:10) then month then day. So all dates that are used in our SQL database can
be following this format. Okay. So after we understood what is date format, now let's
talk about formatting and casting. So what is formatting? Is changing the format of
value from one to another. So we are changing how the data looks like. So for
(6:08:32) example, we have our date. So it's following the international standards
start with year, month, then day. Now we can go and change the format using the
function format where we can go and define a different date format like it start with
the month and then we have like slash instead of minus and then the
(6:08:48) day/ year. So in the outer we're going to get it like this and even the years
is only two digits not four. So here we are providing for SQL the format that we would
like to see the data with or you can go with other format where you have three big M
and then four digits for the year and between them is just a space.
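A small sketch of these format strings on one hypothetical value. The variable, its value and the aliases are made up for illustration; the example outputs in the comments assume a session whose default culture is en-US.

```sql
DECLARE @d DATETIME2(0) = '2025-08-20 18:55:45';

SELECT
    FORMAT(@d, 'yyyy-MM-dd HH:mm:ss') AS FullSpecifier,  -- year, month, day, 24-hour time
    FORMAT(@d, 'MM/dd/yy')            AS UsShort,        -- e.g. 08/20/25
    FORMAT(@d, 'MMM yyyy')            AS MonthYear;      -- e.g. Aug 2025
```

Note the case: MM is the month, mm is the minutes.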
(6:09:05) So in the output you will get the abbreviation of the month name and then
a space and the year. So this is one way to format data. But in SQL there
is another function that helps us to format data, and that is CONVERT. Here we
provide not the format itself; we provide a style number. So for example, the
(6:09:22) style number 6. It shows the date like this: the day, a space, and after that
the abbreviated name of the month, and then two digits of the year. Or if you
use another style, 112, then you will get the year, month, day without any
separator between them. And of course, not only the date and time can be styled,
(6:09:40) we can style numbers as well, and here we can use the function FORMAT in
order to change the format of the number. So here if you're using the format N for
numeric values, then the thousands will be separated with commas, or if you use C for
currency then you will get the dollar sign, or if you go and use P then you
(6:09:55) will get the percentage, and at the end you have the percentage character.
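As a sketch, the two CONVERT styles and the three number formats mentioned here could be tried like this. The literal values and aliases are made up for illustration, and the number formats depend on the culture in effect (en-US assumed here).

```sql
DECLARE @d DATE = '2025-08-20';

SELECT
    CONVERT(VARCHAR(20), @d, 6)   AS Style6,    -- 20 Aug 25
    CONVERT(VARCHAR(20), @d, 112) AS Style112,  -- 20250820
    FORMAT(1234567.89, 'N')       AS NumericN,  -- thousands separators
    FORMAT(1234567.89, 'C')       AS CurrencyC, -- currency symbol of the culture
    FORMAT(0.256, 'P')            AS PercentP;  -- value shown as a percentage
```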
So as you can see, we can change the format of numbers as well, not only of
dates. So this is what we mean by formatting: we are just changing how the value
looks. Now on the other hand, casting: casting can go and
(6:10:11) change the data type from one to another. So for example, if we have the
value 123 as a string, we can go and convert it from the data type string to an
integer. So in the output we will get 123 as well, but as a number. Or we can go and
change the data type from a date to a string. So in the output it is not
(6:10:30) a date anymore, it is a string value. Or the other way around, we can change
the data type from a string to a date. So as you can see, we can change the data type
from one to another, and we can do that using two functions. The first one, and the
most famous one, is the CAST function, or in SQL Server we can use as well the
(6:10:50) CONVERT function in order to change the data type. So this is what we mean
by casting: changing the data type from one to another. All right. So let's start with
the first function, FORMAT. So what is FORMAT? As the name suggests, it formats a
date or time value. So it's like we are changing how the date and
(6:11:12) time looks. Okay. So let's check the syntax of FORMAT. It
accepts two parameters, and a third one that is optional. For the first one we have to
provide a value. It could be a date or a number. For the second one we have to
provide the format. So here we are specifying the new look, the new format
(6:11:28) for this value. Now the third one is optional. It is the culture. Culture
means: show me the value, whether it's a date, time or number, in
the style of a specific country or region. Each country, each region has a different
format. So here we can go and change it to a specific region's format. But
(6:11:47) as I said, it is optional. Let's have an example. So here we are saying go
and format the order date using the following format: dd for the day, then slash, then
the month, then slash, then the year. So it's going to go and format the date with this
new format. And as you can see here, we didn't specify any culture since
(6:12:06) it's optional. Let's see another option where we can say, you know what, I
would like to have the order date formatted with this format, but we would like to go
and add the style of Japan. So we are specifying here the culture code of Japan.
And of course we can go and use FORMAT not only for dates but as
(6:12:22) well for formatting numbers. So here we are specifying the value, the
format is D, and as well we have activated the culture option. We are using the style
of France. So this is the syntax of FORMAT. Using the culture option is not really common.
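The three syntax variants described here might be written as follows. Sales.Orders and OrderDate are assumed names, the culture codes ja-JP and fr-FR are the usual identifiers for Japan and France, and the integer 1234 is a made-up value (the D format applies to whole numbers); the slide's actual values are not given in the transcript.

```sql
SELECT
    OrderDate,
    FORMAT(OrderDate, 'dd/MM/yyyy')          AS NoCulture,    -- culture omitted
    FORMAT(OrderDate, 'dd/MM/yyyy', 'ja-JP') AS JapanCulture, -- optional third argument
    FORMAT(1234, 'D', 'fr-FR')               AS FranceNumber  -- number, format D, culture
FROM Sales.Orders;
```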
So I rarely see someone using it. So the first example
(6:12:40) is the most used one in projects, where we have the default culture
or we are not specifying the culture at all. And of course if you don't specify
anything, it is going to go and use the default culture, which is en-US. So this is all
about the syntax of FORMAT. All right. So now let's have a few examples
(6:12:58) using FORMAT. So we're going to go and format the creation time. So
we're going to do it like this: FORMAT. And what are we formatting? We are formatting
the creation time, and now you can go and define any specifier you want. For
example, let's say dd like this. So let's go and check the output.
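A sketch of the single-specifier calls explored in this part, starting with dd and including the longer day and month specifiers tried right after it. Sales.Orders, OrderID, CreationTime and the aliases are assumed names.

```sql
SELECT
    OrderID,
    CreationTime,
    FORMAT(CreationTime, 'dd')   AS dd,    -- two-digit day with leading zero
    FORMAT(CreationTime, 'ddd')  AS ddd,   -- abbreviated day name
    FORMAT(CreationTime, 'dddd') AS dddd,  -- full day name
    FORMAT(CreationTime, 'MM')   AS MM,    -- two-digit month
    FORMAT(CreationTime, 'MMM')  AS MMM,   -- abbreviated month name
    FORMAT(CreationTime, 'MMMM') AS MMMM   -- full month name
FROM Sales.Orders;
```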
(6:13:15) So execute it. Now if you are using DD, you will get the day information. So
we can see if you're using this specifier, we are getting two digits about the day. So
and as well we are getting the leading zero. So we are getting the 01 05 and all
those informations are the day information. Now let's go and try
(6:13:34) something else. adding one more D. So let's have it 3D and here as well.
So let's go execute it. So now if you check the output, we are getting now the name
of the day. It is not full. So we are getting like a short name of the day or abbreviated
one. So this is sometime nice if you are creating like a calendar
(6:13:51) or something. Let's go and add one more d. So we're going to have dddd.
And let's go and check the result for this one. Now in the output we are getting the
full name of the day. So it's really nice. Now we are getting full flexibility on how to
format our day. Okay. So now let's keep playing. Let's get something else. I'm just
going to go
(6:14:11) and duplicate everything and I will go with the month now. So this is MM,
MMM and MMMM, with a capital M, because the small mm means minutes. So let's go and execute it. Now as you can see we
are getting the same stuff but for the month. So with MM we will get the two digits, with MMM
we will get the abbreviated name of the month, and with MMMM
(6:14:34) we will get the full name of the month. So it's like we are extracting the date
part from the format but of course we don't use it like this. We will go and write the
whole format that we need for a date. So for example let's go and change this format
to the USA format. So in order to do it so we're going to go
(6:14:53) over here. So let's say format again the creation time. And now we're going
to write the format of the USA. So it's going to be MM, then after the
month we're going to have a minus, then the day, dd, and then after that we're going to get
the year. So four times y, and that's it. Let's call it USA format. So
(6:15:12) let's go and execute it. And now you can see in the output we got a new
column where we see now the date information but as a USA standards. So it start
with the month then the day and then afterward we got the year. And of course we
can do the same thing in order to generate the standard format of Europe.
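As a hedged sketch, the day and month specifiers tried above, plus the USA format and its European counterpart, could be written like this. A single invented DATETIME2 value stands in for the creation time column, and the aliases are made up.

```sql
-- Sketch only: one invented value stands in for the creation time column.
SELECT
    CreationTime,
    FORMAT(CreationTime, 'dd')         AS DayTwoDigits,    -- two-digit day with leading zero
    FORMAT(CreationTime, 'ddd')        AS DayShortName,    -- abbreviated day name
    FORMAT(CreationTime, 'dddd')       AS DayFullName,     -- full day name
    FORMAT(CreationTime, 'MM')         AS MonthTwoDigits,  -- uppercase M is month; lowercase mm would be minutes
    FORMAT(CreationTime, 'MMM')        AS MonthShortName,  -- abbreviated month name
    FORMAT(CreationTime, 'MMMM')       AS MonthFullName,   -- full month name
    FORMAT(CreationTime, 'MM-dd-yyyy') AS USA_Format,      -- month, day, year
    FORMAT(CreationTime, 'dd-MM-yyyy') AS EURO_Format      -- day, month, year
FROM (VALUES (CAST('2025-01-05 14:30:00' AS DATETIME2))) AS t(CreationTime);
```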
(6:15:29) So what we're going to do I'll just duplicate it. And now the format of that
going to start with the day then the month and then the year. So now if you check the
output you can see it start with day minus then we have the month then minus the
year. So as you can see we are changing the format of the date
(6:15:48) from creation time to something new. All right. So now we have the
following task and it says show creation time using the following format. Now we
have a very weird format. So it start with the word day. Then after that we have the
abbreviation of the day and then abbreviation of the month. This is the
(6:16:05) quarter informations. Then the year and after that we have the time and
we're going to say whether it's PM or A.M. So it's little bit weird format that you don't
see it everywhere but still we want to practice on how to construct such custom
format. So let's do it step by step. I'm going to go over here and a
(6:16:22) new line. So the first one is like day. So we don't have any format for that.
It's just like characters. So this one going to be static for all the format. So what we
going to do? We're going to say with a string this is the day. So let's go and execute
it. So with that we got a static value. Everywhere we have
(6:16:41) the word day. So that's it. And after that we have a space. So I'm going to
go and include it after the day in the string. So we have a day then space and after
that we need the abbreviation of the day name. So what we're going to do we're
going to go first with the plus operator in order to concatenate the
(6:16:57) strings. So we need the format function for the creation time. And what do
we need? We need the short name. So it's going to be three times the d. Let's go
and execute it. Let me just say here custom formats. So now as you can see in the
output we have here the day. Then afterward we have space and then the
(6:17:17) abbreviation of the name of the day. So it looks so far good. Now after that
what do we need? We need space and then the abbreviation of the month. So we
can go and add all those stuff together with the format here. So we don't have to
create two formats. So space and the abbreviation of the month is MMM. So
(6:17:36) let's go and test it. Great. So now as you can see we got the abbreviation
of the month as well side by side. So we so far we have covered this part. Now we
have to move to the second part. So we still need a space and then Q1. Well the Q
going to be static. So we cannot go and extend this format. We have to start
(6:17:55) a new one. So what I'm going to do I'm just going to add a plus here and a
new line. So what do we need? We need first a space between the month and the
quarter. So let's go and add space and we need the Q as a static value like this. Let
me just move it like this. And now after that we need this one like
(6:18:14) this. Right, so now we need the quarter information, and we don't have a
format for that. That's why we have to go and use the part extraction functions, and the
one that we're going to use, since we are working with strings, is DATENAME. So
quarter, and we are extracting from the creation time. So let's go and
(6:18:32) test it. So now in the output you can see we have everywhere a Q1, and
that's because all of those dates are in Q1. All right, so now we are so far halfway in
our format. So now next what do we need? We need like a space and then the
year information and then the time information. So now in order to go and
(6:18:50) get space we're going to do it very simply concatenate and we're going to
have space. Now let's go to a new line and in order to get the year I will go with the
format as well. So format and what do we have? We're going to have the creation
time again. So how we going to format it now? What do we need? We need the year.
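Pieced together, the expression this walkthrough is building might look roughly like the sketch below; the year, time and AM/PM pieces are the ones explained in the steps that follow. The sample value and the alias CustomFormat are assumptions for illustration.

```sql
-- Sketch only: one invented value stands in for the creation time column.
SELECT
    CreationTime,
    'Day ' + FORMAT(CreationTime, 'ddd MMM')          -- static text, then day and month abbreviations
    + ' Q' + DATENAME(quarter, CreationTime)          -- FORMAT has no quarter specifier, so DATENAME is used
    + ' '  + FORMAT(CreationTime, 'yyyy hh:mm:ss tt') -- year, 12-hour time, AM/PM designator
        AS CustomFormat
FROM (VALUES (CAST('2025-01-05 14:30:00' AS DATETIME2))) AS t(CreationTime);
```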
So
(6:19:10) it's going to be four times the y, and after that we have a space and then
the time information. We can still do that inside the same format, right? So we're going to
have a space here. And then next what do we have? We have the hours. So it's going
to be hh, the small h, because here we are talking about PM and AM.
(6:19:24) It's not the 24-hour system. And then after that what do we have? The
colon. Then the minutes, going to be small mm. And then after that the
seconds, ss. So far this is exactly this part over here. And now what is missing? A space
and the AM/PM designator. So in order to do that we're going to have a
(6:19:43) space as well and then small tt. All right. So we are almost there. Let's
go and execute it. Now you can see it is working. So we have the year then space
the hours minutes and space and then we have the designator. So this is PM and
this is A.M. which is correct. So that's it. We are done. This is how you can
(6:20:06) create those crazy formats in SQL using the help of format or maybe date
name or maybe some static values like we just added here. So I think it's really fun
formatting the dates in SQL. Now one use case for the format that I frequently use in
my project is using it to format the date before doing aggregations. So it's like part
(6:20:29) extraction but here we have more customizations on how we represent the
date at the reports. So we can show a report like sales by month where we display
for example the date as abbreviation name of the month Jan and as well two digits
for the year 25. So once we change the format like this and then do data
aggregations we will have a
(6:20:48) nice report about the sales by month. So let's have a quick aggregations
using the format. So, we're going to go and say select and now the order date and
count the number of orders from our table sales orders and then group by. But now
before we start using the order date, we have to go and format it. And then if you
take the
(6:21:11) order date, let's go and execute it. So as you can see the level of details is
very high and we have here 10 rows and for each day we have like one order. Now
we learned we can go and use the date part in order to extract one part and then
aggregate on it. So now instead of that we're going to go and use the
(6:21:27) format function. So let's go and change it to FORMAT, and the value is the order date.
And our format is going to be like this: MMM, three big M, and then yy, two digits for the year.
That's it. And let's call it order date. And we need this as well for the order date over
here in the GROUP BY, and here a comma. So that's it. Let's go and
(6:21:46) execute it. So in the output as you can see over here we have three
months and here we having the aggregation the number of orders for each month.
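A minimal sketch of this aggregation is shown below. A handful of invented dates stands in for the orders table so the query runs on its own; the aliases OrderMonth and NrOfOrders are made up.

```sql
-- Sketch only: four invented order dates replace the real orders table.
SELECT
    FORMAT(OrderDate, 'MMM yy') AS OrderMonth,   -- abbreviated month name plus two-digit year
    COUNT(*)                    AS NrOfOrders
FROM (VALUES
        (CAST('2025-01-01' AS DATE)),
        (CAST('2025-01-15' AS DATE)),
        (CAST('2025-02-10' AS DATE)),
        (CAST('2025-03-05' AS DATE))
     ) AS Orders(OrderDate)
GROUP BY FORMAT(OrderDate, 'MMM yy');            -- group on the formatted value, not the raw date
```

Because the grouping key is a string, the rows are not guaranteed to come back in calendar order unless an ORDER BY on a real date value is added.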
So now it's like the date part but now we are customizing the format as we want. So
we can use the format in order to change the granularity of the date in order to
(6:22:02) do that aggregations. Now I'm going to show you a real use case for the
formatting in real projects. Now our data could be stored in different technologies like
the data could be stored in CSV file or we can get our data using an API call or in
very common scenario our data could be stored in database. So now what we
(6:22:23) usually do we go and extract the data from these different sources into one
central storage. It could happen that you are getting different formats for the dates
and of course this is a problem for analytics. You cannot present different formats for
the dates. What we're going to do we're going to go
(6:22:37) and clean up the formats into one standard format. So that means we have
to format the incoming data to new formats and once we have one standard format
we can use it in analytics and reports. So this is very common use case in data
preparation and in data cleanup by formatting different formats into one
(6:22:55) standard format. Now in SQL we have many different date and time
specifiers and I said they are case sensitive and each one of them has a different
meaning. So I prepared for you as well all possible specifiers that we can use with
the formats. Not only that, if you go back to the queries that you can find in this
(6:23:16) chapter, you can find here date format. So all date formats. If you go inside
it, you can go and copy the whole query and then go back to SQL then execute it.
You can find here a live example because I'm manipulating now the GETDATE. So you
can find here a list of all possible date specifiers that you can use with
(6:23:37) the formats. So I would say go and practice with those different date
formats in order to understand what is possible in SQL. So as we learned not only
we can change the format of the date, we can change as well the format of the
number using the function formats and those are the different possibility
(6:23:51) that you can use as a specifier for this format in order to change the format
of the numbers and as well I have prepared all those different specifiers in one big
query. So if you go inside it and copy it and then put it in SQL and execute it, you will
find here all different possibilities that we have as
(6:24:08) a specifier to change the format of the numbers. All right. So what is
convert? It's very simple. It's going to go and change the value to a different type and
as well at the same time it helps formatting the value. Okay. So let's check the syntax
of the convert and it looks like this. It starts with the
(6:24:29) function CONVERT and it accepts three parameters. The data type comes first, since we
can use this function in order to cast the data types. So you can use string, integer,
date and so on. And then we have to specify the value, so which value should be
cast. And the last parameter is an optional one where you define the
(6:24:47) style, the format of the value. Let's have this very simple example. We are
saying convert to the data type integer int and the value that should be converted is
123 as a string. So it's going to convert it to integer. In the second example we are saying convert to a
VARCHAR, and the value that should be converted is the order
(6:25:07) date. So the order date should be a date. So we're going to convert it from
date to VARCHAR using the format or the style of 34. So here we are specifying a style,
a format for this value. And of course it is optional, and if you are not using anything
the default value that's going to be used is zero. So this is the
(6:25:26) syntax of the convert in SQL. All right. So now we're going to have few
examples on how to work with the convert. So let's go and convert for example string
to integer. So we're going to say for example convert. So what is the target data
type? It's going to be the integer and the value. It's going to be like for
(6:25:43) example 1 2 3. So and let's call it like this string to integer and the function
is convert. So now in the column name as you can see I'm using here brackets and
that's because I'm using like empty spaces and so on and with that I will get more
freedom on how to name things. So this is just the name. So this is no
(6:26:01) function or something. Let's go and execute it. Now as you can see it's
going to work. So we are converting from a string value to an integer and the output
this 1 2 3 here is not string. This is the data type of integer. All right. So now let's
have another example where we want to convert from string to
(6:26:18) date. So the target going to be the date and the value let's have this value
as usual and we're going to go and call it string to date convert. Okay. So let's go
and execute it. Now in the output we will get this information this string as a date.
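The two casts just shown might be written as in this sketch. The date literal is an assumption, since the transcript only refers to "the usual value", and the bracketed aliases are plain column names, not functions.

```sql
-- Sketch only: the date literal is assumed.
SELECT
    CONVERT(INT, '123')         AS [String to Int CONVERT],   -- result has data type INT
    CONVERT(DATE, '2025-08-20') AS [String to Date CONVERT];  -- result has data type DATE
```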
And with that we have converted the data type from string to
(6:26:41) dates. Now let's have another example where we want to convert the date
time to a date. As you remember the creation time is a date time and we would like
to have it as only date. So let's go and convert and we would like it to be as well date
but this time it's going to be a column called creation time and let's
(6:26:58) give it the name. So we are converting date time to dates. But of course
here we have to go and select. So from sales orders that's it. Let's go and execute it.
Now, as you can see in the output, we got only date. I'm going to go and select the
creation time in the query as well. So now, as you can see, the
(6:27:20) creation time was before a date time. So, we have the time information as
well. But if you go and cast it using the convert and make it only date. So, SQL going
to go and convert it to date and you're going to lose all the informations about the
time. So, so far what we are doing here is just casting.
(6:27:35) So, we are changing the data type from one to another. But in the convert,
we can do both. We can do casting and formatting. So let's see how we can do that.
I will just get rid of those information at the start. So creation time. And now we're
going to go and convert the date time of the creation time to a varchar to a string.
And as
(6:27:54) well to give it the format of the USA standard format. So let's see how we
can do that. We're going to start with convert. We are changing now to VARCHAR. So this is
the new data type and the value is the creation time. And now if I don't give it a style,
it's going to stay with the standard format, but we would like
(6:28:11) to have the USA standards. So in order to do that, we're going to go and
add the style of the format. So it's going to be 32. So that's it. Let's have a name like
this. So USA standard and we are using the style of 32. Let's go with that. This is just
a name again. So it's not a function. Let's go ahead and execute it. And now
(6:28:34) in the output we got a new field and the data type of this field is a varchar.
So it's not a date or date time. And as you can see the date now is formatted using
this style the 32 the US standard format. So it start with a month then a day and then
a year. So now let's go and do the same thing in order to get the
(6:28:52) standard format in Europe. So I will just go and copy the whole thing. I will
just change the style. So instead of 32 we're going to go with the 34. And I will just
change the name as well. So, so we are just changing the style. Let's go ahead and
execute it. Now, as you can see, we got the same thing. We have as
(6:29:11) well a VARCHAR and the format now is different. So, we have here the day, then
the month, and then the year. So, this is how you work with the convert function. You
can use it in order to do only casting or not only that, you can do casting and as well
formatting. So, you have both things in one function.
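Here is a hedged sketch of casting only versus casting plus formatting with CONVERT. The transcript uses the style numbers 32 and 34; this sketch swaps in the documented styles 110 (USA, mm-dd-yyyy) and 103 (British/French, dd/mm/yyyy), which produce the same month-first and day-first layouts. The sample value and aliases are invented.

```sql
-- Sketch only: styles 110 and 103 are substituted for the 32 and 34 used in the video.
SELECT
    CreationTime,
    CONVERT(DATE, CreationTime)             AS [Datetime to Date CONVERT],  -- casting only, time part is dropped
    CONVERT(VARCHAR(10), CreationTime, 110) AS [USA Std. Style:110],        -- mm-dd-yyyy as a string
    CONVERT(VARCHAR(10), CreationTime, 103) AS [EURO Std. Style:103]        -- dd/mm/yyyy as a string
FROM (VALUES (CAST('2025-01-05 14:30:00' AS DATETIME2))) AS t(CreationTime);
```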
(6:29:28) And now if you're talking about which styles are available, we have many
styles that you can use inside the convert. So I have prepared for you a list of all
styles that you can use with the convert. So we have styles only for the dates and
another styles only for the time and styles for only date time.
(6:29:47) Now in the download folders you can find here one file called all culture
formats. And here you can find one query that I have prepared where you can find
inside it the different cultures and the examples. So let's go and copy it and let's go
back to SQL, paste it and let's see the results. So now if you
(6:30:05) check the output, the first column is the culture that is used. So we
have a lot of cultures, like around 17, and you can see how the numbers are
formatted or the date is formatted based on this culture. So it's really fun. You can
check here for example how the format in Japan or Korea or France and
(6:30:20) the German one. If you scroll down, you can find the Arabic, the Russian
and so on. So you can see the format of each dates is changing based on the
culture. So I would say have fun. Go and try those different cultures formats in order
to format your numbers or dates. So what is the cast function? It
(6:30:40) going to go and convert a value to a different data type. So it turns one
data type to another. All right. So now let's check the syntax of the cast. I really like
this one. It is not typical like format or syntax in SQL. So it says the cast is the
function and then inside it we need two things but it's not
(6:31:00) separated like with the comma as we learned before with all other functions
but this time is separated with the keyword as. So it's like the natural English you are
saying cast the value as a data type. So you are casting the value to a new data
type. So let's have this very simple example we have here
(6:31:16) cast the value 1 2 3 as integer. So previously it is string and it going to be
converted to integer. So as you can see it's very simple. Now in this example we are
saying cast this value this string value as a dates. So converted from string to dates.
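The two syntax examples above could be sketched like this; the date literal is an assumed sample value.

```sql
-- Sketch only: CAST(value AS data_type), with AS instead of a comma.
SELECT
    CAST('123' AS INT)         AS [String to Int],
    CAST('2025-08-20' AS DATE) AS [String to Date];
```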
So as you can see with the cast we don't have here any option of formatting or
styling
(6:31:35) the values. So it's only dedicated for casting the value from one data type
to another one. So this is the syntax of the cast. It is very straightforward and really
nice function. Okay. So now let's have a few examples about the cast. So let's go
and convert a value from a string to integer. So it's very simple.
(6:31:53) We're going to say cast. So now we need the value. So let's go with the 1 2
3. So we have here a string. And then we're going to say as and then we have to
define the data type. So the data type going to be integer. So that's it. So let's give it
the name like this string to integer. Let's go and execute it. Now
(6:32:10) as you can see we got the value but with the data type integer. From string
to integer. Now let's do it the other way around. We cast from integer to string. So we're
going to say cast 123 as VARCHAR and we're going to give it a name int to string. So
let's go and execute it. Now in the output we have 1 2 3 but this time it has the data
type
(6:32:33) varchar. Now let's go and work with the date. So we're going to go and
convert a value a string value to a date. So our value going to be the usual one and
we want it from string to date. So we're going to have the data type as date. So let's
give it a name string to date. Let's go and execute it. Now we're going
(6:32:56) to have this value with the data type date. So that's it. Now let's say that I
would like to have this value but as date time. So I will just copy the whole thing and
go to a new line and say DATETIME2. So the name of this going to be string to date
time. Let's go and execute it. Now in the output as you can
(6:33:16) see we are getting not only the date but as well we are getting the time
information. But now since we didn't provide SQL with any time information SQL
going to go and show it as zeros. Now let's do one more casting where we change
the data type from date time to date. So now we need our creation time but we have
to get it from the
(6:33:33) tables. So from sales orders let's go and execute it. So now in the output
you can see the creation time is a date time. We have the time information but we
are not interested about the time information. I would like to have this field as a date.
So it's very simple what we're going to do. We're going to
(6:33:49) say cast. Now the value is creation time and then the keyword as and we
need it as a date. So we're going to give it the name date time to date. So let's go
and execute it. Now as you can see in the output we got the creation time but only
with the date information. We don't have anything about the time. So we get it as
(6:34:09) a date instead of date time. So that's it. This is amazing function SQL and
it's very simple and we can use it only for casting. So only to change the data type
from one to another. And we cannot use this function in order to change the format.
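The remaining CAST examples from this walkthrough might look like the sketch below. The literals and the single creation time value are invented so the query runs without the course tables.

```sql
-- Sketch only: invented sample values; CAST changes the type and has no style option.
SELECT
    CAST(123 AS VARCHAR)            AS [Int to String],
    CAST('2025-08-20' AS DATETIME2) AS [String to Datetime],  -- no time given, so the time part is all zeros
    CreationTime,
    CAST(CreationTime AS DATE)      AS [Datetime to Date]     -- time information is dropped
FROM (VALUES (CAST('2025-01-05 14:30:00' AS DATETIME2))) AS t(CreationTime);
```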
So if you are casting you will get always the standard format from
(6:34:26) SQL. So now let's go and compare our functions side by side. So we have
our three functions. cast, convert and format and we can do two things either casting
or formatting. So by the casting for the first function cast we can change any type to
any other type. So there is no restriction at all. The same
(6:34:48) thing for the converts the same thing we can convert anything to anything.
But for the format we can change only to a string. So any data type like a date or
number to a string value because the main thing for the format is not changing the
data type. Now if you are talking about changing the format of the
(6:35:04) values, you cannot use the cast function in order to change the format. So
the cast function is only for casting. It makes sense. Now about the convert, we can
use it in order to change the format of the date and time. But we cannot use it in
order to change the number formats. And for that we have a
(6:35:20) dedicated function called format. So we can use it to change the format of
the date and time and as well the numbers. So those are the main differences
between those three functions. All right. So with those three functions we have
learned how to do formatting and casting on date informations. Now moving
(6:35:35) on to the third group we have the date calculations and here we have two
functions on how to do date calculations or mathematical operations on the dates.
Okay so now we're going to start with the first function, DATEADD. So what is
DATEADD? DATEADD allows us to add or subtract a specific time interval
(6:35:55) to or from a date. So let's understand how DATEADD works. So here
again we have our date August 20th 2025. So now in some scenarios we would like
to add years to our dates. So for example let's say I would like to add three years to
our date. So we can do that using the date ad. So if you do that in the output
(6:36:14) you will get 2028 August 20th only the date part is changed and where we
have added three years but in other scenarios you would like to go and add months.
So for example let's go and add two months to the August. So in the output you will
get 2025 10 20 with that we have added two months and of course we can go and
(6:36:34) add days to our dates. So for example we're going to go and add five days
to our date. So in the output we'll get the same year 2025 the same month August
but only the day will be changed to 25. So we have added five days to the original
dates. And of course we can go and subtract dates even though that the
(6:36:53) function is called DATEADD. So for example, we can go and subtract three
years from our date. So if you do that, you will get 2022 August 20th,
or if you go and subtract two months from our dates. So it's going to stay the same
year 2025. But this time instead of August, we will go back to
(6:37:10) June with the same date 20. And the same thing going to happen for the
days if you go and subtract five days. So the same year 2025, the same month
August, but only the days going to be instead of 20, it's going to be 15. So as you
can see with the date ad you can manipulate the years, the month and the days by
(6:37:27) subtracting or adding new intervals. So this is how the date ad works. All
right. So now let's check the syntax of the date ad. And here things little bit more
complicated. We have to provide three informations. The first one is a part. What do
you want to add? Do you want to add years or months or days and
(6:37:44) so on. Then the second one is interval. So it's like how many days? How
many years? How many months? And then the last one is the date. This is the date
that we're going to be manipulating by adding or subtracting intervals. Let's check
the following example. We are saying here date add. So what is the
(6:38:02) part here is a year. That means we want to manipulate only the year parts.
Then the interval here is two. So it is positive. We want to add two years. So it's
going to go to each order and start adding two years for each date value. Now let's
check another example. Here we are saying date add month. So here we
(6:38:20) want to manipulate the month part. But here we are saying -4, that
means we want to go and subtract four months from each value in the order date. So
as you can see the value of the interval whether it's positive or negative. We are
controlling here the function whether it is subtraction or addition.
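The additions and subtractions described above can be sketched with the same date, August 20th 2025. The variable name and aliases are made up for illustration.

```sql
-- Sketch only: DATEADD(part, interval, date); a negative interval subtracts.
DECLARE @d DATE = '2025-08-20';

SELECT
    DATEADD(year,   3, @d) AS PlusThreeYears,    -- 2028-08-20
    DATEADD(month,  2, @d) AS PlusTwoMonths,     -- 2025-10-20
    DATEADD(day,    5, @d) AS PlusFiveDays,      -- 2025-08-25
    DATEADD(year,  -3, @d) AS MinusThreeYears,   -- 2022-08-20
    DATEADD(month, -2, @d) AS MinusTwoMonths,    -- 2025-06-20
    DATEADD(day,   -5, @d) AS MinusFiveDays;     -- 2025-08-15
```

Applied to a column, the same calls take the column in place of the variable, for example DATEADD(year, 2, OrderDate) or DATEADD(month, -4, OrderDate), assuming the order date column is named OrderDate.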
(6:38:37) So let's have few examples about the date add using our field order dates.
So for example let's go and add two years for each date. So we can do it like this
date adds. So we are adding years that's why we're going to go with the part year
and how many years we are adding we are adding two years. So this is our
(6:38:54) interval and our field our value is the order date. So now in the output as
you can see we got a date but this date is always 2 years higher than the order date.
So everywhere you see 2027. Now let's go and add maybe three months for
each date. Just going to go and copy it and say a month. Let's change the
(6:39:14) interval to three and we're going to call it three months later. So now if you
check the output over here we have a new date but now the difference between it
and the order date we have here always three months more than the order dates. So
for example here we have January but in the new one we have April and for the next
(6:39:34) one we have February and in the new field we have May. So as you can
see we are adding months to
our original field, the order date. Now let's say that I would like to go and subtract 10
days. So let's go and do the same. So we're going to have the date add. Since
(6:39:52) we are talking about the days, it's going to be the day. We're going to
subtract 10 days. So minus 10 for the order date. So let's call it 10 days before. Let's
go and execute it. Now we got as well a new date. And this date has always 10 days
before the order date. So for example, let's take the order number seven. In the
order date we
(6:40:14) have 15, but in the new column we have five. So we have subtracted 10
days from the original filled order dates. So as you can see it's very simple to add or
subtract days, years, months using DATEADD. All right. So what is DATEDIFF? Diff
stands for difference, and DATEDIFF allows us to find the
(6:40:38) differences between two dates. All right. So let's understand how the date
diff works in SQL. Now imagine we have two dates. We have the order date 2025
August 20th and the shipping date is the 1st of February in the next year 2026. Now
we might ask the question how many years have passed between the order date
(6:40:58) and the shipping date. So in order to answer this question we can use the
function DATEDIFF and we can define the part year. If you do it like this it's going to
subtract those two dates and it's going to return one, because exactly one year boundary
is crossed between those two dates. But now if the question is how many months are
between
(6:41:17) the order date and the shipping dates. So here again we can go and use
the date diff between the order date and the shipping date but we use the part
month. If you do it like this in the output you will get six months. And now of course
if the question is how many days are between the order date and the shipping
(6:41:34) dates. So here we can use the function date diff where we specify the day
inside it and in the output you will get 165. So this is how DATEDIFF works. You go
and subtract two different dates and you will get in the output a number how many
years how many months how many days. So that's it. All right. Now to
(6:41:52) the syntax of the date diff. It accept here as well three parameters. So the
first one is the parts as usual year, month, day. And then here we need two dates,
not only one, we need two. So we need the start date and the end date. So
that means the start date is the earlier date and the end date is going to
(6:42:10) be the later date. So for example, here we have DATEDIFF and we are
saying find the differences in years between the order dates. This is the start date
and the shipping dates. So which dates normally happen? First we have to order
something. So we have the order date and once you order what can happen next is
(6:42:28) the shipping date. That's why the shipping date is as an end date. So we
want to find the differences between them in years or of course if you want to find the
differences between them in days we have to go and change the part from year to
day. So as you can see the syntax is very simple and very logical
(6:42:44) right. All right let's have the following simple task and it says calculate the
age of employees. So let's see how we can solve that. So we're going to go and
select first all the information from employees. So Sales.Employees. Okay, let's
execute it. Now in the employees, we don't have any informations about the age, but
we have
(6:43:03) the birthday. So we can go and transform this birthday to an age. And of
course, how we calculate the age? We count how many years between this year and
the birthday. So that means we have to go and use two functions, DATEDIFF and
GETDATE, in order to have the current year. So that means we have
(6:43:22) to go and use the function DATEDIFF. So let's go and do that. I'm going to go
first selecting only few informations. So employee ID and birth date. So let's start with
the date diff. So if we are talking about the age we are calculating how many years
that's why we're going to say as a part going to be the year. So
(6:43:39) what is the start date? It is the birth date of the person. So it's going to be
the birth date. And now we need the end date. We don't have anything here about
the end date. The end date is going to be the current date. So in order to get the
current date, we're going to go with the function GETDATE. And with that we are
(6:43:55) getting the current date information. And this is exactly what we want. So
let's close it and let's go and call it age. So it's very simple. We are counting how
many years are between the birth date and the current date. So let's go and execute it.
So now we are getting the ages. As you can see the
(6:44:12) first person is 33, the second one is 52 and so on. And now you might be
getting different values than I'm getting now. And that's maybe because you are doing the
course in 2025 or 2026 and the employees are going to be older than now. Now we
are in 2024 and I'm getting those ages. So this is how we calculate the
(6:44:30) age with the help of two functions, the DATEDIFF and the GETDATE. Okay.
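The age calculation described above can be sketched in T-SQL as follows. This is a minimal sketch, assuming a table Sales.Employees with columns EmployeeID and BirthDate; the exact column names are assumptions, since the transcript only speaks them aloud.

```sql
-- Assumed schema: Sales.Employees(EmployeeID, BirthDate)
SELECT
    EmployeeID,
    BirthDate,
    -- part = year, start date = BirthDate, end date = the current date
    DATEDIFF(year, BirthDate, GETDATE()) AS Age
FROM Sales.Employees;
```

One thing to keep in mind: DATEDIFF with the year part counts the year boundaries crossed between the two dates, so a person whose birthday has not happened yet this year is still counted as a full year older.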
Okay, so now we have another task for the DATEDIFF and it says find the average
shipping duration in days for each month. So here we have a lot of information.
Let's do it step by step. Let's first find out the shipping duration in days. So let's go
and
(6:44:49) select a few columns from our table. So select order ID. We have the
order date, ship date and I think that's it. So from Sales.Orders. So let's go ahead and
execute it. So now we have our 10 orders. We have the order date and the shipping
date. Now we have to go and create a new field called shipping
(6:45:11) duration. So what is the shipping duration? It is the number of days
between the order date and the shipping date. So how many days it took from the
order placement until the day of the shipping. So that means we have two dates and
we have to go and find the difference between them. We're going to
(6:45:29) go with the function DATEDIFF. So now since we are saying in days, we have
to go with the part day. So what is the start date? The start date is the order date.
And what is the end date? It's going to be the shipping date, like this. So I'm going to
call it day to ship, like this. Let's go and execute it.
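The first step of this task, the shipping duration per order, might look like the sketch below. The table and column names (Sales.Orders, OrderID, OrderDate, ShipDate) and the alias Day2Ship are assumptions based on how they are spoken in the lesson.

```sql
-- Assumed schema: Sales.Orders(OrderID, OrderDate, ShipDate)
SELECT
    OrderID,
    OrderDate,
    ShipDate,
    -- number of day boundaries between placing the order and shipping it
    DATEDIFF(day, OrderDate, ShipDate) AS Day2Ship
FROM Sales.Orders;
```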
(6:45:48) So now checking the result, for example order one was ordered on
the 1st of January and shipped on the 5th of January. So between those two dates
we have 4 days. So four is the shipping duration, and if you go to order
number three, the difference between the order date and the shipping
(6:46:06) date is around 15 days. So with that we have solved this part, the
shipping duration in days. But now the task says we have to find the average
duration for each month. So that means we have to go and select for example the
month of January and find the average duration. So we have to go and do a simple
(6:46:24) aggregation. We're going to go to the start of the DATEDIFF and say AVG.
And we're going to close it over here. And let's go and rename it average shipping.
And now we have to aggregate by the month. So we don't need the whole order
date. We need the month of the order date. So like this. We don't need
(6:46:42) the order ID of course, but now we need to group the data using this
dimension, the month of the order date. So that's it. Let's go and execute it. So now in the
output you can see we have three months and for each month we have the average
shipping duration in days. So for the first month it is around 7
(6:47:01) days, for February it is 7 days as well, and for March we have a shorter duration, 5
days. So with that we have solved the task. As you can see the DATEDIFF is a very
strong function for doing data analytics using date information. All right.
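Putting the two steps together, the average shipping duration per month can be sketched like this. Again the names Sales.Orders, OrderDate, ShipDate and the aliases are assumptions.

```sql
-- Assumed schema: Sales.Orders(OrderDate, ShipDate)
SELECT
    MONTH(OrderDate) AS OrderMonth,
    -- average of the per-order shipping durations within each month
    AVG(DATEDIFF(day, OrderDate, ShipDate)) AS AvgShip
FROM Sales.Orders
GROUP BY MONTH(OrderDate);
```

In SQL Server, AVG over an integer expression returns an integer, so the result is cut to a whole number of days. If fractional days are wanted, one option is to cast the DATEDIFF result to a decimal type before averaging.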
So now we have the following task and it says find the number of days
(6:47:19) between each order and the previous order. So there's a lot of stuff going
on over here. Let's do it step by step. Let's start by selecting the basic stuff. So select
order ID, order date from the table Sales.Orders. Let's go and execute it. So we have
our 10 orders and we have the current order dates. So
(6:47:40) now we have to find the difference between two dates: the
current order date and the previous order date. So in our data, we have the current order
date, but we don't have the previous order date for each order. And in order to
get the previous one, do you remember the window functions? We can
go and use the LAG in
(6:47:58) order to access a value from a previous record. So let's go and do that.
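As a sketch of where this task is heading, the LAG window function supplies the previous order date and DATEDIFF measures the gap to the current one. The names Sales.Orders, OrderID, OrderDate and all aliases are assumptions.

```sql
-- Assumed schema: Sales.Orders(OrderID, OrderDate)
SELECT
    OrderID,
    OrderDate AS CurrentOrderDate,
    -- order date of the preceding row when sorted by OrderDate
    LAG(OrderDate) OVER (ORDER BY OrderDate) AS PreviousOrderDate,
    -- start date = previous order, end date = current order
    DATEDIFF(day,
             LAG(OrderDate) OVER (ORDER BY OrderDate),
             OrderDate) AS NrOfDays
FROM Sales.Orders;
```

The first row has no previous order, so LAG returns NULL there and DATEDIFF returns NULL as well.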
The order date, I'm just going to call it current order date. And let's go and find the
previous order date. So we're going to go with the LAG of the order date because we
are interested in the value of the order date. Now in the OVER clause we
(6:48:17) have to sort the data. So we're going to sort it by the order date as well. So
this is going to help us always to access the previous value of the order date. So
we're going to call it previous order date. Let's go and execute it and let's check the
result. For the first order, we don't have anything previous. So that's why we
(6:48:39) are getting a null. For the second record, the current order date is the 5th
of January and the previous one is the 1st of January. And this value comes from the
previous record, the previous order. Great. Amazing. So with that we have now the
two dates, the current date and the previous one. And now we can go
(6:48:57) and very simply find the number of days between those two dates. And we
can do that using the amazing function DATEDIFF. So we are interested in the days, that's
why the part is going to be the day. So what is the start date? If you check those two
dates, you can see that the previous order date is the start date. So
(6:49:15) we're going to take the whole thing, the whole window function, and put it
over here. So I just moved my picture. So here is the previous order date. And now
the end date, what's it going to be? It's going to be the current order date, which is our
order date, like this. So again, we are finding the number of days
(6:49:33) between the previous date and the current date. So that's it. Let's close
it. So I'm just going to call it number of days. So let's go and execute it. Now of
course we have a null here. So we will get a null in the output as well. And now you can
check over here how many days are between those two dates. We have exactly
(6:49:53) four days. And for the next one we have around 5 days, 10 days
and so on. So we have solved the task. We have now the number of days between
each order and the previous order. So this type of analysis is very important in the
business. We call it time gap analysis and we have done it with the
(6:50:10) help of the window function and the date function DATEDIFF. So the
DATEDIFF function is an amazing function for data analysis. All right. So with those two
functions we have learned how to do mathematical operations on date information,
or we can call it date calculations. Now moving on to the easiest and the last group,
we have the
(6:50:28) date validation. And here we have only one function, the ISDATE. Okay. So
what is ISDATE? So the ISDATE is very simple. It's going to check whether a value is a
date. So it's going to return one if the string value is a valid date or zero if it is not a
valid date. Okay. So let's quickly check the syntax of the ISDATE. It's very
(6:50:53) simple. The keyword ISDATE is the function name and it accepts only one
value. So for example you can pass a string like this and you can ask SQL, is it a
date? So ISDATE and the value, and of course for this example you will get true, or one.
So as you can see we are passing a string value here and we are
(6:51:09) validating whether it is good enough to be a date. Or you can go and
specify a number, like here 2025. So is this value a date? And of course SQL is going to
accept it and say yeah, this is a year, so you will get a one as well. So you can pass
a number or integer as well. So you are just checking the
(6:51:26) values, whether they are suitable to be a date. So that's all about
the syntax of the ISDATE. Okay. So now let's have a few examples. For example, let's
go and select and we're going to say ISDATE and we will check a value. So let's say
this value is a string 1 2 3. Let's go and call it date check one.
(6:51:42) Let's go and execute it. Now in the output it's going to say no, it is not a
date. And that's why we are getting the value zero, which is correct because 1 2 3 is
not a date. Let's pick another value. The same thing, ISDATE. And now the value is
going to be the following. So 2025 August 20. So let's call it date
(6:52:02) check 2. And let's go and execute it. Now in the output we will get one.
That means the value that we have provided is a date. And that's why we have a one
in the output, because SQL is saying this is a date. Now let's have another example.
We're going to take the whole thing. So this is check three and
(6:52:22) remove this from here. But I would like to go and change the format. So
let's say that we start with the day, then month, and then the year. Let's go and check.
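The individual checks in this part of the lesson can be collected into one query, sketched below. The literal values are assumptions reconstructed from what is spoken (for example, '08' for "the month of August"), and the aliases are illustrative.

```sql
SELECT
    ISDATE('123')        AS DateCheck1,  -- an arbitrary string
    ISDATE('2025-08-20') AS DateCheck2,  -- year-month-day
    ISDATE('20-08-2025') AS DateCheck3,  -- day-month-year
    ISDATE('2025')       AS DateCheck4,  -- year only
    ISDATE('08')         AS DateCheck5;  -- month only
```

The lesson reports 0, 1, 0, 1 and 0 for these five checks. The result for formats such as day-month-year depends on the session's language and DATEFORMAT settings, so other environments may answer differently for check three.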
Now in the output you can see it is zero because SQL does not understand the
format. So we are not following the standard format of the database in
(6:52:40) SQL, and that's why it's going to say no, this is not a date. This is like a string
value. So this means only if the value is following the standard format, SQL is going to
understand this is a date. Now let's go and check another thing. For example let's say
ISDATE and let's have only the year. So 2025 and let's give it
(6:52:57) the name date check four. Let's go and execute it. Now in the output we will
get one. So that means SQL is considering this value as a date. So that means SQL is
smart enough to understand, okay, we have provided a year and it is going
to accept it and say okay, maybe this is the 1st of January of 2025. Now
(6:53:17) let's go and do the same thing but for the month. Let's see whether SQL
is going to accept it. So check five, and we have the month of August. Let's go and
check. Now it's going to say no, I don't understand this value, this is zero. So that means
the value provided is not a date. So by checking those results, as you can see,
(6:53:37) SQL understands only the standard formats, and it also allows you to check
whether a year is a date. So this is how the ISDATE works in SQL. And now you might
ask, well, when am I going to do this, when am I going to check whether the value is a
date or not? Let me give you the following scenario. Now imagine
(6:53:52) that we have the following data. So we have four values as strings. And
now if you check the data you can see that we are following the standard format, but
one value has an issue. So we have a data quality problem here. So now what we
want to do, we want to go and cast this string value to a date. We don't
(6:54:08) want this to stay as a string value. We would like to have it in the final
result as a date. So what we usually do is that we go and have like a subquery on top
of those values. So like this. So now what we're going to do, we're going to go and
say we would like to go and cast the order date as date. We don't
(6:54:25) want it as a string. And we're going to call it order date, from these values.
So let me just make it like this and let's go and execute it. Now SQL is going to give
you an error and say, well, I cannot convert everything to a date because you have
maybe corrupt data, and this is of course because of this row.
(6:54:46) So SQL is not able to convert this string to a date. But of course now the
example is very simple. We know that, but if you have a huge table it's going to be
really hard to identify those issues. But now still I would like to go and convert those
values here. I don't want to get an error. And now if there are
(6:55:03) some values like here that are corrupt and so on, this value could be null.
So how can we force SQL to convert the data type from string to date and not give
us this error? And for this we can go and use the help of the function ISDATE. Let me
show you how I usually do it. So let's go and say let's check
(6:55:20) whether the order date is a date. So let's have it like this. And now before
we go and execute, I'm going to make the cast a comment because if I execute it like
this, we will get an error. And let's go and get the order date in our select. So let's go
and execute it. Now as you can see in the output, we have
(6:55:38) our string values. So they are not yet dates. And we have the result of our
check. So as you can see for the first row, we are getting a zero. So it's saying this value
is not a date. But for all other values, we are getting one. So they are passing the
check and they are dates. So now what we're going to do,
(6:55:54) we're going to go and build a logic where we're going to say go and cast
the value from string to date only if the flag or the check is equal to one. So that
means we can go and use the help of the CASE WHEN statement. Let me show you
how we can do that. So let's do it step by step. We're going to say CASE WHEN.
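The pattern being built here, casting only the strings that pass the ISDATE check, can be sketched as below. The table variable @RawOrders and its four sample strings are hypothetical stand-ins for the values shown on screen in the lesson.

```sql
-- Hypothetical sample data: four strings, one of them not a full date
DECLARE @RawOrders TABLE (OrderDate VARCHAR(20));
INSERT INTO @RawOrders (OrderDate)
VALUES ('2025-08'), ('2025-08-20'), ('2025-08-21'), ('2025-08-23');

SELECT
    OrderDate,
    ISDATE(OrderDate) AS IsValidDate,
    -- cast only when the check passes; otherwise leave a NULL
    CASE
        WHEN ISDATE(OrderDate) = 1 THEN CAST(OrderDate AS DATE)
        ELSE NULL
    END AS NewOrderDate,
    -- variant with an easy-to-spot dummy date instead of NULL
    CASE
        WHEN ISDATE(OrderDate) = 1 THEN CAST(OrderDate AS DATE)
        ELSE CAST('9999-01-01' AS DATE)
    END AS NewOrderDateOrDummy
FROM @RawOrders;
```

Adding WHERE ISDATE(OrderDate) = 0 to the same query would list only the strings that fail the check, which is one way to hunt for data quality issues.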
(6:56:11) Now we need the check. So ISDATE of the order date. So if the output of this
check is equal to one, then you are allowed to do the casting. So let's go and get the
cast as the result of this condition, and if it's not equal to one then it could stay as a
null. So let's have it as a null if it didn't pass the
(6:56:32) test. So END, and we can call it new order date. So now let's go and
execute it. Now as you can see we are not getting an error from SQL. So now if you
check the output, for the invalid date we are getting a null. So we are not getting an
SQL error. And now only if the string value is a valid date
(6:56:49) is it allowed to be cast. So that way you can go and cast a string value to a
date even though you have bad data quality, and this is a very important step in
order to prepare the data before doing analysis, and it helps us as well to find data
quality issues. So for example we can go over here and say, you
(6:57:06) know what, let's go and search for all issues. So we're going to go and take
the ISDATE. So let's go and get the check and I'm going to say let me see all string
values that are invalid, that are failing the test. So let me execute it. And with that we
are getting this record. And now imagine we have a lot of
(6:57:22) data. So now it's really easy to identify those issues by just using the
ISDATE. So this is also an amazing way to identify data quality issues. Now of
course you might say, you know what, I don't want to see a null here. Maybe let's get a
dummy value. Well, it's very easy. We can go over here and say
(6:57:39) ELSE. And we can go and get for example a very large value, something
like this, that is easy to identify. So now with that, instead of getting nulls inside your
data you can get such a dummy value. So now you understand the use case of the
ISDATE and why this function is amazing for data cleanup. All right. So with that
we have
(6:58:02) covered 13 different date and time functions in SQL. So we have learned
how to extract the date parts using seven different functions and we have learned as
well when to use which one. So they are amazing in order to do data aggregations
and as well filtering. And then we have learned how to change the
(6:58:19) date format from one to another and as well how to change the data types.
And then we learned how to do mathematical operations on our dates. So how we
can add or subtract days, years, months from a date, or the amazing function
DATEDIFF where we can go and find the difference in days or years between two
(6:58:38) dates. And the last one, we can go and validate whether the values that we
have are dates or not. So as we learned, date functions are amazing functions for
data analysis and reporting. All right my friends. So with that we have
learned a lot of very important SQL functions and how to manipulate the date
(6:58:55) and time values in your database using SQL. Now in the next section we're
going to start talking about the null functions in order to handle the nulls inside your
tables. So let's go. So what are the nulls? Imagine you are filling out a form and
there will usually be fields that are required and other fields that are optional.
So
(6:59:17) what usually happens? We leave those optional fields unanswered. So we
don't provide any values and we leave it empty. And now once we are done filling out
the form and we click on register, the data will be inserted into database tables. So
now what can happen? The fields where you have provided answers
(6:59:35) and values can be filled inside the table while the unanswered fields will
have no value and this is what we call in SQL a null. So in databases a null means
nothing, unknown. It is not equal to anything. So it is not equal to zero or an empty string
or a blank space. A null is simply nothing. It tells us there is
(6:59:55) no value and it is missing. It's like saying I don't know what this value is. So
this is what a null means in SQL. All right friends, so now we're going to do a deep
dive into special SQL functions on how to handle the nulls inside our data. Now in
some scenarios we have nulls inside our tables and we
(7:00:18) would like to go and remove them and replace them with a new value, like for
example 40. And in order to do that in SQL we have two functions. The first one is
called ISNULL and the second one is called COALESCE. But now let's say that we have
another scenario where we have a value inside our table like the 40 and
(7:00:37) we want to go and make it a null. So now we are doing the exact
opposite. We are replacing the value with a null and for that we have the SQL
function NULLIF. So as you can see with those two scenarios we are replacing stuff.
So from null to value or from value to null. So they are really helpful in
(7:00:55) order to manipulate the data inside our databases. Now moving on to
another scenario where we don't want to manipulate anything. We want just to
check. So we don't want to replace or convert anything. We want just to check in our
database whether we have a null value. And for that we have a keyword
(7:01:11) called IS NULL. But between the IS and the NULL there is a space. It is different
from the first function, ISNULL. So if you apply IS NULL you're going to get a boolean, true or
false. For this scenario you will get true. Or the second option, you can go and check
whether the value is not null. So we can use IS NOT NULL
(7:01:27) and for this example you will get false. So in the output we are getting a
boolean, true or false. So those keywords are really amazing in order to check
whether we have nulls inside our data. So this is the big picture of all functions that
we have in SQL in order to handle the nulls. So now let's go and
(7:01:43) understand those functions one by one. So let's start with the first function,
ISNULL. ISNULL is going to go and replace a null with a specific value. Now the syntax of
the ISNULL is very simple. We're going to use the keyword ISNULL and it accepts two
arguments. First the value and then the second, the
(7:02:05) replacement value. So let's have an example. We can go and use the
ISNULL for the column called shipping address. So we are checking the nulls inside it.
And if SQL encounters any null, it's going to go and replace it with the value unknown.
So this is going to be like a default value for the nulls. So the
(7:02:22) first value is a column and the second value is static. It's always going to
be the unknown if we find any nulls. Now of course in other scenarios we don't want
to have it always as the unknown. We would like to use another column to help the
first one. So let's have this scenario. So now with this syntax we are
(7:02:40) checking the values of the shipping address and if we find any nulls it's
going to get the replacement from the billing address. So here in this example we
have two columns. We don't have here any static value. We will get the values of the
billing address only if the shipping address is null. So we are
(7:02:58) replacing the nulls using the help of other column. And in the first scenario
we are replacing the nulls with a static value the default value. So let's have a very
simple example in order to learn how this works. So what we are doing we are
checking whether the value is null. If it's yes then we're going to go and
(7:03:15) get the value from the replacement and if the value is not null then show
the value itself. So we have the following example. We are going to check the values
from the shipping address and if there are nulls then go and replace them with the default
value N/A. So let's see how SQL is going to go and execute this very simple
(7:03:32) example. We have two orders. For the first order we are checking the shipment
address: is the value of this address null? Well, no. We have a value A. So that's
why SQL is going to go and return the same value. So in the output we will get A.
So if it's not null, it's going to return the same value. So now
(7:03:52) it's going to move to the second order and here we have the shipment
address as a null. So what's going to happen here? If the value is null, then we're going
to get the replacement value. So what is the replacement value? It is the N/A. So that's
why in the output we will not get a null, we will get the N/A. So if you check the
(7:04:09) result, what happens? We're going to get the addresses from the shipping
address, but only if we have a null we will get the default value. It's very important to
understand: if you are using a default value, in the output you will never get a null.
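Both ISNULL scenarios, the static default and the helper column, can be tried side by side in a small sketch. The table variable @Orders, its column names and the value 'B' for the first billing address are hypothetical; the remaining sample values follow the walkthrough.

```sql
-- Hypothetical sample data modelled on the walkthrough
DECLARE @Orders TABLE (
    OrderID     INT,
    ShipAddress VARCHAR(50),
    BillAddress VARCHAR(50)
);
INSERT INTO @Orders (OrderID, ShipAddress, BillAddress)
VALUES (1, 'A', 'B'), (2, NULL, 'C'), (3, NULL, NULL);

SELECT
    OrderID,
    ISNULL(ShipAddress, 'N/A')       AS ShipOrDefault,  -- static replacement
    ISNULL(ShipAddress, BillAddress) AS ShipOrBill      -- replacement from a column
FROM @Orders;
```

With this data, ShipOrDefault comes back as A, N/A, N/A, while ShipOrBill comes back as A, C, NULL, since a replacement column can itself be NULL.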
All right. So let's have another example for the second
(7:04:26) scenario where we are not using a default value we are using a column. So
we have a supportive column that's going to be checked. So in this scenario we are
saying is null shipping address and billing address. So we have two columns and of
course the logic going to be the same right. So we are checking only
(7:04:41) once. Let's see how SQL going to execute this example. We have this time
three orders and we have addresses from the shipments and as well from billing. So
now SQL is always focusing on the shipping address since it is the first column. So
we are not checking the billing address at all. So it start with
(7:04:57) the first order. Is it null? Well, no, we have the value A. So, we will get it as
well in the output and SQL will not get anything from the billing address. So, we will
get a. So, that's it for the first order. Now, it's still going to go to the second order. And
this time, we're going to have a null. So, now in
(7:05:14) the rule, we are saying if the shipping address is a null, go get the value
from the billing address. So, this time we're going to go to the replacement, right? So
we will get the value C in the output because the shipping address is the null. Now
let's move to the third row. As you can see here we have again null.
(7:05:32) So SQL going to go and get the value from the billing address. But here in
this scenario the billing address is as well null. That's why we will get the value null in
the output. So as you can see having the replacements values from a column there
is no guarantee that there will be always a value like here
(7:05:51) in the third order it is a null that's why we will get null as well in the output.
So if you think you are using ISNULL to replace all the nulls by having two columns,
you might still end up having a null in the output if the replacement has nulls.
So if you want to make sure you don't get any nulls in
(7:06:09) the output you have to go and use a static value. So this is how SQL
executes the ISNULL. All right. So what is COALESCE? COALESCE is going to go and return the first
non-null value from a list. All right. So now the syntax of the COALESCE is way better than the
ISNULL. Here it accepts a list of many values. So here for
(7:06:34) example we have value 1, 2, 3, and you can add four, five, as many as you want.
So we are creating here a list of values to be checked. So for example, we can still
use it like the ISNULL where we have the shipping address and we replace the null
with a static value, the unknown, or as we learned we can go and use two
(7:06:53) columns, the shipping address and the billing address. So so far it's
the same use cases as the ISNULL, but now of course the COALESCE is not limited to two,
we can go and use three. So we are saying go check the shipping address, if it's null
then go check the billing address. If it is null as well then use
(7:07:09) at the end the default value, the static one, the unknown. So as you can see
we can use more than two values with the COALESCE. Okay. So now let's understand the
COALESCE and how this works. Now the workflow is something similar to the ISNULL. So
in this example we have two columns, the shipping address and the billing
(7:07:26) address. It's going to consider them as a list and it's going to start checking
from left to right. So it's going to check the first value from the shipping address,
whether it's null. If no, it's not null, then we're going to go and get the value one. So
we will get the value from the shipping address. And if yes,
(7:07:42) it is null, then it's going to go and get the value two. So we're going to get
the value from the billing address. Now we have similar data. We have three
orders. Let's see how SQL is going to execute it. So it's going to start with the first row and
it's going to focus on the shipping address. So here the value is
(7:07:57) not null. So we have it as an A. So that's why we will get the value one. So
we will get the value from the shipping address and nothing else going to be
checked. Now moving on to the second row. This time the shipping address is null.
So it's going to go and get the value from the second column and it's
(7:08:16) going to be the C. Right? So in the output we will get C. Now to the last
example, we have it as a null and it's going to go and get the value from the second
column and this time we're going to get a null as well, like the ISNULL function. So in
the results we are getting exactly the same result as
(7:08:34) ISNULL. So for this scenario it doesn't matter whether you use ISNULL or
COALESCE. So now of course we are still not happy with that because I don't want to
see any nulls in the output and I still need to use the billing address instead of
only a static value. So I would like to have everything, the
(7:08:50) values from the billing address, and I would also like to have at the end a
default value so that I don't have any nulls in the output. So how are we going to solve it?
So now we can use the power of the COALESCE, where we can include multiple
values in one function. So what we're going to do, we're going to have
(7:09:06) the shipping address first then the billing address and at the end we're
going to have the default value. So we have now a list of three values and of course
our workflow going to be a little bit bigger. So again here it's going to start from the
left to the right. So first it's going to go and check the
(7:09:21) value one. If it is null then it's going to go as well checking the value two.
And if the value two is as well null, we will get the last value. It's going to be the value
three. So now let's run the example again using the new COALESCE. So the first thing, we're
going to go and check the first value, which is the
(7:09:37) shipping address for record number one. So now as you can see the
value is not null. So we have an A here. So what's going to happen? We're going to
get the value A in the output as well. So that means this path is going to be activated
and we will not check anything else. So that means in the output it's going to be
(7:09:54) like this, and the first value is returned and everything else will be ignored.
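The three-value COALESCE being walked through here can be sketched as follows. The table variable @Orders, its column names and the value 'B' for the first billing address are hypothetical; the other sample values mirror the three orders in the walkthrough.

```sql
-- Hypothetical sample data modelled on the three orders in the walkthrough
DECLARE @Orders TABLE (
    OrderID     INT,
    ShipAddress VARCHAR(50),
    BillAddress VARCHAR(50)
);
INSERT INTO @Orders (OrderID, ShipAddress, BillAddress)
VALUES (1, 'A', 'B'), (2, NULL, 'C'), (3, NULL, NULL);

SELECT
    OrderID,
    -- checked left to right: shipping address, then billing address, then the default
    COALESCE(ShipAddress, BillAddress, 'N/A') AS Address
FROM @Orders;
```

With this data the Address column comes back as A, C and N/A, so no NULL reaches the output.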
So, SQL will not check anything else. So, as you can see, we are returning the first non-null
value. So, now let's move to the second order. Now, we're going to check again the
first value. Is it null? Well, yes. As you can see, we have a null here. So, that means
(7:10:13) we're going to go and activate this path over here on the right side. So,
now SQL will not go blindly putting anything from the billing address in the results.
First SQL has to check it. So SQL is going to check whether it's null or not. It is not null, so SQL is going
to go and return it in the output. And we have activated this
(7:10:31) path. So SQL is returning the value two which is the value from the billing
address. So now let's move to the third order. SQL first going to go and check the
shipping address. Is it null? Well yes it is null. So that's why SQL going to go and
start checking the second value. So this time SQL will not return
(7:10:51) the billing address value since it's null. It's going to go and return the third
value. And what is the third value? It is our static value the NA. So in the output we're
going to get the NA our default value. So with that as you can see in the output we
will not get any nulls. We are using the default
(7:11:11) value and as well multiple columns. So if you check the output, it's always
the first priority to check the values from the first column, the shipping address. If it's
null, then the second priority going to be the billing address. If it's null, then the last
priority, it's going to be the default value. So as you can
(7:11:29) see, SQL is checking the values from left to right and it stops immediately
once it encounters the first non-null value and returns it in the results. So this is how
the COALESCE works. All right. So now let's have a quick summary of the differences
between the COALESCE and ISNULL. So as we learned, the ISNULL is limited to only two
values,
(7:11:51) whereas the COALESCE is amazing because you can have a list of multiple values,
which is a great advantage compared to the ISNULL. Now if you are talking about the
performance, the ISNULL is faster than the COALESCE. So if you want to optimize the
performance of your query then go with the ISNULL. Now there is another
(7:12:09) problem with the ISNULL, which is that we have different keywords for different
databases. So for Microsoft SQL Server we use the ISNULL as we learned, but in Oracle
they have a different implementation, they use the NVL, and in other databases like
MySQL you have IFNULL, and all those three functions are doing the same, but we have
different
(7:12:29) implementations for different databases. But on the other hand the COALESCE is
available in all the different databases. So here we have like an agreement or standard
between the databases on using the COALESCE. So here again this is a great advantage
for the COALESCE, because if you are writing scripts and
(7:12:45) someday you want to migrate from one database to another, if you are
using the COALESCE you don't have to change anything, but if you are using the ISNULL
then you have to go and adjust your queries and scripts with the correct functions.
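The portability point can be illustrated with the two functions next to each other. This is a sketch for SQL Server, assuming a table Sales.Orders with columns OrderID and ShipAddress; the names are assumptions.

```sql
-- Assumed schema: Sales.Orders(OrderID, ShipAddress)
SELECT
    OrderID,
    ISNULL(ShipAddress, 'N/A')   AS WithIsNull,    -- SQL Server specific
    COALESCE(ShipAddress, 'N/A') AS WithCoalesce   -- standard SQL
FROM Sales.Orders;

-- Equivalents of the ISNULL line in the other databases mentioned:
--   Oracle: NVL(ShipAddress, 'N/A')
--   MySQL:  IFNULL(ShipAddress, 'N/A')
```

Only the ISNULL column would need rewriting after a migration; the COALESCE column can stay as it is.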
That's why I always tend to use the COALESCE and avoid using the ISNULL. Only if it's really
necessary,
(7:13:03) that I have really bad performance, I go and try the ISNULL. But I usually stick
with the COALESCE. So that is my advice for you. Go with the COALESCE and stick with the
standard. Now the use cases of the COALESCE and the ISNULL are very similar and we
mainly use them in order to handle the nulls before doing any SQL task. For example,
(7:13:27) we can use them in order to handle the nulls before doing data
aggregations. So let's understand what this means. Imagine that we have three
sales. We have 15, 25, and a null. Now if you go and use an aggregate functions like
the average, what's going to happen? SQL going to calculate it like this. 15 + 25
(7:13:45) divided by two and the average is going to be 20. So as you can see here
SQL is including only the two values 15 and 25 and ignores totally the null value. So
in the calculations the null will not be included because if SQL does that the output
going to be as well null. So the nulls are totally ignored. Now the same
(7:14:06) thing can happen with the other aggregate functions like the SUM, COUNT if
you are counting the sales, MIN and MAX. There is only one exception, for the
aggregate function COUNT. If you are using it with the star, SQL here is not
considering the values. SQL is going to consider the rows. That's why SQL is going
(7:14:22) to go and include all those rows and the output is going to be three. Now
in some scenarios, if your business understands the null as zero, then you're going to
have a problem with the result of your analysis if you don't handle the nulls. So what
do we have to do? We have to handle the nulls before doing the
(7:14:38) aggregations. So we have to go and replace a null with zero using either
the ISNULL or the COALESCE. So once you do that the calculation is going to change for
the average. So it's going to be 15 + 25 + 0 divided by 3 and the output this time is
going to be 13.3. So with that you're going to get more accurate
(7:14:57) results for the business if they understand nulls as zero. All right. So now
we have the following example. It says: find the average score of the customers. So
let's go and solve it. We're going to select the customer ID and the score from the
customers table. Let's go and execute it. As you can see, we have four
(7:15:16) customers with a score, and the last one doesn't have any score, so we have
it as a null. Let's go and calculate the average of the score, and I would like to use
a window function in order to see the details as well. So this is average scores.
Let's go and execute it. Now of course, what is going on here?
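The aggregation behavior described with the three sales (15, 25, and a null) can be reproduced with a small sketch. It assumes Python's sqlite3 module and a hypothetical table named sales_demo; neither comes from the course, which runs on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_demo (sales INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales_demo VALUES (?)", [(15,), (25,), (None,)])

row = conn.execute(
    """
    SELECT AVG(sales),              -- NULL is skipped: (15 + 25) / 2
           COUNT(sales),            -- counts values, so the NULL is skipped
           COUNT(*),                -- counts rows, so the NULL row is included
           AVG(COALESCE(sales, 0))  -- NULL treated as zero: (15 + 25 + 0) / 3
    FROM sales_demo
    """
).fetchone()
avg_ignoring_nulls, count_values, count_rows, avg_nulls_as_zero = row

print(avg_ignoring_nulls, count_values, count_rows, round(avg_nulls_as_zero, 1))
conn.close()
```

Which of the two averages is the right one depends on what a missing value means to the business, as the walkthrough goes on to discuss.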
(7:15:35) The four values are added together and divided by four, and
the null is totally ignored. Now of course the question is what the business
understands by the null. If it is zero, then we have inaccurate results. So let's go and
fix it. This time we're going to say, okay, we're going to have
(7:15:52) the average, but instead of the score we're going to handle the nulls first. So we
have to replace any nulls with zero. We can use COALESCE or ISNULL. I
will go with COALESCE, like this: the score, and if you find any null, make it zero. That's it,
and I will use the window function here as well. So
(7:16:12) let's call it average scores two. Now let's go and execute it. As you
can see, in the output we got 500, and this is different from the previous average,
because we have replaced the null with zero. Let's just display it in
order to understand it. So I will copy it and put it here. Let's call
(7:16:31) it score two and execute it. So now SQL is going to sum all those
values and divide by five, and that's why we are getting 500. So if our business
understands the null as a zero, this average is going to be more accurate after we
handle the null. As you can see, in some scenarios we have to handle the
(7:16:50) nulls before doing any data aggregations. All right, moving on to the next
use case for COALESCE and ISNULL. We can use them to handle the nulls
before doing any mathematical operations. So let's understand what this means
using the plus operator. If you use the plus operator between two numbers, like 1 + 5,
you are adding
(7:17:12) the values and you will get six. And if you use the plus operator between
string values, like 'a' + 'b', then you are doing data
concatenation and the output is going to be 'ab'. Now if you replace the one
with a value like zero, so 0 + 5, we will get five. Nothing fancy about that. And for
(7:17:32) the strings, if you replace a value with an empty string, so there are
zero characters between the two quotes, plus the 'b', in the output you will get only
'b'. So it's fine and nothing is critical. But now we come to the problem. If you use a
null, if you replace the one with null, in the output you will get a null, because you are
(7:17:52) saying, okay, five plus something that I don't know. So SQL says, okay, you
are adding a value to a missing value. It is unknown, so I don't know
what the answer is going to be either. That's why SQL says it's going to be null: it just doesn't
know what the answer is. And the same thing happens with anything else,
(7:18:09) like strings. If you're saying null plus 'b', SQL is going to say the same
thing: the null is unknown, and the answer is going to be unknown as well. So my friends,
this is very critical in analysis and working with data. It means we have to
handle the nulls before doing any mathematical operations. And
(7:18:26) this is not only for the plus operator; it's for the other operators as well, like
minus and so on. All right. So now let's have the following task. It says: display
the full name of the customers in a single field by merging their first and last names,
and add 10 bonus points to each customer's score.
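Before the walkthrough, here is a rough sketch of how a null spreads through the plus operator and how COALESCE contains it. It assumes Python's sqlite3 module, where strings are concatenated with || rather than the + used in SQL Server. The table customers_demo and its names and scores are made up for illustration, apart from Mary having no last name and Anna having no score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operator behaviour: any operation that involves NULL yields NULL.
number_sum, string_concat, null_sum, null_concat = conn.execute(
    "SELECT 1 + 5, 'a' || 'b', NULL + 5, NULL || 'b'"
).fetchone()

# Hypothetical rows: Mary has no last name, Anna has no score.
conn.execute(
    "CREATE TABLE customers_demo (first_name TEXT, last_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO customers_demo VALUES (?, ?, ?)",
    [("Joseph", "Goldberg", 350), ("Mary", None, 750), ("Anna", "Nixon", None)],
)

rows = conn.execute(
    """
    SELECT first_name,
           first_name || ' ' || last_name               AS full_name_raw,
           first_name || ' ' || COALESCE(last_name, '') AS full_name,
           score + 10                                   AS bonus_raw,
           COALESCE(score, 0) + 10                      AS score_with_bonus
    FROM customers_demo
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The raw columns go null as soon as one input is null, while the COALESCE versions keep Mary's first name (followed by the separating space) and still give Anna her 10 points.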
(7:18:43) So let's go and solve it. We're going to select the basic information first.
Let's get the customer ID. What else do we need? The first name, the last name, and we
need the score. So that's it, from sales customers. Let's go and execute it. Now the
first task is that we have to generate a new field called full name
(7:19:03) where we have to merge, or concatenate, the first and last names.
So let's go and do that. We need the first name, plus, and then let's have a space
between the first and last name, and then plus the last name, as full name.
So let's go and execute it. Now if you check the result for the
(7:19:27) first customer, it is working. So we have Joseph Goldenberg. The same
thing for the second customer. But for the third customer we have a problem here.
The customer doesn't have a last name but she has a first name, so we have
Mary here. The full name is completely null, which is not correct. For this example
we should at least
(7:19:46) show the first name Mary even though the last name is missing. So the
result is not really accurate, and that's because we are using the plus operator
between a null and Mary. That means we have to handle the nulls before
using any plus operator. So again here we can go with COALESCE or
(7:20:04) ISNULL. Let's go and create a new field using COALESCE. It's going to
be the last name, and now we have to define a new value for when it's null. We could have
something like 'unknown', or we could have an empty string, and we can do that
using two quotes with nothing between them. So we are using
(7:20:23) an empty string. Let's go and check the results. Last name two. So let's
go and execute it. Now we can see that the last name for Mary is an
empty string, and it is no longer a null. So now SQL knows, okay, this is a string
and there are no characters inside it. With that, SQL has more
(7:20:46) information, and we can now concatenate those values. So
let's go and do that. We're going to take the whole thing and replace the last name
with the COALESCE. Let me just remove this last name over here and execute it. So
now as you can see, things look better. Now in the full name for
(7:21:04) Mary we have only the first name. And of course, if you don't like it like this and you would
like to have another default value, you can go over here and say something like 'N/A',
not available. Let's go and execute it. With that you can see immediately
that a last name is missing here. But it doesn't really look
(7:21:20) good, so I will just remove it and go with the empty string. We're going to
go and execute it. With that we have solved the first part of the task, where we
have the full names and we are not missing any information from the first name and
the last name. Now let's go to the second part of the task, where we
(7:21:35) have to add 10 bonus points to each customer's score. So we have to
add 10 to each score. Let's go and do it. I'm going to put it at the end:
score + 10, and let's give it the name score with bonus. That's it. Let's go and
execute it. Now in the output you can see it's very easy. We have added
(7:21:56) 10 to each score, so we have increased the score points for each
customer. But now for the last customer, Anna, you can see over here she doesn't
have a value in the score, and that's why SQL didn't add 10. We get
a null as well. And of course it might not be fair that the last customer is not
(7:22:14) getting any points even though we have increased the score for all the others. So that
means we have to handle the null by replacing it with zero, and only after
that are we going to add 10 to it. So let's go and do that. I'm going to add a COALESCE: if
it is null, then make it zero, and afterward add the 10
(7:22:34) points. Let's go and execute it. Now as you can see in the results,
everything is fair: we have 10 bonus points for each customer, even if
the customer doesn't have a value in the score. Like Anna here, she has a null,
but she is still getting 10 points. So here again, as you can see, if
(7:22:52) you don't handle the nulls correctly before doing mathematical
operations, you might get unexpected results. So be careful with the nulls and handle
them correctly before adding anything. Okay, moving on to the next use case for
COALESCE and ISNULL. We can use them to handle the nulls before doing
(7:23:13) joins. This is a little bit advanced as a use case, but it's very important to
understand it. So let's see why. Let's take for example two
tables, table A and table B. In some scenarios we have to combine those
two tables using joins. And in order to join two tables, we have to
specify the
(7:23:31) keys between table A and table B to join on. In this
example, we have two keys for joining the tables. Now here comes the special
case. If those keys don't have any nulls inside them and all the data is filled in, then your
join is going to work perfectly and you will get the expected results.
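A compact sketch of joining on two key columns, first with a plain equality and then with the nulls in the key handled, may help when following the row-by-row explanation. It assumes Python's sqlite3 module; the tables table1 and table2 mirror the year/type/orders and year/type/sales example from the walkthrough, and the orders value of 30 in the first row is a made-up placeholder because the transcript does not state it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (year INTEGER, type TEXT, orders INTEGER);
    CREATE TABLE table2 (year INTEGER, type TEXT, sales INTEGER);
    INSERT INTO table1 VALUES
        (2024, 'a', 30), (2024, NULL, 40), (2025, 'b', 50), (2025, NULL, 60);
    INSERT INTO table2 VALUES
        (2024, 'a', 100), (2024, NULL, 20), (2025, 'b', 300), (2025, NULL, 200);
    """
)

# Plain equality: NULL = NULL is not true, so rows with a NULL key drop out.
plain_join = conn.execute(
    """
    SELECT a.year, a.type, a.orders, b.sales
    FROM table1 AS a
    JOIN table2 AS b
      ON a.year = b.year AND a.type = b.type
    ORDER BY a.year, a.orders
    """
).fetchall()

# NULLs replaced with '' only inside the join condition; the SELECT list
# still returns the original NULL in the type column.
handled_join = conn.execute(
    """
    SELECT a.year, a.type, a.orders, b.sales
    FROM table1 AS a
    JOIN table2 AS b
      ON a.year = b.year
     AND COALESCE(a.type, '') = COALESCE(b.type, '')
    ORDER BY a.year, a.orders
    """
).fetchall()

print(len(plain_join), len(handled_join))
conn.close()
```

The plain join returns two rows and the handled join returns all four, with the type column still showing the original null for the two recovered rows.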
(7:23:49) But you might have a special case where there are nulls inside the
keys. So there are missing values, and this is a big problem because in the output
you will get unexpected results and some records will be totally missing. In this
scenario we have to handle the nulls inside the keys before doing the
(7:24:05) joins. Let's have a very simple example in order to understand this
behavior. All right. So now let's have this very simple example where we have two
tables and we want to combine them. In the first table we have year, type, and orders,
and in the second table we have year and type as well, plus sales. Now we
(7:24:21) would like to combine those two tables in order to have all the
information in one result. We can of course use an inner join between
table one and table two. As for the keys for the join, as you can see we have
the year in both tables, and the type as well. So we're going to use
(7:24:38) both of those columns as the key for the join. Let's go step by step through how
SQL is going to execute this. We need the year and the type in the results, so it's going to
take those two columns to the results. And we need the orders and sales, so it's
going to take the orders as well, and the sales from the second
(7:24:56) table. So now let's start doing it row by row. The first key is going to be
those two columns, so we have 2024 and the type A. Now SQL is going to start
searching for those two values in the second table. And as you can see, we
have a match here, right? So the first row matches, and since it's an inner
(7:25:15) join, SQL is going to present only the matching rows from left and
right in the output. So in the output we're going to get the whole row from table one, and we
will get the sales from table two. All right. That's all for the first row. Now let's
move to the second row over here. What are the values of
(7:25:34) the keys? We have 2024 and null. Now if you check the matches on the
right side, you can see we have a match here, right? It is logical: it's 2024
and null as well, so everything matches and we should get it in the result, right? But SQL cannot
use the equal operator to compare nulls. So even though
(7:25:55) logically it makes sense to have it in the output, SQL still cannot
compare the nulls. That's why this is a problem: for this combination SQL will
not find any match. So we will not get any information for the combination of 2024
and null. For us in the business, of course, this is missing
(7:26:15) information and inaccurate results. So we're going to miss this row,
and SQL is going to jump to the third row. Here, what are the values of the
key? We have 2025 and B. Now it's going to search for it in the second table, and
it's going to find a match over here. So in the
(7:26:32) output we're going to get those values: the orders are going to be 50, the
sales 300. Now it's going to go to the last row, and here we have the same
problem again. We have 2025 and null. And of course if you check the data you will
say, yes, we have a match over here, but SQL will ignore it. So we have exactly
(7:26:52) the same situation and we will not find it in the results. In the output we
will get only two rows, even though those two tables are identical if you
compare the keys. So with that we are losing data in the results and we are
providing inaccurate results. So my friends, if you have nulls inside your
(7:27:10) keys, what can happen? You will lose records in the output. So here it's very
important to handle the nulls inside the keys before doing the joins. All right, so now
in order to fix it we're going to use either COALESCE or ISNULL in the join. So
as you can see, we are not using the type directly. We are
(7:27:29) handling it by replacing the null with an empty string. It doesn't matter
which value you use. The main thing is that you have a value and SQL can
match on it. So you could have an empty string or a blank or any default value. But I
usually go with the empty string since it's a little bit faster than having
(7:27:47) any other characters. So now what's going to happen is we're going to go
everywhere and replace those nulls with an empty string. Now we don't have any
nulls inside our keys, so let's see what happens. We're going to start
with the first row again. Here we have a match from the right table
(7:28:04) and we're going to see the whole record in the output. So we will get the
sales as 100 as well. And now it's going to go to the second row over here. This
time we don't have a null. We have 2024 and an empty string. So now it's going to
search for a match and it's going to find it over here: we have
(7:28:24) 2024 and an empty string as well. So now what happens in the output?
We're going to get 2024, but for the type we will get a null. We will not get an empty string,
we will get a null over here, and that's because we are handling the null only in the
join. As you can see, we have the ISNULL on the type in the join, but we don't have
(7:28:44) it in the select. So in the select the type is going to be like the original data,
and the original data was a null. We are handling the null in the join just in
order to let SQL understand how to map and match the data. In this example, I'm
not changing the values in the select. That's why we will get
(7:29:02) the original value. The orders we will get as 40, and the sales are going to be
20. Now moving on to the third row. I think you already get it. SQL is going to find
the match, and the sales are going to be 300. All right. Now we're going to move to the
last one, and here we have the same scenario. We have 2025 and
(7:29:20) an empty string, so it's not null anymore. SQL is going to search
for those values and it's going to find them over here. SQL is going to return the
type as null, not as an empty string, because in the select we didn't
handle it. The orders are going to be 60 and the sales
(7:29:39) are going to be 200. So as you can see, now the result is complete. We
successfully combined both of those tables into one big result using joins, with
the help of the ISNULL function, in order to have a complete result and not miss
any values. So my friends, be very careful: always check whether the keys have
(7:30:00) nulls or not, and if you find nulls, handle them immediately so you don't
lose any records in the results and you get accurate analysis. All right, moving on to
the next use case for ISNULL. We can use it to handle the nulls before
sorting the data. So imagine we have the following sales: 15, 25, and null. Now if
(7:30:24) you sort the data by the sales ascending, from the lowest to the
highest, what happens? SQL is going to show the nulls at the start. That is not
because the null is the lowest value, because a null has no value, but SQL shows it like
this. It's going to place it at the start, and then below it we're going
(7:30:43) to have the lowest value, so the 15, and at the end we're going to have
the 25. Now if you do the exact opposite, where we sort the data from
the highest to the lowest using descending, SQL is going to sort
it like this: we're going to have 25, then 15, and the last thing
(7:31:00) that appears in the list is going to be the null. So here SQL shows
the nulls at the end, and again that is not because nulls are the lowest value (a null has no
value), but SQL does it like this and shows them at the end. So this is how SQL deals with
nulls if you are sorting the data. In order to understand this
(7:31:19) use case, let's have the following task. The task says: sort the customers
from the lowest to the highest score, with nulls appearing last. All right. So let's solve
it. This is going to be a very interesting one. We need the customer information, so
let's go and select, and we need the customer ID and the score
(7:31:37) from sales customers, and let's go and execute it. So we have a simple list
of all customers and their scores. But now we have to sort the data from the
lowest to the highest. So we're going to use the ORDER BY clause, and we need
the field score. And since it's lowest to highest, that means we need
(7:31:57) ascending order, and in SQL it is the default, so we don't have to
mention it. Let's go and execute it. Now as you can see in the results, it
starts from the lowest and goes to the highest, and the first part of our task is solved. But now of
course we have an issue, right? Because we have a null, and as we learned,
(7:32:13) SQL is going to put it in first place on the list. But the task says with nulls
appearing last. So we really don't want to see the nulls at the start;
we would like to have them at the end of the list. That means we have to
handle the nulls before sorting the data. And here
(7:32:30) we have two ways to do it: one way that is lazy, and another that is more
professional. So let me show you the lazy way first. We're going to replace
the null with a very big number. For example, we're going
to use COALESCE and we're going to say, okay, score, and
(7:32:48) then let's type a lot of digits so that we have a really big score. I just
want to select it in order to see the results. As you can see, it's a very big number
here. Now take this and replace the score in the ORDER BY with the new expression. So that's it.
Let's go and execute it. Now if you check the results, we have
(7:33:07) already solved the task. We have listed all the customers from the lowest
to the highest and the nulls are at the end. So now the question: why do we call this
lazy or not professional? That's because we are defining a static value. Of
course, for this example it works, but we don't know what's
(7:33:22) going to happen later. Maybe things change and a customer is going to
get a higher score than this, and then sorting the data will make no sense, since the
null is going to end up in between the values. So who knows, your static value might be a real
value inside the data. Now let me show you the other way, which
(7:33:38) is more professional, to solve this task, where we don't rely on
luck at all. Let's go and do that. Let me just move this a little bit here. I'm going to
create a new piece of logic where we say: CASE WHEN the score IS NULL, THEN
what's going to happen? We want the value one. Otherwise (ELSE)
(7:33:54) the value is going to be zero, and then END. So we are just creating a flag with zero
and one. If the score is null then we're going to get the flag one; if we have a value
for the score we will get zero. Let's have it like this, and I will just get rid of
this COALESCE. So let's go and execute it. Now if you check
(7:34:16) our nice new flag, you can see we have zeros everywhere we have a
value in the score, and only where we have a null do we get the flag one. So now
that we've got this, we're going to sort our data based on
this flag and the score. The task doesn't mention
(7:34:33) anything about the flag, but we are using it in order to force the nulls to be
at the end of the result. Let me show you how we're going to do that. Let me just
remove all this. First we want to sort the data by our new flag in order to make
sure that the nulls are at the end. So we're going to have our flag, and
(7:34:50) then afterward we sort the data by the score. So let's add the
score. So again, what are we doing? First we sort the data by the flag in order to push the
nulls to the end. And then, where the flag values are equal to each other,
SQL is going to sort the data by the score. So SQL
(7:35:08) is going to use the scores to sort the data, and both of them are
ascending. Let's go and execute it. Now as you can see, we get exactly the
same results: the values from the lowest to the highest, and the nulls are at the end.
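The two approaches can be compared side by side in a short sketch. It assumes Python's sqlite3 module, which places nulls first in ascending order, matching the behavior described here; some databases (PostgreSQL and Oracle, for example) default to the opposite. The table customers_demo, its scores, and the helper ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_demo (customer_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO customers_demo VALUES (?, ?)",
    [(1, 350), (2, 900), (3, 750), (4, 500), (5, None)],  # made-up scores
)


def ids(order_by):
    """Return the customer ids in the order produced by an ORDER BY clause."""
    sql = f"SELECT customer_id FROM customers_demo ORDER BY {order_by}"
    return [row[0] for row in conn.execute(sql)]


default_order = ids("score")                   # NULL comes first
lazy_order = ids("COALESCE(score, 9999999)")   # NULL replaced by a big number
flag_order = ids("CASE WHEN score IS NULL THEN 1 ELSE 0 END, score")

# A later score above the static value breaks the lazy approach only.
conn.execute("INSERT INTO customers_demo VALUES (6, 99999999)")
lazy_after = ids("COALESCE(score, 9999999)")
flag_after = ids("CASE WHEN score IS NULL THEN 1 ELSE 0 END, score")

print(default_order, lazy_order, flag_order, lazy_after, flag_after)
conn.close()
```

Once a real score exceeds the static value, the lazy ordering puts the null customer in between real values, while the flag keeps the null last.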
And as you can see, in the ORDER BY we didn't use any static values or
(7:35:25) any big numbers. And of course we don't need the flag in the select, so we
can remove it. Let's execute it. With that we have solved the task. So
as you can see, we can use those nice functions like COALESCE or ISNULL to
handle the nulls before sorting your data. So what is the function NULLIF?
(7:35:46) NULLIF is going to compare two values, and it returns a null if
they are equal. Otherwise, if they are not equal, it returns the first value. Okay.
So now the syntax of NULLIF: it accepts only two values, value one and value
two. Here again, of course, you can use a column
(7:36:06) with a static value like 'unknown', so we are comparing the values
between a column and a static value, or you can compare two columns, like the
shipping address and the billing address. So again, it accepts only two values.
We cannot use it like COALESCE, where we have a list of multiple values. All
(7:36:23) right. So now let's understand exactly what we mean by NULLIF.
The workflow is going to be like this. SQL is going to check two values, value
one and value two. If they are equal, then SQL is going to return a null.
But if the two values are not equal, SQL is going to return the
(7:36:39) first value, so the one on the left side. Looking at the outcomes
here, we will never have a scenario where we get the second value. That
means the second value is always used as a check: we are checking against this
value. So we're going to get either value one or a null. Let's have this
(7:36:56) very simple example. We are saying NULLIF price, and we are checking
whether it's equal to -1. So we are saying if the price is equal to -1, then
replace it with a null, because it is a data quality issue that we have a price that is
negative. It makes no sense for our business. And if it is -1, then
(7:37:14) it means a null for us: we don't know the price of this product. So we will
correct it using NULLIF. Let's check this very simple example. We have two orders.
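The two orders can be sketched as follows, assuming Python's sqlite3 module and a hypothetical table orders_demo holding the prices 90 and -1 from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_demo (order_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO orders_demo VALUES (?, ?)", [(1, 90), (2, -1)])

# NULLIF(price, -1): returns NULL when price equals -1, otherwise the price.
cleaned = conn.execute(
    """
    SELECT order_id, price, NULLIF(price, -1) AS clean_price
    FROM orders_demo
    ORDER BY order_id
    """
).fetchall()

for row in cleaned:
    print(row)
conn.close()
```

The second argument never shows up in the output; it only decides whether the first value is kept or turned into a null.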
So SQL is going to start with the first order and check the first value. What is the first
value? It's the price, and here we have 90. SQL
(7:37:29) is going to check: is 90 equal to -1? Well, no. That means it's
going to execute this path, so in the output we will get the first
value, which is 90. Now let's move to the second
order. Here we have -1. So SQL is going to check: is the -1 here equal to
the -1
(7:37:51) that we have in the NULLIF? Well, yes. That means SQL is going to
execute this path, where we get the null value in the output, and we're
going to get it like this. So now if you compare the result of NULLIF with the price, you
can see we no longer have the -1. And as you can see, now we
(7:38:08) are doing exactly the opposite of COALESCE and ISNULL: we are replacing a
real value with a null. Now moving on to the second example, and this is a very
interesting one in analytics, where we can use two columns inside the
NULLIF. In this example we are saying NULLIF original price and discount price. So SQL
has to
(7:38:27) compare the prices between those two columns, and if they are equal it
should return a null. And now you might say, okay, why are we doing
this in this example? Well, we can use it in order to highlight or flag special cases inside our data.
And the special case here is when the original price is equal to the
(7:38:44) discount price. If those two prices are equal, that means we have an
issue in our program, or something went wrong as we were inserting data. So let's
see what's going to happen. For the first row we're going to compare the 150
from the original price with the discount price. They are not equal,
(7:39:01) right? So SQL is going to return the original price, the 150, in the
output. Let's move to the second order. Here we have the original price 250, and
the discount price is 250 as well. So they are equal, and if they are equal then we will
get a null in the output. So as you can see again, here we
(7:39:21) are not getting any values from the discount; we are using it only as a
check. With that we have a quick flag, using the nulls as a flag in order to
identify where we have equal values. So this is how NULLIF works. All right friends,
here we have a very nice use case for NULLIF, and that
(7:39:41) is preventing the error of dividing by zero. Let's see what this means. Okay,
let's have the following task. It says: find the sales price for each order by dividing
the sales by the quantity. So let's go and solve it. This should be very easy. We need
the order ID, the sales, and the quantity from sales orders. Let's go and
(7:40:05) execute it. So now we have 10 orders. Those are the sales and the
quantity. Now it's very easy to calculate the price: it's going to be the sales divided
by the quantity, and we're going to call it price. Let's go and execute it. Now as you
can see, we got an error that says "Divide by zero error encountered."
(7:40:26) That means somewhere we have a zero for the quantity, and this is a
problem. Let's go and check the data again. I'm just going to comment out the whole
calculation and execute it. Now by checking the result, yes, for
order ID 10 we have quantity zero. So of course it will not work if you divide by zero.
So how can we
(7:40:46) solve it? We can use the magic of NULLIF, where we're going to
replace the zero with a null. Getting a null is way better than getting an error.
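A minimal sketch of the guard, assuming Python's sqlite3 module and a hypothetical orders_demo table with made-up sales figures. One caveat about the stand-in database: SQLite already returns a null for division by zero instead of raising the error that SQL Server shows, so this sketch only demonstrates the shape of the NULLIF guard and its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_demo (order_id INTEGER, sales INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders_demo VALUES (?, ?, ?)",
    [(9, 60, 2), (10, 60, 0)],  # made-up values; order 10 has quantity zero
)

# NULLIF(quantity, 0) turns a zero divisor into NULL, and dividing by NULL
# yields NULL, so the division never sees a zero.
prices = conn.execute(
    """
    SELECT order_id, sales / NULLIF(quantity, 0) AS price
    FROM orders_demo
    ORDER BY order_id
    """
).fetchall()

for row in prices:
    print(row)
conn.close()
```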
So let's go and do that. I'm just going to remove the comments. And here
we're going to say NULLIF: if the quantity is equal to zero. So
(7:41:04) that's it. Let's go and execute it. Now as you can see, it is working. With
that we are making sure that we are not dividing by zero, because we
replace it with a null, and if you divide anything by null you get a null. So if you
check the result over here, for order 10 we got a price of
(7:41:23) null, which is correct, and for all the other values everything is working,
because we have values and we didn't replace them with a null. That's why we have
values for the price. This is a very common use case for NULLIF: we can use it in
order to prevent dividing by zero. All right, so what is IS NULL? It's
(7:41:44) going to return true if the value is null. So it checks the value: if it's null,
it returns true, otherwise it returns false. Now the exact opposite happens
if you use IS NOT NULL. If you use these keywords, it returns true
if the value is not null; otherwise, if it is
(7:42:04) null, it returns false. Okay. So the syntax for that is very
simple. It starts with a value or expression, and after that we have the
keywords IS, space, NULL. The IS NOT NULL is exactly the same: we have a value, then
IS, then the NOT
(7:42:22) operator, and then NULL. So it's very simple. Let's have
an example. We are checking whether the value of the shipping address is null. So
we can write it like this: shipping address IS NULL. Or we can
(7:42:39) check the opposite, whether it's not null: shipping address IS NOT NULL.
It's very easy. Okay. So now let's understand how this works. We are checking the
value. If the value is null, then we return true; if it is not null, then we return false.
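The true/false outcome can be sketched as below, assuming Python's sqlite3 module, which reports true and false as 1 and 0. The table orders_demo and the price of 90 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_demo (order_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO orders_demo VALUES (?, ?)", [(1, 90), (2, None)])

# Each check yields a flag per row: 1 stands for true and 0 for false.
flags = conn.execute(
    """
    SELECT price IS NULL     AS price_is_null,
           price IS NOT NULL AS price_is_not_null
    FROM orders_demo
    ORDER BY order_id
    """
).fetchall()

for row in flags:
    print(row)
conn.close()
```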
So as you can see, it never returns the value itself or any nulls. We are getting a
(7:42:59) boolean, true or false. So we are creating a boolean flag in order to
assist us with checks. We have this very simple example, price IS NULL, and we
have those two rows. We are checking whether the price is null. In the first order it
is not null, right? That's why we will get false in the
(7:43:17) output. In the second order the value is null, so the check is true, and that's why we
will get true. Now of course if we use IS NOT NULL, it's going to be the exact
opposite. So is the price not null? Well, yes, it's not null, and that's why you will get true over
here. Now for the second check, it is null, right? So the
(7:43:37) output is going to be false: we get the exact opposite. So that's it. It's very
simple how IS NULL and IS NOT NULL work. All right. One very obvious use case for IS
NULL and IS NOT NULL is searching for missing information, or searching for nulls.
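Searching for nulls with a filter might look like the sketch below. It assumes Python's sqlite3 module and a hypothetical customers_demo table of five customers, where only the last one has no score; the scores themselves are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_demo (customer_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO customers_demo VALUES (?, ?)",
    [(1, 350), (2, 900), (3, 750), (4, 500), (5, None)],
)


def ids(where):
    """Return the customer ids that pass a WHERE condition."""
    sql = f"SELECT customer_id FROM customers_demo WHERE {where} ORDER BY customer_id"
    return [row[0] for row in conn.execute(sql)]


without_score = ids("score IS NULL")      # customers with a missing score
with_score = ids("score IS NOT NULL")     # customers that have a score
equals_null = ids("score = NULL")         # the equal operator matches no rows

print(without_score, with_score, equals_null)
conn.close()
```

The last query shows why the equal operator is not used for this check: comparing with a null is never true, so it returns no rows at all.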
And maybe after that we can go and clean up our data by
(7:43:59) removing the nulls from our data set. Let's have the following task and it
says identify the customers who has no scores. All right, let's go and solve it. This is
very simple. So let's start by selecting star from sales customers. So we need
everything. Let's go and execute it. Now as you can see we have
(7:44:19) our five customers. But the task says we have to have all the customers
who have no score. So that means the result should return only the last record since
the score of Anna is null. So let's go and have a wear clause. So where and now
what do we need? We need the score. Then we don't use the equal, we use is null
(7:44:38) like this. So that's it. Let's go and execute it. And with that, as you can see,
it's very simple. We have filtered our data and now we can see all the customers
where the score is null. This is a very basic check to understand whether our data
contains nulls. All right, moving on to the next task and it
(7:44:56) says show a list of all customers who have scores. So back to our
example, this time we're going to do exactly the opposite. We want a list of all
customers where we have a value in the scores. So what we're going to do, we're
going to say WHERE score IS NOT NULL. So if you go and execute it, you can see
(7:45:14) we're going to get a clean list where all the customers have a score. And with
that, we get rid of all nulls inside the score field. And maybe this is helpful in order to
do further analysis. All right friends, now we come to a very interesting use case for
IS NULL, and that is introducing a new type of
(7:45:34) join between tables that's going to help us to find the unmatched rows
between two tables. Let's have a quick recap about the joins in SQL in order to
understand the new types. So basically we have two sets or let's say two tables the
left and the right. And if you go and use an inner join what we
(7:45:53) are doing here we are finding only the matching rows between the left table
and the right table. So in the result we will get only the matching rows. Now we have
another type of join called the left outer join. And if you use this type, in the result you
will get all the rows from the left table and as well only the
(7:46:11) matching rows from the right table. Now we have another type which is
exactly the opposite, the right join. And here we're going to get all the rows from the
right table and only the matching information from the left table. And now to the last
type that we learned. We have the full join, where we will get all
(7:46:29) the rows from the left and as well all the rows from the right. So we will not
be missing anything. So those are the four basic joins that we have learned in SQL.
But in SQL we have as well other types that are more advanced. But we don't have
any keywords in SQL for them. So the first one is called the left anti-join.
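As a rough sketch of the left anti-join that the walkthrough builds step by step (a LEFT JOIN combined with an IS NULL filter on the join key of the right table), the following runs against SQLite through Python's sqlite3 module. The table layout, column names, customer names and order numbers are assumptions made for illustration; only the idea that customer 5, Anna, has no orders is taken from the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, first_name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
-- Hypothetical rows: customer 5 (Anna) has no order.
INSERT INTO customers VALUES (1, 'Maria'), (2, 'John'), (5, 'Anna');
INSERT INTO orders VALUES (1001, 1), (1002, 2), (1003, 1);
""")

# Plain LEFT JOIN: every customer is kept, unmatched ones get NULL from orders.
joined = conn.execute("""
    SELECT c.customer_id, c.first_name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Left anti-join effect: keep only the rows where the right side found no match.
anti = conn.execute("""
    SELECT c.customer_id, c.first_name
    FROM customers AS c
    LEFT JOIN orders AS o ON c.customer_id = o.customer_id
    WHERE o.customer_id IS NULL
""").fetchall()

print(joined)
print(anti)
```

Filtering on the join key of the right table is one reasonable choice; any column of the right table that can never be null in a real match would give the same effect.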
(7:46:47) So what we are saying here we need all the rows from the left table but this
time without the matching rows. So all the information that is matching with the
right table, we don't want to see it in the results. And as I said, we don't have here an
extra keyword for this type of join. But in order to get this effect
(7:47:07) we're going to go and combine the left join together with IS NULL. And with
that we're going to get all the data from the left side but without anything that is
matching the right side. And this we call the left anti-join. And we have another
advanced type of join called the right anti-join. This
(7:47:24) is exactly the opposite. So we are saying all the rows from the right table
without having any matching rows from the left table. So all the information on the
right side that is not matching the left side. So again here we don't have a keyword
for that. We're going to go and work with the right join plus
(7:47:41) IS NULL. So with that, as you can see, we have two new types of joins added
to our four basic joins. Now this might be confusing. Let's have the following task in
order to understand it. Show a list of all details for customers who have not placed
any orders. All right. So let's see how we can create the effect
(7:47:59) of the left anti-join. So let's do it step by step. We need here two tables. We
need the customers and as well the orders. So since we are focusing on the
customers, the left table is going to be the customers. So let's go and do that. We're
going to go and say select star from sales customers. This is our first
(7:48:18) table. So we are using the alias of c. So let's go and execute it. Now as
you can see we got the list of all customers. So with that we have all the details for our
customers. But now we have to go and join it with the orders. So in order to do that
let's have a new line: LEFT JOIN sales orders, and let's have the
(7:48:40) alias o. And now we have to go and define the key for the join. So ON, it's going to
be the customer ID equal to the customer ID in the orders table. So now if you go and
execute it. Now what we're going to do, we're going to go and show the order ID from
the table orders. So order ID, just to see whether we have a match or not. So
(7:49:03) let's have it like this and execute it. Now let's go and check the results. As
you can see those four columns comes from the table customers and only the last
column come from the orders. So now what is interesting is to check the order ID
whether we have nulls or not. So as you can see for the customer one
(7:49:21) we have everything matching. For the customer two as well we have
orders, the three as well. Only for the last one, customer ID 5, we have here a null. So
that means SQL was not able to find any order for this customer. So again, what this
means: we have only one customer, Anna, where she doesn't have any order,
(7:49:41) but all other customers did have an order, and that's because we have
values from the right table. So once we have values, that means we have a match,
but since here we have a null, that means we don't have any match. So now since
the left anti-join says we would like to have all the data from the left table
(7:50:02) without having any matching from the right table. So that means for this
example we would like to get only this customer Anna. And this is exactly as well
fulfilling our task. The task says list all details for customers who have not placed any
order. All data from customers where we don't have matching
(7:50:21) from the orders. Now I think you already got it how to get this effect. We're
going to go and filter the data like the following. So we're going to have the WHERE
clause, and now we need the column from the right table, from the orders. So we're
going to go with the customer ID that comes from the orders. So we're going to
(7:50:39) say o dot customer ID IS NULL. And of course you can go with the order ID as
well. You're going to get the same effect. But I would always like to use the key that
we are using with the join. So let's go and execute it. And now as you can see we got
the effect of the left anti-join, and with that, as you can
(7:50:58) see, we got the customer that we are aiming for. So here we have the data
from the left side that is not matching the right side. So the customers who have not
placed an order, and with that we have solved the task. So as you can see we have
implemented the left anti-join by combining the left join together
(7:51:13) with IS NULL. So this is the power of playing with the nulls in SQL. Now
my friends, there is something that really confuses a lot of developers or anyone
that is working with data in databases and SQL, and that is the differences between
nulls, empty strings and blank spaces. So with the nulls, as we
(7:51:39) learned, we are saying I don't know what the value is, it is unknown. But now
on the other hand, with the empty string you are saying I know the value, it is nothing. So
the empty string is a string value which has zero characters. This is totally different
from the nulls. With the nulls we don't know anything about the value.
(7:51:59) So now sometimes maybe it happens to you as you are filling in a form and
you come to one field, you go and by mistake hit the space bar, and with that you are
entering a space into the field, and you just jump to the next field without entering any
other values. So we have now like a space character inside the field. This
(7:52:17) is really evil in databases because once the user enters a blank space, it's
going to go and store it as a value inside the database and it's going to take storage.
So it could be one space or many spaces depends on how long you press the space
bar. So the blank space is a string but the size is not zero like the empty
(7:52:36) string. We're going to have a size of how many spaces you have entered.
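The three scenarios can be told apart by measuring the length of each value. The sketch below is only an approximation of the demo in the course: it runs on SQLite through Python's sqlite3 module and uses SQLite's LENGTH function, whereas the walkthrough uses DATALENGTH in SQL Server (where LEN would not count trailing spaces). The inline orders data is a hypothetical stand-in for the dummy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dummy data: a normal value, a NULL, an empty string, one space.
rows = conn.execute("""
    WITH orders (id, category) AS (
        VALUES (1, 'A'), (2, NULL), (3, ''), (4, ' ')
    )
    SELECT id, category, LENGTH(category) AS category_len
    FROM orders
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The null has no length at all, the empty string has a length of zero, and the blank space has a length of one, which is what makes the last two distinguishable even though they look the same in a result grid.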
So here it's not like the null. We know the value, it is a string, and the character of that
is going to be a space. Okay. So let's see those three scenarios inside SQL. Now I have
like dummy data using the CTE statements. Don't worry about it.
(7:52:52) I'm going to teach you all that stuff in the next tutorials. So now we have
here like four rows. The first one with a value a. The next one with null. The third one
with empty string. So as you can see there is nothing between those two quotes. And
the last one we have a space between those two quotes. Now
(7:53:09) let's go and query this temporary table. So select star from orders and
execute. So now by looking to the values of the categories you can find all the
scenarios now. So now the first scenario is the easiest one where we have a normal
value. We have here an a. But the other three rows we don't have normal
(7:53:29) values. We have like empty stuff. So the first one going to be the null. So
we don't have a value. This is the special marker from SQL. It says null. So there is
no value. And the other two, they are really confusing. As you can see it's really hard
to tell by just looking at the data or at the results whether it is an empty
(7:53:47) string or a blank space. And this confuses a lot of developers or anyone
working with data seeing those results. It's really hard to detect the data quality
issues by just looking at the results. So now in this scenario what I do, I go and
calculate the length of each value inside my column. So let's go and do
(7:54:06) that. Now in SQL Server we're going to go and use
the function DATALENGTH, and our field is going to be the category. So let's call it category
length. So let's go execute it. And now let's check the result. The first one since we
have only one character, the length of that is going
(7:54:25) to be one which is correct. And now to the next row we have the category
null. We don't know the value and as well we don't know the length of the value,
right? So that's why we will get a null. So now by moving to the next one as you can
see those two looking really exactly the same. But now with the help of the
(7:54:40) length or the data length function we can see that the third row or the third
category value has the length of zero. That means it is an empty string and we don't
have any characters over here that is hidden. So with that we are sure this is an
empty string. But now let's move to the last one. Here it is very tricky
(7:55:00) and evil. We have a hidden space inside this value and we can understand
that by the length of this field. So as you can see we have here a one, that means we
have here one hidden space inside this value and it is not an empty string. So that
means I have here only one space. Let's go and give it another space and
(7:55:20) calculate the length. So as you can see we have two spaces and that's why
the length is two. So don't count on your eyes in order to understand the spaces. Go
and calculate the length in order to be very precise. So now let's go and compare the
three scenarios side by side. So let's start with the first one
(7:55:37) about the representations in the table. The null we're going to see it as a
null inside the table. The empty string going to be like two quotes and nothing
between them. And the blank space it's as well two quotes and between them one or
many spaces. And now if you are talking about the meaning the null means
(7:55:54) unknown. We don't know the value. The empty string it is known but it is
nothing it is empty value. And the third one blank spaces it is as well known and the
spaces are the value. And now if you are talking about the data types since the null
is no value. So we don't have a data type for this and it is like a
(7:56:14) special marker in the SQL. And now the empty string has a data type. It is
a string and the size of this string going to be zero since we have zero characters
inside the empty string. Moving on to the blank spaces, it is a string since a space is
a character and it's going to be the size of one or many. And now if
(7:56:32) we are talking about the storage, the null is the best. They don't consume
or occupy a lot of storage. While the empty string and the blank spaces, they occupy
here storage and memory and they waste the space. So if you are worried about the
storage, the best option here is a null. Now talking about the performance,
(7:56:50) you will get the best performance if you are using nulls. Now the empty
string is fast as well, but it is not as fast as the nulls. Now the worst option here is
the blank spaces, it is slow. So again, if the speed is important for you, you have to
have those scenarios as a null. So now if you are talking about
(7:57:08) the comparison and you are searching for those values, if you want to
search for the null you have to go and use IS NULL. But on the other hand, if you want to
search for the empty string and the blank spaces, you have to go and use the
equal operator. So that's all. Those are the main differences between the null,
(7:57:24) empty string and blank spaces. Now you might ask, you know what, why do
I have to understand the differences between all that stuff, the nulls, empty strings
and the blanks? Everything's like empty, so why do I care? Well, in real projects, I
promise you that you will be working with sources and data that have bad
data quality, and you
(7:57:46) might encounter all those three scenarios in your data. And now if you don't
do any data preparation, like cleaning up the data, handling those three scenarios
and bringing standards to your data, and you jump immediately to the analysis
without doing all that, you will end up providing inaccurate results in your
reports and
(7:58:04) analyses, which leads to wrong decisions. So preparing your data
by cleaning up the data, handling those three scenarios and as
well bringing standards is a very important step before doing any analysis. So this is
how we're going to do it. Together with the stakeholders and the users of your reports
and analyses,
(7:58:25) you have to define clear data policies. They are like rules, and you have to
commit yourself to following those rules during the implementation. And here we
have three different options. The first one, you can go and define the data policies like
this: only use nulls and empty strings but avoid using blank spaces. In my projects I
(7:58:45) cannot imagine that there is a scenario where we need blank spaces. They
are just evil. Just go get rid of them. All right. So with this policy, we have to go
and get rid of all blank spaces inside our data. And in order to do that, we have a
wonderful function in SQL called trim. The trim function in
(7:59:02) SQL going to go and remove the spaces from a string from the left side and
as well from the right side. So all the leading spaces and the trailing spaces going to
be removed. So now if we go and apply the trim function on that category, what's
going to happen? All the blank spaces going to be removed and
(7:59:19) it going to be turned into empty string. So let's go and do that. It's very
simple. So we're going to use the trim function and we're going to apply it on the
category. Let's go and call it policy one. So let's go and execute it. So now by just
comparing the policy one with the category, you see it looks
(7:59:39) identical, but it's not. Now in order to have a better feeling about this we can
go and test it using DATALENGTH. Now let's go again and use the DATALENGTH
function. So we're going to use it for the whole result, and as well I'm going to go
and use it for the category in order to just compare it. So without
(7:59:56) the trim, like this. Let's go and execute it. Now if you go and check the
result, as you can see here again we have the length of two because here we have
two spaces, but with the policy one we have zero. So those two values, after applying
the trim function, have the length of zero, and with that we don't have
(8:00:20) blank spaces. So that means now we are sure after applying the trim we
have either a null or an empty string. So let me just get rid of all this information. Now
I am sure both of them are empty strings. So as you can see it's very simple using
only one SQL function you are cleaning up the data and bringing
(8:00:39) standards. All right moving on to the option two. You can define your data
policies like this. Only use nulls and avoid both empty strings and as well blank
spaces. So that means in our business we don't have anything meaningful for the
empty string and the blank spaces. We can go and use only the nulls. Okay. So now
let's go and
(8:00:58) implement this rule. We have to go and convert a value to a null. So the
value, the empty string, is going to be converted to a null. And as we learned, we can go and use the
NULLIF function in order to get nulls instead of values. So let's go and apply this policy.
But now here we have two values, the empty string and the spaces. Now
(8:01:15) instead of having two rules for that, I'm going to first convert the blank
spaces to an empty string like we have done here. So I'm going to take the result of
this function as a first step, and afterwards we're going to go and use NULLIF.
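The first two policies can be sketched side by side: policy 1 applies TRIM only, and policy 2 wraps that result in NULLIF so that empty strings become nulls. This is a minimal sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, the inline orders data is hypothetical, and the column names policy1 and policy2 simply follow the names used in the walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dummy data: a value, a NULL, an empty string, two blank spaces.
rows = conn.execute("""
    WITH orders (id, category) AS (
        VALUES (1, 'A'), (2, NULL), (3, ''), (4, '  ')
    )
    SELECT
        id,
        TRIM(category)             AS policy1,  -- blanks become empty strings
        NULLIF(TRIM(category), '') AS policy2   -- empty strings become NULL
    FROM orders
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

After policy 1 the column holds only values, nulls and empty strings; after policy 2 it holds only values and nulls.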
So we're going to say NULLIF for the result of the trim: if you
(8:01:37) find any empty strings, convert them to null. So that's it, policy 2. So as you can
see in the result, we have converted those empty strings and blanks to a null. So
with that we are getting three nulls, and of course we're going to get the value a. And
now if you compare those three columns side by side, you're going
(8:01:56) to see that policy 2 is really easier to understand compared to the previous
ones. Right? So now if you compare the policy two to the policy one, you can
see it's easier to understand and it's easier as well to handle. So again it's very easy
to do data cleanup. With only two functions we have now like
(8:02:12) standards inside our data. And now moving on to the last option, we can
define our data policy like this. Use only a default value unknown and avoid using
anything else like nulls, empty strings and blank spaces. So that means in the
analyzes and reports we want to see the value unknown and we have to
(8:02:31) handle all those three scenarios and convert them to unknown. So now
in order to implement the policy three we have to go and replace a null with a value, a
default value, and here we have two options: either use ISNULL or we can go and
use COALESCE, and I will go with COALESCE. So COALESCE, and I'm going to use directly
(8:02:51) the category. So if you find any null, replace it with the default value
unknown, and let's call it policy 3. So let's go and execute it. So now if you check the
result over here, you see that we got it correct only once. So we replaced the null with
the unknown, but we still have like empty strings and blanks, and that's because we
rushed
(8:03:12) into using COALESCE and we skipped the other steps. So as you can see,
preparing the data you have to do slowly, step by step. So first we have to go and
convert everything to a null like the policy 2. And after that, as the last step, we're going
to go and use the default value. So that means instead of using
(8:03:28) the category we have to go and get the result of the policy 2. So let's go
and copy it and replace the category with those two steps, and let's go and execute it.
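The difference between the rushed version and the step-by-step version of policy 3 can be seen in a small sketch. As before, this is an illustration under assumptions: SQLite through Python's sqlite3 module instead of SQL Server, hypothetical inline data, and made-up column names (rushed, policy3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical dummy data: a value, a NULL, an empty string, two blank spaces.
rows = conn.execute("""
    WITH orders (id, category) AS (
        VALUES (1, 'A'), (2, NULL), (3, ''), (4, '  ')
    )
    SELECT
        id,
        -- Rushed: only the real NULL is replaced, '' and blanks slip through.
        COALESCE(category, 'unknown')                   AS rushed,
        -- Step by step: trim, then empty string to NULL, then default value.
        COALESCE(NULLIF(TRIM(category), ''), 'unknown') AS policy3
    FROM orders
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Nesting the functions from the inside out keeps the order of the steps explicit: TRIM first, NULLIF second, COALESCE last.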
So now as you can see we have the default value for all those three scenarios. First
we have to trim the data in order to remove all the blank
(8:03:46) spaces. The second step, we're going to go and replace all the empty
strings with a null. And with that, we're going to get a null for all those three
scenarios. And finally, we're going to go and replace the nulls with a default value,
the unknown. So, that's it for the three policies. And these are the
(8:04:04) different ways to clean up the data and bring standards before
doing analysis. And now you might ask me, okay, which one should I use in my
project? Like if I want to suggest something for the users, which one should I use?
Well, it really depends on the business, but I always try to avoid this one, the policy
one, because
(8:04:21) it's always confusing and I always have to explain it to the users. So now we
are left with the two and three. Well, I use both of them in different scenarios. I
normally go with the policy 2 because it takes less storage and as well the
performance of your queries afterward is going to be really good. So if I'm doing
(8:04:38) data preparation in my ETL before inserting it inside a table, I go with the
policy 2. But on the other hand, if I'm doing a preparation step before showing it in a
report like in Tableau or Power BI, so if it is like one of the last steps before showing
the data to the users, I go with the policy 3, because if you present a null inside a
(8:04:59) report, it's going to be really hard to read. So having a word like
unknown, it's easier to understand: okay, we have here missing data. So again, if the
data preparation is right before I present the data to the users, I go with the policy
3 where I use default values, but if I'm doing the data preparation before inserting it in
the
(8:05:19) database, I go with the policy 2 because it's going to optimize the storage,
and it's going to be really bad if you go with the policy 3 because it's really bad to
store the whole word, like unknown, each time there is no value. It's going to take a
lot of space and as well you're going to get bad performance as
(8:05:36) you are building the queries. That's why I tend to store the data using nulls.
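One possible way to combine the two recommendations is to apply policy 2 when loading the table and policy 3 only in a reporting view. The sketch below is an assumption about how that split could look, not the setup used in the course: it runs on SQLite through Python's sqlite3 module, and the table orders, the view orders_report and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, category TEXT);

-- Load step (policy 2): trim, then turn empty strings into NULL before storing.
INSERT INTO orders
SELECT id, NULLIF(TRIM(category), '')
FROM (SELECT 1 AS id, 'A' AS category
      UNION ALL SELECT 2, NULL
      UNION ALL SELECT 3, ''
      UNION ALL SELECT 4, ' ');

-- Reporting step (policy 3): only the view shows the default value.
CREATE VIEW orders_report AS
SELECT id, COALESCE(category, 'unknown') AS category
FROM orders;
""")

stored = conn.execute("SELECT id, category FROM orders ORDER BY id").fetchall()
report = conn.execute("SELECT id, category FROM orders_report ORDER BY id").fetchall()

print(stored)
print(report)
```

With this split the word unknown is never written to the table; it only appears when the data is read for presentation.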
If you present it to the users go and show it as a default value. So as you can see it's
very important to understand the differences between the nulls empty strings and
blanks and how to prepare the data by cleaning up the data and
(8:05:51) bringing standards and policies before doing