Python Map

Merge
>>> [Link](data1, data2, how='left', on='X1')
>>> [Link](data1, data2, how='inner', on='X1')
>>> [Link](data1, data2, how='outer', on='X1')

Query

Rename DataFrame columns with columns={"Country":"cntry", "Capital":"cptl", "Population":"ppltn"}
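The three Merge calls differ only in the how argument. The sketch below uses pd.merge, the pandas function that accepts these how= and on= arguments. It runs on two made-up frames that share the key column X1. The column names X2 and X3 and all of the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames sharing the key column X1
data1 = pd.DataFrame({"X1": ["a", "b", "c"], "X2": [1, 2, 3]})
data2 = pd.DataFrame({"X1": ["a", "b", "d"], "X3": [10, 20, 40]})

# how='left': keep every row of data1; key "c" has no match, so X3 is NaN
left = pd.merge(data1, data2, how="left", on="X1")

# how='inner': keep only keys present in both frames ("a" and "b")
inner = pd.merge(data1, data2, how="inner", on="X1")

# how='outer': keep the union of keys from both frames
outer = pd.merge(data1, data2, how="outer", on="X1")

print(left, inner, outer, sep="\n\n")
```

With how='outer', pandas sorts the combined keys lexicographically. Unmatched rows carry NaN in the columns that came from the other frame.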
Reindexing
>>> s2 = [Link](['a','c','d','e','b'])
>>> [Link](range(4), method='ffill')       # Forward filling
>>> s3 = [Link](range(5), method='bfill')  # Backward filling

Pivot Table
Spread values into columns with index='Date', columns='Type', values='Value'

Join
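Forward and backward filling decide what value a newly introduced index label receives. The minimal sketch below uses Series.reindex with method='ffill' and method='bfill'. It runs on a hypothetical series whose integer index skips 1 and 3.

```python
import pandas as pd

# Hypothetical series indexed by 0, 2 and 4 (labels 1 and 3 are missing)
s = pd.Series([10, 20, 30], index=[0, 2, 4])

# Reindex to 0..4; each new label copies the previous existing value
forward = s.reindex(range(5), method="ffill")

# Same labels, but each new label copies the next existing value
backward = s.reindex(range(5), method="bfill")

print(forward.tolist())   # [10, 10, 20, 20, 30]
print(backward.tolist())  # [10, 20, 20, 30, 30]
```

Filling methods require a monotonic index, which is why the toy index is sorted. Without a method, the new labels would simply get NaN.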
Horizontal/Vertical

>>> arrays = [..., [Link]([5,4,3])]
>>> df5 = [Link]([Link](3, 2), index=arrays)
... names=['first', 'second'])

> Dates
>>> df2['Date'] = pd.to_datetime(df2['Date'])

> Duplicate Data

>>> [Link](df2,                      # Gather columns into rows
           id_vars=["Date"],
           value_vars=["Type", "Value"])
>>> [Link](level=0).sum()
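The sketch below shows what the melt call produces. It uses a made-up df2 with the Date, Type and Value columns referenced above; the values are hypothetical. As in the Dates example, it applies pd.to_datetime first. It relies on pd.melt's default output column names, variable and value.

```python
import pandas as pd

# Hypothetical df2 with the column names used above
df2 = pd.DataFrame({
    "Date": ["2016-03-01", "2016-03-02"],
    "Type": ["a", "b"],
    "Value": [11.4, 13.2],
})

# Parse the Date strings into datetime64 values
df2["Date"] = pd.to_datetime(df2["Date"])

# Gather Type and Value into rows: one row per (Date, original column) pair
long = pd.melt(df2, id_vars=["Date"], value_vars=["Type", "Value"])

print(long)
```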
Learn Data Visualization online at [Link]

One of the most common ways to show part to whole data. It is also commonly used with percentages.

The donut pie chart is a variant of the pie chart, the difference being it has a hole in the center for readability.

Heatmaps are two-dimensional charts that use color shading to represent data trends.

Best to compare subcategories within categorical data. Can also be used to compare percentages.

2D rectangles whose size is proportional to the value being measured and can be used to display hierarchically structured data.
> Capture a trend

Line chart: The most straightforward way to capture how a numeric variable is changing over time.
Use cases: Revenue in $ over time; Energy consumption in kWh over time; Google searches over time

Multi-line chart: Captures multiple numeric variables over time. It can include multiple axes, allowing comparison of different units and scale ranges.
Use cases: Apple vs Amazon stocks over time; Lebron vs Steph Curry searches over time; Bitcoin vs Ethereum price over time

Area chart: Shows how a numeric value progresses by shading the area between the line and the x-axis.
Use cases: Total sales over time; Active users over time

Stacked area chart: The most commonly used variation of area charts. It is best used to track the breakdown of a numeric value by subgroups.
Use cases: Active users over time by segment; Total revenue over time by country

Spline chart: A smoothed version of a line chart. It differs in that data points are connected with smoothed curves to account for missing values, as opposed to straight lines.
Use cases: Electricity consumption over time; CO2 emissions over time

> Visualize a single value

Card: Cards are great for showing and tracking KPIs in dashboards or presentations (e.g., "$7.47M Total Sales").
Use cases: Revenue to date on a sales dashboard; Total sign-ups after a promotion

Table chart: Best used on small datasets. It displays tabular data in a table.
Use cases: Account executive leaderboard; Registrations per webinar

Gauge chart: This chart is often used in executive dashboard reports to show relevant KPIs.
Use cases: NPS score; Revenue to target

> Capture distributions

Histogram: Shows the distribution of a variable. It converts numerical data into bins as columns. The x-axis shows the range, and the y-axis represents the frequency.
Use cases: Distribution of salaries in an organization; Distribution of height in one cohort

Box plot: Shows the distribution of a variable using 5 key summary statistics: minimum, first quartile, median, third quartile, and maximum.
Use cases: Gas efficiency of vehicles; Time spent reading across readers

Violin plot: A variation of the box plot. It also shows the full distribution of the data alongside summary statistics.
Use cases: Time spent in restaurants across age groups; Length of pill effects by dose

Density plot: Visualizes a distribution by using smoothing to allow smoother distributions and better capture the distribution shape of the data.
Use cases: Distribution of price of hotel listings; Comparing NPS scores by customer segment
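The histogram entry says it converts numerical data into bins. The minimal sketch below shows that binning step in plain Python. It assumes equal-width bins; plotting libraries offer other binning rules. It reuses the seven glasses-of-water volumes from the descriptive statistics part of this document.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bins are half-open [low, high), except the last one, which also includes
    the maximum so that every value is counted.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Bin index for this value; the maximum is clamped into the last bin
        idx = int((v - lo) / width) if width else 0
        counts[min(idx, n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Volumes (ml) of the seven glasses of water
water_ml = [300, 60, 300, 120, 180, 180, 300]
edges, counts = histogram_counts(water_ml, n_bins=4)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.1f}-{right:6.1f} ml: {'#' * c}")
```

The printed bars form a text histogram of four 60 ml bins with counts 1, 1, 2 and 3.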
One of the easiest charts to read, which helps in quick comparison of categorical data. One axis contains categories and the other axis represents values.
Use cases: Volume of google searches by region; Market share in revenue by product

Also known as a vertical bar chart, where the categories are placed on the x-axis. These are preferred over bar charts for short labels, date ranges, or negatives in values.
Use cases: Brand market share; Profit Analysis by region

Most commonly used chart when observing the relationship between two variables. It is especially useful for quickly surfacing potential correlations between data points.
Use cases: Display the relationship between time-on-platform and churn; Display the relationship between salary and years spent at company

A hybrid between a scatter plot and a line plot: the scatter dots are connected with a line.
Use cases: Cryptocurrency price index; Visualizing timelines and events when analyzing two variables

Often used to visualize data points with 3 dimensions, namely visualized on the x-axis, y-axis, and with the size of the bubble. It tries to show relations between data points using location and size.
Use cases: Adwords analysis: CPC vs Conversions vs Share of total conversion; Relationship between life expectancy, GDP per capita, & population size

A convenient way of visualizing the most prevalent words that appear in a text.
Use cases: Top 100 used words by customers in customer service tickets

Useful for representing flows in systems. This flow can be any measurable quantity.
Use cases: Energy flow between countries; Supply chain volumes between warehouses

Useful for presenting weighted relationships or flows between nodes. Especially useful for highlighting the dominant or important flows.
Use cases: Export between countries to showcase biggest export partner; Supply chain volumes between the largest warehouses

Similar to a graph, it consists of nodes and interconnected edges. It illustrates how different items have relationships with each other.
Use cases: How different airports are connected worldwide; Social media friend group analysis

Learn Data Skills Online at [Link]
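A word cloud sizes each word by how often it appears in the text. The counting step behind it can be sketched with collections.Counter. The tokenising rule (lowercase runs of letters and apostrophes), the stop-word list and the ticket text are all assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(text, n=5, stop_words=frozenset()):
    """Return the n most frequent words in text with their counts.

    Words are lowercased runs of letters and apostrophes; stop words are
    skipped. A word cloud would scale each word's font size by its count.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    # Ties keep the order in which words were first seen
    return counts.most_common(n)


# Hypothetical customer service ticket text
tickets = "Refund not received. Please refund my order. Order arrived late, refund please."
print(top_words(tickets, n=3, stop_words={"my", "not"}))
```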
> Date & time functions
CALENDAR(<start_date>, <end_date>) Returns a table with a single column named "Date" that contains a contiguous set of dates from start_date to end_date, inclusive

> Information functions
COLUMNSTATISTICS() Returns statistics regarding every column in every table. This function takes no arguments.

> Math & statistical functions
SUM(<column>) Adds all the numbers in a column
SUMX(<table>, <expression>) Returns the sum of an expression evaluated for each row in a table
AVERAGE(<column>) Returns the average (arithmetic mean) of all the numbers in a column
AVERAGEX(<table>, <expression>) Calculates the average (arithmetic mean) of a set of expressions evaluated over a table
MEDIAN(<column>) Returns the median of a column
MEDIANX(<table>, <expression>) Calculates the median of a set of expressions evaluated over a table
GEOMEAN(<column>) Calculates the geometric mean of a column
GEOMEANX(<table>, <expression>) Calculates the geometric mean of a set of expressions evaluated over a table

> Time intelligence functions
DATEADD(<dates>, <number_of_intervals>, <interval>) Moves a date by a specific interval
DATESBETWEEN(<dates>, <date_1>, <date_2>) Returns the dates between specified dates
TOTALYTD(<expression>, <dates>[, <filter>][, <year_end_date>]) Evaluates the year-to-date value of the expression in the current context
SAMEPERIODLASTYEAR(<dates>) Returns a table that contains a column of dates shifted one year back in time
STARTOFMONTH(<dates>) // ENDOFMONTH(<dates>) Returns the start // end of the month
STARTOFQUARTER(<dates>) // ENDOFQUARTER(<dates>) Returns the start // end of the quarter
STARTOFYEAR(<dates>) // ENDOFYEAR(<dates>) Returns the start // end of the year

> DAX statements
VAR(<name> = <expression>) Stores the result of an expression as a named variable. To return the variable, use RETURN after the variable is defined
COLUMN(<table>[<column>] = <expression>) Stores the result of an expression as a column in a table
ORDER BY(<table>[<column>]) Defines the sort order of a column. Every column can be sorted in ascending (ASC) or descending (DESC) order
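SUMX and AVERAGEX evaluate an expression once per row and then aggregate the results, whereas SUM only adds up an existing column. The plain-Python sketch below mimics that row-by-row behaviour. It also computes a year-to-date total in the spirit of TOTALYTD. This is an analogy, not DAX. The Sales table and its column names (date, quantity, unit_price) are hypothetical, and the sketch assumes a calendar year ending on 31 December.

```python
from datetime import date

# Hypothetical Sales table: one dict per row (column names are made up)
sales = [
    {"date": date(2023, 11, 5), "quantity": 2, "unit_price": 10.0},
    {"date": date(2024, 1, 15), "quantity": 3, "unit_price": 4.0},
    {"date": date(2024, 2, 20), "quantity": 1, "unit_price": 25.0},
    {"date": date(2024, 3, 3), "quantity": 5, "unit_price": 2.0},
]

# Like SUM(Sales[quantity]): aggregate a column that already exists
total_quantity = sum(row["quantity"] for row in sales)

# Like SUMX(Sales, Sales[quantity] * Sales[unit_price]):
# evaluate the expression once per row, then add the results
revenue = sum(row["quantity"] * row["unit_price"] for row in sales)

# Like AVERAGEX over the same per-row expression
avg_line_revenue = revenue / len(sales)


def total_ytd(rows, as_of):
    """Revenue from 1 January of as_of's year up to and including as_of.

    Loosely mirrors TOTALYTD with the default 31 December year end.
    """
    start = date(as_of.year, 1, 1)
    return sum(r["quantity"] * r["unit_price"] for r in rows
               if start <= r["date"] <= as_of)


print(total_quantity, revenue, avg_line_revenue, total_ytd(sales, date(2024, 2, 29)))
```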
> Filter functions
ADDCOLUMNS(<table>, <name>, <expression>[, <name>, <expression>]…) Adds calculated columns to the given table or table expression
SELECTCOLUMNS(<table>, <name>, <expression>[, <name>, <expression>]…) Selects calculated columns from the given table or table expression

>= Greater than or equal to

EXACT(<text_1>, <text_2>) Checks if two strings are identical (EXACT() is case sensitive)
FIND(<text_tofind>, <in_text>) Returns the starting position of a text within another text
Database relationships describe the relationships between records in different tables.

One-to-one relationship:
When a one-to-one relationship exists between two tables, a given record in one table is uniquely related to exactly one record in the other table.

One-to-many relationship:
In a one-to-many relationship, a record in one table can be related to one or more records in a second table. However, a given record in the second table will only be related to one record in the first table.

Many-to-many relationship:
In a many-to-many relationship, records in a given table 'A' can be related to one or more records in another table 'B', and records in table B can also be related to many records in table A.

The join diagrams below use these two tables:

left_table          right_table
id  left_val        id  right_val
1   L1              1   R1
2   L2              4   R2
3   L3              5   R3
4   L4              6   R4

The SQL examples use these two tables:

artist                          album
artist_id  name                 album_id  title               artist_id
1          AC/DC                1         For those who rock  1
2          Aerosmith            2         Dream on            2
3          Alanis Morissette    3         Restless and wild   2
                                4         Let there be rock   1
                                5         Rumours             6

INNER JOIN
An inner join between two tables will return only records where a joining field, such as a key, finds a match in both tables.

Result after INNER JOIN:
id  left_val  right_val
1   L1        R1
4   L4        R2

INNER JOIN on one field
SELECT *
FROM artist AS art
INNER JOIN album AS alb
ON art.artist_id = alb.artist_id;

INNER JOIN with USING
SELECT *
FROM artist AS art
INNER JOIN album AS alb
USING (artist_id);

Result after INNER JOIN:
artist_id  name       album_id  title               artist_id
1          AC/DC      1         For those who rock  1
1          AC/DC      4         Let there be rock   1
2          Aerosmith  2         Dream on            2
2          Aerosmith  3         Restless and wild   2

Self-join
SELECT
alb1.artist_id,
[Link] AS alb1_title,
[Link] AS alb2_title
FROM album AS alb1
INNER JOIN album AS alb2
ON alb1.artist_id = alb2.artist_id
WHERE alb1.album_id<>alb2.album_id;

LEFT JOIN
A left join keeps all of the original records in the left table and returns missing values for any columns from the right table where the joining field did not find a match.

Result after LEFT JOIN:
id  left_val  right_val
1   L1        R1
2   L2        null
3   L3        null
4   L4        R2

LEFT JOIN on one field
SELECT *
FROM artist AS art
LEFT JOIN album AS alb
ON art.artist_id = alb.artist_id;

Result after LEFT JOIN:
artist_id  name               album_id  title               artist_id
1          AC/DC              1         For those who rock  1
1          AC/DC              4         Let there be rock   1
2          Aerosmith          2         Dream on            2
2          Aerosmith          3         Restless and wild   2
3          Alanis Morissette  null      null                null

RIGHT JOIN
A right join keeps all of the original records in the right table and returns missing values for any columns from the left table where the joining field did not find a match. Right joins are far less common than left joins, because right joins can always be rewritten as left joins.

Result after RIGHT JOIN:
id  left_val  right_val
1   L1        R1
4   L4        R2
5   null      R3
6   null      R4

FULL JOIN
A full join combines a left join and a right join. A full join will return all records from a table, irrespective of whether there is a match on the joining field in the other table.

Result after FULL JOIN:
id  left_val  right_val
1   L1        R1
2   L2        null
3   L3        null
4   L4        R2
5   null      R3
6   null      R4

CROSS JOIN
A cross join returns every combination of rows from the two tables.

table 1    table 2
id1        id2
1          A
2          B
           C

Result after CROSS JOIN:
id1  id2
1    A
1    B
1    C
2    A
2    B
2    C

SELECT name, title
FROM artist
CROSS JOIN album;

Result after CROSS JOIN:
name               title
AC/DC              For those who rock
AC/DC              Dream on
AC/DC              Restless and wild
AC/DC              Let there be rock
AC/DC              Rumours
Aerosmith          For those who rock
Aerosmith          Dream on
Aerosmith          Restless and wild
Aerosmith          Let there be rock
Aerosmith          Rumours
Alanis Morissette  For those who rock
Alanis Morissette  Dream on
Alanis Morissette  Restless and wild
Alanis Morissette  Let there be rock
Alanis Morissette  Rumours

SEMI JOIN
A semi join chooses records in the first table where a condition is met in the second table. A semi join makes use of a WHERE clause to use the second table as a filter for the first.

SELECT *
FROM album
WHERE artist_id IN
(SELECT artist_id
FROM artist);

Result after Semi join:
album_id  title               artist_id
1         For those who rock  1
2         Dream on            2
3         Restless and wild   2
4         Let there be rock   1

ANTI JOIN
The anti join chooses records in the first table where a condition is NOT met in the second table. It makes use of a WHERE clause to exclude values found in the second table.

SELECT *
FROM album
WHERE artist_id NOT IN
(SELECT artist_id
FROM artist);

Result after Anti join:
album_id  title    artist_id
5         Rumours  6

UNION
The UNION operator is used to vertically combine the results of two SELECT statements. For UNION to work without errors, all SELECT statements must have the same number of columns and corresponding columns must have the same data type. UNION does not return duplicates.

SELECT artist_id
FROM artist
UNION
SELECT artist_id
FROM album;

Result after UNION:
artist_id
1
2
3
6

UNION ALL
UNION ALL works just like UNION, but it returns duplicate values. The same restrictions of UNION hold true for UNION ALL.

SELECT artist_id
FROM artist
UNION ALL
SELECT artist_id
FROM album;

Result after UNION ALL:
artist_id
1
2
3
1
2
2
1
6

INTERSECT
The INTERSECT operator returns only identical rows from two tables.

SELECT artist_id
FROM artist
INTERSECT
SELECT artist_id
FROM album;

Result after INTERSECT:
artist_id
1
2

EXCEPT
The EXCEPT operator returns only those rows from the left table that are not present in the right table.

SELECT artist_id
FROM artist
EXCEPT
SELECT artist_id
FROM album;

Result after EXCEPT:
artist_id
3
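Most of the queries above can be tried without a database server. The sketch below loads the artist and album tables into an in-memory SQLite database through Python's built-in sqlite3 module. It then runs the inner, left, semi and anti joins, plus UNION, INTERSECT and EXCEPT. It leaves out RIGHT and FULL joins, which SQLite only supports from version 3.39.0. It also adds ORDER BY clauses so that the row order is deterministic.

```python
import sqlite3

# In-memory database holding the artist and album tables shown above
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
INSERT INTO artist VALUES (1, 'AC/DC'), (2, 'Aerosmith'), (3, 'Alanis Morissette');
INSERT INTO album VALUES
  (1, 'For those who rock', 1), (2, 'Dream on', 2), (3, 'Restless and wild', 2),
  (4, 'Let there be rock', 1), (5, 'Rumours', 6);
""")


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return con.execute(sql).fetchall()


# Inner join: only artists with a matching album (album 5 has no artist)
inner = rows("""SELECT art.artist_id, name, title FROM artist AS art
                INNER JOIN album AS alb ON art.artist_id = alb.artist_id
                ORDER BY album_id""")

# Left join: every artist, with NULL (None) where no album matches
left = rows("""SELECT art.artist_id, name, title FROM artist AS art
               LEFT JOIN album AS alb ON art.artist_id = alb.artist_id
               ORDER BY art.artist_id, album_id""")

# Semi and anti joins expressed with IN / NOT IN subqueries
semi = rows("""SELECT album_id FROM album
               WHERE artist_id IN (SELECT artist_id FROM artist) ORDER BY album_id""")
anti = rows("SELECT * FROM album WHERE artist_id NOT IN (SELECT artist_id FROM artist)")

# Set operators on the artist_id column of both tables
union = rows("SELECT artist_id FROM artist UNION SELECT artist_id FROM album ORDER BY 1")
intersect = rows("SELECT artist_id FROM artist INTERSECT SELECT artist_id FROM album ORDER BY 1")
except_ = rows("SELECT artist_id FROM artist EXCEPT SELECT artist_id FROM album ORDER BY 1")

print(inner, left, semi, anti, union, intersect, except_, sep="\n")
```

One caveat on the anti join pattern: if the subquery could return NULL, NOT IN matches nothing. NOT EXISTS is the usual alternative in that case.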
Descriptive Statistics Cheat Sheet
Learn more online at [Link]

Throughout this cheat sheet, you'll find terms and specific statistical jargon being used. Here's a rundown of all the terms you may encounter.

Variable: In statistics, a variable is a quantity that can be measured or counted. In data analysis, a variable is typically a column in a data frame.
Descriptive statistics: Numbers that summarize variables. They are also called summary statistics or aggregations.
Categorical data: Data that consists of discrete groups. The categories are called ordered (e.g., educational levels) if you can sort them from lowest to highest, and unordered otherwise (e.g., country of origin).
Numerical data: Data that consists of numbers (e.g., age).

> Numerical Dataset: Glasses of Water
To illustrate statistical concepts on numerical data, we'll be using a numerical variable, consisting of the volume of water in different glasses:
300 ml, 60 ml, 300 ml, 120 ml, 180 ml, 180 ml, 300 ml

> Measures of Center
Measures of center allow you to describe or summarize your data by capturing one value that describes the center of its distribution.

Measure          Definition                               How to find it                                 Result
Arithmetic mean  The total of the values divided by how   (300 + 60 + 300 + 120 + 180 + 180 + 300) / 7   205.7 ml
                 many values there are
Median           The middle value of the sorted data      60, 120, 180, 180, 300, 300, 300 (4th value)   180 ml
Mode             The most common value                    300 ml appears three times                     300 ml

> Other Measures of Location
Minimum   The lowest value in your data    60 ml
Maximum   The highest value in your data   300 ml

Percentile: Cut points that divide the data into 100 intervals with the same amount of data in each interval (e.g., in the water cup example, the 100th percentile is 300 ml).
Quartile: Similar to the concept of percentile, but with four intervals rather than 100. The first quartile is the same as the 25th percentile, which is 120 ml. The third quartile is the same as the 75th percentile, which is 300 ml. (These values take the median of each half of the sorted data; other quartile conventions can give slightly different results.)

> Measures of Spread
Sometimes, rather than caring about the size of values, you care about how different they are.

Measure               Definition                                       How to find it                   Result
Variance              The sum of squared differences from the mean,   sum of (value - mean)² / (7 - 1)  ≈ 9428.6 ml²
                      all divided by one less than the number of
                      data points
Inter-quartile range  The third quartile minus the first quartile     300 ml - 120 ml                   180 ml

> Visualizing Numeric Variables
There are a variety of ways of visualizing numerical data. Here are a few of them in action:
Histogram: Shows the distribution of a variable. It converts numerical data into bins as columns. The x-axis shows the range, and the y-axis represents the frequency.
Box plot: Shows the distribution of a variable using 5 key summary statistics: minimum, first quartile, median, third quartile, and maximum.

> Categorical Data: Trail Mix
To illustrate statistical concepts on categorical data, we'll be using an unordered categorical variable, consisting of different elements of a trail mix. Our categorical …

Counts and Proportions
Counts and proportions are measures of how much data you have. They allow you to understand how many data points belong to different categories in your data.
A count is the number of times a data point occurs in the dataset.
A proportion is the fraction of times a data point occurs in the dataset.

Food category  Count  Proportion
Almond         15     15 / 53 = 0.283
Cashew         13     13 / 53 = 0.245
Cranberry      25     25 / 53 = 0.472

Ways to visualize categorical data:
One of the easiest charts to read, which helps in quick comparison of categorical data. One axis contains categories and the other axis represents values.
Best to compare subcategories within categorical data. Can also be used to compare proportions.
2D rectangles whose size is proportional to the value being measured and can be used to display hierarchically structured data.

> Correlation
Correlation is a measure of the linear relationship between two variables. That is, when one variable goes up, does the other variable go up or down? There are several algorithms to calculate correlation, but it is always a score between -1 and +1.
-1: When X increases, Y decreases. Scatter plot forms a perfect straight line with negative slope.
Between -1 and 0: When X increases, Y tends to decrease.
0: There is no linear relationship between X and Y, so the scatter plot looks like a noisy mess.
Between 0 and +1: When X increases, Y tends to increase.
+1: When X increases, Y increases. Scatter plot forms a perfect straight line with positive slope.
Note that correlation does not account for non-linear effects, so if X and Y do not have a straight-line relationship, the correlation score may not be meaningful.
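The measures above can be reproduced with Python's standard statistics module. This is a sketch. statistics.quantiles with its default 'exclusive' method happens to give the 120 ml and 300 ml quartiles quoted above for these seven values. Other conventions, such as NumPy's default linear interpolation, give a different first quartile (150 ml). The pearson helper is a hand-written Pearson correlation, one of the several correlation algorithms mentioned.

```python
import statistics
from math import sqrt

# Volumes (ml) of the seven glasses of water
water_ml = [300, 60, 300, 120, 180, 180, 300]

mean = statistics.mean(water_ml)                  # 1440 / 7
median = statistics.median(water_ml)              # middle of the sorted values
mode = statistics.mode(water_ml)                  # most common value
q1, q2, q3 = statistics.quantiles(water_ml, n=4)  # default 'exclusive' method
iqr = q3 - q1
variance = statistics.variance(water_ml)          # squared deviations / (n - 1)
stdev = statistics.stdev(water_ml)                # square root of the variance

# Trail mix counts and proportions
trail_mix = {"Almond": 15, "Cashew": 13, "Cranberry": 25}
total = sum(trail_mix.values())
proportions = {food: round(count / total, 3) for food, count in trail_mix.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(round(mean, 1), median, mode, q1, q3, iqr, round(variance, 1), round(stdev, 1))
print(proportions)
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]), pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```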
Getting started with Python Cheat Sheet
Learn Python online at [Link]

> How to use this cheat sheet
Python is the most popular programming language in data science. It is easy to learn and comes with a wide array of powerful libraries for data analysis. This cheat sheet gives beginners and intermediate users a guide to getting started with Python. Use it to jump-start your journey with Python. If you want more detailed Python cheat sheets, check out the following:
Importing data in python
Data wrangling in pandas

> Importing packages
Python packages are a collection of useful tools developed by the open-source community. They extend the capabilities of the Python language. To install a new package (for example, pandas), go to your command prompt and type pip install pandas. Once a package is installed, you can import it as follows.

> The working directory

> Operators
a = 5     # Assign a value to a
22 / 7    # Divide a number by another with /
22 % 7    # Get the remainder after division with %; returns 1

> Getting started with lists
A list is an ordered and changeable sequence of elements. It can hold integers, characters, floats, strings, and even objects.

# Create lists with [], elements separated by commas
x = [1, 3, 2]

List functions and methods
list(reversed(x))   # Reverse the order of elements in x, e.g., [2, 3, 1]

Selecting list elements
Python lists are zero-indexed (the first element has index 0). For ranges, the first element is included but the last is not.

Concatenating lists
3 * x   # Repeat a list; with x = [1, 3, 6], returns [1, 3, 6, 1, 3, 6, 1, 3, 6]

> Getting started with characters and strings
# Create a string with double or single quotes
"""
A Frame of Data
"""
str = "Jack and Jill"   # Define str
str[0:2]                # Get a substring from starting to ending index (exclusive)

Combining and splitting strings

Mutate strings
[Link]()   # Convert a string to uppercase, returns 'JACK AND JILL'
[Link]()   # Convert a string

type('a')   # Get the type of an object; this returns str

> Getting started with dictionaries
A dictionary stores data values in key-value pairs. That is, unlike lists which are indexed by position, dictionaries are indexed by their keys, the names of which must be unique.

Creating dictionaries
[Link]()   # Get the values of a dictionary, returns dict_values([1, 2, 3])

> Getting started with DataFrames
Pandas is a fast and powerful package for data analysis and manipulation in Python. To import the package, you can use import pandas as pd. A pandas DataFrame is a structure that contains two-dimensional data stored as rows and columns. A pandas series is a structure that contains one-dimensional data.

[Link]({...})
[Link]([...])
df['col']
df[['col1', 'col2']]
[Link][:, 2]
[Link][3, 2]

Manipulating DataFrames
# Concatenate DataFrames vertically
[Link]([df, df])
# Calculate the mean of each column
[Link]()
# Get rows matching a condition
# Get unique rows
# Rename columns
df.sort_values(by='col_name')
[Link](n, 'col_name')

NumPy is a Python package for scientific computing. It provides multidimensional array objects and efficient operations on them. To import NumPy, you can run this Python code: import numpy as np
[Link]([1, 2, 3])   # Returns array([1, 2, 3])
# Return a sequence from start (inclusive) to end (exclusive)
[Link](1, 5)   # Returns array([1, 2, 3, 4])
# Return a stepped sequence from start (inclusive) to end (exclusive)
# Repeat each element n times
[Link]([1, 3, 6], 3)   # Returns array([1, 1, 1, 3, 3, 3, 6, 6, 6])
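The sketch below is a quick runnable check of the list, string, dictionary and operator examples above. It names the string s rather than str so the built-in str type is not shadowed. It calls list(reversed(x)) because reversed() on its own returns an iterator, not a list. The dictionary contents are made up to match the dict_values([1, 2, 3]) output quoted above.

```python
# Lists are zero-indexed; a slice includes its start and excludes its end
x = [1, 3, 2]
first = x[0]                    # 1
first_two = x[0:2]              # [1, 3]
backwards = list(reversed(x))   # [2, 3, 1]; reversed() alone returns an iterator
repeated = 3 * [1, 3, 6]        # [1, 3, 6, 1, 3, 6, 1, 3, 6]

# Strings can be sliced like lists
s = "Jack and Jill"
prefix = s[0:2]                 # 'Ja'
upper = s.upper()               # 'JACK AND JILL'
lower = s.lower()               # 'jack and jill'

# Dictionaries are looked up by unique keys, not by position
d = {"a": 1, "b": 2, "c": 3}
values = list(d.values())       # [1, 2, 3]
b_value = d["b"]                # 2

# Arithmetic operators
remainder = 22 % 7              # 1
quotient = 22 / 7               # / always returns a float, about 3.142857

print(first, first_two, backwards, repeated, prefix, upper, lower,
      values, b_value, remainder, quotient)
```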
The majority of data analysis in Python is performed in pandas DataFrames. These are rectangular datasets consisting of rows and columns.
A variable is an attribute for the object, across all the observations. For example, the release dates for all the movies …
Tidy data provides a standard way to organize data. Having a consistent shape for datasets enables you to worry less about data structures and more about getting useful results. The principles of tidy data are …

> Datasets used throughout this cheat sheet
Throughout this cheat sheet we will use a dataset of the top grossing movies of all time, stored as movies.
The second dataset involves an experiment with the number of unpopped kernels in bags of popcorn, adapted from the Popcorn dataset in R's Stat2Data package.
The third dataset is JSON data about music containing nested elements (columns: artist, singles). The JSON is parsed into nested lists using read_json() from the pandas package. Notice that each element in the singles column is a list of dictionaries.
The fifth dataset, pig_feed, shows weight gain in pigs from additives to their feed. There is a multi-index on the columns:

Antibiotic   No          Yes
B12          No    Yes   No    Yes
             19    22    3     54

movies_indexed = movies.set_index("title")
movies_indexed.reset_index()

# Replace index, left joining new index to existing data with .reindex()
avengers_index = ["The Avengers", "Avengers: Age of Ultron", "Avengers: Infinity War",
                  "Avengers: Endgame"]
# Equivalent to [Link](index=avengers_index)

# Concatenate several columns into a single string column with .[Link]()
# Each column must be converted to string type before joining
movies["release_year"].astype(str) \
    .[Link]()

# Combine several columns into a list column with .[Link]()

import json
# Convert series containing nested elements to JSON string with [Link]()

> Melting and pivoting
# Move side-by-side columns to consecutive rows with .melt()
popcorn_indexed = popcorn.set_index("brand")
popcorn_indexed.melt(var_name="trial", value_name="n_unpopped", ignore_index=False)

# Where there is a column multi-index, specify id_vars with a list of tuples

popcorn_long \
    .pivot(values="n_unpopped", index="brand", columns="trial") \
    .reset_index()

[Link]("singles")
pd.json_normalize(music_exploded["singles"])

# Move (multi-)indexes from a row index to a column index with .unstack()
# level argument starts with 0 for the outer index
pig_feed_stacked.unstack(level=1)

# Drop rows containing any missing values in the specified columns with .dropna()
[Link](subset="weight_kg")
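The sketch below runs several of the reshaping recipes above end to end. It uses a made-up popcorn frame and a made-up music frame; their values are hypothetical. The pig_feed numbers come from the table above. The calls are DataFrame.melt, DataFrame.pivot, DataFrame.stack/unstack, DataFrame.explode and pd.json_normalize. The sketch melts with id_vars instead of keeping brand in the index, so the later pivot can refer to brand as an ordinary column.

```python
import pandas as pd

# Hypothetical popcorn data: one row per brand, one column per trial
popcorn = pd.DataFrame({"brand": ["A", "B"], "trial_1": [10, 7], "trial_2": [12, 5]})

# Wide -> long: move the side-by-side trial columns into consecutive rows
popcorn_long = popcorn.melt(id_vars="brand", var_name="trial", value_name="n_unpopped")

# Long -> wide again; reset_index turns the brand index back into a column
popcorn_wide = (popcorn_long
                .pivot(index="brand", columns="trial", values="n_unpopped")
                .reset_index())

# pig_feed from the table above: one row with a two-level column index
pig_feed = pd.DataFrame(
    [[19, 22, 3, 54]],
    columns=pd.MultiIndex.from_product([["No", "Yes"], ["No", "Yes"]],
                                       names=["Antibiotic", "B12"]),
)
pig_feed_stacked = pig_feed.stack(level="B12")   # B12 moves into the row index
restored = pig_feed_stacked.unstack(level=1)     # and back into the columns

# Hypothetical music data: each element of "singles" is a list of dicts
music = pd.DataFrame({"artist": ["X"],
                      "singles": [[{"title": "S1", "year": 2001},
                                   {"title": "S2", "year": 2003}]]})
music_exploded = music.explode("singles")        # one row per single
singles = pd.json_normalize(music_exploded["singles"].tolist())  # keys -> columns

print(popcorn_long, popcorn_wide, pig_feed_stacked, singles, sep="\n\n")
```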
Data Manipulation in Excel
Learn Excel online at [Link]

> Dataset
The examples use a table of countries in A1:D11. Row 1 holds the headers, and each later row holds one country:

     A         B              C              D
1    Country   Country code   Internet TLD   Phone prefix code
3    India     IND            .in            91
     Brazil    BRA            .br            55
     Mexico    MEX            .mx            52

Many data manipulation functions let you match any text character using wildcards.
*    Match 0 or more characters    "sp*y" matches "spy", "spry", and "springy"

Database calculation functions and conditional calculation functions allow numeric criteria:
>    Match values greater than              ">10" matches values greater than 10
<=   Match values less than or equal to     "<=10" matches values less than or equal to 10
<>   Match values not equal to              "<>10" matches values not equal to 10

> Data Transformation

Subset Arrays for a Single Row with XLOOKUP()
Get the row of a return array where the key matches a value with XLOOKUP()
Where the lookup value does not match a key, return the next larger value with XLOOKUP(match_mode=1)
Left join two datasets with XLOOKUP(): copy the formula down the J column to complete the join

Subset Arrays for Multiple Rows with FILTER()
Filter an array for values that match a value with FILTER(). For a unique key, this gives the same result as =XLOOKUP("Nigeria", A2:A11, B2:D11)
=FILTER(B2:D11, A2:A11="Nigeria")
Where the lookup value does not match a key, provide a default value with FILTER(if_empty). Same as =XLOOKUP("United Kingdom", A2:A11, B2:D11, "Country not found")
=FILTER(A2:D11, D2:D11<10)

Find Positions in Lists with XMATCH()
Get the position in a list of the first exact match of a value with XMATCH()
Get the position in a list of the first match that starts with a value with XMATCH(match_mode=2), using a wildcard pattern
For data sorted in ascending order, use a faster binary search for the same task with XMATCH(search_mode=2)
Get the value by row and column number within an array with INDEX(). Row and column numbers start from 1
=INDEX(A2:D11, 5, 3)
Get the value that matches a condition with XMATCH() and INDEX() combined

=INDIRECT(F1)     Get the value in a reference to a cell with INDIRECT(). Suppose cell F1 contains the text value "A1"
=ROWS(A2:A11)     Get the number of rows in an array with ROWS()
=COLUMNS(A2:D2)   Get the number of columns in an array with COLUMNS()
=ROW(A2:A11)      Get the row number of cells with ROW()

Sort an array in ascending order of values in a column with SORT()
=SORT(A2:D11, 3)
Randomize row order with SORTBY() + RANDARRAY()
=SORTBY(A2:D11, RANDARRAY(COUNTA(A2:A11)))

=DSUM(A1:D11, "Phone prefix code", A10:D15)
STDEV of elements matching filters

> Work with Text Data

Clean text with TRIM() and CLEAN()
Trim all white space except single spaces between words with TRIM()
=TRIM(" Only single spaces between words remain ")
Remove non-printable characters with CLEAN()
=CLEAN("alarm" & CHAR(7))
=FIND("ia", A2:A11)
=TEXTSPLIT(A4, {"a","e"})

Replace text with REPLACE() and SUBSTITUTE()
Replace a substring by position with REPLACE()
=REPLACE(B2:B11, 2, 1, "X")
Replace specific characters with SUBSTITUTE()
=SUBSTITUTE(B2:B11, "N", "X")
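For readers who also work in Python, the * wildcard and XLOOKUP's if_not_found fallback have rough equivalents. This sketch mimics Excel's case-insensitive * matching with fnmatch.fnmatchcase applied to lowercased text. Unlike Excel, fnmatch also treats [ ] as special and has no ~ escape. The sketch also defines a hypothetical xlookup helper that returns the first exact match. Unlike Excel's XLOOKUP, this helper is case-sensitive. The country rows are the ones shown in the dataset above.

```python
from fnmatch import fnmatchcase


def excel_like_match(text, pattern):
    """Rough analogue of Excel's * wildcard: case-insensitive, whole-string match."""
    # Lowercase both sides because Excel criteria ignore case
    return fnmatchcase(text.lower(), pattern.lower())


def xlookup(value, keys, returns, if_not_found=None):
    """Hypothetical helper: item of `returns` aligned with the first exact match in `keys`."""
    for key, result in zip(keys, returns):
        if key == value:
            return result
    return if_not_found


countries = ["India", "Brazil", "Mexico"]
codes = ["IND", "BRA", "MEX"]

print([w for w in ["spy", "spry", "springy", "Spry", "spin"] if excel_like_match(w, "sp*y")])
print(xlookup("Brazil", countries, codes),
      xlookup("United Kingdom", countries, codes, "Country not found"))
```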
Learn SQL online at [Link]

What is MySQL?
MySQL is an open-source relational database management system (RDBMS) known for its fast performance and reliability. Developed by Oracle Corporation, it's widely used for web applications.

The dataset contains details of the world's highest valued media franchises by gross revenue. Each row contains one franchise, and the table is named franchises.

franchise        inception_year  total_revenue_busd  original_medium  owner                    n_movies
Pokémon          1996            88                  video game       The Pokémon Company      24
Disney Princess  2000            45.4                movie            The Walt Disney Company

> Querying tables

Get all the columns from a table using SELECT *
SELECT *
FROM franchises

Get a column from a table by name using SELECT col
SELECT franchise
FROM franchises

Get multiple columns from a table by name using SELECT col1, col2
SELECT franchise, inception_year
FROM franchises

Arrange the rows in ascending order of values in a column with ORDER BY col
SELECT franchise, inception_year
FROM franchises
ORDER BY inception_year

Arrange the rows in descending order of values in a column with ORDER BY col DESC
SELECT *
FROM franchises
ORDER BY total_revenue_busd DESC

Limit the number of rows returned with LIMIT n
SELECT *
FROM franchises
LIMIT 2

Limit the number of rows returned, offset from the top with LIMIT m, n
SELECT *
FROM franchises
LIMIT 2, 3

Get unique values with SELECT DISTINCT
SELECT DISTINCT owner
FROM franchises

Get rows where a number is greater than a value with WHERE col > n
SELECT franchise, inception_year
FROM franchises
WHERE inception_year > 1928

Get rows where a number is greater than or equal to a value with WHERE col >= n
SELECT franchise, inception_year
FROM franchises
WHERE inception_year >= 1928

Get rows where a number is less than a value with WHERE col < n

Get rows where a number is less than or equal to a value with WHERE col <= n
SELECT franchise, inception_year
FROM franchises
WHERE inception_year <= 1977

Get rows where a number is not equal to a value with WHERE col <> n or WHERE col != n

Get rows where text is equal to a value with WHERE col = 'x'

Get rows where text is one of several values with WHERE col IN ('x', 'y')
SELECT franchise, original_medium
FROM franchises
WHERE original_medium IN ('movie', 'video game')

Get rows where text contains specific letters with WHERE col LIKE '%abc%'

Get rows where values are missing with WHERE col IS NULL
SELECT franchise, n_movies
FROM franchises
WHERE n_movies IS NULL

Get rows where values are not missing with WHERE col IS NOT NULL
SELECT franchise, n_movies
FROM franchises
WHERE n_movies IS NOT NULL

With its default (case-insensitive) collations, MySQL uses case insensitive matching in WHERE clauses.
SELECT *
FROM franchises
WHERE owner = 'THE WALT DISNEY COMPANY'

To get case sensitive matching, use WHERE BINARY condn
SELECT *
FROM franchises
WHERE BINARY owner = 'THE WALT DISNEY COMPANY'

Filtering on multiple columns
SELECT franchise, inception_year
FROM franchises
WHERE inception_year < 1950 AND total_revenue_busd > 50

Get the rows where one condition or another condition holds with WHERE condn1 OR condn2
SELECT franchise, inception_year
FROM franchises
WHERE inception_year < 1950 OR total_revenue_busd > 50

Get the total number of rows with SELECT COUNT(*)
SELECT COUNT(*)
FROM franchises

Get the total value of a column with SELECT SUM(col)
SELECT SUM(total_revenue_busd)
FROM franchises

Get the mean value of a column with SELECT AVG(col)
SELECT AVG(total_revenue_busd)
FROM franchises

Get the minimum value of a column with SELECT MIN(col)
SELECT MIN(total_revenue_busd)
FROM franchises

Grouping, filtering, and sorting
SELECT original_medium, SUM(n_movies) AS total_movies
FROM franchises
GROUP BY original_medium
ORDER BY total_movies DESC

Get rows where values in a group meet a criterion with GROUP BY col HAVING condn
Filter before and after grouping with WHERE condn_before GROUP BY col HAVING condn_after

Get the current date with CURDATE(), the current datetime with NOW(), and the current time with CURTIME()
SELECT CURDATE(), NOW(), CURTIME()

List available tables with SHOW TABLES
SHOW TABLES
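Several of these patterns can be tried locally with Python's sqlite3 module. SQLite accepts the same syntax for WHERE, GROUP BY, HAVING, ORDER BY and MySQL-style LIMIT m, n. The rows below are made up for illustration. One difference matters here: SQLite compares text case-sensitively with = by default and does not support MySQL's BINARY operator, so the case-sensitivity examples above are MySQL-specific.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE franchises (
    franchise TEXT, inception_year INTEGER, total_revenue_busd REAL,
    original_medium TEXT, owner TEXT, n_movies INTEGER)""")
# Made-up rows for illustration only
con.executemany(
    "INSERT INTO franchises VALUES (?, ?, ?, ?, ?, ?)",
    [("Alpha", 1930, 60.0, "movie", "Studio One", 10),
     ("Beta", 1960, 20.0, "book", "Studio Two", 4),
     ("Gamma", 1990, 80.0, "video game", "Studio One", None),
     ("Delta", 2005, 30.0, "movie", "Studio Three", 6)],
)


def q(sql):
    """Run a query and return the result rows as a list of tuples."""
    return con.execute(sql).fetchall()


# Multiple conditions with AND / OR
both = q("""SELECT franchise FROM franchises
            WHERE inception_year < 1950 AND total_revenue_busd > 50""")
either = q("""SELECT franchise FROM franchises
              WHERE inception_year < 1950 OR total_revenue_busd > 50
              ORDER BY franchise""")

# Missing values must be tested with IS NULL, not = NULL
missing = q("SELECT franchise FROM franchises WHERE n_movies IS NULL")

# Group, aggregate and sort; SUM skips NULLs and is NULL for an all-NULL group
by_medium = q("""SELECT original_medium, SUM(n_movies) AS total_movies
                 FROM franchises
                 GROUP BY original_medium
                 ORDER BY total_movies DESC""")

# Filter groups after aggregating with HAVING
big_media = q("""SELECT original_medium, COUNT(*) AS n
                 FROM franchises
                 GROUP BY original_medium
                 HAVING COUNT(*) > 1""")

# LIMIT m, n skips m rows, then returns at most n rows
page = q("SELECT franchise FROM franchises ORDER BY inception_year LIMIT 1, 2")

print(both, either, missing, by_medium, big_media, page, sep="\n")
```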
Excel Keyboard Shortcuts
Learn Excel skills online at [Link]

Using Shortcuts in Excel
While every action in Excel can be performed by clicking on menu items or dialog boxes, this is often slower than pressing keys on your keyboard. Regular Excel users can become more productive by using keyboard shortcuts. The shortcuts shown here are for the Windows versions of Excel. Most shortcuts are applicable to Excel in Office 365. Likewise, most shortcuts can be used on macOS by replacing CTRL with CMD.

> Navigating Worksheets
Functionality                        Shortcut
Move one cell down                   ENTER or Down arrow
Move one cell up                     SHIFT+ENTER or Up arrow
Move one cell right                  TAB or Right arrow
Move one cell left                   SHIFT+TAB or Left arrow
Move one screen down                 PageDown
Move one screen up                   PageUp
Move one screen right                ALT+PageDown
Move one screen left                 ALT+PageUp
Move to first row, first column      CTRL+Home
Move to last filled cell             CTRL+End

> Cell Entry
Functionality                                                          Shortcut
Cancel your input to a cell                                            Esc
Write a new line within a cell                                         ALT+Enter
Go to the end of the line                                              CTRL+End
Go to the start of the line                                            CTRL+Home
Insert a function                                                      SHIFT+F3
Display function arguments (when cursor is to right of function name)  CTRL+A
Insert AutoSum formula                                                 ALT+=
Insert hyperlink                                                       CTRL+K
Insert current date                                                    CTRL+;
Insert current time                                                    CTRL+SHIFT+; (CTRL+:)
Insert a comment on the cell                                           CTRL+SHIFT+F2

> Zooming
Functionality    Shortcut
Zoom in          CTRL+ALT+=
Zoom out         CTRL+ALT+-

> Getting Help
Functionality                    Shortcut
Open help browser                F1
Open keyboard shortcut browser   CMD+/

> Text Formatting
Functionality              Shortcut
Apply bold formatting      CTRL+B
Apply italic formatting    CTRL+I

> Editing
Functionality              Shortcut
Undo last action           CTRL+Z
Redo last undone action    CTRL+Y

> Access Ribbon Tabs
Functionality               Shortcut
Open the File menu          ALT+F
Open the Home tab           ALT+H
Open the Insert tab         ALT+N
Open the Page Layout tab    ALT+P
Open the Formulas tab       ALT+M
Open the Data tab           ALT+A
Open the Review tab         ALT+R
Open the View tab           ALT+W

> Charts
Functionality                                               Shortcut
Create a chart of the selected data in new worksheet        F11
Create a chart of the selected data in current worksheet    ALT+F1

> Selecting Cells
Functionality                                            Shortcut
Select the entire row                                    SHIFT+Space
Select all cells from the current location downwards     SHIFT+PageDown
Select all cells from the current location upwards       SHIFT+PageUp
Select all cells from the current location rightwards    SHIFT+End
Select all cells from the current location leftwards     SHIFT+Home
Select all completed cells                               CTRL+A

> Refreshing Data
Functionality                                  Shortcut
Refresh external data in current worksheet     ALT+F5
Refresh external data in all worksheets        CTRL+ALT+F5
Run all calculations in all open workbooks     CTRL+ALT+F9