Types of Exceptions: String Sequence: Read/Write Text File: SQL DDL:
While True: * Index starts from 0 filep = open(“[Link]”, “r”) CREATE, ALTER, NOT NULL, UNIQUE, DEFAULT,
try: for each_line in filep: PRIMARY/FOREIGN KEY, CHECK, DROP
String Slicing Variable[index] -> Variable[start:stop] print (each_line)
x = int(input(“Please enter a valid [Link]()
number: “) Reverse Slicing Variable[-index] -> Variable[start:stop:step] SQL DML:
------------------------------------------------------------------------------
except ValueError: Concatenate Combine_text = text1 + “ “ + text2 + … filep = open(“[Link]”, “w”) INSERT, SELECT, WHERE, GROUP BY, HAVING,
print(“Oops! That was no valid data = [ ] ORDER BY, UPDATE, DELETE, COMMIT, ROLLBACK
Functions .capitalise(), .upper(), .isupper(), .isalpha() for each_line in data:
number.”) Comparison: =, <, >, <=, >=, <>, !=
-------------------------------------------------- Length len(text) print(each_line, file=filep) Logical: AND, OR, NOT
[Link]()
AttributeError, ImportError, IndexError, .format() ------------------------------------------------------------------------------ Special: BETWEEN, IS NULL, LIKE, IN , EXISTS,
KeyError, IndentationError, TypeError, “Text {:formatting for variable} other text“ [Link]() # Read as a single string DISTINCT
ValueError, ZeroDivisionError, IOError Format .format(matching variable/value) [Link]() # Read line by line as a string Aggregate: COUNT, MIN, MAX, SUM, AVG
Truth Table: s = string, d = decimal, f = float [Link]() # Read as a single list SQL Code Example:
< = left align, ^ = centre align, > = right align [Link](lines) # Lines contain list of string with data
“INSERT INTO table_name VALUES(“_”, ”_”,);”
Read With Open Text/CSV File: “SELECT * from table_name;”
Dictionary Functions: [ ] with open(“[Link]”) as filep:
{“key: ”, value} “UPDATE table_name SET “_”= “XX” WHERE “_”=
data = [Link]()
dict_name.XX() ------------------------------------------------------------------------------
“XX”;“
clear, items, keys, values import csv “DELETE from table_name WHERE “_” = “XX”;”
with open(“[Link]”) as filep:
Tuple Functions: ( ) csv_pointer = [Link](filep) CREATE TABLE table1_name (
Dictionary Comprehension:
tuple_name.count(“a”): returns the # of for each in csv_pointer: col1 INTEGER NOT NULL UNIQUE, col2
dict_name = {key: value for data in iterator if
items whose content matches with “a” print(each) VARCHAR(50) NOT NULL, PRIMARY KEY (col1),
condition} ------------------------------------------------------------------------------
tuple_name.index(“a”): returns the index ---------------------------------------------------------------------
Write With Open CSV File: FOREIGN KEY (col of table1) REFERENCES
number of first “a” in tuple Dictionary Iteration: import csv
table2_name(col PK) );
List Functions: ( ) For key, value in dict_name.items() data = [ ]
with open(“[Link]”, “w”) as filep: SQL Join: * LEFT JOIN supported, RIGHT & FULL JOIN
XX(list_name) OS Module: csv_pointer = [Link](filep) not supported by SQLite
len, min, max, sum csv_pointer.writerow(data)
list_name.XX() Current working dir [Link]() or csv_pointer.writerows(data)
Joining Tables:
append, insert, sort, reverse, split, split(“ ”) Change current dir [Link](path) def get_revenue(): SELECT * FROM MANAGER, EXECUTIVE WHERE
list_name(range(start, stop, step)) revenues = []
Rename file/dir [Link](“source”, “destination”) for i in range(1, 4): MANAGER.MANAGER_CODE =
[Link](float(input("Enter Rev {}:".format(i))) EXECUTIVE.MANAGER_CODE;
List Comprehension: Make dir [Link](file_path) return revenues
list_name = [data for data in iterator if Remove file/dir [Link](path) Inner Join: Combines the results similar to AND
condition] def print_revenue(revenues, filename):
operation look for values common to both tables.
[Link](path) fp = open(filename, "w")
List Iteration: List of files/dirs [Link](“.”) # Current dir print("Revenue Report\n", "-"*20, file=fp) SELECT * FROM MANAGER M INNER JOIN
names = [“John”, “Jane”, “Joey”] [Link](“..”) # Parent dir for index, value in enumerate(revenues): EXECUTIVE E on M.MANAGER_CODE =
print("Revenue for {}: ${:15,.2f} Cumm. Total: E.MANAGER_CODE;
id = [1033, 1034, 1035] import os ${}".format(index+1, value, sum(revenues[:index+1])),file=fp)
for index, element in enumerate(names): new_dir = [Link]() + [Link] + “sub” print("Total ${:,.2f}".format(sum(revenues)), file=fp)
print(“{}’s id is {}.”.format(element, [Link]() Outer Join: Keeps the non-matching results when
if not [Link](new_dir) join is done.
id[index]) [Link](new_dir) def task3(): SELECT * FROM MANAGER M LEFT JOIN EXECUTIVE
[Link](“old path”, “new path”) print_revenue(get_revenue(), "[Link]") E on M.MANAGER_CODE = E.MANAGER_CODE;
enumerate(list) # Returns index & val pair task3()
Pandas Summary: * inplace = True df.__.plot(kind=“bar”)
Code Example (1):
sales_latte = [10000, 12000, 14000, 34000, 12000, 14000] Getting rows [Link][1], [Link][2:4] E.g. [Link](“col_name”).mean().plot(kind=“bar”,
Bar graph
print("In {}, muffin sells for ${:.2f}.".format([Link][i], df.sales_muffin[i]))
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"] y=“comparison_col_name")
[Link][number] # Drop certain row
# Print corresponding month(s) with the highest sales for sales E.g. df2[[“A”, “B”, “C”]].sum().plot(kind=“bar”)
Row operations [Link](another_df) # Insert another dataframe
latte df.”col_name”[row index] df.__.plot(kind=“pie”)
print(months[sales_latte.index(max(sales_latte))]) Pie chart
[Link](x, labels=y)
[Link][0]
latte_dollars = ["${:.2f}".format(each) for each in df.sales_latte]
for each in months: Getting cols # [Link][row index/slicing, “col_name”] df.col_name.plot(kind = "box")
i = [Link](each) [Link][:3, “col_name”] # Get first 3 rows of a column [Link](df.col_name1, df.col_name2)
Box plot
print("In the month of {months[i]}, revenue for latte is E.g. df[“Avg”]= sum([df.b1, df.b2, df.b3, df.b4])/4
Col operations [Link](“col_name”) # Drops certain column
{sales_latte[i]}”) [Link]([Link], [Link])
Replace chars [Link] = [[Link](" ", "_") for col in [Link]]
Code Example (2): [Link](df.col_name1, df.col_name2)
[Link] INCLUDES the last data as specified by ending # Violin plot
def get_revenue(): iloc vs loc [Link](df.col_name, showmeans = True),
revenues = [] [Link] EXCLUDES the last data as specified by ending #
[Link](df.col_name1, df.col_name2)
for each in range(3): # df[“col_name”][row index/slicing] Bar plot E.g. [Link]([Link].value_counts().index,
revenue = float(input("Enter revenue for month Getting first 3 rows df[[“col_name1”, “col_name2”]][:3] [Link].value_counts().values)
{}:".format(each+1))) for 2 cols # [Link][row index/slicing, col index/slicing]
[Link](revenue) # Adding the new revenue Line plot [Link](df.col_name1, df.col_name2)
[Link][:3, :2]
value to the list df.__.plot(kind = “hist”)
[Link]("mobile_fig.png")
Check null entries [Link]().sum() # Checking for null entries in column
for i in range([Link][0]):
return revenues E.g. 50 interval bars: [Link](kind = “hist”, bins = 50)
Count (rows, cols) [Link] # Returns (rows, columns) Histogram
E.g. Predefined bins: [Link](kind = “hist”, bins =
def print_file_revenue(revenues): [0.5,1.5,2.5])
with open(“[Link]”, “w”) as file_pointer: List all data for col df[“col_name”]
Save pic as png:
cummtotal = 0 Find duplicates [Link]() # Returns True if is a duplicated record Check datatype type([Link](“col_name").sum())
print(“Revenue Report\n”, “-“*20, file = file_pointer) [Link](columns={“Org_name”:”New_name”},
for i in list(range(len(revenues): # Call for index Drop duplicates df.drop_duplicates([“col_name”]) Rename columns
inplace = True)
cummtotal += revenues[i] df.”col_name”.fillna() # To fill with NA
print(“Revenue for {}: ${:>15,.2f} Cumm. Total: Fill in missing # Create just a figure and only 1 subplot
df.”col_name”.fillna(“XXX”) # To fill with other strings
${:>15,.2f}”.format(i+1, revenues[i], cummtotal), file = values fig, ax = [Link]()
df.”col_name”.fillna([Link]()) # To fill with mean
file_pointer) # Prints the file in file pointer [Link](x, y)
[Link]() # Drop record with NA
E.g. [Link](kind=“scatter”, x=“Year”, y=“Margin”, color=“_”)
ax.set_title(“XX”)
print(“The company has made a total of ${:>15,.2f} in the Drop record E.g. [Link](axis=0, how=“any”, thresh=None, Fig & Subplots # Create 2 subplots and unpack the output array
last quarter.”.format(cummtotal), file = file_pointer subset=None, inplace=False) fig, (ax1, ax2) = [Link](1, 2, sharey=True)
print_file_revenue(get_Revenue()) Inconsistent unit Use float() or int() to convert text value [Link](x, y)
Import Conventions: Check datatypes [Link] ax1.set_title(“Sharing Y axis”)
import numpy as np # Numerical Python [Link](x, y)
E.g. [Link](x=“Year”, y=“Revenue”,
import pandas as pd # DataFrame Statistical summ. [Link](), df.value_counts()
E.g. [Link][:, 2:8].sum().sort_values().plot(kind="bar")
import seaborn as sns # Visualisation Query df[(df.col_name < 5000) & (df.col_name> 4000)] Plot via sorted val
E.g. df_info.groupby(“col_name”).hist()
import [Link] as plt # [Link]()
%matplotlib inline * Input this too for Jupyter Grouping [Link](“XXX”).”sub_XXX”.describe() [Link](kind = “scatter”)
E.g. [Link](kind=“bar”)
from scipy import module # Data Processing Sort values df.sort_values(“Likes”, ascending = False) Scatterplot [Link](df_total.Year, df_total.Growth)
Data Cleaning: [Link](df_resident.Year, df_resident.Growth)
Summary [Link]() # Index & Column Dtypes
header = [“XX”, …] E.g. df_total = df[[Link] == "Total Pop Growth"]
s=df[“Margin”]*300)
df = pd.read_csv(“[Link]”, names = header) corr = [Link](method=“pearson”) df_resident = df[[Link] == "Resident Pop Growth"]
print([Link](number)) index, columns, values, mean, median, mode, var, std, [Link]([“Total Pop Growth”, “Resident Pop Growth”])
Statistics
----------------------------------------------------------------------------------- percentile, count, sum, min, max, abs, cov, kurt, skew E.g. [Link](data=df, x=“Year”, y=“Growth”,
1D Plotting:
2D Plotting:
3D Plotting:
print(df[df.col_name == “XXX"]) # Indicate () at the end hue=“Category”, fit_reg=False)
----------------------------------------------------------------------------------- Visualisation Summary: * Using pandas, matplotlib, and seaborn Heatmap of corr [Link]([Link]())
import webbrowser [Link](kind = “Typeofplot”, x = “axisname”, y = “yaxisname”, color = “ “, title = “title”,
webbrowser.open_new([Link][75, 3]) legend = True/False Pairplot [Link](df)