Python Data Science Cheat Sheet
Table of Contents
1. NumPy Essentials
2. Pandas Basics
3. Data Visualization (Matplotlib/Seaborn)
4. Scikit-learn Machine Learning
5. Useful Code Snippets
1. NumPy Essentials
import numpy as np
# Create arrays
arr = np.array([1, 2, 3])
zero_arr = np.zeros((3, 2))
one_arr = np.ones(5)
rand_arr = np.random.rand(3, 3)
# Indexing and slicing
arr[0], arr[-1], arr[1:3]  # first element, last element, elements at index 1 and 2
# Operations
arr.mean(), arr.sum(), arr.std()
arr2 = arr * 2
# Reshape
reshaped = arr.reshape((1, 3))
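The reshape and aggregation calls above can be combined. The following is a minimal sketch on made-up data (the names `values`, `grid`, `col_means`, and `row_sums` are illustrative and not part of the original sheet). It shows `-1` letting NumPy infer one dimension and the `axis` argument selecting the direction of an aggregation.

```python
import numpy as np

# Hypothetical sample data: the integers 0..5
values = np.arange(6)

# -1 asks NumPy to infer the second dimension, giving shape (2, 3)
grid = values.reshape(2, -1)

# axis=0 aggregates down each column, axis=1 across each row
col_means = grid.mean(axis=0)
row_sums = grid.sum(axis=1)

print(grid.shape)
print(col_means)
print(row_sums)
```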
2. Pandas Basics
import pandas as pd
# Create DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# Read CSV
# df = pd.read_csv('file.csv')
# Basic operations
summary = df.describe()
col_a = df['A']
filtered = df[df['A'] > 1]
df['C'] = df['A'] + df['B'] # Create new column
# Groupby
grouped = df.groupby('A').sum()
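Grouping is easier to see when the key column has repeated values. The sketch below uses a hypothetical `sales` table (its column names and numbers are assumptions made for illustration) to show a single aggregation and several aggregations at once via `agg`.

```python
import pandas as pd

# Hypothetical sample data with a repeated grouping key
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 5, 7, 3],
})

# One aggregation per group: total units per region
totals = sales.groupby("region")["units"].sum()

# Several aggregations per group in one call
stats = sales.groupby("region")["units"].agg(["sum", "mean"])

print(totals)
print(stats)
```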
3. Data Visualization (Matplotlib/Seaborn)
import matplotlib.pyplot as plt
import seaborn as sns
# Line plot
plt.plot([1,2,3], [4,5,6])
plt.title('Line Plot')
plt.show()
# Bar plot
plt.bar(['A','B','C'], [3,7,2])
plt.title('Bar Plot')
plt.show()
# Seaborn heatmap
sns.heatmap([[1,2],[3,4]])
plt.show()
4. Scikit-learn Machine Learning
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3]]
y = [2, 4, 6]
model = LinearRegression()
model.fit(X, y)
pred = model.predict([[4]])  # approximately array([8.]), a NumPy float array
# Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # random_state makes the split reproducible
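A split is usually followed by fitting on the training part and scoring on the held-out part. Here is a minimal sketch of that workflow, assuming noise-free synthetic data that follows y = 2x. The data and the `random_state` value are illustrative choices, not taken from the original sheet.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical noise-free data following y = 2x
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Hold out 20% of the rows; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit only on the training rows, then evaluate on the unseen test rows
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination (R^2)

print(len(X_train), len(X_test))
print(r2)
```

Because the synthetic data is exactly linear, the score should be very close to 1. Real data will generally score lower.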
5. Useful Code Snippets
# List comprehension
squares = [x**2 for x in range(5)]
# Lambda function
add = lambda a, b: a + b
# Dictionary comprehension
my_dict = {x: x*2 for x in range(3)}
# Enumerate
for idx, val in enumerate(['a','b','c']):
    print(idx, val)
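These snippets compose naturally. As a small sketch on hypothetical data (the `names` list is made up for illustration), `enumerate` can feed a dictionary comprehension, and a list comprehension can carry a filter condition.

```python
# Hypothetical sample data
names = ["ada", "grace", "alan"]

# enumerate(..., start=1) yields 1-based positions for the dictionary keys
numbered = {idx: name.title() for idx, name in enumerate(names, start=1)}

# A trailing "if" keeps only the items that satisfy the condition
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(numbered)
print(even_squares)
```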
Quick Reference Table
Library        Import Statement                   Key Functionality
NumPy          import numpy as np                 Arrays, math operations
pandas         import pandas as pd                DataFrames
matplotlib     import matplotlib.pyplot as plt    Plotting
seaborn        import seaborn as sns              Statistical graphics
scikit-learn   from sklearn...                    ML models, train/test split
This cheat sheet summarizes essential Python Data Science operations for quick recall.