# Advanced Python for Data Science (Cheat Sheet)
## 1. Advanced Data Structures
● List Comprehensions with Conditional Logic: `squared_even = [x**2 for x in range(10) if x % 2 == 0]`
● Dictionary Comprehensions: `squared_dict = {x: x**2 for x in range(10)}`
● Set Comprehensions: `unique_squared = {x**2 for x in range(-5, 5)}`
● Nested Dictionary Comprehensions: `matrix = {x: {y: x*y for y in range(3)} for x in range(3)}`
● Defaultdict for Default Values: `from collections import defaultdict; dd = defaultdict(int)`
● Counter to Count Hashable Objects: `from collections import Counter; counts = Counter(my_list)`
● OrderedDict to Maintain Insertion Order: `from collections import OrderedDict; od = OrderedDict.fromkeys('abcde')`
● Deque for Efficient Stacks and Queues: `from collections import deque; dq = deque([1, 2, 3])`
● ChainMap to Combine Dictionaries: `from collections import ChainMap; combined = ChainMap(dict1, dict2)`
● Namedtuple for Readable Tuples: `from collections import namedtuple; Point = namedtuple('Point', ['x', 'y'])`
● Heapq for Priority Queues: `import heapq; heapq.heappush(heap, item)`
● Itertools for Complex Iterations: `import itertools; itertools.permutations('ABCD')`
● Bisect for Array Bisection Algorithms: `import bisect; bisect.bisect_left(a, x)`
● Functools for Higher-order Functions: `from functools import reduce; reduce(lambda x, y: x + y, [1, 2, 3])`
● Zip for Parallel Iteration: `for x, y in zip(list1, list2):`
## 2. Functional Programming
● Lambda Functions: `multiply = lambda x, y: x * y`
● Map for Function Application: `squared = list(map(lambda x: x**2, numbers))`
● Filter to Extract Elements: `evens = list(filter(lambda x: x % 2 == 0, numbers))`
● Reduce for Cumulative Operations: `from functools import reduce; product = reduce(lambda x, y: x * y, numbers)`
● Partial Functions to Pre-bind Arguments: `from functools import partial; add_five = partial(add, 5)`
● Itertools for Advanced Iteration: `import itertools; cyclic = itertools.cycle('ABCD')`
● Generators for Lazy Evaluation: `(x**2 for x in range(10))`
● Decorator Functions for Meta-programming: `from functools import cache; @cache def fibonacci(n):` (Python 3.9+)
● Use of Closures to Enclose State: `def outer(x): return lambda y: x + y`
● Any and All for Condition Checking: `any([True, False]), all([True, True])`
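A short sketch of these functional tools working together; the toy `add` helper and the numbers are placeholders, and nothing depends on external libraries:

```python
from functools import partial, reduce

numbers = [1, 2, 3, 4, 5]

# map/filter/reduce pipeline: square the evens, then sum them
evens = filter(lambda x: x % 2 == 0, numbers)
squared = map(lambda x: x ** 2, evens)
total = reduce(lambda acc, x: acc + x, squared, 0)  # 4 + 16 = 20

# partial pre-binds arguments of an existing function
def add(a, b):
    return a + b

add_five = partial(add, 5)

# a closure encloses state from its defining scope
def make_multiplier(factor):
    return lambda x: x * factor

double = make_multiplier(2)

print(total, add_five(10), double(21))  # 20 15 42
```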
## 3. Concurrency and Parallelism
● Threading for I/O-bound Tasks: `from threading import Thread; thread = Thread(target=function, args=(arg,))`
● Multiprocessing for CPU-bound Tasks: `from multiprocessing import Process; process = Process(target=function, args=(arg,))`
● Concurrent Futures for Async Execution: `from concurrent.futures import ThreadPoolExecutor; executor.submit(function, arg)`
● Asyncio for Asynchronous Programming: `import asyncio; asyncio.run(main())`
● Use of Locks in Threading: `from threading import Lock; lock = Lock()`
● Semaphore for Controlling Access: `from threading import Semaphore; semaphore = Semaphore(2)`
● Condition Variables for Synchronization: `from threading import Condition; condition = Condition()`
● Event for Signaling Between Threads: `from threading import Event; event = Event()`
● Queue for Thread-safe Data Exchange: `from queue import Queue; queue = Queue()`
● Using map with ProcessPoolExecutor: `with ProcessPoolExecutor() as executor: results = executor.map(func, args)`
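A sketch of thread-pool fan-out with a lock-guarded shared counter; the URLs and the `download` function are hypothetical placeholders, and no real network call is made:

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

counter = 0
lock = Lock()

def download(url):
    # stand-in for real I/O; the Lock protects the shared counter
    global counter
    with lock:
        counter += 1
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(5)]  # placeholder URLs

# the pool runs download() concurrently; map preserves input order
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(download, urls))

print(counter, results[0])
```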
## 4. Debugging and Testing
● Use Assert Statements for Quick Checks: `assert x > 0, 'x must be positive'`
● Logging for Debugging and Monitoring: `import logging; logging.debug('Debugging information')`
● Pdb for Interactive Debugging: `import pdb; pdb.set_trace()`
● Timeit for Timing Code Execution: `import timeit; timeit.timeit('func()', setup='from __main__ import func')`
● CProfile for Performance Profiling: `import cProfile; cProfile.run('func()')`
● Memory Profiler for Memory Usage: `from memory_profiler import profile; @profile def my_func():`
● Using PyTest for Unit Testing: `def test_function(): assert func(x) == expected`
● Mock for Testing in Isolation: `from unittest.mock import Mock; mock = Mock()`
● Coverage for Test Coverage Measurement: `coverage run -m pytest; coverage report`
● Use Type Hints for Static Type Checking: `def greet(name: str) -> str:`
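A sketch of a pytest-style test that isolates a function with a `Mock` stand-in client, plus a `timeit` measurement; `fetch_user` and its endpoint path are hypothetical:

```python
import timeit
from unittest.mock import Mock

def fetch_user(client, user_id: int) -> str:
    # client is any object exposing .get(); a Mock can stand in for it
    return client.get(f"/users/{user_id}")["name"]

def test_fetch_user():
    client = Mock()
    client.get.return_value = {"name": "Ada"}
    assert fetch_user(client, 1) == "Ada"
    client.get.assert_called_once_with("/users/1")

# timeit runs a snippet many times and reports total elapsed seconds
elapsed = timeit.timeit("sum(range(100))", number=10_000)

test_fetch_user()
print(f"sum(range(100)) x10k: {elapsed:.4f}s")
```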
## 5. Performance Optimization
● Using NumPy for Efficient Numeric Computation: `import numpy as np; np_array = np.array([1, 2, 3])`
● Pandas for Efficient Data Manipulation: `import pandas as pd; df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})`
● Cython for Compiling Python: `import cython; @cython.cfunc`
● JIT Compilation with Numba: `from numba import jit; @jit def sum_array(arr):`
● Use of Cache to Avoid Recomputation: `from functools import lru_cache; @lru_cache(maxsize=None) def fib(n):`
● Vectorization to Replace Loops (NumPy, Pandas): `df['col3'] = df['col1'] + df['col2']`
● Use of Pandas Categoricals for Memory Efficiency: `df['col'] = df['col'].astype('category')`
● Memory Views for Large Data Manipulation: `memoryview(np.array([1, 2, 3]))`
● Batch Processing for Large Datasets: `for batch in pd.read_csv('file.csv', chunksize=1000):`
● Using HDF5 or Feather Format for Large Data Storage: `df.to_hdf('data.h5', key='table')`
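A runnable sketch contrasting vectorized operations with a memoized recursion; the data values are arbitrary:

```python
import numpy as np
import pandas as pd
from functools import lru_cache

# vectorized column arithmetic replaces an explicit Python loop
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
df["col3"] = df["col1"] + df["col2"]

# categoricals store repeated strings as small integer codes
df["label"] = pd.Series(["a", "b", "a"]).astype("category")

# NumPy runs the summation loop in C, not in the interpreter
arr = np.arange(1_000_000)
total = arr.sum()

# lru_cache memoizes overlapping subproblems, turning O(2^n) into O(n)
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(df["col3"].tolist(), total, fib(30))
```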
## 6. Advanced File Handling
● Read/Write JSON Files: `import json; with open('data.json', 'r') as f: data = json.load(f)`
● Working with CSV Files: `import csv; with open('data.csv', newline='') as f: reader = csv.reader(f)`
● Manipulating ZIP Files: `from zipfile import ZipFile; with ZipFile('file.zip', 'r') as zip_ref: zip_ref.extractall('path_to_extract')`
● Handling Large Files with Generators: `def read_large_file(file_object): yield from file_object`
● Use Pickle for Object Serialization: `import pickle; pickle.dump(obj, file)`
● Working with Binary Data: `with open('file.bin', 'wb') as f: f.write(b'Hello World')`
● Use Glob for File Path Pattern Matching: `from glob import glob; file_paths = glob('*.txt')`
● Handling XML Data with ElementTree: `import xml.etree.ElementTree as ET; tree = ET.parse('data.xml')`
● Working with HDF5 Files for Large Datasets: `import h5py; f = h5py.File('data.h5', 'r')`
● Using Pandas to Read/Write Excel Files: `df.to_excel('data.xlsx', index=False); df_read = pd.read_excel('data.xlsx')`
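A self-contained sketch that round-trips JSON, streams a file lazily with a generator, and matches paths with glob; the `data.json` filename is just an example and is created by the script itself:

```python
import json
from glob import glob

# round-trip a dict through JSON on disk
with open("data.json", "w") as f:
    json.dump({"rows": [1, 2, 3]}, f)
with open("data.json") as f:
    data = json.load(f)

# the generator yields one line at a time, so the whole file never sits in memory
def read_large_file(path):
    with open(path) as file_object:
        yield from file_object

line_count = sum(1 for _ in read_large_file("data.json"))

# glob matches paths by shell-style pattern
json_files = glob("*.json")
print(data, line_count, json_files)
```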
## 7. Advanced Pandas Techniques
● MultiIndex DataFrame Operations: `df.set_index(['level_1', 'level_2'])`
● Conditional Operations Using np.where: `df['new_col'] = np.where(df['col'] > 0, 'positive', 'negative')`
● Vectorized String Operations: `df['col'].str.upper()`
● Pandas SQL-like Queries: `df.query('col > 0')`
● Pivot Tables for Data Summarization: `df.pivot_table(values='D', index=['A', 'B'], columns=['C'])`
● Window Functions for Rolling and Expanding Calculations: `df['col'].rolling(window=5).mean()`
● Merging, Joining, and Concatenating DataFrames: `pd.concat([df1, df2]); pd.merge(df1, df2, on='key')`
● Apply Functions for Custom Operations: `df.apply(lambda row: row['A'] + row['B'], axis=1)`
● Time Series Specific Operations: `df.resample('M').mean()`
● Categorical Data Handling for Memory Optimization: `df['col'] = df['col'].astype('category')`
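A small sketch exercising `np.where`, `query`, `pivot_table`, and a rolling window on a toy DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "value": [1, -2, 3, -4],
})

# np.where for vectorized conditional labeling
df["sign"] = np.where(df["value"] > 0, "positive", "negative")

# SQL-like row filtering with query
positives = df.query("value > 0")

# pivot_table summarizes value by group and sign
summary = df.pivot_table(values="value", index="group", columns="sign", aggfunc="sum")

# rolling window over a numeric column
df["rolling_mean"] = df["value"].rolling(window=2).mean()

print(positives, summary, sep="\n")
```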
## 8. Advanced Visualization Techniques
● Interactive Plots with Plotly: `import plotly.express as px; px.line(df, x='x', y='y')`
● Advanced Matplotlib Customizations: `fig, ax = plt.subplots(); ax.plot(x, y)`
● Creating Dashboards with Dash or Streamlit: `import streamlit as st; st.line_chart(df)`
● Seaborn for Statistical Data Visualization: `import seaborn as sns; sns.boxplot(x='x', y='y', data=df)`
● 3D Plotting with Matplotlib: `ax = fig.add_subplot(111, projection='3d')`
● Heatmaps for Correlation Visualization: `sns.heatmap(df.corr())`
● Pairplot for Multi-variable Analysis: `sns.pairplot(df, hue='class')`
● Facet Grids for Conditional Plots: `g = sns.FacetGrid(df, col='col', row='row'); g = g.map(plt.hist, 'val')`
● Network Graphs with NetworkX: `import networkx as nx; G = nx.Graph(); G.add_edge('A', 'B')`
● Geospatial Data Visualization: `import geopandas as gpd; world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))`
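A sketch combining axis-level Matplotlib customization with a Seaborn correlation heatmap; the random DataFrame is a stand-in for real data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# synthetic stand-in data: 100 rows, 3 correlated-ish columns
df = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 3)),
                  columns=["a", "b", "c"])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# plain Matplotlib line plot with axis-level customization
axes[0].plot(df.index, df["a"].cumsum(), label="cumulative a")
axes[0].set_title("Matplotlib")
axes[0].legend()

# Seaborn heatmap of the correlation matrix
sns.heatmap(df.corr(), annot=True, ax=axes[1])
axes[1].set_title("Correlation heatmap")

fig.tight_layout()
plt.show()
```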
## 9. Machine Learning Pipeline Optimization
● Automating Workflows with Pipeline: `from sklearn.pipeline import Pipeline; pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('clf', LogisticRegression())])`
● Grid Search for Hyperparameter Tuning: `from sklearn.model_selection import GridSearchCV; GridSearchCV(pipeline, param_grid=param_grid)`
● Feature Selection Techniques: `from sklearn.feature_selection import SelectFromModel; SelectFromModel(estimator)`
● Model Serialization with Joblib for Deployment: `from joblib import dump, load; dump(model, 'model.joblib')`
● Cross-Validation Strategies for Robust Model Evaluation: `from sklearn.model_selection import cross_val_score; cross_val_score(model, X, y, cv=5)`
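A runnable end-to-end sketch: a scaler-plus-classifier `Pipeline`, a grid search over the classifier's `C` (note the `step__param` naming convention), and cross-validation; the synthetic dataset is only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# synthetic classification data for illustration
X, y = make_classification(n_samples=200, random_state=0)

pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# pipeline parameters are addressed as <step_name>__<param_name>
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X, y)

# evaluate the tuned pipeline with 5-fold cross-validation
scores = cross_val_score(search.best_estimator_, X, y, cv=5)
print(search.best_params_, scores.mean())
```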
## 10. Advanced Statistical Techniques
● ANOVA for Comparing Group Means: `from scipy import stats; stats.f_oneway(df['group1'], df['group2'])`
● Linear Regression Diagnostics: `import statsmodels.api as sm; sm.OLS(y, sm.add_constant(X)).fit().summary()`
● Kernel Density Estimation for Data Distributions: `sns.kdeplot(data)`
● Principal Component Analysis for Dimensionality Reduction: `from sklearn.decomposition import PCA; PCA(n_components=2).fit_transform(X)`
● Time Series Decomposition: `from statsmodels.tsa.seasonal import seasonal_decompose; seasonal_decompose(series, model='additive')`
● Bayesian Inference with PyMC3: `import pymc3 as pm; with pm.Model() as model: # define priors and likelihood`
● Survival Analysis for Time-to-Event Data: `from lifelines import KaplanMeierFitter; kmf = KaplanMeierFitter(); kmf.fit(durations, event_observed)`
● Non-Parametric Tests for Independent Samples: `stats.mannwhitneyu(x, y)`
● Multivariate Regression Analysis: `sm.OLS(y, sm.add_constant(X)).fit().summary()`
● Hierarchical Clustering for Unsupervised Learning: `from scipy.cluster.hierarchy import dendrogram, linkage; Z = linkage(X, 'ward')`
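A sketch of a few of these techniques on synthetic data (group means, sizes, and dimensions are arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
group1 = rng.normal(0.0, 1.0, size=50)
group2 = rng.normal(0.5, 1.0, size=50)

# one-way ANOVA: do the group means differ?
f_stat, p_value = stats.f_oneway(group1, group2)

# Mann-Whitney U as the non-parametric counterpart
u_stat, p_mw = stats.mannwhitneyu(group1, group2)

# PCA projects 5-D data onto its top 2 principal components
X = rng.normal(size=(100, 5))
X_2d = PCA(n_components=2).fit_transform(X)

# Ward linkage builds the hierarchical-clustering tree
Z = linkage(X, "ward")
print(f"{p_value=:.3f} {p_mw=:.3f}", X_2d.shape, Z.shape)
```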
## 11. Advanced Neural Network Techniques with TensorFlow/Keras
● Custom Layers for Specific Operations: `class MyCustomLayer(tf.keras.layers.Layer): # define computations`
● Callbacks for Monitoring the Training Process: `model.fit(X, y, callbacks=[tf.keras.callbacks.EarlyStopping()])`
● TensorBoard for Training Visualization: `tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs')`
● Custom Training Loops for Granular Control: `for epoch in range(epochs): # manually iterate over batches`
● Implementing Attention Mechanisms for NLP: `class AttentionLayer(tf.keras.layers.Layer): # define attention computations`
● Using Transfer Learning and Fine-Tuning Pre-trained Models: `model = tf.keras.applications.VGG16(include_top=False); model.trainable = False`
● Generative Adversarial Networks for Data Generation: `class GAN(tf.keras.Model): # define generator and discriminator`
● Recurrent Neural Networks for Sequence Data: `model = tf.keras.models.Sequential([tf.keras.layers.LSTM(128), tf.keras.layers.Dense(1)])`
● Normalization Techniques for Faster Convergence: `tf.keras.layers.BatchNormalization()`
● Custom Loss Functions and Metrics: `def custom_loss(y_true, y_pred): # define custom logic`
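A hedged sketch of a custom loss (a hand-rolled Huber variant, purely illustrative) wired into a small Keras model with batch normalization and early stopping; the synthetic regression data is a placeholder:

```python
import numpy as np
import tensorflow as tf

# a custom loss is just a function of (y_true, y_pred) returning a tensor
def custom_huber(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    small = tf.abs(error) <= delta
    return tf.reduce_mean(tf.where(small,
                                   0.5 * tf.square(error),
                                   delta * (tf.abs(error) - 0.5 * delta)))

# placeholder data: learn y = sum of features
X = np.random.rand(256, 8).astype("float32")
y = X.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.BatchNormalization(),  # normalization for faster convergence
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=custom_huber)

# EarlyStopping halts training once validation loss stops improving
model.fit(X, y, validation_split=0.2, epochs=50, verbose=0,
          callbacks=[tf.keras.callbacks.EarlyStopping(patience=3)])
```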
## 12. Advanced Python Tips and Tricks
● Using the Walrus Operator for Assignment Expressions: `if (n := len(a)) > 10: print(f"List is too long ({n} elements)")`
● Unpacking for Efficient Variable Assignment: `a, *rest, b = range(10)`
● Using pathlib for Filesystem Path Manipulation: `from pathlib import Path; p = Path('/usr/bin'); p.is_dir()`
● Dictionary Merging with the ** Operator: `merged_dict = {**dict1, **dict2}`
● Using dataclasses for Boilerplate-free Data Structures: `from dataclasses import dataclass; @dataclass class Point: x: int; y: int`
● Using Generators for Memory-efficient Loops: `(x**2 for x in range(10))`
● Context Managers for Resource Management: `with open('file.txt') as f: contents = f.read()`
● Using functools.lru_cache for Memoization: `import functools; @functools.lru_cache(maxsize=None) def fib(n):`
● Async/Await for Asynchronous Programming: `async def fetch_data(): data = await get_data()`
● Type Hints for Improved Code Clarity: `def greet(name: str) -> str:`
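A compact runnable sketch touching most of these tricks in one script; the data list is arbitrary:

```python
from dataclasses import dataclass
from pathlib import Path

# walrus operator: bind and test in one expression
data = list(range(12))
if (n := len(data)) > 10:
    print(f"List is too long ({n} elements)")

# extended unpacking splits off the ends in one assignment
first, *middle, last = data

# dict merging with **: the right-hand dict wins on key collisions
merged = {**{"a": 1}, **{"a": 2, "b": 3}}  # {'a': 2, 'b': 3}

# dataclass generates __init__ and __repr__ for free
@dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)

# pathlib treats paths as objects rather than strings
here = Path(".")
print(first, last, merged, p, here.is_dir())
```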
By: Waleed Mousa