
# [Advanced Python for Data Science] (CheatSheet)

1. Advanced Data Structures

● List Comprehensions with Conditional Logic: squared_even = [x**2 for x in range(10) if x % 2 == 0]
● Dictionary Comprehensions: squared_dict = {x: x**2 for x in range(10)}
● Set Comprehensions: unique_squared = {x**2 for x in range(-5, 5)}
● Nested Dictionary Comprehensions: matrix = {x: {y: x*y for y in range(3)} for x in range(3)}
● Defaultdict for Default Values: from collections import defaultdict; dd = defaultdict(int)
● Counter to Count Hashable Objects: from collections import Counter; counts = Counter(my_list)
● OrderedDict to Maintain Insertion Order: from collections import OrderedDict; od = OrderedDict.fromkeys('abcde')
● Deque for Efficient Stack and Queue: from collections import deque; dq = deque([1, 2, 3])
● ChainMap to Combine Dictionaries: from collections import ChainMap; combined = ChainMap(dict1, dict2)
● Namedtuple for Readable Tuples: from collections import namedtuple; Point = namedtuple('Point', ['x', 'y'])
● Heapq for Priority Queues: import heapq; heapq.heappush(heap, item)
● Itertools for Complex Iterations: import itertools; itertools.permutations('ABCD')
● Bisect for Array Bisection Algorithms: import bisect; bisect.bisect_left(a, x)
● Functools for Higher-order Functions: from functools import reduce; reduce(lambda x, y: x+y, [1, 2, 3])
● Zip for Parallel Iteration: for x, y in zip(list1, list2):
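
A minimal sketch combining several of the containers above (Counter, defaultdict, deque, heapq); the word list is hypothetical sample data:

```python
from collections import Counter, defaultdict, deque
import heapq

words = ["pandas", "numpy", "pandas", "scipy", "numpy", "pandas"]  # hypothetical data

# Counter tallies hashable objects in one pass
counts = Counter(words)                 # Counter({'pandas': 3, 'numpy': 2, 'scipy': 1})

# defaultdict(list) creates missing keys on first use
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# deque gives O(1) appends/pops at both ends; maxlen keeps a sliding window
recent = deque(words, maxlen=3)         # only the 3 most recent items survive

# heapq maintains a min-heap; negate counts to pop the most frequent first
heap = [(-n, w) for w, n in counts.items()]
heapq.heapify(heap)
print(heapq.heappop(heap))              # (-3, 'pandas')
```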

2. Functional Programming

● Lambda Functions: multiply = lambda x, y: x * y
● Map for Function Application: squared = list(map(lambda x: x**2, numbers))
● Filter to Extract Elements: evens = list(filter(lambda x: x % 2 == 0, numbers))
● Reduce for Cumulative Operation: from functools import reduce; product = reduce(lambda x, y: x * y, numbers)
● Partial Functions for Arguments: from functools import partial; add_five = partial(add, 5)
● Itertools for Advanced Iteration: import itertools; cyclic = itertools.cycle('ABCD')
● Generators for Lazy Evaluation: (x**2 for x in range(10))
● Decorator Functions for Meta-programming: from functools import cache; @cache def fibonacci(n):
● Use of Closure to Enclose State: def outer(x): return lambda y: x + y
● Any and All for Condition Checking: any([True, False]), all([True, True])
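
A minimal sketch of these functional tools working together; add and fibonacci are hypothetical helpers defined here for illustration (functools.cache requires Python 3.9+):

```python
from functools import cache, partial, reduce

def add(x, y):
    return x + y

add_five = partial(add, 5)              # pre-binds the first argument
numbers = [1, 2, 3, 4]

# map/filter are lazy; reduce folds the results into one value
evens = filter(lambda x: x % 2 == 0, numbers)
shifted = map(add_five, evens)
total = reduce(add, shifted, 0)         # (2+5) + (4+5) = 16
print(total)

# @cache memoizes the recursive calls, making this linear instead of exponential
@cache
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))                    # 832040
```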

3. Concurrency and Parallelism

● Threading for I/O-bound Tasks: from threading import Thread; thread = Thread(target=function, args=(arg,))
● Multiprocessing for CPU-bound Tasks: from multiprocessing import Process; process = Process(target=function, args=(arg,))
● Concurrent Futures for Async Execution: from concurrent.futures import ThreadPoolExecutor; with ThreadPoolExecutor() as executor: executor.submit(function, arg)
● Asyncio for Asynchronous Programming: import asyncio; asyncio.run(main())
● Use of Locks in Threading: from threading import Lock; lock = Lock()
● Semaphore for Controlling Access: from threading import Semaphore; semaphore = Semaphore(2)
● Condition Variables for Synchronization: from threading import Condition; condition = Condition()
● Event for Signaling Between Threads: from threading import Event; event = Event()
● Queue for Thread-safe Data Exchange: from queue import Queue; queue = Queue()
● Using map with ProcessPoolExecutor: from concurrent.futures import ProcessPoolExecutor; with ProcessPoolExecutor() as executor: results = executor.map(func, args)
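
A minimal sketch of thread-based concurrency for I/O-bound work; fetch and the URLs are hypothetical stand-ins for real network calls:

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

def fetch(url):
    # a real task would block on I/O here, e.g. urllib.request.urlopen(url)
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(5)]
results = Queue()                       # thread-safe container for outputs

# Threads suit I/O-bound tasks; executor.map preserves input order
with ThreadPoolExecutor(max_workers=3) as executor:
    for result in executor.map(fetch, urls):
        results.put(result)

while not results.empty():
    print(results.get())
```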

4. Debugging and Testing

● Use Assert Statements for Quick Checks: assert x > 0, 'x must be positive'
● Logging for Debugging and Monitoring: import logging; logging.debug('Debugging information')
● Pdb for Interactive Debugging: import pdb; pdb.set_trace()
● Timeit for Timing Code Execution: import timeit; timeit.timeit('func()', setup='from __main__ import func')
● CProfile for Performance Profiling: import cProfile; cProfile.run('func()')
● Memory Profiler for Memory Usage: from memory_profiler import profile; @profile def my_func():
● Using PyTest for Unit Testing: def test_function(): assert func(x) == expected
● Mock for Testing in Isolation: from unittest.mock import Mock; mock = Mock()
● Coverage for Test Coverage Measurement: coverage run -m pytest; coverage report
● Use Type Hints for Static Type Checking: def greet(name: str) -> str:
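
A minimal pytest-style sketch; summarize is a hypothetical function under test, and Mock stands in for an external dependency (run with pytest):

```python
from unittest.mock import Mock

def summarize(fetcher, key):
    """Average the values returned by a data source."""
    data = fetcher(key)
    assert isinstance(data, list), "fetcher must return a list"
    return sum(data) / len(data)

def test_summarize_uses_fetcher():
    fake_fetcher = Mock(return_value=[2, 4, 6])   # isolates the test from real I/O
    assert summarize(fake_fetcher, "sales") == 4
    fake_fetcher.assert_called_once_with("sales") # verifies the interaction
```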

5. Performance Optimization

● Using NumPy for Efficient Numeric Computation: import numpy as np; np_array = np.array([1, 2, 3])
● Pandas for Efficient Data Manipulation: import pandas as pd; df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
● Cython for Compiling Python: import cython; @cython.cfunc
● JIT Compilation with Numba: from numba import jit; @jit def sum_array(arr):
● Use of Cache to Avoid Recomputation: from functools import lru_cache; @lru_cache(maxsize=None) def fib(n):
● Vectorization to Replace Loops (NumPy, Pandas): df['col3'] = df['col1'] + df['col2']
● Use of Pandas Categoricals for Memory Efficiency: df['col'] = df['col'].astype('category')
● Memory Views for Large Data Manipulation: memoryview(np.array([1, 2, 3]))
● Batch Processing for Large Datasets: for batch in pd.read_csv('file.csv', chunksize=1000):
● Using HDF5 or Feather Format for Large Data Storage: df.to_hdf('data.h5', key='table')
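
A minimal sketch of two of the optimizations above, vectorization and categoricals, on hypothetical random data:

```python
import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({
    "col1": np.random.rand(n),
    "col2": np.random.rand(n),
    "city": np.random.choice(["NYC", "LA", "SF"], size=n),
})

# Vectorized arithmetic runs in compiled code instead of a Python loop
df["col3"] = df["col1"] + df["col2"]

# Low-cardinality strings shrink sharply when stored as categoricals
before = df["city"].memory_usage(deep=True)
df["city"] = df["city"].astype("category")
after = df["city"].memory_usage(deep=True)
print(f"city column: {before:,} -> {after:,} bytes")
```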

6. Advanced File Handling

● Read/Write JSON Files: import json; with open('data.json', 'r') as f: data = json.load(f)
● Working with CSV Files: import csv; with open('data.csv', newline='') as f: reader = csv.reader(f)
● Manipulating ZIP Files: from zipfile import ZipFile; with ZipFile('file.zip', 'r') as zip_ref: zip_ref.extractall('path_to_extract')
● Handling Large Files with Generators: def read_large_file(file_object): yield from file_object
● Use Pickle for Object Serialization: import pickle; pickle.dump(obj, file)
● Working with Binary Data: with open('file.bin', 'wb') as f: f.write(b'Hello World')
● Use Glob for File Path Pattern Matching: from glob import glob; file_paths = glob('*.txt')
● Handling XML Data with ElementTree: import xml.etree.ElementTree as ET; tree = ET.parse('data.xml')
● Working with HDF5 Files for Large Datasets: import h5py; f = h5py.File('data.h5', 'r')
● Using Pandas to Read/Write Excel Files: df.to_excel('data.xlsx', index=False); df_read = pd.read_excel('data.xlsx')
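
A minimal sketch of a JSON round trip plus generator-based streaming; the record and file names are hypothetical:

```python
import json
from glob import glob

def read_large_file(path):
    # a generator keeps memory flat by yielding one line at a time
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

record = {"name": "experiment-1", "scores": [0.91, 0.88]}
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(record, f)

with open("data.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["name"])

# glob matches paths by pattern; stream each file line by line
for path in glob("*.json"):
    for line in read_large_file(path):
        pass  # process each line without loading the whole file
```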

7. Advanced Pandas Techniques

● MultiIndex DataFrame Operations: df.set_index(['level_1', 'level_2'])
● Conditional Operations Using np.where: df['new_col'] = np.where(df['col'] > 0, 'positive', 'negative')
● Vectorized String Operations: df['col'].str.upper()
● Pandas SQL-like Queries: df.query('col > 0')
● Pivot Tables for Data Summarization: df.pivot_table(values='D', index=['A', 'B'], columns=['C'])
● Window Functions for Rolling and Expanding Calculations: df['col'].rolling(window=5).mean()
● Merging, Joining, and Concatenating DataFrames: pd.concat([df1, df2]); pd.merge(df1, df2, on='key')
● Apply Functions for Custom Operations: df.apply(lambda row: row['A'] + row['B'], axis=1)
● Time Series Specific Operations: df.resample('M').mean()
● Categorical Data Handling for Memory Optimization: df['col'] = df['col'].astype('category')
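
A minimal sketch tying several of these together (np.where, query, rolling, resample) on a hypothetical daily series:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=90, freq="D")
df = pd.DataFrame({"col": np.random.randn(90)}, index=idx)

# Vectorized conditional labeling
df["sign"] = np.where(df["col"] > 0, "positive", "negative")

# SQL-like filtering
positives = df.query("col > 0")

# Rolling window smooths; resample aggregates by calendar month
df["rolling_mean"] = df["col"].rolling(window=5).mean()
monthly = df["col"].resample("M").mean()
print(len(positives))
print(monthly)
```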

8. Advanced Visualization Techniques

● Interactive Plots with Plotly: import plotly.express as px; px.line(df, x='x', y='y')
● Advanced Matplotlib Customizations: fig, ax = plt.subplots(); ax.plot(x, y)
● Creating Dashboards with Dash or Streamlit: import streamlit as st; st.line_chart(df)
● Seaborn for Statistical Data Visualization: import seaborn as sns; sns.boxplot(x='x', y='y', data=df)
● 3D Plotting with Matplotlib: ax = fig.add_subplot(111, projection='3d')
● Heatmaps for Correlation Visualization: sns.heatmap(df.corr())
● Pairplot for Multi-variable Analysis: sns.pairplot(df, hue='class')
● Facet Grids for Conditional Plots: g = sns.FacetGrid(df, col='col', row='row'); g = g.map(plt.hist, 'val')
● Network Graphs with NetworkX: import networkx as nx; G = nx.Graph(); G.add_edge('A', 'B')
● Geospatial Data Visualization: import geopandas as gpd; world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
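
A minimal seaborn sketch: a correlation heatmap over hypothetical random data with one induced correlation:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["d"] = df["a"] * 0.8 + rng.normal(scale=0.2, size=100)  # induce correlation

# Annotated heatmap of the correlation matrix
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
ax.set_title("Correlation matrix")
plt.tight_layout()
plt.show()
```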

9. Machine Learning Pipeline Optimization

● Automating Pipeline with Pipeline: from sklearn.pipeline import Pipeline; pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('clf', LogisticRegression())])
● Grid Search for Hyperparameter Tuning: from sklearn.model_selection import GridSearchCV; GridSearchCV(pipeline, param_grid=param_grid)
● Feature Selection Techniques: from sklearn.feature_selection import SelectFromModel; SelectFromModel(estimator)
● Model Serialization with Joblib for Deployment: from joblib import dump, load; dump(model, 'model.joblib')
● Cross-Validation Strategies for Robust Model Evaluation: from sklearn.model_selection import cross_val_score; cross_val_score(model, X, y, cv=5)
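
A minimal end-to-end sketch of the pipeline plus grid search; the synthetic dataset is hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Chaining scaler and model in one Pipeline prevents test-fold leakage during CV
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-name prefixes ('clf__') route parameters to the right pipeline stage
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```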

10. Advanced Statistical Techniques

● ANOVA for Feature Selection: from scipy import stats; stats.f_oneway(df['group1'], df['group2'])
● Linear Regression Diagnostics: import statsmodels.api as sm; sm.OLS(y, sm.add_constant(X)).fit().summary()
● Kernel Density Estimation for Data Distribution: sns.kdeplot(data)
● Principal Component Analysis for Dimensionality Reduction: from sklearn.decomposition import PCA; PCA(n_components=2).fit_transform(X)
● Time Series Decomposition: from statsmodels.tsa.seasonal import seasonal_decompose; seasonal_decompose(series, model='additive')
● Bayesian Inference with PyMC3: import pymc3 as pm; with pm.Model() as model: # Define priors and likelihood
● Survival Analysis for Time-to-Event Data: from lifelines import KaplanMeierFitter; kmf = KaplanMeierFitter(); kmf.fit(durations, event_observed)
● Non-Parametric Tests for Independent Samples: stats.mannwhitneyu(x, y)
● Multivariate Regression Analysis: sm.OLS(y, sm.add_constant(X)).fit().summary()
● Hierarchical Clustering for Unsupervised Learning: from scipy.cluster.hierarchy import dendrogram, linkage; Z = linkage(X, 'ward')
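
A minimal PCA sketch on hypothetical correlated features, with standardization first since PCA is scale-sensitive:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# five noisy copies of one latent factor -> highly correlated features
X = np.hstack([base + rng.normal(scale=0.1, size=(200, 1)) for _ in range(5)])

X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
# the first component should capture nearly all the variance here
```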

11. Advanced Neural Network Techniques with TensorFlow/Keras

● Custom Layers for Specific Operations: class MyCustomLayer(tf.keras.layers.Layer): # Define computations
● Callbacks for Monitoring Training Process: model.fit(X, y, callbacks=[tf.keras.callbacks.EarlyStopping()])
● TensorBoard for Training Visualization: tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='./logs')
● Custom Training Loops for Granular Control: for epoch in range(epochs): # Manually iterate over batches
● Implementing Attention Mechanisms for NLP: class AttentionLayer(tf.keras.layers.Layer): # Define attention computations
● Using Transfer Learning and Fine-Tuning Pre-trained Models: model = tf.keras.applications.VGG16(include_top=False); model.trainable = False
● Generative Adversarial Networks for Data Generation: class GAN(tf.keras.Model): # Define generator and discriminator
● Recurrent Neural Networks for Sequence Data: model = tf.keras.models.Sequential([tf.keras.layers.LSTM(128), tf.keras.layers.Dense(1)])
● Normalization Techniques for Faster Convergence: tf.keras.layers.BatchNormalization()
● Custom Loss Functions and Metrics: def custom_loss(y_true, y_pred): # Define custom logic
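
A minimal sketch of a custom layer and a custom loss wired into a model; ScaleShift and the L1 penalty are hypothetical examples, not standard Keras components:

```python
import tensorflow as tf

class ScaleShift(tf.keras.layers.Layer):
    """Learnable per-feature scale and shift (hypothetical custom layer)."""
    def build(self, input_shape):
        self.scale = self.add_weight(name="scale", shape=(input_shape[-1],),
                                     initializer="ones")
        self.shift = self.add_weight(name="shift", shape=(input_shape[-1],),
                                     initializer="zeros")

    def call(self, inputs):
        return inputs * self.scale + self.shift

def custom_loss(y_true, y_pred):
    # mean squared error plus a small L1 penalty on the residuals
    err = y_true - y_pred
    return tf.reduce_mean(tf.square(err)) + 0.01 * tf.reduce_mean(tf.abs(err))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    ScaleShift(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss=custom_loss)
model.summary()
```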

12. Advanced Python Tips and Tricks

● Using Walrus Operator for Assignment Expressions: if (n := len(a)) > 10: print(f"List is too long ({n} elements)")
● Unpacking for Efficient Variable Assignment: a, *rest, b = range(10)
● Using pathlib for Filesystem Path Manipulation: from pathlib import Path; p = Path('/usr/bin'); p.is_dir()
● Dictionary Merging with ** Operator: merged_dict = {**dict1, **dict2}
● Using dataclasses for Boilerplate-free Data Structures: from dataclasses import dataclass; @dataclass class Point: x: int; y: int
● Using Generators for Memory-efficient Loops: (x**2 for x in range(10))
● Context Managers for Resource Management: with open('file.txt') as f: contents = f.read()
● Using functools.lru_cache for Memoization: import functools; @functools.lru_cache(maxsize=None) def fib(n):
● Async/Await for Asynchronous Programming: async def fetch_data(): data = await get_data()
● Type Hints for Improved Code Clarity: def greet(name: str) -> str:
By: Waleed Mousa
