Python Cheatsheet — Data Engineer Interview (Fresher)
A comprehensive, example-driven cheatsheet: core data structures, sorting & searching algorithms,
libraries, and practical snippets you’ll use in interviews.
Contents
1. Core Data Structures & Complexity
2. Collections & Useful Libraries (pandas, numpy, heapq, bisect)
3. Sorting Algorithms (code + analysis)
4. Searching & Graph Traversals (code + examples)
5. Generators, Itertools, File I/O, Concurrency basics
6. Practical Data-Engineer Snippets (CSV/Parquet, Pandas ops, chunking, streaming)
7. Interview Tips & Common Questions
1. Core Data Structures & Complexity
Structure | Description | Common Ops (avg)
List | Ordered, mutable, allows duplicates. Backed by a dynamic array. | Index O(1), append O(1) amortized, insert/delete O(n)
Tuple | Ordered, immutable. Use for fixed records, keys in dict. | Access O(1)
Set | Unordered, unique items, hash-based. | Add/remove/membership O(1) average
Dict | Key-value map, hash-based. | Lookup/insert/delete O(1) average
Deque (collections.deque) | Double-ended queue: fast appends/pops at both ends. | append/pop O(1) at either end
Heap (heapq) | Binary min-heap via list. | push/pop O(log n)
Array (numpy.ndarray) | Contiguous typed array, vectorized ops. | Element access O(1); vector ops compact & fast
Examples

List
    nums = [1, 2, 3]
    nums.append(4)
    first_two = nums[:2]   # slicing

Dict
    student = {'name': 'Alice', 'age': 21}
    age = student.get('age')
    student['grade'] = 'A'

Set
    s = set([1, 2, 3])
    s.add(4)
    if 2 in s:
        ...

Deque
    from collections import deque
    d = deque([1, 2, 3])
    d.appendleft(0)
    val = d.popleft()
2. Collections & Key Libraries for Data Engineering
collections: Counter, defaultdict, namedtuple, deque — useful for logs, counts, grouping.
heapq: min-heap; to simulate a max-heap, push negated values (see the sketch after this list, which also covers bisect).
bisect: binary search helpers (bisect_left/right) for insertion points.
itertools: combinations, permutations, groupby, islice, chain — great for streaming data.
pandas: DataFrame/Series — core for ETL, aggregations, joins, resampling.
numpy: numerical arrays, vectorized ops—fast.
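A minimal sketch of the max-heap trick and bisect insertion points, with toy values for illustration:

    import heapq
    import bisect

    # max-heap via heapq: push negated values, negate again on pop
    h = []
    for x in [5, 1, 4]:
        heapq.heappush(h, -x)
    print(-heapq.heappop(h))           # 5 (largest)

    # bisect: find insertion points in a sorted list
    a = [1, 2, 4, 4, 7]
    print(bisect.bisect_left(a, 4))    # 2 (before existing 4s)
    print(bisect.bisect_right(a, 4))   # 4 (after existing 4s)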
Pandas — Quick Practical Examples
import pandas as pd

# read CSV in chunks (memory efficient)
for chunk in pd.read_csv('large.csv', chunksize=10_000):
    process(chunk)

# common ops
df = pd.read_parquet('data.parquet')
df = df.dropna(subset=['user_id'])
agg = df.groupby('country')['revenue'].sum().reset_index()

# merge
merged = df1.merge(df2, how='left', on='id')
3. Sorting Algorithms (with Python examples & complexity)
Selection Sort — O(n^2); Insertion Sort — O(n^2) (good for nearly-sorted input); Merge Sort — O(n log n),
stable; Quick Sort — O(n log n) average (classically in-place, though the example below uses a simpler
out-of-place version); Heap Sort — O(n log n). Built-in sorted()/list.sort() uses Timsort (stable,
O(n log n) worst case, optimized for existing runs).
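Since the summary calls out insertion sort for nearly-sorted input, here is a minimal sketch:

    def insertion_sort(arr):
        # move each element left until everything before it is <= it
        for i in range(1, len(arr)):
            key = arr[i]
            j = i - 1
            while j >= 0 and arr[j] > key:
                arr[j + 1] = arr[j]   # shift larger elements right
                j -= 1
            arr[j + 1] = key
        return arr

    # Usage
    print(insertion_sort([2, 1, 3, 5, 4]))   # [1, 2, 3, 4, 5]

On nearly-sorted input the inner while loop barely runs, giving close to O(n) behavior.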
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    i = j = 0
    merged = []
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Usage
print(merge_sort([5, 2, 9, 1]))
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# Usage
print(quick_sort([3, 6, 8, 10, 1, 2, 1]))
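Heap sort is listed above without code; a minimal sketch via heapq (out-of-place, but quick to write in an interview):

    import heapq

    def heap_sort(arr):
        h = list(arr)
        heapq.heapify(h)   # O(n) heap construction
        # n pops at O(log n) each => O(n log n) total
        return [heapq.heappop(h) for _ in range(len(h))]

    # Usage
    print(heap_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]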
4. Searching & Graph Traversals
Binary Search (sorted array) — O(log n). DFS/BFS for graphs/trees: O(V+E). Use iterative (stack) or
recursive (watch recursion depth).
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Usage
print(binary_search([1, 2, 4, 5, 9], 5))
from collections import deque

def bfs(graph, start):
    visited = set([start])
    q = deque([start])
    order = []
    while q:
        node = q.popleft()
        order.append(node)
        for nb in graph.get(node, []):
            if nb not in visited:
                visited.add(nb)
                q.append(nb)
    return order

# Usage
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
print(bfs(graph, 'A'))
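DFS is mentioned above but only BFS is shown; a minimal iterative DFS with an explicit stack (sidesteps Python's recursion limit) might look like:

    def dfs(graph, start):
        visited = set()
        stack = [start]
        order = []
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            order.append(node)
            # push neighbors reversed so they are visited in listed order
            for nb in reversed(graph.get(node, [])):
                if nb not in visited:
                    stack.append(nb)
        return order

    # Usage (same graph as the BFS example)
    graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
    print(dfs(graph, 'A'))   # ['A', 'B', 'D', 'C']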
5. Generators, Itertools, File I/O, Concurrency basics
Generators: lazy evaluation, memory efficient.
itertools: groupby, islice, chain, tee.
File I/O: use with open(...) as f; process in chunks for large files.
Concurrency: threading for I/O-bound, multiprocessing for CPU-bound, asyncio for async I/O (see the thread-pool sketch after the code below).
# generator example
def read_lines(path):
    with open(path, 'r') as f:
        for line in f:
            yield line.strip()

# itertools example (records must be sorted by the same key before groupby)
import itertools
for k, group in itertools.groupby(sorted(data, key=lambda x: x['user']),
                                  key=lambda x: x['user']):
    handle_group(k, list(group))
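The concurrency line above has no example; a minimal thread-pool sketch for the I/O-bound case, where fetch is a hypothetical stand-in for a network call:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def fetch(url):
        time.sleep(0.1)   # simulates network latency (I/O wait)
        return f'done: {url}'

    urls = [f'https://example.com/{i}' for i in range(8)]   # placeholder URLs
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fetch, urls))   # I/O waits overlap across threads
    print(results[:2])

For CPU-bound work, swap in ProcessPoolExecutor so the GIL does not serialize the workers.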
6. Practical Data-Engineer Snippets
- Reading large CSV in chunks (pandas) and writing Parquet.
- Streaming from S3: use boto3, smart_open, or s3fs with pandas.
- Efficient joins: ensure indexing; use categorical dtypes for memory savings.
- Use the dtype argument in read_csv to reduce memory usage (see the dtype/categorical sketch after the snippet below).
- Use Parquet for faster I/O and compression.
- Use vectorized operations in pandas (avoid row-wise loops).
# read csv in chunks and write a parquet dataset partitioned by country
import pandas as pd

for chunk in pd.read_csv('big.csv', chunksize=100_000, dtype={'user_id': str}):
    # partition_cols writes out_parquet/country=<value>/ and adds new files
    # per call, so chunks accumulate instead of overwriting a single file
    chunk.to_parquet('out_parquet', index=False, compression='snappy',
                     engine='pyarrow', partition_cols=['country'])
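Two of the bullets above (the dtype argument and categorical dtypes) in one minimal sketch; the file and column names are hypothetical:

    import pandas as pd

    # declare narrow dtypes up front instead of letting pandas infer object/int64
    df = pd.read_csv('events.csv',   # hypothetical file
                     dtype={'user_id': 'string', 'clicks': 'int32'})

    # low-cardinality text columns shrink a lot as categoricals
    df['country'] = df['country'].astype('category')
    print(df.memory_usage(deep=True))   # compare before/after on real data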
7. Interview Tips & Common Questions
• Explain tradeoffs (time vs memory); use Big-O.
• Talk about stability of sorting when relevant.
• For data engineering: discuss data formats (CSV/Parquet/ORC), partitioning, schema, null handling.
• Be ready to write code (two-pointer, sliding window) and small ETL tasks (reading, grouping, aggregating).
• Common questions: implement an LRU cache, merge k sorted lists (see the sketch after this list), streaming median, deduplicate a large file (external sort).
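For one of the listed questions, merging k sorted lists, the standard library already has a lazy k-way merge; a minimal sketch:

    import heapq

    lists = [[1, 4, 7], [2, 5], [3, 6, 8]]
    merged = list(heapq.merge(*lists))   # O(N log k), streams lazily
    print(merged)   # [1, 2, 3, 4, 5, 6, 7, 8]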
Generated for: Freshers preparing for Data Engineer interviews — concise, practical, and example-driven. Good luck!