LECTURE 7
🔶 1. Introduction to File Handling in Python
File handling is one of the most essential operations in any programming
language. Applications often need to store data permanently, read previously
stored information, update files, or exchange data with other programs.
Python offers an extremely consistent and simple approach for working with
files of various formats.
In programming, memory (RAM) is temporary storage. Once the program
ends or the computer shuts down, all data stored in memory is lost. For long-
term storage, we use files, which are stored in secondary storage devices
such as hard drives, SSDs, or cloud storage.
Python provides a powerful file-handling interface that supports plain text
files, structured data files, tabular files, and nested hierarchical formats.
These formats include:
TXT (Plain Text)
CSV (Comma-Separated Values)
JSON (JavaScript Object Notation)
XML (Extensible Markup Language)
Each format has its own advantages and is used for different types of
applications.
🟦 1. Introduction to Text Files
A text file (TXT file) is the simplest and most commonly used file format in
computing.
It contains plain text—without formatting, colors, tables, images, or special
structures.
These files store data as readable characters, organized sequentially.
Text files are the foundation of many computer operations, such as:
configuration settings
log files
program output
documentation
temporary storage
communication between programs
In Python, text file handling is extremely simple yet highly powerful due to
Python’s built-in file interface.
Unlike binary files, TXT files are easy to write, read, debug, and inspect using
any editor (Notepad, VS Code, Sublime, etc.).
🟦 2. Why Text Files Are Important in Programming
In real-world projects, TXT files are used for purposes such as:
2.1 Log Files
Programs often need to record:
events
errors
exceptions
user actions
timestamps
Log files help developers debug issues by keeping a running history of what
happened during execution.
2.2 Configuration Storage
Some applications store settings such as:
usernames
passwords (hashed)
API keys
timeout values
default preferences
These can be stored in .txt, .ini, or .cfg files.
2.3 Temporary Data Storage
During long processes, or before writing to a database, a program may
temporarily store intermediate results in a text file.
2.4 Data Transfer Between Programs
Two programs may share information using a simple text file rather than
complex communication protocols.
2.5 File-Based Databases
Smaller programs sometimes use text files as small databases (key-value
storage, lists, counters, etc.).
🟦 3. Understanding the File Handling Workflow
When working with text files, Python follows a clear workflow:
Step 1: Open the File
Using the open() function with the appropriate mode (read, write, append).
Step 2: Perform Operation
Examples:
writing lines
reading contents
searching text
updating file
Step 3: Close the File
So that Python can:
free system resources
complete pending disk writes
avoid file corruption
Using the with statement closes the file automatically, even if an error occurs inside the block.
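The three steps can be sketched as follows. This is only an illustration: the scratch file demo.txt and the temporary directory are made up for the example. It contrasts an explicit close in try/finally with the with statement.

```python
import os
import tempfile

# Hypothetical scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Manual workflow: open, operate, close (finally guarantees the close).
f = open(path, "w", encoding="utf-8")
try:
    f.write("first line\n")
finally:
    f.close()
manual_closed = f.closed

# Same workflow with a context manager: the close happens on block exit.
with open(path, "r", encoding="utf-8") as g:
    text = g.read()
with_closed = g.closed

print(manual_closed, with_closed, repr(text))
```

Both variants end with a closed file; the with form simply removes the need to write the finally clause by hand.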
🟦 4. File Open Modes in Python (Detailed
Explanation)
Understanding modes is the most important part of working with files.
Mode   Meaning          Explanation
"r"    Read             Opens existing file only; error if file doesn't exist
"w"    Write            Creates new file or overwrites existing file
"a"    Append           Adds data at the end; does not delete previous content
"r+"   Read + Write     No overwrite; file must exist
"w+"   Write + Read     Overwrites file, creates if missing
"a+"   Append + Read    Reads + appends; file created if missing
"x"    Create           Creates new file; error if already exists
"t"    Text Mode        Default mode for reading text
"b"    Binary Mode      Used for images, PDFs, videos
Examples of Combined Modes
"rt" → read text (default)
"wb" → write binary
"ab+" → append + read + binary
These modes allow precise control over how Python interacts with files.
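A small experiment can make the differences between the modes concrete. The sketch below assumes a throwaway file called modes.txt in a temporary directory (a name chosen only for this example) and exercises "w", "a", "r+", and "x" in turn.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

with open(path, "w") as f:      # "w" creates the file (or truncates it)
    f.write("one\n")
with open(path, "a") as f:      # "a" keeps "one" and adds to the end
    f.write("two\n")
with open(path, "r+") as f:     # "r+" keeps content; writing starts at offset 0
    f.write("ONE")
with open(path, "r") as f:
    after_r_plus = f.read()

try:
    open(path, "x")             # "x" refuses to touch an existing file
    x_failed = False
except FileExistsError:
    x_failed = True

with open(path, "w") as f:      # "w" again: previous content is discarded
    pass
size_after_w = os.path.getsize(path)

print(repr(after_r_plus), x_failed, size_after_w)
```

Note that "r+" overwrote the first three characters in place rather than inserting, while the final "w" left an empty file.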
🟦 5. Opening Text Files
The simplest method:
file = open("[Link]", "r")
content = file.read()
file.close()
Why this is not recommended:
If the program crashes before file.close() executes, the file remains open,
causing:
resource leak (the file handle is never released)
file lock
incomplete writes
✔ Recommended Method — Using with
with open("[Link]", "r") as file:
    content = file.read()
The with statement ensures:
automatic closing
reduced bugs
cleaner syntax
🟦 6. Writing to Text Files (Deep Explanation)
6.1 Overwriting a File (w mode)
with open("[Link]", "w") as f:
    f.write("Welcome to Python file handling.\n")
    f.write("This will overwrite existing content.")
If the file exists → entire content is replaced
If it doesn’t → new file is created
6.2 Appending to a File (a mode)
Append mode is used for:
logs
histories
reports
incremental updates
with open("[Link]", "a") as f:
    f.write("\nAppending a new line at the end.")
Append mode NEVER deletes old content.
🟦 7. Reading Text Files (Very In-Depth)
Python provides multiple strategies for reading text files, each suited for
different scenarios.
7.1 Read Entire File
Used for:
small files
configuration reading
simple operations
with open("[Link]", "r") as f:
    content = f.read()
print(content)
7.2 Reading Line-by-Line
Useful when:
file is large
processing is done per line
memory optimization is needed
with open("[Link]", "r") as f:
    for line in f:
        print(line.strip())
.strip() removes leading and trailing whitespace:
newline characters
extra spaces at either end of the line
7.3 readline() Method
Reads the next line each time it's called:
with open("[Link]", "r") as f:
    first = f.readline()
    second = f.readline()
7.4 readlines() Method
Returns list of ALL lines:
with open("[Link]", "r") as f:
    lines = f.readlines()
for line in lines:
    print(line.strip())
🟦 8. Writing Lists and Multiple Lines to Text Files
Example 1: Writing a List to a File
fruits = ["apple", "banana", "mango"]
with open("[Link]", "w") as f:
    for fruit in fruits:
        f.write(fruit + "\n")
Example 2: Using writelines()
lines = ["Python\n", "Java\n", "C++\n"]
with open("[Link]", "w") as f:
    f.writelines(lines)
🟦 9. Understanding File Encoding (Very Important
Topic)
Every text file is stored using a particular encoding, which determines how
characters are translated into bytes.
Python supports many encodings, but the most important ones are:
9.1 UTF-8 (Recommended)
Supports all languages (English, Urdu, Chinese, Arabic, etc.)
Most modern systems use UTF-8 by default
Lightweight compared to older Unicode formats
Example: Writing a UTF-8 File
with open("[Link]", "w", encoding="utf-8") as f:
    f.write("Hello, this is a UTF-8 encoded file.")
Example: Reading a UTF-8 File
with open("[Link]", "r", encoding="utf-8") as f:
    data = f.read()
9.2 ASCII (Old Standard)
ASCII supports only:
English letters
numbers
basic punctuation
If you try to write Urdu, Chinese, emojis, etc. with ASCII encoding → UnicodeEncodeError.
9.3 Why Encoding Matters
Because files may contain:
non-English characters
emoji
accented words
special symbols
If wrong encoding is used:
Python throws UnicodeDecodeError
File may not open or may show garbage text
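The effect of a wrong encoding can be reproduced without touching the disk. The sketch below uses a sample string chosen for this example and works on bytes directly; the same errors appear when open() is given a mismatched encoding argument.

```python
text = "caf\u00e9 \u2615"   # sample string: an accented letter and a symbol

utf8_bytes = text.encode("utf-8")

# ASCII cannot represent the accented letter or the symbol, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding UTF-8 bytes with the wrong codec gives an error or garbage text.
try:
    utf8_bytes.decode("ascii")
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False
garbled = utf8_bytes.decode("latin-1")   # no error, but wrong characters

print(len(text), len(utf8_bytes), ascii_ok, decode_ok, garbled != text)
```

The string has 6 characters but 9 bytes in UTF-8, which is why character counts and byte counts should not be mixed up.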
🟦 10. File Pointer and Cursor Movement
When Python reads a file, it uses a file pointer (cursor) to track the current
position.
10.1 Checking Current Position — tell()
with open("[Link]", "r") as f:
    print(f.tell())
Returns the cursor location in bytes.
10.2 Moving Cursor — seek()
with open("[Link]", "r") as f:
    f.seek(5)  # Move to byte offset 5 (skips the first 5 bytes)
    data = f.read()
    print(data)
seek() is used in:
log analyzers
file scanners
partial reading
skipping headers
reading from middle of file
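As an illustration of skipping a header with seek(), the sketch below writes a small file whose first 7 bytes are a made-up "HEADER:" prefix. Binary mode is used here as an assumption of the example, so that the offsets are plain byte counts.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer.txt")  # hypothetical name
with open(path, "wb") as f:
    f.write(b"HEADER:payload")

with open(path, "rb") as f:
    start = f.tell()          # 0 at the beginning
    f.seek(7)                 # skip the 7-byte "HEADER:" prefix
    rest = f.read()
    end = f.tell()            # now at end of file
    f.seek(0)                 # rewind to the start
    first = f.read(6)

print(start, rest, end, first)
```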
🟦 11. Searching Inside Text Files (Very Practical)
Often we need to find:
a word
a line
an entry
a keyword
11.1 Search for a Word in File
with open("[Link]", "r") as f:
    for line in f:
        if "error" in line.lower():
            print("Found:", line.strip())
Useful for:
log files
report scanning
keyword extraction
11.2 Find Line Numbers
with open("[Link]", "r") as f:
    for num, line in enumerate(f, start=1):
        if "python" in line.lower():
            print("Line", num, ":", line.strip())
🟦 12. Updating Text Files (The Right Way)
Text files cannot be edited in the middle directly.
You cannot “insert” text in the middle without rewriting.
Therefore, updating is done by:
✔ Reading Entire File
✔ Modifying in Memory
✔ Rewriting File Completely
12.1 Example: Replacing Text
with open("[Link]", "r") as f:
    content = f.read()
content = content.replace("oldword", "newword")
with open("[Link]", "w") as f:
    f.write(content)
12.2 Example: Removing a Line
with open("[Link]", "r") as f:
    lines = f.readlines()
with open("[Link]", "w") as f:
    for line in lines:
        if "remove this" not in line:
            f.write(line)
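The read, modify, rewrite pattern has one weak spot: if the program stops halfway through the rewrite, the file is left half-written. One possible variation, shown here as a sketch and not as the method of this lecture, writes the new version to a temporary file first and then swaps it in with os.replace(). The helper name replace_in_file and the file notes.txt are made up for the example.

```python
import os
import tempfile

def replace_in_file(path, old, new):
    """Rewrite `path` with `old` replaced by `new`, via a temporary file."""
    with open(path, "r", encoding="utf-8") as src:
        content = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version next to the original, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(content.replace(old, new))
    os.replace(tmp_path, path)   # replaces the original in one step

# Usage with a throwaway file (names are illustrative only).
demo = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(demo, "w", encoding="utf-8") as f:
    f.write("oldword here, oldword there\n")
replace_in_file(demo, "oldword", "newword")
with open(demo, encoding="utf-8") as f:
    updated = f.read()
print(updated)
```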
🟦 13. Writing Structured Data into Text Files
13.1 Key-Value Format (Simple Database)
with open("[Link]", "w") as f:
    f.write("name=Alice\n")
    f.write("age=24\n")
    f.write("city=London\n")
This forms the basis of:
.env files
.ini config files
custom settings
13.2 Multi-Column Text Data
with open("[Link]", "w") as f:
    f.write("Name Marks Grade\n")
    f.write("Alice 88 A\n")
    f.write("James 76 B\n")
🟦 14. Using Context Managers (Deep Theory)
A context manager ensures a file closes automatically.
Python does:
1. Setup file opening
2. Execute block
3. Close file even if an error happens
14.1 Custom Context Manager Example
class FileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode
    def __enter__(self):
        self.file = open(self.filename, self.mode)
        return self.file
    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()
with FileManager("[Link]", "w") as f:
    f.write("Hello")
This explains how with open() works internally.
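To check the claim that the file closes even when an error happens, the sketch below raises an exception inside the block on purpose. It uses contextlib.contextmanager from the standard library as a shorter alternative to writing a class; the function name managed_file and the file cm.txt are assumptions of this example.

```python
from contextlib import contextmanager
import os
import tempfile

@contextmanager
def managed_file(filename, mode):
    f = open(filename, mode)      # setup
    try:
        yield f                   # the body of the with block runs here
    finally:
        f.close()                 # cleanup, even if the body raised

path = os.path.join(tempfile.mkdtemp(), "cm.txt")  # hypothetical file
handle = None
try:
    with managed_file(path, "w") as f:
        handle = f
        f.write("Hello")
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(handle.closed)
```

The file is closed and the text written before the failure is on disk, because closing flushes pending writes.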
🟦 15. Handling File-Related Errors (Very Important)
Real-world programs must handle file issues safely.
15.1 Common Errors
Error                Meaning
FileNotFoundError    File does not exist
PermissionError      No permission to access
UnicodeDecodeError   Wrong encoding
IOError              Input/output failure
IsADirectoryError    Attempted to open a directory
ValueError           Invalid mode
15.2 Handling Errors with Try-Except
try:
    with open("[Link]", "r") as f:
        print(f.read())
except FileNotFoundError:
    print("The file does not exist.")
except PermissionError:
    print("You do not have permissions.")
except Exception as e:
    print("Unexpected error:", e)
🟦 16. File Existence Check
Before reading:
import os
if os.path.exists("[Link]"):
    print("File exists")
else:
    print("File not found")
🟦 17. Deleting a File
import os
if os.path.exists("[Link]"):
    os.remove("[Link]")
🟦 18. Practical Real-Life Uses of TXT Files
18.1 Log Tracking
Applications store:
crashes
errors
timestamps
user actions
18.2 Storing User Input
CLI applications usually store:
usernames
feedback
questionnaire responses
18.3 Local Database for Small Programs
Notepad-like data storage for:
todo-lists
shopping lists
reminders
18.4 Exporting Program Output
Science and math scripts write results to .txt for:
graphs
calculations
measurements
🟦 19. Searching Patterns in Text Files Using Regular
Expressions (Regex)
Regex (Regular Expressions) allow you to search for complex patterns inside
text files.
Python provides this through the re module.
Common Use Cases:
finding email addresses
searching dates
finding errors in logs
detecting phone numbers
validating patterns
extracting data
⭐ 19.1 Example: Finding All Email Addresses
import re
with open("[Link]", "r") as f:
    text = f.read()
emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", text)
print(emails)
This returns a list of all email-like patterns in the file.
⭐ 19.2 Example: Find Lines Containing a Date
import re
pattern = r"\d{2}/\d{2}/\d{4}"
with open("[Link]", "r") as f:
    for line in f:
        if re.search(pattern, line):
            print("Date found:", line.strip())
Regex makes file scanning incredibly powerful.
⭐ 19.3 Example: Finding Only Numbers
import re
with open("[Link]") as f:  # with closes the file after reading
    nums = re.findall(r"\d+", f.read())
print(nums)
🟦 20. Memory-Efficient File Reading for Huge Files
(100MB, 1GB, etc.)
If a file is huge, reading it with .read() or .readlines() loads the entire file
into memory at once, which can exhaust RAM or make the program slow.
So Python provides memory-efficient methods.
⭐ 20.1 Reading File Line-by-Line (Streaming)
Best method for large files:
with open("[Link]", "r") as f:
    for line in f:
        process(line)
This loads only one line at a time.
⭐ 20.2 Reading in Chunks
with open("[Link]", "r") as f:
    chunk = f.read(1024) # 1 KB chunk
    while chunk:
        process(chunk)
        chunk = f.read(1024)
Useful when dealing with:
very large documents
massive log files
server data dumps
⭐ 20.3 Processing First N Lines
from itertools import islice
with open("[Link]") as f:
    for line in islice(f, 100): # first 100 lines; stops early at end of file
        print(line.strip())
🟦 21. Creating a Text-Based Mini Database (Real
Technique)
Many small programs use text files as simple databases.
⭐ 21.1 Example: Storing User Data
[Link]:
id:1, name:Alice, city:London
id:2, name:James, city:Sydney
id:3, name:Emma, city:Berlin
Python Code to Add New User
def add_user(uid, name, city):
    with open("[Link]", "a") as f:
        f.write(f"id:{uid}, name:{name}, city:{city}\n")
⭐ 21.2 Searching for a User
with open("[Link]", "r") as f:
    for line in f:
        if "Alice" in line:
            print("Found:", line)
⭐ 21.3 Updating a User (Full Rewrite Technique)
with open("[Link]", "r") as f:
    lines = f.readlines()
with open("[Link]", "w") as f:
    for line in lines:
        if line.startswith("id:2,"):  # exact id match; "id:2" alone would also hit id:20, id:21
            f.write("id:2, name:John Adams, city:Tokyo\n")
        else:
            f.write(line)
🟦 22. Text File Formatting Techniques (Advanced)
TXT files often need formatting:
22.1 Fixed-Width Columns
Name Marks Grade
Alice 88 A
Bob 75 B
Python Code:
with open("[Link]", "w") as f:
    f.write(f"{'Name':10} {'Marks':8} {'Grade':5}\n")
    f.write(f"{'Alice':10} {88:<8} {'A':5}\n")
22.2 Indented / Structured Text
Useful for:
logs
documentation
hierarchies
with open("[Link]", "w") as f:
    f.write("Project Structure:\n")
    f.write("  src/\n")
    f.write("    [Link]\n")
    f.write("    [Link]\n")
🟦 23. Converting Text Files into Other Formats
(Practical Use)
23.1 TXT → CSV
with open("[Link]") as f, open("[Link]", "w") as out:
    for line in f:
        parts = line.strip().split()
        out.write(",".join(parts) + "\n")
23.2 TXT → JSON
import json
data = {}
with open("[Link]") as f:
    for line in f:
        key, value = line.strip().split("=", 1)  # split on the first "=" only
        data[key] = value
with open("[Link]", "w") as out:  # with closes the output file
    json.dump(data, out, indent=4)
🟦 24. Backup and Versioning of Text Files
Text files are often used for configuration.
Incorrect writing can break entire programs.
Best Practice: Create Backup Before Updating
import shutil
shutil.copy("[Link]", "config_backup.txt")
🟦 25. Advanced Text File Algorithms (Real
Programming Logic)
25.1 Counting Lines
with open("[Link]") as f:  # with closes the file after counting
    count = sum(1 for _ in f)
print("Total lines:", count)
25.2 Counting Words
with open("[Link]") as f:
    text = f.read()
words = text.split()
print("Total Words:", len(words))
25.3 Counting Frequency of Each Word
from collections import Counter
with open("[Link]") as f:
    words = f.read().split()
freq = Counter(words)
print(freq)
🟦 26. Real-Life Projects Using TXT Files
Here are actual practical projects built using TXT files.
⭐ Project 1 — Password Manager (TXT Storage)
Stores usernames and encrypted passwords.
with open("[Link]", "a") as f:
    f.write(username + ":" + password_hash + "\n")
⭐ Project 2 — Task Manager
def add_task(task):
    with open("[Link]", "a") as f:
        f.write(task + "\n")
⭐ Project 3 — Chat Logger
Stores chat history in plain text.
⭐ Project 4 — Simple Key-Value Database
username=admin
timeout=45
theme=dark
Python reads and loads it into a dictionary.
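Loading such a file into a dictionary could look like the sketch below. The function name parse_settings is made up for this example, and skipping blank lines and lines starting with "#" is an extra assumption, not something the format above requires.

```python
def parse_settings(lines):
    """Turn 'key=value' lines into a dictionary (values stay strings)."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        key, _, value = line.partition("=")    # split on the first "=" only
        settings[key.strip()] = value.strip()
    return settings

sample = ["username=admin\n", "timeout=45\n", "theme=dark\n"]
config = parse_settings(sample)
timeout = int(config["timeout"])   # convert explicitly when a number is needed
print(config, timeout)
```

Because the function only iterates over lines, an open file object can be passed in place of the sample list.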
⭐ Project 5 — Log File Analyzer
Program reads millions of lines and extracts:
errors
warnings
performance time
login failures
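A very small version of such an analyzer is sketched below. It assumes that each log line contains a level word such as ERROR or WARNING; the log content and the function name count_levels are invented for the example. The lines are processed one at a time, so the same loop works on a large file opened with open().

```python
from collections import Counter
import io

def count_levels(lines, levels=("ERROR", "WARNING")):
    """Count how many lines mention each level, one line at a time."""
    counts = Counter()
    for line in lines:
        for level in levels:
            if level in line:
                counts[level] += 1
    return counts

# Made-up log content standing in for a real file object.
fake_log = io.StringIO(
    "2024-01-01 INFO started\n"
    "2024-01-01 ERROR disk full\n"
    "2024-01-01 WARNING slow response\n"
    "2024-01-02 ERROR login failure\n"
)
summary = count_levels(fake_log)
print(summary["ERROR"], summary["WARNING"])
```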
🟦 27. Best Practices for Using Text Files
Always use with open()
Always specify encoding (utf-8)
Never read huge files using .read()
Create backups before overwriting
Use exceptions to avoid crashes
Keep file paths organized
Use .strip() when reading lines
Close files (or use context managers)
Avoid storing secure passwords in plain text
Use timestamped file names for logs
1. Introduction to CSV Files
CSV stands for Comma-Separated Values, a simple and widely used file
format for storing tabular data.
A CSV file organizes information in rows and columns, similar to tables in
spreadsheets or databases.
CSV is extremely popular because:
It is simple to create and read
It is compatible with almost every software program (Excel,
Google Sheets, Databases, Python, R, Java, etc.)
It stores data in plain text, making it lightweight
It is excellent for sharing data across different systems
A CSV file typically has the extension:
.csv
CSV is one of the most universal data formats and is widely used in
industries such as:
Data science
Machine learning
Banking and finance
Government records
Inventory systems
HR management systems
Education (grade sheets, attendance)
2. Structure of a CSV File
CSV files follow a simple structure:
✓ Each row = one record
✓ Each column = one field
✓ Comma ( , ) separates fields
Example:
Name, Age, Department
Alice, 25, Sales
Bob, 30, Marketing
Charlie, 28, HR
2.1 Table Representation
Name      Age   Department
Alice     25    Sales
Bob       30    Marketing
Charlie   28    HR
2.2 CSV File as Plain Text
A CSV is basically a text file, where:
Row separator → \n (newline)
Column separator → , (comma)
Title, Author, Pages\n
1984, George Orwell, 268\n
Jane Eyre, Charlotte Bronte, 532\n
3. Why CSV Is So Common
CSV is preferred worldwide because:
1. Human readable
The file is simple text.
2. Machine readable
Programming languages easily parse it.
3. Lightweight and fast
No styling, no formatting like Excel.
4. Cross-platform compatibility
Used in Windows, Linux, Mac, Android, Web.
5. Works with databases
CSV files can be imported into MySQL, SQL Server, Oracle, PostgreSQL, etc.
6. Easy to process in Python
Python provides both manual reading and the csv module.
4. How CSV Stores Data Internally
Even though CSV looks simple, it follows specific rules:
Rule 1: Comma separates fields
city,country,population
Rule 2: Newline separates records
row1\n
row2\n
row3\n
Rule 3: Special characters must be quoted
If a field contains comma:
"New York, USA", 21000000
Rule 4: Empty values are allowed
Name,Age
Alice,25
Bob,
Rule 5: CSV does NOT store datatype
Everything is text.
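Rules 4 and 5 together mean that the program has to convert values itself and decide what an empty field means. The sketch below reads the two-row example from Rule 4 out of an in-memory string; treating an empty age as None is a choice made for this example.

```python
import csv
import io

raw = "Name,Age\nAlice,25\nBob,\n"      # same rows as the Rule 4 example
rows = list(csv.DictReader(io.StringIO(raw)))

# Every value arrives as a string; an empty field is the empty string "".
ages = [int(r["Age"]) if r["Age"] else None for r in rows]
print(rows[0]["Age"], type(rows[0]["Age"]).__name__, ages)
```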
5. CSV Handling in Python
Python allows two main ways to work with CSV files:
Method 1: Manual Processing
1. Open file
2. Read lines
3. Strip newline
4. Split by comma
This helps students understand:
How files work
How text is parsed
How lists are created
5.1 Reading CSV Manually
file_obj = open("[Link]")
csv_rows = file_obj.readlines()
file_obj.close()  # release the file once the rows are in memory
list_csv = []
for row in csv_rows:
    row = row.strip("\n")
    cells = row.split(",")
    list_csv.append(cells)
print(list_csv)
Output
[['Title','Author','Pages'],
['1984','George Orwell','268'],
['Jane Eyre','Charlotte Bronte','532']]
Explanation
readlines() → loads all rows as list
strip("\n") → removes newline
split(",") → divides row into columns
Final result → 2D list (list of records)
5.2 Writing CSV Manually
file = open("[Link]", 'w')
file.write("Name,Marks\n")
file.write("Alice,89\n")
file.write("Bob,76\n")
file.close()
Important Points
write() does not add newline automatically
You must add \n manually
Always close file
6. Using Python’s Built-in csv Module
Python provides the csv library for easier processing.
6.1 Reading with csv.reader
import csv
with open("[Link]") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Output:
['Alice', '25', 'Sales']
['Bob', '30', 'Marketing']
6.2 Writing with csv.writer
import csv
with open("[Link]", 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Sam", 22])
6.3 Reading CSV as Dictionaries
import csv
with open("[Link]") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["Name"], row["Age"])
Why DictReader is useful?
Reads header automatically
Output is dictionaries
Easier to use in data science
7. Handling Special Cases in CSV
CSV sometimes includes:
✔ Quoted Fields
"New York, USA", 21000000
✔ Escape Characters
To include quotes inside text:
"John said ""Hello"""
✔ Missing Data
Alice,25
Bob,
Python handles these using:
csv.reader with quoting options
csv module dialects
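The three special cases can be tried on an in-memory string. This is a sketch with sample rows modelled on the ones above (written without a space after the separating comma); it shows csv.reader undoing the quoting and csv.writer adding it back.

```python
import csv
import io

data = '"New York, USA",21000000\n"John said ""Hello""",1\nBob,\n'
parsed = list(csv.reader(io.StringIO(data)))
# parsed[0][0] keeps its comma; parsed[1][0] has the doubled quotes collapsed.

# Writing: by default csv.writer adds quotes only where they are needed.
out = io.StringIO()
csv.writer(out).writerow(["New York, USA", 21000000])
written = out.getvalue()
print(parsed, repr(written))
```

Splitting the first row manually with split(",") would have produced three cells instead of two, which is the main reason to prefer the csv module for real data.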
8. Dialects in CSV
CSV files differ across countries:
Country       Separator
USA           comma (,)
Europe        semicolon (;)
Old systems   tab (\t)
Python supports dialects:
csv.register_dialect("semicolon", delimiter=';')
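Once registered, the dialect name can be passed to csv.reader. A minimal sketch, using a made-up semicolon-separated sample held in memory:

```python
import csv
import io

csv.register_dialect("semicolon", delimiter=";")

text = "Name;Age\nAlice;25\n"
rows = list(csv.reader(io.StringIO(text), dialect="semicolon"))
print(rows)
```

For a one-off file, passing delimiter=";" directly to csv.reader has the same effect without registering a name.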
9. Advantages of CSV
✔ Simple and lightweight
✔ Fast to read/write
✔ Compatible with all systems
✔ No extra software required
✔ Easy to debug
✔ Works well with Python, Excel, Sheets
10. Limitations of CSV
❌ No data types
Numbers, strings, dates all look same.
❌ No support for nested structure
Cannot store complex data like:
{"name": "Alice", "scores": [89, 90, 92]}
❌ No standard about missing values
Some use blank, some use NA, some use null.
❌ Difficult with commas inside fields
Requires quoting.
❌ Does not support styling
Unlike Excel.
11. CSV vs Excel
Feature           CSV         Excel
File type         Text        Binary
Formatting        ❌ No       ✔ Yes
Formula support   ❌          ✔
Speed             Fast        Slower
Compatibility     Excellent   Good
File size         Small       Larger
12. Real-Life Applications of CSV
CSV is used in:
1. Data Science / Machine Learning
Datasets like [Link], [Link].
2. Banking
Transactions, account statements.
3. Marketing
Customer data, campaign results.
4. HR
Employee lists, attendance, payroll.
5. Healthcare
Patient reports, hospital records.
6. Government Records
Population census.
7. E-commerce
Product catalogs, orders.
13. Errors and Exception Handling in CSV
Common errors:
1. FileNotFoundError
File path incorrect.
try:
    file = open("[Link]")
except FileNotFoundError:
    print("File not found")
2. ValueError
Wrong data format.
3. UnicodeDecodeError
File encoding mismatch.
14. CSV in Data Science (Advanced Note)
CSV is the most used format in:
Pandas
NumPy
Machine learning pipelines
Example with pandas:
import pandas as pd
df = pd.read_csv("[Link]")
print(df.head())
15. Summary of CSV Files
CSV means Comma Separated Values
Used for storing table-like data
Very simple, text-based
Easily processed manually or using Python’s csv module
Ideal for data sharing
Supported by almost all software
Has limitations (no types, no nesting)
Very common in data science and business applications
1. Introduction to JSON
JSON stands for JavaScript Object Notation, a lightweight and structured
data format used for storing and transmitting data.
Even though JSON originated from JavaScript, it is now language-
independent and used by almost every programming language including
Python, Java, C#, PHP, R, etc.
JSON is especially popular in:
APIs (Application Programming Interfaces)
Web applications
Mobile applications
Cloud computing
Data science and machine learning
Configuration files
JSON is easy for both humans and machines to read.
A JSON file has the extension:
.json
2. Why JSON Is So Popular
JSON is one of the most used data formats in the world because:
✔ Human-readable
Follows a clear key–value structure.
✔ Machine-readable
Almost every language has built-in JSON parsers.
✔ Supports nested data
Unlike CSV.
✔ Lightweight and fast
Less complex than XML.
✔ Used everywhere
APIs, servers, databases, configuration systems.
✔ Supported by web technologies
JavaScript handles JSON natively.
3. Structure of a JSON File
JSON contains data in pairs:
"key": "value"
The entire JSON dataset uses:
Curly braces { } → for objects
Square brackets [ ] → for arrays
3.1 Basic Example
{
    "name": "Alice",
    "age": 25,
    "department": "Sales"
}
This is a JSON object containing:
name → string
age → number
department → string
3.2 Nested Structures
JSON supports nested lists and objects.
Example:
{
    "student": {
        "name": "John",
        "marks": [85, 90, 92],
        "address": {
            "city": "Delhi",
            "pincode": 110001
        }
    }
}
This cannot be stored easily in CSV, which is why JSON is preferred when
working with complex data.
4. JSON Data Types
JSON supports the following data types:
JSON Type   Example      Equivalent Python Type
String      "Hello"      str
Number      25.5         int / float
Boolean     true/false   True/False
Null        null         None
Object      {"a":1}      dict
Array       [1,2,3]      list
5. JSON vs Python Dictionary
JSON object looks almost identical to a Python dictionary.
JSON                      Python
null                      None
true                      True
false                     False
Uses double quotes only   Single or double quotes
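The mapping in this table can be observed directly by converting a dictionary to a JSON string and back. The record below is sample data invented for this sketch.

```python
import json

record = {"name": "Alice", "active": True, "manager": None, "scores": [89, 90]}
as_json = json.dumps(record)     # Python -> JSON text
back = json.loads(as_json)       # JSON text -> Python
print(as_json)
```

In the printed text, True has become true, None has become null, and all strings use double quotes.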
6. JSON File Format Rules
Rule 1: Keys must be strings in double quotes
"age": 30
Rule 2: Values may be any JSON data type
Rule 3: Strings must use double quotes
NOT allowed:
'name': 'John'
Rule 4: No trailing comma
Wrong:
{
    "name": "Alex",
}
Rule 5: Arrays must start with [ ] and contain comma-separated
items
7. JSON in Python
Python provides a built-in module:
import json
This module supports:
Reading JSON
Writing JSON
Conversion between JSON and Python objects
8. Reading JSON Files in Python
8.1 json.load() — Read from File
import json
with open("[Link]") as f:
    data = json.load(f)
print(data)
Output (Python dictionary)
{'name': 'Alice', 'age': 25}
9. Writing JSON Files in Python
json.dump() — Write to File
import json
employee = {
    "name": "John",
    "id": 101,
    "skills": ["Python", "SQL", "AI"]
}
with open("[Link]", 'w') as f:
    json.dump(employee, f, indent=4)
Explanation:
indent=4 → beautifies the JSON
Data converted automatically into JSON format
10. Parsing JSON Strings
Sometimes JSON arrives as a string (e.g., from an API).
json.loads() — Convert JSON string to Python
import json
data = '{"name": "Sara", "age": 21}'
parsed = json.loads(data)
print(parsed["name"])
11. Converting Python to JSON String
json.dumps()
import json
data = {"x": 10, "y": 20}
json_string = json.dumps(data)
print(json_string)
Output:
{"x": 10, "y": 20}
12. Working with Nested JSON
Example JSON:
{
    "company": "TechCorp",
    "employees": [
        {"name": "Maya", "age": 29},
        {"name": "David", "age": 34}
    ]
}
Accessing nested values:
data["employees"][0]["name"] # Maya
13. Pretty Printing JSON
Useful for debugging.
print(json.dumps(data, indent=4))
14. Validating JSON
Invalid JSON example:
{
    name: "Ravi",
    age: 30,
}
Errors:
Missing quotes
Trailing comma
Use a JSON validator or try loading:
try:
    json.loads(text)
except json.JSONDecodeError:
    print("Invalid JSON")
15. JSON vs CSV vs XML
Feature             JSON        CSV            XML
Data type support   ✔ Strong    ❌ Weak        ✔ Strong
Nested data         ✔ Yes       ❌ No          ✔ Yes
Speed               Fast        Fastest        Slower
Human readability   Excellent   Good           Average
API support         ✔ Best      ❌ Rare        ✔ Common
Storage             Medium      Small          Large
Structure           Key–value   Rows–Columns   Tag-based
JSON stands in the middle:
More structured than CSV
Less complex than XML
16. Real-World Applications of JSON
1. API Communication
Almost all modern APIs return JSON:
Weather APIs
Google Maps API
Social media APIs (Twitter, Facebook)
2. Web Development
JavaScript directly parses JSON.
3. Mobile Apps
Android and iOS use JSON for data exchange.
4. Databases
MongoDB stores data in JSON-like structure.
5. Configuration Files
Many apps use:
[Link]
[Link]
6. Data Science
Dataset formats:
[Link]
[Link]
model_config.json
17. Advantages of JSON
✔ Readable and simple
✔ Supports hierarchical data
✔ Lightweight
✔ Great for APIs
✔ Works perfectly with JavaScript
✔ Supported by Python's json module
✔ Cross-platform
✔ Ideal for web/mobile applications
18. Limitations of JSON
❌ No comments supported
JSON does not allow:
// This is a comment
❌ No date format
Only string representation.
❌ No built-in support for binary data
Needs Base64 encoding.
❌ Keys must be strings
Numbers or booleans cannot be used as keys.
❌ Larger file size than CSV
Because of braces and keys.
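Two of these limitations are easy to see from Python. In the sketch below, which uses sample data made up for the example, a date has to be stored as a string and rebuilt by hand, and a numeric dictionary key comes back as a string because Python's json module converts it when writing.

```python
import json
from datetime import date

event = {"when": date(2024, 5, 17).isoformat(), 1: "numeric key"}
encoded = json.dumps(event)          # the key 1 is written as "1"
decoded = json.loads(encoded)
restored = date.fromisoformat(decoded["when"])   # rebuild the date manually
print(encoded, list(decoded))
```

The round trip therefore does not return the original dictionary: the key 1 is gone and "1" has taken its place.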
19. Typical Errors When Working with JSON
1. JSONDecodeError
Due to invalid JSON structure.
2. KeyError
Accessing missing key.
3. TypeError
For example, treating list like dictionary.
4. Unicode/Encoding Issues
Special characters require UTF-8.
20. JSON in Data Science
In ML projects, JSON is used for:
Dataset labels
Model architecture
Hyperparameters
Experiment results
Configuration files
Example with pandas:
import pandas as pd
df = pd.read_json("[Link]")
print(df)
21. Summary of JSON Files
JSON stands for JavaScript Object Notation
Stores data in key–value format
Uses { } for objects and [ ] for arrays
Supports nested structures
Widely used in web development, APIs, mobile apps, and cloud
platforms
Python provides json.load(), json.dump(), json.loads(), json.dumps()
Human-readable and easy to use
More structured than CSV and simpler than XML
1. Introduction to XML
XML stands for Extensible Markup Language.
It is a structured, text-based format used to store, organize, and transport
data in a hierarchical way.
XML was developed by the World Wide Web Consortium (W3C) and is
widely used for:
Data storage
Data transfer
Configuration files
Web services
Document systems
SOAP APIs
Mobile and web applications
XML is a self-descriptive format because the data explains itself using
tags.
Example:
<student>
<name>John</name>
<age>21</age>
</student>
XML is more complex than JSON and CSV, but much more powerful for
storing structured and semi-structured data.
2. Why XML Was Created
Before XML, data formats were:
Incompatible
Hard to read
Not structured
Not suitable for the internet
XML was created to:
✔ Store structured data
✔ Make data machine-readable
✔ Make data self-descriptive
✔ Enable data exchange between different systems
✔ Create custom markup tags
3. Structure of an XML Document
XML follows a strict structure.
3.1 XML Declaration (optional)
<?xml version="1.0" encoding="UTF-8"?>
3.2 Root Element
Every XML file must have one and only one root.
<library>
...
</library>
3.3 Child Elements
Elements inside the root.
<library>
<book>...</book>
</library>
3.4 Text Content
Inside elements.
<title>Harry Potter</title>
4. XML Syntax Rules
XML follows important rules:
Rule 1: Every tag must have a closing tag
✔ Correct:
<name>John</name>
❌ Incorrect:
<name>John
Rule 2: Tags are case-sensitive
<Student> ≠ <student>
Rule 3: Attribute values must be in quotes
<book id="101">
Rule 4: One root element only
❌ Invalid XML:
<student></student>
<teacher></teacher>
✔ Valid XML:
<school>
<student></student>
<teacher></teacher>
</school>
Rule 5: Elements must be properly nested
Wrong:
<b><i>Text</b></i>
Correct:
<b><i>Text</i></b>
5. Components of XML
XML consists of several components:
5.1 Elements
An element is everything inside a pair of tags:
<city>Delhi</city>
Elements can contain:
Text
Other elements
Attributes
5.2 Attributes
Attributes describe properties of elements.
<book id="101" category="fiction">
Attributes store metadata, not major content.
5.3 Comments
<!-- This is a comment -->
5.4 Empty Elements
<br />
<img src="[Link]" />
5.5 CDATA Section
CDATA is used to store text that should not be parsed:
<![CDATA[
<note>5 < 10</note>
]]>
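When such a document is read with Python's ElementTree, the CDATA content arrives as ordinary text. A small sketch with a made-up note element:

```python
import xml.etree.ElementTree as ET

# Inside CDATA, "<", "&" and the <b> tag are plain characters, not markup.
note = ET.fromstring("<note><![CDATA[5 < 10 & <b>raw</b>]]></note>")
escaped = ET.fromstring("<note>5 &lt; 10</note>")   # same idea using escapes
print(note.text, len(note), escaped.text)
```

The note element has no child elements, which confirms that the tag inside the CDATA section was not parsed.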
6. XML Example: Realistic Structure
<company>
<employee id="101">
<name>Alice</name>
<age>28</age>
<skills>
<skill>Python</skill>
<skill>SQL</skill>
</skills>
</employee>
<employee id="102">
<name>Bob</name>
<age>32</age>
<skills>
<skill>Java</skill>
</skills>
</employee>
</company>
This shows:
Attributes
Nested elements
Lists
Multiple records
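As a sketch of how these four features look from Python, the same company document (compacted here into a string) can be turned into a list of dictionaries with the standard ElementTree module. Converting age to int is a choice made for this example, since XML text is always a string.

```python
import xml.etree.ElementTree as ET

xml_text = """
<company>
  <employee id="101">
    <name>Alice</name><age>28</age>
    <skills><skill>Python</skill><skill>SQL</skill></skills>
  </employee>
  <employee id="102">
    <name>Bob</name><age>32</age>
    <skills><skill>Java</skill></skills>
  </employee>
</company>
"""

root = ET.fromstring(xml_text)
employees = []
for emp in root.findall("employee"):              # multiple records
    employees.append({
        "id": emp.get("id"),                      # attribute
        "name": emp.findtext("name"),             # nested element text
        "age": int(emp.findtext("age")),          # text is a string; convert
        "skills": [s.text for s in emp.iter("skill")],   # list
    })
print(employees)
```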
7. XML vs HTML
XML                          HTML
Stores and transports data   Displays data
Tags defined by user         Tags predefined
Case-sensitive               Not case-sensitive
Strict structure             Flexible structure
No predefined tags           Predefined <div>, <h1>, etc.
Data-oriented                Presentation-oriented
8. XML vs JSON
Feature             XML                           JSON
Syntax              Tag-based                     Key-value
Data type support   Weak                          Strong
Nested structure    Excellent                     Excellent
Human readability   Moderate                      Very good
Used in             SOAP, configs, govt systems   APIs, web, mobile
File size           Larger                        Smaller
Speed               Slower                        Faster
JSON has replaced XML in most APIs because it is:
Lighter
Easier to read
Faster
Native to JavaScript
But XML is still used in many enterprise systems.
9. XML Schemas (Structure Definitions)
XML can be validated using two schema systems:
9.1 DTD (Document Type Definition)
Defines structure:
<!DOCTYPE note [
<!ELEMENT note (to,from,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
DTD limitations:
Old
Not very strict
Limited datatypes
9.2 XML Schema (XSD)
Modern and powerful.
<xs:element name="age" type="xs:integer"/>
XSD supports:
Datatypes
Namespace
Strict validation
Reusability
Used in big organizations and government systems.
10. XML Namespaces
Used when combining XML documents to avoid tag conflicts.
Example:
<book xmlns:edu="[Link]
<edu:title>Data Structures</edu:title>
</book>
Namespaces = unique identifiers for tags.
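In Python's ElementTree, a namespaced tag is looked up by passing a prefix-to-URI mapping. The sketch below is self-contained and uses the placeholder URI http://example.com/edu, which is an assumption of this example and not the URI from the snippet above.

```python
import xml.etree.ElementTree as ET

# "http://example.com/edu" is a placeholder URI chosen for this sketch.
doc = """<book xmlns:edu="http://example.com/edu">
  <edu:title>Data Structures</edu:title>
</book>"""

root = ET.fromstring(doc)
ns = {"edu": "http://example.com/edu"}
title = root.find("edu:title", ns)     # prefix resolved through the mapping
print(title.tag, title.text)
```

Internally the tag is stored with the full URI in braces, which is what makes it unique across combined documents.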
11. Parsing XML in Python
Python supports XML through:
xml.etree.ElementTree (standard)
minidom
lxml (advanced)
BeautifulSoup (from your book)
11.1 Reading XML Using ElementTree
import xml.etree.ElementTree as ET
tree = ET.parse("[Link]")
root = tree.getroot()
print(root.tag)
11.2 Accessing Child Elements
for child in root:
    print(child.tag, child.attrib)
11.3 Find Specific Element
name = root.find("name").text
11.4 Find All Elements
skills = root.findall(".//skill")  # ".//" also matches nested elements, not only direct children
for s in skills:
    print(s.text)
12. Writing XML in Python
import xml.etree.ElementTree as ET
root = ET.Element("student")
name = ET.SubElement(root, "name")
name.text = "Alice"
age = ET.SubElement(root, "age")
age.text = "21"
tree = ET.ElementTree(root)
tree.write("[Link]")
Python generates the following XML (ElementTree writes it on a single line; it is indented here for readability):
<student>
<name>Alice</name>
<age>21</age>
</student>
13. Real-World Applications of XML
1. Government Systems
Aadhaar data
Land records
Census documents
2. Banking
SWIFT messaging
Financial statements
Payment systems
3. Web Services
SOAP APIs
Enterprise services
4. Office File Formats
Microsoft uses XML inside:
.docx
.xlsx
.pptx
5. Android Development
Android uses XML for layout designs
Configuration files
6. Configuration & Settings
Many tools store settings in XML:
Maven
Tomcat
Spring Framework
14. Advantages of XML
✔ Highly structured
✔ Supports nested data
✔ Can validate data using XSD
✔ Platform-independent
✔ Extensible (custom tags allowed)
✔ Good for complex documents
✔ Supports metadata via attributes
✔ Wide industry adoption
15. Limitations of XML
❌ Verbose (large file size)
Tags increase file size.
❌ Slower than JSON
Because of complex structure.
❌ Harder to read
Nested tags can be confusing.
❌ More complex parsing
Requires specific parsers.
❌ Not ideal for simple data
CSV/JSON preferred for simple structures.
16. XML Errors and Exception Handling
Common errors:
1. ParseError
Missing tags or invalid nesting.
2. FileNotFoundError
XML file not found.
3. ValueError
Wrong data in XSD schema validation.
4. AttributeError
Trying to access missing tags.
Error handling example:
try:
    tree = ET.parse("[Link]")
except ET.ParseError:
    print("Invalid XML")
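The AttributeError case usually comes from calling .text on the None that find() returns for a missing tag. One way to avoid it, sketched below with a made-up helper read_name, is findtext() with a default value, combined with the ParseError handling shown above.

```python
import xml.etree.ElementTree as ET

def read_name(xml_text):
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None                       # malformed XML
    # findtext() returns the default when the tag is missing.
    return root.findtext("name", default="(missing)")

ok = read_name("<student><name>John</name></student>")
no_tag = read_name("<student><age>21</age></student>")
broken = read_name("<student><name>John</student>")
print(ok, no_tag, broken)
```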
17. XML in Data Science
Though JSON and CSV dominate, XML still appears in:
Government-released datasets
Scientific publications
Metadata files
Medical reports
Legal documents
Pandas can read XML (pandas 1.3+):
import pandas as pd
df = pd.read_xml("[Link]")
18. Summary of XML Files
XML stands for Extensible Markup Language
Used for storing and transporting data
Based on tags and hierarchy
Supports metadata using attributes
More powerful but more complex than JSON
Suitable for large enterprise and government systems
Python supports XML with ElementTree
Can be validated using DTD or XSD
Still widely used in banking, Android, web services, and document
formats