
Advanced Programming in Python Lecture 7

Lecture 7 covers file handling in Python, emphasizing the importance of text files for data storage, configuration, and communication between programs. It details the file handling workflow, various file modes, and methods for reading and writing text files, including error handling and memory-efficient techniques. Additionally, it discusses practical applications of text files and introduces regular expressions for pattern searching.

LECTURE 7

🔶 1. Introduction to File Handling in Python


File handling is one of the most essential operations in any programming
language. Applications often need to store data permanently, read previously
stored information, update files, or exchange data with other programs.
Python offers an extremely consistent and simple approach for working with
files of various formats.

In programming, memory (RAM) is temporary storage. Once the program
ends or the computer shuts down, all data stored in memory is lost. For long-term
storage, we use files, which are stored on secondary storage devices
such as hard drives, SSDs, or cloud storage.

Python provides a powerful file-handling interface that supports plain text
files, structured data files, tabular files, and nested hierarchical formats.
These formats include:

 TXT (Plain Text)

 CSV (Comma-Separated Values)

 JSON (JavaScript Object Notation)

 XML (Extensible Markup Language)

Each format has its own advantages and is used for different types of
applications.

🟦 1. Introduction to Text Files


A text file (TXT file) is the simplest and most commonly used file format in
computing.
It contains plain text—without formatting, colors, tables, images, or special
structures.
These files store data as readable characters, organized sequentially.
Text files are the foundation of many computer operations, such as:

 configuration settings

 log files

 program output

 documentation

 temporary storage

 communication between programs

In Python, text file handling is extremely simple yet highly powerful due to
Python’s built-in file interface.
Unlike binary files, TXT files are easy to write, read, debug, and inspect using
any editor (Notepad, VS Code, Sublime, etc.).

🟦 2. Why Text Files Are Important in Programming


In real-world projects, TXT files are used for purposes such as:

2.1 Log Files

Programs often need to record:

 events

 errors

 exceptions

 user actions

 timestamps

Log files help developers debug issues by keeping a running history of what
happened during execution.

2.2 Configuration Storage

Some applications store settings such as:

 usernames

 passwords (hashed)

 API keys
 timeout values

 default preferences

These can be stored in .txt, .ini, or .cfg files.

2.3 Temporary Data Storage

During long processes, or before writing to a database, a program may
temporarily store intermediate results in a text file.

2.4 Data Transfer Between Programs

Two programs may share information using a simple text file rather than
complex communication protocols.

2.5 File-Based Databases

Smaller programs sometimes use text files as small databases (key-value
storage, lists, counters, etc.).

🟦 3. Understanding the File Handling Workflow


When working with text files, Python follows a clear workflow:

Step 1: Open the File

Using the open() function with the appropriate mode (read, write, append).

Step 2: Perform Operation

Examples:

 writing lines

 reading contents

 searching text

 updating file

Step 3: Close the File

So that Python can:

 free system resources

 complete pending disk writes

 avoid file corruption


Using a with statement closes the file automatically.
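
The three steps above can be sketched as follows. This is only a minimal illustration: the file name notes.txt and the temporary directory are assumptions made for the example, not part of the lecture.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Manual workflow: open, operate, close (try/finally guarantees the close).
f = open(path, "w", encoding="utf-8")   # Step 1: open the file
try:
    f.write("first line\n")             # Step 2: perform the operation
finally:
    f.close()                           # Step 3: always close the file

# Same workflow with a with statement: closing happens automatically.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

closed_after_with = f.closed            # True once the with block has ended
```

Both versions close the file even if the operation fails; the with form simply needs less code.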

🟦 4. File Open Modes in Python (Detailed Explanation)

Understanding modes is the most important part of working with files.

Mode   Meaning          Explanation
"r"    Read             Opens an existing file only; raises an error if the file doesn't exist
"w"    Write            Creates a new file or overwrites an existing file
"a"    Append           Adds data at the end; does not delete previous content
"r+"   Read + Write     Does not truncate the file; the file must exist
"w+"   Write + Read     Overwrites the file; creates it if missing
"a+"   Append + Read    Reads and appends; the file is created if missing
"x"    Create           Creates a new file; raises an error if it already exists
"t"    Text Mode        Default mode for reading and writing text
"b"    Binary Mode      Used for images, PDFs, videos

Examples of Combined Modes

 "rt" → read text (default)

 "wb" → write binary

 "ab+" → append + read + binary

These modes allow precise control over how Python interacts with files.
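
To see how a few of these modes behave in practice, here is a small sketch. The file name modes_demo.txt and the temporary directory are assumptions chosen for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")  # hypothetical file

with open(path, "w") as f:        # "w": create the file (or truncate it)
    f.write("one\n")

with open(path, "a") as f:        # "a": add to the end, keep old content
    f.write("two\n")

with open(path, "r+") as f:       # "r+": read and write without truncating
    f.write("ONE")                # overwrites the first three characters in place

with open(path, "r") as f:        # "r": read-only
    result = f.read()

try:
    open(path, "x")               # "x": refuses to touch an existing file
    x_failed = False
except FileExistsError:
    x_failed = True
```

After these steps the file holds "ONE" followed by the untouched remainder, which shows that "r+" writes over existing characters rather than inserting new ones.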

🟦 5. Opening Text Files


The simplest method:

file = open("[Link]", "r")
content = file.read()
file.close()

Why this is not recommended:
If an exception is raised before file.close() executes, the file remains open,
which can cause:

 leaked file handles

 file locks

 incomplete writes

✔ Recommended Method — Using with

with open("[Link]", "r") as file:
    content = file.read()

The with statement ensures:

 automatic closing

 reduced bugs

 cleaner syntax

🟦 6. Writing to Text Files (Deep Explanation)


6.1 Overwriting a File (w mode)

with open("[Link]", "w") as f:
    f.write("Welcome to Python file handling.\n")
    f.write("This will overwrite existing content.")

If the file exists → entire content is replaced


If it doesn’t → new file is created

6.2 Appending to a File (a mode)

Append mode is used for:

 logs

 histories

 reports

 incremental updates

with open("[Link]", "a") as f:
    f.write("\nAppending a new line at the end.")

Append mode NEVER deletes old content.

🟦 7. Reading Text Files (Very In-Depth)


Python provides multiple strategies for reading text files, each suited for
different scenarios.

7.1 Read Entire File

Used for:

 small files

 configuration reading

 simple operations

with open("[Link]", "r") as f:
    content = f.read()
print(content)

7.2 Reading Line-by-Line

Useful when:

 file is large

 processing is done per line

 memory optimization is needed

with open("[Link]", "r") as f:
    for line in f:
        print(line.strip())

.strip() removes leading and trailing whitespace, including:

 newline characters

 extra spaces and tabs
7.3 readline() Method

Reads the next line each time it's called:

with open("[Link]", "r") as f:
    first = f.readline()
    second = f.readline()

7.4 readlines() Method

Returns list of ALL lines:

with open("[Link]", "r") as f:
    lines = f.readlines()
for line in lines:
    print(line.strip())

🟦 8. Writing Lists and Multiple Lines to Text Files


Example 1: Writing a List to a File

fruits = ["apple", "banana", "mango"]

with open("[Link]", "w") as f:
    for fruit in fruits:
        f.write(fruit + "\n")

Example 2: Using writelines()

lines = ["Python\n", "Java\n", "C++\n"]

with open("[Link]", "w") as f:
    f.writelines(lines)

🟦 9. Understanding File Encoding (Very Important Topic)

Every text file is stored using a particular encoding, which determines how
characters are translated into bytes.
Python supports many encodings, but the most important ones are:

9.1 UTF-8 (Recommended)

 Supports all languages (English, Urdu, Chinese, Arabic, etc.)

 Most modern systems use UTF-8 by default

 Lightweight compared to older Unicode formats

Example: Writing a UTF-8 File

with open("[Link]", "w", encoding="utf-8") as f:
    f.write("Hello, this is a UTF-8 encoded file.")

Example: Reading a UTF-8 File

with open("[Link]", "r", encoding="utf-8") as f:
    data = f.read()

9.2 ASCII (Old Standard)

ASCII supports only:

 English letters

 numbers

 basic punctuation

If you try to write Urdu, Chinese, emojis, etc. to an ASCII-encoded file, Python raises UnicodeEncodeError.

9.3 Why Encoding Matters

Because files may contain:


 non-English characters

 emoji

 accented words

 special symbols

If wrong encoding is used:

 Python throws UnicodeDecodeError

 File may not open or may show garbage text
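
The effect of a mismatched encoding can be reproduced with a short sketch. The file names and the sample text below are assumptions made for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
utf8_path = os.path.join(folder, "greeting.txt")    # hypothetical file
ascii_path = os.path.join(folder, "ascii_only.txt") # hypothetical file
text = "café ☕"

# Writing and reading with the same encoding round-trips the text.
with open(utf8_path, "w", encoding="utf-8") as f:
    f.write(text)
with open(utf8_path, "r", encoding="utf-8") as f:
    round_trip = f.read()

# Reading UTF-8 bytes as ASCII fails because some bytes are above 127.
try:
    with open(utf8_path, "r", encoding="ascii") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Writing non-ASCII characters to an ASCII file fails while encoding.
try:
    with open(ascii_path, "w", encoding="ascii") as f:
        f.write(text)
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

Passing encoding="utf-8" explicitly on both the writing and the reading side avoids both failures.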

🟦 10. File Pointer and Cursor Movement


When Python reads a file, it uses a file pointer (cursor) to track the current
position.

10.1 Checking Current Position — tell()

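with open("[Link]", "r") as f:
    print(f.tell())

Returns the current cursor position. In binary mode this is a byte offset; in text
mode it is an opaque number that is only meaningful when passed back to seek().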

10.2 Moving Cursor — seek()

with open("[Link]", "r") as f:
    f.seek(5)  # Move to offset 5 (skip the first 5 bytes)
    data = f.read()
    print(data)

seek() is used in:

 log analyzers

 file scanners

 partial reading

 skipping headers

 reading from middle of file
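
As a sketch of skipping a header and re-reading a file, the example below uses binary mode so that positions are plain byte offsets. The file name and its "HEADER:" prefix are assumptions made for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pointer_demo.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"HEADER:payload")

with open(path, "rb") as f:      # binary mode: positions are byte offsets
    start = f.tell()             # 0 at the beginning of the file
    f.seek(7)                    # skip the 7-byte "HEADER:" prefix
    payload = f.read()           # everything after the header
    end = f.tell()               # position after reading to the end
    f.seek(0)                    # rewind to the beginning
    first = f.read(6)            # read the first 6 bytes again
```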


🟦 11. Searching Inside Text Files (Very Practical)
Often we need to find:

 a word

 a line

 an entry

 a keyword

11.1 Search for a Word in File

with open("[Link]", "r") as f:
    for line in f:
        if "error" in line.lower():
            print("Found:", line.strip())

Useful for:

 log files

 report scanning

 keyword extraction

11.2 Find Line Numbers

with open("[Link]", "r") as f:
    for num, line in enumerate(f, start=1):
        if "python" in line.lower():
            print("Line", num, ":", line.strip())

🟦 12. Updating Text Files (The Right Way)


Text files cannot be edited in the middle directly.
You cannot “insert” text in the middle without rewriting.

Therefore, updating is done by:


✔ Reading Entire File

✔ Modifying in Memory

✔ Rewriting File Completely

12.1 Example: Replacing Text

with open("[Link]", "r") as f:
    content = f.read()

content = content.replace("oldword", "newword")

with open("[Link]", "w") as f:
    f.write(content)

12.2 Example: Removing a Line

with open("[Link]", "r") as f:
    lines = f.readlines()

with open("[Link]", "w") as f:
    for line in lines:
        if "remove this" not in line:
            f.write(line)

🟦 13. Writing Structured Data into Text Files


13.1 Key-Value Format (Simple Database)

with open("[Link]", "w") as f:
    f.write("name=Alice\n")
    f.write("age=24\n")
    f.write("city=London\n")

This forms the basis of:

 .env files

 .ini config files

 custom settings

13.2 Multi-Column Text Data

with open("[Link]", "w") as f:
    f.write("Name Marks Grade\n")
    f.write("Alice 88 A\n")
    f.write("James 76 B\n")

🟦 14. Using Context Managers (Deep Theory)


A context manager ensures a file closes automatically.

Python does:

1. Setup file opening

2. Execute block

3. Close file even if an error happens

14.1 Custom Context Manager Example

class FileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        self.file.close()

with FileManager("[Link]", "w") as f:
    f.write("Hello")

This mirrors what the file object returned by open() does when it is used in a
with statement: __enter__ hands back the file, and __exit__ closes it.
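
The same setup, execute, close sequence can also be written as a generator, using the standard library's contextlib.contextmanager decorator. This is an alternative sketch rather than the lecture's own method; the name managed_file and the file demo.txt are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def managed_file(filename, mode):
    f = open(filename, mode)     # 1. setup: open the file
    try:
        yield f                  # 2. the body of the with block runs here
    finally:
        f.close()                # 3. cleanup runs even if the body raised

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file

handle = None
try:
    with managed_file(path, "w") as f:
        handle = f
        f.write("Hello")
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

closed_after_error = handle.closed   # True: the finally clause still ran
```

Even though the block raised an error, the file was closed and the text written before the failure was flushed to disk.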

🟦 15. Handling File-Related Errors (Very Important)


Real-world programs must handle file issues safely.

15.1 Common Errors

Error                 Meaning
FileNotFoundError     File does not exist
PermissionError       No permission to access
UnicodeDecodeError    Wrong encoding
IOError               Input/output failure (an alias of OSError in Python 3)
IsADirectoryError     Attempted to open a directory
ValueError            Invalid mode

15.2 Handling Errors with Try-Except

try:
    with open("[Link]", "r") as f:
        print(f.read())
except FileNotFoundError:
    print("The file does not exist.")
except PermissionError:
    print("You do not have permissions.")
except Exception as e:
    print("Unexpected error:", e)

🟦 16. File Existence Check


Before reading:

import os

if os.path.exists("[Link]"):
    print("File exists")
else:
    print("File not found")

🟦 17. Deleting a File


import os

if os.path.exists("[Link]"):
    os.remove("[Link]")

🟦 18. Practical Real-Life Uses of TXT Files


18.1 Log Tracking

Applications store:

 crashes

 errors

 timestamps
 user actions

18.2 Storing User Input

CLI applications usually store:

 usernames

 feedback

 questionnaire responses

18.3 Local Database for Small Programs

Notepad-like data storage for:

 todo-lists

 shopping lists

 reminders

18.4 Exporting Program Output

Science and math scripts write results to .txt for:

 graphs

 calculations

 measurements

🟦 19. Searching Patterns in Text Files Using Regular Expressions (Regex)

Regular expressions (regex) let you search for complex patterns inside
text files.
Python provides this through the re module.

Common Use Cases:

 finding email addresses

 searching dates

 finding errors in logs

 detecting phone numbers


 validating patterns

 extracting data

⭐ 19.1 Example: Finding All Email Addresses

import re

with open("[Link]", "r") as f:
    text = f.read()

emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", text)
print(emails)

This returns a list of all email-like patterns in the file.

⭐ 19.2 Example: Find Lines Containing a Date

import re

pattern = r"\d{2}/\d{2}/\d{4}"

with open("[Link]", "r") as f:
    for line in f:
        if re.search(pattern, line):
            print("Date found:", line.strip())

Regex makes file scanning incredibly powerful.

⭐ 19.3 Example: Finding Only Numbers

with open("[Link]") as f:
    nums = re.findall(r"\d+", f.read())
print(nums)

🟦 20. Memory-Efficient File Reading for Huge Files (100MB, 1GB, etc.)

If a file is huge, reading it with .read() or .readlines() loads the whole file into
memory at once, which can exhaust RAM or make your program slow.

So Python provides memory-efficient methods.

⭐ 20.1 Reading File Line-by-Line (Streaming)

Best method for large files:

with open("[Link]", "r") as f:
    for line in f:
        process(line)

This loads only one line at a time.

⭐ 20.2 Reading in Chunks

with open("[Link]", "r") as f:
    chunk = f.read(1024)  # up to 1024 characters per chunk
    while chunk:
        process(chunk)
        chunk = f.read(1024)

Useful when dealing with:

 very large documents

 massive log files

 server data dumps

⭐ 20.3 Processing First N Lines


from itertools import islice

with open("[Link]") as f:
    for line in islice(f, 100):  # first 100 lines, or fewer if the file is shorter
        print(line.strip())

🟦 21. Creating a Text-Based Mini Database (Real Technique)

Many small programs use text files as simple databases.

⭐ 21.1 Example: Storing User Data

[Link]:

id:1, name:Alice, city:London

id:2, name:James, city:Sydney

id:3, name:Emma, city:Berlin

Python Code to Add New User

def add_user(uid, name, city):
    with open("[Link]", "a") as f:
        f.write(f"id:{uid}, name:{name}, city:{city}\n")

⭐ 21.2 Searching for a User

with open("[Link]", "r") as f:
    for line in f:
        if "Alice" in line:
            print("Found:", line)

⭐ 21.3 Updating a User (Full Rewrite Technique)

with open("[Link]", "r") as f:
    lines = f.readlines()

with open("[Link]", "w") as f:
    for line in lines:
        if line.startswith("id:2,"):  # exact id match, so id:20 is not affected
            f.write("id:2, name:John Adams, city:Tokyo\n")
        else:
            f.write(line)

🟦 22. Text File Formatting Techniques (Advanced)


TXT files often need formatting:

22.1 Fixed-Width Columns

Name Marks Grade

Alice 88 A

Bob 75 B

Python Code:

with open("[Link]", "w") as f:
    f.write(f"{'Name':10} {'Marks':8} {'Grade':5}\n")
    f.write(f"{'Alice':10} {88:<8} {'A':5}\n")

22.2 Indented / Structured Text

Useful for:

 logs

 documentation

 hierarchies

with open("[Link]", "w") as f:
    f.write("Project Structure:\n")
    f.write("  src/\n")
    f.write("    [Link]\n")
    f.write("    [Link]\n")

🟦 23. Converting Text Files into Other Formats (Practical Use)

23.1 TXT → CSV

with open("[Link]") as f, open("[Link]", "w") as out:
    for line in f:
        parts = line.strip().split()
        out.write(",".join(parts) + "\n")

23.2 TXT → JSON

import json

data = {}

with open("[Link]") as f:
    for line in f:
        key, value = line.strip().split("=", 1)  # split on the first "=" only
        data[key] = value

with open("[Link]", "w") as out:
    json.dump(data, out, indent=4)

🟦 24. Backup and Versioning of Text Files


Text files are often used for configuration.
Incorrect writing can break entire programs.

Best Practice: Create Backup Before Updating


import shutil

shutil.copy("[Link]", "config_backup.txt")

🟦 25. Advanced Text File Algorithms (Real Programming Logic)

25.1 Counting Lines

with open("[Link]") as f:
    count = sum(1 for _ in f)
print("Total lines:", count)

25.2 Counting Words

with open("[Link]") as f:
    text = f.read()

words = text.split()
print("Total Words:", len(words))

25.3 Counting Frequency of Each Word

from collections import Counter

with open("[Link]") as f:
    words = f.read().split()

freq = Counter(words)
print(freq)
🟦 26. Real-Life Projects Using TXT Files
Here are actual practical projects built using TXT files.

⭐ Project 1 — Password Manager (TXT Storage)

Stores usernames and hashed passwords.

with open("[Link]", "a") as f:
    f.write(username + ":" + password_hash + "\n")

⭐ Project 2 — Task Manager

def add_task(task):
    with open("[Link]", "a") as f:
        f.write(task + "\n")

⭐ Project 3 — Chat Logger

Stores chat history in plain text.

⭐ Project 4 — Simple Key-Value Database

username=admin

timeout=45

theme=dark

Python reads and loads it into a dictionary.
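
Loading such a file into a dictionary could look like the sketch below. The function name parse_key_values is hypothetical, and skipping blank lines and lines starting with "#" is an assumption about the file format, not something the lecture specifies.

```python
def parse_key_values(lines):
    """Turn 'key=value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # assumed comment convention
            continue
        key, _, value = line.partition("=")    # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings

# Lines as they would come from iterating over an open text file.
sample = ["username=admin\n", "timeout=45\n", "\n", "theme=dark\n"]
config = parse_key_values(sample)
timeout = int(config["timeout"])   # values stay strings until converted
```

Because an open file is iterable line by line, the same function can be called as parse_key_values(f) inside a with open(...) block.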

⭐ Project 5 — Log File Analyzer

Program reads millions of lines and extracts:

 errors

 warnings

 performance time
 login failures

🟦 27. Best Practices for Using Text Files


 Always use with open()

 Always specify encoding (utf-8)

 Never read huge files using .read()

 Create backups before overwriting

 Use exceptions to avoid crashes

 Keep file paths organized

 Use .strip() when reading lines

 Close files (or use context managers)

 Avoid storing secure passwords in plain text

 Use timestamped file names for logs

1. Introduction to CSV Files


CSV stands for Comma-Separated Values, a simple and widely used file
format for storing tabular data.
A CSV file organizes information in rows and columns, similar to tables in
spreadsheets or databases.

CSV is extremely popular because:

 It is simple to create and read

 It is compatible with almost every software program (Excel,
Google Sheets, Databases, Python, R, Java, etc.)

 It stores data in plain text, making it lightweight

 It is excellent for sharing data across different systems

A CSV file typically has the extension:

[Link]
CSV is one of the most universal data formats and is widely used in
industries such as:

 Data science

 Machine learning

 Banking and finance

 Government records

 Inventory systems

 HR management systems

 Education (grade sheets, attendance)

2. Structure of a CSV File


CSV files follow a simple structure:

✓ Each row = one record

✓ Each column = one field

✓ Comma ( , ) separates fields

Example:

Name, Age, Department

Alice, 25, Sales

Bob, 30, Marketing

Charlie, 28, HR

2.1 Table Representation

Name      Age   Department
Alice     25    Sales
Bob       30    Marketing
Charlie   28    HR

2.2 CSV File as Plain Text

A CSV is basically a text file, where:

 Row separator → \n (newline)

 Column separator → , (comma)

Title, Author, Pages\n

1984, George Orwell, 268\n

Jane Eyre, Charlotte Bronte, 532\n

3. Why CSV Is So Common


CSV is preferred worldwide because:

1. Human readable

The file is simple text.

2. Machine readable

Programming languages easily parse it.

3. Lightweight and fast

No styling, no formatting like Excel.

4. Cross-platform compatibility

Used in Windows, Linux, Mac, Android, Web.

5. Works with databases

CSV files can be imported into MySQL, SQL Server, Oracle, PostgreSQL, etc.

6. Easy to process in Python

Python provides both manual reading and the csv module.

4. How CSV Stores Data Internally


Even though CSV looks simple, it follows specific rules:

Rule 1: Comma separates fields

city,country,population

Rule 2: Newline separates records

row1\n

row2\n

row3\n

Rule 3: Special characters must be quoted

If a field contains comma:

"New York, USA", 21000000

Rule 4: Empty values are allowed

Name,Age

Alice,25

Bob,

Rule 5: CSV does NOT store datatype

Everything is text.
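
Rules 4 and 5 mean that the program reading the file has to convert types itself. A minimal sketch, using an in-memory string in place of a real file (the helper to_int_or_none is hypothetical):

```python
import csv
import io

raw = "Name,Age\nAlice,25\nBob,\n"           # Bob's age is missing
rows = list(csv.DictReader(io.StringIO(raw)))

age_as_read = rows[0]["Age"]                  # the string "25", not a number

def to_int_or_none(text):
    """Convert a CSV cell to int, treating an empty cell as missing."""
    return int(text) if text else None

ages = [to_int_or_none(row["Age"]) for row in rows]
```

How to represent a missing value (None here) is a design choice of the reading program, since CSV itself does not define one.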

5. CSV Handling in Python


Python allows two main ways to work with CSV files:

Method 1: Manual Processing

1. Open file

2. Read lines

3. Strip newline

4. Split by comma

This helps students understand:

 How files work


 How text is parsed

 How lists are created

5.1 Reading CSV Manually

with open("[Link]") as file_obj:
    csv_rows = file_obj.readlines()

list_csv = []
for row in csv_rows:
    row = row.strip("\n")
    cells = row.split(",")
    list_csv.append(cells)

print(list_csv)

Output

[['Title','Author','Pages'],

['1984','George Orwell','268'],

['Jane Eyre','Charlotte Bronte','532']]

Explanation

 readlines() → loads all rows as list

 strip("\n") → removes newline

 split(",") → divides row into columns

 Final result → 2D list (list of records)

5.2 Writing CSV Manually

file = open("[Link]", "w")
file.write("Name,Marks\n")
file.write("Alice,89\n")
file.write("Bob,76\n")
file.close()

Important Points

 write() does not add newline automatically

 You must add \n manually

 Always close file

6. Using Python’s Built-in csv Module


Python provides the csv library for easier processing.

6.1 Reading with csv.reader

import csv

with open("[Link]", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Output:

['Alice', '25', 'Sales']

['Bob', '30', 'Marketing']

6.2 Writing with csv.writer

import csv

with open("[Link]", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Sam", 22])

6.3 Reading CSV as Dictionaries

import csv

with open("[Link]", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["Name"], row["Age"])

Why DictReader is useful?

 Reads header automatically

 Output is dictionaries

 Easier to use in data science

7. Handling Special Cases in CSV


CSV sometimes includes:

✔ Quoted Fields

"New York, USA", 21000000

✔ Escape Characters

To include quotes inside text:

"John said ""Hello"""

✔ Missing Data

Alice,25

Bob,

Python handles these using:

 csv.reader with quoting options


 csv module dialects
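
The difference between manual splitting and the csv module shows up exactly in these special cases. The sketch below uses in-memory text instead of a file; the sample rows are adapted from the examples above.

```python
import csv
import io

raw = '"New York, USA",21000000\n"John said ""Hello""",\n'

# Naive splitting breaks the quoted field at its inner comma.
naive = raw.splitlines()[0].split(",")

# csv.reader understands quotes, doubled quotes, and empty fields.
parsed = list(csv.reader(io.StringIO(raw)))

# csv.writer adds the quotes automatically when a field needs them.
buffer = io.StringIO()
csv.writer(buffer).writerow(["New York, USA", 21000000])
written = buffer.getvalue()
```

Here naive has three pieces for a two-column row, while parsed keeps "New York, USA" as one field and turns the doubled quotes back into single ones.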

8. Dialects in CSV

CSV files differ across regions and systems:

Region / System   Separator
USA               comma (,)
Europe            semicolon (;)
Old systems       tab (\t)

Python supports dialects:

csv.register_dialect("semicolon", delimiter=';')
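
Once registered, the dialect name can be passed to readers and writers. A short sketch with in-memory text standing in for a semicolon-separated file (the sample rows are made up for the example):

```python
import csv
import io

csv.register_dialect("semicolon", delimiter=";")

raw = "Name;Age\nAlice;25\n"
rows = list(csv.reader(io.StringIO(raw), dialect="semicolon"))

out = io.StringIO()
csv.writer(out, dialect="semicolon").writerow(["Bob", 30])
written = out.getvalue()
```

For a one-off file, passing delimiter=";" directly to csv.reader or csv.writer has the same effect without registering a dialect.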

9. Advantages of CSV
✔ Simple and lightweight

✔ Fast to read/write

✔ Compatible with all systems

✔ No extra software required

✔ Easy to debug

✔ Works well with Python, Excel, Sheets

10. Limitations of CSV


❌ No data types

Numbers, strings, dates all look same.

❌ No support for nested structure

Cannot store complex data like:

{"name": "Alice", "scores": [89, 90, 92]}


❌ No standard about missing values

Some use blank, some use NA, some use null.

❌ Difficult with commas inside fields

Requires quoting.

❌ Does not support styling

Unlike Excel.

11. CSV vs Excel


Feature           CSV         Excel
File type         Text        Binary
Formatting        ❌ No       ✔ Yes
Formula support   ❌          ✔
Speed             Fast        Slower
Compatibility     Excellent   Good
File size         Small       Larger

12. Real-Life Applications of CSV


CSV is used in:

1. Data Science / Machine Learning

Datasets like [Link], [Link].

2. Banking

Transactions, account statements.

3. Marketing
Customer data, campaign results.

4. HR

Employee lists, attendance, payroll.

5. Healthcare

Patient reports, hospital records.

6. Government Records

Population census.

7. E-commerce

Product catalogs, orders.

13. Errors and Exception Handling in CSV


Common errors:

1. FileNotFoundError

File path incorrect.

try:
    file = open("[Link]")
except FileNotFoundError:
    print("File not found")

2. ValueError

Wrong data format.

3. UnicodeDecodeError

File encoding mismatch.

14. CSV in Data Science (Advanced Note)


CSV is the most used format in:

 Pandas
 NumPy

 Machine learning pipelines

Example with pandas:

import pandas as pd

df = pd.read_csv("[Link]")

print(df.head())

15. Summary of CSV Files


 CSV means Comma Separated Values

 Used for storing table-like data

 Very simple, text-based

 Easily processed manually or using Python’s csv module

 Ideal for data sharing

 Supported by almost all software

 Has limitations (no types, no nesting)

 Very common in data science and business applications

1. Introduction to JSON
JSON stands for JavaScript Object Notation, a lightweight and structured
data format used for storing and transmitting data.
Even though JSON originated from JavaScript, it is now language-independent
and used by almost every programming language including
Python, Java, C#, PHP, R, etc.

JSON is especially popular in:

 APIs (Application Programming Interfaces)

 Web applications

 Mobile applications

 Cloud computing
 Data science and machine learning

 Configuration files

JSON is easy for both humans and machines to read.

A JSON file has the extension:

[Link]

2. Why JSON Is So Popular


JSON is one of the most used data formats in the world because:

✔ Human-readable

Follows a clear key–value structure.

✔ Machine-readable

Almost every language has built-in JSON parsers.

✔ Supports nested data

Unlike CSV.

✔ Lightweight and fast

Less complex than XML.

✔ Used everywhere

APIs, servers, databases, configuration systems.

✔ Supported by web technologies

JavaScript handles JSON natively.

3. Structure of a JSON File


JSON contains data in pairs:

"key": "value"

The entire JSON dataset uses:

 Curly braces { } → for objects


 Square brackets [ ] → for arrays

3.1 Basic Example

{
    "name": "Alice",
    "age": 25,
    "department": "Sales"
}

This is a JSON object containing:

 name → string

 age → number

 department → string

3.2 Nested Structures

JSON supports nested lists and objects.

Example:

{
    "student": {
        "name": "John",
        "marks": [85, 90, 92],
        "address": {
            "city": "Delhi",
            "pincode": 110001
        }
    }
}

This cannot be stored easily in CSV, which is why JSON is preferred when
working with complex data.

4. JSON Data Types


JSON supports the following data types:

JSON Type   Example      Equivalent Python Type
String      "Hello"      str
Number      25.5         int / float
Boolean     true/false   True/False
Null        null         None
Object      {"a":1}      dict
Array       [1,2,3]      list
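
The mapping in this table can be checked directly with the json module. The sample JSON text below is made up for the demonstration.

```python
import json

text = (
    '{"s": "Hello", "n": 25.5, "i": 7, "b": true, '
    '"z": null, "o": {"a": 1}, "arr": [1, 2, 3]}'
)
data = json.loads(text)

# Name of the Python type produced for each JSON value.
types = {key: type(value).__name__ for key, value in data.items()}

# Going the other way, a Python tuple is written as a JSON array,
# so it comes back as a list.
round_trip = json.loads(json.dumps((1, 2)))
```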

5. JSON vs Python Dictionary


JSON object looks almost identical to a Python dictionary.

JSON                      Python
null                      None
true                      True
false                     False
Uses double quotes only   Single or double quotes

6. JSON File Format Rules


Rule 1: Keys must be strings in double quotes

"age": 30
Rule 2: Values may be any JSON data type

Rule 3: Strings must use double quotes

NOT allowed:

'name': 'John'

Rule 4: No trailing comma

Wrong:

{"name": "Alex",}

Rule 5: Arrays must be enclosed in [ ] and contain comma-separated items
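
These rules are enforced by Python's json parser, which can be used to check them. In the sketch below the helper name is_valid_json is hypothetical.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

checks = {
    "valid": is_valid_json('{"name": "John", "age": 30}'),
    "single_quotes": is_valid_json("{'name': 'John'}"),     # breaks Rules 1 and 3
    "trailing_comma": is_valid_json('{"name": "Alex",}'),   # breaks Rule 4
    "unquoted_key": is_valid_json('{age: 30}'),             # breaks Rule 1
}
```

Only the first entry is accepted; the other three are rejected with JSONDecodeError.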

7. JSON in Python

Python provides a built-in module:

import json

This module supports:

 Reading JSON

 Writing JSON

 Conversion between JSON and Python objects

8. Reading JSON Files in Python


8.1 json.load() — Read from File

import json

with open("[Link]") as f:
    data = json.load(f)

print(data)

Output (Python dictionary)

{'name': 'Alice', 'age': 25}

9. Writing JSON Files in Python


json.dump() — Write to File

import json

employee = {
    "name": "John",
    "id": 101,
    "skills": ["Python", "SQL", "AI"]
}

with open("[Link]", "w") as f:
    json.dump(employee, f, indent=4)

Explanation:

 indent=4 → beautifies the JSON

 Data converted automatically into JSON format

10. Parsing JSON Strings


Sometimes JSON arrives as a string (e.g., from an API).

json.loads() — Convert JSON string to Python

import json

data = '{"name": "Sara", "age": 21}'

parsed = json.loads(data)
print(parsed["name"])

11. Converting Python to JSON String


json.dumps()

import json

data = {"x": 10, "y": 20}

json_string = json.dumps(data)

print(json_string)

Output:

{"x": 10, "y": 20}

12. Working with Nested JSON

Example JSON:

{
    "company": "TechCorp",
    "employees": [
        {"name": "Maya", "age": 29},
        {"name": "David", "age": 34}
    ]
}

Accessing nested values:

data["employees"][0]["name"] # Maya

13. Pretty Printing JSON


Useful for debugging.

print(json.dumps(data, indent=4))

14. Validating JSON

Invalid JSON example:

{
    name: "Ravi",
    age: 30,
}

Errors:

 Missing quotes

 Trailing comma

Use a JSON validator or try loading:

try:
    json.loads(text)
except json.JSONDecodeError:
    print("Invalid JSON")

15. JSON vs CSV vs XML


Feature             JSON        CSV            XML
Data type support   ✔ Strong    ❌ Weak        ✔ Strong
Nested data         ✔ Yes       ❌ No          ✔ Yes
Speed               Fast        Fastest        Slower
Human readability   Excellent   Good           Average
API support         ✔ Best      ❌ Rare        ✔ Common
Storage             Medium      Small          Large
Structure           Key–value   Rows–Columns   Tag-based

JSON stands in the middle:

 More structured than CSV

 Less complex than XML

16. Real-World Applications of JSON


1. API Communication

Almost all modern APIs return JSON:

 Weather APIs

 Google Maps API

 Social media APIs (Twitter, Facebook)

2. Web Development

JavaScript directly parses JSON.

3. Mobile Apps

Android and iOS use JSON for data exchange.

4. Databases

MongoDB stores data in JSON-like structure.

5. Configuration Files

Many apps use:

[Link]
[Link]

6. Data Science

Dataset formats:

 [Link]

 [Link]

 model_config.json

17. Advantages of JSON


✔ Readable and simple

✔ Supports hierarchical data

✔ Lightweight

✔ Great for APIs

✔ Works perfectly with JavaScript

✔ Supported by Python's json module

✔ Cross-platform

✔ Ideal for web/mobile applications

18. Limitations of JSON


❌ No comments supported

JSON does not allow:

// This is a comment

❌ No date format

Only string representation.

❌ No built-in support for binary data

Needs Base64 encoding.

❌ Keys must be strings


Numbers or booleans cannot be used as keys.

❌ Larger file size than CSV

Because of braces and keys.
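
Several of these limitations have common workarounds in Python. The sketch below shows one possible approach for dates, binary data, and non-string keys; the record contents are made up, and ISO 8601 strings for dates are a convention chosen here, not something JSON requires.

```python
import base64
import json
from datetime import date

record = {
    "created": date(2024, 1, 15).isoformat(),                    # date as a string
    "blob": base64.b64encode(b"\x00\x01\x02").decode("ascii"),   # bytes as Base64 text
    "scores": {1: "low", 2: "high"},                             # integer keys
}

text = json.dumps(record)        # json.dumps turns the integer keys into strings
restored = json.loads(text)

restored_date = date.fromisoformat(restored["created"])
restored_bytes = base64.b64decode(restored["blob"])
score_keys = list(restored["scores"])    # keys come back as strings
```

The reading program has to know which fields hold dates or Base64 data, because the JSON text itself only records that they are strings.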

19. Typical Errors When Working with JSON


1. JSONDecodeError

Due to invalid JSON structure.

2. KeyError

Accessing missing key.

3. TypeError

For example, treating list like dictionary.

4. Unicode/Encoding Issues

Special characters require UTF-8.

20. JSON in Data Science


In ML projects, JSON is used for:

 Dataset labels

 Model architecture

 Hyperparameters

 Experiment results

 Configuration files

Example with pandas:

import pandas as pd

df = pd.read_json("[Link]")

print(df)
21. Summary of JSON Files
 JSON stands for JavaScript Object Notation

 Stores data in key–value format

 Uses { } for objects and [ ] for arrays

 Supports nested structures

 Widely used in web development, APIs, mobile apps, and cloud
platforms

 Python provides json.load(), json.loads(), json.dump(), json.dumps()

 Human-readable and easy to use

 More structured than CSV and simpler than XML

1. Introduction to XML
XML stands for Extensible Markup Language.
It is a structured, text-based format used to store, organize, and transport
data in a hierarchical way.

XML was developed by the World Wide Web Consortium (W3C) and is
widely used for:

 Data storage

 Data transfer

 Configuration files

 Web services

 Document systems

 SOAP APIs

 Mobile and web applications

XML is a self-descriptive format because the data explains itself using
tags.

Example:

<student>
<name>John</name>

<age>21</age>

</student>

XML is more complex than JSON and CSV, but much more powerful for
storing structured and semi-structured data.

2. Why XML Was Created


Before XML, data formats were:

 Incompatible

 Hard to read

 Not structured

 Not suitable for the internet

XML was created to:

✔ Store structured data

✔ Make data machine-readable

✔ Make data self-descriptive

✔ Enable data exchange between different systems

✔ Create custom markup tags

3. Structure of an XML Document


XML follows a strict structure.

3.1 XML Declaration (optional)

<?xml version="1.0" encoding="UTF-8"?>

3.2 Root Element


Every XML file must have one and only one root.

<library>

...

</library>

3.3 Child Elements

Elements inside the root.

<library>

<book>...</book>

</library>

3.4 Text Content

Inside elements.

<title>Harry Potter</title>

4. XML Syntax Rules


XML follows important rules:

Rule 1: Every tag must have a closing tag

✔ Correct:

<name>John</name>

❌ Incorrect:

<name>John

Rule 2: Tags are case-sensitive

<Student> ≠ <student>

Rule 3: Attribute values must be in quotes

<book id="101">
Rule 4: One root element only

❌ Invalid XML:

<student></student>

<teacher></teacher>

✔ Valid XML:

<school>

<student></student>

<teacher></teacher>

</school>

Rule 5: Elements must be properly nested

Wrong:

<b><i>Text</b></i>

Correct:

<b><i>Text</i></b>
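
The rules above can be checked with Python's standard-library parser, which rejects any document that is not well-formed. The following is a minimal sketch; the helper name is_well_formed and the sample strings are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first and last are well-formed
samples = {
    "closed tag": "<name>John</name>",
    "missing closing tag": "<name>John",
    "case mismatch": "<Student></student>",
    "unquoted attribute": "<book id=101></book>",
    "two roots": "<student></student><teacher></teacher>",
    "bad nesting": "<b><i>Text</b></i>",
    "good nesting": "<b><i>Text</i></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "rejected")
```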

5. Components of XML
XML consists of several components:

5.1 Elements

An element is everything from its opening tag to its closing tag, inclusive:

<city>Delhi</city>

Elements can contain:

 Text
 Other elements

 Attributes

5.2 Attributes

Attributes describe properties of elements.

<book id="101" category="fiction">

Attributes store metadata, not major content.

5.3 Comments

<!-- This is a comment -->

5.4 Empty Elements

<br />

<img src="[Link]" />

5.5 CDATA Section

A CDATA section, placed inside an element, holds text that the parser treats
as literal characters rather than markup, so < and & need no escaping:

<![CDATA[

<note>5 < 10</note>

]]>
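
As a small sketch of how this behaves in Python, the snippet below wraps a CDATA section in a note element (the wrapper element and its text are made up for the example) and reads it with ElementTree. The parser hands back the raw text and creates no child elements.

```python
import xml.etree.ElementTree as ET

# The CDATA section sits inside an element; its content is not parsed as markup
xml_text = "<note><![CDATA[5 < 10 & <b>not a tag</b>]]></note>"

note = ET.fromstring(xml_text)
cdata_text = note.text        # literal text, including < and &
child_count = len(note)       # number of child elements

# When written back out, ElementTree escapes the special characters instead
serialized = ET.tostring(note, encoding="unicode")

print(cdata_text)
print(child_count)
print(serialized)
```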

6. XML Example: Realistic Structure


<company>

<employee id="101">

<name>Alice</name>

<age>28</age>

<skills>
<skill>Python</skill>

<skill>SQL</skill>

</skills>

</employee>

<employee id="102">

<name>Bob</name>

<age>32</age>

<skills>

<skill>Java</skill>

</skills>

</employee>

</company>

This shows:

 Attributes

 Nested elements

 Lists

 Multiple records
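
To see how these features look from Python, here is a minimal sketch that parses the company document above with ElementTree and turns each employee into a dictionary. The XML is embedded as a string so the example runs without a file; the dictionary layout is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """
<company>
  <employee id="101">
    <name>Alice</name>
    <age>28</age>
    <skills><skill>Python</skill><skill>SQL</skill></skills>
  </employee>
  <employee id="102">
    <name>Bob</name>
    <age>32</age>
    <skills><skill>Java</skill></skills>
  </employee>
</company>
""".strip()

root = ET.fromstring(COMPANY_XML)

employees = []
for emp in root.findall("employee"):          # multiple records
    employees.append({
        "id": emp.get("id"),                  # attribute
        "name": emp.findtext("name"),         # nested element
        "age": int(emp.findtext("age")),      # XML text is a string; convert it
        "skills": [s.text for s in emp.findall("skills/skill")],  # list
    })

for e in employees:
    print(e)
```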

7. XML vs HTML
XML                           HTML

Stores and transports data    Displays data

Tags defined by user          Tags predefined

Case-sensitive                Not case-sensitive

Strict structure              Flexible structure

No predefined tags            Predefined <div>, <h1>, etc.

Data-oriented                 Presentation-oriented

8. XML vs JSON
Feature              XML                            JSON

Syntax               Tag-based                      Key-value

Data type support    Weak                           Strong

Nested structure     Excellent                      Excellent

Human readability    Moderate                       Very good

Used in              SOAP, configs, govt systems    APIs, web, mobile

File size            Larger                         Smaller

Speed                Slower                         Faster

JSON has replaced XML in most APIs because it is:

 Lighter

 Easier to read

 Faster

 Native to JavaScript

But XML is still used in many enterprise systems.
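
Two of the differences in the comparison, size and data type support, can be observed with a short sketch. The record and the employee tag name are invented for illustration, and the exact sizes depend on the data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for this comparison
record = {"name": "Alice", "age": 28}

# JSON form
json_text = json.dumps(record)

# Equivalent XML form: one child element per key
employee = ET.Element("employee")
for key, value in record.items():
    ET.SubElement(employee, key).text = str(value)
xml_text = ET.tostring(employee, encoding="unicode")

# JSON keeps the number as a number; XML text always comes back as a string
json_age = json.loads(json_text)["age"]
xml_age = ET.fromstring(xml_text).findtext("age")

print(len(json_text), len(xml_text))
print(type(json_age).__name__, type(xml_age).__name__)
```

For this record the XML form is longer because every value needs both an opening and a closing tag.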

9. XML Schemas (Structure Definitions)


XML can be validated using two schema systems:

9.1 DTD (Document Type Definition)


Defines structure:

<!DOCTYPE note [

<!ELEMENT note (to,from,body)>

<!ELEMENT to (#PCDATA)>

<!ELEMENT from (#PCDATA)>

<!ELEMENT body (#PCDATA)>

]>

DTD limitations:

 Old

 Not very strict

 Limited datatypes
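
Python's standard library parses XML but does not validate it against a DTD or XSD; a third-party library is normally used for that. The sketch below is therefore not DTD validation. It is a hand-written check of the same rule as the DTD above (a note containing to, from, and body, in that order, each holding only text), with a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def conforms_to_note(xml_text):
    """Hand-written stand-in for the rule <!ELEMENT note (to,from,body)>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False                       # not even well-formed
    child_tags = [child.tag for child in root]
    only_text = all(len(child) == 0 for child in root)  # no nested elements
    return root.tag == "note" and child_tags == ["to", "from", "body"] and only_text

valid = conforms_to_note("<note><to>A</to><from>B</from><body>Hi</body></note>")
missing_body = conforms_to_note("<note><to>A</to><from>B</from></note>")
wrong_order = conforms_to_note("<note><from>B</from><to>A</to><body>Hi</body></note>")

print(valid, missing_body, wrong_order)
```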

9.2 XML Schema (XSD)

Modern and powerful.

<xs:element name="age" type="xs:integer"/>

XSD supports:

 Datatypes

 Namespaces

 Strict validation

 Reusability

Used in big organizations and government systems.

10. XML Namespaces


Used when combining XML documents to avoid tag conflicts.

Example:

<book xmlns:edu="[Link]">

<edu:title>Data Structures</edu:title>
</book>

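A namespace is identified by a URI and bound to a prefix (here edu), so that
tags with the same name from different vocabularies can be told apart.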
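
A minimal sketch of reading namespaced tags with ElementTree follows. The namespace URI http://example.com/edu is a made-up placeholder, and the second, unprefixed title element is added only to show that the two tags do not collide.

```python
import xml.etree.ElementTree as ET

EDU_URI = "http://example.com/edu"   # hypothetical namespace URI

xml_text = (
    f'<book xmlns:edu="{EDU_URI}">'
    "<edu:title>Data Structures</edu:title>"
    "<title>Plain title</title>"
    "</book>"
)

root = ET.fromstring(xml_text)

# Map the prefix used in the search path to the namespace URI
ns = {"edu": EDU_URI}
edu_title = root.find("edu:title", ns)
plain_title = root.find("title")

# ElementTree stores the full name as {URI}localname
print(edu_title.tag, "->", edu_title.text)
print(plain_title.tag, "->", plain_title.text)
```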

11. Parsing XML in Python


Python supports XML through:

 xml.etree.ElementTree (standard library)

 xml.dom.minidom (standard library)

 lxml (advanced)

 BeautifulSoup (from your book)

11.1 Reading XML Using ElementTree

import xml.etree.ElementTree as ET

# Parse the file and get the root element of the tree
tree = ET.parse("[Link]")

root = tree.getroot()

print(root.tag)

11.2 Accessing Child Elements

# Iterating over an element yields its direct children
for child in root:

    print(child.tag, child.attrib)

11.3 Find Specific Element

# find() returns the first matching direct child, or None if there is none
name = root.find("name").text

11.4 Find All Elements

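# findall("skill") matches direct children of root only;
# use ".//skill" to search at every level
skills = root.findall("skill")
for s in skills:

    print(s.text)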
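
The search methods differ in how deep they look, which is easy to miss. The sketch below uses a small inline document (invented for the example) to compare a direct-child search with a search of the whole subtree, and shows what find() returns when nothing matches.

```python
import xml.etree.ElementTree as ET

xml_text = (
    "<employee>"
    "<name>Alice</name>"
    "<skills><skill>Python</skill><skill>SQL</skill></skills>"
    "</employee>"
)
root = ET.fromstring(xml_text)

# A plain tag name only matches direct children of root
direct = root.findall("skill")

# ".//skill" and iter() both search the whole subtree
nested = [s.text for s in root.findall(".//skill")]
via_iter = [s.text for s in root.iter("skill")]

# findtext() returns the text of the first match; find() returns None if absent
name = root.findtext("name")
missing = root.find("salary")

print(direct, nested, via_iter, name, missing)
```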

12. Writing XML in Python


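import xml.etree.ElementTree as ET

# Create the root element, then attach children to it
root = ET.Element("student")

name = ET.SubElement(root, "name")

name.text = "Alice"

age = ET.SubElement(root, "age")

age.text = "21"

# Wrap the root in a tree so it can be written to a file
tree = ET.ElementTree(root)

tree.write("[Link]")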

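The resulting file contains the following elements (ElementTree writes them on
a single line; the indentation here is for readability):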

<student>

<name>Alice</name>

<age>21</age>

</student>
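
The difference between the compact output and an indented one can be seen without writing a file. This is a minimal sketch using ET.tostring; ET.indent is available from Python 3.9 onwards, so the example assumes that version or later.

```python
import xml.etree.ElementTree as ET

root = ET.Element("student")
ET.SubElement(root, "name").text = "Alice"
ET.SubElement(root, "age").text = "21"

# Default serialization: everything on one line
compact = ET.tostring(root, encoding="unicode")

# ET.indent (Python 3.9+) inserts line breaks and two-space indentation in place
ET.indent(root)
pretty = ET.tostring(root, encoding="unicode")

print(compact)
print(pretty)
```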

13. Real-World Applications of XML


1. Government Systems

 Aadhaar data

 Land records
 Census documents

2. Banking

 SWIFT messaging

 Financial statements

 Payment systems

3. Web Services

 SOAP APIs

 Enterprise services

4. Office File Formats

Microsoft uses XML inside:

 .docx

 .xlsx

 .pptx

5. Android Development

 Android uses XML for layout designs

 Configuration files

6. Configuration & Settings

Many tools store settings in XML:

 Maven

 Tomcat

 Spring Framework

14. Advantages of XML


✔ Highly structured

✔ Supports nested data

✔ Can validate data using XSD

✔ Platform-independent
✔ Extensible (custom tags allowed)

✔ Good for complex documents

✔ Supports metadata via attributes

✔ Wide industry adoption

15. Limitations of XML


❌ Verbose (large file size)

Tags increase file size.

❌ Slower than JSON

Because of complex structure.

❌ Harder to read

Nested tags can be confusing.

❌ More complex parsing

Requires specific parsers.

❌ Not ideal for simple data

CSV/JSON preferred for simple structures.

16. XML Errors and Exception Handling


Common errors:

1. ParseError

Missing tags or invalid nesting.

2. FileNotFoundError

XML file not found.

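3. ValueError

Converting element text to another type fails, for example int() on
non-numeric text. ElementTree itself does not perform XSD validation; that
requires a third-party library such as lxml.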

4. AttributeError

Accessing .text on a missing tag: find() returns None when no element matches.

Error handling example:

try:

    tree = ET.parse("[Link]")

except ET.ParseError:

    print("Invalid XML")
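
A slightly fuller sketch covers three of the errors listed above in one place. The helper read_name and the file names are hypothetical; temporary files are created so the example is self-contained.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def read_name(path):
    """Return the text of <name>, or a short message describing the problem."""
    try:
        root = ET.parse(path).getroot()
    except FileNotFoundError:
        return "file not found"
    except ET.ParseError as err:
        return f"invalid XML: {err}"
    name = root.find("name")
    if name is None:                 # avoids AttributeError on name.text
        return "no <name> element"
    return name.text

with tempfile.TemporaryDirectory() as tmp:
    files = {
        "good": "<student><name>Alice</name></student>",
        "broken": "<student><name>Alice</student>",
        "no_name": "<student><age>21</age></student>",
    }
    results = {}
    for label, content in files.items():
        path = os.path.join(tmp, label + ".xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        results[label] = read_name(path)
    results["missing"] = read_name(os.path.join(tmp, "missing.xml"))

for label, outcome in results.items():
    print(label, "->", outcome)
```

Checking for None before using .text is usually simpler than catching AttributeError afterwards.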

17. XML in Data Science


Though JSON and CSV dominate, XML still appears in:

 Government-released datasets

 Scientific publications

 Metadata files

 Medical reports

 Legal documents

Pandas can read XML (pandas 1.3 or later; the default parser requires the lxml package):

import pandas as pd

df = pd.read_xml("[Link]")

18. Summary of XML Files


 XML stands for Extensible Markup Language

 Used for storing and transporting data

 Based on tags and hierarchy

 Supports metadata using attributes

 More powerful but more complex than JSON


 Suitable for large enterprise and government systems

 Python supports XML with ElementTree

 Can be validated using DTD or XSD

 Still widely used in banking, Android, web services, and document formats
