Advanced Programming in Python Lecture 7

Lecture 7 covers file handling in Python, emphasizing the importance of text files for data storage, configuration, and communication between programs. It details the file handling workflow, various file modes, and methods for reading and writing text files, including error handling and memory-efficient techniques. Additionally, it discusses practical applications of text files and introduces regular expressions for pattern searching.

Uploaded by

hafizsameer13092003

LECTURE 7

🔶 1. Introduction to File Handling in Python

File handling is one of the most essential operations in any programming
language. Applications often need to store data permanently, read previously
stored information, update files, or exchange data with other programs.
Python offers an extremely consistent and simple approach for working with
files of various formats.

In programming, memory (RAM) is temporary storage. Once the program ends or
the computer shuts down, all data stored in memory is lost. For long-term
storage, we use files, which are stored in secondary storage devices such as
hard drives, SSDs, or cloud storage.

Python provides a powerful file-handling interface that supports plain text
files, structured data files, tabular files, and nested hierarchical formats.
These formats include:

 TXT (Plain Text)

 CSV (Comma-Separated Values)

 JSON (JavaScript Object Notation)

 XML (Extensible Markup Language)

Each format has its own advantages and is used for different types of
applications.

🟦 1. Introduction to Text Files

A text file (TXT file) is the simplest and most commonly used file format in
computing.
It contains plain text—without formatting, colors, tables, images, or special
structures.
These files store data as readable characters, organized sequentially.
Text files are the foundation of many computer operations, such as:

 configuration settings

 log files

 program output

 documentation

 temporary storage

 communication between programs

In Python, text file handling is extremely simple yet highly powerful due to
Python’s built-in file interface.
Unlike binary files, TXT files are easy to write, read, debug, and inspect using
any editor (Notepad, VS Code, Sublime, etc.).

🟦 2. Why Text Files Are Important in Programming

In real-world projects, TXT files are used for purposes such as:

2.1 Log Files

Programs often need to record:

 events

 errors

 exceptions

 user actions

 timestamps

Log files help developers debug issues by keeping a running history of what
happened during execution.

2.2 Configuration Storage

Some applications store settings such as:

 usernames

 passwords (hashed)

 API keys
 timeout values

 default preferences

These can be stored in .txt, .ini, or .cfg files.

2.3 Temporary Data Storage

During long processes, or before writing to a database, a program may
temporarily store intermediate results in a text file.

2.4 Data Transfer Between Programs

Two programs may share information using a simple text file rather than
complex communication protocols.

2.5 File-Based Databases

Smaller programs sometimes use text files as small databases (key-value
storage, lists, counters, etc.).

🟦 3. Understanding the File Handling Workflow

When working with text files, Python follows a clear workflow:

Step 1: Open the File

Using the open() function with the appropriate mode (read, write, append).

Step 2: Perform Operation

Examples:

 writing lines

 reading contents

 searching text

 updating file

Step 3: Close the File

So that Python can:

 free system resources

 complete pending disk writes

 avoid file corruption

Using a with statement closes the file automatically, even if an error occurs
inside the block.
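The three steps above can be sketched as follows. This is only a minimal illustration: the file name workflow_demo.txt and the temporary directory are assumptions made for the example, not part of the lecture. It contrasts closing the file by hand with letting a with statement do it.

```python
import os
import tempfile

# Hypothetical demo path inside a temporary directory (an assumption for illustration).
demo_path = os.path.join(tempfile.mkdtemp(), "workflow_demo.txt")

# Manual workflow: open, operate, close. try/finally guarantees the close.
f = open(demo_path, "w", encoding="utf-8")
try:
    f.write("first line\n")
finally:
    f.close()

# Same workflow with a with statement: the file is closed when the block ends.
with open(demo_path, "r", encoding="utf-8") as g:
    text = g.read()

print(f.closed, g.closed, repr(text))
```

Both file objects report closed afterwards; the with form simply needs less code to get there.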

🟦 4. File Open Modes in Python (Detailed Explanation)

Understanding modes is the most important part of working with files.

Mode   Meaning          Explanation
"r"    Read             Opens existing file only; error if file doesn't exist
"w"    Write            Creates new file or overwrites existing file
"a"    Append           Adds data at the end; does not delete previous content
"r+"   Read + Write     No overwrite; file must exist
"w+"   Write + Read     Overwrites file, creates if missing
"a+"   Append + Read    Reads + appends; file created if missing
"x"    Create           Creates new file; error if already exists
"t"    Text Mode        Default mode for reading text
"b"    Binary Mode      Used for images, PDFs, videos

Examples of Combined Modes

 "rt" → read text (default)

 "wb" → write binary

 "ab+" → append + read + binary

These modes allow precise control over how Python interacts with files.
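A short sketch of how three of these modes behave in practice. The file name modes_demo.txt and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical demo file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "x") as f:      # "x": create, fail if the file already exists
    f.write("one\n")

try:
    open(path, "x")             # second attempt on the same path
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

with open(path, "a") as f:      # "a": keep old content, add at the end
    f.write("two\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:      # "w": empty the file first, then write
    f.write("three\n")

with open(path, "r") as f:
    after_write = f.read()

print(second_create_failed, repr(after_append), repr(after_write))
```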

🟦 5. Opening Text Files

The simplest method:

file = open("[Link]", "r")
content = file.read()
file.close()

Why this is not recommended:
If the program crashes before file.close() executes, the file remains open,
causing:

 resource leak (the file handle is never released)

 file lock

 incomplete writes

✔ Recommended Method — Using with

with open("[Link]", "r") as file:
    content = file.read()

The with statement ensures:

 automatic closing

 reduced bugs

 cleaner syntax

🟦 6. Writing to Text Files (Deep Explanation)

6.1 Overwriting a File (w mode)

with open("[Link]", "w") as f:
    f.write("Welcome to Python file handling.\n")
    f.write("This will overwrite existing content.")

If the file exists → entire content is replaced

If it doesn’t → new file is created

6.2 Appending to a File (a mode)

Append mode is used for:

 logs

 histories

 reports

 incremental updates

with open("[Link]", "a") as f:
    f.write("\nAppending a new line at the end.")

Append mode NEVER deletes old content.

🟦 7. Reading Text Files (Very In-Depth)

Python provides multiple strategies for reading text files, each suited for
different scenarios.

7.1 Read Entire File

Used for:

 small files

 configuration reading

 simple operations

with open("[Link]", "r") as f:
    content = f.read()
    print(content)

7.2 Reading Line-by-Line

Useful when:

 file is large

 processing is done per line

 memory optimization is needed

with open("[Link]", "r") as f:
    for line in f:
        print(line.strip())

.strip() removes, from both ends of the line:

 newline characters

 extra spaces and tabs

7.3 readline() Method

Reads the next line each time it's called:

with open("[Link]", "r") as f:
    first = f.readline()
    second = f.readline()

7.4 readlines() Method

Returns list of ALL lines:

with open("[Link]", "r") as f:
    lines = f.readlines()

for line in lines:
    print(line.strip())

🟦 8. Writing Lists and Multiple Lines to Text Files

Example 1: Writing a List to a File

fruits = ["apple", "banana", "mango"]

with open("[Link]", "w") as f:
    for fruit in fruits:
        f.write(fruit + "\n")

Example 2: Using writelines()

lines = ["Python\n", "Java\n", "C++\n"]

with open("[Link]", "w") as f:
    f.writelines(lines)

🟦 9. Understanding File Encoding (Very Important Topic)

Every text file is stored using a particular encoding, which determines how
characters are translated into bytes.
Python supports many encodings, but the most important ones are:

9.1 UTF-8 (Recommended)

 Supports all languages (English, Urdu, Chinese, Arabic, etc.)

 Most modern systems use UTF-8 by default

 Lightweight compared to older Unicode formats

Example: Writing a UTF-8 File

with open("[Link]", "w", encoding="utf-8") as f:
    f.write("Hello, this is a UTF-8 encoded file.")

Example: Reading a UTF-8 File

with open("[Link]", "r", encoding="utf-8") as f:
    data = f.read()

9.2 ASCII (Old Standard)

ASCII supports only:

 English letters

 numbers

 basic punctuation

If you try to write Urdu, Chinese, emojis, etc. to a file opened with
encoding="ascii" → UnicodeEncodeError.

9.3 Why Encoding Matters

Because files may contain:

 non-English characters

 emoji

 accented words

 special symbols

If wrong encoding is used:

 Python throws UnicodeDecodeError

 File may not open or may show garbage text
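The effect of a wrong encoding can be seen in a small experiment. This is a sketch under assumptions: the file name encoding_demo.txt and the sample text are made up for the example.

```python
import os
import tempfile

# Hypothetical demo file and sample text containing non-ASCII characters.
path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")
original = "café ☕"

with open(path, "w", encoding="utf-8") as f:
    f.write(original)

# Reading with the matching encoding returns the original text.
with open(path, "r", encoding="utf-8") as f:
    same = f.read()

# Reading as ASCII fails, because the file holds non-ASCII bytes.
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Reading as Latin-1 raises no error, but the text comes out garbled.
with open(path, "r", encoding="latin-1") as f:
    garbled = f.read()

print(same == original, decode_failed, garbled != original)
```

Note that the Latin-1 case is the more dangerous one: nothing fails, yet the data is silently wrong.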

🟦 10. File Pointer and Cursor Movement

When Python reads a file, it uses a file pointer (cursor) to track the current
position.

10.1 Checking Current Position — tell()

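with open("[Link]", "r") as f:
    print(f.tell())

Returns the current cursor position. In binary mode this is a byte offset;
in text mode it is a number that is only meaningful when passed back to seek().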

10.2 Moving Cursor — seek()

with open("[Link]", "r") as f:
    f.seek(5)  # Move to byte offset 5 (offsets start at 0)
    data = f.read()
    print(data)

seek() is used in:

 log analyzers

 file scanners

 partial reading

 skipping headers

 reading from middle of file
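A small sketch of skipping a header with seek() and checking positions with tell(). It uses binary mode so that positions are plain byte offsets; the file name seek_demo.txt and its content are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical demo file: a 7-byte header followed by a payload.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "wb") as f:
    f.write(b"HEADER|payload")

with open(path, "rb") as f:   # binary mode: positions are byte offsets
    start = f.tell()          # position at the beginning of the file
    f.seek(7)                 # skip the "HEADER|" prefix
    payload = f.read()        # read everything after the header
    end = f.tell()            # position after reading to the end
    f.seek(0)                 # rewind to the start
    first = f.read(6)         # read the first six bytes again

print(start, payload, end, first)
```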

🟦 11. Searching Inside Text Files (Very Practical)
Often we need to find:

 a word

 a line

 an entry

 a keyword

11.1 Search for a Word in File

with open("[Link]", "r") as f:
    for line in f:
        if "error" in line.lower():
            print("Found:", line.strip())

Useful for:

 log files

 report scanning

 keyword extraction

11.2 Find Line Numbers

with open("[Link]", "r") as f:
    for num, line in enumerate(f, start=1):
        if "python" in line.lower():
            print("Line", num, ":", line.strip())

🟦 12. Updating Text Files (The Right Way)

Text files cannot be edited in the middle directly.
You cannot “insert” text in the middle without rewriting.

Therefore, updating is done by:

✔ Reading Entire File

✔ Modifying in Memory

✔ Rewriting File Completely

12.1 Example: Replacing Text

with open("[Link]", "r") as f:
    content = f.read()

content = content.replace("oldword", "newword")

with open("[Link]", "w") as f:
    f.write(content)

12.2 Example: Removing a Line

with open("[Link]", "r") as f:
    lines = f.readlines()

with open("[Link]", "w") as f:
    for line in lines:
        if "remove this" not in line:
            f.write(line)
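One risk of the read-modify-rewrite approach is that a crash during the rewrite leaves a half-written file. A common safeguard, sketched here as an assumption rather than as the lecture's method, is to write to a temporary file first and then swap it in with os.replace. The helper name replace_in_file and the demo file notes.txt are hypothetical.

```python
import os
import tempfile

def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite `path` with `old` replaced by `new`, going through a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as dst, \
             open(path, "r", encoding=encoding) as src:
            for line in src:
                dst.write(line.replace(old, new))
        os.replace(tmp_path, path)   # swap in the finished file
    except BaseException:
        # On any failure, remove the temporary file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

# Usage with a hypothetical demo file.
demo = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(demo, "w", encoding="utf-8") as f:
    f.write("oldword here\nkeep this\n")

replace_in_file(demo, "oldword", "newword")

with open(demo, encoding="utf-8") as f:
    result = f.read()
print(repr(result))
```

Because the file is processed line by line, this variant also avoids loading the whole file into memory.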

🟦 13. Writing Structured Data into Text Files

13.1 Key-Value Format (Simple Database)

with open("[Link]", "w") as f:
    f.write("name=Alice\n")
    f.write("age=24\n")
    f.write("city=London\n")

This forms the basis of:

 .env files

 .ini config files

 custom settings

13.2 Multi-Column Text Data

with open("[Link]", "w") as f:
    f.write("Name Marks Grade\n")
    f.write("Alice 88 A\n")
    f.write("James 76 B\n")

🟦 14. Using Context Managers (Deep Theory)

A context manager ensures a file closes automatically.

Python does:

1. Setup file opening

2. Execute block

3. Close file even if an error happens

14.1 Custom Context Manager Example

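class FileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        # Runs at the start of the with block; its return value is bound by "as".
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, whether or not an exception occurred.
        self.file.close()

with FileManager("[Link]", "w") as f:
    f.write("Hello")

This mirrors what with open() does: the file object's own __enter__ and
__exit__ methods play the same roles.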
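The same idea can be written more compactly with the standard library's contextlib.contextmanager decorator. This is an alternative sketch, not the lecture's own code; the function name managed_file and the file cm_demo.txt are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def managed_file(filename, mode):
    f = open(filename, mode)    # setup: runs when the with block starts
    try:
        yield f                 # the with block executes here
    finally:
        f.close()               # cleanup: runs even if the block raised

path = os.path.join(tempfile.mkdtemp(), "cm_demo.txt")

with managed_file(path, "w") as handle:
    handle.write("Hello")

# The file is closed even when the block fails.
try:
    with managed_file(path, "r") as reader:
        raise ValueError("simulated failure")
except ValueError:
    pass

print(handle.closed, reader.closed)
```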

🟦 15. Handling File-Related Errors (Very Important)

Real-world programs must handle file issues safely.

15.1 Common Errors

Error                 Meaning
FileNotFoundError     File does not exist
PermissionError       No permission to access
UnicodeDecodeError    Wrong encoding
IOError               Input/output failure
IsADirectoryError     Attempted to open a directory
ValueError            Invalid mode

15.2 Handling Errors with Try-Except

try:
    with open("[Link]", "r") as f:
        print(f.read())
except FileNotFoundError:
    print("The file does not exist.")
except PermissionError:
    print("You do not have permissions.")
except Exception as e:
    print("Unexpected error:", e)

🟦 16. File Existence Check

Before reading:

import os

if os.path.exists("[Link]"):
    print("File exists")
else:
    print("File not found")

🟦 17. Deleting a File

import os

if os.path.exists("[Link]"):
    os.remove("[Link]")

🟦 18. Practical Real-Life Uses of TXT Files

18.1 Log Tracking

Applications store:

 crashes

 errors

 timestamps
 user actions

18.2 Storing User Input

CLI applications usually store:

 usernames

 feedback

 questionnaire responses

18.3 Local Database for Small Programs

Notepad-like data storage for:

 todo-lists

 shopping lists

 reminders

18.4 Exporting Program Output

Science and math scripts write results to .txt for:

 graphs

 calculations

 measurements

🟦 19. Searching Patterns in Text Files Using Regular Expressions (Regex)

Regex (Regular Expressions) allow you to search for complex patterns inside
text files.
Python provides this through the re module.

Common Use Cases:

 finding email addresses

 searching dates

 finding errors in logs

 detecting phone numbers

 validating patterns

 extracting data

⭐ 19.1 Example: Finding All Email Addresses

import re

with open("[Link]", "r") as f:
    text = f.read()

emails = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", text)
print(emails)

This returns a list of all email-like patterns in the file.

⭐ 19.2 Example: Find Lines Containing a Date

import re

pattern = r"\d{2}/\d{2}/\d{4}"

with open("[Link]", "r") as f:
    for line in f:
        if re.search(pattern, line):
            print("Date found:", line.strip())

Regex makes file scanning incredibly powerful.

⭐ 19.3 Example: Finding Only Numbers

import re

with open("[Link]") as f:  # with ensures the file is closed after reading
    nums = re.findall(r"\d+", f.read())

print(nums)

🟦 20. Memory-Efficient File Reading for Huge Files (100MB, 1GB, etc.)

If a file is huge, reading it with .read() or .readlines() loads the whole
file into memory at once, which can make your program slow or exhaust the
available memory.

So Python provides memory-efficient methods.

⭐ 20.1 Reading File Line-by-Line (Streaming)

Best method for large files:

with open("[Link]", "r") as f:
    for line in f:
        process(line)  # process() stands for your own function

This loads only one line at a time.

⭐ 20.2 Reading in Chunks

with open("[Link]", "r") as f:
    chunk = f.read(1024)  # up to 1024 characters (about 1 KB of ASCII text)
    while chunk:
        process(chunk)  # process() stands for your own function
        chunk = f.read(1024)

Useful when dealing with:

 very large documents

 massive log files

 server data dumps
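The chunked loop above can be wrapped in a generator so that the calling code stays a simple for loop. A minimal sketch: the helper name read_in_chunks is hypothetical, and io.StringIO stands in for a real file so the example runs without any file on disk.

```python
import io

def read_in_chunks(file_obj, chunk_size=1024):
    """Yield successive pieces of at most chunk_size characters."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:          # empty string means end of file
            break
        yield chunk

# An in-memory text stream of 2500 characters stands in for a large file.
source = io.StringIO("a" * 2500)
sizes = [len(chunk) for chunk in read_in_chunks(source)]
print(sizes)
```

Only one chunk is held in memory at a time, no matter how large the underlying file is.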

⭐ 20.3 Processing First N Lines

from itertools import islice

with open("[Link]") as f:
    for line in islice(f, 100):  # first 100 lines (fewer if the file is shorter)
        print(line.strip())

🟦 21. Creating a Text-Based Mini Database (Real Technique)

Many small programs use text files as simple databases.

⭐ 21.1 Example: Storing User Data

[Link]:

id:1, name:Alice, city:London

id:2, name:James, city:Sydney

id:3, name:Emma, city:Berlin

Python Code to Add New User

def add_user(uid, name, city):
    with open("[Link]", "a") as f:
        f.write(f"id:{uid}, name:{name}, city:{city}\n")

⭐ 21.2 Searching for a User

with open("[Link]", "r") as f:
    for line in f:
        if "Alice" in line:
            print("Found:", line.strip())

⭐ 21.3 Updating a User (Full Rewrite Technique)

with open("[Link]", "r") as f:
    lines = f.readlines()

with open("[Link]", "w") as f:
    for line in lines:
        # Match "id:2," exactly so that ids such as 20 or 21 are not replaced.
        if line.startswith("id:2,"):
            f.write("id:2, name:John Adams, city:Tokyo\n")
        else:
            f.write(line)
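Plain substring checks such as "id:2" in line also match ids like 21. One way around this, sketched here under assumptions, is to parse each record into a dictionary and compare fields exactly. The helper names parse_record and find_by_id are hypothetical, and the fourth sample record (id 21) is invented only to show the difference.

```python
def parse_record(line):
    """Turn 'id:1, name:Alice, city:London' into a dict of strings."""
    record = {}
    for part in line.strip().split(","):
        key, value = part.split(":", 1)
        record[key.strip()] = value.strip()
    return record

def find_by_id(lines, uid):
    """Return the first record whose id equals uid, or None."""
    for line in lines:
        if not line.strip():
            continue                      # skip blank lines
        record = parse_record(line)
        if record.get("id") == str(uid):
            return record
    return None

# Sample lines in the format shown above; the last one is a made-up extra record.
sample = [
    "id:1, name:Alice, city:London\n",
    "id:21, name:Noah, city:Oslo\n",
    "id:2, name:James, city:Sydney\n",
    "id:3, name:Emma, city:Berlin\n",
]

print(find_by_id(sample, 2))
```

A substring search for "id:2" would have stopped at the id 21 record; the parsed comparison does not.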

🟦 22. Text File Formatting Techniques (Advanced)

TXT files often need formatting:

22.1 Fixed-Width Columns

Name       Marks    Grade
Alice      88       A
Bob        75       B

Python Code:

with open("[Link]", "w") as f:
    f.write(f"{'Name':10} {'Marks':8} {'Grade':5}\n")
    f.write(f"{'Alice':10} {88:<8} {'A':5}\n")
    f.write(f"{'Bob':10} {75:<8} {'B':5}\n")

22.2 Indented / Structured Text

Useful for:

 logs

 documentation

 hierarchies

with open("[Link]", "w") as f:
    f.write("Project Structure:\n")
    f.write(" src/\n")
    f.write(" [Link]\n")
    f.write(" [Link]\n")

🟦 23. Converting Text Files into Other Formats (Practical Use)

23.1 TXT → CSV

with open("[Link]") as f, open("[Link]", "w") as out:
    for line in f:
        parts = line.strip().split()
        out.write(",".join(parts) + "\n")

23.2 TXT → JSON

import json

data = {}

with open("[Link]") as f:
    for line in f:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.strip().split("=", 1)
        data[key] = value

with open("[Link]", "w") as out:
    json.dump(data, out, indent=4)

🟦 24. Backup and Versioning of Text Files

Text files are often used for configuration.
Incorrect writing can break entire programs.

Best Practice: Create Backup Before Updating

import shutil

shutil.copy("[Link]", "config_backup.txt")

🟦 25. Advanced Text File Algorithms (Real Programming Logic)

25.1 Counting Lines

with open("[Link]") as f:
    count = sum(1 for _ in f)

print("Total lines:", count)

25.2 Counting Words

with open("[Link]") as f:
    text = f.read()

words = text.split()
print("Total Words:", len(words))

25.3 Counting Frequency of Each Word

from collections import Counter

with open("[Link]") as f:
    words = f.read().split()

freq = Counter(words)
print(freq)

🟦 26. Real-Life Projects Using TXT Files

Here are actual practical projects built using TXT files.

⭐ Project 1 — Password Manager (TXT Storage)

Stores usernames and hashed passwords.

with open("[Link]", "a") as f:
    f.write(username + ":" + password_hash + "\n")

⭐ Project 2 — Task Manager

def add_task(task):
    with open("[Link]", "a") as f:
        f.write(task + "\n")

⭐ Project 3 — Chat Logger

Stores chat history in plain text.

⭐ Project 4 — Simple Key-Value Database

username=admin

timeout=45

theme=dark

Python reads and loads it into a dictionary.
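Loading such a key-value file into a dictionary might look like the sketch below. The function name load_settings is hypothetical, and skipping blank lines and lines starting with "#" is an added assumption, not something the lecture specifies. A list of lines stands in for the open file.

```python
def load_settings(lines):
    """Build a dict from 'key=value' lines; all values stay strings."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # assumption: skip blanks and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue                      # no "=" on this line, ignore it
        settings[key.strip()] = value.strip()
    return settings

# The three lines from the example above.
settings = load_settings(["username=admin\n", "timeout=45\n", "theme=dark\n"])

# Values are text, so numbers must be converted explicitly.
timeout = int(settings["timeout"])
print(settings, timeout)
```

Any iterable of lines works here, so an open file object can be passed in directly.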

⭐ Project 5 — Log File Analyzer

Program reads millions of lines and extracts:

 errors

 warnings

 performance time
 login failures

🟦 27. Best Practices for Using Text Files

 Always use with open()

 Always specify encoding (utf-8)

 Never read huge files using .read()

 Create backups before overwriting

 Use exceptions to avoid crashes

 Keep file paths organized

 Use .strip() when reading lines

 Close files (or use context managers)

 Avoid storing secure passwords in plain text

 Use timestamped file names for logs

1. Introduction to CSV Files

CSV stands for Comma-Separated Values, a simple and widely used file
format for storing tabular data.
A CSV file organizes information in rows and columns, similar to tables in
spreadsheets or databases.

CSV is extremely popular because:

 It is simple to create and read

 It is compatible with almost every software program (Excel, Google Sheets,
Databases, Python, R, Java, etc.)

 It stores data in plain text, making it lightweight

 It is excellent for sharing data across different systems

A CSV file typically has the extension:

[Link]
CSV is one of the most universal data formats and is widely used in
industries such as:

 Data science

 Machine learning

 Banking and finance

 Government records

 Inventory systems

 HR management systems

 Education (grade sheets, attendance)

2. Structure of a CSV File

CSV files follow a simple structure:

✓ Each row = one record

✓ Each column = one field

✓ Comma ( , ) separates fields

Example:

Name, Age, Department

Alice, 25, Sales

Bob, 30, Marketing

Charlie, 28, HR

2.1 Table Representation

Name      Age   Department
Alice     25    Sales
Bob       30    Marketing
Charlie   28    HR

2.2 CSV File as Plain Text

A CSV is basically a text file, where:

 Row separator → \n (newline)

 Column separator → , (comma)

Title, Author, Pages\n

1984, George Orwell, 268\n

Jane Eyre, Charlotte Bronte, 532\n

3. Why CSV Is So Common

CSV is preferred worldwide because:

1. Human readable

The file is simple text.

2. Machine readable

Programming languages easily parse it.

3. Lightweight and fast

No styling, no formatting like Excel.

4. Cross-platform compatibility

Used in Windows, Linux, Mac, Android, Web.

5. Works with databases

CSV files can be imported into MySQL, SQL Server, Oracle, PostgreSQL, etc.

6. Easy to process in Python

Python provides both manual reading and the csv module.

4. How CSV Stores Data Internally

Even though CSV looks simple, it follows specific rules:

Rule 1: Comma separates fields

city,country,population

Rule 2: Newline separates records

row1\n

row2\n

row3\n

Rule 3: Special characters must be quoted

If a field contains comma:

"New York, USA", 21000000

Rule 4: Empty values are allowed

Name,Age

Alice,25

Bob,

Rule 5: CSV does NOT store datatype

Everything is text.

5. CSV Handling in Python

Python allows two main ways to work with CSV files:

Method 1: Manual Processing

1. Open file

2. Read lines

3. Strip newline

4. Split by comma

This helps students understand:

 How files work

 How text is parsed

 How lists are created

5.1 Reading CSV Manually

with open("[Link]") as file_obj:
    csv_rows = file_obj.readlines()

list_csv = []

for row in csv_rows:
    row = row.strip("\n")
    cells = row.split(",")
    list_csv.append(cells)

print(list_csv)

Output

[['Title','Author','Pages'],

['1984','George Orwell','268'],

['Jane Eyre','Charlotte Bronte','532']]

Explanation

 readlines() → loads all rows as list

 strip("\n") → removes newline

 split(",") → divides row into columns

 Final result → 2D list (list of records)

5.2 Writing CSV Manually

file = open("[Link]", 'w')
file.write("Name,Marks\n")
file.write("Alice,89\n")
file.write("Bob,76\n")
file.close()

Important Points

 write() does not add newline automatically

 You must add \n manually

 Always close file

6. Using Python’s Built-in csv Module

Python provides the csv library for easier processing.

6.1 Reading with csv.reader

import csv

with open("[Link]", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Output:

['Alice', '25', 'Sales']

['Bob', '30', 'Marketing']

6.2 Writing with csv.writer

import csv

with open("[Link]", 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Sam", 22])

6.3 Reading CSV as Dictionaries

import csv

with open("[Link]", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row["Name"], row["Age"])

Why DictReader is useful?

 Reads header automatically

 Output is dictionaries

 Easier to use in data science

7. Handling Special Cases in CSV

CSV sometimes includes:

✔ Quoted Fields

"New York, USA", 21000000

✔ Escape Characters

To include quotes inside text:

"John said ""Hello"""

✔ Missing Data

Alice,25

Bob,

Python handles these using:

 csv.reader with quoting options

 csv module dialects
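The three special cases can be checked with a round trip through the csv module. This is a small sketch using in-memory text (io.StringIO) instead of a real file; the sample rows are made up from the examples above.

```python
import csv
import io

rows = [
    ["City", "Population", "Note"],
    ["New York, USA", "21000000", 'John said "Hello"'],  # comma and quotes inside fields
    ["Bob", "", ""],                                      # missing data as empty strings
]

# Write the rows; csv.writer adds quotes only where they are needed.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
raw_text = buffer.getvalue()

# Read the text back; csv.reader undoes the quoting and escaping.
parsed = list(csv.reader(io.StringIO(raw_text, newline="")))

print(raw_text)
print(parsed == rows)
```

Splitting the same text with split(",") would break the "New York, USA" field in two, which is the main reason to prefer the csv module over manual parsing.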

8. Dialects in CSV
CSV files differ across countries:

Country       Separator
USA           comma (,)
Europe        semicolon (;)
Old systems   tab (\t)


Python supports dialects:

csv.register_dialect("semicolon", delimiter=';')
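Once registered, the dialect name can be passed to csv.reader. A minimal sketch with made-up sample data, where the comma is a decimal separator and therefore cannot also separate fields:

```python
import csv
import io

csv.register_dialect("semicolon", delimiter=";")

# Hypothetical semicolon-separated data with decimal commas in the prices.
text = "Name;Price\nTea;2,50\nCoffee;3,20\n"

rows = list(csv.reader(io.StringIO(text), dialect="semicolon"))
print(rows)
```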

9. Advantages of CSV
✔ Simple and lightweight

✔ Fast to read/write

✔ Compatible with all systems

✔ No extra software required

✔ Easy to debug

✔ Works well with Python, Excel, Sheets

10. Limitations of CSV

❌ No data types

Numbers, strings, dates all look same.

❌ No support for nested structure

Cannot store complex data like:

{"name": "Alice", "scores": [89, 90, 92]}

❌ No standard about missing values

Some use blank, some use NA, some use null.

❌ Difficult with commas inside fields

Requires quoting.

❌ Does not support styling

Unlike Excel.

11. CSV vs Excel

Feature           CSV         Excel
File type         Text        Binary
Formatting        ❌ No       ✔ Yes
Formula support   ❌          ✔
Speed             Fast        Slow
Compatibility     Excellent   Good
File size         Small       Large

12. Real-Life Applications of CSV

CSV is used in:

1. Data Science / Machine Learning

Datasets like [Link], [Link].

2. Banking

Transactions, account statements.

3. Marketing
Customer data, campaign results.

4. HR

Employee lists, attendance, payroll.

5. Healthcare

Patient reports, hospital records.

6. Government Records

Population census.

7. E-commerce

Product catalogs, orders.

13. Errors and Exception Handling in CSV

Common errors:

1. FileNotFoundError

File path incorrect.

try:
    file = open("[Link]")
except FileNotFoundError:
    print("File not found")

2. ValueError

Wrong data format.

3. UnicodeDecodeError

File encoding mismatch.

14. CSV in Data Science (Advanced Note)

CSV is the most used format in:

 Pandas
 NumPy

 Machine learning pipelines

Example with pandas:

import pandas as pd

df = pd.read_csv("[Link]")
print(df.head())

15. Summary of CSV Files

 CSV means Comma Separated Values

 Used for storing table-like data

 Very simple, text-based

 Easily processed manually or using Python’s csv module

 Ideal for data sharing

 Supported by almost all software

 Has limitations (no types, no nesting)

 Very common in data science and business applications

1. Introduction to JSON
JSON stands for JavaScript Object Notation, a lightweight and structured
data format used for storing and transmitting data.
Even though JSON originated from JavaScript, it is now language-
independent and used by almost every programming language including
Python, Java, C#, PHP, R, etc.

JSON is especially popular in:

 APIs (Application Programming Interfaces)

 Web applications

 Mobile applications

 Cloud computing
 Data science and machine learning

 Configuration files

JSON is easy for both humans and machines to read.

A JSON file has the extension:

[Link]

2. Why JSON Is So Popular

JSON is one of the most used data formats in the world because:

✔ Human-readable

Follows a clear key–value structure.

✔ Machine-readable

Almost every language has built-in JSON parsers.

✔ Supports nested data

Unlike CSV.

✔ Lightweight and fast

Less complex than XML.

✔ Used everywhere

APIs, servers, databases, configuration systems.

✔ Supported by web technologies

JavaScript handles JSON natively.

3. Structure of a JSON File

JSON contains data in pairs:

"key": "value"

The entire JSON dataset uses:

 Curly braces { } → for objects

 Square brackets [ ] → for arrays

3.1 Basic Example

{
    "name": "Alice",
    "age": 25,
    "department": "Sales"
}

This is a JSON object containing:

 name → string

 age → number

 department → string

3.2 Nested Structures

JSON supports nested lists and objects.

Example:

{
    "student": {
        "name": "John",
        "marks": [85, 90, 92],
        "address": {
            "city": "Delhi",
            "pincode": 110001
        }
    }
}

This cannot be stored easily in CSV, which is why JSON is preferred when
working with complex data.

4. JSON Data Types

JSON supports the following data types:

JSON Type   Example      Equivalent Python Type
String      "Hello"      str
Number      25.5         int / float
Boolean     true/false   True/False
Null        null         None
Object      {"a":1}      dict
Array       [1,2,3]      list

5. JSON vs Python Dictionary

JSON object looks almost identical to a Python dictionary.

JSON                      Python
null                      None
true                      True
false                     False
Uses double quotes only   Single or double quotes

6. JSON File Format Rules

Rule 1: Keys must be strings in double quotes

"age": 30

Rule 2: Values may be any JSON data type

Rule 3: Strings must use double quotes

NOT allowed:

'name': 'John'

Rule 4: No trailing comma

Wrong:

{"name": "Alex",}

Rule 5: Arrays must be enclosed in [ ] and contain comma-separated items
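These rules can be checked directly with Python's json module, which rejects text that breaks them. A small sketch; the helper name is_valid_json is hypothetical.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

checks = {
    "double quotes": is_valid_json('{"age": 30}'),           # follows Rule 1 and Rule 3
    "single quotes": is_valid_json("{'name': 'John'}"),      # breaks Rule 3
    "trailing comma": is_valid_json('{"name": "Alex",}'),    # breaks Rule 4
    "array": is_valid_json('[1, 2, 3]'),                     # follows Rule 5
}

for name, ok in checks.items():
    print(name, "->", "valid" if ok else "invalid")
```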

7. JSON in Python

Python provides a built-in module:

import json

This module supports:

 Reading JSON

 Writing JSON

 Conversion between JSON and Python objects

8. Reading JSON Files in Python

8.1 json.load(): Read from File

import json

with open("[Link]") as f:
    data = json.load(f)

print(data)

Output (Python dictionary)

{'name': 'Alice', 'age': 25}

9. Writing JSON Files in Python

json.dump(): Write to File

import json

employee = {
    "name": "John",
    "id": 101,
    "skills": ["Python", "SQL", "AI"]
}

with open("[Link]", 'w') as f:
    json.dump(employee, f, indent=4)

Explanation:

 indent=4 → beautifies the JSON

 Data converted automatically into JSON format

10. Parsing JSON Strings

Sometimes JSON arrives as a string (e.g., from an API).

json.loads(): Convert JSON string to Python

import json

data = '{"name": "Sara", "age": 21}'
parsed = json.loads(data)

print(parsed["name"])

11. Converting Python to JSON String

json.dumps()

import json

data = {"x": 10, "y": 20}
json_string = json.dumps(data)

print(json_string)

Output:

{"x": 10, "y": 20}

12. Working with Nested JSON

Example JSON:

{
    "company": "TechCorp",
    "employees": [
        {"name": "Maya", "age": 29},
        {"name": "David", "age": 34}
    ]
}

Accessing nested values (after loading the JSON into data):

data["employees"][0]["name"] # Maya

13. Pretty Printing JSON

Useful for debugging.

print(json.dumps(data, indent=4))

14. Validating JSON

Invalid JSON example:

{
    name: "Ravi",
    age: 30,
}

Errors:

 Missing quotes around the keys

 Trailing comma

Use a JSON validator or try loading:

try:
    json.loads(text)
except json.JSONDecodeError:
    print("Invalid JSON")

15. JSON vs CSV vs XML

Feature             JSON        CSV            XML
Data type support   ✔ Strong    ❌ Weak        ✔ Strong
Nested data         ✔ Yes       ❌ No          ✔ Yes
Speed               Fast        Fastest        Slower
Human readability   Excellent   Good           Average
API support         ✔ Best      ❌ Rare        ✔ Common
Storage             Medium      Small          Large
Structure           Key–value   Rows–Columns   Tag-based

JSON stands in the middle:

 More structured than CSV

 Less complex than XML

16. Real-World Applications of JSON

1. API Communication

Almost all modern APIs return JSON:

 Weather APIs

 Google Maps API

 Social media APIs (Twitter, Facebook)

2. Web Development

JavaScript directly parses JSON.

3. Mobile Apps

Android and iOS use JSON for data exchange.

4. Databases

MongoDB stores data in JSON-like structure.

5. Configuration Files

Many apps use:

[Link]
[Link]

6. Data Science

Dataset formats:

 [Link]

 model_config.json

17. Advantages of JSON

✔ Readable and simple

✔ Supports hierarchical data

✔ Lightweight

✔ Great for APIs

✔ Works perfectly with JavaScript

✔ Supported by Python's json module

✔ Cross-platform

✔ Ideal for web/mobile applications

18. Limitations of JSON

❌ No comments supported

JSON does not allow:

// This is a comment

❌ No date format

Only string representation.

❌ No built-in support for binary data

Needs Base64 encoding.

❌ Keys must be strings

Numbers or booleans cannot be used as keys.

❌ Larger file size than CSV

Because of braces and keys.
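Three of these limitations have common workarounds, sketched below under assumptions: dates are stored as ISO-format strings, binary data as Base64 text, and non-string keys are simply converted to strings by json.dumps. The sample values are invented for the example.

```python
import base64
import json
from datetime import date

record = {
    # No date type in JSON: store the date as an ISO-format string.
    "created": date(2024, 1, 31).isoformat(),
    # No binary type in JSON: store bytes as Base64 text.
    "blob": base64.b64encode(b"\x00\x01\x02").decode("ascii"),
}

text = json.dumps(record)
restored = json.loads(text)

# Convert the strings back to richer Python types after loading.
created = date.fromisoformat(restored["created"])
blob = base64.b64decode(restored["blob"])

# Integer keys are written as strings and stay strings when loaded back.
keys_become_strings = json.loads(json.dumps({1: "a"}))

print(created, blob, keys_become_strings)
```

The conversions are a convention between writer and reader; nothing in the JSON text itself marks a string as a date or as Base64.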

19. Typical Errors When Working with JSON

1. JSONDecodeError

Due to invalid JSON structure.

2. KeyError

Accessing missing key.

3. TypeError

For example, treating list like dictionary.

4. Unicode/Encoding Issues

Special characters require UTF-8.

20. JSON in Data Science

In ML projects, JSON is used for:

 Dataset labels

 Model architecture

 Hyperparameters

 Experiment results

 Configuration files

Example with pandas:

import pandas as pd

df = pd.read_json("[Link]")
print(df)

21. Summary of JSON Files
 JSON stands for JavaScript Object Notation

 Stores data in key–value format

 Uses { } for objects and [ ] for arrays

 Supports nested structures

 Widely used in web development, APIs, mobile apps, and cloud platforms

 Python provides json.load(), json.dump(), json.loads(), json.dumps()

 Human-readable and easy to use

 More structured than CSV and simpler than XML

1. Introduction to XML
XML stands for Extensible Markup Language.
It is a structured, text-based format used to store, organize, and transport
data in a hierarchical way.

XML was developed by the World Wide Web Consortium (W3C) and is
widely used for:

 Data storage

 Data transfer

 Configuration files

 Web services

 Document systems

 SOAP APIs

 Mobile and web applications

XML is a self-descriptive format because the data explains itself using

tags.

Example:

</student>

XML is more complex than JSON and CSV, but much more powerful for
storing structured and semi-structured data.

2. Why XML Was Created

Before XML, data formats were:

 Incompatible

 Hard to read

 Not structured

 Not suitable for the internet

XML was created to:

✔ Store structured data

✔ Make data machine-readable

✔ Make data self-descriptive

✔ Enable data exchange between different systems

✔ Create custom markup tags

3. Structure of an XML Document

XML follows a strict structure.

3.1 XML Declaration (optional)

<?xml version="1.0" encoding="UTF-8"?>

3.2 Root Element

Every XML file must have one and only one root.

<library>
    ...
</library>

3.3 Child Elements

Elements inside the root.

</library>

3.4 Text Content

Inside elements.

<title>Harry Potter</title>

4. XML Syntax Rules

XML follows important rules:

Rule 1: Every tag must have a closing tag

✔ Correct:

<name>John</name>

❌ Incorrect:

<name>John

Rule 2: Tags are case-sensitive

Rule 3: Attribute values must be in quotes

<book id="101">
Rule 4: One root element only

❌ Invalid XML:

✔ Valid XML:

</school>

Rule 5: Elements must be properly nested

Wrong:

Correct:

5. Components of XML
XML consists of several components:

5.1 Elements

An element is everything inside a pair of tags:

<city>Delhi</city>

Elements can contain:

 Text
 Other elements

 Attributes

5.2 Attributes

Attributes describe properties of elements.

<book id="101" category="fiction">

Attributes store metadata, not major content.

5.3 Comments

<!-- This is a comment -->

5.4 Empty Elements

<br />

<img src="[Link]" />

5.5 CDATA Section

CDATA is used to store text that should not be parsed:

<![CDATA[

<note>5 < 10</note>

]]>
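To the program reading the file, a CDATA section and escaped text look the same: both arrive as plain text. A small sketch, assuming the literal text "5 < 10" is stored directly inside a <note> element:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Two ways to carry the literal text "5 < 10" inside a <note> element.
with_cdata = ET.fromstring("<note><![CDATA[5 < 10]]></note>")
with_escape = ET.fromstring("<note>5 &lt; 10</note>")

print(with_cdata.text)    # 5 < 10
print(with_escape.text)   # 5 < 10

# escape() produces the entity form for arbitrary text.
print(escape("5 < 10"))   # 5 &lt; 10
```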

6. XML Example: Realistic Structure

<name>Alice</name>

<skills>
<skill>Python</skill>

</skills>

</employee>

</skills>

</employee>

</company>

This shows:

 Attributes

 Nested elements

 Lists

 Multiple records
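The four features listed above can be read back in Python as follows. This is a sketch on invented data: the ids, the second employee, and every skill other than Python are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical company document with attributes, nesting, lists and
# more than one record.
COMPANY = """
<company>
    <employee id="1">
        <name>Alice</name>
        <skills>
            <skill>Python</skill>
            <skill>SQL</skill>
        </skills>
    </employee>
    <employee id="2">
        <name>Bob</name>
        <skills>
            <skill>Java</skill>
        </skills>
    </employee>
</company>
""".strip()

root = ET.fromstring(COMPANY)

records = []
for emp in root.findall("employee"):                     # multiple records
    records.append({
        "id": emp.get("id"),                             # attribute
        "name": emp.findtext("name"),                    # nested element
        "skills": [s.text for s in emp.iter("skill")],   # list
    })

for rec in records:
    print(rec)
```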

7. XML vs HTML

XML                          HTML
Stores and transports data   Displays data
Tags defined by user         Tags predefined
Case-sensitive               Not case-sensitive
Strict structure             Flexible structure
No predefined tags           Predefined <div>, <h1>, etc.
Data-oriented                Presentation-oriented

8. XML vs JSON

Feature             XML                           JSON
Syntax              Tag-based                     Key-value
Data type support   Weak                          Strong
Nested structure    Excellent                     Excellent
Human readability   Moderate                      Very good
Used in             SOAP, configs, govt systems   APIs, web, mobile
File size           Larger                        Smaller
Speed               Slower                        Faster

JSON has replaced XML in most APIs because it is:

 Lighter

 Easier to read

 Faster

 Native to JavaScript

But XML is still used in many enterprise systems.
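The "data type support" and "file size" rows can be seen by storing one record in both formats. A sketch with a made-up student record:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
as_json = '{"name": "Alice", "age": 21}'
as_xml = "<student><name>Alice</name><age>21</age></student>"

from_json = json.loads(as_json)
from_xml = ET.fromstring(as_xml)

json_age = from_json["age"]           # already an int
xml_age = from_xml.findtext("age")    # a str until converted explicitly

print(type(json_age).__name__, json_age)   # int 21
print(type(xml_age).__name__, xml_age)     # str 21
print(len(as_json), len(as_xml))           # the XML string is the longer one
```

JSON carries the number as a number, while XML delivers every value as text unless a schema or the program converts it.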

9. XML Schemas (Structure Definitions)

XML can be validated using two schema systems:

9.1 DTD (Document Type Definition)

Defines structure:

<!DOCTYPE note [
<!ELEMENT note (to,from,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

DTD limitations:

 Old

 Not very strict

 Limited datatypes

9.2 XML Schema (XSD)

Modern and powerful.

<xs:element name="age" type="xs:integer"/>

XSD supports:

 Datatypes

 Namespaces

 Strict validation

 Reusability

Used in big organizations and government systems.

10. XML Namespaces

Used when combining XML documents to avoid tag conflicts.

Example:

<book xmlns:edu="[Link]

<edu:title>Data Structures</edu:title>
</book>

A namespace ties a tag prefix to a unique identifier (a URI), so tags with the same name from different documents do not collide.
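In ElementTree, a namespaced tag is looked up through a prefix-to-URI map. A minimal sketch; the URI http://example.com/edu is a placeholder, since the one in the example above is not shown in full:

```python
import xml.etree.ElementTree as ET

# The URI is a placeholder chosen for this sketch.
DOC = """
<book xmlns:edu="http://example.com/edu">
    <edu:title>Data Structures</edu:title>
</book>
""".strip()

root = ET.fromstring(DOC)
ns = {"edu": "http://example.com/edu"}   # prefix -> URI map used for lookups

title = root.find("edu:title", ns)
print(title.text)   # Data Structures
print(title.tag)    # {http://example.com/edu}title
```

The parser stores the full URI in the tag name, which is why a plain root.find("title") would not match here.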

11. Parsing XML in Python

Python supports XML through:

 xml.etree.ElementTree (standard library)

 xml.dom.minidom (standard library)

 lxml (third-party, advanced)

 BeautifulSoup (from your book)

11.1 Reading XML Using ElementTree

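import xml.etree.ElementTree as ET

# "employees.xml" is a placeholder file name
tree = ET.parse("employees.xml")   # read and parse the file
root = tree.getroot()              # top-level (root) element
print(root.tag)                    # name of the root tag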

11.2 Accessing Child Elements

for child in root:
    print(child.tag, child.attrib)   # tag name and attributes of each direct child

11.3 Find Specific Element

name = root.find("name").text   # first direct child named <name>

11.4 Find All Elements

skills = root.findall("skill")   # every direct child named <skill>
for s in skills:
    print(s.text)
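One detail worth knowing: find() and findall() look only at direct children unless the path starts with ".//". The sketch below parses a made-up in-memory <employee> document so it runs without a file; the <email> tag is deliberately absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, parsed from a string instead of a file.
root = ET.fromstring(
    "<employee><name>Alice</name>"
    "<skills><skill>Python</skill><skill>SQL</skill></skills></employee>"
)

direct = root.findall("skill")                  # direct children only
nested = root.findall(".//skill")               # any depth
walked = [s.text for s in root.iter("skill")]   # any depth, via an iterator

print(len(direct))                              # 0
print([s.text for s in nested])                 # ['Python', 'SQL']
print(walked)                                   # ['Python', 'SQL']
print(root.findtext("name"))                    # Alice
print(root.findtext("email", default="n/a"))    # n/a
```

findtext() with a default avoids calling .text on None when a tag is missing.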

12. Writing XML in Python

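import xml.etree.ElementTree as ET

root = ET.Element("student")          # create the root element
name = ET.SubElement(root, "name")    # add <name> under the root
name.text = "Alice"
age = ET.SubElement(root, "age")      # add <age> under the root
age.text = "21"                       # element text must be a string

tree = ET.ElementTree(root)
tree.write("student.xml")             # "student.xml" is a placeholder file name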

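Python automatically generates the following (written to the file as a single line; indented here for readability):

<student>
    <name>Alice</name>
    <age>21</age>
</student>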
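By default the output is written without line breaks or indentation. A sketch of one way to produce readable output, assuming Python 3.9 or later (where ET.indent is available); it builds the same student record and prints to the screen instead of writing a file:

```python
import xml.etree.ElementTree as ET

root = ET.Element("student")
ET.SubElement(root, "name").text = "Alice"
ET.SubElement(root, "age").text = str(21)   # numbers must be converted to text

compact = ET.tostring(root, encoding="unicode")

ET.indent(root, space="  ")                 # Python 3.9+, modifies the tree in place
pretty = ET.tostring(root, encoding="unicode")

print(compact)
print(pretty)
```

To save the result with an XML declaration, ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True) can be used, where path is any file name.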

13. Real-World Applications of XML

1. Government Systems

 Aadhaar data

 Land records
 Census documents

2. Banking

 SWIFT messaging

 Financial statements

 Payment systems

3. Web Services

 SOAP APIs

 Enterprise services

4. Office File Formats

Microsoft uses XML inside:

 .docx

 .xlsx

 .pptx

5. Android Development

 Android uses XML for layout designs

 Configuration files

6. Configuration & Settings

Many tools store settings in XML:

 Maven

 Tomcat

 Spring Framework

14. Advantages of XML

✔ Highly structured

✔ Supports nested data

✔ Can validate data using XSD

✔ Platform-independent
✔ Extensible (custom tags allowed)

✔ Good for complex documents

✔ Supports metadata via attributes

✔ Wide industry adoption

15. Limitations of XML

❌ Verbose (large file size)

Tags increase file size.

❌ Slower than JSON

Because of complex structure.

❌ Harder to read

Nested tags can be confusing.

❌ More complex parsing

Requires specific parsers.

❌ Not ideal for simple data

CSV/JSON preferred for simple structures.

16. XML Errors and Exception Handling

Common errors:

1. ParseError

Missing tags or invalid nesting.

2. FileNotFoundError

XML file not found.

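3. ValueError

Converting element text to another type fails, for example int() on non-numeric text. (ElementTree itself does not perform XSD validation.)

4. AttributeError

Accessing .text on a missing tag, because find() returns None when nothing matches.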

Error handling example:

try:
    tree = ET.parse("employees.xml")   # placeholder file name
except ET.ParseError:
    print("Invalid XML")
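The same pattern can be extended to cover a missing file and a missing tag. A sketch, assuming a hypothetical helper load_name and temporary files created only for the demonstration:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def load_name(path):
    """Return the text of <name> under the root, or None if it cannot be read."""
    try:
        root = ET.parse(path).getroot()
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None
    except ET.ParseError as err:
        print(f"Invalid XML in {path}: {err}")
        return None

    node = root.find("name")      # None when the tag is missing
    if node is None:
        print("No <name> element")
        return None
    return node.text

# Demonstration on throwaway files (names and contents are made up).
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.xml")
    broken = os.path.join(folder, "broken.xml")
    with open(good, "w", encoding="utf-8") as fh:
        fh.write("<student><name>Alice</name></student>")
    with open(broken, "w", encoding="utf-8") as fh:
        fh.write("<student><name>Alice</student>")   # bad nesting

    results = [
        load_name(good),
        load_name(broken),
        load_name(os.path.join(folder, "missing.xml")),
    ]

print(results)   # ['Alice', None, None]
```

Checking for None before using .text avoids the AttributeError described above.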

17. XML in Data Science

Though JSON and CSV dominate, XML still appears in:

 Government-released datasets

 Scientific publications

 Metadata files

 Medical reports

 Legal documents

Pandas can read XML (pandas 1.3+; the default parser requires lxml):

import pandas as pd

df = pd.read_xml("data.xml")   # "data.xml" is a placeholder file name
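read_xml works best on shallow, table-like documents. For other layouts, one option is to flatten the records by hand first. A sketch using only the standard library on a made-up <dataset>; the tag names and the population figures are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset; the numbers are placeholders, not real figures.
DATA = """
<dataset>
    <record id="1"><city>Delhi</city><population>100</population></record>
    <record id="2"><city>Mumbai</city><population>200</population></record>
</dataset>
""".strip()

rows = []
for rec in ET.fromstring(DATA).findall("record"):
    row = dict(rec.attrib)                       # attributes become columns
    for child in rec:                            # each child element becomes a column
        row[child.tag] = child.text
    row["population"] = int(row["population"])   # XML text is a string; convert explicitly
    rows.append(row)

print(rows)
```

The resulting list of dicts can then be passed to pd.DataFrame(rows).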

18. Summary of XML Files

 XML stands for Extensible Markup Language

 Used for storing and transporting data

 Based on tags and hierarchy

 Supports metadata using attributes

 More powerful but more complex than JSON

 Suitable for large enterprise and government systems

 Python supports XML with ElementTree

 Can be validated using DTD or XSD

 Still widely used in banking, Android, web services, and document formats
