ITERATIONS-we go through the sequence of the objects we have and for each object we keep
a count.
Iterator itself has an initialization which maybe in some cases is 0. We stop when we run out of
things.
So iteration has an initialization step which in this case is 0 and at each stage we are moving the card
and updating the values of these variables.
Variable- here count is the variable whose value keeps changing.
Filtering of the Data- we can filter multiple things at the same time and while filtering we can
keep a track of them separately as well.
FLOWCHARTS AND PSEUDOCODES are the two ways of formally representing the steps for
the programming
Pseudocodes is English way of describing the steps and flowcharts is diagrammatic
representation of the sequence of steps or algorithms we are going to use.
WEEK 1- L-6
Sanity of Data: Key Points
1. Data Validation Basics
Data must be correct and meaningful in context.
Each field (e.g., name, gender, date of birth, marks) must:
o Contain the right type of data.
o Fall within an expected or legal range.
o Respect predefined formats.
2. Common Data Errors in Student Records
Invalid data types:
o E.g., card number as "Q" instead of a valid numeric value.
o Names containing symbols or digits.
Invalid values:
o Gender other than "M" or "F".
o Marks outside acceptable range (e.g., negative, too high).
Precision issues:
o Values like 64.5 may be okay; 64.43942 is suspiciously over-precise.
Arithmetic inconsistencies:
o Subject marks not summing up to the total.
o Total doesn't match the sum even if individual entries seem valid.
Field mismatches:
o Names entered in town/city fields.
o Town and date of birth values interchanged.
Wrong subjects:
o Presence of subjects like Biology when only Maths, Physics, Chemistry are allowed.
Invalid dates:
o Non-existent dates (e.g., 32 Jan or 31 Apr).
o Valid leap year dates (e.g., 29 Feb) must consider the year.
3. Ambiguous Identifiers
Different representations of the same person’s name (e.g., "Aditi Ram" vs "R Aditi").
Hard to resolve by a computer without contextual matching.
4. Sanity Checks in Shopping Bills
Key Fields:
o Item, Category, Quantity, Price, Cost, Total.
Checks:
o Quantity × Price = Cost.
o Sum of all costs must equal total.
Common issues:
o Code instead of name: Item field shows product codes, not names.
o Fractional values in whole units: E.g., 3.6 T-shirts.
o Unrealistic prices: E.g., ₹1 for a T-shirt, ₹12000 for carrots.
o Syntax errors: E.g., 2.1.5 as a quantity.
o Missing values: E.g., NA for price or blank quantity.
o Incorrect cost calculation: Quantity and price do not match cost.
o Category mismatch: E.g., Milk labeled as Stationery.
o Unit mismatch: E.g., Quantity meant to be 0.4 typed as 400.
5. Checks in Word Data
Fields: Word, Position, ID, Part of Speech, Letter Count.
Issues:
o Invalid ID: Alphanumeric like "3F" where only numbers are expected.
o Spelling errors: Incorrect spelling causes mismatch with expected letter count.
o Wrong part of speech: Requires understanding context from the sentence.
o Ambiguity in word usage:
E.g., "open" can be adjective or verb based on context.
o Letter count should match the number of characters in the word.
6. Detection and Automation
Some issues can be detected using:
o Format validation (e.g., numeric fields).
o Range checks.
o Consistency checks (e.g., totals).
o Reference lists (e.g., valid towns, categories).
o External databases (e.g., expected price ranges).
Others require contextual or linguistic understanding:
o Part of speech tagging.
o Identifying name variations.
o Resolving ambiguous field placements.
WEEK 1- L-7
Introduction to Datatypes – Key Notes
1. Context
Builds on the previous lecture that examined errors in datasets.
Observed that data values (e.g., names, marks) follow certain rules and constraints.
Not all operations are valid on all types of data.
2. What Is a Data Type?
A data type defines:
o The valid range or set of values an element can take.
o The valid operations that can be performed on those values.
Purpose: Ensure sanity and correctness of data usage.
Helps computers enforce constraints on variables (e.g., valid input, allowed operations).
3. Using Data Types in Programming
Declaring a variable with a data type tells the computer:
o What values it can hold.
o What operations are allowed.
Any invalid value or operation can be rejected at runtime or compile time.
Data types help maintain data integrity and reduce errors.
Basic Data Types Discussed
A. Boolean
Values: True or False.
Operations:
o AND, OR, NOT.
o Used in logical conditions (e.g., if gender is M and DOB is within range).
B. Integer
Values: Whole numbers (positive, negative, and zero).
Operations:
o Arithmetic: +, -, *, / (division may need redefinition as integer quotient).
o Comparisons: <, >, == (yield Boolean results).
C. Character
Values:
o Alphanumeric characters: A–Z, a–z, 0–9.
o Special characters: e.g., . , : ; * / %
Operations:
o Equality check (e.g., checking if a word ends with .).
o Other operations are usually not defined or not relevant for basic use.
Conclusion
Each data type restricts value domain and defines valid operations.
Helps enforce data correctness and guides computation effectively.
Ensures safe and meaningful processing of information in programming.
Summary
Data types define rules for what kind of data can be stored and manipulated.
They prevent misuse of data by enforcing valid value ranges and operations.
Data types help in writing correct, reliable programs that work with structured datasets.
Examples from real-world data:
o Marks must be integers within 0–100.
o Names must be strings without symbols or numbers.
o Gender must be either ‘M’ or ‘F’.
Lecture 1.8 – Subtypes of Basic Datatypes
1. Purpose of Subtypes
While basic data types (Boolean, Integer, Character) define broad categories of data,
subtypes help us:
o Narrow down the valid value range.
o Restrict allowed operations even further.
o Apply stricter checks for specific real-world applications (like marks, gender, cities,
etc.).
2. Subtypes of Integer
A. Sequence Number
Found on data cards.
Valid range: 0 to a defined maximum (e.g., 10,000).
Invalid operations:
o Arithmetic like addition, subtraction, multiplication, division (do not make sense).
Allowed operations:
o Comparisons (e.g., <, ==) to check order or equality.
B. Marks
Represents scores in subjects like Physics, Chemistry, Maths.
Valid range: 0 to 100 (or up to 300 for totals).
Allowed operations:
o +: Total marks, summing scores.
o -: Finding deviation from average, difference between highest and lowest.
o Comparisons: <, ==, etc.
Disallowed operations:
o *, /: Multiplication/division between marks has no real meaning.
C. Count
Used to track totals (e.g., number of students, boys, girls).
Valid values: Non-negative integers from 0 to a max.
Allowed operations:
o Addition (+): E.g., boys + girls = total students.
o Subtraction (-): Difference between counts.
o Comparisons: To check which count is greater.
Disallowed operations:
o Multiplication/division.
3. Subtype of Character
A. Gender
A specific field using character type.
Valid values: Only M or F.
Invalid values: Any other letter, digit, or symbol.
Allowed operation:
o Equality check (==) – e.g., is gender = 'M'?
4. String Data Type and Its Subtypes
A. String
A sequence of characters (no restriction on length or character types).
Includes alphanumeric and special characters.
Operations:
o Checking if a specific character exists (e.g., ends with a .).
o Equality (==): Character-by-character comparison.
Subtypes of String
i. Name
Subtype of string with constraints:
o Should not include special characters or numerals.
o Typically uses only letters (capital + small).
o Some exceptions: a dot may be allowed after initials (e.g., Aditi R.)
ii. City
Similar to names:
o No numerals or special characters expected.
o Only alphabetic strings allowed.
iii. Word
From paragraph data.
o May include some special characters (e.g., . , ; at the end).
Operation:
o Equality (==) for comparison.
B. Category (Part of Speech)
E.g., Noun, Verb, Adjective, Preposition, Pronoun, etc.
Only finite set of allowed values.
Values like "Vedanayagam" are not valid categories.
Useful in analyzing and labeling words in a paragraph.
Summary of Subtype Restrictions
Allowed
Subtype Base Type Value Constraints
Operations
Comparison
Sequence Number Integer 0 to max
only
Allowed
Subtype Base Type Value Constraints
Operations
+, -,
Marks Integer 0 to 100 (or 300 for total)
comparisons
+, -,
Count Integer 0 to max
comparisons
Gender Character Only 'M' or 'F' Equality (==)
Letters only; no
Name String Equality
symbols/numbers
City String Letters only Equality
Word String Alphanumeric + few symbols Equality
Category (POS tag) Categorical Finite set (noun, verb, etc.) Equality
Lecture 1.9 – Transformation of Sub-Datatypes
✅ Objective
To understand how certain subtypes require transformation from one format to another (often into
integers) so that:
Operations like addition, comparison, etc., can be performed.
Data becomes machine-friendly while remaining human-readable when printed.
1. Transforming Dates
🔹 Problem:
Dates like 31st January or 4th June are typically strings.
String operations only support equality (==), not comparisons (<, >) or arithmetic (+, -).
🔹 Solution:
Convert the date into an integer representing its day number in the year.
o E.g., 1st Jan = 0, 2nd Jan = 1, 31st Jan = 30, 1st Feb = 31, etc.
o In a non-leap year, range = 0 to 364; in a leap year, 0 to 365.
🔹 Benefits:
Allows comparison: e.g., day1 < day2
Allows arithmetic: e.g., day1 + 5 = new date
🔹 Challenge:
Integers like 0, 31, etc., are not human-friendly.
🔹 Fix:
Introduce a print operation to convert the internal integer back to a string:
o E.g., print(0) ⇒ 1st Jan; print(31) ⇒ 1st Feb
2. Handling Fractional Marks
🔹 Problem:
Teachers often give marks like 62.5, which is a real number.
Integers can’t store decimals; rounding loses accuracy.
🔹 Naïve Options (Problematic):
Round to 62 or 63 ⇒ Loses precision.
Store as string ⇒ Arithmetic not possible.
🔹 Solution:
Multiply the fractional mark by 100 and store as an integer.
o E.g., 62.5 ⇒ 6250
Perform all operations as integers.
Use a print() function to divide by 100 when displaying:
o print(6250) ⇒ 62.5
🔹 Advantage:
Preserves accuracy and supports:
o Addition, subtraction, comparisons.
3. Prices and Amounts (from Shopping Bills)
🔹 Nature:
Amounts like 27.50 are money values with two decimal places.
🔹 Approach:
Multiply by 100 ⇒ 27.50 becomes 2750
Store and operate as integers
Print operation converts back to ₹27.50
🔹 Result:
Supports meaningful arithmetic: sum of bills, difference in cost, etc.
4. Quantity
🔹 Description:
Quantities can be:
o Whole units: 3 cans, 5 T-shirts
o Fractions: 0.25 kg, 1.75 L
🔹 Handling:
Multiply by 100 ⇒ 1.25 kg becomes 125
Store as integer, and use:
o +, -, comparisons
Print operation displays 125 as 1.25
✅ General Transformation Technique
Original Stored
Field Transformation Supports Printed As
Type As
Convert to day number Comparison, arithmetic 1st Jan, 3rd
Date String Integer
(0–365) (optionally) Feb
Fractional
Real (float) Multiply by 100 Integer +, -, comparison 62.5, 78.25
Marks
Float
Price/Amount Multiply by 100 Integer Arithmetic, comparison ₹27.50
(money)
Quantity Float Multiply by 100 Integer Arithmetic, comparison 1.25, 0.75
✅ Conclusion
Transforming complex or fractional data into integers helps:
o Preserve precision
o Enable arithmetic and comparison
o Maintain human readability via print conversions
This method ensures data is both machine-processable and user-friendly.
Lecture 1.10 – Introduction to Complex Data Types
✅ Objective
To explore how basic and sub-datatypes can be combined to form complex data types, which allow
structured representation of real-world data like marks cards, shopping bills, and word records.
1. Review of Basic Concepts
Basic Data Types:
o Boolean: True, False
o Integer: Whole numbers (positive, negative, zero)
o Character: Alphanumeric + special symbols
String: Sequence of characters
Subtypes: Refined types that impose constraints on values or operations (e.g., Marks,
Gender, City)
2. Complex Data Types
Complex data types combine multiple fields (each of which has its own datatype) into one unit.
There are two main kinds:
A. Record (Struct / Tuple / Class)
A collection of fields, each with a name and datatype.
Represents a single logical entity like a marks card or a word entry.
📌 Example: Marks Card
Fields and their types:
SeqNo: SeqNo type
Name: Name type
Gender: Gender type
DateOfBirth: Date type
City: City type
Maths, Physics, Chemistry, Total: Marks type
➡️All fields together make one MarksCard record.
📌 Example: Word Record (in Paragraph)
Fields:
SeqNo: SeqNo type
Word: Word type
Category: Category type (e.g., noun, verb)
LetterCount: Count type
➡️Forms a WordInParagraph record.
B. List (Sequence)
A list is a sequence of elements of the same type.
Analogous to strings (sequence of characters).
📌 Examples:
A list of marks cards is a List of MarksCard records.
A paragraph is a List of WordInParagraph records.
A set of shopping bills is a List of ShoppingBill records.
3. Nested Complex Types: Shopping Bill Example
✅ Fields in a Shopping Bill:
SeqNo: SeqNo type
StoreName: Name type
CustomerName: Name type
TotalAmount: Amount type
Items: A list of item records
📌 Each Item Record contains:
ItemName: Name
ItemCategory: String
Quantity: Quantity
UnitPrice: Amount
Cost: Amount
➡️So, a ShoppingBill is a record that contains a list of records (items), making it a nested data
structure.
4. Summary of Key Concepts
Concept Description
Datatype Defines the values a variable can take and what operations can be done on it.
Subtype A restricted version of a base type (e.g., marks, gender).
Record A collection of named fields, each with its own type.
List A sequence of values, usually of the same type.
Nested Types A record containing a list of records (e.g., shopping bill with item list).
✅ Key Takeaways
Records let you group different types of data into one structure.
Lists let you store a sequence of similar items (e.g., multiple marks cards).
You can combine and nest these to build hierarchical data models.
This structure ensures data validity and operational correctness.
WEEK-2
Lecture 2.1: Conditional Termination in Iteration
1. Concept Overview
Iteration: A process where a set of items (e.g., student report cards, shopping bills, words) is
examined one by one to perform a task.
Conditional termination: Instead of going through the entire collection, we stop as soon as a
desired condition is met.
This avoids unnecessary processing and increases efficiency.
2. Example 1: Student With High Marks
Goal: Find if there is any student who scored more than 90 in Physics, Chemistry, and Math.
Method:
o Iterate over all student cards.
o If any subject score < 90, skip the card.
o If a card meets the criteria, note the student's identity.
Problem: The iteration continues through remaining cards even after the student is found.
Desire: Stop iteration after finding the required card.
3. Example 2: Shopping Bills – 3 Different Categories
Goal: Find any bill that includes purchases from three different categories, e.g., Food,
Apparel, Toiletries.
Method:
o Iterate through each bill.
o Check the diversity of item categories.
Observation: Once a matching bill is found, there’s no need to continue iterating through
the rest.
Conclusion: Again, we need conditional termination after success.
4. Example 3: Finding First Verb in a Specific Sentence
Goal: Identify the first verb in the third sentence of a text.
Approach:
o Track sentence count by detecting full stops.
o Switch to verb search once the third sentence starts.
o Stop after finding the first verb.
Challenge: Must track sentence transitions and change search behavior dynamically.
5. Challenges in Conditional Exit
Problem: Exiting from the middle of an iteration (especially in flowcharts) can disturb the
logical structure.
Risks:
o Partial processing can leave data (cards, sentences) in an inconsistent state.
o Flowcharts can become messy with multiple exit points.
6. Solution: Use of a Boolean Flag (e.g., found)
Introduce a variable found to control termination.
o Initialized as false.
o Set to true when the desired item is found.
Loop condition becomes:
o while more cards and not(found) → only continues if both are true.
Effect:
o Ensures smooth, structured exit from the iteration.
o Eliminates the need to jump out from the middle.
o Keeps the logic clean and traceable.
7. Flowchart Implementation Pattern
Initialize: found = false
Iterator loop condition: hasNext AND not(found)
Within the loop:
o Check if current item meets criteria.
o If yes, set found = true
o Else, continue.
Exit: When no more items or item is found.
8. Advantages of This Approach
Efficient: Stops as soon as condition is satisfied.
Clean structure: Maintains a single point of exit in flowcharts.
Retains context: The found variable also tells if a match was found or not.
Generalizable: Applies to various searches (students, purchases, text parsing).
9. Summary of Key Patterns
Pattern Description
Filter pattern Processes all items meeting some criteria.
Search + stop pattern Stops iteration once condition is met.
Tracks progress (e.g., sentence count) before starting actual
Track + conditional search
search.
Loop continues only while item remains AND item not yet
Combined condition iteration
found.
Lecture 2.2 – Local Operations and Maximum in Single Iteration (Part 1)
1. Objective
Understand how to determine the maximum value (e.g., highest marks) from a collection
(like student report cards).
Perform the operation using only a single pass (iteration) through the dataset.
Compare this with previous tasks like computing average marks or category-based filtering.
2. Problem Setup
Given a set of student cards containing total marks.
Goal: Identify the student with the highest total marks.
Requirement: Perform this task efficiently in one iteration.
3. Initial Strategy: Using a Reference Card
Start by setting aside the first card as the temporary highest.
As you go through the rest:
o Compare each card's total with the current maximum.
o If the new card's marks are higher, replace the old maximum.
At the end of the iteration, the held card is the one with the highest total.
Example Process:
Start: 240 → current max.
Next: 196 → skip.
Next: 252 → replace 240.
Continue comparing: only replace when a higher value is found.
Final highest: 281 (replacing earlier 276).
4. Improved Strategy: Using a Variable (Local Operation)
Use a variable max to store the current maximum.
Initialize max = 0, assuming marks are non-negative.
For each card:
o If current card’s total > max → update max.
Example Process:
1. First card: 174 → max = 174
2. Next card: 209 → update max = 209
3. Then 227 → update max = 227
4. 219 → smaller → no update
5. 254 → update
6. 261 → update
7. 221 → skip
8. 281 → update
9. Continue until end…
Final Result: max = 281
5. Key Concepts and Insights
a. Iterator with Local Update
A single pass through data.
A local variable (max) stores relevant cumulative/comparative information.
b. Update Condition
Conditional check changes dynamically:
o Filtering based on variable (e.g., current max value), not fixed criteria like "only
boys" or "marks > 90".
c. Comparison with Filtering
In previous lectures, filtering conditions were static (e.g., gender, fixed thresholds).
In this lecture:
o The condition changes as max is updated.
o This adds a dynamic filtering dimension.
d. Efficient Memory Use
Instead of storing all data or multiple cards, only one variable (max) is used.
This keeps resource usage minimal while achieving the objective.
6. Summary
Concept Explanation
Iterator Go through each card once.
Local Operation Operate on one card at a time and update a variable (max).
Dynamic Filtering Condition changes based on current max value.
Goal Efficiently compute maximum without storing all intermediate data.
Code
python
CopyEdit
# Initialize variables
max_marks = 0
topper = None
# Iterate through each student record
for student in students:
if student["marks"] > max_marks:
max_marks = student["marks"]
topper = student["name"]
# Output the result
print("Topper:", topper)
print("Highest Marks:", max_marks)
💡 Explanation
Step Description
max_marks = 0 Start with minimum possible marks.
topper = None No topper known at the beginning.
for student in ... Iterate through all cards once.
If current student’s marks exceed current max,
if student["marks"] > max_marks
update both.
At the end, display the student with the highest
print(...)
marks.
Lecture 2.3 – Local Operations and Max in Single Iteration (Part 2)
🎯 Objective
Find the longest sentence (in terms of number of words) in a paragraph using a single
iteration.
Apply and extend the local operations technique introduced earlier.
🧾 Problem Setup
A paragraph is split into individual words, each written on a separate card.
Punctuation (e.g., full stop) is attached to the last word of each sentence.
Cards are placed in sequential order to preserve sentence structure.
💡 Task
Determine the sentence with the most number of words.
Do not store all sentence lengths; instead, track only the current and maximum counts.
🔁 Iterative Process
1. Initialize two variables:
o longest_sentence = 0 (stores maximum sentence length seen so far)
o count = 0 (tracks current sentence length)
2. Iterate through each word/card:
o Increment count by 1 for every word.
o If a word ends with a full stop:
Compare count with longest_sentence
If count is greater, update longest_sentence
Reset count = 0 to begin tracking the next sentence
⚙️Example Walkthrough
Let’s say we encounter these sequences:
Sentence 1: 4 words → longest_sentence = 4
Sentence 2: 7 words → update → longest_sentence = 7
Sentence 3: 8 words → update → longest_sentence = 8
Sentence 4: 20 words → update → longest_sentence = 20
Sentence 5: 25 words → final update → longest_sentence = 25
At the end, longest_sentence = 25
🔎 Key Concepts
Concept Description
Local Operation Count words one by one, sentence-wise, without storing full sentences.
Dynamic Filtering Filter based on condition: card contains a full stop.
Variable Comparison Compare count (current sentence length) with longest_sentence.
Variable Reset Reset count = 0 after each sentence ends.
Single Iteration Go through the word cards only once.
🔄 Evolving Computation Patterns
Previous Lecture This Lecture
Compared card value to constant (e.g., Compared one variable (count) to another
marks > 90) (longest_sentence)
Max was directly available on card Max had to be calculated across multiple cards
Only max was updated Here, count is incremented and reset; max is updated
📌 Summary
We used two variables:
o count → temporary, reused to count words in each sentence
o longest_sentence → stores the max word count seen
The entire task was completed in one pass through the cards (one iteration).
This is an extension of max-finding logic, now applied to a computed quantity (word count)
rather than a direct card value.
code representation (in Python) for finding the longest sentence by word count in a single iteration,
as described in Lecture 2.3 – Local Operations and Max in Single Iteration (Part 2):
🧾 Problem Setup
We are given a list of words, where a word may end with a full stop (.) to indicate the end of a
sentence.
python
CopyEdit
words = [
"It", "was", "Monday", "morning.",
"Swaminathan", "was", "reluctant", "to", "open", "his", "eyes.",
"He", "shuddered", "at", "the", "very", "thought", "of", "school.",
...
✅ Code
python
CopyEdit
# Initialize variables
longest_sentence = 0 # stores max number of words in any sentence
count = 0 # counts words in current sentence
# Iterate through each word
for word in words:
count += 1 # every word counts
if [Link]('.'): # sentence end
if count > longest_sentence:
longest_sentence = count # update longest if needed
count = 0 # reset count for next sentence
# Output the result
print("Longest sentence has", longest_sentence, "words.")
Step What It Does
count += 1 Tracks number of words in current sentence
if [Link]('.') Detects the end of a sentence
if count > longest_sentence: Updates longest sentence word count if needed
count = 0 Resets count for the next sentence
Lecture 2.4 – Local Operations and Max in Single Iteration (Part 3)
🎯 Objective
Identify which store has the most number of purchases (i.e. bills).
Do this efficiently using a single iteration through the dataset.
Extend the idea of local operations and dynamic max tracking to an unknown number of
categories (stores).
🛒 Dataset
A pile of shopping bills from different stores.
Each card contains a store name.
🧠 Problem
Which store has the highest number of purchase entries?
This means finding the store with the most bills generated, i.e., most frequent occurrence in the list.
🔄 Challenges
Unknown number of stores:
o Unlike earlier problems (e.g., boys/girls, fixed number of categories), we don’t know
in advance how many stores exist.
o Need to dynamically handle an expanding set of categories.
Efficient counting and comparison:
o Want to process everything in a single pass.
🧮 Approach – Phase 1: Count Bills Per Store
1. For each card (bill):
o Extract the store name.
o If this store is seen for the first time, start a new counter (initialize to 1).
o If it has been seen before, increment its existing counter.
2. This process results in:
o Dynamic creation of variables for each unique store.
o Each variable stores the current count of bills for that store.
🧮 Approach – Phase 2: Track Maximum Simultaneously
Maintain a max variable that stores the highest count so far.
Whenever a store’s count increases, check:
o If it exceeds the current max, then update max.
This avoids the need for a second pass to find the maximum.
🧾 Example Walkthrough
Suppose the bill order is:
SV, Big Bazaar, SV, SV, Big Bazaar, Sun, SV, Big Bazaar, Sun, Sun, SV, SV, SV, SV, Sun, SV, SV, Sun, SV, Sun,
SV, SV, Big Bazaar
Tracking Example:
Store Count (final)
SV Stores 15
Big Bazaar 6
Store Count (final)
Sun General 9
Max (updated as we go) 15
At each step:
When SV's count goes from 10 → 11 → 12… → 15, max is updated accordingly.
🧠 Key Concepts
Concept Description
Dynamic Variable Creation Start counters on the fly as new store names appear.
Local Operation Update a store's count locally.
Filtering Identify which store's counter to update.
Compare store’s count (dynamic variable) to current max (also
Variable-Variable Comparison
dynamic).
Perform both counting and max tracking in one pass over the
Single Iteration cards
WEEK 2 LECTURE-5
Key Concepts Covered
🔹 Comparing Shops Based on Sales Data
Initial metric used: Number of bills generated by each shop.
o SV Stores: 15 bills
o Sun General: 9 bills
o Big Bazaar: 6 bills
Issue: Number of bills doesn’t reflect sales volume. Some shops generate fewer but higher-
value bills.
📊 Improved Metric – Total Bill Value
Better measure of performance: Sum of the bill amounts per shop.
This helps identify which shop is doing best overall in terms of total sales.
🔁 Single Iteration Approach
Goal: Calculate total per shop and find the maximum total – all in one pass through the data.
🧠 Strategy:
1. Track each shop's running total:
o For each bill, add its value to the corresponding shop’s total.
o If it’s a new shop, initialize its total.
2. Update max variable:
o If the shop’s updated total exceeds current max, update max.
🧮 Example (simplified):
Shop Bill Value Running Total Max Total
SV 567 567 567
Big Bazaar 1525 1525 1525
SV +1341 1908 1908
Big Bazaar +4174 5699 5699
... ... ... ...
Big Bazaar +798 13284 13284
📌 Final Totals
SV Stores: 7120
Big Bazaar: 13284 (maximum)
Sun General: 4009
📈 Observations
Big Bazaar had fewer bills but very high value, leading to the highest total.
Shops with more bills but lower values (like SV or Sun General) had lower total sales.
➗ Average Bill Size (Optional Extension)
To calculate average:
Average Bill Size=Total ValueNumber of Bills\text{Average Bill Size} = \frac{\text{Total Value}}{\
text{Number of Bills}}Average Bill Size=Number of BillsTotal Value
SV: 7120 / 15 ≈ 474.7
Big Bazaar: 13284 / 6 ≈ 2214
Sun General: 4009 / 9 ≈ 445.4
👉 Big Bazaar has the highest average bill value, confirming its superior performance in revenue
terms.
🧠 Takeaway
Local operations like filtering and updating totals can be done efficiently in a single pass.
Choosing the right metric (e.g., total value vs. number of bills) is crucial for accurate analysis.
Here's a Python implementation of the logic described in the lecture, which processes a list of bills
and computes:
1. Total value of bills per shop
2. Shop with the maximum total bill value
3. (Optionally) Average bill value per shop
LECTURE 6
Topic Overview
This lecture discusses:
How to count word frequencies in a paragraph.
How to find the most frequent word.
How to do it efficiently using:
o ✅ Two iterations
o ✅ One single iteration (optimized approach)
📘 Key Concepts
🔹 Goal
Determine which word occurs most frequently in a given text (e.g., a paragraph).
🔁 Two-Step Method (Two Iterations)
Step 1: Count Word Frequencies
Go through each word in the paragraph.
For every new word, create a counter (initialize with 0, then increment).
Maintain a dictionary or list of word-count pairs.
Treat words case-insensitively (e.g., “It” and “it” are the same).
Step 2: Find Word with Maximum Frequency
After building the word count table, iterate through it.
Keep track of the highest frequency value.
Optionally store the corresponding word.
⏳ This requires two full passes over the data:
1. One for counting.
2. One for finding the max.
⚡ Optimized Method – Single Iteration
How it works:
While counting frequencies, simultaneously keep track of the maximum frequency seen so
far.
No need for a second loop.
Algorithm:
For each word in the paragraph:
1. Convert to lowercase (for uniformity).
2. If word is new, add to the dictionary with count = 1.
3. If word exists, increment the count.
4. After updating count, check if it’s greater than current max.
o If yes, update max_count.
o Optionally, update most_frequent_word.
📌 Challenges
Must create counters dynamically — can't predetermine how many words will appear.
Words may be repeated in different capitalizations (e.g., It vs it) — need normalization.
Some words (e.g., the) appear very frequently and must be tracked accurately.
📈 Final Observations
Example paragraph had ~66 total words and ~50 unique words.
The most frequent word was:
o "the" with a count of 6.
🧾 Implementation Tips
Use a dictionary or hash map for word-count pairs.
Track:
o max_frequency
o Optionally, most_frequent_word
Handle:
o Case normalization
o Punctuation removal
o Hyphenated words as single units
PYTHON CODE:
import re
from collections import defaultdict
# Sample paragraph (from the lecture)
paragraph = """
It was Monday morning. Swaminathan was reluctant to open his eyes. He considered Monday
specially unpleasant in the calendar.
After the delicious freedom of Saturday and Sunday, it was difficult to get into the Monday mood
of work and discipline.
He shuddered at the very thought of school: that dismal yellow building; the fire-eyed
Vedanayagam, his class-teacher;
and the Head Master with his thin long cane.
"""
# Step 1: Clean and split paragraph into words
# - Convert to lowercase
# - Remove punctuation except hyphen (for words like fire-eyed)
words = [Link](r'\b[\w\-]+\b', [Link]())
# Step 2: Initialize dictionary and tracking variables
word_count = defaultdict(int)
max_count = 0
most_frequent_word = None
# Step 3: Iterate through words and update counts + track max
for word in words:
word_count[word] += 1
if word_count[word] > max_count:
max_count = word_count[word]
most_frequent_word = word
# Step 4: Output results
print("Word Frequencies:")
for word, count in word_count.items():
print(f"{word}: {count}")
print("\nMost Frequent Word:")
print(f"'{most_frequent_word}' occurred {max_count} times.")
Lecture 2.7 – Max in a Single Iteration without Losing Information & Applications of Frequency
Count
🔹 Part 1: Tracking Maximum Value with Identity
🧠 Problem:
Earlier, we found the maximum marks in a single iteration.
But we lost the identity (e.g., who scored the max).
✅ Solution:
Track:
max_value – the highest marks found so far.
max_card_indices – the card number(s) that had this maximum.
Approach:
Initialize max_value = 0 and max_card_indices = [] or -1.
For each card:
o If current mark > max_value:
Update max_value
Reset max_card_indices to current card index.
o If current mark == max_value:
Append current card index to max_card_indices.
💡 Key Insight:
If multiple entries have the same max value, store all.
This avoids storing entire cards; use indices instead.
🔹 Part 2: Application of Frequency Counts in Filtering
🎯 Use Case:
Remove frequently occurring words from a text (e.g., "the", "and", "was").
Helps in search engines and text comparison:
o Rare words distinguish documents better.
🔁 Two-Pass Strategy:
1. First iteration: Count frequency of all words.
2. Second iteration: Filter out words with frequency > 1.
🧠 Considerations:
Normalize case (e.g., treat "It" and "it" as same).
Ignore punctuation.
Maintain two piles:
o Kept (rare/unique words).
o Discarded (frequent/common words).
📊 Result Example:
Out of 48 unique words:
Only 9 occurred more than once: "it", "was", "Monday", "to", "his", "he", "the", "of",
"and".
✅ Lecture 2.8 – Flowchart for Max Marks
🔹 Goal:
Design a flowchart to:
1. Track the maximum marks in a pile of cards.
2. Identify the card(s) where the maximum occurred.
🔹 Flowchart Steps (Basic Max Tracker):
1. Start
2. Initialize:
o max = 0
o maxCardId = -1
3. While there are cards in pile:
o Read card X.
o If [Link] > max:
Set max = [Link]
Set maxCardId = [Link]
o Else if [Link] == max:
Append [Link] to maxCardId list.
o Else: do nothing.
4. End:
o max holds the highest marks.
o maxCardId holds all card IDs with max marks.
🧠 Key Concepts:
maxCardId should be a list to handle duplicates.
Append IDs for cards with marks equal to current max.
Use [Link] and [Link] to refer to values on a card.
🧾 Notes on Symbols:
= assigns a value.
== checks for equality.
Use append to add to a list (e.g., append [Link] to maxCardId).
Lecture 2.9: Introduction to Pseudocodes
🎯 Objective
Introduce pseudocode as a textual and structured way to represent algorithms.
Compare it with flowcharts and step-by-step textual listings.
📊 Why Pseudocode Instead of Flowcharts?
Flowcharts Pseudocode
Visual and easy to understand Textual and compact
Good for small processes Better for large, evolving procedures
Hard to share and edit collaboratively Easy to edit, version, and share in teams
Difficult to track changes Easy to track differences between versions
🧩 Flowchart Elements Review
Start/End: Terminal nodes (oval)
Processes: Actions (rectangles)
Decisions: Conditions (diamonds)
Arrows: Control flow (direction of logic)
🔁 Example Flowchart – Counting Cards
1. Initialize count = 0
2. While there are cards in Pile 1:
o Pick a card
o Move to Pile 2
o Increment count
3. End
✍️Plain Text Version (Early Attempt)
Steps are numbered and use goto style references (e.g., “go back to step 2”)
Hard to read, error-prone, and difficult to maintain.
🔹 Pseudocode: Cleaner & Structured
✅ Features of Pseudocode:
Text-based
No step numbers
Uses standard constructs:
o Assignment: count = 0
o Conditionals: if, else
o Loops: while, for
o Blocks: Denoted using { ... }
🧾 Example:
plaintext
CopyEdit
count = 0
while (Pile 1 has more cards) {
pick up a card
move it to Pile 2
count = count + 1
🧠 Key Benefits of Pseudocode
Easy to understand, even for non-programmers.
Mimics the logic of actual programming languages without strict syntax.
Useful for collaboration, documentation, and transition to code.
Supports concepts like:
o Repetition (loops)
o Conditional execution (if-else)
o Variable tracking
📌 Conclusion
Pseudocode is a middle ground between visual flowcharts and real code. It allows:
Clear logic definition
Efficient collaboration
Easy updates and modifications
Lecture 2.10 – Pseudocode for Iteration with Filtering
🎯 Goal
Translate flowcharts involving iteration and filtering into structured pseudocode.
🧠 What is Pseudocode?
A simplified, readable version of code.
Helps express logic in plain language + basic syntax.
Ideal for explaining algorithms without worrying about programming language syntax.
🔁 Basic Iteration (Counting Cards)
🔹 Pseudocode:
Key Concepts:
count = 0: Assignment
while (...): Repetition as long as condition is true
{ ... }: Block of statements tied to loop
➕ Modifying to Sum Marks
🔹 Logic:
Replace count with sum
Add only the Maths marks field from each card
🔹 Pseudocode:
Filtering – Only Sum for Boys
🔹 Add Condition with if
🔹 Pseudocode:
Concepts Introduced:
if – one-time conditional execution.
== – equality test (different from = which is assignment).
[Link], [Link] – accessing fields in a structured data type.
👦👧 Handling Both Boys and Girls
Track BoySum and GirlSum separately.
🔹 Pseudocode:
Finding Maximum Marks
Track the maximum maths mark seen so far.
🔹 Initialization:
maxM = 0 (assuming marks are ≥ 0)
🔹 Pseudocode:
Tracking Max Marks with Card ID
Also record which card had the max.
🔹 Pseudocode:
Pseudocode Cheat Sheet: Iteration, Filtering, Sum, and Max
WEEK-3
Lecture 3.1 – Presentation of Datasets in the
Form of a Table
🎯 Objective
To learn how to represent structured data—originally stored in cards—as tables for easier
processing, especially by programs and algorithms.
📇 1. Card Format → Table Format
Each card contains:
A fixed number of fields/attributes
o Example (Grade Card):
Card ID, Name, Gender, Date of Birth, City, Maths, Physics,
Chemistry, Total
🔄 Transformation:
Each card becomes a row
Each attribute becomes a column
✅ This makes it:
Easier to scan and compare
Easy for programs to access specific data via row and column labels
A single data structure to process and pass to functions
📝 2. Example: Student Grade Table
Card ID Name Gender DOB City Maths Physics Chemistry Total
10 Kavya F 21/01/2003 Chennai 87 72 88 247
17 Siddharth M ... Chennai ... ... ... ...
Easy to extract Maths marks of student with Card ID = 10.
✍️3. Word List Example (from Paragraph)
Attributes per word:
Word ID, Word, Part of Speech, Letter Count
ID Word Type Length
01 reluctant adjective 9
02 considered verb 10
🧾 4. Complex Case: Shopping Bill Data
🔸 Fixed fields (1 per bill):
Shop Name, Bill ID, Customer Name, Total
🔸 Variable fields (many per bill):
Item, Quantity, Price, Cost
⚠️Problem:
Each bill has different number of items.
o Akshaya’s bill: 5 items → 5 rows
o Srivatsan’s bill: 7 items → 7 rows
📊 5. Solution 1: Flattened Table (One table with duplicate
rows)
Bill ID Shop Name Customer Name Total Item Qty Price Cost
B001 SV Store Akshaya 500 Soap 2 20 40
B001 SV Store Akshaya 500 Shampoo 1 100 100
... ... ... ... ... ... ... ...
📌 Pros:
Works with standard table formats.
Easy to scan/filter.
📌 Cons:
Duplicate data for shop/customer/total across rows.
🧠 6. Solution 2: Split into Two Tables
Table 1: Bill Summary
Bill ID Shop Name Customer Name Total
B001 SV Store Akshaya 500
Table 2: Item Details
Bill ID Item Qty Price Cost
B001 Soap 2 20 40
B001 Shampoo 1 100 100
... ... ... ... ...
📌 Benefit:
No redundancy.
Efficient for large datasets.
Tables are linked via Bill ID.
Lecture 3.2 – Below Average Students in
Two Iterations & Grade Allocation
🎯 Objectives
1. Identify students scoring below average in Maths.
2. Classify students into grades A, B, C, D based on marks.
📌 Part 1: Finding Below-Average Students
🔍 Why Not During Iteration?
If you try filtering while computing the average (in one pass), the average keeps
changing, making earlier decisions invalid.
Cards initially "below average" may no longer be below if average increases later, and
vice versa.
Result: Inefficient and error-prone process requiring repeated re-checking.
✅ Two-Iteration Approach (Better)
🔹 Iteration 1: Calculate Average
Track:
o sum = 0
o count = 0
For each card:
o Add marks to sum, increment count
At the end:
average=sum/count
Example Result:
Sum = 2171, Count = 30 → Average = 72.36
🔹 Iteration 2: Filter Below Average
For each card:
o If marks < average, mark as below average
Total: 15 students below average (exactly half, but this is coincidental)
🧠 Important Insight:
Average doesn’t always split data evenly.
If one student scores very high, many others can fall below average, even with similar
marks.
📌 Part 2: Grade Allocation – A, B, C, D
🎯 Goal:
Divide students into 4 performance bands based on marks:
o A – Highest
o D – Lowest
🔹 Step 1: Find min and max Marks
Similar to how max is tracked:
o max = -1, min = 101 (initial values)
For each card:
o Update max if mark > current max
o Update min if mark < current min
Result:
min = 42, max = 97, so range = 56
🔹 Step 2: Divide Range into 4 Equal Bands
Grade Marks Range
D 42–55
C 56–69
B 70–83
A 84–97
Each band is 14 marks wide (56 ÷ 4)
🔹 Step 3: Classify Each Card
Iterate through all cards again
Check mark and assign grade according to band
📊 Grade Distribution Example:
A: 8 students
B: 10 students
C: 9 students
D: 3 students
Shows most students performed well, many in B or higher.
Key Takeaways
Concept Insight
Average filter Requires full pass before filtering
Change during processing, so filtering must
Dynamic thresholds (e.g., average)
happen after
Grades from range Need min and max, then derive grade bands
First pass → compute summary values; second
Two-pass processing
pass → filtering/classify
🔄 Summary: Two Iteration Pattern
Task Step 1 Step 2
Select cards with marks
Below-average filter Compute total & average
< average
Assign cards to grade
Grade classification Compute min & max → bands
bands
Here is the pseudocode for both parts of Lecture 3.2:
1. Finding students below average
2. Allocating grades (A/B/C/D) based on mark ranges
✅ Part 1: Pseudocode to Find Below-Average Students
plaintext
CopyEdit
// Step 1: Calculate average
sum = 0
count = 0
while (Pile 1 has more cards) {
pick up card X
move card X to Pile 2
sum = sum + [Link]
count = count + 1
}
average = sum / count
// Step 2: Filter students below average
belowAverageList = []
while (Pile 2 has more cards) {
pick up card X
move card X to Pile 3
if ([Link] < average) {
append [Link] to belowAverageList
}
}
✅ Part 2: Pseudocode to Allocate Grades A, B, C, D
plaintext
CopyEdit
// Step 1: Find min and max marks
minM = 101 // marks are out of 100
maxM = -1
while (Pile 3 has more cards) {
pick up card X
move card X to Pile 4
if ([Link] < minM) {
minM = [Link]
}
if ([Link] > maxM) {
maxM = [Link]
}
}
// Step 2: Define grade bands
range = maxM - minM + 1
bandWidth = range / 4
// Grade thresholds
D_min = minM
D_max = D_min + bandWidth - 1
C_min = D_max + 1
C_max = C_min + bandWidth - 1
B_min = C_max + 1
B_max = B_min + bandWidth - 1
A_min = B_max + 1
A_max = maxM
// Step 3: Classify into grades
gradeA = []
gradeB = []
gradeC = []
gradeD = []
while (Pile 4 has more cards) {
pick up card X
move card X to Pile 5
if ([Link] >= A_min and [Link] <= A_max) {
append [Link] to gradeA
} else if ([Link] >= B_min and [Link] <= B_max) {
append [Link] to gradeB
} else if ([Link] >= C_min and [Link] <= C_max) {
append [Link] to gradeC
} else {
append [Link] to gradeD
}
}
🧠 Key Variables
Variable Meaning
sum Total marks accumulated
count Number of students/cards
average Mean maths marks
minM Minimum maths mark
Variable Meaning
maxM Maximum maths mark
bandWidth Width of each grade band
gradeX
Lists for student IDs in
each grade
Lecture 3.3 – Hypothesis Verification: Word
Length and Frequency
🎯 Objective
To verify a hypothesis:
“Frequent words tend to be shorter than infrequent words.”
🧪 Approach to Hypothesis Verification
Step 1: Define the two variables
Frequency: Words occurring more than once = frequent.
Length: Based on letter count, split words into short and long.
Step 2: Compute Average Word Length
Add up all letter counts of 65 words → total = 329 letters.
Average word length = 329 ÷ 65 = ~5.06 → rounded to 5.
So:
Short = 5 letters or fewer
Long = 6 letters or more
🔢 Step 3: Classify Words into 4 Quadrants
Category Description
✅ Conform: short & frequent Hypothesis-supporting
❌ Refute: long & frequent Violates hypothesis
✅ Conform: long & infrequent Hypothesis-supporting
❌ Refute: short & infrequent Violates hypothesis
📊 Results
Type of Word Count Interpretation
Frequent & Short 8 ✅ Supports hypothesis
Frequent & Long 1 ❌ (e.g., “Monday”)
Infrequent & Long 23 ✅ Supports hypothesis
Infrequent & Short 15 ❌ Opposes hypothesis
Total Words Considered: 47
Conforming cases: 31 (~66%)
Refuting cases: 16 (~34%)
🧠 Conclusion
Two-thirds of the words confirm the hypothesis.
Especially strong support: 8 of 9 frequent words are short.
Moderate support: ~60% of infrequent words are long.
✅ Observation is statistically suggestive, though not absolute.
✅ Lecture 3.4 – Three Prizes Problem
🎯 Problem
Select 3 students for prizes, such that:
1. All 3 are in top 3 overall (total marks)
2. Each has a top 3 position in at least one subject
3. There is gender diversity (at least one boy and one girl)
🔁 Step 1: Find Top 3 Overall
Use sliding top-3 logic:
o Track top1, top2, top3 based on total marks.
o Compare incoming cards with the current top3
If higher → update positions and shift others.
Handle ties by keeping extra cards (e.g., multiple students with same marks).
📝 Example Finalists:
Rahul (Boy)
Deepika (Girl)
Abirami (Girl)
✅ Mix of genders achieved.
📘 Step 2: Verify Subject Criteria
For each of the 3 finalists:
Check if any one subject (Maths, Physics, Chemistry) has their mark in the top 3.
Use same “top-3” logic subject-wise.
📌 Result:
All 3 finalists had at least one subject where they were in the top 3.
❓ What If...? (Edge Cases)
If top 3 all from same gender → may expand selection to include next highest with
opposite gender.
If none are in top 3 in any subject → might need to adjust criteria.
If tie in marks → use total or other subject as tie-breaker.
✅ Conclusion
Choosing "top performers" with multiple fairness constraints requires:
o Iterative filtering
o Careful tracking of rankings
o Handling special cases (ties, diversity)
✅ Final result must be defensible with clear criteria.
Step 2: Classify Words into 4 Groups
// Assume frequency information is already available
while (more word cards) {
pick up card W
move W to final pile
if ([Link] > 1) {
if (W.letter_count <= average_length) {
add W to Frequent_Short
} else {
add W to Frequent_Long
} else {
if (W.letter_count <= average_length) {
add W to Infrequent_Short
} else {
add W to Infrequent_Long
}
Goal:
Choose 3 students satisfying:
1. Top 3 in total marks
2. Top 3 in at least one subject
3. Include at least one boy and one girl
Step 1: Track Top 3 by Total Marks
Lecture 3.5 – Introduction to Procedures
and Parameters
🎯 Objective
To understand:
What is a procedure?
What is a parameter?
How do we reuse a process for multiple similar tasks?
🧠 Motivation: Repetitive Task – Finding Maximum
Suppose we want to:
Find the maximum marks in Maths, Physics, Chemistry, and Total.
🔄 What changes each time?
Only the field/column being examined (Maths, Physics, etc.)
🔁 What stays the same?
The process:
Iterate over cards, compare with current max, update if higher.
🧩 Analogy: Subcontracting a Task
Imagine:
You delegate the “find max” task to someone else.
You give them:
o A set of cards (the data)
o A field to focus on (Maths, Physics, etc.)
They:
Perform the max-finding logic.
Return:
o The result (e.g., max = 97)
o The original cards (possibly in different order, but not damaged)
🔧 What Is a Procedure?
A procedure is a named block of logic that:
Performs a specific task (like finding max)
Can be reused multiple times
🧠 Think of it like a function or method in programming.
📦 What Is a Parameter?
A parameter is a piece of input information you provide to a procedure to customize its
behavior.
🔹 Example:
plaintext
CopyEdit
max_of("Maths") → returns 97
max_of("Physics") → returns 92
max_of("Chemistry") → returns 97
max_of("Total") → returns 281
max_of is the procedure name
"Maths", "Physics", etc. are parameters
📐 Why Use Procedures?
Avoid repeating the same code/logic.
Easily adjust to new tasks by just changing the parameter.
Promotes modular thinking and code reuse.
🧮 Example Use Case in Class
We calculated:
Maximum marks in each subject:
Maths = 97, Physics = 92, Chemistry = 97
Total of max subject marks: 97 + 92 + 97 = 286
Actual highest total by any student: 281
➡️The best total is less than the sum of bests → no one topped all subjects.
🧠 Key Concepts Recap
Concept Description
Procedure A reusable set of instructions (e.g., max_of)
Parameter Variable input that changes behavior (e.g., subject name)
Return Value The output/result from the procedure (e.g., max mark)
Delegation One module gives data + task to another, gets result back
✅ Summary
A procedure helps modularize and reuse logic.
Parameters help make procedures flexible.
Delegation of tasks using procedures is powerful and efficient in programming and
computational thinking.
procedure max_of(field)
max = 0
while (more cards) {
pick up card X
move X to next pile
value = X[field] // access field dynamically (e.g., [Link], [Link])
if (value > max) {
max = value
return max
end procedure
📌 How to Use the Procedure
You can call it like this:
plaintext
Copy
Edit
math_max = max_of("Maths")
physics_max = max_of("Physics")
chemistry_max = max_of("Chemistry")
total_max = max_of("Total")
Lecture 3.6 – Pseudocode for Procedures and Parameters (Part 1)
🎯 Objective
Understand:
How to write pseudocode using procedures
The role of parameters to generalize and reuse code
How to modularize repetitive logic
🔁 Motivation: Repeated Computations
🔹 Task:
Calculate:
Sum of boys’ Maths marks
Sum of girls’ Maths marks
These require almost the same code, except:
Gender == "M" vs. Gender == "F"
✅ Key Insight:
Only the checked gender changes, so we can turn this into a parameter.
🧾 Generalized Procedure: sum_marks(gen)
Pseudocode:
Expanding Further: Add Field Parameter
Now generalize the field too, not just the gender.
➕ New Goal:
Add any subject's marks for any gender, e.g.:
Girls’ Physics total
Boys’ Chemistry total
📦 New Procedure: sum_marks(gen, field)
Pseudocode:
plaintext
CopyEdit
procedure sum_marks(gen, field)
sum = 0
while (more cards) {
pick up card X
move X to next pile
if ([Link] == gen) {
sum = sum + X[field]
}
}
return sum
end procedure
🔹 Usage Examples:
plaintext
CopyEdit
GirlChemSum = sum_marks("F", "Chemistry")
BoyPhysSum = sum_marks("M", "Physics")
TotalGirls = sum_marks("F", "Total")
🔧 Two Types of Parameters
Type Description Example
Value Used for comparisons "F", "M"
Field name Used to access a data field dynamically "Maths", "Physics"
⚙️Procedure Call Styles
✅ With Return Value (Assigned):
plaintext
CopyEdit
GirlChem = sum_marks("F", "Chemistry")
✅ Without Return Value (Action only):
Used when procedure modifies data, not computes value (e.g., correcting marks).
plaintext
CopyEdit
update_marks(CardID, "Maths", 92)
Lecture 3.7 – Pseudocode for Procedures and Parameters (Part 2)
🎯 Objective
To demonstrate how to:
Write and call procedures in pseudocode
Use parameters to vary behavior
Build modular, reusable code
🧠 Use Case: Finding the Best Overall Student
Question:
Is there a single student who topped in every subject?
Method:
Compare:
1. Max total across students
2. Sum of max marks in individual subjects (Maths + Physics + Chemistry)
If:
Max Total == Max(Maths) + Max(Physics) + Max(Chemistry)
→ One student topped all subjects
Procedure: max_marks(field)
Purpose:
Find the maximum value in a specified field (e.g., Maths, Physics).
🔹 Pseudocode:
plaintext
CopyEdit
procedure max_marks(field)
maxval = 0
while (more cards) {
pick up card X
move X to next pile
if (X[field] > maxval) {
maxval = X[field]
}
}
return maxval
end procedure
field is the parameter indicating which subject to check.
Logic is identical regardless of the field; only the field changes.
🧾 Using the Procedure
plaintext
CopyEdit
mathMax = max_marks("Maths")
physicsMax = max_marks("Physics")
chemistryMax = max_marks("Chemistry")
totalMax = max_marks("Total")
subjectMaxSum = mathMax + physicsMax + chemistryMax
if (totalMax == subjectMaxSum) {
singleTopper = true
} else {
singleTopper = false
}
Store the return values of each procedure call in a variable.
Use them in expressions and conditional checks.
🧩 Modularization Benefits
Advantage Explanation
Avoids repetition One procedure works for all fields (Maths, Physics, etc.)
Easy updates Change logic in one place (procedure), apply everywhere
Improves readability Clear, concise logic through function-style abstraction
Fewer errors No copy-paste bugs across repeated code
🧠 Summary: Key Concepts
Concept Explanation
Procedure A named, reusable block of code
Parameter Customizes the procedure (e.g., field to check like “Maths”)
Return value The result computed by the procedure (e.g., max marks)
Procedure call Triggers computation and optionally stores result
Modular design Makes code maintainable and scalable
Lecture 3.8 – Pseudocode for the Three
Prizes Problem
🎯 Objective
To write complete pseudocode for the "Three Prizes" problem, using:
Procedures
Parameters
Proper handling of:
o Subject toppers
o Gender balance
o Tie situations
📝 Problem Statement
Award 3 prizes to students such that:
1. They are among the top 3 in total marks.
2. They are in the top 3 in at least one subject.
3. The winners include at least one boy and one girl.
🔁 Key Sub-Procedures
🔹 1. top_three_marks(field)
Input: Subject name (e.g., "Maths")
Goal: Find the top 3 marks in that subject.
Only returns the 3rd highest mark because:
We only need to check whether a student scores ≥ third max to qualify.
🔹 2. subject_topper(card, math3, phys3, chem3)
Input: A student card and threshold marks (3rd highest) for each subject
Returns: true if student scores ≥ threshold in any subject
🧾 Main Steps in Pseudocode
🔸 Step 1: Initialize
plaintext
CopyEdit
// Initialize max marks and IDs
max = second = third = 0
maxID = secondID = thirdID = -1
Use -1 as a dummy ID not present in actual data.
🔸 Step 2: Compute Subject Thresholds
plaintext
CopyEdit
math3 = top_three_marks("Maths")
phys3 = top_three_marks("Physics")
chem3 = top_three_marks("Chemistry")
These represent the 3rd highest mark in each subject.
🔸 Step 3: Process Each Card
plaintext
CopyEdit
while (more cards) {
pick card X
if (subject_topper(X, math3, phys3, chem3)) {
T = [Link]
if (T > max) {
third = second; thirdID = secondID
second = max; secondID = maxID
max = T; maxID = [Link]
} else if (T > second) {
third = second; thirdID = secondID
second = T; secondID = [Link]
} else if (T > third) {
third = T; thirdID = [Link]
}
}
}
Only subject toppers are considered for prize eligibility.
Updates both marks and IDs in sync.
🔸 Step 4: Gender Constraint Check
plaintext
CopyEdit
// After finding top 3 IDs, check for gender balance
if (all 3 winners are same gender) {
// Drop third place
// Repeat process until one winner of opposite gender is found
}
Optional loop to re-balance gender mix by discarding and reprocessing if necessary.
Issue: If no student of opposite gender is a subject topper, fairness fails.
⚠️Special Considerations
🔹 Ties
Ties in top marks can distort winner count:
o 3 tied at 1st → all take prizes.
o 2 tied at 2nd → occupy 2 positions.
Could exceed 3 winners unless tie-breaking is handled explicitly.
🔹 Edge Cases
If no girls (or boys) are subject toppers → gender balance can't be achieved.
May require redefining prize criteria if constraints conflict.
🧠 Key Insights
Concept Purpose
top_three_marks Reusable procedure to compute threshold for subjects
subject_topper Efficient check if a student is a subject-wise topper
Modular design Separates concerns: filtering, ranking, fairness
Reuse of logic Shared logic (top 3 updates) adapted with IDs & marks
Boolean functions Can be used directly in if without == true
✅ Summary
A complex selection task is broken into reusable procedures.
Logic includes filters, comparisons, sorting, and validations.
The final pseudocode:
o Is modular
o Can be reused in other ranking-based selection problems
o Accounts for fairness and flexibility
Lecture 3.9 – Side Effects of Procedures
🎯 Objective
To understand:
What side effects are in procedures
When they are acceptable, undesirable, or even intended
How to manage them properly using clear contracts
🧠 What Are Side Effects?
A side effect occurs when a procedure modifies data or state outside of its intended return
value.
⚠️For example:
A procedure that modifies the order of a deck of cards passed to it—even though the main
task was just to compute a sum.
🔧 Example: Summing Marks by Gender
Suppose we use a procedure like:
plaintext
CopyEdit
sum_marks(gender, field)
🔹 Assumptions:
It processes cards from a deck (e.g., Pile 1)
It moves cards to another pile (seen pile)
At the end, original deck is empty
This is a side effect—the input deck was modified.
🔄 Restoration Problem
If we want the cards to remain unchanged after the procedure:
Cards must be moved back
But even then, the order may be reversed
Example:
Initial order: 1 → 2 → 3 → 4
After processing: 4 → 3 → 2 → 1 (reversed)
This change is a side effect in card sequence (not just content).
🔍 When Do Side Effects Matter?
Side Effect
Scenario Why?
OK?
Sum of marks ✅ Yes Order doesn't affect the sum
Sorting cards by marks ✅ Yes Changing order is the goal
Pronoun resolution in Word sequence is critical to correct
❌ No
text interpretation
Contract-Based Thinking (Interface vs Implementation)
When you write a procedure:
You are creating a contract between:
The caller (who gives input)
The procedure (which processes and returns output)
🔹 Contract must specify:
1. Inputs (parameters): What you will give
2. Output: What you expect in return
3. Side effects: What changes (if any) the procedure will make to inputs or environment
📦 Interface vs Implementation
Part Purpose
Interface What the procedure accepts and returns (visible to the caller)
Implementation Internal steps taken (not the caller’s concern if contract is followed)
✅ If the interface remains the same, you can improve or replace the implementation later.
💡 Key Takeaway Examples
Should Preserve Input Side Effect
Task Notes
Order? Acceptable?
Order doesn’t matter for
Summing marks No ✅ Yes
summing
Sorting by total No ✅ Yes Side effect is the intended
Should Preserve Input Side Effect
Task Notes
Order? Acceptable?
marks output
Pronoun Word order is critical for
✅ Yes ❌ No
resolution correctness
✅ Summary
Side effects are changes made by procedures outside their return values
Sometimes they are:
o ✅ Useful (e.g., sorting)
o ⚠️Harmless (e.g., summing marks)
o ❌ Harmful (e.g., when order matters)
Always define a clear contract:
o Inputs
o Expected outputs
o Allowed side effects (if any)
Good procedures respect data integrity unless modification is explicitly intended