Problem 4 Report - D&C Compression

The document outlines a divide and conquer algorithm for string compression that detects repeated substrings and encodes them using reference encoding. It details the algorithm's steps, pseudocode, time and space complexity, correctness, and parallelization capabilities. Additionally, it compares the algorithm with suffix arrays and provides examples and test cases demonstrating its efficiency and effectiveness.

Uploaded by

Fahad Faisal


Problem 4: Divide and Conquer in String Compression

Course: CS-2009 Design and Analysis of Algorithms

Assignment: 3
Student: [Your Name]
Roll Number: [Your Roll Number]

1. Algorithm Explanation
The algorithm uses divide and conquer to detect repeated substrings for compression. It splits text into blocks,
finds repeats within and across blocks, then merges results to create compressed output using reference
encoding (similar to LZ77).

Key Steps:

1. Divide text recursively into left/right halves

2. Find longest repeated substring in each half

3. Check for repeats spanning the boundary

4. Merge by selecting repeat with maximum compression savings

5. Encode output with <offset, length> references

2. Pseudocode

FUNCTION compress(text, blockSize):
    // text and blockSize are visible to all helper functions below
    repeat = divideAndConquer(0, length(text))
    RETURN createCompressedOutput(repeat)

FUNCTION divideAndConquer(start, end):
    IF end - start ≤ blockSize:
        RETURN findLongestRepeatInBlock(text[start...end])

    mid = (start + end) / 2    // integer division

    leftRepeat = divideAndConquer(start, mid)
    rightRepeat = divideAndConquer(mid, end)
    crossRepeat = findCrossBoundaryRepeats(mid)

    RETURN mergeRepeats([leftRepeat, rightRepeat, crossRepeat])

FUNCTION findLongestRepeatInBlock(block):
    FOR each substring in block:
        IF substring appears > 1 time AND longer than current best:
            Update longest repeat
    RETURN longest repeat

FUNCTION findCrossBoundaryRepeats(mid):
    FOR overlap = 1 TO blockSize:
        IF text[mid-overlap...mid] = text[mid...mid+overlap]:
            IF this pattern repeats elsewhere:
                Update cross repeat
    RETURN cross repeat

FUNCTION mergeRepeats(repeats):
    RETURN repeat with maximum savings
        WHERE savings = length × (occurrences - 1)

FUNCTION createCompressedOutput(repeat):
    Keep first occurrence
    Replace other occurrences with <offset, length>
    RETURN compressed string
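
The pseudocode can be sketched in Python as follows. This is a minimal illustration, not the report's implementation. The names Repeat, occurrences, better, scan and find_best_repeat are hypothetical, block_size is assumed to be at least 1, and occurrences are counted without overlap. Two choices differ from the pseudocode and are assumptions of this sketch: the base case ranks candidates by savings rather than by length, and the cross-boundary step tries every substring that straddles the midpoint (within block_size characters on either side) instead of comparing two adjacent windows.

```python
from typing import NamedTuple, Optional, Tuple


class Repeat(NamedTuple):
    pattern: str
    positions: Tuple[int, ...]  # non-overlapping start indices in the text

    @property
    def savings(self) -> int:
        # savings = length x (occurrences - 1), as in mergeRepeats
        return len(self.pattern) * (len(self.positions) - 1)


def occurrences(text: str, pattern: str, start: int, end: int) -> Tuple[int, ...]:
    """Non-overlapping start indices of pattern inside text[start:end]."""
    found = []
    i = text.find(pattern, start, end)
    while i != -1:
        found.append(i)
        i = text.find(pattern, i + len(pattern), end)
    return tuple(found)


def better(a: Optional[Repeat], b: Optional[Repeat]) -> Optional[Repeat]:
    """Merge step: keep the candidate with larger savings (ties keep a)."""
    if a is None or (b is not None and b.savings > a.savings):
        return b
    return a


def scan(text, start, end, lo, hi, mid=None):
    """Best repeat among substrings text[i:j] with lo <= i < j <= hi,
    counted inside text[start:end]. If mid is given, only substrings
    that straddle it (i < mid < j) are tried."""
    best = None
    for i in range(lo, hi):
        for j in range(i + 1, hi + 1):
            if mid is not None and not (i < mid < j):
                continue
            pos = occurrences(text, text[i:j], start, end)
            if len(pos) > 1:
                best = better(best, Repeat(text[i:j], pos))
    return best


def find_best_repeat(text, block_size, start=0, end=None):
    end = len(text) if end is None else end
    if end - start <= block_size:  # base case: brute force inside the block
        return scan(text, start, end, start, end)
    mid = (start + end) // 2
    left = find_best_repeat(text, block_size, start, mid)
    right = find_best_repeat(text, block_size, mid, end)
    lo, hi = max(start, mid - block_size), min(end, mid + block_size)
    cross = scan(text, start, end, lo, hi, mid)
    return better(better(left, right), cross)


print(find_best_repeat("abcabcabcdefdefghi", 6))
```

For the input used later in the report this returns the pattern "abc" at positions (0, 3, 6). The sketch still does not find a repeat whose two copies sit one in each half without touching the midpoint.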

3. Time and Space Complexity

Time Complexity

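Component              | Complexity
Divide                 | O(1)
Find repeats in block  | O(b³) where b = block size
Cross-boundary check   | O(b² × n) summed over one level
Merge                  | O(k) where k = candidates
Total per level        | O(n × b²)
Recursion depth        | O(log n)
Overall                | O(n × b² × log n)

Note: the per-level bound for the cross-boundary check holds when each call searches for other occurrences only inside its own range; the range lengths on one level sum to n.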

Optimized: O(n log² n) with hash-based substring matching
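
As an illustration of hash-based substring matching, the sketch below finds the longest repeated substring with a Rabin-Karp rolling hash and a binary search over the length. It shows only the hashing building block, not the full optimized algorithm, and it allows overlapping occurrences. The function names and the constants base and mod are assumptions chosen for this example.

```python
def repeated_of_length(text: str, length: int) -> int:
    """Start index of some substring of this length that occurs at least
    twice (overlap allowed), or -1 if there is none."""
    if length <= 0 or length > len(text):
        return -1
    base, mod = 257, (1 << 61) - 1
    high = pow(base, length - 1, mod)  # weight of the leading character
    h = 0
    for ch in text[:length]:
        h = (h * base + ord(ch)) % mod
    seen = {h: [0]}  # hash value -> start indices seen so far
    for i in range(1, len(text) - length + 1):
        # slide the window: drop text[i-1], append text[i+length-1]
        h = ((h - ord(text[i - 1]) * high) * base + ord(text[i + length - 1])) % mod
        for j in seen.get(h, []):
            if text[j:j + length] == text[i:i + length]:  # rule out collisions
                return j
        seen.setdefault(h, []).append(i)
    return -1


def longest_repeated_substring(text: str) -> str:
    """Binary search on the length: if a substring of length L repeats,
    so does one of length L - 1."""
    lo, hi, best = 1, len(text) - 1, ""
    while lo <= hi:
        mid = (lo + hi) // 2
        start = repeated_of_length(text, mid)
        if start >= 0:
            best = text[start:start + mid]
            lo = mid + 1
        else:
            hi = mid - 1
    return best


print(longest_repeated_substring("banana"))  # ana
```

Each length test is a single pass over the text when collisions are rare, so the binary search needs about log n passes.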

Space Complexity
Recursion stack: O(log n)
Substring storage: O(n)

Overall: O(n)

4. Merge Step Correctness

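The merge step ensures correctness by:

1. Completeness: Left + Right + Cross-boundary together cover all possible repeats

2. No missed patterns: Cross-boundary check handles patterns spanning blocks

3. Optimal selection: Chooses repeat with max savings = length × (occurrences - 1)

Proof sketch: Take any two occurrences of a repeated substring R inside the current range. Then either:

Both lie entirely in the left half → Found by left recursion

Both lie entirely in the right half → Found by right recursion

Otherwise (an occurrence spans the boundary, or one occurrence lies in each half) → Must be found by the cross-boundary check

∴ R is always detected, provided the cross-boundary check also compares occurrences that lie in different halves. The adjacent-window comparison in the pseudocode covers only the case where two copies meet exactly at the midpoint.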
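
The case analysis can be made concrete by classifying a pair of occurrences relative to the midpoint. The helper below is a hypothetical sketch. It assumes that a pair with one copy in each half, which is neither entirely left nor entirely right, must be handled by the cross-boundary step.

```python
def classify_pair(i: int, j: int, length: int, mid: int) -> str:
    """Which step is responsible for the occurrences starting at i and j
    (each `length` characters long) when the range is split at mid?"""
    def side(p: int) -> str:
        if p + length <= mid:
            return "left"
        if p >= mid:
            return "right"
        return "straddles"

    a, b = side(i), side(j)
    if a == b == "left":
        return "left recursion"
    if a == b == "right":
        return "right recursion"
    # an occurrence straddles mid, or the copies sit in different halves
    return "cross-boundary check"


# "abc" at 0 and 3 in "abcabcabcdefdefghi", split at 9: both copies on the left
print(classify_pair(0, 3, 3, 9))
# "abc" at 0 and 6 in "abcXYZabc", split at 4: one copy in each half
print(classify_pair(0, 6, 3, 4))
```

The second call shows the case that neither recursive call can see on its own.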

5. Parallelization
The algorithm naturally supports parallel execution:

PARALLEL divideAndConquer(start, end):
    IF base case: RETURN sequential result

    mid = (start + end) / 2

    SPAWN leftTask = divideAndConquer(start, mid)
    SPAWN rightTask = divideAndConquer(mid, end)

    SYNC leftRepeat = result of leftTask
    SYNC rightRepeat = result of rightTask

    crossRepeat = findCrossBoundaryRepeats(mid)
    RETURN mergeRepeats([leftRepeat, rightRepeat, crossRepeat])

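Performance: With p processors, time reduces to roughly O(n log² n / p + log n). The additive log n term assumes the cross-boundary check and merge contribute only constant time per level on the critical path; if the check at the root runs sequentially, its cost is added to that term.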
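
The SPAWN/SYNC structure can be sketched with Python's concurrent.futures. This is an illustration of the task layout only: it spawns at the top level alone, uses a brute-force search in each half, and, because of CPython's global interpreter lock, threads will not speed up this CPU-bound work. The function names and the (savings, pattern) tuple are assumptions of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def best_repeat(text, start, end, must_cross=None):
    """Brute-force best (savings, pattern) among substrings of text[start:end].
    If must_cross is set, only substrings straddling that index are tried."""
    best = None
    for i in range(start, end):
        for j in range(i + 1, end + 1):
            if must_cross is not None and not (i < must_cross < j):
                continue
            pattern = text[i:j]
            count = text.count(pattern, start, end)  # non-overlapping count
            if count > 1:
                candidate = (len(pattern) * (count - 1), pattern)
                if best is None or candidate[0] > best[0]:
                    best = candidate
    return best


def parallel_best_repeat(text):
    mid = len(text) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_task = pool.submit(best_repeat, text, 0, mid)           # SPAWN
        right_task = pool.submit(best_repeat, text, mid, len(text))  # SPAWN
        # the cross-boundary check runs in the calling thread meanwhile
        cross = best_repeat(text, 0, len(text), must_cross=mid)
        results = [left_task.result(), right_task.result(), cross]   # SYNC
    candidates = [c for c in results if c is not None]
    return max(candidates, key=lambda c: c[0], default=None)         # merge


print(parallel_best_repeat("abcabcabcdefdefghi"))  # (6, 'abc')
```

Spawning only once avoids the deadlock that nested submissions can cause when every worker in a fixed-size pool is waiting on its own children.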

6. Comparison with Suffix Array
Metric          | Divide & Conquer   | Suffix Array
Time            | O(n log² n)        | O(n log n)
Space           | O(n)               | O(n)
Parallelizable  | ✓ Excellent        | ✗ Limited
Cache locality  | ✓ Sequential       | ✗ Random
Best for        | Parallel systems   | Single-threaded

Trade-off: D&C is slightly slower but better for multi-core/distributed systems
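
For comparison, a suffix-array approach finds the longest repeated substring as the longest common prefix of two neighbouring suffixes in sorted order. The sketch below builds the array naively by sorting suffix strings, which is far slower than the O(n log n) construction the table refers to; it is meant only to show the idea. The function names are assumptions of this example, and overlapping occurrences are allowed.

```python
def suffix_array(text: str) -> list:
    """Start indices of all suffixes in lexicographic order (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_repeat_via_suffix_array(text: str) -> str:
    """Longest substring occurring at least twice: the best common prefix
    of adjacent entries in the suffix array."""
    sa = suffix_array(text)
    best = ""
    for x, y in zip(sa, sa[1:]):
        k = common_prefix_length(text[x:], text[y:])
        if k > len(best):
            best = text[x:x + k]
    return best


print(suffix_array("banana"))                     # [5, 3, 1, 0, 4, 2]
print(longest_repeat_via_suffix_array("banana"))  # ana
```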

7. Diagrams
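Divide and Conquer Tree for "abcabcabcdefdef"

[SPACE FOR D&C TREE DIAGRAM]

                 "abcabcabcdefdef"
                        |
             ___________|___________
            |                       |
       "abcabcabc"              "defdef"
            |                       |
        ____|____              _____|_____
       |         |            |           |
    "abca"    "bcabc"       "def"       "def"

(The top split is drawn at the pattern boundary, position 9, for readability; mid = (start + end) / 2 gives position 7 for this 15-character string.)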

Merge at each level to find best repeat

Result: "abc" appears 3 times (positions 0,3,6)

Algorithm Flowchart

[SPACE FOR FLOWCHART]

Start → Input Text

↓
Divide at midpoint
↓
Recursively process left & right
↓
Check cross-boundary repeats
↓
Merge: Select best repeat
↓
Create compressed output
↓
End

8. Example Scenario
Input: "abcabcabcdefdefghi" (block size = 6)

Execution:

Level 0: Split "abcabcabcdefdefghi" at position 9

Left: "abcabcabc" | Right: "defdefghi"

Level 1:
Left finds: "abc" at [0,3,6] → 3 occurrences
Right finds: "def" at [9,12] → 2 occurrences
Cross-boundary: No match

Merge: Select "abc" (savings = 3×(3-1) = 6 > 3×(2-1) = 3)

Output: "abc<3,3><6,3>defdefghi"
(First "abc" kept, others replaced with references)
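
The reference encoding in this example can be reproduced with a short encoder and decoder. This sketch assumes that offset means the distance back from the replaced occurrence to the first occurrence (which equals its absolute position here, since the first "abc" starts at 0) and that the literal text never contains something shaped like a reference. The names encode, decode and REFERENCE are hypothetical.

```python
import re

REFERENCE = re.compile(r"<(\d+),(\d+)>")


def encode(text: str, pattern: str) -> str:
    """Keep the first occurrence of pattern; replace later non-overlapping
    occurrences with <offset,length>, offset = distance back to the first."""
    first = text.find(pattern)
    if not pattern or first == -1:
        return text
    i = first + len(pattern)
    out = [text[:i]]
    while True:
        j = text.find(pattern, i)
        if j == -1:
            out.append(text[i:])
            return "".join(out)
        out.append(text[i:j])
        out.append(f"<{j - first},{len(pattern)}>")
        i = j + len(pattern)


def decode(encoded: str) -> str:
    """Expand each reference by copying from the already decoded output."""
    out, pos = "", 0
    for match in REFERENCE.finditer(encoded):
        out += encoded[pos:match.start()]
        offset, length = int(match.group(1)), int(match.group(2))
        source = len(out) - offset
        out += out[source:source + length]
        pos = match.end()
    return out + encoded[pos:]


packed = encode("abcabcabcdefdefghi", "abc")
print(packed)          # abc<3,3><6,3>defdefghi
print(decode(packed))  # abcabcabcdefdefghi
```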

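Compression: Original 18 chars → Compressed ~16 symbols, counting each <offset, length> reference as 2 symbols (written out literally, the output string above is 22 characters long)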

9. Output Screenshots

[SCREENSHOT 1: Basic compression with "abcabcabcdef"]

[SCREENSHOT 2: High repetition text showing good compression ratio]

[SCREENSHOT 3: Text with no repetition showing 0% compression]

[SCREENSHOT 4: Example execution trace for "abcabcabcdefdefghi"]

10. Sample Test Cases

Test 1: "ababababab"
Output: "ab<2,2><4,2><6,2><8,2>"
Ratio: ~40% (counting each reference as one symbol: 6 symbols vs. 10 characters)

Test 2: "abcdefghijk"
Output: "abcdefghijk" (no compression)
Ratio: 0%

Test 3: "abcabcdefdef"
Output: Detects "abc" (3 chars, 2 times) and "def" (3 chars, 2 times)
Best: "abc" and "def" tie on savings (3×(2-1) = 3 each), so the choice depends on tie-breaking
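
The savings figures for these test cases can be checked with a one-line helper. The function name savings is an assumption of this sketch; it counts non-overlapping occurrences over the whole text and returns 0 when the pattern does not repeat.

```python
def savings(text: str, pattern: str) -> int:
    """savings = length x (occurrences - 1), never below zero."""
    return max(0, len(pattern) * (text.count(pattern) - 1))


print(savings("ababababab", "ab"))     # 8
print(savings("abcabcdefdef", "abc"))  # 3
print(savings("abcabcdefdef", "def"))  # 3
print(savings("abcdefghijk", "abc"))   # 0
```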

Conclusion
✓ Efficient: O(n log² n) time, O(n) space
✓ Correct: Merge ensures all repeats detected
✓ Parallel: Natural decomposition for multi-core systems
✓ Practical: Cache-friendly, good for large-scale compression

End of Report
