DSA - Training
DSA - Training
SET
• Definition
A set is a collection of unique elements, with no duplicates allowed.
In C, a set can be implemented using arrays or hash tables, but arrays will be used for simplicity.
o Time Complexity: O(n) in the worst case, as you might need to check all elements.
if (set->data[i] == element)
return 1;
return 0;
}
Union
void unionSets(int a[], int b[], int sizeA, int sizeB) {
int result[100], k = 0;
for (int i = 0; i < sizeA; i++) result[k++] = a[i];
Goal: Create a new set for (int i = 0; i < sizeB; i++) {
containing all unique
int found = 0;
elements from two sets.
for (int j = 0; j < sizeA; j++) if (b[i] == a[j]) found = 1;
Approach: Add all elements
from both sets, ensuring no if (!found) result[k++] = b[i];
duplicates. }
Time Complexity: O(n + m), printf("Union: "); for (int i = 0; i < k; i++) printf("%d ", result[i]);
where n and m are the sizes
of the two sets printf("\n");
}
Intersection
void intersectionSets(int a[], int b[], int sizeA, int sizeB) {
int result[100], k = 0;
o Goal: Create a new set with for (int i = 0; i < sizeA; i++)
elements common to both sets. for (int j = 0; j < sizeB; j++)
o Approach: For each element in if (a[i] == b[j]) result[k++] = a[i];
the first set, check if it exists in
the second set. printf("Intersection: "); for (int i = 0; i < k; i++) printf("%d ", result[i]);
o Goal: Create a new set with for (int i = 0; i < sizeA; i++) {
elements from the first set that int found = 0;
are not in the second set. for (int j = 0; j < sizeB; j++) if (a[i] == b[j]) found = 1;
o Approach: For each element in
if (!found) result[k++] = a[i];
the first set, add it to the new set
if it's not in the second set. }
o Time Complexity: O(n * m) printf("Difference: "); for (int i = 0; i < k; i++) printf("%d ", result[i]);
printf("\n");
}
Applications
• Approach: Compute the hash of the key, then store the pair in the appropriate location in the hash table.
• Counting Frequencies
Maps are commonly used to count the frequency of elements, such as counting word occurrences in a text.
• Configuration Settings
Maps store configuration settings where keys represent setting names and values represent setting values, allowing quick
lookup and modification.
• Caching
Maps can be used to implement caches where the key is a resource identifier, and the value is the cached data. This allows
for efficient retrieval of frequently accessed data.
• Associative Arrays
Maps function as associative arrays, allowing the storage of pairs of data where one element is the identifier, like storing
usernames (keys) with their associated data (values).
• Storing Metadata
Maps are often used to store metadata about files or database records, where the key is the file or record identifier and the
value is the associated metadata.
Frequency Array
• A Frequency Array is a specialized array used to count the occurrences of elements within a fixed range. Each
index in the array corresponds to a specific element, and the value at that index represents the frequency (or
count) of that element.
• Frequency arrays are particularly efficient for counting occurrences when the range of possible elements is
known and limited (e.g., counting digits `0-9` or lowercase letters `a-z`).
Counting Frequency of Digits in a Number
• The prefix sum allows for efficient calculation of the sum of any subarray, by subtracting the prefix sum at
one index from the prefix sum at another index.
• Construction
1. Initialize the prefix sum array's first element as the original array's first element.
2. Iterate through the original array, and for each element at index `i`, calculate the prefix sum as the sum of the
element at `i` and the prefix sum at `i-1`.
void constructPrefixSum(int arr[], int n, int prefixSum[]) {
3. The formula is:
prefixSum[0] = arr[0];
• prefix_sum[i] = prefix_sum[i-1] + arr[i] for (int i = 1; i < n; i++) {
prefixSum[i] = prefixSum[i - 1] + arr[i];
Using Prefix Sum to Calculate Subarray Sums
void constructPrefixSum(int arr[], int n, int prefixSum[]) {
prefixSum[0] = arr[0];
for (int i = 1; i < n; i++) {
prefixSum[i] = prefixSum[i - 1] + arr[i];
The input array is `{1, 2, 3, 4, 5}`.
}
We calculate the sum of the subarray from index `1` to
`3` (subarray `{2, 3, 4}`), which results in `9`.
}
This is computed using the prefix sum array, allowing
us to find the sum in constant time.
// Function to calculate the sum of elements from index l to r
int rangeSum(int prefixSum[], int l, int r) {
if (l == 0)
return prefixSum[r];
else
return prefixSum[r] - prefixSum[l - 1];
}
Applications
• Essentially, it’s the reverse of the prefix sum, where instead of summing from the start of the array, you sum
from the end toward the beginning.
• 1. Initialize the last element of the suffix sum array as the last element of the original array.
• 2. Iterate through the original array from the second-last element to the first element. For each element at
index `i`, calculate the suffix sum as the sum of the element at `i` and the suffix sum at `i+1`.
Example: If you need to find the maximum sum of a subarray that can be split into two parts, prefix sums can help with
the first part, and suffix sums can help with the second part.
3. Finding Equilibrium Index:
Use Case: An equilibrium index is an index where the sum of elements on the left is equal to the sum of elements on
the right. By precomputing prefix and suffix sums, you can efficiently find such indices.
Example: You can check for each index `i` if `prefix_sum[i-1] == suffix_sum[i+1]`.
Combined Examples: Problem-Solving with Sets, Maps, Frequency Arrays, and Prefix/Suffix Sums
Problem Statement:
Given an array of integers and a target sum, find all pairs of numbers in the array that add up to the target sum.
Approach
Use a set to store the elements as you iterate through the array.
- For each element, check if the difference between the target sum and the current element exists in the set.
- If it exists, you have found a pair; otherwise, add the current element to the set.
- The frequency array `set` acts like a set to track the presence of elements.
- As you iterate through the array, for each element `arr[i]`, you check if `target - arr[i]` is already in the set. If it is,
then the pair `(arr[i], target - arr[i])` sums to the target.
- This approach has a time complexity of O(n) and requires O(n) additional space
Problem 2: Counting Frequency of Elements Using Frequency Arrays
Problem Statement:
Given an array of integers where the range of elements is known (e.g., `0-9`), count the frequency of each element.
Approach:
- Use a frequency array to count occurrences of each element.
- Each index in the frequency array corresponds to an element in the original array.
Problem Statement:
Given an array of integers, find the maximum sum of any contiguous subarray.
Approach:
- Use a prefix sum array to calculate the sum of subarrays efficiently.
- The maximum subarray sum ending at index `i` can be found by comparing all possible subarray sums ending at `i`.
- The prefix sum array is used to efficiently calculate the sum of any subarray.
- The double loop checks all subarrays ending at each index `j`, and the maximum sum found is stored.
- This approach has a time complexity of O(n^2), but it’s more efficient than the naive O(n^3) solution without
prefix sums.
Problem 4: Finding an Equilibrium Index Using Prefix and Suffix Sums
Problem Statement:
Find an index in the array such that the sum of elements on the left is equal to the sum of elements on the right.
Approach:
- Use both prefix and suffix sum arrays.
- The equilibrium index is found where `prefix_sum[i-1] == suffix_sum[i+1]`.
- The prefix sum array sums elements from the start to each index, while the suffix sum array sums from the end to
each index.
- The equilibrium index is found where the sum of elements to the left equals the sum of elements to the right.
- This approach checks each index and returns the first equilibrium index found.
Steps:
class Solution: 1.Convert nums1 and nums2 to sets:
def intersection(self, nums1, nums2): # Note the •The conversion of arrays to sets helps remove any
'self' added here duplicate elements automatically, as a set only stores unique
# Convert both lists to sets to remove duplicates values.
set1 = set(nums1) •This is important because the problem asks for unique
set2 = set(nums2) elements in the result.
2.Find the intersection of the two sets:
•Using the intersection operation (&), we get only the common
# Perform set intersection elements between the two sets.
result = set1 & set2 •This is an efficient operation in Python since it's optimized
for set-based operations.
# Convert the result back to a list before 3.Return the result as a list:
returning •The final result is converted back to a list, since the output
return list(result) format expects a list, not a set.
Time Complexity:
# Example usage: •O(n + m): Here, n is the length of nums1, and m is the length of
nums1 = [4, 9, 5] nums2.
nums2 = [9, 4, 9, 8, 4] •Converting both arrays into sets takes O(n) and O(m) time.
print(intersection(nums1, nums2)) # Output: [9, 4] •The intersection operation between the two sets is O(min(n,
m)).
Hence, the total time complexity is O(n + m), which is very
efficient.
Brute-Force Approach:
1. For each character in the first string, check if it appears in all other strings.
2. Count how many times the character appears in every string, and add it to the result as often as it occurs across all strings.
Steps:
1. Start with the first string.
2. For each character in the first string, count its occurrences in the other strings.
3. If it is present in all other strings, add it to the result.
4. Repeat the process for all characters of the first string.
Consider the input words = ["bella", "label", "roller"]:
1.The first word is "bella". The function starts by examining the characters of "bella"
one by one:
•First character: 'b'. It looks for 'b' in "label" and "roller". It finds 'b' in "label" but
not in "roller", so it moves on.
•Second character: 'e'. It finds 'e' in "label" and "roller", so 'e' is added to the result.
•Third character: 'l'. It finds 'l' in both "label" and "roller", so 'l' is added to the
result.
•Fourth character: 'l'. The second 'l' is also found in both words (using the marked
versions of the words), so another 'l' is added.
•Fifth character: 'a'. It doesn't find 'a' in "roller", so it moves on.
2.The final result is ["e", "l", "l"], which are the common characters in all three words.
vector<string> commonChars(vector<string>& words) { •isCommon flag: This boolean variable is
initialized to true for every character in the
vector<string> result; first word.
•The inner for loop (starting from index 1)
string firstWord = words[0]; // Start with the first word
goes through every other word in the
for (char ch : firstWord) { //Iterate Over Each Character in the First Word words vector to check if the current
character ch from the first word is present
bool isCommon = true; in these words.
// Check if 'ch' is present in all other words •find(ch): It looks for the character ch in
the current word (words[i]). If it doesn't
for (int i = 1; i < words.size(); i++) { find the character (find() returns npos), it
size_t pos = words[i].find(ch); // Check if the character is in the current word sets isCommon = false, meaning this
character is not common across all the
if (pos == string::npos) { words, and it breaks the loop.
•Marking the character: If the character
isCommon = false; is found in the current word, the algorithm
break; replaces that character with a placeholder
(#) to ensure that the same character isn't
} else { reused for future checks. This is important
to correctly handle cases where the
words[i][pos] = '#'; // Mark it as used}}
character appears multiple times.
// If 'ch' is common in all strings, add it to the result
If the character ch is found in all the words, it is
if (isCommon) { added to the result vector as a string. The
result.push_back(string(1, ch)); string(1, ch) part converts the character into a
string format since result stores strings, not
}}return result;}}; characters.
Optimal Approach:
1. Use frequency counting to track the minimum occurrence of
each character across all strings.
2. The characters that have non-zero minimum frequencies in all
words are the common characters.
Steps:
1. Create a frequency array of size 26 (for 'a' to 'z') initialized to
a high value (e.g., `INT_MAX`).
2. For each string, create a local frequency count of its
characters.
3. Update the global frequency array by taking the minimum
occurrence of each character.
4. After processing all strings, construct the result by adding
characters based on their minimum frequency.
Let’s go through an example with input words = ["bella", "label", "roller"]:
1.First word "bella":
•The frequency array freq will count the characters: {'b': 1, 'e': 1, 'l': 2, 'a': 1}.
•minFreq will be updated with these frequencies, so minFreq for 'b', 'e', 'l', and 'a' will be [1,
1, 2, 1].
2.Second word "label":
•The frequency array freq for this word is {'l': 2, 'a': 1, 'b': 1, 'e': 1}.
•minFreq will remain unchanged for 'b', 'e', 'l', and 'a' because the frequencies are the same or
greater in "label".
3.Third word "roller":
•The frequency array freq for this word is {'r': 2, 'o': 1, 'l': 2, 'e': 1}.
•minFreq for 'b' becomes 0 (because 'b' is not present in "roller"). For 'l', it stays 2, for 'e' it
stays 1, and 'a' becomes 0 (as it's missing in "roller").
4.Final Result:
•The function will construct the result based on minFreq, which will look like: {'e': 1, 'l': 2}.
•So, the final result will be ["e", "l", "l"].
vector<string> commonChars(vector<string>& words) {
// Initialize minFreq array with large values (INT_MAX)
vector<int> minFreq(26, INT_MAX);
// Loop through each word
Explanation of Optimal Approach:
for (const string& word : words) { 1. `minFreq` Array: We initialize an array
vector<int> freq(26, 0); // Frequency array for current word
`minFreq` of size 26 (for all lowercase letters) to
store the minimum frequency of each character
// Count frequency of each character in the word
across all strings.
for (char ch : word) {
2. Counting Frequencies: For each string, we
freq[ch - 'a']++;}
count the frequency of each character and update
// Update minFreq to keep track of minimum frequency of each character the `minFreq` array by taking the minimum of the
for (int i = 0; i < 26; i++) { current word's frequency and the previous
minFreq[i] = min(minFreq[i], freq[i]);}}// Prepare the result vector minimum.
vector<string> result; 3. Constructing the Result: After processing all
// For each character, add it to the result as many times as its minFreq value strings, for each character, we append it to the
for (int i = 0; i < 26; i++) { result as many times as it appears across all strings
(as determined by the `minFreq` array).
while (minFreq[i] > 0) {
result.push_back(string(1, 'a' + i)); // Convert index back to char
minFreq[i]--;}}return result;}};
An email consists of a local and domain names, separated by the @ sign. There are two key rules:
1.Periods (.) in the local name should be ignored.
•Example: "[email protected]" and "[email protected]" should be treated as the same email address.
2.Plus signs (+) in the local name: Everything after the first + and before the @ should be ignored.
•Example: "[email protected]" should be considered the same as "[email protected]".
The domain name part remains unchanged and should be used as is.
Approach:
1.Iterate over the list of email addresses.
2.Split each email into the local name and the domain name using the @ character.
3.Handle the local name:
•Remove all periods (.).
•Ignore everything after the first plus sign (+) if present.
4.Concatenate the cleaned local name and the unchanged domain name.
5.Store the processed email in a set (to automatically filter duplicates).
6.Finally, return the size of the set (the number of unique email addresses).
int numUniqueEmails(vector<string>& emails) {
unordered_set<string> uniqueEmails;
for (string email : emails) {
string localName = "";
string domainName = "";
bool ignore = false; // flag to track if we should ignore characters
bool atReached = false; // flag to track if '@' is encountered
The brute-force method is already fairly optimal, but we can make some slight adjustments for readability and
efficiency.
1.String Operations: Instead of handling each character in the local name manually, we can use built-in string
operations like split, replace, and find.
2.Optimization: Using simple string functions makes the code cleaner and easier to understand.
int numUniqueEmails(vector<string>& emails) {
unordered_set<string> uniqueEmails;
return uniqueEmails.size();
}};