Problem: Counting the number of occurrences of a word and its synonyms in a corpus of text documents.
1. Decomposition
The problem can be broken into two primary sub-problems:
a. Synonym Expansion:
- Look up the keyword in the thesaurus to retrieve its synonyms.
- Expand the keyword into a set of search terms: the keyword plus all of its synonyms.
b. Word Count in Corpus:
- Iterate through each document in the corpus.
- For each document, count occurrences of the keyword and its synonyms.
2. Pattern Recognition
Two primary patterns emerge in the solution:
a. Iterating over collections:
- The corpus contains multiple documents, and the same process of searching for words is applied to each document.
- The thesaurus contains a list of synonyms, and we need to process all synonyms associated with the keyword.
b. Searching and counting:
- Within each document, the process of counting occurrences of the keyword and its synonyms is repeated.
3. Data Abstraction and Representation
The data can be represented as follows:
a. Thesaurus: A dictionary where the key is a word, and the value is a list of synonyms.
Example:
thesaurus = {
    "happy": ["joyful", "content", "pleased"],
    "sad": ["unhappy", "sorrowful", "downcast"]
}
b. Corpus: A list of strings, where each string is a document.
Example:
corpus = [
    "I am very happy and joyful today.",
    "This content is about being happy.",
    "Feeling sad and sorrowful now."
]
c. Keyword: A single string, e.g., "happy".
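One consequence of this representation is worth seeing in code: the dictionary is directional, so a synonym lookup only succeeds for words that appear as keys. The sketch below is illustrative only; the helper name synonyms_of is hypothetical, and lowercasing the lookup key is an assumption not stated in the notes.

```python
# Thesaurus as described above: word -> list of synonyms.
thesaurus = {
    "happy": ["joyful", "content", "pleased"],
    "sad": ["unhappy", "sorrowful", "downcast"],
}


def synonyms_of(word, thesaurus):
    """Return the synonyms listed for `word`, or an empty list if it is not a key.

    Assumption: keys are stored in lowercase, so the lookup lowercases `word`.
    """
    return thesaurus.get(word.lower(), [])


print(synonyms_of("happy", thesaurus))   # the list stored under "happy"
print(synonyms_of("joyful", thesaurus))  # empty: "joyful" is a value, not a key
```

If lookups in both directions are needed, the thesaurus would have to list each synonym as a key as well.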
4. Algorithm
The algorithm for solving the problem is as follows:
a. Input:
- Keyword (string)
- Thesaurus (dictionary mapping each word to a list of its synonyms)
- Corpus (list of text documents)
b. Synonym Expansion:
- Retrieve the list of synonyms for the keyword from the thesaurus.
- Combine the keyword and its synonyms into a single list of "search terms."
c. Word Count in Corpus:
- Initialize a counter to 0.
- For each document in the corpus:
  - Split the document into words, lowercasing them and stripping punctuation so that "happy." matches "happy".
  - For each word in the document:
    - Check if the word is in the list of search terms (keyword + synonyms).
    - If yes, increment the counter.
d. Output:
- Return the counter as the total number of occurrences of the keyword and its synonyms.
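The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names expand_keyword and count_occurrences are hypothetical, matching is case-insensitive, and a "word" is taken to be a run of letters or apostrophes so that trailing punctuation does not block a match.

```python
import re


def expand_keyword(keyword, thesaurus):
    """Step b: combine the keyword and its synonyms into one set of search terms."""
    synonyms = thesaurus.get(keyword.lower(), [])  # no entry -> no synonyms
    return {keyword.lower(), *(s.lower() for s in synonyms)}


def count_occurrences(keyword, thesaurus, corpus):
    """Steps c and d: count every search term across all documents."""
    search_terms = expand_keyword(keyword, thesaurus)
    counter = 0
    for document in corpus:
        # Assumption: lowercase the text and treat runs of letters or
        # apostrophes as words, which drops punctuation such as "happy."
        words = re.findall(r"[a-z']+", document.lower())
        for word in words:
            if word in search_terms:
                counter += 1
    return counter


thesaurus = {
    "happy": ["joyful", "content", "pleased"],
    "sad": ["unhappy", "sorrowful", "downcast"],
}
corpus = [
    "I am very happy and joyful today.",
    "This content is about being happy.",
    "Feeling sad and sorrowful now.",
]

print(count_occurrences("happy", thesaurus, corpus))  # prints 4
```

With the example data, the count for "happy" is 4: "happy" and "joyful" in the first document, "content" and "happy" in the second. Note that "content" in the second document is the noun, not the synonym of "happy"; plain word matching cannot tell the two senses apart.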
5. Real-World Problem Example
A company analyzing customer feedback to assess sentiment might use this algorithm to count positive words.
This step could form the basis for determining a sentiment score for products.
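As a rough illustration of how such counts might feed a score, the sketch below subtracts the number of negative terms from the number of positive terms in each piece of feedback. Everything here is an assumption made for illustration: the function names, the scoring rule, and the two feedback sentences are invented, and the term sets simply reuse the example thesaurus entries for "happy" and "sad".

```python
import re


def count_terms(text, terms):
    """Count how many words in `text` belong to the set `terms`."""
    # Assumption: case-insensitive matching on runs of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for word in words if word in terms)


def sentiment_score(text, positive_terms, negative_terms):
    """Hypothetical score: positive-term count minus negative-term count."""
    return count_terms(text, positive_terms) - count_terms(text, negative_terms)


# Term sets built from the example thesaurus (keyword plus synonyms).
positive_terms = {"happy", "joyful", "content", "pleased"}
negative_terms = {"sad", "unhappy", "sorrowful", "downcast"}

# Made-up feedback, for illustration only.
feedback = [
    "Very pleased with the product, happy overall.",
    "Unhappy with delivery and sad about the damage.",
]

scores = [sentiment_score(text, positive_terms, negative_terms) for text in feedback]
print(scores)  # prints [2, -2]
```

A real sentiment analysis would need more than raw counts, for example handling negation ("not happy"), but the counting step stays the same.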