
ASSIGNMENT

SESSION: FEBRUARY - MARCH 2024

PROGRAM: MASTER OF BUSINESS ADMINISTRATION (MBA)
SEMESTER: IV
COURSE CODE & NAME: DADS402 - UNSTRUCTURED DATA ANALYSIS
NAME: NAMRITA MISHRA
ROLL NUMBER: 2214511886

Assignment Set – 1
Question-1 (a) List down a few differences between structured and unstructured data.
(b) What is the difference between Text and Big data?
Answer- 1 (a) Differences between Structured and Unstructured Data

Structured Data:

1. Format: Structured data is organized in a predefined format, typically in rows and
columns. Examples include databases, spreadsheets, and tables.
2. Storage: It is stored in relational databases (SQL databases) where data relationships
are well-defined.
3. Ease of Analysis: Structured data is easier to analyze using traditional data processing
techniques and tools due to its organized nature.
4. Schema: Structured data relies on a fixed schema that defines the data types,
relationships, and constraints.
5. Examples: Customer information in CRM systems, transaction records, inventory
data, and financial data.

Unstructured Data:

1. Format: Unstructured data lacks a predefined format or organizational structure. It
can include text, images, videos, and audio files.
2. Storage: It is stored in non-relational databases (NoSQL databases) or data lakes,
which are designed to handle diverse data types.
3. Ease of Analysis: Analyzing unstructured data is more complex and often requires
advanced techniques such as natural language processing (NLP), machine learning,
and image recognition.
4. Schema: Unstructured data does not have a fixed schema, making it more flexible but
also more challenging to manage.
5. Examples: Emails, social media posts, documents, multimedia content, and sensor
data from IoT devices.

(b) Difference between Text and Big Data

Text Data:
1. Definition: Text data refers to data that is in textual form, including written or printed
words. It is typically unstructured and can be found in documents, emails, social
media posts, and web pages.
2. Volume: The volume of text data can vary from small to large datasets, but it does not
necessarily encompass the vast scale associated with big data.
3. Analysis Techniques: Text data analysis involves techniques like text mining, natural
language processing (NLP), sentiment analysis, and keyword extraction.
4. Sources: Text data is generated from sources such as books, articles, emails, social
media, and chat logs.
5. Tools: Tools for text data analysis include NLP libraries (like NLTK and SpaCy), text
mining software, and sentiment analysis tools.

Big Data:

1. Definition: Big data refers to extremely large and complex datasets that cannot be
easily managed or processed with traditional data processing tools. It encompasses
structured, unstructured, and semi-structured data.
2. Volume: Big data involves vast volumes of data that are continuously generated at
high velocity. It is characterized by the three Vs: Volume, Velocity, and Variety.
3. Analysis Techniques: Analyzing big data requires advanced analytics, including
machine learning, artificial intelligence, distributed computing, and big data
frameworks like Hadoop and Spark.
4. Sources: Big data sources are diverse and include transactional systems, social media,
IoT devices, sensors, logs, and multimedia content.
5. Tools: Tools for big data processing and analysis include Hadoop, Spark, NoSQL
databases (like MongoDB and Cassandra), data warehouses, and big data analytics
platforms (like Apache Flink and Google BigQuery).

Question-2 (a) What is a word cloud? What are some libraries that you need to import to
create a word cloud in Python?

(b) What is a naive Bayes classifier and how does it work in text classification?

Answer-2 Word Cloud: A word cloud is a visual representation of text data where the size
of each word indicates its frequency or importance within the text. It helps in identifying the
most prominent terms in a body of text and can be a powerful tool for data visualization and
text analysis. Words that appear more frequently in the source text are displayed in larger
fonts, while less frequent words are shown in smaller fonts.

Libraries for Creating Word Clouds in Python: To create a word cloud in Python, you
typically need to import the following libraries:

1. wordcloud: This is the primary library for generating word clouds. It provides
functions to create and customize word clouds from text data.
2. matplotlib: This library is used for plotting and visualizing the word cloud.
3. numpy (optional): Useful for handling arrays and numerical data, often used for
image masking when creating custom-shaped word clouds.
4. PIL (Python Imaging Library) or its fork Pillow: Used for image manipulation,
such as creating masks or adding color to the word cloud.
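
As a small illustrative sketch of how these libraries fit together (the sample text, figure size, and styling below are arbitrary choices made for the example):

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Sample text; in practice this would come from documents, reviews, or posts.
text = "data analysis text mining data visualization data science machine learning data"

# Generate the word cloud; more frequent words are drawn in larger fonts.
wc = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")  # hide the axes for a cleaner visual
plt.show()
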
(b) Naive Bayes Classifier and Its Working in Text Classification

Naive Bayes Classifier: The Naive Bayes classifier is a probabilistic machine learning
algorithm based on Bayes' Theorem, used primarily for classification tasks. It is called
"naive" because it assumes that the features (in this case, words) are independent of each
other, which is rarely true in real-world data but simplifies the computation significantly.

How Naive Bayes Works in Text Classification: In text classification, the Naive Bayes
classifier is commonly used due to its simplicity and effectiveness. It works by calculating
the probability of each category (or class) given the words in a document. The category with
the highest probability is then assigned to the document.

Steps Involved:

1. Training Phase:
o Calculate Prior Probabilities: Determine the prior probability of each class
based on the training data. This is the probability of any document belonging
to a specific class.
o Calculate Likelihoods: For each word in the vocabulary, calculate the
likelihood of that word given each class. This involves counting the frequency
of each word in documents of a particular class and normalizing it by the total
number of words in that class.

2. Prediction Phase:
o Calculate Posterior Probabilities: For a new document, calculate the
posterior probability for each class by combining the prior probabilities and
the likelihoods of the words in the document using Bayes' Theorem.
o Class Assignment: Assign the class with the highest posterior probability to
the document.

Mathematical Representation: By Bayes' Theorem,

P(c|d) = [P(d|c) · P(c)] / P(d)

Where:

 P(c|d) is the posterior probability of class c given document d.
 P(d|c) is the likelihood of document d given class c.
 P(c) is the prior probability of class c.
 P(d) is the probability of document d (a normalizing constant).

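The training and prediction phases above can be sketched with scikit-learn's MultinomialNB; the toy documents and labels below are invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy labeled documents (invented examples).
docs = ["great product, loved it", "terrible service, very bad",
        "excellent quality and fast delivery", "bad experience, would not recommend"]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words counts supply the word frequencies used for the likelihoods P(word|class).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Training phase: fit estimates the priors P(c) and the word likelihoods.
clf = MultinomialNB()
clf.fit(X, labels)

# Prediction phase: the class with the highest posterior probability is returned.
print(clf.predict(vectorizer.transform(["loved the fast delivery"])))
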
Conclusion

Word clouds are a simple yet effective way to visualize the frequency of words in a text,
using libraries like wordcloud and matplotlib in Python. The Naive Bayes classifier, on the
other hand, is a powerful algorithm for text classification that leverages the principles of
Bayes' Theorem to predict the likelihood of different classes based on the occurrence of
words in documents. Despite its simplicity and the assumption of feature independence, it
performs remarkably well in many text classification tasks.

Question- 3 (a) What is the Machine Learning approach in sentiment analysis?


(b) What are some applications of topic modeling?

Answer- 3 (a) Machine Learning Approach in Sentiment Analysis

Sentiment Analysis: Sentiment analysis, also known as opinion mining, is the process of
using natural language processing (NLP), text analysis, and computational linguistics to
identify and extract subjective information from text. The goal is to determine the sentiment
expressed in a text, whether it is positive, negative, or neutral.

Machine Learning Approach: The machine learning approach to sentiment analysis
involves training algorithms on labeled datasets where the sentiment is predefined. These
algorithms learn patterns and features associated with different sentiments and apply this
knowledge to new, unseen data.

Steps Involved:

1. Data Collection:
o Gather a large corpus of text data from sources such as social media, reviews,
blogs, and forums. This data should be labeled with sentiments (e.g., positive,
negative, neutral).

2. Data Preprocessing:
o Text Cleaning: Remove noise such as punctuation, special characters, and
stop words.
o Tokenization: Split text into individual words or tokens.
o Normalization: Convert text to lowercase, stem or lemmatize words to their
base forms.

3. Feature Extraction:
o Bag of Words (BoW): Represent text by the frequency of words appearing in
the document.
o TF-IDF (Term Frequency-Inverse Document Frequency): Weigh the
importance of words based on their frequency in a document relative to the
entire corpus.
o Word Embeddings: Use pre-trained embeddings like Word2Vec or GloVe to
capture semantic meaning.

4. Model Training:
o Algorithms: Train machine learning models such as Logistic Regression,
Naive Bayes, Support Vector Machines (SVM), or advanced deep learning
models like Recurrent Neural Networks (RNN) and Transformers.
o Training: Split the data into training and testing sets and train the model on
the training data while validating on the test data.

5. Model Evaluation:
o Evaluate the model using metrics like accuracy, precision, recall, F1-score,
and AUC-ROC to assess its performance.

6. Prediction:
o Use the trained model to predict the sentiment of new, unseen text data.
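
A compact sketch of these steps using scikit-learn, with TF-IDF feature extraction and a Logistic Regression classifier (the tiny labeled corpus below is a placeholder; a real project would use thousands of labeled examples):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder labeled data (invented examples).
texts = ["love this phone", "worst purchase ever", "works perfectly fine",
         "completely useless", "amazing battery life", "awful screen quality"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.33, random_state=42)

# Feature extraction (TF-IDF) followed by model training (Logistic Regression).
model = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation with precision, recall, and F1-score; then prediction on new text.
print(classification_report(y_test, model.predict(X_test)))
print(model.predict(["the battery is amazing"]))
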
(b) Applications of Topic Modelling

Topic Modelling: Topic modelling is a type of statistical model used to discover the abstract
topics that occur in a collection of documents. It helps in identifying hidden patterns in the
text and clustering documents based on topics. Latent Dirichlet Allocation (LDA) is one of
the most commonly used algorithms for topic modelling.
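
As a brief illustration, LDA can be fitted with scikit-learn; the four short documents and the choice of two topics below are assumptions made for the example:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny example corpus (invented documents).
docs = ["stock market prices rise as investors buy shares",
        "the team wins the football match in the final minute",
        "investors watch market trends and share prices",
        "the player scores twice and the team celebrates the match"]

# LDA works on word-count features.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Fit a 2-topic model; the number of topics is a modelling choice.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-4:]]
    print(f"Topic {idx}: {top_words}")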

Applications:

1. Content Recommendation:
o Application: Streaming services like Netflix and Spotify use topic modelling
to recommend content to users. By analyzing the topics of movies, TV shows,
or music, they can suggest similar content that matches the user's preferences.
o Example: A user who watches a lot of science fiction movies might get
recommendations for new sci-fi releases based on topic analysis of their
viewing history.

2. Document Classification:
o Application: Topic modelling helps in classifying documents into predefined
categories. This is useful in organizing large repositories of documents, such
as news articles, research papers, and legal documents.
o Example: Classifying news articles into categories like sports, politics,
technology, and entertainment based on the topics they discuss.

3. Trend Analysis:
o Application: Analyzing social media posts, blogs, and news articles over time
to identify emerging trends and public opinions.
o Example: Businesses can use topic modelling to detect changes in consumer
sentiment and preferences, allowing them to adapt their marketing strategies
accordingly.

4. Customer Feedback Analysis:


o Application: Understanding customer feedback from reviews, surveys, and
support tickets to identify common issues and areas for improvement.
o Example: An e-commerce company can use topic modelling to analyze
product reviews and determine the most frequently mentioned problems, such
as delivery delays or product defects.

5. Academic Research:
o Application: Helping researchers organize and analyze large volumes of
academic papers by identifying the main topics of research and clustering
related papers together.
o Example: A researcher studying climate change can use topic modelling to
find and group papers discussing similar subtopics, such as carbon emissions,
renewable energy, and climate policy.

Conclusion:

The machine learning approach to sentiment analysis involves data preprocessing, feature
extraction, model training, and evaluation, allowing for accurate prediction of sentiments in
text data. Topic modelling, on the other hand, has diverse applications across content
recommendation, document classification, trend analysis, customer feedback analysis, and
academic research, enabling organizations and individuals to extract meaningful insights
from large text datasets.

Assignment Set – 2
Question- 4 (a) What is Fast Fourier Transform (FFT)?

(b) What is audio data preprocessing in machine learning?

Answer-4 (a) What is Fast Fourier Transform (FFT)?

The Fast Fourier Transform (FFT) is an efficient algorithm used to compute the Discrete
Fourier Transform (DFT) and its inverse. Fourier Transform is a mathematical technique that
transforms a function of time (or space) into a function of frequency. In the context of digital
signal processing, the DFT converts a sequence of complex numbers into another sequence of
complex numbers, representing the signal in the frequency domain.

Key Concepts:

 Discrete Fourier Transform (DFT): The DFT is defined by the formula
X_k = Σ_{n=0}^{N−1} x_n · e^(−i·2πkn/N),
where N is the number of points, x_n is the time-domain signal, and X_k is the
frequency-domain representation.

 Efficiency: Direct computation of the DFT requires O(N²) operations, where N is the
number of data points. The FFT reduces this complexity to O(N log N), making it much
faster and practical for large datasets.
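
A short NumPy sketch of this idea, using a synthetic 50 Hz sine wave as an assumed test signal:

import numpy as np

fs = 1000                              # sampling rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)            # one second of time samples
x = np.sin(2 * np.pi * 50 * t)         # 50 Hz sine wave in the time domain

# FFT computes the frequency-domain representation X_k of the time-domain signal x_n.
X = np.fft.fft(x)
freqs = np.fft.fftfreq(len(x), d=1 / fs)

# The magnitude spectrum peaks near 50 Hz, recovering the signal's frequency.
peak_freq = freqs[np.argmax(np.abs(X[:len(x) // 2]))]
print(peak_freq)                       # approximately 50.0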

Applications:

 Signal Processing: FFT is widely used in audio, image, and speech processing to
analyze the frequency components of signals.
 Communication Systems: Used in modulation, demodulation, and signal
compression techniques.
 Biomedical Engineering: Analyzing the frequency content of EEG, ECG, and other
biomedical signals.
 Astronomy: Processing radio signals from space.

(b) What is Audio Data Preprocessing in Machine Learning?

Audio data preprocessing is a critical step in preparing raw audio signals for machine
learning tasks. It involves transforming the audio data into a format that can be effectively
used by machine learning algorithms. The primary goals of audio preprocessing are to reduce
noise, extract relevant features, and normalize the data.

Steps in Audio Data Preprocessing:

1. Loading and Resampling:


o Loading: Audio data is typically loaded using libraries like librosa or pydub.
o Resampling: Standardizing the sample rate ensures consistency across
different audio files. Common sample rates are 16 kHz or 44.1 kHz.

2. Noise Reduction:
o Filtering: Applying filters to remove background noise and unwanted
frequencies.
o Spectral Gating: Reducing noise based on the spectral properties of the audio
signal.

3. Segmentation:
o Silence Removal: Cutting out silent sections of the audio to focus on the
meaningful parts.
o Framing: Dividing the audio signal into short frames (e.g., 20-40
milliseconds) for analysis.

4. Feature Extraction:
o Time-Domain Features: Extracting features like zero-crossing rate
(frequency of sign changes) and energy (signal strength).
o Frequency-Domain Features: Using FFT to transform the signal and extract
features like spectral centroid (brightness), spectral bandwidth (spread), and
Mel-Frequency Cepstral Coefficients (MFCCs), which are particularly useful
for speech and audio recognition.
o Temporal Features: Extracting features that capture the changes over time,
such as delta and delta-delta MFCCs.

5. Normalization:
o Scaling: Normalizing the amplitude of the audio signal to a standard range
(e.g., between -1 and 1) to ensure consistency.
o Standardization: Adjusting the mean and variance of features to improve the
performance of machine learning models.

6. Data Augmentation:
o Techniques: Applying transformations like pitch shifting, time stretching, and
adding background noise to increase the diversity of the training data.
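
A brief sketch of a few of these steps with the librosa library (the file name, sample rate, and parameter values below are placeholder assumptions):

import numpy as np
import librosa

# Loading and resampling to a standard 16 kHz rate ("speech.wav" is a placeholder path).
y, sr = librosa.load("speech.wav", sr=16000)

# Silence removal: trim leading and trailing quiet sections.
y_trimmed, _ = librosa.effects.trim(y, top_db=20)

# Normalization: scale the amplitude into the range [-1, 1].
y_norm = y_trimmed / np.max(np.abs(y_trimmed))

# Feature extraction: 13 MFCCs computed over short frames (librosa default framing).
mfccs = librosa.feature.mfcc(y=y_norm, sr=sr, n_mfcc=13)
print(mfccs.shape)   # (13, number_of_frames)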

Applications:

 Speech Recognition: Converting spoken language into text.


 Music Classification: Categorizing music by genre, mood, or artist.
 Speaker Identification: Recognizing individuals based on their voice.
 Sound Event Detection: Identifying specific sounds within an audio clip, such as
sirens or dog barks.

Question- 5 (a) What are the benefits of using histogram equalization?

(b) What is the advantage of using a CNN for image classification?


Answer- 5 (a) Benefits of Using Histogram Equalization

Histogram Equalization: Histogram equalization is a technique in image processing used to
improve the contrast of an image. This method adjusts the intensity distribution of an image
to span a broader range, enhancing its visual quality. The process involves redistributing the
image's histogram so that the output image has a more uniform histogram, which generally
improves the visibility of features in the image.
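
As a small illustration, OpenCV applies this redistribution to a grayscale image with a single call (the file names below are placeholders):

import cv2

# Read the image in grayscale ("input.jpg" is a placeholder path).
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Redistribute pixel intensities so the output histogram is approximately uniform.
equalized = cv2.equalizeHist(img)

# Save the contrast-enhanced result.
cv2.imwrite("equalized.jpg", equalized)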

Benefits:

1. Enhanced Contrast:
o Histogram equalization increases the global contrast of images, especially
when the usable data of the image is represented by close contrast values. By
stretching out the intensity range, it makes the dark regions darker and bright
regions brighter, thus improving the overall visibility.

2. Better Feature Representation:


o By improving the contrast, histogram equalization can make important
features more discernible. This is particularly useful in medical imaging (e.g.,
X-rays, MRI scans) where subtle details need to be highlighted for accurate
diagnosis.

3. Improved Detail Visibility:


o Details in shadowed or highlighted regions become more visible. This is
beneficial in applications like satellite imagery, where enhancing details can
lead to better interpretation of geographical data.

4. Uniform Histogram:
o The process aims to produce a uniform histogram, which means the pixel
intensity values are evenly distributed. This can lead to better performance in
various computer vision tasks since the dynamic range of the pixel values is
maximized.

5. Preprocessing for Further Analysis:


o Histogram equalization is often used as a preprocessing step in image
processing and computer vision tasks. By normalizing the intensity
distribution, it prepares the image for subsequent analysis, such as edge
detection, object recognition, and image segmentation, improving the accuracy
and robustness of these tasks.

(b) Advantages of Using a Convolutional Neural Network (CNN) for Image Classification

Convolutional Neural Networks (CNNs): CNNs are a class of deep neural networks
specifically designed for processing structured grid data, such as images. They have proven
highly effective for various image-related tasks, including image classification, object
detection, and segmentation.

Advantages:

1. Automatic Feature Extraction:


o CNNs automatically learn and extract features from raw image data. Unlike
traditional methods that require manual feature extraction, CNNs use
convolutional layers to learn spatial hierarchies of features, such as edges,
textures, and shapes, through backpropagation.

2. Spatial Hierarchies:
o The convolutional layers in CNNs detect low-level features (e.g., edges and
textures) in the initial layers and higher-level features (e.g., objects and
shapes) in deeper layers. This hierarchical feature extraction is highly effective
for recognizing complex patterns in images.

3. Parameter Sharing:
o Convolutional layers use the same weights (filters) across different regions of
the image, significantly reducing the number of parameters compared to fully
connected layers. This parameter sharing makes CNNs more efficient and less
prone to overfitting, especially when dealing with large images.

4. Translation Invariance:
o CNNs inherently possess translation invariance due to their convolutional and
pooling operations. This means they can recognize objects regardless of their
position in the image. Pooling layers further enhance this property by
downsampling the feature maps, making the network robust to spatial
variations.

5. Reduction in Computational Complexity:


o The local connectivity of convolutional layers reduces the computational
complexity by focusing on small regions of the input image at a time. This
makes CNNs more efficient and scalable to larger and deeper networks,
enabling the processing of high-resolution images.
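
A compact sketch of such a network in Keras; the input resolution (64x64 RGB) and the number of classes (10) are assumptions for the example:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    # Convolution + pooling layers learn local features with shared filter weights.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Flatten the feature maps and classify into the 10 assumed categories.
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()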

Applications:

 Image Classification: Categorizing images into predefined classes, such as
identifying animals, vehicles, or landmarks.
 Object Detection: Identifying and localizing objects within an image, useful in
applications like autonomous driving and security surveillance.
 Image Segmentation: Partitioning an image into meaningful segments, used in
medical imaging and scene understanding.

Question- 6 (a) What are some common techniques used for video classification?

(b) What is the difference between feature extraction and feature selection?

Answer- 6 (a) Common Techniques Used for Video Classification

Video Classification: Video classification involves categorizing video clips into predefined
categories based on their content. This task is more complex than image classification due to
the temporal dimension of videos, requiring methods that can capture both spatial and
temporal information.
Common Techniques:

1. Convolutional Neural Networks (CNNs):


o 2D CNNs: Used to extract spatial features from individual frames of the
video. These networks treat each frame as a separate image, and features are
extracted independently from each frame.
o 3D CNNs: Extend 2D CNNs to the temporal dimension, allowing
simultaneous extraction of spatial and temporal features. 3D convolutions are
applied to a sequence of frames, capturing motion information effectively.
o Example: A 3D CNN could analyze a sequence of frames from a sports video
to classify the type of sport.

2. Recurrent Neural Networks (RNNs):


o LSTM (Long Short-Term Memory): A type of RNN designed to capture
long-term dependencies in sequential data. LSTMs can process frame-level
features extracted by CNNs, learning the temporal dynamics of the video.
o GRU (Gated Recurrent Unit): A simplified version of LSTM, also used to
handle temporal dependencies in video sequences.
o Example: LSTMs can be used to analyze a sequence of actions in a cooking
video to classify the recipe.

3. Two-Stream Networks:
o Spatial Stream: Processes spatial information from video frames using a 2D
CNN.
o Temporal Stream: Captures motion information using optical flow or
temporal differences between frames, often processed by another 2D CNN.
o Fusion: The outputs from both streams are fused to make the final
classification decision.
o Example: Two-stream networks can classify human activities in surveillance
videos by combining appearance and motion information.

4. Transformers:
o Self-Attention Mechanism: Transformers, originally designed for natural
language processing, have been adapted for video classification. They use self-
attention mechanisms to capture relationships between different parts of the
video sequence.
o Example: Vision transformers can process long video sequences by attending
to important frames and actions, classifying videos based on learned
representations.

5. Hybrid Models:
o Combination: These models combine CNNs for spatial feature extraction and
RNNs or transformers for temporal modeling. This approach leverages the
strengths of both types of networks.
o Example: A hybrid model might use a CNN to extract features from frames
and an LSTM to capture temporal dependencies, effectively classifying
complex video content like movie genres.
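
A schematic sketch of a hybrid model of this kind in Keras, where a small per-frame CNN feeds an LSTM; the frame count, resolution, and number of classes are assumed values for illustration:

from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS, NUM_CLASSES = 16, 64, 64, 3, 5  # assumed sizes

# Per-frame spatial feature extractor (a small 2D CNN).
frame_cnn = models.Sequential([
    layers.Input(shape=(HEIGHT, WIDTH, CHANNELS)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
])

# Apply the CNN to every frame, then model temporal dependencies with an LSTM.
model = models.Sequential([
    layers.TimeDistributed(frame_cnn, input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
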
(b) Difference Between Feature Extraction and Feature Selection

Feature Extraction: Feature extraction involves transforming raw data into a set of
meaningful features that can be used for machine learning tasks. The goal is to create new
features that represent the data's important characteristics, often reducing dimensionality
while retaining critical information.

Key Points:

 Transformation: Converts raw data into informative and non-redundant features.


 Dimensionality Reduction: Reduces the number of features by combining or
transforming original features into a smaller set of new features.
 Methods: Includes techniques like Principal Component Analysis (PCA),
Independent Component Analysis (ICA), and autoencoders.
 Example: In image processing, feature extraction might involve extracting edges,
textures, or shapes from images to create a set of descriptive features.

Feature Selection: Feature selection involves selecting a subset of relevant features from the
original dataset. The goal is to improve model performance by removing irrelevant or
redundant features, thus simplifying the model and reducing overfitting.

Key Points:

 Subset Selection: Chooses a subset of original features without transforming them.


 Relevance: Focuses on selecting features that are most relevant to the target variable.
 Methods: Includes techniques like filter methods (e.g., correlation coefficient scores),
wrapper methods (e.g., recursive feature elimination), and embedded methods (e.g.,
regularization techniques like LASSO).
 Example: In a dataset with numerous attributes, feature selection might involve
choosing only the most important attributes, such as age, income, and education level,
for predicting credit risk.
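
A short side-by-side sketch with scikit-learn, using PCA for feature extraction and SelectKBest for feature selection; the synthetic data and the choice of three output features are assumptions:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic dataset: 100 samples, 10 original features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 2, size=100)

# Feature extraction: PCA transforms the 10 original features into 3 new components.
X_extracted = PCA(n_components=3).fit_transform(X)

# Feature selection: keep the 3 original features most associated with the target.
X_selected = SelectKBest(score_func=f_classif, k=3).fit_transform(X, y)

print(X_extracted.shape, X_selected.shape)   # (100, 3) (100, 3)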
