1. a) Download the dataset from
https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the
variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
To view the first 3 rows of the dataset, which of the following commands is used?
sentiment_analysis_data.head(3)
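The loading steps above can be sketched as follows. This is a minimal sketch that assumes the file is tab-separated with the label first and no header row. The four sample rows are made up for illustration and stand in for the downloaded training.txt.

```python
import io
import pandas as pd

# Hypothetical stand-in for training.txt (assumed tab-separated, no header row).
sample = io.StringIO(
    "1\tI love this movie\n"
    "0\tThis was awful\n"
    "1\tReally great acting\n"
    "0\tNot worth watching\n"
)

# For the real file, pass its local path instead of `sample`.
sentiment_analysis_data = pd.read_csv(
    sample, sep="\t", header=None, names=["label", "message"]
)

# head(3) returns the first 3 rows as a new DataFrame.
first_rows = sentiment_analysis_data.head(3)
print(first_rows)
```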
2. In supervised learning, the class labels of the training samples are ____________
known
3. Inverse document frequency is used in the term-document matrix.
True
4. Can we consider sentiment classification as a text classification problem?
yes
5. In document classification, each document has to be converted from full text to a
document vector.
true
6. A technique that depicts classifier performance in a two-dimensional table of
actual versus predicted classes is ___________
Confusion Matrix
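A confusion matrix can be built by counting (actual, predicted) pairs. The sketch below is a minimal pure-Python version. The helper name confusion_matrix and the sample labels are illustrative assumptions, and rows are taken to be the actual class while columns are the predicted class.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested list: rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Made-up labels for illustration (0 = negative, 1 = positive).
actual    = [1, 0, 1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 1]

matrix = confusion_matrix(actual, predicted, labels=[0, 1])
for row in matrix:
    print(row)
```

The diagonal cells hold the correct predictions; the off-diagonal cells hold the two kinds of error.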
7. Which NLP technique uses a lexical knowledge base to obtain the correct base form
of the words?
lemmatization
8. a) Download the dataset from
https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the
variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What does the command sentiment_analysis_data['label'].value_counts() return?
The count of each unique value in the 'label' column
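To check what value_counts() produces, it helps to run it on a tiny frame. The data below is made up for illustration and is not taken from the Kaggle file.

```python
import pandas as pd

# Illustrative data only: three rows labelled 1 and two labelled 0.
sentiment_analysis_data = pd.DataFrame({
    "label": [1, 0, 1, 1, 0],
    "message": ["good", "bad", "great", "nice", "poor"],
})

# value_counts() returns a Series indexed by the distinct labels,
# with the number of rows carrying each label as the values.
label_counts = sentiment_analysis_data["label"].value_counts()
print(label_counts)
```

The same Series is a quick way to inspect class balance, since it shows how many rows fall under each label.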
9. a) Download the dataset from
https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the
variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
What command should be given to tokenize a sentence into words?
from nltk.tokenize import word_tokenize
word_tokens = word_tokenize(sentence)
10. Which numerical statistic is used to identify the importance of a rare word in
a document?
TF-IDF
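The effect of TF-IDF on rare words can be seen in a small hand-rolled version. This sketch assumes the plain variant, tf = count / document length and idf = log(N / df); libraries such as scikit-learn use smoothed variants by default, so their numbers will differ. The helper name tf_idf and the three sample documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

docs = [
    "the movie was great".split(),
    "the movie was dull".split(),
    "the plot was great fun".split(),
]
weights = tf_idf(docs)
print(weights[1])
```

A word that appears in every document ("the") gets weight zero, while a word that appears in only one document ("dull") gets the highest weight in that document.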
11. Which type of cross-validation is used for an imbalanced dataset?
Stratified K-Fold
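With imbalanced labels, a stratified split keeps the class proportions roughly equal in every fold, which a plain split does not guarantee. The sketch below is a minimal pure-Python illustration of the idea; the helper name stratified_folds is hypothetical, and it deals indices round-robin within each class without shuffling, which a real library implementation would normally offer.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, round-robin within each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

# Made-up imbalanced labels: eight samples of class 0, four of class 1.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, k=4)
for fold in folds:
    print(fold, [labels[i] for i in fold])
```

Each fold ends up with the same 2:1 class ratio as the full label list.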
12. Cross-validation causes over-fitting.
False
13. Select the pre-processing technique(s) from the following.
All the options
14. Clustering is supervised classification.
false
15. a) Download the dataset from
https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the
variable 'sentiment_analysis_data'.
b) Give the column names as 'label' and 'message'.
c) Try out the code snippets and answer the questions.
Is there a class imbalance problem in the given data set?
Yes
16. SVM is a _____________
Supervised learning algorithm
17. In a Term Document Matrix (TDM), each row represents ____________
A term
18. Imagine you have just finished training a decision tree for spam classification,
and it is showing abnormally bad performance on both your training and test sets.
Assume that your implementation has no bugs. What could be the reason for this
problem?
All the options
19. Which of the given hyperparameters, when increased, may cause the random forest
to overfit the data?
Depth of Tree
20. In a Document Term Matrix (DTM), each row represents ____________
A document
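The row and column layout of the two matrices is easiest to see on a toy corpus. This is a minimal sketch using raw term counts as cell values (TF-IDF weights could fill the same cells); the three sample documents are made up for illustration.

```python
from collections import Counter

docs = ["spam offer now", "meeting now", "spam spam offer"]
tokenized = [doc.split() for doc in docs]
vocabulary = sorted({term for doc in tokenized for term in doc})

# Document-term matrix: one row per document, one column per term.
dtm = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]

# Term-document matrix: the transpose, one row per term, one column per document.
tdm = [list(column) for column in zip(*dtm)]

print(vocabulary)
print(dtm)
print(tdm)
```

The two matrices hold the same numbers; only the orientation differs.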
21. Email spam data is an example of __________
Unstructured data
22.