Identify the unstructured data from the following Image
What kind of classification is our case study 'Spam Detection'? Binary
Which preprocessing technique is used to remove the most commonly used words? Stopword removal
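A minimal sketch of stopword removal, for illustration only. The `STOPWORDS` set and the `remove_stopwords` helper are hypothetical names; real projects usually take a much longer stopword list from an NLP library.

```python
# Hypothetical, tiny stopword list; a real list would be far longer.
STOPWORDS = {"the", "it", "is", "a", "an", "and", "to", "of"}

def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]

tokens = remove_stopwords("The offer is free and it expires today")
print(tokens)  # ['offer', 'free', 'expires', 'today']
```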
The cross-validation technique evaluates a classifier by dividing the data set into a training set to train
the classifier and a testing set to test it. T
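One way to picture the train/test division is k-fold cross-validation, where each sample lands in the test set exactly once. The sketch below only generates index splits; `k_fold_indices` is a hypothetical helper, not a library function, and it assumes contiguous, unshuffled folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A classifier would be trained on each `train` split and scored on the matching `test` split, and the scores averaged.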
True Negative is when the predicted instance and the actual instance are both positive. F
True Positive is when the predicted instance and the actual instance are both not negative. T
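The four outcomes can be counted directly from actual and predicted labels. This is a small sketch assuming binary labels with 1 as the positive class; `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```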
ITPE
Data Analysis -> Preprocessing -> Model Building -> Predict
A classifier that can compute using numeric as well as categorical values is the Decision Tree Classifier
print(sentiment_analysis_data['label'].unique()) 10
Which of the given hyperparameter(s), when increased, may cause a random forest to overfit the data?
Depth of Tree
Choose the correct sequence for classifier building from the following: Initialize -> Train -> Predict ->
Evaluate
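The four steps can be sketched with a toy classifier that exposes the usual `fit(X, y)` and `predict(X)` methods. `MajorityClassifier` is a hypothetical stand-in that always predicts the most common training label; the tiny data set is made up for illustration.

```python
from collections import Counter

class MajorityClassifier:
    """Toy classifier: always predicts the most common training label."""

    def fit(self, X, y):
        # "Training" here just remembers the majority label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]

X_train = ["win cash now", "meeting at noon", "lunch tomorrow?"]
y_train = ["spam", "ham", "ham"]
X_test = ["free prize", "see you at five"]
y_test = ["spam", "ham"]

clf = MajorityClassifier()           # Initialize
clf.fit(X_train, y_train)            # Train
y_pred = clf.predict(X_test)         # Predict
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)  # Evaluate
print(y_pred, accuracy)  # ['ham', 'ham'] 0.5
```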
Clustering is a supervised classification. False
Classification where each data point is mapped to more than one class is called Multi-Label Classification
To view the first 3 rows of the dataset, which of the following commands is used?
sentiment_analysis_data.head(3)
Imagine you have just finished training a decision tree for spam classification and it is showing abnormally
bad performance on both your training and test sets. Assume that your implementation has no bugs.
What could be the reason for this problem? Your decision tree is too shallow.
Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?
Lemmatization
Which one of the following is not a classification technique? StratifiedShuffleSplit
Supervised learning differs from unsupervised learning in that supervised learning requires labeled data
Model tuning helps to increase the accuracy. True
Identify the stop words from the following: Both "the" and "it"
In a Term Document Matrix (TDM), each row represents a document
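A small sketch of such a matrix built from raw counts, following the rows-as-documents layout stated above. The three example documents are made up, and tokenization is a plain whitespace split.

```python
docs = ["free cash now", "cash prize", "meeting now"]

# Columns: the sorted vocabulary across all documents.
vocab = sorted({word for doc in docs for word in doc.split()})

# Rows: one per document; each cell is the count of that term in the document.
matrix = [[doc.split().count(term) for term in vocab] for doc in docs]

print(vocab)  # ['cash', 'free', 'meeting', 'now', 'prize']
for row in matrix:
    print(row)
```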
TF-IDF is a feature extraction technique. T
Which of the following is not a performance evaluation measure? DecisionTree
Which of the following commands is used to view the dataset size, and what is the value returned?
sentiment_analysis_data.shape, (7086, 3)
What is the purpose of lemmatization? To convert words to a proper base form
Lemmatization offers better precision than stemming. T
The fit(X, y) method is used to train the classifier
What does the command sentiment_analysis_data['label'].value_counts() return? The count of each
unique value in the 'label' column
Can we consider sentiment classification as a text classification problem? T
Inverse Document Frequency is used in a term document matrix. F
Pruning is a technique associated with Decision Tree
Email spam data is an example of Unstructured Data
Select pre-processing techniques from the options: All
High classification accuracy always indicates a good classifier. F
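A made-up imbalanced example shows why: a classifier that never predicts the positive class still scores high accuracy while catching no positives at all.

```python
# 95 negatives (0) and 5 positives (1); the "classifier" always predicts 0.
actual = [0] * 95 + [1] * 5
predicted = [0] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall: fraction of actual positives that were predicted positive.
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(a == 1 for a in actual)

print(accuracy, recall)  # 0.95 0.0
```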
Which type of cross validation is used for imbalanced dataset? Stratified Shuffle Split
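The idea behind stratification is that each split keeps roughly the same class proportions as the full data set. The sketch below does a single stratified shuffle split in plain Python; `stratified_split` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Shuffle indices within each class, then take the same fraction of each for testing."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

labels = ["ham"] * 90 + ["spam"] * 10   # imbalanced: 10% spam
train_idx, test_idx = stratified_split(labels)

spam_in_test = sum(labels[i] == "spam" for i in test_idx)
print(len(train_idx), len(test_idx), spam_in_test)  # 80 20 2
```

The test set keeps the 10% spam share of the full data, which a plain random split does not guarantee.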
Stemming and lemmatization give the same result. F
Which numerical statistic is used to identify the importance of a rare word in a document? TF-IDF
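A toy contrast between the two, under loud assumptions: `toy_stem` strips a few suffixes by rule and is not a real stemming algorithm, and `LEMMA_TABLE` is a hypothetical three-word lookup standing in for a lexical knowledge base.

```python
# Hypothetical lookup table standing in for a lexical knowledge base.
LEMMA_TABLE = {"studies": "study", "better": "good", "mice": "mouse"}

def toy_stem(word):
    """Strip a few common suffixes by rule; the result may not be a real word."""
    for suffix in ("ies", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the table; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)

for w in ["studies", "better", "mice"]:
    print(w, toy_stem(w), toy_lemmatize(w))
# studies stud study
# better better good
# mice mice mouse
```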
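A minimal sketch of the statistic, assuming one common variant: term frequency as count divided by document length, and inverse document frequency as log(N / df). Libraries often apply smoothing, so their numbers will differ. The documents and the `tf_idf` helper are made up for illustration.

```python
import math

docs = [
    "free cash free prize".split(),
    "meeting at noon".split(),
    "cash for the meeting".split(),
]

def tf_idf(term, doc, docs):
    """TF-IDF with tf = count / len(doc) and idf = log(N / df); 0.0 if the term is unseen."""
    df = sum(1 for d in docs if term in d)   # number of documents containing the term
    if df == 0:
        return 0.0
    tf = doc.count(term) / len(doc)
    idf = math.log(len(docs) / df)
    return tf * idf

# "prize" appears in one document, "cash" in two, so "prize" scores higher in docs[0].
rare = tf_idf("prize", docs[0], docs)
common = tf_idf("cash", docs[0], docs)
print(rare > common)  # True
```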