BCA Data Visualization and Analysis by Vamshi NM
✅ Chapters to Cover:
1. Fundamentals of Data
2. Relational Data in Azure
3. Non-Relational Data in Azure
4. Introduction to Data Analytics
5. Data Wrangling and Cleaning
6. Exploratory Data Analysis
7. Statistical Methods for Data Analysis
8. Introduction to Data Visualization
9. Machine Learning
10. Tools: Python, R, Excel, Tableau, Power BI
⚙️ PART 1: (Questions 1–30)
Covering: Fundamentals of Data & Relational Data in Azure
Chapter 1: Fundamentals of Data
1. What is data?
A) A type of software
B) Raw facts and figures
C) Processed information
D) None of the above
✅ Answer: B
2. What does ‘Big Data’ refer to?
A) Data stored in hard drives
B) Very small datasets
C) Extremely large and complex datasets
D) Pictures and videos only
✅ Answer: C
3. Which of these is structured data?
A) Image
B) Video
C) Excel sheet
D) MP3 file
✅ Answer: C
4. Which term refers to meaningful information derived from data?
A) Raw data
B) Insight
C) Noise
D) Format
✅ Answer: B
5. What are the main types of data?
A) Structured, Semi-structured, Unstructured
B) Binary and Text
C) Fixed and Variable
D) Numeric and Character
✅ Answer: A
6. What is metadata?
A) Data about data
B) Redundant data
C) Formatted data
D) Numeric data
✅ Answer: A
7. Which of the following is NOT one of the three Vs that characterize Big Data?
A) Volume
B) Velocity
C) Validation
D) Variety
✅ Answer: C
8. The process of discovering patterns in large data sets is called:
A) Data Mining
B) Data Logging
C) Data Entry
D) Data Transfer
✅ Answer: A
9. Which is an example of a semi-structured data format?
A) CSV
B) XML
C) Excel
D) MySQL
✅ Answer: B
10. The term “data lifecycle” refers to:
A) Life of a computer
B) Sequence of stages in data management
C) Hardware operation
D) Software updates
✅ Answer: B
Chapter 2: Relational Data in Azure
11. What is a relational database?
A) Database that stores data in a tree format
B) Stores data in tables with rows and columns
C) A group of unstructured files
D) None of the above
✅ Answer: B
12. Which query language is used for relational databases?
A) NoSQL
B) SQL
C) XML
D) JSON
✅ Answer: B
13. What is Azure SQL Database?
A) A local storage system
B) A Microsoft cloud-based relational database service
C) A type of RAM
D) An operating system
✅ Answer: B
14. What is a primary key in a relational database?
A) A duplicate column
B) Unique identifier for a row
C) A password for the database
D) Temporary data
✅ Answer: B
15. Which Azure service is best for large-scale relational data storage?
A) Azure Blob
B) Azure Virtual Machine
C) Azure SQL Database
D) Azure App Service
✅ Answer: C
16. What is a foreign key?
A) Key used outside the database
B) Used to link two tables
C) A primary key
D) A column used in Excel
✅ Answer: B
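The primary key and foreign key ideas from questions 14 and 16 can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a local stand-in for a relational service such as Azure SQL Database, and the table and column names (customers, orders, customer_id) are hypothetical.

```python
import sqlite3

# In-memory database used as a local stand-in for a relational service.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: a unique identifier for each row.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
# orders.customer_id is a foreign key: it links each order to a customer.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), "
    "amount REAL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

# Join the two tables through the key relationship.
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id"
).fetchall()

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The join works because every order carries the key of the customer it belongs to, and the rejected insert shows how a foreign key protects the link between the two tables.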
17. Which feature is provided by Azure SQL Database?
A) Automatic backups
B) Version control
C) Manual query optimization only
D) No security
✅ Answer: A
18. What does ACID stand for in relational databases?
A) Atomicity, Consistency, Isolation, Durability
B) Action, Control, Input, Delivery
C) Accuracy, Cleanliness, Integrity, Data
D) None of the above
✅ Answer: A
19. Which of the following is NOT a relational database in Azure?
A) Azure SQL
B) Azure Cosmos DB
C) SQL Server on VM
D) Azure Database for MySQL
✅ Answer: B
20. What is normalization?
A) Adding redundancy to data
B) Reducing redundancy by organizing data into related tables
C) Encrypting data
D) Deleting data
✅ Answer: B
✅ Chapters:
• 3. Non-Relational Data in Azure
• 4. Introduction to Data Analytics
Chapter 3: Non-Relational Data in Azure
31. What is non-relational data?
A) Data in tables with rows and columns
B) Data that doesn't follow tabular schema
C) Only audio files
D) Video-only data
✅ Answer: B
32. Which of the following is a non-relational database?
A) MySQL
B) SQL Server
C) Azure Cosmos DB
D) Oracle
✅ Answer: C
33. What type of database is Azure Cosmos DB?
A) Key-value store only
B) Only supports SQL
C) Globally distributed multi-model NoSQL database
D) Flat-file system
✅ Answer: C
34. Which is a type of non-relational database?
A) Document store
B) Column store
C) Key-value store
D) All of the above
✅ Answer: D
35. NoSQL stands for:
A) Not Only SQL
B) New Operating SQL
C) Non-operational SQL
D) Normal SQL
✅ Answer: A
36. A document database stores data in:
A) Tables
B) JSON/XML-like structures
C) Flat files
D) Strings
✅ Answer: B
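As a rough illustration of question 36, the sketch below uses Python's json module to show JSON-like documents whose fields differ from one record to the next. The documents and field names are made up for the example and do not come from any particular database.

```python
import json

# Two hypothetical documents in the same collection.
# Unlike rows in a table, they do not need to share the same fields.
docs = [
    {"id": 1, "name": "Asha", "skills": ["Python", "SQL"]},
    {"id": 2, "name": "Ravi", "address": {"city": "Bengaluru"}},
]

# Documents are typically stored and exchanged as JSON text.
serialized = json.dumps(docs[1])
restored = json.loads(serialized)

# Nested structures are read by walking the keys.
city = restored["address"]["city"]
print(city)
```

This schema flexibility is the same property listed as a NoSQL benefit in question 37.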
37. Which is not a benefit of NoSQL databases?
A) High scalability
B) Schema flexibility
C) Joins between tables
D) Fast read/write
✅ Answer: C
38. What does "eventual consistency" mean in NoSQL?
A) Data always consistent
B) Data is never consistent
C) Data becomes consistent over time
D) Data deleted regularly
✅ Answer: C
39. Azure Cosmos DB supports which APIs?
A) SQL
B) MongoDB
C) Cassandra
D) All of the above
✅ Answer: D
40. Which NoSQL database type is ideal for highly connected data, such as hierarchies and networks?
A) Graph database
B) Key-value store
C) Column family
D) None
✅ Answer: A
Chapter 4: Introduction to Data Analytics
41. What is data analytics?
A) Deleting data
B) Using formulas to clean data
C) Analyzing raw data to draw conclusions
D) Encrypting data
✅ Answer: C
42. What is descriptive analytics?
A) Tells what happened
B) Tells what will happen
C) Prescribes actions
D) Deletes data
✅ Answer: A
43. Predictive analytics is used to:
A) Visualize past events
B) Predict future outcomes
C) Perform backups
D) Increase storage
✅ Answer: B
44. Which type of analytics suggests actions to take?
A) Descriptive
B) Diagnostic
C) Predictive
D) Prescriptive
✅ Answer: D
45. Which of these steps comes first in the data analytics lifecycle?
A) Data collection
B) Model building
C) Data cleaning
D) Data visualization
✅ Answer: A
46. Which of the following is a benefit of data analytics?
A) Wasting storage
B) Decision-making support
C) Slower processes
D) Data loss
✅ Answer: B
47. Diagnostic analytics focuses on:
A) What is happening
B) Why something happened
C) What to do next
D) Predicting future
✅ Answer: B
48. The data analytics process usually starts with:
A) Reporting
B) Hypothesis testing
C) Problem definition
D) Visualization
✅ Answer: C
49. Business Intelligence (BI) is closely related to:
A) Data analytics and visualization
B) Web design
C) Hardware maintenance
D) Software testing
✅ Answer: A
50. The output of data analytics should be:
A) Raw numbers
B) Actionable insights
C) Random facts
D) Encrypted code
✅ Answer: B
51. Which is NOT a key role in analytics teams?
A) Data Scientist
B) Data Engineer
C) Web Developer
D) Business Analyst
✅ Answer: C
52. Which tool is widely used in data analytics?
A) Microsoft Word
B) Tableau
C) PowerPoint
D) Adobe Photoshop
✅ Answer: B
53. Which feature helps summarize large datasets in analytics?
A) Pivot tables
B) Text box
C) Timer
D) Loops
✅ Answer: A
54. Dashboards are commonly used for:
A) Data visualization
B) Data encryption
C) Storing code
D) Cloud backup
✅ Answer: A
55. Which of the following is used for statistical analysis in analytics?
A) Python
B) Excel
C) R
D) All of the above
✅ Answer: D
56. ETL stands for:
A) Encrypt-Transfer-Learn
B) Extract-Transform-Load
C) Estimate-Test-Learn
D) Enter-Try-Launch
✅ Answer: B
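The three ETL stages from question 56 can be sketched with the standard library alone. This is a minimal illustration, assuming a tiny in-line CSV as the source and an in-memory SQLite table as the destination; the column names and values are invented.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with stray spaces and a missing value.
raw_csv = "month,sales\nJan, 100\nFeb,\nMar, 250\n"

# Extract: read the rows from the source.
records = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: trim whitespace, skip rows with no sales figure, convert to int.
clean = [
    (r["month"], int(r["sales"].strip()))
    for r in records
    if r["sales"].strip()
]

# Load: write the cleaned rows into the destination table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean)

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(clean, total)
```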
57. In analytics, what is "data quality"?
A) Size of data
B) Accuracy, completeness, reliability
C) Noise
D) Duplicates
✅ Answer: B
58. What is "correlation" in analytics?
A) Linking two unrelated tables
B) Relationship between variables
C) Data export
D) Filtering only text
✅ Answer: B
59. What is the role of a data analyst?
A) Build mobile apps
B) Make decisions without data
C) Collect, clean, and analyze data
D) Only visualize data
✅ Answer: C
60. Which of the following is an example of an analytics question?
A) What time is it?
B) How do sales vary across months?
C) What is the company logo?
D) What is 2 + 2?
✅ Answer: B
✅ Chapters:
• 5. Data Wrangling and Cleaning
• 6. Exploratory Data Analysis (EDA)
Chapter 5: Data Wrangling and Cleaning
61. What is data wrangling?
A) Storing data in files
B) Converting and cleaning raw data into a usable format
C) Encrypting data
D) Copying data
✅ Answer: B
62. What is the first step in data cleaning?
A) Deleting rows
B) Data inspection
C) Data visualization
D) Data encryption
✅ Answer: B
63. Missing values in a dataset can be handled by:
A) Ignoring them
B) Replacing with mean/median/mode
C) Deleting them
D) All of the above
✅ Answer: D
64. Which Pandas method is used to remove duplicate records?
A) Normalization
B) drop_duplicates()
C) Encrypting
D) Merge
✅ Answer: B
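A minimal sketch of removing duplicate records with the Pandas drop_duplicates() method, assuming pandas is installed. The small DataFrame is invented for the example.

```python
import pandas as pd

# Hypothetical data: the third row repeats the first.
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Asha"], "city": ["Pune", "Delhi", "Pune"]}
)

# drop_duplicates() keeps the first occurrence of each repeated row by default.
deduped = df.drop_duplicates()
print(deduped)
```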
65. What is data normalization?
A) Making all values zero
B) Bringing data to a common scale
C) Adding bias to data
D) Encrypting the file
✅ Answer: B
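One common way to bring values to a common scale is min-max scaling, which maps the smallest value to 0 and the largest to 1. The sketch below is only one possible normalization technique, and the sample values are made up.

```python
# Hypothetical raw values on an arbitrary scale.
values = [10, 20, 30, 50]

lowest, highest = min(values), max(values)

# Min-max scaling: (value - min) / (max - min) puts every value in [0, 1].
scaled = [(v - lowest) / (highest - lowest) for v in values]
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```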
66. Which Python library is commonly used for cleaning data?
A) Pandas
B) Matplotlib
C) Django
D) Flask
✅ Answer: A
67. What function is used to fill missing values in Pandas?
A) drop()
B) fillna()
C) missing()
D) complete()
✅ Answer: B
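A short sketch of fillna(), assuming pandas is installed. Here the missing value is replaced with the mean of the known values, which is one of the options listed in question 63; the scores are invented.

```python
import pandas as pd

# Hypothetical scores with one missing entry.
scores = pd.Series([10.0, None, 30.0])

# mean() skips missing values, so the mean here is 20.0.
filled = scores.fillna(scores.mean())
print(filled.tolist())  # [10.0, 20.0, 30.0]
```

Filling with the median or mode follows the same pattern, using scores.median() or scores.mode()[0] instead.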
68. In Excel, which function is used for removing spaces?
A) REMOVE()
B) SPACE()
C) TRIM()
D) CLEAR()
✅ Answer: C
69. What is outlier detection?
A) Finding repeated values
B) Finding errors in file names
C) Identifying values that deviate significantly from others
D) Encrypting values
✅ Answer: C
70. What is the importance of data cleaning?
A) To reduce file size
B) To ensure accuracy and consistency
C) To delete data
D) To backup files
✅ Answer: B
71. What is data type conversion?
A) Changing variable names
B) Changing data from one format to another
C) Adding values
D) Merging datasets
✅ Answer: B
72. Which Pandas method is used to remove rows with null values?
A) delete_null()
B) dropna()
C) null_drop()
D) remove_missing()
✅ Answer: B
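A minimal sketch of dropna(), assuming pandas is installed. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with nulls in different columns.
df = pd.DataFrame({"age": [21, None, 25], "city": ["Pune", "Delhi", None]})

# Drop every row that has at least one null value.
complete = df.dropna()

# Drop rows only when a specific column is null.
age_known = df.dropna(subset=["age"])

print(len(complete), len(age_known))  # 1 2
```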
73. What is a common cause of dirty data?
A) Automated backups
B) Manual entry errors
C) Perfect automation
D) Proper scaling
✅ Answer: B
74. What is deduplication?
A) Encrypting files
B) Removing duplicate values
C) Normalizing data
D) Making columns bold
✅ Answer: B
75. Which visualization helps detect outliers?
A) Pie chart
B) Bar chart
C) Box plot
D) Line chart
✅ Answer: C
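A box plot usually marks a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule with the standard library; the data is invented, and quartile values can differ slightly between tools because they use different interpolation methods.

```python
import statistics

# Hypothetical measurements with one extreme value.
data = [10, 12, 13, 14, 15, 16, 18, 95]

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# The usual box plot fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [95]
```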
Chapter 6: Exploratory Data Analysis (EDA)
76. What is EDA?
A) Encrypt Data Algorithm
B) Exploratory Data Analysis
C) External Data Architecture
D) Extra Data Adjustment
✅ Answer: B
77. Which of the following is a goal of EDA?
A) Generate random values
B) Summarize data characteristics
C) Hide missing data
D) Build models
✅ Answer: B
78. What is a histogram used for?
A) Comparing categories
B) Showing data distribution
C) Showing time changes
D) Identifying outliers
✅ Answer: B
79. Which of the following is a graphical EDA technique?
A) Mean
B) Median
C) Box Plot
D) Mode
✅ Answer: C
80. What does a box plot display?
A) Only the average
B) Only the minimum value
C) Median, quartiles, and outliers
D) Text data only
✅ Answer: C
81. Which summary statistic measures central tendency?
A) Mean
B) Range
C) Variance
D) Standard deviation
✅ Answer: A
82. A scatter plot is useful to visualize:
A) One variable only
B) Frequency
C) Relationships between two variables
D) File sizes
✅ Answer: C
83. What does a high correlation coefficient (close to 1 or -1) mean?
A) Weak relationship
B) Strong relationship
C) No relationship
D) Repeated data
✅ Answer: B
84. In EDA, missing data should be:
A) Ignored
B) Always removed
C) Investigated and handled
D) Deleted automatically
✅ Answer: C
85. What is the role of visualizations in EDA?
A) Backup data
B) Make data unreadable
C) Help understand patterns and anomalies
D) Decrease data size
✅ Answer: C
86. What is variance?
A) Square of standard deviation
B) Minimum value
C) Center value
D) Data type
✅ Answer: A
87. Which tool is NOT used in EDA?
A) Matplotlib
B) Pandas
C) Tableau
D) WordPad
✅ Answer: D
88. Which Python library is best for plotting graphs in EDA?
A) NumPy
B) Seaborn
C) Django
D) Flask
✅ Answer: B
89. What is the first step in EDA?
A) Model training
B) Data cleaning
C) Data visualization
D) Data collection
✅ Answer: D
90. What is the range of the correlation coefficient?
A) -10 to +10
B) -1 to +1
C) 0 to 100
D) 0 to 1
✅ Answer: B
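The -1 to +1 range can be checked by computing the Pearson correlation coefficient by hand. This is a minimal sketch with invented data; the helper name pearson is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]
marks = [2, 4, 6, 8, 10]

r_positive = pearson(hours, marks)        # perfect positive relationship, about +1
r_negative = pearson(hours, marks[::-1])  # perfect negative relationship, about -1
print(round(r_positive, 6), round(r_negative, 6))
```

Values near +1 or -1 indicate a strong linear relationship, as in question 83, while values near 0 indicate little or no linear relationship.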
✅ Chapters:
• 7. Statistical Methods for Data Analysis
• 8. Introduction to Data Visualization
Chapter 7: Statistical Methods for Data Analysis
91. What is the mean of 4, 5, 6, 7, 8?
A) 5
B) 6
C) 7
D) 8
✅ Answer: B
92. What is the median of 2, 3, 4, 8, 9?
A) 4
B) 5
C) 6
D) 7
✅ Answer: A
93. Mode is defined as:
A) Average of all numbers
B) Most frequently occurring value
C) The middle value
D) Maximum value
✅ Answer: B
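The answers to questions 91 to 93 can be verified with Python's statistics module. The list used for the mode is an invented example.

```python
import statistics

# Question 91: mean of 4, 5, 6, 7, 8
mean_value = statistics.mean([4, 5, 6, 7, 8])

# Question 92: median (middle value) of 2, 3, 4, 8, 9
median_value = statistics.median([2, 3, 4, 8, 9])

# Question 93: mode is the most frequently occurring value
mode_value = statistics.mode([1, 2, 2, 3])

print(mean_value, median_value, mode_value)  # 6 4 2
```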
94. What does standard deviation measure?
A) Central value
B) Range
C) Spread of data from mean
D) Minimum value
✅ Answer: C
95. Variance is:
A) Square of mean
B) Square of standard deviation
C) Minimum of all values
D) Mode + median
✅ Answer: B
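The relationship between variance and standard deviation can be checked numerically. This sketch uses the population versions (pvariance and pstdev) from the statistics module on an invented dataset.

```python
import statistics

# Hypothetical dataset with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # average squared distance from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(variance, std_dev)  # variance is 4, standard deviation is 2.0
```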
96. Which distribution is symmetric and bell-shaped?
A) Binomial
B) Uniform
C) Normal
D) Skewed
✅ Answer: C
97. What is the probability of getting heads in a fair coin toss?
A) 1
B) 0
C) 0.5
D) 2
✅ Answer: C
98. In statistics, hypothesis testing is used to:
A) Prove a theory
B) Guess answers
C) Make data random
D) Test assumptions using data
✅ Answer: D
99. What is a p-value?
A) Mean of population
B) Probability of results at least as extreme as those observed, assuming the null hypothesis is true
C) A measure of mode
D) Standard deviation
✅ Answer: B
100. If p-value < 0.05, the result is usually considered:
A) Insignificant
B) Significant
C) Not useful
D) Random
✅ Answer: B
101. Which test is used to compare means of two groups?
A) T-test
B) Z-test
C) Chi-square
D) ANOVA
✅ Answer: A
102. Chi-square test is used for:
A) Numeric mean comparison
B) Variance testing
C) Categorical data relationships
D) Standard deviation
✅ Answer: C
103. The correlation coefficient between two unrelated variables is close to:
A) 1
B) -1
C) 0
D) 100
✅ Answer: C
104. Regression analysis is used for:
A) Forecasting
B) Describing central tendency
C) Cleaning data
D) Encrypting data
✅ Answer: A
105. A scatter plot shows:
A) Trend between two variables
B) Pie data
C) Word frequency
D) File sizes
✅ Answer: A
106. Which test is used for more than two means?
A) Chi-square
B) ANOVA
C) T-test
D) Mode test
✅ Answer: B
107. What does the R² value in regression indicate?
A) Proportion of variance in the outcome explained by the model
B) Type of data
C) Sample size
D) Mean error
✅ Answer: A
108. A high R² means:
A) Poor fit
B) The model fits the data well
C) Random values
D) Wrong data
✅ Answer: B
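To make R² concrete, the sketch below fits a straight line by ordinary least squares and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data points are invented for illustration.

```python
# Hypothetical data that is close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Ordinary least squares estimates for a simple linear regression.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

# R² = 1 - (unexplained variation / total variation)
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
ss_total = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - ss_residual / ss_total

print(round(slope, 2), round(r_squared, 4))
```

An R² close to 1 means the line explains nearly all of the variation in y, which is what a good fit looks like.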
109. Outliers most strongly affect the:
A) Mean
B) Median
C) Mode
D) Frequency
✅ Answer: A
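A quick check of why the mean is more sensitive to outliers than the median. The salary figures are invented.

```python
import statistics

# Hypothetical salaries (in thousands) without and with one extreme value.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]

mean_before = statistics.mean(salaries)        # 35
mean_after = statistics.mean(with_outlier)     # 112.5

median_before = statistics.median(salaries)    # 35
median_after = statistics.median(with_outlier) # 36.5

print(mean_before, mean_after, median_before, median_after)
```

One extreme value more than triples the mean, while the median barely moves.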
110. What is a population in statistics?
A) A subset of data
B) Entire dataset or group
C) Random variable
D) Histogram
✅ Answer: B
Chapter 8: Introduction to Data Visualization
111. What is data visualization?
A) Encrypting data
B) Graphically representing data
C) Deleting data
D) Backup system
✅ Answer: B
112. Which tool is used for data visualization?
A) MS Word
B) Tableau
C) VLC Player
D) Paint
✅ Answer: B
113. What does a bar chart represent?
A) Trends over time
B) Categorical comparisons
C) Distribution
D) File counts
✅ Answer: B
114. Which chart is best to show parts of a whole?
A) Line chart
B) Pie chart
C) Histogram
D) Scatter plot
✅ Answer: B
115. What is a dashboard?
A) Car control system
B) Visual display of data and KPIs
C) Coding platform
D) Data backup file
✅ Answer: B
116. Which chart type is used for trends over time?
A) Pie chart
B) Line chart
C) Box plot
D) Scatter chart
✅ Answer: B
117. Heatmaps are used for:
A) Temperature only
B) Highlighting patterns with color
C) Encrypting passwords
D) Pie chart transformation
✅ Answer: B
118. Which of the following is NOT a visualization tool?
A) Excel
B) Power BI
C) Tableau
D) Notepad
✅ Answer: D
119. Which library is used in Python for visualization?
A) Flask
B) Pandas
C) Matplotlib
D) Django
✅ Answer: C
120. The purpose of visualization is to:
A) Complicate reports
B) Hide data
C) Communicate insights clearly
D) Encrypt files
✅ Answer: C
✅ Chapters:
• 9. Machine Learning
• 10. Software/Tools: Python, R, Excel, Tableau, Power BI
Chapter 9: Machine Learning
121. What is Machine Learning (ML)?
A) Writing code manually
B) Making machines learn from data automatically
C) Encrypting data
D) Formatting spreadsheets
✅ Answer: B
122. Supervised learning means:
A) Training with labeled data
B) Training without any data
C) Random guessing
D) Backup of data
✅ Answer: A
123. Unsupervised learning deals with:
A) Labeled data
B) Structured outputs
C) Unlabeled data
D) No data
✅ Answer: C
124. Classification is a type of:
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Image compression
✅ Answer: A
125. Clustering is used in:
A) Regression
B) Supervised learning
C) Unsupervised learning
D) SQL
✅ Answer: C
126. Which algorithm is used for classification?
A) Linear Regression
B) Decision Tree
C) K-means
D) Histogram
✅ Answer: B
127. Linear Regression is used for:
A) Predicting categories
B) Forecasting continuous values
C) Clustering
D) Classification
✅ Answer: B
128. What is overfitting in ML?
A) Model too simple
B) Model fits training data too well and fails on new data
C) Model doesn’t train
D) Model that encrypts data
✅ Answer: B
129. Which is NOT a machine learning application?
A) Fraud detection
B) Email spam filtering
C) Image recognition
D) Typing in Word
✅ Answer: D
130. What is training data?
A) Final report
B) Data used to teach the model
C) Data for charts
D) Random words
✅ Answer: B
131. K-means is used for:
A) Classification
B) Regression
C) Clustering
D) File compression
✅ Answer: C
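The clustering idea behind K-means can be sketched in a few lines for one-dimensional data. This is a simplified teaching version with hand-picked starting centroids, not a production implementation; the function name kmeans_1d and the data are hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small K-means for 1-D data: assign points, then update centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data with two obvious groups.
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```

No labels are supplied: the groups emerge from the data alone, which is why clustering counts as unsupervised learning.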
132. Which is NOT a type of machine learning?
A) Supervised
B) Unsupervised
C) Pre-trained
D) Reinforcement
✅ Answer: C
133. Reinforcement learning involves:
A) Labeled data
B) Unsupervised data
C) Rewards and punishments for decisions
D) Static answers
✅ Answer: C
134. Which language is most commonly used in ML?
A) HTML
B) Python
C) JavaScript
D) PHP
✅ Answer: B
135. In ML, what is a model?
A) Picture file
B) Mathematical representation learned from data
C) Excel sheet
D) Error log
✅ Answer: B
Chapter 10: Software/Tools: Python, R, Excel, Tableau, Power BI
136. Python is best known for:
A) Video editing
B) Machine learning and data analysis
C) Photo editing
D) App design
✅ Answer: B
137. Which library in Python is used for numerical operations?
A) Pandas
B) NumPy
C) Seaborn
D) Flask
✅ Answer: B
138. Pandas is useful for:
A) Frontend development
B) Data manipulation and analysis
C) Video rendering
D) Data encryption
✅ Answer: B
139. R is a language mainly used for:
A) Web development
B) Statistical computing and visualization
C) Machine hardware
D) Animation
✅ Answer: B
140. What does .head() do in Pandas?
A) Shows full dataset
B) Shows the first 5 rows by default
C) Deletes top rows
D) Renames columns
✅ Answer: B
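A short sketch of .head(), assuming pandas is installed. The DataFrame is a made-up ten-row example.

```python
import pandas as pd

# Hypothetical dataset with ten rows.
df = pd.DataFrame({"n": range(10)})

first_five = df.head()    # default: first 5 rows
first_three = df.head(3)  # an explicit row count can be passed

print(len(first_five), len(first_three))  # 5 3
```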
141. Excel is used for:
A) Image creation
B) Presentations
C) Data storage, analysis, and visualization
D) Installing games
✅ Answer: C
142. In Excel, what is a pivot table used for?
A) Drawing
B) Summarizing and analyzing data
C) Formatting text
D) Writing paragraphs
✅ Answer: B
143. Tableau is mainly used for:
A) Coding
B) Data visualization
C) File conversion
D) Antivirus
✅ Answer: B
144. Tableau works on what principle?
A) Code-based
B) Drag and drop
C) Manual typing
D) Remote desktop
✅ Answer: B
145. Power BI is a tool from which company?
A) Google
B) Apple
C) Microsoft
D) IBM
✅ Answer: C
146. Power BI is used for:
A) Hacking
B) Data analysis and interactive dashboards
C) Making games
D) Coding in Java
✅ Answer: B
147. Which of the following tools is most closely associated with the DAX formula language?
A) Excel
B) Tableau
C) Power BI
D) MS Paint
✅ Answer: C
148. Which is NOT a feature of Power BI?
A) Real-time dashboards
B) Interactive visuals
C) Image editing
D) Data connectivity
✅ Answer: C
149. Which tool uses worksheets and dashboards?
A) R
B) Tableau
C) NumPy
D) Visual Studio
✅ Answer: B
150. Which tool lets you build reports using a drag-and-drop interface?
A) Power BI
B) CMD
C) Notepad++
D) Python
✅ Answer: A
All the best for your examination.