Q: What is data?
A: Data refers to raw facts and figures that are collected for analysis and reference. It can be in
various forms such as numbers, text, images, or audio.
Q: What are the types of data?
A: Structured Data, Unstructured Data, Semi-structured Data.
Q: What is the use of data?
A: Data is used to make informed decisions, identify trends, improve processes, and drive strategic
planning.
Q: What is the difference between structured and unstructured data?
A: Structured data is organized and easily searchable (e.g., spreadsheets), while unstructured data
lacks a predefined format (e.g., social media posts).
Q: Define Data Analytics.
A: Data Analytics is the process of examining datasets to draw conclusions about the information
they contain.
Q: What is the importance of Data Analytics?
A: It helps organizations optimize performance, predict trends, make data-driven decisions, and gain
competitive advantage.
Q: What are the different types of Data Analytics?
A: Descriptive Analytics, Diagnostic Analytics, Predictive Analytics, Prescriptive Analytics.
Q: How is Data Analytics used to improve a business?
A: Businesses use data analytics to understand customer behavior, improve operations, and
develop targeted strategies.
Q: What is Business Intelligence?
A: Business Intelligence (BI) is the set of technologies and strategies used to analyze and manage
business information in order to support decision-making.
Q: What are the elements of Data Analytics?
A: Data collection, Data processing, Statistical analysis, Data visualization, and Interpretation of
results.
Q: What is Big Data?
A: Big Data refers to extremely large datasets that may be analyzed computationally to reveal
patterns, trends, and associations.
Q: What are the characteristics of Big Data?
A: The five Vs: Volume, Velocity, Variety, Veracity, Value.
Q: What are the different sources of Big Data?
A: Social media platforms, Sensors, Transactional records, Mobile devices, Web logs.
Q: What is a Data Repository?
A: A central place where data is stored and maintained, like a database or data warehouse.
Q: What is the difference between Data Science and Business Intelligence?
A: Data Science focuses mainly on predictive analysis (what is likely to happen), while Business
Intelligence focuses mainly on descriptive analysis (what has already happened).
Q: What are the minimum skills required for a Data Scientist?
A: Statistical analysis, Programming, Data visualization, Machine learning, Domain knowledge.
Q: What are the applications of Big Data Analytics?
A: It is applied in sectors such as healthcare, finance, retail, and transportation.
Q: What are the features of Python?
A: Simplicity, readability, extensive libraries, cross-platform compatibility, and support for
multiple programming paradigms.
Q: Which Python packages are used in data science?
A: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow.
Q: What is the use of NumPy? Explain with an example.
A: NumPy is used for numerical computation and for handling arrays.
Example: import numpy as np
arr = np.array([1, 2, 3])
Q: How do you create 1D and 2D arrays using NumPy? Explain with an example.
A: 1D: np.array([1,2,3])
2D: np.array([[1,2],[3,4]])
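A minimal sketch of the two array shapes above, with a couple of checks on their dimensions. The variable names (a1, a2, doubled, col_sums) are illustrative only.

```python
import numpy as np

# 1D array: a single sequence of values
a1 = np.array([1, 2, 3])

# 2D array: a list of equal-length rows
a2 = np.array([[1, 2], [3, 4]])

print(a1.ndim, a1.shape)  # 1 (3,)
print(a2.ndim, a2.shape)  # 2 (2, 2)

# Arithmetic is applied element by element, without explicit loops
doubled = a1 * 2            # [2, 4, 6]
col_sums = a2.sum(axis=0)   # sum down each column: [4, 6]
```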
Q: What is the use of Pandas? Explain with an example.
A: Pandas is used for data manipulation and analysis.
Example: import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B'], 'Age': [20, 30]})
Q: How do you create a Series using Pandas? Explain with an example.
A: s = pd.Series([10,20,30], index=['a','b','c'])
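As a short illustration of why the index matters, the Series above can be read either by label or by position. The names by_label, by_position, and total are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = s['b']         # lookup by index label -> 20
by_position = s.iloc[0]   # lookup by integer position -> 10
total = s.sum()           # 60
```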
Q: How do you create a DataFrame using Pandas? Explain with an example.
A: df = pd.DataFrame({'Name':['Tom','Jerry'], 'Score':[90,85]})
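A minimal sketch of a few common operations on the DataFrame above. The 'Passed' column and the pass mark of 50 are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Score': [90, 85]})

n_rows, n_cols = df.shape          # (2, 2)
scores = df['Score']               # selecting one column gives a Series
top = df[df['Score'] > 85]         # boolean filtering keeps only matching rows
df['Passed'] = df['Score'] >= 50   # add a derived column (hypothetical pass mark)
```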
Q: How do you import and export a CSV file using Pandas? Explain with an example.
A: df = pd.read_csv('file.csv')
df.to_csv('output.csv')
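A minimal round-trip sketch, assuming a temporary directory is used so that no real file is needed. Passing index=False is a common choice that stops the row index from being written as an extra column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Score': [90, 85]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.csv')
    df.to_csv(path, index=False)   # export without the row index
    loaded = pd.read_csv(path)     # import the same file back
```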
Q: How do you import and export an Excel file using Pandas? Explain with an example.
A: df = pd.read_excel('file.xlsx')
df.to_excel('output.xlsx')
Both calls need an Excel engine such as openpyxl to be installed for .xlsx files.
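Q: How do you import and export SQL data using Pandas? Explain with an example.
A: Open a database connection (for example with sqlite3), read with pd.read_sql_query(query, conn),
and write with df.to_sql('table_name', conn). Note that to_sql() is a DataFrame method, not a
top-level pandas function.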
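A minimal sketch using an in-memory SQLite database, so nothing is written to disk. The table name 'students' and the query are assumptions for illustration.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry'], 'Score': [90, 85]})

conn = sqlite3.connect(':memory:')   # temporary database held in memory

# Export: write the DataFrame to a table (hypothetical table name)
df.to_sql('students', conn, index=False, if_exists='replace')

# Import: run a query and get the result back as a DataFrame
result = pd.read_sql_query('SELECT Name, Score FROM students WHERE Score > 85', conn)

conn.close()
```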
Q: What is the use of Matplotlib? Explain with an example.
A: Matplotlib is used for data visualization.
Example: import matplotlib.pyplot as plt
plt.plot(x, y)
Q: How do you create multiple plots on the same canvas using Matplotlib? Explain with an example.
A: Call plt.plot() once per line on the same axes, give each line a label, and call plt.legend()
to show which line is which.
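A minimal sketch of two lines on one set of axes. The data and labels are made up, and the 'Agg' backend is an assumption so the script runs without a display; in an interactive session you would drop that line and call plt.show().

```python
import matplotlib
matplotlib.use('Agg')   # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]

fig, ax = plt.subplots()
ax.plot(x, [1, 4, 9, 16], label='squares')   # first line
ax.plot(x, [1, 8, 27, 64], label='cubes')    # second line on the same axes
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.legend()   # uses the label= text given to each plot call
```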
Q: How do you create a line chart using Matplotlib? Explain with an example.
A: Use plt.plot(x, y) followed by plt.show().
Q: How do you create a bar plot using Matplotlib? Explain with an example.
A: Use plt.bar(x, y) followed by plt.show().
Q: How do you create a scatter plot using Matplotlib? Explain with an example.
A: Use plt.scatter(x, y) followed by plt.show().
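The three chart types above can be sketched side by side. This uses made-up data and the Axes methods (ax.plot, ax.bar, ax.scatter) that correspond to the plt functions; the 'Agg' backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(x, y)       # line chart: points joined in order
axes[0].set_title('Line')

axes[1].bar(x, y)        # bar plot: one bar per x value
axes[1].set_title('Bar')

axes[2].scatter(x, y)    # scatter plot: unconnected points
axes[2].set_title('Scatter')

fig.tight_layout()
```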
Q: How do you create a histogram using Matplotlib? Explain with an example.
A: Use plt.hist(data, bins=5) followed by plt.show().
Q: How do you create a boxplot using Matplotlib? Explain with an example.
A: Use plt.boxplot(data) followed by plt.show().
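A minimal sketch of both plots on the same made-up data, which includes one deliberately extreme value. The 'Agg' backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use('Agg')   # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 20]   # 20 is deliberately extreme

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: hist() returns the count per bin and the bin edges
counts, edges, _ = ax_hist.hist(data, bins=5)
ax_hist.set_title('Histogram')

# Boxplot: shows median, quartiles, whiskers, and outlying points
box = ax_box.boxplot(data)
ax_box.set_title('Boxplot')
```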
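Q: How do you create a facet grid with Matplotlib? Explain with an example.
A: Matplotlib has no ready-made facet grid; use seaborn's FacetGrid(), which is built on Matplotlib.
Q: How do you create a pair plot with Matplotlib? Explain with an example.
A: Use seaborn's pairplot(), which draws the grid of Matplotlib axes for you.
Q: How do you create a heat map with Matplotlib? Explain with an example.
A: Use seaborn's heatmap(), which draws an annotated heat map on a Matplotlib axes.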
Q: What is Data Wrangling?
A: The process of cleaning, transforming, and preparing raw data for analysis.
Q: How do you find and fill missing values in a dataset?
A: Use df.isnull() to find them (df.isnull().sum() counts them per column) and df.fillna() to fill them.
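A minimal sketch on a made-up DataFrame. Filling the numeric column with its mean and the text column with 'Unknown' are assumptions; the right fill value depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({'Age': [20, np.nan, 30], 'City': ['Pune', None, 'Delhi']})

# Find: count missing values in each column
missing_per_column = df.isnull().sum()

# Fill: numeric column with its mean, text column with a placeholder
filled = df.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())   # mean of 20 and 30 is 25
filled['City'] = filled['City'].fillna('Unknown')
```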
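Q: How do you generate summary statistics for a dataset?
A: Use df.describe(), which reports count, mean, standard deviation, minimum, quartiles, and maximum
for each numeric column.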
Q: What is data formatting?
A: Structuring raw data into a suitable format for analysis.
Q: What is data normalization?
A: Scaling numeric features to a common range or scale so that no feature dominates because of its
units, which often improves model performance.
Q: Which methods are used for data normalization in Python?
A: Min-Max Scaling, Z-Score Standardization, Robust Scaling.
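The three methods can be sketched directly with NumPy on made-up values. scikit-learn also provides MinMaxScaler, StandardScaler, and RobustScaler for the same purpose; the formulas are written out here to show what each one does.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 100.0])   # hypothetical feature values

# Min-Max scaling: map the smallest value to 0 and the largest to 1
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score standardization: subtract the mean, divide by the standard deviation
z_score = (x - x.mean()) / x.std()

# Robust scaling: subtract the median, divide by the interquartile range,
# so a single extreme value (100 here) has less influence on the scale
q1, median, q3 = np.percentile(x, [25, 50, 75])
robust = (x - median) / (q3 - q1)
```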
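Q: How is binning done in Python? Why is it used?
A: Continuous values are grouped into intervals (bins) using pd.cut(). Binning is used to reduce the
effect of minor observation errors (noise) and to turn a continuous variable into categories.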
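A minimal sketch of pd.cut() on made-up ages. The bin edges and labels are assumptions chosen for illustration; by default each bin includes its right edge, so 18 would fall in the first bin.

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 40, 70])   # hypothetical ages

# Bins are (0, 18], (18, 60], (60, 100]
groups = pd.cut(ages, bins=[0, 18, 60, 100], labels=['child', 'adult', 'senior'])

counts = groups.value_counts()   # how many values fall in each bin
```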
Q: How do you convert categorical values into quantitative values in Python?
A: Use a dictionary mapping with Series.map(), or scikit-learn's LabelEncoder.
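A minimal sketch of both approaches on made-up categories, assuming scikit-learn is installed. A manual mapping lets you choose a meaningful order, while LabelEncoder assigns codes by sorted class name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

sizes = pd.Series(['small', 'large', 'medium', 'small'])   # hypothetical categories

# Option 1: explicit mapping, so the numeric order reflects the real order
mapped = sizes.map({'small': 0, 'medium': 1, 'large': 2})

# Option 2: LabelEncoder, which numbers the classes in sorted order
encoder = LabelEncoder()
encoded = encoder.fit_transform(sizes)
classes = list(encoder.classes_)   # ['large', 'medium', 'small']
```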
Q: How do you calculate the mean, median, and mode of given data in Python?
A: Use np.mean(), np.median(), and scipy.stats.mode().
Q: How do you find the standard deviation and variance of given data in Python?
A: Use np.std() and np.var().
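A minimal sketch on made-up data, assuming SciPy is installed. Two details are worth noting: np.std() and np.var() compute the population versions by default (pass ddof=1 for the sample versions), and the shape of the scipy.stats.mode() result differs between SciPy versions, which is why np.atleast_1d is used here.

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical values

mean = np.mean(data)       # 5.0
median = np.median(data)   # 4.5

# .mode may be a scalar or a one-element array depending on the SciPy version
mode = np.atleast_1d(stats.mode(data).mode)[0]   # 4

pop_std = np.std(data)             # population standard deviation: 2.0
pop_var = np.var(data)             # population variance: 4.0
sample_var = np.var(data, ddof=1)  # sample variance divides by n - 1
```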
Q: What is a hypothesis?
A: An assumption that can be tested through study and experimentation.
Q: What is a null hypothesis?
A: The statement that there is no significant difference or relationship between the groups or
variables being studied.
Q: What is the use of the SciPy module in Python?
A: It is used for scientific and technical computing, such as statistics, optimization, integration,
and linear algebra.
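As one illustration of SciPy in hypothesis testing, the sketch below runs an independent two-sample t-test. The two groups are made-up measurements and the 0.05 significance level is an assumed convention; the null hypothesis is that both groups have the same mean.

```python
from scipy import stats

# Hypothetical measurements from two groups
group_a = [5.1, 4.9, 5.0, 5.2, 5.1]
group_b = [5.8, 6.0, 5.9, 6.1, 5.7]

# Null hypothesis: the two groups have equal means
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05                    # assumed significance level
reject_null = p_value < alpha   # True means the difference is statistically significant
```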
Q: What is EDA?
A: Exploratory Data Analysis (EDA) is the process of summarizing a dataset's main characteristics,
often with visualizations, before formal modeling.
Q: Which features of a dataset are explored in EDA?
A: Missing values, data distribution, outliers, relationships, summary statistics.
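A minimal sketch of a first EDA pass covering the items above, on a made-up DataFrame. The 1.5 x IQR rule for outliers is a common convention, used here as an assumption rather than the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and one deliberately extreme score
df = pd.DataFrame({
    'hours': [1, 2, np.nan, 4, 5, 6],
    'score': [52, 55, 61, 64, 70, 150],
})

summary = df.describe()      # summary statistics and distribution quartiles
missing = df.isnull().sum()  # missing values per column
corr = df.corr()             # pairwise relationships between numeric columns

# Outliers in 'score' using the 1.5 * IQR rule
q1, q3 = df['score'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df['score'] < lower) | (df['score'] > upper)]
```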
Q: What is the use of regression analysis?
A: It is used to predict the value of a dependent variable based on one or more independent variables.
Q: What are the types of regression?
A: Linear Regression, Multiple Linear Regression, Logistic Regression, Polynomial Regression.
Q: What are the steps to build a regression model in Python?
A: Import the libraries, load the data, preprocess it, split it into training and test sets, train
the model, evaluate it, and predict on new data.
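The steps above can be sketched with scikit-learn, assuming it is installed. The data is synthetic (generated from y = 3x + 2 plus noise) so the example is self-contained; with real data the load and preprocess steps would read and clean a file instead.

```python
# Import
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load: synthetic data following y = 3x + 2 with a little random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Preprocess: nothing to clean here; real data may need missing-value handling or scaling

# Split: hold back 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train
model = LinearRegression().fit(X_train, y_train)

# Evaluate: score() returns R² on the held-out data
r2 = model.score(X_test, y_test)

# Predict on a new value of x
prediction = model.predict([[4.0]])
```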
Q: Which metrics are used to evaluate a regression model?
A: R² score, Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
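A minimal sketch that computes the four metrics by hand with NumPy on made-up values, to show what each one measures. scikit-learn's sklearn.metrics module offers mean_absolute_error, mean_squared_error, and r2_score for the same calculations.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
y_pred = np.array([2.0, 5.0, 8.0, 9.0])   # hypothetical predictions

errors = y_true - y_pred                  # [1, 0, -1, 0]

mae = np.mean(np.abs(errors))             # average absolute error: 0.5
mse = np.mean(errors ** 2)                # average squared error: 0.5
rmse = np.sqrt(mse)                       # same units as y

# R²: 1 minus (unexplained variation / total variation)
ss_res = np.sum(errors ** 2)                      # 2.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # 20.0
r2 = 1 - ss_res / ss_tot                          # 0.9
```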
Q: Explain the difference between simple linear and multiple linear regression.
A: Simple linear: one independent variable.
Multiple linear: two or more independent variables.
Q: In which situation is logistic regression used?
A: It is used when the dependent variable is categorical, most commonly binary (e.g., pass/fail).
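A minimal sketch of a binary case, assuming scikit-learn is installed. The data (hours studied against pass/fail) is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
hours = np.array([[1], [2], [3], [4], [6], [7], [8], [9]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)

pred = clf.predict([[2], [8]])               # predicted class for 2 and 8 hours
proba_pass = clf.predict_proba([[8]])[0, 1]  # estimated probability of passing at 8 hours
```

Unlike linear regression, the model outputs a class label (or a probability between 0 and 1) rather than an unbounded number.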