Module-2: Data Science: Exploratory Data Analysis and Data Visualization
Plotting for exploratory data analysis (EDA)
• Introduction to IRIS dataset and 2D scatter plot 26 mins
• 3D scatter plot 6 mins
• Pair plots 14 mins
• Limitations of Pair Plots 2 mins
• Histogram and Introduction to PDF (Probability Density Function) 17 mins
• Univariate Analysis using PDF 6 mins
• CDF (Cumulative Distribution Function) 15 mins
• Mean, Variance and Standard Deviation 17 mins
• Median 10 mins
• Percentiles and Quantiles 9 mins
• IQR (Interquartile Range) and MAD (Median Absolute Deviation) 6 mins
• Box-plot with Whiskers 9 mins
• Violin Plots 4 mins
• Summarizing Plots, Univariate, Bivariate and Multivariate analysis 6 mins
• Multivariate Probability Density, Contour Plot 9 mins
• Assignment-1: Data Visualization with Haberman Dataset 4 mins
Linear Algebra
• Why learn it? 4 mins
• Introduction to Vectors (2-D, 3-D, n-D), Row Vector and Column Vector 14 mins
• Dot Product and Angle between 2 Vectors 14 mins
• Projection and Unit Vector 5 mins
• Equation of a Line, Plane and Hyperplane 23 mins
• Distance of a point from a Plane/Hyperplane, Half-Spaces 10 mins
• Equation of a Circle (2-D), Sphere (3-D) and Hypersphere (n-D) 7 mins
• Equation of an Ellipse (2-D), Ellipsoid (3-D) and Hyperellipsoid (n-D) 6 mins
• Square, Rectangle 6 mins
• Hypercube, Hypercuboid 3 mins
• Revision Questions 30 mins
Probability and Statistics
• Introduction to Probability and Statistics 17 mins
• Population and Sample 7 mins
• Gaussian/Normal Distribution and its PDF (Probability Density Function) 27 mins
• CDF (Cumulative Distribution Function) of Gaussian/Normal distribution 11 mins
• Symmetric distribution, Skewness and Kurtosis 25 mins
• Standard normal variate (Z) and standardization 6 mins
• Kernel density estimation 7 mins
• Sampling distribution & Central Limit theorem 19 mins
• Q-Q plot: How to test if a random variable is normally distributed or not? 23 mins
• How are distributions used? 17 mins
• Chebyshev’s inequality 20 mins
• Discrete and Continuous Uniform distributions 13 mins
• How to randomly sample data points (Uniform Distribution) 10 mins
• Bernoulli and Binomial Distribution 11 mins
• Log Normal Distribution 12 mins
• Power law distribution 12 mins
• Box-Cox transform 12 mins
• Applications of non-Gaussian distributions 26 mins
• Covariance 14 mins
• Pearson Correlation Coefficient 13 mins
• Spearman Rank Correlation Coefficient 7 mins
• Correlation vs Causation 5 mins
• How to use correlations? 13 mins
• Introduction to Confidence Intervals (C.I.) 8 mins
• Computing confidence interval given the underlying distribution 11 mins
• C.I. for the mean of a random variable 14 mins
• Confidence interval using bootstrapping 18 mins
• Hypothesis testing methodology, Null-hypothesis, p-value 16 mins
• Hypothesis Testing Intuition with coin toss example 27 mins
• Resampling and permutation test 15 mins
• K-S Test for similarity of two distributions 15 mins
• Code Snippet K-S Test 6 mins
• Hypothesis testing: another example 18 mins
• Resampling and Permutation test: another example 19 mins
• How to use hypothesis testing? 23 mins
• Proportional Sampling 18 mins
• Revision Questions 30 mins
Interview Questions on Probability and Statistics
• Questions & Answers 30 mins
Dimensionality reduction and Visualization
• What is Dimensionality reduction? 3 mins
• Row Vector and Column Vector 5 mins
• How to represent a dataset? 4 mins
• How to represent a dataset as a Matrix 7 mins
• Data Preprocessing: Feature Normalisation 20 mins
• Mean of a data matrix 6 mins
• Data Preprocessing: Column Standardization 16 mins
• Covariance of a Data Matrix 24 mins
• MNIST dataset (784 dimensional) 20 mins
• Code to Load MNIST Dataset 12 mins
PCA (Principal Component Analysis)
• Why learn PCA? 4 mins
• Geometric intuition of PCA 14 mins
• Mathematical objective function of PCA 13 mins
• Alternative formulation of PCA: Distance minimization 10 mins
• Eigenvalues and Eigenvectors (PCA): Dimensionality reduction 23 mins
• PCA for Dimensionality Reduction and Visualization 10 mins
• Visualize MNIST dataset 5 mins
• Limitations of PCA 5 mins
• PCA Code example 19 mins
• PCA for dimensionality reduction (not visualization) 15 mins
t-SNE (t-distributed Stochastic Neighbor Embedding)
• What is t-SNE? 7 mins
• Neighborhood of a point, Embedding 7 mins
• Geometric intuition of t-SNE 9 mins
• Crowding Problem 8 mins
• How to apply t-SNE and interpret its output 38 mins
• t-SNE on MNIST 7 mins
• Code example of t-SNE 9 mins
• Revision Questions 30 mins
Interview Questions on Dimensionality Reduction
• Questions & Answers 30 mins