UNIT-I: Data Basics and The Five Steps of Data Science
Structured versus Unstructured Data:
Structured data is organized in a tabular format, with rows for observations and columns for
attributes, so it is easy to store and query in relational databases.
Unstructured data, on the other hand, lacks a fixed schema and includes formats such as text,
images, and videos.
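To make the contrast concrete, here is a minimal sketch in Python. The records, names, and sentence are made-up illustrative values, not data from any real source.

```python
import re

# Structured: every record shares the same named fields (a fixed schema).
structured_rows = [
    {"name": "Asha", "age": 29, "city": "Pune"},
    {"name": "Ben", "age": 41, "city": "Leeds"},
]

# Unstructured: free text with no predefined fields.
unstructured_text = "Asha, 29, moved to Pune last year. Ben is 41 and lives in Leeds."

# With structured data, a question is a simple filter on a named column.
over_30 = [row["name"] for row in structured_rows if row["age"] > 30]
print(over_30)

# With unstructured text, the values must be extracted first, and nothing
# guarantees where (or whether) an age appears in the text.
numbers_in_text = [int(n) for n in re.findall(r"\d+", unstructured_text)]
print(numbers_in_text)
```

The regular expression only finds digits; deciding which number is an age and whom it belongs to would need further parsing, which is the extra work unstructured data usually demands.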
Quantitative and Qualitative Data:
- Quantitative data: Numerical data that measures or counts something, so arithmetic on it is meaningful, e.g., height, weight.
- Qualitative data: Non-numerical data representing categories or qualities, e.g., color, brand.
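The distinction above determines which summaries make sense. The following is a small sketch with made-up values: an average suits the quantitative column, while the qualitative column is summarized by counting categories.

```python
from collections import Counter
from statistics import mean

heights_cm = [160.0, 170.0, 180.0]   # quantitative: numbers we can average
colors = ["red", "blue", "red"]      # qualitative: labels we can only count

# Arithmetic is meaningful for quantitative data.
average_height = mean(heights_cm)

# For qualitative data we count how often each category occurs.
color_counts = Counter(colors)
most_common_color = color_counts.most_common(1)[0][0]

print(average_height, most_common_color)
```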
The Four Levels of Data:
1. Nominal Level: Categorization without any order (e.g., gender, colors).
2. Ordinal Level: Data with a meaningful order, but the differences between values are not consistent or measurable (e.g., rankings).
3. Interval Level: Numeric scales with equal intervals but no true zero (e.g., temperature in Celsius).
4. Ratio Level: Numeric scales with equal intervals and a true zero point, so ratios are meaningful (e.g., weight, height).
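Each level above permits more operations than the one before it. The sketch below, using made-up values and an assumed three-step rating scale, shows one operation that becomes valid at each level.

```python
from collections import Counter

# Nominal: only equality and counting make sense, so the mode is the summary.
colors = ["red", "blue", "red", "green"]
mode_color = Counter(colors).most_common(1)[0][0]

# Ordinal: order is meaningful, so a median (middle rank) is valid.
scale = ["low", "medium", "high"]            # assumed ordering of categories
ratings = ["low", "high", "medium", "high", "medium"]
ranks = sorted(scale.index(r) for r in ratings)
median_rating = scale[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not.
temps_c = [10.0, 20.0]
difference = temps_c[1] - temps_c[0]         # a 10-degree difference
# 20 C is not "twice as hot" as 10 C; on the kelvin scale, which has a
# true zero, the ratio is only slightly above 1.
ratio_in_kelvin = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)

# Ratio: a true zero makes ratios meaningful.
weights_kg = [40.0, 80.0]
weight_ratio = weights_kg[1] / weights_kg[0]  # 80 kg is twice 40 kg

print(mode_color, median_rating, difference, weight_ratio)
```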
The Five Steps of Data Science:
1. Ask an interesting question.
2. Obtain the data.
3. Explore the data.
4. Model the data.
5. Communicate and visualize the results.
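The five steps can be sketched end to end on a toy problem. Everything here is an illustrative assumption: the question, the four made-up data points, and the choice of a least-squares line as the model. A real project would obtain the data from a file, database, or API.

```python
from statistics import mean

# 1. Ask an interesting question: do taller people tend to weigh more?

# 2. Obtain the data (hard-coded, made-up values for illustration).
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [50.0, 60.0, 65.0, 75.0]

# 3. Explore the data: look at the central values first.
mean_h = mean(heights_cm)
mean_w = mean(weights_kg)

# 4. Model the data: fit weight = slope * height + intercept by least squares.
covariance_sum = sum((h - mean_h) * (w - mean_w)
                     for h, w in zip(heights_cm, weights_kg))
variance_sum = sum((h - mean_h) ** 2 for h in heights_cm)
slope = covariance_sum / variance_sum
intercept = mean_w - slope * mean_h

# 5. Communicate the results in plain language.
summary = f"Each extra cm of height goes with about {slope:.1f} kg more weight."
print(summary)
```

With only four points this shows the workflow rather than a trustworthy finding.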
Explore the Data:
Exploring data involves summarizing, visualizing, and identifying patterns or outliers to understand
the dataset's characteristics.
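As a minimal sketch of exploration, the code below summarizes a small made-up sample and flags outliers. The 1.5 x IQR rule used here is one common convention, chosen for illustration; the notes do not prescribe a specific method.

```python
from statistics import mean, median, quantiles

# Made-up sample with one suspicious value.
values = [12, 14, 15, 15, 16, 17, 18, 95]

# Summarize: the mean is pulled upward by the extreme value, the median is not.
sample_mean = mean(values)
sample_median = median(values)

# Identify outliers with the 1.5 * IQR rule (an assumed, conventional choice).
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(sample_mean, sample_median, outliers)
```

A large gap between the mean and the median, as in this sample, is often a first hint that outliers or skew are present.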