Assignment 2 ML

The document outlines various data types, including structured, semi-structured, unstructured, quantitative, qualitative, primary, and secondary data, along with their definitions, examples, and analysis methods. It also discusses data collection methods such as surveys, experiments, and observational studies, detailing their purposes, suitable data types, challenges, and impacts on data quality. The information serves as a comprehensive guide for understanding data types and collection methods in data analysis.

Uploaded by vu.241fa04f26
Copyright © All Rights Reserved

ML Assignment 2: Exploring Data Types and Data Collection Methods

1. Understanding Data Types


Task 1: Define and Describe Data Types

1. Structured Data
o Definition: Data organized into rows and columns, typically stored in relational
databases.
o Example: An Excel spreadsheet containing employee records.
o Characteristics: Highly organized, easily searchable using SQL, suitable for
traditional data analysis tools.
2. Semi-structured Data
o Definition: Data that does not fit the fixed schema of a relational database but
carries organizational markers (tags, keys) that separate its elements.
o Example: JSON or XML files.
o Characteristics: Flexible, self-describing structure; supports hierarchical (nested)
relationships; must be parsed (e.g., with a JSON or XML library) before analysis.
3. Unstructured Data
o Definition: Data that lacks a predefined format or structure.
o Example: Videos, images, emails, social media posts.
o Characteristics: Requires preprocessing or AI techniques to analyze; storage and
management are more complex.
4. Quantitative Data
o Definition: Numeric data that represents measurable quantities.
o Example: Height, temperature, income.
o Characteristics: Supports statistical and mathematical analysis.
5. Qualitative Data
o Definition: Descriptive data that represents categories or qualities.
o Example: Customer feedback, product reviews.
o Characteristics: Analyzed using thematic or content analysis, not easily
quantifiable.
6. Primary Data
o Definition: Data collected directly by the researcher for a specific purpose.
o Example: Responses from a custom survey.
o Characteristics: Original and tailored to specific research needs; because the
researcher controls collection, it is often more reliable for the question at hand.
7. Secondary Data
o Definition: Data originally collected by someone else for a different purpose and
reused for new research.
o Example: Government census data.
o Characteristics: Readily available, less costly, but may not fit research needs
exactly.
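Structured data is described as "easily searchable using SQL." A minimal sketch using Python's built-in sqlite3 module makes this concrete. The table name, columns and sample rows are hypothetical, chosen to echo the employee-records example. A real project would query an existing database rather than build one in memory.

```python
import sqlite3

# Hypothetical employee records (name, department, salary).
EMPLOYEES = [
    ("Alice", "Engineering", 95000),
    ("Bob", "Engineering", 85000),
    ("Carol", "Sales", 60000),
]

def average_salary_by_department(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # A fixed schema (typed columns) is what makes this data "structured".
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT department, AVG(salary) FROM employees "
            "GROUP BY department ORDER BY department"
        )
        return cursor.fetchall()
    finally:
        conn.close()

print(average_salary_by_department(EMPLOYEES))
```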

Task 2: Implications for Data Analysis

• Structured Data: Easily analyzed using SQL and statistical software. Visualizations such
as bar charts, line graphs, and dashboards work well.
• Semi-structured Data: Requires parsing and transformation (for example, flattening nested
fields into columns) before analysis. A JSON/XML parser extracts the fields, after which
statistical or machine learning tools can be applied.
• Unstructured Data: Needs preprocessing (e.g., NLP for text, computer vision for
images). Advanced techniques are essential for extracting useful insights.
• Quantitative Data: Well suited to statistical methods such as regression and correlation
analysis. Easily visualized with histograms, scatter plots, and line charts.
• Qualitative Data: Analyzed using coding and thematic analysis. Visualizations include
word clouds and concept maps.
• Primary vs. Secondary Data: Primary data is more relevant but expensive. Secondary
data is faster to obtain but may lack specificity.
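The sketch below illustrates the parse-then-transform step for semi-structured data. It uses Python's standard json module to turn nested, irregular records into flat rows with a fixed set of columns. The field names (id, name, address, tags) and the chosen output columns are hypothetical.

```python
import json

# Hypothetical semi-structured records: "address" is nested and optional,
# and "tags" appears in only some records.
RAW = """
[
  {"id": 1, "name": "Ana", "address": {"city": "Lisbon"}, "tags": ["vip"]},
  {"id": 2, "name": "Ben", "address": {"city": "Oslo"}},
  {"id": 3, "name": "Chen"}
]
"""

def flatten_customers(raw_json):
    """Parse JSON text and flatten each record into the same fixed columns."""
    records = json.loads(raw_json)
    rows = []
    for rec in records:
        rows.append({
            "id": rec["id"],
            "name": rec["name"],
            # Missing nested keys become None instead of raising KeyError.
            "city": rec.get("address", {}).get("city"),
            # A missing list is treated as empty.
            "tag_count": len(rec.get("tags", [])),
        })
    return rows

for row in flatten_customers(RAW):
    print(row)
```

Once flattened, the rows can be loaded into a table and analyzed like structured data.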

Task 3: Data Type Table


| Data Type       | Example               | Analysis Method                       |
|-----------------|-----------------------|---------------------------------------|
| Structured      | SQL database          | Descriptive statistics, SQL queries   |
| Semi-structured | JSON/XML              | Parsing, keyword extraction           |
| Unstructured    | Video/text/image      | NLP, image recognition, deep learning |
| Quantitative    | Test scores, age      | Statistical modeling, regression      |
| Qualitative     | Interview transcripts | Thematic/content analysis             |
| Primary         | User-conducted survey | Tailored analysis, high relevance     |
| Secondary       | Public health reports | Comparative/trend analysis            |

2. Data Collection Methods


Task 1: Describe Data Collection Methods

1. Surveys
o Description: Structured questionnaires used to collect responses from a population.
o Purpose: Collect standardized data on opinions, behaviors, demographics.
o Use Cases: Market research, academic studies.
2. Experiments
o Description: Controlled tests where variables are manipulated to observe
outcomes.
o Purpose: Establish cause-effect relationships.
o Use Cases: Clinical trials, A/B testing in product development.
3. Observational Studies
o Description: Researchers observe subjects in natural settings without
interference.
o Purpose: Study behaviors and interactions in real-world environments.
o Use Cases: Ethnographic research, user experience studies.
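In A/B testing, users are randomly assigned to variants. Because of this randomization, a difference in outcomes can be attributed to the change being tested. A common (though not the only) way to check whether an observed difference in conversion rates is larger than chance would explain is a two-proportion z-test. The sketch below uses hypothetical counts. It relies on the normal approximation, which is reasonable only when each group has enough conversions and non-conversions.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B test: 1,000 randomly assigned users per variant.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these counts, z is about 2.1 and the two-sided p-value is roughly 0.035, below the conventional 0.05 threshold.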

Task 2: Data Type Suitability

• Surveys
o Suitable Data: Structured (Likert scale), Quantitative (age), Qualitative (open-ended).
o Challenges: Risk of response bias, low response rates.
• Experiments
o Suitable Data: Primarily quantitative.
o Challenges: Costly, may raise ethical concerns, limited to specific settings.
• Observational Studies
o Suitable Data: Qualitative, unstructured (video, audio).
o Challenges: Observer bias, limited control over variables, potential privacy
issues.
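As the suitability list notes, a single survey often yields several data types at once. The sketch below separates the quantitative, Likert-scale and open-ended parts of each response, then reports the response rate, one of the survey challenges listed above. The field names, the responses and the number of invitations are all hypothetical.

```python
from statistics import median

# Hypothetical survey responses; None marks a skipped open-ended question.
responses = [
    {"age": 34, "satisfaction": 4, "comment": "Checkout was quick."},
    {"age": 27, "satisfaction": 2, "comment": "The app crashed twice."},
    {"age": 45, "satisfaction": 5, "comment": None},
]
INVITED = 10  # hypothetical number of people who received the survey

def summarize(responses, invited):
    ages = [r["age"] for r in responses]                # quantitative
    likert = [r["satisfaction"] for r in responses]     # structured, ordinal 1-5 scale
    comments = [r["comment"] for r in responses if r["comment"]]  # qualitative text
    return {
        "response_rate": len(responses) / invited,
        "mean_age": sum(ages) / len(ages),
        # Likert items are ordinal, so the median is a safer summary than the mean.
        "median_satisfaction": median(likert),
        # Open-ended answers are counted here; analyzing them requires coding/thematic analysis.
        "open_ended_count": len(comments),
    }

print(summarize(responses, INVITED))
```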

Task 3: Impact on Data Quality

• Surveys: Can provide large-scale data quickly, but quality depends on question clarity
and respondent honesty.
• Experiments: High reliability and internal validity because variables are controlled, but
results may not reflect real-world behavior.
• Observational Studies: High ecological validity, but subject to interpretation and harder
to replicate.

Examples:

• Poorly designed surveys can yield unreliable results (e.g., ambiguous questions).
• Experiments with small samples may lack statistical power.
• Observer presence in observational studies may alter subject behavior (Hawthorne
effect).
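The point that small samples lack statistical power can be quantified. The sketch below uses the normal approximation for a two-sided, two-sample comparison of means, with the effect size expressed as Cohen's d (difference in means divided by the common standard deviation). This approximation is chosen for illustration only; dedicated power-analysis tools use the exact t distribution.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in means.

    effect_size is Cohen's d. Uses the normal approximation and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size.
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return NormalDist().cdf(noncentrality - z_crit)

for n in (10, 30, 100):
    print(n, round(approx_power(0.5, n), 2))
```

For a medium effect (d = 0.5), this approximation gives power of only about 0.20 with 10 participants per group, about 0.49 with 30, and about 0.94 with 100.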
