Lecture Notes 2
Data Collection
Learn to effectively collect reliable and high-quality data for
informed analysis and decision-making
Agenda
1. What Types Of Data Can We Gather For Data Analysis?
2. An Overview Of Data Collection Methods
3. Definition Of Sampling And Sampling Techniques
4. Real-world Challenges In The Data Collection Pipeline
Data analysis cannot be performed without data. Data is the fundamental input for analysis,
as it provides the raw information that analysts use to extract insights, identify trends, and
make informed decisions.
1. Primary Data: Primary data refers to original data collected firsthand by the researcher
or organization for a specific purpose. It is often gathered through direct methods like
surveys, interviews, experiments, and observations.
Examples:
A company conducting customer surveys to gather feedback on a new product.
Researchers conducting experiments to collect data on the effects of a drug.
2. Secondary Data: Secondary data refers to data that has already been collected and
published by someone else, often for a different purpose than the current research. It
includes datasets from government reports, academic papers, market research reports,
and historical records.
Examples:
Publicly available government statistics like census data.
Academic studies using existing datasets.
3. Third-Party Data: Third-party data refers to data that is collected by an external
organization and then made available for use by others. This data is often sold or
provided through platforms, APIs, or subscription-based services.
Examples:
Social media analytics data from platforms like Twitter or Facebook.
Market research reports provided by consulting firms.
Financial data purchased from data vendors.
The table below compares the three types of data by their source:
Criteria: Source
Primary Data: Direct source (e.g., surveys, experiments, observations)
Secondary Data: Existing datasets from public or private sources (e.g., government reports, academic papers)
Third-Party Data: External organizations, data providers, and platforms (e.g., social media analytics, market reports)
1. Surveys: Surveys are one of the most common methods of primary data collection,
often used to gather quantitative data from a large number of respondents. They can be
conducted through various means, such as online questionnaires, face-to-face
interviews, or phone surveys.
2. Experiments: Experiments involve manipulating one or more variables and observing the
effects. This method is typically used in scientific research or controlled studies to
establish cause-and-effect relationships between variables.
3. Observations: This method involves directly observing people, processes, or events in
real-time without intervening. It is commonly used in behavioral studies, ethnography,
and other qualitative research.
4. Databases: Using existing databases is a secondary data collection method. This
involves accessing datasets that have already been collected, often by government
agencies, research institutions, or private companies. Examples include census data,
financial reports, or academic research datasets.
These four methods cover a wide range of data collection techniques and can be combined
depending on the research goals and resources available.
Surveys are a widely used method for collecting data, particularly when you need to gather
information from a large number of respondents. They are flexible and can be adapted to a
variety of contexts, from market research to social science studies. Below, we explore
different tools and techniques to implement surveys effectively.
Google Forms: Easy to use, integrates with Google Sheets for data analysis, and customizable. Best for quick surveys and educational purposes; free to use.
To ensure that surveys yield valuable insights, consider these best practices:
Pilot Testing: Before launching a survey, test it on a small sample to identify potential
issues in question clarity or functionality.
Clear Instructions: Provide clear guidelines on how to complete the survey and an
estimated time to finish.
Incentives: Offer incentives like discounts or prize draws to encourage higher response
rates.
Follow-Up: Send reminders to non-respondents to increase completion rates.
Data Privacy: Ensure respondents are informed about how their data will be used and
stored, adhering to privacy laws and regulations.
Experiments can generally be categorized into controlled and field experiments. Each type
has its strengths, limitations, and appropriate use cases. Here’s a detailed breakdown of
both:
Control Over Variables: Controlled experiments offer high control over independent and extraneous variables; field experiments offer low control over extraneous variables.
Validity: Controlled experiments have high internal validity (strong cause-and-effect); field experiments have high external validity (results apply to the real world).
Informed Consent: Participants should be fully informed about the purpose of the
experiment, any potential risks, and their right to withdraw at any time.
Confidentiality: Data collected from participants should be kept confidential and stored
securely.
Debriefing: After the experiment, participants should be debriefed about the true
purpose of the study, especially in cases where deception is used.
Avoiding Harm: Experiments should avoid causing any physical or psychological harm to
participants.
Labster: Labster offers virtual lab simulations that allow users to conduct interactive
experiments online. It is designed to make science experiments accessible remotely,
providing a simulated environment for students and researchers.
Best For: This tool is ideal for educational experiments, particularly in the
sciences, and is widely used in remote learning setups. It's perfect for teaching
concepts that require practical experience but cannot be conducted in a physical
lab.
Qualtrics: Qualtrics is a powerful survey and experiment platform that supports advanced
features such as A/B testing and data analysis. It allows users to design experiments, collect
data, and analyze results in one integrated system.
Best For: It's best suited for online experiments, especially those focused on
marketing, consumer behavior, and user experience studies. Researchers can
easily set up controlled experiments and analyze the results using built-in
analytics.
SPSS (IBM): SPSS is a comprehensive data analysis software that supports experimental
designs, statistical analyses, and hypothesis testing. It offers tools for designing
experiments, managing data, and conducting a range of statistical tests.
Best For: SPSS is highly effective for analyzing data from controlled experiments,
particularly in social sciences, psychology, and other fields where hypothesis
testing and statistical analysis are required.
Google Optimize: Google Optimize is a tool for running A/B tests and optimizing user
experience (UX) on websites. It enables users to test different website variations and analyze
the results based on user interactions.
Best For: This tool is specifically tailored for marketing experiments, website
optimization, and UX testing. It is commonly used for experiments aimed at
improving the performance of websites and digital products.
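The statistical idea behind an A/B test can be illustrated with a short, self-contained sketch. The example below runs a two-proportion z-test on made-up conversion counts for two page variants; the numbers are hypothetical and the code is not tied to Qualtrics or Google Optimize, which handle this kind of analysis internally.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test results: conversions out of visitors for two page variants.
conv_a, n_a = 120, 2400   # variant A: 5.00% conversion
conv_b, n_b = 150, 2400   # variant B: 6.25% conversion

# Pooled two-proportion z-test, the classic analysis behind an A/B test.
p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"variant A: {p_a:.3%}, variant B: {p_b:.3%}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that the difference in conversion rates is unlikely to be due to chance alone.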
Types of Observation
To facilitate efficient and accurate data collection, various tools and technologies have been
developed. These tools range from traditional methods like manual recording to advanced
digital solutions that automate and enhance the observation process.
Databases are organized systems used to store, manage, and retrieve data efficiently. They
serve as essential tools for collecting and maintaining large volumes of structured data,
ensuring accessibility and reliability for analysis.
Broadly, they are classified into two main categories based on their data structure and
usage: Relational Databases and Non-Relational (NoSQL) Databases. Each type has unique
characteristics and is suited to specific use cases.
1. Relational Databases: Relational databases use tables (rows and columns) to organize
data and define relationships between data points. They rely on structured schemas that
ensure consistency and accuracy.
Examples:
MySQL: Open-source database commonly used for web applications.
PostgreSQL: Advanced open-source database with support for complex queries
and custom functions.
Oracle Database: Enterprise-grade database known for its robustness and
scalability.
Use Case: Relational databases are ideal for structured data with consistent
relationships. For example, in an e-commerce platform, relational databases can
store: Customer information: Name, email, and contact details. Purchase records:
Transaction history tied to customer IDs. These relationships allow seamless tracking
of customer behaviors and transaction patterns.
2. Non-Relational (NoSQL) Databases: NoSQL databases handle data in a flexible and
scalable way, without requiring predefined schemas. They are designed to store
unstructured or semi-structured data, making them highly adaptable for modern data
needs.
Examples:
MongoDB: Document-oriented database storing data in JSON-like formats.
Cassandra: Distributed database designed for scalability and high availability.
Firebase: Real-time database for mobile and web app development.
Use Case: NoSQL databases excel in managing massive and dynamic datasets. For
instance, in a website with user-generated content, they can handle: User posts:
Blogs, comments, or images stored as documents. Clickstream data: Logs capturing
user interactions, such as page views and clicks, in real-time. This flexibility ensures
smooth scaling as the volume and diversity of data grow.
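A minimal sketch can make the contrast concrete. The snippet below uses Python's built-in sqlite3 module to model the e-commerce example as two related tables, and a plain dictionary to stand in for the JSON-like document a store such as MongoDB would hold; the table names, fields, and values are illustrative assumptions, not a specific production schema.

```python
import sqlite3

# --- Relational style: structured tables with an explicit relationship ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER REFERENCES customers(id), item TEXT, amount REAL)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO purchases VALUES (1, 1, 'Keyboard', 49.90)")

# Join purchase records to customer information via the shared customer id.
rows = conn.execute(
    "SELECT c.name, p.item, p.amount "
    "FROM purchases p JOIN customers c ON c.id = p.customer_id"
).fetchall()
print(rows)  # [('Ada', 'Keyboard', 49.9)]

# --- Document (NoSQL) style: one flexible, schema-less record per user post ---
post = {
    "user": "Ada",
    "type": "blog",
    "text": "First impressions of the new keyboard",
    "tags": ["review", "hardware"],
    "clickstream": [{"page": "/post/1", "event": "view"}],  # nested, semi-structured data
}
print(post["tags"])
```

The relational version enforces a fixed schema and an explicit customer-to-purchase relationship, while the document version accepts whatever fields each record happens to have.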
Each data collection method has its strengths, but they also come with specific flaws that
can impact data quality, reliability, and usability. Here's an overview of potential drawbacks
for each technique:
Limitations of Surveys
Limitations of Experiments
Limitations of Observations
Limitations of Databases
In data analysis, the quantity of data plays a critical role in shaping insights and decisions.
Collecting either too little or too much data can have adverse consequences, making it
essential to strike the right balance.
Lack of Representation: A small dataset may fail to capture the diversity or complexity of
the phenomenon being studied.
Example: Analyzing customer preferences with a sample of only 10 customers may
overlook trends and variations in the broader audience.
Statistical Limitations: Insufficient data leads to unreliable statistical conclusions,
increasing the likelihood of errors such as overfitting or underfitting.
Impact: Predictions or insights may lack robustness and generalizability.
Understand the Objective: Focus on collecting only the data needed to answer the
research question or solve the problem.
Tip: Start by defining the goals and identifying the variables critical to achieving them.
Conduct a Pilot Study: A small initial dataset can help identify whether the quantity and
type of data are sufficient for the intended analysis.
Benefit: This approach minimizes unnecessary effort while refining data collection
methods.
Leverage Sampling Techniques: Use appropriate sampling methods to work with
representative subsets instead of entire datasets.
Example: Stratified sampling ensures all key subgroups are included without
excessive data collection.
Regularly Evaluate Data Needs: Periodically reassess the volume of data required as
the project progresses to ensure alignment with goals.
The Goldilocks principle: Just as Goldilocks sought porridge that was "just right," data analysts should aim for datasets that are neither too small to be meaningful nor too large to be practical.
A sample is a subset of data selected from a larger group, known as the population, for the
purpose of analysis. Sampling is used to make inferences about the population without
examining every individual or data point. For example: Surveying 1,000 citizens to understand
the opinions of a country's entire population.
Key Insight: A well-chosen sample allows for accurate, reliable conclusions while saving time and resources.
Population: The entire group of interest in a study or analysis.
Sample: A subset of the population selected for analysis.
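To see how a sample supports inference about a population, the hedged sketch below simulates a made-up population of 50,000 opinion scores and compares the population mean with the mean of a random sample of 1,000, loosely mirroring the citizen-survey example; all numbers are synthetic and for illustration only.

```python
import random

random.seed(42)

# Simulated population: 50,000 citizens, each with a continuous opinion score (mean ~6.2).
population = [random.gauss(6.2, 1.8) for _ in range(50_000)]

# Survey only 1,000 randomly chosen citizens (the sample).
sample = random.sample(population, k=1_000)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {pop_mean:.3f}")
print(f"sample mean    : {sample_mean:.3f}  (estimated from 2% of the population)")
```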
What is Sampling?
Feasibility: Collecting data from an entire population can be difficult due to the size, cost,
and logistical challenges. For instance, reaching a large or global population may be time-
consuming and expensive, requiring significant resources. Sampling allows researchers to
gather insights from a smaller, more manageable group, making data collection more
feasible.
Efficiency: Sampling offers greater efficiency by reducing the time, effort, and costs needed
for data collection. A smaller sample means quicker data collection and analysis, as well as
fewer resources needed for outreach, processing, and storage. This makes sampling an
attractive option when resources are limited.
Types of Sampling
There are several techniques for sampling, each suited to different research needs. The
most common sampling methods include simple random sampling, stratified sampling, and
cluster sampling. These techniques help ensure that the sample is representative, which is
essential for accurate analysis and decision-making.
Simple Random Sampling
Scenario: A university wants to survey 100 students about their satisfaction with
campus facilities. The university has 5,000 students enrolled. To ensure that each
student has an equal chance of being selected, the survey team uses simple random
sampling. They randomly select 100 students from the entire student body using a
random number generator.
Outcome: Each student in the university, regardless of their program or year, has an
equal chance of being included in the survey. This method is ideal when there is no
need to categorize students based on specific characteristics, and the goal is simply
to get an unbiased representation of the population.
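The university scenario maps directly onto a few lines of code. The sketch below assumes student IDs 1 to 5,000 and uses Python's random.sample as the random number generator to draw 100 of them without replacement.

```python
import random

random.seed(7)  # for a reproducible draw

student_ids = list(range(1, 5001))            # all 5,000 enrolled students
surveyed = random.sample(student_ids, k=100)  # each student has an equal chance of selection

print(len(surveyed), surveyed[:10])
```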
Stratified Sampling
The population is divided into distinct subgroups, or strata, based on specific characteristics
(e.g., age, gender, income level). Then, a random sample is taken from each subgroup. This
ensures that every subgroup is properly represented in the final sample.
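A small sketch of proportional stratified sampling follows, assuming a hypothetical record list in which each person carries a "stratum" label (for example, year of study); the number drawn from each stratum is proportional to that stratum's share of the population.

```python
import random
from collections import defaultdict

random.seed(7)

# Hypothetical population: each record has a stratum label (e.g., year of study).
population = [{"id": i, "stratum": random.choice(["Y1", "Y2", "Y3", "Y4"])} for i in range(5000)]
total_sample_size = 100

# Group records by stratum, then draw a proportional random sample from each group.
groups = defaultdict(list)
for person in population:
    groups[person["stratum"]].append(person)

sample = []
for stratum, members in groups.items():
    k = round(total_sample_size * len(members) / len(population))  # proportional allocation
    sample.extend(random.sample(members, k))

# Count how many sampled records came from each stratum (rounding may shift the total slightly).
print({s: sum(p["stratum"] == s for p in sample) for s in groups})
```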
Cluster Sampling
In cluster sampling, the population is divided into clusters (often geographically), and a
random sample of clusters is selected. Then, all individuals within the selected clusters are
surveyed. This method is useful when the population is spread out geographically.
Scenario: A national educational organization wants to assess the effectiveness of a
new online learning platform. Since the platform is used by schools across the
country, it would be costly and time-consuming to survey every school. Instead, they
use cluster sampling. They divide the country into regions, then randomly select 10
regions. Afterward, they survey all the schools in these selected regions that use the
platform.
Outcome: By selecting clusters (regions) instead of individual schools, the
organization significantly reduces costs and logistical challenges. While this method
may lead to some bias if the chosen regions are not representative of the entire
country, it is still a cost-effective way to obtain a large sample when population
members are geographically dispersed.
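The regional-schools scenario can be sketched as follows, under the assumption that schools are stored as (region, school) pairs; a random set of regions (clusters) is drawn first, and every school inside those regions is then surveyed.

```python
import random

random.seed(7)

# Hypothetical sampling frame: each school is recorded with the region it belongs to.
regions = [f"region_{i}" for i in range(1, 51)]                             # 50 regions
schools = [(r, f"{r}_school_{j}") for r in regions for j in range(1, 21)]   # 20 schools per region

# Step 1: randomly select 10 clusters (regions).
chosen_regions = set(random.sample(regions, k=10))

# Step 2: survey every school within the chosen regions.
surveyed_schools = [name for region, name in schools if region in chosen_regions]

print(len(surveyed_schools), "schools surveyed across", len(chosen_regions), "regions")
```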
Note: Other sampling techniques include systematic sampling, where every k-th individual is selected from a population after a random starting point, and convenience sampling, where individuals are chosen based on ease of access.
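Systematic sampling is simple enough to show in a couple of lines; the sketch below assumes a population list of 5,000 entries and a desired sample of 100, giving an interval of k = 50 with a random starting point inside the first interval.

```python
import random

random.seed(7)

population = list(range(1, 5001))     # 5,000 population members
sample_size = 100
k = len(population) // sample_size    # selection interval: every 50th member

start = random.randrange(k)           # random starting point within the first interval
systematic_sample = population[start::k][:sample_size]

print(len(systematic_sample), systematic_sample[:5])
```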
What is Sampling Bias?
Sampling bias occurs when the sample collected for analysis does
not accurately represent the population from which it was drawn. This leads to skewed or
inaccurate results, which can significantly impact the validity of any analysis or conclusions
drawn from the data.
Sources of Sampling Bias
Selection Bias: This occurs when certain individuals or groups are more likely to be
selected than others, often due to non-random selection methods. For example, only
selecting participants from a particular region or group could exclude others, making the
sample unrepresentative.
Nonresponse Bias: This happens when a significant portion of the selected sample does
not respond or participate. For instance, if only a small subset of survey respondents
answer a poll, those responses may not reflect the views of the larger population.
Response Bias: Response bias happens when participants provide inaccurate or biased
answers, either intentionally or unintentionally. This could result from the way questions
are worded, the survey environment, or social pressures. For instance, people may
exaggerate or provide socially desirable responses, leading to a biased dataset.
Measurement Bias: Measurement bias occurs when the tools or methods used to
collect data consistently produce inaccurate results. This can happen due to faulty
instruments, poor survey design, or misinterpretation of data. For example, using a faulty
scale in a study could result in incorrect weight measurements that misrepresent the
population.
Reporting Bias: Reporting bias is when only certain data or results are reported, usually
due to selective memory or a desire to highlight specific outcomes. This can occur if
researchers only report successful outcomes or ignore data that doesn't fit the
hypothesis, leading to a skewed interpretation of the results.
Data Quality Issues: One of the primary challenges in data collection is ensuring the quality
of the data. Data quality can be compromised in several ways, such as errors during data
entry, inconsistencies in how data is recorded, or gaps in data. Poor quality data can lead to
inaccurate conclusions and flawed analysis, which undermines the entire research or
decision-making process.
Common Causes: Human errors during data entry, lack of standardization in the
collection process, and faulty data collection tools.
Impact: Inaccurate or inconsistent data can lead to misleading insights and decisions,
wasting time and resources.
Solution: Implementing robust data validation techniques, standardized protocols,
and regular audits can help ensure that data collected is accurate and of high quality.
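As a hedged illustration of the kind of validation referred to above, the sketch below checks a few made-up survey records against simple rules (required fields, value ranges, allowed categories) and separates clean rows from rows that need review; the field names and rules are assumptions chosen for the example.

```python
# Hypothetical raw survey records collected from different sources.
records = [
    {"respondent_id": 1, "age": 34, "gender": "F", "score": 8},
    {"respondent_id": 2, "age": -5, "gender": "M", "score": 7},    # invalid age
    {"respondent_id": 3, "age": 29, "gender": "X", "score": 11},   # unknown code, score out of range
    {"respondent_id": 4, "age": 41, "gender": "M"},                # missing score
]

REQUIRED_FIELDS = {"respondent_id", "age", "gender", "score"}
ALLOWED_GENDERS = {"F", "M", "Other"}

def validate(record):
    """Return a list of problems found in a single record (an empty list means it is valid)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "age" in record and not (0 <= record["age"] <= 120):
        problems.append("age out of range")
    if "gender" in record and record["gender"] not in ALLOWED_GENDERS:
        problems.append("unexpected gender code")
    if "score" in record and not (0 <= record["score"] <= 10):
        problems.append("score outside the 0-10 scale")
    return problems

clean, flagged = [], []
for rec in records:
    (clean if not validate(rec) else flagged).append(rec)

print(f"{len(clean)} clean records, {len(flagged)} flagged for review")
```

Checks like these can run automatically as data arrives, so entry errors are caught before they reach the analysis stage.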
Data Privacy Concerns: In many industries, data privacy is a growing concern, especially
with the increasing amount of personal and sensitive information being collected. Data
privacy violations can have serious legal, ethical, and financial consequences. Data
collection methods must ensure that personal information is protected and that data
collection complies with regulations like GDPR, HIPAA, or other relevant privacy laws.