Data Analysis Project Directions
Objective: The objective of this project is to provide students with an opportunity to apply precalculus concepts
to real-world data analysis. Students will collect or find a dataset of their choice, analyze it using various
mathematical techniques, and draw conclusions about the relationships between variables.
Project Overview:
1. Data Collection/Selection:
• Students will either collect data themselves or find a dataset from a reputable source on the
internet.
• Data should involve at least two quantitative (measurable) variables (e.g., time, distance,
temperature, or population).
• Possible sources of data:
a. Data related to your Chemistry Research project.
b. Data you collect yourself on a topic that interests you from the world around you.
c. A dataset from the list under "Some Reliable Data Sources" if you are not collecting your own data.
2. Data Analysis:
• Students will organize and examine the data to identify any trends or patterns.
• They will create scatterplots of the data and visually inspect for potential relationships between
variables.
• Using the collected data, students will create semi-log and log-log graphs to linearize the data.
• They will determine which graph type (ŷ vs. x, log ŷ vs. x, or log ŷ vs. log x) best linearizes the
relationship between the variables.
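One way to back up the visual inspection with a number is to compare how linear each version of the data is. The sketch below is only an illustration: the data values are made up, the helper name r_squared is hypothetical, and base-10 logarithms are assumed. It computes r^2 for each of the three graph types and reports the one closest to 1.

```python
import math

def r_squared(xs, ys):
    """Return r^2 for the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data, made up for illustration (all values must be positive
# so that the logarithms are defined).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 8.2, 15.8, 32.5, 63.0]

# Assumption: "log" means the base-10 logarithm.
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

scores = {
    "y vs. x (linear)": r_squared(x, y),
    "log y vs. x (exponential)": r_squared(x, log_y),
    "log y vs. log x (power)": r_squared(log_x, log_y),
}

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: r^2 = {score:.4f}")
print("Most linear graph:", best)
```

A higher r^2 alone is not a full justification; the graphs should still be inspected for curvature, since a curved pattern can produce a fairly high r^2.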
3. Model Selection:
• Based on the linearization, students will determine the type of function that best fits the data
(linear, exponential, or power function).
• They will justify their choice using mathematical reasoning and observations from the linearized
graphs.
• Students will calculate the line of best fit for the chosen model using appropriate regression
techniques.
• They will determine the equation of the line based on the regression results.
• They will determine the function of best fit based on the line of best fit.
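The step from the line of best fit to the function of best fit can be summarized as follows. This is a sketch under two assumptions that are not stated in the directions: the logarithms are base 10, and the regression line on the linearized graph has slope m and intercept b.

```latex
\begin{align*}
\text{Linear } (\hat{y} \text{ vs. } x): \quad
  & \hat{y} = mx + b \\[4pt]
\text{Exponential } (\log \hat{y} \text{ vs. } x): \quad
  & \log \hat{y} = mx + b
  \;\Longrightarrow\;
  \hat{y} = 10^{mx + b} = 10^{b} \left(10^{m}\right)^{x} \\[4pt]
\text{Power } (\log \hat{y} \text{ vs. } \log x): \quad
  & \log \hat{y} = m \log x + b
  \;\Longrightarrow\;
  \hat{y} = 10^{m \log x + b} = 10^{b}\, x^{m}
\end{align*}
```

If natural logarithms are used instead, replace each 10 with e.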
4. Prediction:
• Using the function of best fit, students will make predictions for values beyond the range of the
collected data.
• They should explain the significance of their predictions in the context of the dataset.
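As an illustration of the prediction step, the sketch below fits a line to log y vs. x, converts it to an exponential function, and evaluates that function at an x-value outside the data. The data are made up, the helper names fit_line and predict are hypothetical, and base-10 logarithms and an exponential model are assumed.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, made up for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 8.2, 15.8, 32.5, 63.0]

# Line of best fit on the semi-log graph: log10(y) = slope * x + intercept
slope, intercept = fit_line(x, [math.log10(v) for v in y])

# Function of best fit: y = a * base**x, with a = 10**intercept, base = 10**slope
a = 10 ** intercept
base = 10 ** slope

def predict(x_new):
    """Evaluate the fitted exponential function at x_new."""
    return a * base ** x_new

x_new = 8  # beyond the largest x in the data (extrapolation)
predicted = predict(x_new)
print(f"Function of best fit: y = {a:.3f} * {base:.3f}^x")
print(f"Predicted y at x = {x_new}: {predicted:.1f}")
```

Predictions far outside the range of the data assume the same pattern continues, which is worth discussing as a limitation.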
5. Summary Paragraph:
• Finally, students will write a paragraph summarizing their findings and conclusions.
• They should discuss the relationship between the variables, the suitability of the chosen model,
the accuracy of their predictions, and any limitations or uncertainties encountered during the
analysis.
What to Turn In:
1. Your report (compiled in a word processor like Microsoft Word or Google Docs) containing:
• Description of the dataset and variables.
• Visualizations of the data (scatterplots, semi-log, and log-log graphs).
• Justification for the chosen model.
• Equation of the line of best fit.
• Equation of the function of best fit.
• Predictions based on the model.
• Summary paragraph discussing the findings and conclusions.
Scoring Rubric:
Criteria (Points)
Data Collection/Selection (10 points):
- Appropriateness of dataset selection
- Clarity in describing variables and dataset
Data Analysis (10 points):
- Clear and accurate visualization of data
- Identification of trends or patterns
- Creation of semi-log and log-log graphs
- Justification for chosen linearization method
Model Selection (10 points):
- Sound reasoning for selecting the model
- Explanation of why the chosen model is suitable
- Accuracy in determining the equation of the line
- Appropriateness of regression techniques
Prediction (5 points):
- Correct application of the model for predictions
- Justification for the chosen predictions
Summary Paragraph (10 points):
- Clarity and coherence of the summary
- Reflection on findings and insights
General Presentation (5 points):
- Organization and formatting of the report
- Grammar, spelling, and overall presentation
Total Points: 50
Some Reliable Data Sources
1. Data.gov: A U.S. government website that provides access to datasets from federal agencies.
Students can find datasets on topics such as climate, agriculture, transportation, and demographics.
Website: https://data.gov
2. Kaggle: A platform that hosts datasets across various domains. Students can explore datasets
related to demographics, economy, health, education, and more. Website: https://www.kaggle.com
3. Google Dataset Search: A search engine for datasets across the web. Students can use keywords
related to their interests to find relevant datasets. Website: https://datasetsearch.research.google.com
4. World Bank Data: The World Bank provides access to a wide range of datasets related to global
development indicators, including economic, social, and environmental data. Website: https://data.worldbank.org
5. National Center for Education Statistics (NCES): NCES provides datasets related to education in the
United States, including student demographics, academic performance, and educational expenditures.
Website: https://nces.ed.gov