
Air Quality Project

The document outlines a project focused on predicting Air Quality Index (AQI) using machine learning, highlighting the problem of air pollution and the limitations of traditional monitoring systems. The project includes objectives such as developing a robust ML model, comparing algorithms, and visualizing data trends, with a structured workflow from data collection to model evaluation. Key findings indicate strong correlations between AQI and pollutants like PM2.5, with Random Forest being the most effective algorithm based on evaluation metrics.

Uploaded by: shinfana89
Copyright: © All Rights Reserved

Phase-2 Submission: Air Quality Prediction Project
**Student Names and Register Numbers:**

| Name         | Register Number |
|--------------|-----------------|
| K. SRITHIKA  | 621123205053    |
| A. SHIFANA   | 621123205051    |
| P. SWETHA    | 621123205055    |
| B. SUBASHINI | 621123205054    |
| J. SHOBANA   | 621123205052    |

**Institution:** Idhaya Engineering College for Women

**Department:** B.Tech Information Technology

**Date of Submission:** [Insert Date]

**GitHub Repository Link:** [Insert Link]

1. Problem Statement
Air pollution severely harms both the environment and human health. Traditional air quality
monitoring systems report current conditions but lack predictive capability, so they rarely
provide actionable early warnings. This project aims to build a regression-based machine
learning model that predicts the Air Quality Index (AQI) from historical environmental and
pollutant data, with a possible extension to real-time feeds. Such predictions can help
citizens and governments act proactively to reduce health risks and environmental impact.

2. Project Objectives
- Develop a robust ML model to predict AQI levels based on environmental features.
- Compare the performance of multiple algorithms (Linear Regression, Random Forest, XGBoost).
- Identify the key pollutants influencing AQI.
- Visualize trends and patterns in air quality data.
- Create a user-friendly dashboard or tool (optional deployment).
- Revisit project goals after EDA to improve performance and interpretability.

3. Flowchart of the Project Workflow
1. Data Collection
2. Data Cleaning & Preprocessing
3. Exploratory Data Analysis
4. Feature Engineering
5. Model Building
6. Model Evaluation
7. Visualization & Insights
8. (Optional) Deployment

4. Data Description
- Dataset: Delhi Air Quality Dataset
- Source: Kaggle (https://www.kaggle.com/datasets)
- Type: Structured, time-series
- Features: PM2.5, PM10, NO2, CO, SO2, O3, temperature, humidity, wind speed
- Target Variable: AQI
- Records: ~30,000 rows, 15+ features
- Nature: Static dataset with potential for real-time API extension

5. Data Preprocessing
- Handled missing values using forward-fill and interpolation.
- Removed duplicate entries.
- Converted date columns to datetime format.
- Standardized pollutant values to common units.
- One-hot encoded categorical weather descriptions.
- Normalized numerical columns using Min-Max Scaling.
- Final cleaned dataset saved for modeling.
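
The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the team's code: the column names `Date`, `Weather`, and `AQI` and the helper name `preprocess` are assumptions, and unit standardization is left out because the conversion factors depend on how the source dataset reports each pollutant. Interpolation fills numeric gaps, while forward-fill carries the last observed weather description forward.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def preprocess(df: pd.DataFrame, date_col: str = "Date",
               weather_col: str = "Weather", target: str = "AQI") -> pd.DataFrame:
    """Clean a raw air-quality table (column names are assumptions)."""
    df = df.copy()
    # Convert the date column to datetime and order rows chronologically
    df[date_col] = pd.to_datetime(df[date_col])
    df = df.drop_duplicates().sort_values(date_col).reset_index(drop=True)

    num_cols = list(df.select_dtypes(include="number").columns)
    # Linear interpolation fills numeric gaps; leading gaps would stay NaN
    df[num_cols] = df[num_cols].interpolate(method="linear")
    # Carry the last observed weather description forward
    df[weather_col] = df[weather_col].ffill()

    # One-hot encode the categorical weather descriptions
    df = pd.get_dummies(df, columns=[weather_col], prefix="weather", dtype=int)

    # Min-Max scale numeric features to [0, 1]; the AQI target keeps its units
    feature_cols = [c for c in num_cols if c != target]
    df[feature_cols] = MinMaxScaler().fit_transform(df[feature_cols])
    return df
```

One design caveat: fitting `MinMaxScaler` on the full dataset lets test-set statistics leak into training. In a real run the scaler would be fit on the training split only, for example inside a scikit-learn `Pipeline`.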

6. Exploratory Data Analysis (EDA)


Univariate Analysis:
- PM2.5 and PM10 show right-skewed distributions.
- AQI ranges mostly from 100 to 350 (Moderate to Hazardous).

Bivariate Analysis:
- Strong correlation between AQI and PM2.5 (r = 0.87).
- Seasonal variation: AQI increases during winter.

Insights:
- PM2.5, PM10, and NO2 are the most influential pollutants.
- Weekends show slightly lower pollution levels than weekdays.
- AQI varies with temperature and humidity, but these associations are weaker than those with the pollutants.

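
The univariate and bivariate checks above (skewness, correlation with AQI, seasonal and weekend patterns) can be reproduced with a short pandas helper. This is a sketch assuming a `Date` column and an `AQI` target; the helper name `eda_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str = "AQI", date_col: str = "Date") -> dict:
    """Summarise distributions, correlations, and calendar effects."""
    numeric = df.select_dtypes(include="number")
    dates = pd.to_datetime(df[date_col])

    # Positive skewness means a long right tail (right-skewed distribution)
    skewness = numeric.skew()

    # Pearson r of each numeric column with the target, strongest first
    corr = numeric.corr()[target].drop(target)
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)

    # Mean target per calendar month exposes seasonal (e.g. winter) peaks
    monthly_mean = df.groupby(dates.dt.month)[target].mean()

    # Weekend = Saturday/Sunday (dayofweek 5 and 6)
    is_weekend = (dates.dt.dayofweek >= 5).to_numpy()
    weekend_gap = df.loc[is_weekend, target].mean() - df.loc[~is_weekend, target].mean()

    return {"skewness": skewness, "corr_with_target": corr,
            "monthly_mean": monthly_mean, "weekend_minus_weekday": weekend_gap}
```

A negative `weekend_minus_weekday` value would correspond to the "slightly lower pollution on weekends" observation.
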
7. Feature Engineering
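- Created a new feature: Pollution Category (Good, Moderate, Poor, etc.), derived from AQI bands.
- Extracted datetime components: hour, weekday, month.
- Combined PM2.5 and PM10 into a composite particulate feature.
- Removed redundant columns (e.g., city names when constant).
- Considered polynomial features (e.g., PM2.5²) so that the linear baseline can capture non-linear effects; tree-based models such as Random Forest and XGBoost capture these without explicit polynomial terms.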
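
A sketch of these feature-engineering steps follows. The category bands assume India's CPCB AQI scale (the report does not say which scale was used), the composite feature is assumed to be a plain sum of PM2.5 and PM10, and the function and column names are hypothetical. Because Pollution Category is computed from AQI itself, it suits analysis and evaluation but should not be fed to the regressor as an input, since that would leak the target.

```python
import numpy as np
import pandas as pd

# Assumed CPCB (India) AQI bands; the report does not name its scale
AQI_BINS = [0, 50, 100, 200, 300, 400, np.inf]
AQI_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def engineer_features(df: pd.DataFrame, date_col: str = "Date",
                      target: str = "AQI") -> pd.DataFrame:
    """Add engineered features; column names are hypothetical."""
    df = df.copy()
    # Drop columns holding a single constant value (e.g. one city name)
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)

    # Category label derived from the target: for analysis only, not a model input
    df["Pollution_Category"] = pd.cut(df[target], bins=AQI_BINS,
                                      labels=AQI_LABELS, include_lowest=True)

    # Calendar components from the timestamp
    dates = pd.to_datetime(df[date_col])
    df["hour"] = dates.dt.hour
    df["weekday"] = dates.dt.dayofweek  # Monday = 0
    df["month"] = dates.dt.month

    # Composite particulate feature (a plain sum is an assumption)
    df["PM_total"] = df["PM2.5"] + df["PM10"]
    # Squared term lets a linear model capture curvature in the PM2.5 effect
    df["PM2.5_sq"] = df["PM2.5"] ** 2
    return df
```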

8. Model Building
Algorithms Tried:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor

Why These?
- Linear Regression for baseline
- Random Forest for robustness and interpretability
- XGBoost for performance in structured data

Data Split: 80% training, 20% test using stratified sampling where applicable
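
The model comparison and split can be sketched with scikit-learn. Two assumptions here: stratification of a continuous target is done by binning AQI into quintiles with `pd.qcut`, since `train_test_split` can only stratify on discrete labels; and XGBoost is included only if the `xgboost` package is installed. The hyperparameters and helper names are illustrative, not the team's settings.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def build_models(random_state: int = 42) -> dict:
    """Candidate regressors; hyperparameters are illustrative only."""
    models = {
        "Linear Regression": LinearRegression(),  # baseline
        "Random Forest": RandomForestRegressor(n_estimators=200,
                                               random_state=random_state),
    }
    try:  # XGBoost is optional in this sketch
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(n_estimators=300, learning_rate=0.1,
                                         random_state=random_state)
    except ImportError:
        pass
    return models


def split_and_fit(X: pd.DataFrame, y: pd.Series, test_size: float = 0.2,
                  random_state: int = 42):
    """80/20 split stratified on AQI quintiles, then fit every candidate."""
    # train_test_split needs discrete strata, so bin the continuous target
    strata = pd.qcut(y, q=5, labels=False, duplicates="drop")
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=strata, random_state=random_state)
    fitted = {name: model.fit(X_tr, y_tr)
              for name, model in build_models(random_state).items()}
    return fitted, (X_te, y_te)
```

Because the data is a time series, a random (even stratified) split can place neighbouring, highly similar hours in both the training and test sets and inflate scores; a chronological split, such as scikit-learn's `TimeSeriesSplit`, is a stricter alternative.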

Evaluation Metrics (Random Forest, the best-performing model):
- MAE: ~28
- RMSE: ~35
- R² Score: ~0.85
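
For reference, the three metrics can be computed directly from predictions. The sketch below implements the standard definitions with NumPy, the same definitions used by scikit-learn's `mean_absolute_error`, `mean_squared_error` (square-rooted for RMSE), and `r2_score`; the helper name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MAE, RMSE and R^2 from their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                      # average absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # penalises large errors more
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # share of variance explained
    return {"MAE": float(mae), "RMSE": float(rmse), "R2": float(r2)}
```

RMSE is always at least as large as MAE, and the gap widens when a few predictions are far off, because squaring weights large errors more heavily.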

9. Visualization of Results & Model Insights


- Feature Importance Plot: PM2.5 and NO2 are the most significant features.
- Residual Plots: Random Forest shows the smallest residuals.
- Predicted vs. Actual AQI: close alignment in most data segments.
- Category Confusion: when predicted AQI values are mapped to categories, misclassification occurs mainly in borderline cases (e.g., Moderate vs. Poor).
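
The category-level confusion can be measured by mapping actual and predicted AQI values onto bands and cross-tabulating them. This sketch assumes CPCB-style band edges (0-50, 51-100, 101-200, 201-300, 301-400, 401+), which may differ from the bands the team used, and reports the share of misclassifications that land in an adjacent band, i.e. borderline cases. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# Assumed CPCB-style AQI bands; swap in the bands actually used
AQI_BINS = [0, 50, 100, 200, 300, 400, np.inf]
AQI_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def to_band(values) -> np.ndarray:
    """Map AQI values to band indices 0..5 (negative predictions clipped to 0)."""
    values = np.clip(np.asarray(values, dtype=float), 0, None)
    return pd.cut(values, bins=AQI_BINS, labels=False, include_lowest=True)


def category_confusion(y_true, y_pred):
    """Confusion table of AQI bands plus the share of borderline errors."""
    t, p = to_band(y_true), to_band(y_pred)
    cm = confusion_matrix(t, p, labels=list(range(len(AQI_LABELS))))
    table = pd.DataFrame(cm, index=AQI_LABELS, columns=AQI_LABELS)
    wrong = t != p
    # Fraction of misclassified samples whose predicted band is adjacent
    borderline = float(np.mean(np.abs(t[wrong] - p[wrong]) == 1)) if wrong.any() else 0.0
    return table, borderline
```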

10. Tools and Technologies Used


- Language: Python
- IDE: Google Colab
- Libraries: pandas, numpy, scikit-learn, seaborn, matplotlib, xgboost
- Visualization: Plotly, seaborn
- Version Control: GitHub
- (Optional): Streamlit for interface

11. Team Members and Contributions


| Name | Contribution |
|--------------|----------------------------------|
| K. Srithika | Data Collection & Integration |
| A. Shifana | Data Cleaning & Preprocessing |
| P. Swetha | EDA & Feature Engineering |
| B. Subashini | Model Training & Evaluation |
| J. Shobana | Documentation & Visualization |
