Assignment 1 (Unit 2)
Date of issue: Last date of submission:
Question 1: Explain the differences between traditional programming and machine learning
in the context of energy analytics. Provide an example of how each approach can be used to
solve a specific problem related to energy consumption prediction.
Traditional programming and machine learning (ML) approach problems in fundamentally
different ways, especially in the context of energy analytics.
Traditional Programming
Approach:
In traditional programming, a developer explicitly writes rules and algorithms to process data
and produce outputs. This method relies heavily on predefined logic and heuristics, meaning
the programmer must have a deep understanding of the problem domain.
Example in Energy Analytics:
For predicting energy consumption, a traditional programming approach might involve
creating a detailed algorithm that considers various factors such as time of day, temperature,
and historical consumption patterns. For instance, you might create a rule-based model that
states:
- If it's daytime, increase the prediction by a fixed percentage based on the historical average
for that time.
- If the temperature is above a certain threshold, adjust the prediction upwards because
cooling systems are likely to be in use.
This model would require continual tweaking and updating based on new data or changing
conditions, as it is rigid and relies on fixed rules.
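As a rough illustration, a minimal rule-based predictor along these lines could be coded as follows (the thresholds and percentage adjustments are hypothetical, chosen only to show the idea):
def predict_hourly_consumption(historical_average, hour, temperature_c):
    # Start from the historical average for that hour
    prediction = historical_average
    # Rule 1: daytime hours get a fixed uplift (assumed +10%)
    if 8 <= hour <= 18:
        prediction *= 1.10
    # Rule 2: hot weather increases cooling load (assumed +20% above 30 °C)
    if temperature_c > 30:
        prediction *= 1.20
    return prediction

# Example: historical average of 500 kWh, 2 pm, 33 °C
print(predict_hourly_consumption(500, hour=14, temperature_c=33))
Every new rule or changed threshold requires editing this code by hand, which is exactly the rigidity described above.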
Machine Learning
Approach:
Machine learning, on the other hand, uses data-driven techniques to learn patterns from
historical data without explicit programming for every rule. Instead, the model is trained on a
dataset, which allows it to identify relationships and make predictions based on new, unseen
data.
Example in Energy Analytics:
For energy consumption prediction using ML, you could employ a regression model or a
time-series forecasting technique. Here’s how it might work:
1. Data Collection: Gather historical energy consumption data along with various influencing
factors (e.g., temperature, occupancy, time of year).
2. Model Training: Use this historical data to train a machine learning model (e.g., a neural
network, random forest, or gradient boosting).
3. Prediction: Once trained, the model can predict future energy consumption based on new
inputs (e.g., upcoming weather forecasts, day of the week).
For example, you might find that the ML model can identify complex interactions between
multiple factors that are not easily captured by a traditional rule-based system, leading to
more accurate predictions.
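As a rough sketch of this workflow (the file name, column names, and choice of a random forest are assumptions for illustration, not part of the original text):
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Assumed hourly dataset with columns: consumption, temperature, occupancy, hour, day_of_week
df = pd.read_csv('energy_history.csv')
X = df[['temperature', 'occupancy', 'hour', 'day_of_week']]
y = df['consumption']

# Hold out the most recent 20% of rows for validation (no shuffling for time-ordered data)
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print('MAE:', mean_absolute_error(y_test, model.predict(X_test)))
The model learns the interactions between temperature, occupancy, and time features from the data itself, with no hand-written rules.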
Summary
Traditional Programming: Explicitly defined rules and heuristics; suitable for well-understood
problems but can be inflexible and require constant updates.
Example: Rule-based algorithm adjusting energy predictions based on time and temperature.
Machine Learning: Data-driven, learns from patterns; more adaptive and can handle complex
relationships in data.
Example: Regression model predicting energy consumption based on historical data and
various features like weather and occupancy.
Both approaches have their place, but ML often offers more flexibility and accuracy in
dynamic fields like energy analytics.
Question 2: Describe how each element (representation, data collection, data preparation,
model selection, model training, model evaluation, and prediction) would be implemented in
a project aimed at forecasting energy demand for a city.
1. Representation
Implementation:
Feature Selection: Identify relevant features that may influence energy demand. This could
include:
o Historical energy consumption data (hourly/daily)
o Weather data (temperature, humidity, precipitation)
o Time features (hour of the day, day of the week, holidays)
o Demographic data (population density, economic indicators)
o Events (local festivals, sports events)
Target Variable: Define the target variable as the total energy consumption for the city,
aggregated by the desired time interval (e.g., hourly, daily).
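In pandas terms, this representation reduces to a feature matrix and a target series; a minimal sketch, assuming hypothetical file and column names:
import pandas as pd

df = pd.read_csv('city_energy.csv', parse_dates=['timestamp'])

feature_columns = [
    'temperature', 'humidity', 'precipitation',  # weather
    'hour', 'day_of_week', 'is_holiday',         # time features
    'population_density', 'event_flag',          # demographic and event data
]
X = df[feature_columns]        # model inputs
y = df['total_demand_mwh']     # target: city-wide hourly demand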
2. Data Collection
Implementation:
Sources: Gather data from various sources:
o Energy utility companies for historical consumption data.
o Meteorological departments for weather data.
o Local government databases for demographic and event data.
APIs and Databases: Utilize APIs to automate data retrieval (e.g., weather APIs) and
maintain a database for storage.
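A sketch of automated retrieval with the requests library, assuming a hypothetical weather API endpoint and JSON response format (substitute the real provider's URL and parameters):
import requests
import pandas as pd

# Hypothetical endpoint and query parameters
url = 'https://api.example.com/v1/weather/history'
params = {'city': 'example-city', 'start': '2023-01-01', 'end': '2023-12-31'}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Assume the API returns a JSON list of hourly weather records
weather = pd.DataFrame(response.json())
weather.to_csv('weather_history.csv', index=False)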
3. Data Preparation
Implementation:
Cleaning: Remove missing values, outliers, and duplicate records. Ensure data is
consistent and formatted correctly.
Transformation: Convert categorical variables (e.g., day of the week) into numerical
format using one-hot encoding.
Feature Engineering: Create additional features that may enhance model
performance, such as lagged consumption values (previous day's demand) or rolling
averages.
Normalization/Scaling: Normalize or scale numerical features to improve model
performance, particularly for algorithms sensitive to feature scales.
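A condensed sketch of these preparation steps with pandas and scikit-learn (file and column names are assumptions):
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('city_demand.csv', parse_dates=['timestamp'])

# Cleaning: drop duplicates, sort by time, interpolate missing demand values
df = df.drop_duplicates(subset='timestamp').set_index('timestamp').sort_index()
df['demand'] = df['demand'].interpolate()

# Transformation: one-hot encode the day of the week
df['day_of_week'] = df.index.dayofweek
df = pd.get_dummies(df, columns=['day_of_week'], prefix='dow')

# Feature engineering: lagged demand and a rolling weekly average
df['demand_lag_24h'] = df['demand'].shift(24)
df['demand_roll_7d'] = df['demand'].rolling(window=24 * 7).mean()
df = df.dropna()

# Scaling numerical weather features
scaler = StandardScaler()
df[['temperature', 'humidity']] = scaler.fit_transform(df[['temperature', 'humidity']])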
4. Model Selection
Implementation:
Algorithm Choice: Evaluate various algorithms suitable for time series forecasting and
regression tasks, such as:
o Linear Regression
o Decision Trees or Random Forests
o Gradient Boosting Machines (GBM)
o Recurrent Neural Networks (RNN) for time series data
Framework: Choose appropriate machine learning frameworks (e.g., scikit-learn,
TensorFlow, or PyTorch) based on the selected algorithms.
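One way to shortlist an algorithm, sketched with scikit-learn and time-series-aware cross-validation (the prepared feature file is an assumption carried over from the preparation step):
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Assumed output of the data preparation step
df = pd.read_csv('prepared_features.csv')
X = df.drop(columns=['demand'])
y = df['demand']

candidates = {
    'linear_regression': LinearRegression(),
    'random_forest': RandomForestRegressor(n_estimators=200, random_state=42),
    'gradient_boosting': GradientBoostingRegressor(random_state=42),
}

# Chronological splits so each model is always validated on later data
tscv = TimeSeriesSplit(n_splits=5)
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=tscv, scoring='neg_mean_absolute_error')
    print(f'{name}: MAE = {-scores.mean():.2f}')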
5. Model Training
Implementation:
Training and Validation Split: Divide the dataset into training and validation sets
(e.g., 80% training, 20% validation) to assess model performance.
Hyperparameter Tuning: Use techniques like Grid Search or Random Search to
optimize hyperparameters for the selected model.
Training Process: Fit the model to the training data, allowing it to learn the
relationships between features and energy demand.
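For example, an 80/20 chronological split followed by a grid search over a random forest (the parameter grid is illustrative, and X and y are the prepared features and target from the previous sketch):
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# 80% training, 20% validation, kept in time order
split = int(len(X) * 0.8)
X_train, y_train = X.iloc[:split], y.iloc[:split]
X_val, y_val = X.iloc[split:], y.iloc[split:]

param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [10, 20, None],
    'min_samples_leaf': [1, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),
    scoring='neg_mean_absolute_error',
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)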
6. Model Evaluation
Implementation:
Metrics Selection: Choose appropriate evaluation metrics based on the project goals,
such as:
o Mean Absolute Error (MAE)
o Root Mean Squared Error (RMSE)
o Mean Absolute Percentage Error (MAPE)
Validation: Evaluate model performance using the validation set and assess
overfitting by checking performance on unseen data.
Cross-Validation: Optionally, employ k-fold cross-validation for a more robust
assessment of model performance.
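These metrics can be computed directly on the validation set; a sketch that reuses the tuned model and the held-out data from the training step:
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Predictions of the tuned model on the held-out validation data
y_pred = search.best_estimator_.predict(X_val)

mae = mean_absolute_error(y_val, y_pred)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
mape = np.mean(np.abs((y_val - y_pred) / y_val)) * 100

print(f'MAE: {mae:.2f}  RMSE: {rmse:.2f}  MAPE: {mape:.2f}%')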
7. Prediction
Implementation:
Future Input Data: Collect and prepare future input data (e.g., weather forecasts,
upcoming events) for prediction.
Model Deployment: Deploy the trained model in a production environment (e.g.,
using cloud platforms) to allow for real-time predictions.
Real-time Prediction: Implement a system that regularly fetches new data, updates
the input features, and generates energy demand forecasts at specified intervals (e.g.,
hourly, daily).
Reporting and Visualization: Create dashboards or reports to visualize the predicted
energy demand, enabling stakeholders to make informed decisions based on the
forecasts.
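A simple batch-prediction sketch, assuming the trained model was saved with joblib and a file of future feature values (forecast weather, calendar features) is available:
import joblib
import pandas as pd

# Load the model saved after training (file names are assumptions)
model = joblib.load('demand_model.joblib')

# Future inputs: weather forecast and calendar features for the next day
future = pd.read_csv('future_features.csv', parse_dates=['timestamp'])
X_future = future.drop(columns=['timestamp'])

future['predicted_demand'] = model.predict(X_future)
future.to_csv('demand_forecast.csv', index=False)
print(future[['timestamp', 'predicted_demand']].head())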
Question 3: For hourly energy consumption data for a year, describe the steps for preparing
this data for a machine learning model using Pandas. Include how you would handle missing
values, normalize the data, and create new features such as day of the week or hour of the
day.
1. Load the Data
import pandas as pd
# Load the data
data = pd.read_csv('hourly_energy_consumption.csv')
2. Inspect the Data
# Inspect the first few rows
print(data.head())

# Check for missing values and data types
print(data.info())
print(data.isnull().sum())
3. Handle Missing Values
Approach:
You can fill missing values using various strategies, depending on the nature of the data.
Common approaches include forward filling, backward filling, or using interpolation.
# Forward fill to handle missing values
data['consumption'] = data['consumption'].ffill()

# Alternatively, you could use interpolation:
# data['consumption'] = data['consumption'].interpolate()
4. Convert Date/Time Column
Ensure your date/time column is in the correct datetime format. If your dataset includes a
timestamp column:
# Convert the 'timestamp' column to datetime
data['timestamp'] = pd.to_datetime(data['timestamp'])
5. Set the Index (Optional)
Setting the timestamp as the index can be useful for time series analysis.
# Set the timestamp as the index
data.set_index('timestamp', inplace=True)
6. Create New Features
You can extract useful features from the datetime index:
# Create new features
data['hour'] = data.index.hour
data['day_of_week'] = data.index.dayofweek  # Monday=0, Sunday=6
data['month'] = data.index.month
data['year'] = data.index.year
data['is_weekend'] = (data['day_of_week'] >= 5).astype(int)  # 1 if weekend, 0 if weekday
7. Normalize the Data
Normalization helps in scaling the data to a standard range, which is particularly useful for
algorithms sensitive to feature scales.
from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler
scaler = MinMaxScaler()

# Normalize the consumption data
data['consumption_normalized'] = scaler.fit_transform(data[['consumption']])
8. Drop Unnecessary Columns
If there are any columns you won't use in your model (like the original consumption column
once it has been normalized), you can drop them:
# Drop the original consumption column
data.drop(columns=['consumption'], inplace=True)
9. Final Data Preparation
Ensure the data is ready for modeling by checking its shape and content:
# Check the final shape and head of the prepared data
print(data.shape)
print(data.head())
Question 4: Take a dataset of monthly energy consumption over the past 10 years, use
NumPy to calculate the following:
a. Mean and median monthly energy consumption.
b. Standard deviation and variance of monthly energy consumption.
c. The 25th and 75th percentiles of the monthly energy consumption.
Provide the Python code you would use to perform these calculations.
import numpy as np
import pandas as pd
# Load the dataset (assuming the data is in a CSV file with a column named 'monthly_consumption')
data = pd.read_csv('monthly_energy_consumption.csv')
# Extract the monthly consumption values into a NumPy array
monthly_consumption = data['monthly_consumption'].values
# a. Mean and median monthly energy consumption
mean_consumption = np.mean(monthly_consumption)
median_consumption = np.median(monthly_consumption)
print(f'Mean Monthly Energy Consumption: {mean_consumption}')
print(f'Median Monthly Energy Consumption: {median_consumption}')
# b. Standard deviation and variance of monthly energy consumption
std_deviation = np.std(monthly_consumption)
variance = np.var(monthly_consumption)
print(f'Standard Deviation of Monthly Energy Consumption: {std_deviation}')
print(f'Variance of Monthly Energy Consumption: {variance}')
# c. The 25th and 75th percentiles of the monthly energy consumption
percentile_25 = np.percentile(monthly_consumption, 25)
percentile_75 = np.percentile(monthly_consumption, 75)
print(f'25th Percentile of Monthly Energy Consumption: {percentile_25}')
print(f'75th Percentile of Monthly Energy Consumption: {percentile_75}')