
LSTM - Ipynb - Colab

The document outlines a Jupyter notebook for analyzing electricity consumption data using LSTM (Long Short-Term Memory) neural networks. It includes steps for data preprocessing, outlier detection, feature engineering, and model training, along with visualization of results. Key techniques employed include seasonal decomposition, IQR for outlier detection, and scaling of features for the LSTM model.

Uploaded by ashupandul754
Copyright © All Rights Reserved

3/27/25, 11:02 PM LSTM.ipynb - Colab

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from google.colab import drive


drive.mount("/content/drive")

Mounted at /content/drive

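df = pd.read_excel("/content/drive/MyDrive/data_2017_4_2023.xlsx", index_col='Parameter')
df.index = pd.to_datetime(df.index)  # DatetimeIndex needed for .month, .dayofyear and df.loc['2023-12']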

df.head()

            Electricity(in MU)
Parameter
2017-04-01               162.1
2017-04-02               161.3
2017-04-03               162.2
2017-04-04               164.0
2017-04-05               165.2


df.plot(figsize = (12,6))

https://colab.research.google.com/drive/1XEGrJRqqFy7kG6MkzFpxVW99XGreospe#scrollTo=3sVIjWPBo4W7&printMode=true 1/10

<Axes: xlabel='Parameter'>

from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df['Electricity(in MU)'], model='additive', period=365)

# Plot to visualize
result.plot()
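With its default settings, seasonal_decompose performs a classical additive decomposition. The sketch below is a simplified NumPy re-implementation for odd periods only (365 is odd), written purely for illustration. classical_additive_decompose is a hypothetical helper, not a statsmodels function. It shows where the missing values at the edges come from. The trend is a centered moving average of length period, so the first and last period // 2 points (182 days each for period=365) have no trend and therefore no residual.

```python
import numpy as np


def classical_additive_decompose(x, period):
    """Hypothetical helper: classical additive decomposition, odd `period` only."""
    x = np.asarray(x, dtype=float)
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n, half = len(x), period // 2

    # Trend: centered moving average, defined only where a full window fits
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, np.ones(period) / period, mode="valid")

    # Seasonal: mean detrended value at each position in the cycle, shifted to sum to zero
    detrended = x - trend
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, n // period + 1)[:n]

    # Residual: whatever trend + seasonal does not explain (NaN wherever trend is NaN)
    resid = x - trend - seasonal
    return trend, seasonal, resid


# Synthetic daily series with a weekly cycle (period 7), for illustration only
rng = np.random.default_rng(0)
t = np.arange(210)
x = 100 + 0.05 * t + 5 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, t.size)

trend, seasonal, resid = classical_additive_decompose(x, period=7)
print(np.isnan(resid).sum())  # 6: three NaNs at each end
```

Wherever the trend exists, the three components add back exactly to the original series. This is the additive model assumed by model='additive'.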


import numpy as np

residual_std = np.std(result.resid.dropna())  # drop NaN: the centered trend, and hence the residual, is NaN at both edges


threshold = 2 * residual_std

# Flag points where residual exceeds threshold


df['outlier'] = np.abs(result.resid) > threshold

# Replace outliers with trend + seasonal (without noisy residual)


df['cleaned_electricity'] = df['Electricity(in MU)'] # copy original

df.loc[df['outlier'], 'cleaned_electricity'] = (result.trend + result.seasonal)[df['outlier']]

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(df['Electricity(in MU)'], label='Original Data', alpha=0.6)
plt.plot(df['cleaned_electricity'], label='Outlier Removed (Trend + Seasonal)', alpha=0.9)
plt.legend()
plt.show()

Seasonal decomposition (seasonal_decompose()) is sensitive to incomplete seasonal periods at the edges. The trend is a centered moving average over one full period, so it cannot be computed for the first and last half-period of the series (about 182 days at each end for period=365): there is no earlier or later data to complete the window. The residuals there are therefore missing (NaN), and because any comparison with NaN evaluates to False, outliers near the start or end of the series are never flagged.
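A small NumPy illustration of why edge outliers slip through: comparing NaN with the threshold yields False, so points without a residual are silently treated as normal rather than reported as unchecked. The residual values below are made up.

```python
import numpy as np

# Hypothetical residuals: NaN at both ends, as a centered-trend decomposition produces
resid = np.array([np.nan, np.nan, 0.2, -0.1, 3.5, 0.0, -0.3, np.nan, np.nan])

# Same rule as the notebook: flag |residual| > 2 * std of the non-missing residuals
threshold = 2 * np.nanstd(resid)
flagged = np.abs(resid) > threshold   # any comparison with NaN evaluates to False

# Points the rule never evaluated: these can only be checked by another method
unchecked = np.isnan(resid)

print(int(flagged.sum()), int(unchecked.sum()))  # 1 4
```

Counting the unchecked points, as in the last step, is a cheap way to see how much of the series the residual rule actually covers.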


Apply IQR (interquartile range) outlier detection specifically to the last month of 2023 (December 2023), as there is a visible outlier there.

# Filter December 2023


last_month = df.loc['2023-12'].copy()

# IQR Calculation
Q1 = last_month['Electricity(in MU)'].quantile(0.25)
Q3 = last_month['Electricity(in MU)'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Detect outliers
last_month['is_outlier'] = (last_month['Electricity(in MU)'] < lower_bound) | (last_month['Electricity(in MU)'] > upper_bound)

# Replace outliers with median


median_value = last_month['Electricity(in MU)'].median()
last_month.loc[last_month['is_outlier'], 'cleaned_electricity'] = median_value

df.update(last_month)

df.tail()

            Electricity(in MU)  outlier  cleaned_electricity
Parameter
2024-12-27              182.07    False               182.07
2024-12-28              187.75    False               187.75
2024-12-29              188.07    False               188.07
2024-12-30              195.84    False               195.84
2024-12-31              201.81    False               201.81
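The IQR step above uses Tukey's fences: anything below Q1 - 1.5·IQR or above Q3 + 1.5·IQR is flagged and replaced with the month's median. Here is a worked NumPy version on ten made-up daily values, where 150 stands in for the visible dip. np.percentile's default linear interpolation matches the default of pandas' quantile().

```python
import numpy as np

# Made-up daily values for one month; 150 plays the role of the visible dip
values = np.array([180, 182, 181, 183, 185, 184, 182, 150, 186, 183], dtype=float)

# Tukey fences from the quartiles
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences and replace them with the median (outlier included)
is_outlier = (values < lower) | (values > upper)
cleaned = np.where(is_outlier, np.median(values), values)

print(q1, q3, lower, upper)   # 181.25 183.75 177.5 187.5
print(cleaned[is_outlier])    # [182.5]
```

Like the notebook, the sketch computes the median with the outlier still present. The median barely moves because of a single extreme value, so this is usually harmless.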

# Assumes df has a DatetimeIndex (needed for .month, .dayofyear, .dayofweek)


df['month'] = df.index.month
df['dayofyear'] = df.index.dayofyear
df['dayofweek'] = df.index.dayofweek
df['is_weekend'] = (df['dayofweek'] >= 5).astype(int)

# Fourier Features for Yearly Seasonality (365 days)


df['sin_day'] = np.sin(2 * np.pi * df['dayofyear'] / 365)
df['cos_day'] = np.cos(2 * np.pi * df['dayofyear'] / 365)

df


            Electricity(in MU)  outlier  cleaned_electricity  month  dayofyear  dayofweek
Parameter
2017-04-01              162.10    False               162.10      4         91          5
2017-04-02              161.30    False               161.30      4         92          6
2017-04-03              162.20    False               162.20      4         93          0
2017-04-04              164.00    False               164.00      4         94          1
2017-04-05              165.20    False               165.20      4         95          2
...                        ...      ...                  ...    ...        ...        ...
2024-12-27              182.07    False               182.07     12        362          4
2024-12-28              187.75    False               187.75     12        363          5
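The sin_day / cos_day features place each day of the year on the unit circle. Raw dayofyear jumps from 365 back to 1 at New Year, but on the circle those two days are neighbours. A short NumPy check of that property follows. yearly_fourier is a hypothetical helper that uses the same formula as the notebook.

```python
import numpy as np


def yearly_fourier(dayofyear, period=365):
    """Encode day-of-year as a point on the unit circle (same formula as sin_day / cos_day)."""
    angle = 2 * np.pi * np.asarray(dayofyear, dtype=float) / period
    return np.column_stack([np.sin(angle), np.cos(angle)])


enc = yearly_fourier([365, 1, 183])   # 31 Dec (non-leap year), 1 Jan, early July

# Raw dayofyear puts 31 Dec and 1 Jan 364 apart; on the circle they are close together
gap_new_year = np.linalg.norm(enc[0] - enc[1])
gap_half_year = np.linalg.norm(enc[0] - enc[2])
print(round(gap_new_year, 4), round(gap_half_year, 4))  # roughly 0.017 vs 2.0
```

In leap years dayofyear reaches 366, which this encoding maps to the same point as day 1. Dividing by 365.25 instead is one option if that matters.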
 


from sklearn.preprocessing import RobustScaler


scaler = RobustScaler()
# RobustScaler expects a 2-D array, hence reshape(-1, 1)
df['scaled_electricity'] = scaler.fit_transform(df['cleaned_electricity'].values.reshape(-1, 1))

features = ['scaled_electricity', 'month', 'dayofyear', 'dayofweek', 'is_weekend', 'sin_day', 'cos_day']

# Convert DataFrame into supervised learning format (sliding window creation)


def create_lstm_data(df, features, time_steps=30):
    X, y = [], []
    for i in range(len(df) - time_steps):
        # Input: `time_steps` consecutive rows of all features
        X.append(df[features].iloc[i:i + time_steps].values)
        # Target: the scaled value on the day right after the window
        y.append(df['scaled_electricity'].iloc[i + time_steps])
    return np.array(X), np.array(y)

# 30-day sliding window (time_steps = 30)


time_steps = 30
X, y = create_lstm_data(df, features, time_steps)

print(f"X shape: {X.shape}, y shape: {y.shape}")

X shape: (2802, 30, 7), y shape: (2802,)
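The loop in create_lstm_data calls .iloc once per window. The same windows can be built in one vectorized step with NumPy's sliding_window_view. The sketch below is a hypothetical alternative (create_windows is not part of the notebook). It uses random data with 2,832 rows, the row count implied by the printed shape (2802 + 30). It checks that window i covers rows i to i + 29 and that its target is row i + 30.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def create_windows(features_2d, target_1d, time_steps=30):
    """Hypothetical vectorized equivalent of create_lstm_data.

    features_2d: shape (n_rows, n_features), columns in the same order as `features`
    target_1d:   shape (n_rows,), e.g. the scaled_electricity column
    """
    # All windows of `time_steps` rows: shape (n_rows - time_steps + 1, n_features, time_steps)
    windows = sliding_window_view(features_2d, time_steps, axis=0)
    # Reorder to (samples, time_steps, n_features); drop the last window, which has no target
    X = windows.transpose(0, 2, 1)[:-1]
    # Target for the window starting at row i is the value at row i + time_steps
    y = target_1d[time_steps:]
    return X, y


n_rows, n_features = 2832, 7
data = np.random.default_rng(0).normal(size=(n_rows, n_features))
X, y = create_windows(data, data[:, 0], time_steps=30)
print(X.shape, y.shape)  # (2802, 30, 7) (2802,)
```

sliding_window_view returns a read-only view rather than a copy, so it stays cheap even for long series.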

# Train-Test Split (80% train, 20% test)


split_index = int(0.8 * len(X))
X_train, X_test = X[:split_index], X[split_index:]

y_train, y_test = y[:split_index], y[split_index:]

print(f"Train shape: X={X_train.shape}, y={y_train.shape}")
print(f"Test shape: X={X_test.shape}, y={y_test.shape}")

Train shape: X=(2241, 30, 7), y=(2241,)
Test shape: X=(561, 30, 7), y=(561,)

from sklearn.model_selection import train_test_split

# Split train into train and validation (e.g., 80% train, 20% validation)
X_train_final, X_val, y_train_final, y_val = train_test_split(
    X_train, y_train, test_size=0.2, shuffle=False  # No shuffle for time series!
)

print(f"Train shape: X={X_train_final.shape}, y={y_train_final.shape}")
print(f"Validation shape: X={X_val.shape}, y={y_val.shape}")
print(f"Test shape: X={X_test.shape}, y={y_test.shape}")

Train shape: X=(1792, 30, 7), y=(1792,)
Validation shape: X=(449, 30, 7), y=(449,)
Test shape: X=(561, 30, 7), y=(561,)
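Both splits keep the windows in time order, so train, validation and test are contiguous blocks (windows 0 to 1791, 1792 to 2240, 2241 to 2801). The printed sizes follow from two rounding rules. int() truncates the 80% split. scikit-learn's train_test_split rounds a float test_size up with ceil. A small check of that arithmetic, using a hypothetical helper:

```python
import math


def chronological_split_sizes(n_samples, train_frac=0.8, val_frac=0.2):
    """Hypothetical helper: sizes of a time-ordered train/validation/test split.

    int(train_frac * n) windows go to train+validation, the rest to test;
    the validation part is ceil(val_frac * n_trainval), matching how
    train_test_split rounds a float test_size.
    """
    n_trainval = int(train_frac * n_samples)
    n_test = n_samples - n_trainval
    n_val = math.ceil(val_frac * n_trainval)
    n_train = n_trainval - n_val
    return n_train, n_val, n_test


print(chronological_split_sizes(2802))  # (1792, 449, 561)
```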

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

model = Sequential()

# 1st LSTM layer (stacked): returns the full sequence for the next LSTM layer
model.add(LSTM(64, activation='tanh', return_sequences=True, input_shape=(30, 7)))
model.add(Dropout(0.3))

# 2nd LSTM layer: returns only the last hidden state
model.add(LSTM(32, activation='tanh'))
model.add(Dropout(0.3))

# Final dense output: next-day scaled consumption
model.add(Dense(1))

# Compile with Huber loss (less sensitive to large errors than MSE)
model.compile(optimizer=Adam(learning_rate=0.001), loss='huber', metrics=['mae'])

# Callbacks: stop after 15 epochs without val_loss improvement (restoring the best weights)
# and halve the learning rate after 7 epochs without improvement
early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=7, factor=0.5, verbose=1)

# Train
history = model.fit(X_train_final, y_train_final,
                    validation_data=(X_val, y_val),
                    epochs=50,
                    batch_size=32,
                    callbacks=[early_stopping, reduce_lr])

56/56 ━━━━━━━━━━━━━━━━━━━━ 2s 33ms/step - loss: 0.0269 - mae: 0.1786 - val_loss: 0 


Epoch 23/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - loss: 0.0271 - mae: 0.1819 - val_loss: 0
Epoch 24/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 4s 44ms/step - loss: 0.0252 - mae: 0.1776 - val_loss: 0
Epoch 25/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - loss: 0.0308 - mae: 0.1947 - val_loss: 0
Epoch 26/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - loss: 0.0275 - mae: 0.1803 - val_loss: 0
Epoch 27/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 3s 40ms/step - loss: 0.0254 - mae: 0.1744 - val_loss: 0
Epoch 28/50
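Counting the trainable parameters by hand is a quick sanity check on the stacked model. The formula below assumes the standard Keras LSTM cell: four gates, each with input weights, recurrent weights and one bias vector. Under that assumption the counts should match model.summary(). The helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    """Trainable weights of a standard LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units


layers = {
    "LSTM(64), input 7 features": lstm_params(64, 7),
    "LSTM(32), input 64 units": lstm_params(32, 64),
    "Dense(1), input 32 units": dense_params(1, 32),
}
for name, count in layers.items():
    print(f"{name}: {count}")
print("total:", sum(layers.values()))  # total: 30881 (Dropout layers add none)
```

That is roughly 31k weights fitted on 1,792 training windows.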



import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Train Loss')


plt.plot(history.history['val_loss'], label='Val Loss')
plt.legend()
plt.show()
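The curves above plot the Huber loss, not MSE. Keras' built-in 'huber' loss uses delta=1.0 by default: it is 0.5·e² for |e| ≤ delta and delta·|e| - 0.5·delta² beyond that. The NumPy sketch below implements that piecewise formula for illustration, with made-up errors. Because the target was scaled with RobustScaler, delta=1.0 corresponds to one interquartile range of daily consumption.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * err ** 2
    linear = delta * err - 0.5 * delta ** 2
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


# One large miss among small ones: it dominates MSE much more than Huber
errors = np.array([0.1, 0.2, -0.1, 3.0])
print(huber(errors, 0.0), float(np.mean(errors ** 2)))  # about 0.63 vs 2.27
```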

y_pred_scaled = model.predict(X_test)
y_pred_original = scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()

y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1)).flatten()

18/18 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step
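With default settings, RobustScaler subtracts the median and divides by the interquartile range (25th to 75th percentile). inverse_transform therefore computes z · IQR + median. Below is a NumPy sketch of both directions on made-up values; the helper names are hypothetical.

```python
import numpy as np


def robust_fit(x):
    """Center and scale as RobustScaler computes them by default: median and 25-75 IQR."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return median, q3 - q1


def robust_transform(x, center, scale):
    return (np.asarray(x, dtype=float) - center) / scale


def robust_inverse(z, center, scale):
    return np.asarray(z, dtype=float) * scale + center


train = np.array([160.0, 162.0, 165.0, 170.0, 175.0, 180.0, 240.0])
center, scale = robust_fit(train)
z = robust_transform(train, center, scale)
print(center, scale, z[-1])  # 170.0 14.0 5.0
```

In this notebook the scaler is fit on the whole series before the train/test split, so the median and IQR also reflect the test period. Fitting on the training rows only is the stricter choice.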

import matplotlib.pyplot as plt

plt.figure(figsize=(12,6))
plt.plot(y_test_original, label='Actual', marker='o', linestyle='-')
plt.plot(y_pred_original, label='Predicted', marker='x', linestyle='--')
plt.xlabel('Time Step')
plt.ylabel('Electricity Consumption (Original Scale)')
plt.title('Actual vs Predicted on Test Data (Original Scale)')
plt.legend()
plt.show()
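The plot compares actual and predicted consumption visually. A few summary errors in the original units make that comparison concrete. The helper below is hypothetical (the notebook does not compute metrics). In this notebook it would be called as forecast_metrics(y_test_original, y_pred_original); the numbers in the example are made up.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """MAE and RMSE in the original units (MU), and MAPE in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE_%": float(np.mean(np.abs(err) / np.abs(actual)) * 100),
    }


print(forecast_metrics([180.0, 190.0, 200.0], [182.0, 187.0, 200.0]))
```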


 
