import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive
drive.mount("/content/drive")
Mounted at /content/drive
df = pd.read_excel("/content/drive/MyDrive/data_2017_4_2023.xlsx", index_col='Parameter')
df.head()
Electricity(in MU)
Parameter
2017-04-01 162.1
2017-04-02 161.3
2017-04-03 162.2
2017-04-04 164.0
2017-04-05 165.2
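The feature engineering below relies on a DatetimeIndex (df.index.month and friends). read_excel usually parses Excel dates automatically, but an explicit conversion is a cheap safeguard; a minimal sketch:
# Make sure the 'Parameter' index is a DatetimeIndex before deriving calendar features
df.index = pd.to_datetime(df.index)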
df.plot(figsize = (12,6))
<Axes: xlabel='Parameter'>
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(df['Electricity(in MU)'], model='additive', period=365)
# Plot to visualize
result.plot()
import numpy as np
residual_std = np.std(result.resid.dropna())  # drop NaN because trend/residual are NaN at the series edges
threshold = 2 * residual_std
# Flag points where residual exceeds threshold
df['outlier'] = np.abs(result.resid) > threshold
# Replace outliers with trend + seasonal (without noisy residual)
df['cleaned_electricity'] = df['Electricity(in MU)'] # copy original
df.loc[df['outlier'], 'cleaned_electricity'] = (result.trend + result.seasonal)[df['outlier']]
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 6))
plt.plot(df['Electricity(in MU)'], label='Original Data', alpha=0.6)
plt.plot(df['cleaned_electricity'], label='Outlier Removed (Trend + Seasonal)', alpha=0.9)
plt.legend()
plt.show()
Seasonal decomposition (seasonal_decompose()) is sensitive to incomplete seasonal periods at the edges of the series. The trend is estimated with a centered moving average, which needs half a period of data on each side, so the trend (and therefore the residual) is NaN for the first and last half-period. As a result, the residual-based rule above cannot detect outliers near the start or end of the data.
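To see exactly how many edge observations the residual rule skips, a minimal check on the result object from above:
# Count residuals that are NaN at the series edges; the 2*sigma rule
# silently ignores these observations (roughly one full period in total)
print(result.resid.isna().sum(), "observations have no residual")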
Apply IQR (Interquartile Range) outlier detection specifically to the last month of 2023, as there is a visible outlier there.
# Filter December 2023
last_month = df.loc['2023-12'].copy()
# IQR Calculation
Q1 = last_month['Electricity(in MU)'].quantile(0.25)
Q3 = last_month['Electricity(in MU)'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
# Detect outliers
last_month['is_outlier'] = (last_month['Electricity(in MU)'] < lower_bound) | (last_month['Electricity(in MU)'] > upper_bound)
# Replace outliers with median
median_value = last_month['Electricity(in MU)'].median()
last_month.loc[last_month['is_outlier'], 'cleaned_electricity'] = median_value
df.update(last_month)
df.tail()
Electricity(in MU) outlier cleaned_electricity
Parameter
2024-12-27 182.07 False 182.07
2024-12-28 187.75 False 187.75
2024-12-29 188.07 False 188.07
2024-12-30 195.84 False 195.84
2024-12-31 201.81 False 201.81
# Assuming 'df' has a DatetimeIndex and the 'Electricity(in MU)' column
df['month'] = df.index.month
df['dayofyear'] = df.index.dayofyear
df['dayofweek'] = df.index.dayofweek
df['is_weekend'] = (df['dayofweek'] >= 5).astype(int)
# Fourier Features for Yearly Seasonality (365 days)
df['sin_day'] = np.sin(2 * np.pi * df['dayofyear'] / 365)
df['cos_day'] = np.cos(2 * np.pi * df['dayofyear'] / 365)
df
            Electricity(in MU)  outlier  cleaned_electricity  month  dayofyear  dayofweek
Parameter
2017-04-01              162.10    False               162.10      4         91          5
2017-04-02              161.30    False               161.30      4         92          6
2017-04-03              162.20    False               162.20      4         93          0
2017-04-04              164.00    False               164.00      4         94          1
2017-04-05              165.20    False               165.20      4         95          2
...                        ...      ...                  ...    ...        ...        ...
2024-12-27              182.07    False               182.07     12        362          4
2024-12-28              187.75    False               187.75     12        363          5
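The yearly Fourier pair above encodes the annual cycle. Daily electricity demand often has a weekly cycle as well; if that holds for this data, the same trick extends directly (a sketch; the sin_week/cos_week names are illustrative, not part of the original feature set):
# Optional: Fourier features for weekly seasonality (period = 7 days)
df['sin_week'] = np.sin(2 * np.pi * df['dayofweek'] / 7)
df['cos_week'] = np.cos(2 * np.pi * df['dayofweek'] / 7)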
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
df['scaled_electricity'] = scaler.fit_transform(df['cleaned_electricity'].values.reshape(-1, 1))
features = ['scaled_electricity', 'month', 'dayofyear', 'dayofweek', 'is_weekend', 'sin_day', 'cos_day']
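One caveat: fitting the scaler on the full series lets statistics from the test period leak into training. A stricter alternative, sketched under the assumption that the 80/20 chronological split used below is kept, is to fit on the training span only:
# Leakage-free alternative (sketch): fit the scaler on the first ~80% of the series
train_cutoff = int(0.8 * len(df))
scaler = RobustScaler().fit(df['cleaned_electricity'].values[:train_cutoff].reshape(-1, 1))
df['scaled_electricity'] = scaler.transform(df['cleaned_electricity'].values.reshape(-1, 1))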
# Convert DataFrame into supervised learning format (sliding window creation)
def create_lstm_data(df, features, time_steps=30):
X, y = [], []
for i in range(len(df) - time_steps):
X.append(df[features].iloc[i:i + time_steps].values)
y.append(df['scaled_electricity'].iloc[i + time_steps])
return np.array(X), np.array(y)
#30-day sliding window (keep your time_steps = 30)
time_steps = 30
X, y = create_lstm_data(df, features, time_steps)
print(f"X shape: {X.shape}, y shape: {y.shape}")
X shape: (2802, 30, 7), y shape: (2802,)
# Train-Test Split (80% train, 20% test)
split_index = int(0.8 * len(X))
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
print(f"Train shape: X={X_train.shape}, y={y_train.shape}")
print(f"Test shape: X={X_test.shape}, y={y_test.shape}")
Train shape: X=(2241, 30, 7), y=(2241,)
Test shape: X=(561, 30, 7), y=(561,)
from sklearn.model_selection import train_test_split
# Split train into train and validation (e.g., 80% train, 20% validation)
X_train_final, X_val, y_train_final, y_val = train_test_split(
X_train, y_train, test_size=0.2, shuffle=False # No shuffle for time series!
)
print(f"Train shape: X={X_train_final.shape}, y={y_train_final.shape}")
print(f"Validation shape: X={X_val.shape}, y={y_val.shape}")
print(f"Test shape: X={X_test.shape}, y={y_test.shape}")
Train shape: X=(1792, 30, 7), y=(1792,)
Validation shape: X=(449, 30, 7), y=(449,)
Test shape: X=(561, 30, 7), y=(561,)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
model = Sequential()
# 1st LSTM layer (stacked)
model.add(LSTM(64, activation='tanh', return_sequences=True, input_shape=(30, 7)))
model.add(Dropout(0.3))
# 2nd LSTM layer
model.add(LSTM(32, activation='tanh'))
model.add(Dropout(0.3))
# Final dense output
model.add(Dense(1))
# Compile with Huber Loss (more robust to outliers)
model.compile(optimizer=Adam(learning_rate=0.001), loss='huber', metrics=['mae'])
# Callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=7, factor=0.5, verbose=1)
# Train
history = model.fit(X_train_final, y_train_final,
validation_data=(X_val, y_val),
epochs=50,
batch_size=32,
callbacks=[early_stopping, reduce_lr])
56/56 ━━━━━━━━━━━━━━━━━━━━ 2s 33ms/step - loss: 0.0269 - mae: 0.1786 - val_loss: 0
Epoch 23/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - loss: 0.0271 - mae: 0.1819 - val_loss: 0
Epoch 24/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 4s 44ms/step - loss: 0.0252 - mae: 0.1776 - val_loss: 0
Epoch 25/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - loss: 0.0308 - mae: 0.1947 - val_loss: 0
Epoch 26/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - loss: 0.0275 - mae: 0.1803 - val_loss: 0
Epoch 27/50
56/56 ━━━━━━━━━━━━━━━━━━━━ 3s 40ms/step - loss: 0.0254 - mae: 0.1744 - val_loss: 0
Epoch 28/50
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.legend()
plt.show()
y_pred_scaled = model.predict(X_test)
y_pred_original = scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()
y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1)).flatten()
18/18 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step
import matplotlib.pyplot as plt
plt.figure(figsize=(12,6))
plt.plot(y_test_original, label='Actual', marker='o', linestyle='-')
plt.plot(y_pred_original, label='Predicted', marker='x', linestyle='--')
plt.xlabel('Time Step')
plt.ylabel('Electricity Consumption (Original Scale)')
plt.title('Actual vs Predicted on Test Data (Original Scale)')
plt.legend()
plt.show()
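Beyond the visual comparison, it is worth quantifying the fit; a minimal sketch using sklearn metrics on the inverse-transformed values:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Error metrics in the original units (MU)
mae = mean_absolute_error(y_test_original, y_pred_original)
rmse = np.sqrt(mean_squared_error(y_test_original, y_pred_original))
mape = np.mean(np.abs((y_test_original - y_pred_original) / y_test_original)) * 100
print(f"MAE: {mae:.2f} MU | RMSE: {rmse:.2f} MU | MAPE: {mape:.2f}%")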