Team Name: Entangled Minds
Team Leader Name: Kunal Patil
Problem Statement: 5
Team Members
Team Member-1:
Name: Vinit Solanki
College: Vivekanand Education Society's Institute of Technology (VESIT)
Team Leader:
Name: Kunal Patil
College: Vivekanand Education Society's Institute of Technology (VESIT)
Team Member-2:
Name: Aum Patel
College: Vivekanand Education Society's Institute of Technology (VESIT)
Team Member-3:
Name: Bhushan Nargolkar
College: Vivekanand Education Society's Institute of Technology (VESIT)
Brief about the Idea:
Problem Statement
Traditional weather models often miss fine-scale cloud dynamics and sub-grid processes, leading to inaccurate short-term forecasts. Accurate nowcasting is vital for precipitation prediction and solar energy management, but existing methods such as optical flow or basic CNN/RNN models yield blurry and temporally inconsistent results. This creates an opportunity to use advanced AI with multi-spectral satellite data for precise and temporally coherent cloud forecasting.
Idea
Our innovative solution leverages advanced deep learning and
temporal modeling to predict future cloud states using satellite
imagery. By fusing spectral, spatial, and temporal information, our
architecture can anticipate cloud movement, intensity, and type,
enabling more accurate weather forecasting and disaster
preparedness.
Opportunity
How is it Different?
Multi-modal Fusion: Integrates VIS, SWIR, and IR channels for a holistic understanding of cloud properties, discarding less relevant WV data for efficiency.
Feature-wise Linear Modulation (FiLM): Injects context at every UNet layer, ensuring the model is always “aware” of past dynamics.
Temporal Consistency Module (TCM): Uses 3D convolutions and attention to model complex motion patterns (e.g., cyclone rotation), going well beyond simple frame-to-frame prediction.
Temporal Spectral Module (TSM): Encodes cloud motion and spectral relationships into rich embeddings, capturing nuanced trends across time and bands.
How Does It Solve the Problem?
Accurate Cloud Prediction: By understanding both spatial and temporal evolution, the model predicts not only where clouds will be, but also their type and intensity.
Efficient Computation: Smart resizing and channel selection reduce computational overhead without sacrificing accuracy.
Robust to Missing Data: Interpolates missing timestamps, ensuring continuity and reliability in operational settings.
USP
Context-Aware Future Prediction: The system does not just extrapolate linearly; it learns motion trends, rotations, and spectral signatures, providing robust, real-world cloud evolution forecasts.
Plug-and-Play for Satellite Data: Designed to work with multi-channel satellite data, making it adaptable to various remote sensing missions.
End-to-End Learnable: All components (TSM, FiLM, UNet, TCM) are trained jointly, optimizing for both pixel-level accuracy and structural similarity.
List of features offered by the solution
Multi-channel input support (VIS, SWIR, IR)
Temporal context encoding (TSM)
Spectral and spatial fusion for robust embeddings
Gaussian noise modeling for realistic data augmentation
Feature-wise context injection (FiLM) at every UNet layer
Predicts multiple future frames simultaneously
Autoregressive model sampling
Loss function combining MSE and SSIM for perceptual and pixel accuracy
Handles missing timestamps via interpolation
Computationally efficient via smart resizing and normalization
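The feature list pairs autoregressive sampling with multi-frame prediction (Xt+1, Xt+2). A minimal sketch of an autoregressive rollout follows, in plain Python: each predicted frame is appended to the context and fed back in to predict the next one. The `predict_next` callable is a hypothetical stand-in for the trained model, the four-frame window (Xt-3 to Xt) is an assumption taken from the wireframe description, and the toy linear extrapolator exists only to make the example runnable.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # one channel, H x W; a toy stand-in for a satellite frame


def autoregressive_rollout(
    context: Sequence[Frame],
    predict_next: Callable[[Sequence[Frame]], Frame],
    horizon: int,
    window: int = 4,
) -> List[Frame]:
    """Predict `horizon` future frames, feeding each prediction back as input."""
    history: List[Frame] = list(context)
    predictions: List[Frame] = []
    for _ in range(horizon):
        nxt = predict_next(history[-window:])  # condition on the latest `window` frames
        predictions.append(nxt)
        history.append(nxt)  # the new prediction becomes part of the next context
    return predictions


def linear_extrapolate(frames: Sequence[Frame]) -> Frame:
    """Hypothetical toy predictor: next = 2 * last - previous, pixel by pixel."""
    prev, last = frames[-2], frames[-1]
    return [[2 * b - a for a, b in zip(row_p, row_l)] for row_p, row_l in zip(prev, last)]


ctx = [[[0.0, 1.0]], [[1.0, 2.0]], [[2.0, 3.0]], [[3.0, 4.0]]]  # Xt-3 .. Xt
preds = autoregressive_rollout(ctx, linear_extrapolate, horizon=2)  # Xt+1, Xt+2
print(preds)  # [[[4.0, 5.0]], [[5.0, 6.0]]]
```

Because each step consumes earlier predictions, errors can accumulate over long horizons; predicting several frames jointly, which the feature list also mentions, is the alternative.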
Process flow diagram or Use-case diagram
Actors
Meteorological Data Provider: Supplies satellite imagery.
System User (Meteorologist, Researcher): Consumes cloud
movement predictions.
AI Prediction System: Processes data and generates
predictions.
Use Cases
Ingest Satellite Data: System receives and stores multi-channel satellite
frames.
Preprocess Data: System resizes, normalizes, and interpolates data for
consistency.
Generate Context Embedding: System fuses temporal, spectral, and
spatial information.
Predict Future Cloud States: System uses DDPM and UNet with FiLM/TCM
to forecast next frames.
Evaluate Predictions: System computes loss and updates model weights.
Visualize and Export Results: System provides predicted cloud states to
users for analysis.
Relationships
Meteorological Data Provider → Ingest Satellite Data
System User → Visualize and Export Results
AI Prediction System: Executes all internal use cases (preprocessing, embedding, prediction, evaluation).
The process flow diagram illustrates the flow of data and knowledge within the architecture, highlighting the interplay between temporal, spectral, and spatial components, and the innovative use of FiLM and TCM modules to guide the UNet in learning complex cloud behaviors.
Wireframes/Mock diagrams of the proposed solution
Data Input Interface
Upload Section: Allows users to upload multi-channel satellite imagery (VIS, SWIR, TIR1/IR). Option to select
time frames or batch upload sequences.
Channel Selection Panel: Visual toggles for selecting/deselecting channels (default: WV off).
Preprocessing Summary: Displays resizing, normalization status, and interpolation actions taken.
Data Visualization Panel
Frame Viewer: Interactive slider to browse through input frames (Xt-3 to Xt). Hover/zoom to inspect pixel values across channels.
Channel Overlay: Toggle overlays to compare VIS, SWIR, and IR channels.
Missing Data Alert: Visual indicator if any timestamps are interpolated.
Model Configuration
Module Overview: Visual blocks for TSM, Conditional UNet, FiLM, TCM. Tooltips explain each module’s role.
Parameter Settings: Adjustable fields for batch size, learning rate, λ (loss balance), etc.
Progress Tracker: Step-by-step progress bar: Data Ingest → Preprocessing → Embedding → Prediction → Evaluation.
Prediction Output
Result Display: Side-by-side comparison of the input sequence and predicted future frames (Xt+1, Xt+2).
Download Options: Export predictions as images, videos, or data arrays.
Performance Metrics: Real-time display of MSE, SSIM, and combined loss for the current batch.
Architecture diagram of the proposed solution
Module | Function | Innovation
Data Preprocessing | Cleans, resizes, normalizes, and interpolates data | Efficient, robust input
TSM | Fuses temporal, spectral, and spatial information into a context embedding (Z_enc) | Rich, context-driven encoding
Conditional UNet + FiLM | Denoises and predicts with context injected at all layers | Dynamic, adaptive learning
TCM | Enforces temporal consistency with 3D convolutions and attention | Realistic motion prediction
DDPM | Generative forecasting via diffusion | Sharp, high-fidelity output
Our solution leverages cutting-edge deep learning techniques to predict cloud movement and evolution from satellite imagery. By fusing temporal, spectral, and
spatial information, the system generates sharp, context-aware forecasts of future cloud states, supporting meteorological analysis and disaster preparedness.
Architecture
Multi-Channel Data Ingestion:
Utilizes VIS, SWIR, and TIR1/IR channels for a comprehensive input
representation.
Excludes WV for computational efficiency.
Advanced Preprocessing:
Resizes all frames to 124×124 pixels.
Normalizes each channel using dataset-wide mean and standard deviation.
Interpolates missing timestamps to maintain temporal continuity.
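Two of these preprocessing steps can be sketched in plain Python, assuming each channel of a frame is a flat list of pixels and a missing timestamp is marked with `None`. Linear interpolation in time is an assumption (the method is not specified above), and the helper names are hypothetical. Resizing to 124×124 is omitted; in practice it would be done with a library routine such as OpenCV's `cv2.resize`.

```python
import statistics
from typing import List, Optional, Tuple

Frame = List[float]  # one channel, flattened H*W pixels (toy representation)


def interpolate_missing(frames: List[Optional[Frame]]) -> List[Frame]:
    """Fill missing (None) timestamps by linear interpolation in time.

    Gaps at the start or end are filled with the nearest observed frame.
    """
    known = [i for i, f in enumerate(frames) if f is not None]
    if not known:
        raise ValueError("no observed frames to interpolate from")
    filled: List[Frame] = []
    for i, frame in enumerate(frames):
        if frame is not None:
            filled.append(list(frame))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(list(frames[right]))
        elif right is None:
            filled.append(list(frames[left]))
        else:
            w = (i - left) / (right - left)  # fractional position between neighbours
            filled.append([(1 - w) * a + w * b for a, b in zip(frames[left], frames[right])])
    return filled


def channel_stats(frames: List[Frame]) -> Tuple[float, float]:
    """Mean and standard deviation of one channel over all frames and pixels."""
    values = [v for frame in frames for v in frame]
    return statistics.fmean(values), statistics.pstdev(values)


def normalize(frame: Frame, mean: float, std: float, eps: float = 1e-8) -> Frame:
    """Z-score normalization; `eps` guards against a constant channel."""
    return [(v - mean) / (std + eps) for v in frame]


seq = [[0.0, 10.0], None, [4.0, 14.0]]  # the middle timestamp is missing
filled = interpolate_missing(seq)
print(filled[1])  # [2.0, 12.0]

# In practice these statistics would be computed once per channel over the training set.
mean, std = channel_stats(filled)
normed = [normalize(f, mean, std) for f in filled]
```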
Temporal Spectral Module (TSM):
Encodes past cloud dynamics and spectral features into a rich context
embedding (Z_enc).
Captures motion trends, intensity changes, and inter-channel relationships.
Conditional UNet with FiLM:
Denoising UNet architecture, conditioned at every layer with Z_enc via
Feature-wise Linear Modulation (FiLM).
Ensures the model is always aware of past motion and spectral context.
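FiLM modulates each feature channel with a scale γ and a shift β predicted from the conditioning vector: FiLM(x) = γ ⊙ x + β. A minimal sketch follows, assuming Z_enc is projected to (γ, β) by one linear layer per UNet layer; the weights are hand-picked toy values rather than learned ones, and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, z: Vector) -> Vector:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


def film(features: List[Vector], z_enc: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Feature-wise linear modulation of a UNet feature map.

    features[c] holds the flattened activations of channel c; the same (gamma, beta)
    pair is applied to every spatial position of that channel.
    """
    gamma = linear(w_gamma, b_gamma, z_enc)  # one scale per channel
    beta = linear(w_beta, b_beta, z_enc)     # one shift per channel
    if len(gamma) != len(features) or len(beta) != len(features):
        raise ValueError("need exactly one (gamma, beta) pair per feature channel")
    return [[g * v + b for v in fmap] for fmap, g, b in zip(features, gamma, beta)]


z_enc = [1.0, 0.0]                   # toy context embedding
features = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 positions
out = film(features, z_enc,
           w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],
           w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0])
print(out)  # [[2.5, 4.5], [4.0, 5.0]]
```

Because γ and β depend on Z_enc, the same convolutional weights can behave differently for different past sequences, which is how the context reaches every layer.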
Temporal Consistency Module (TCM):
Applies 3D convolutions and attention to bottleneck features.
Enforces smooth, physically realistic cloud motion and structure in predictions.
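The attention half of the TCM can be illustrated with scaled dot-product self-attention across time: each time step's bottleneck feature vector is replaced by a softmax-weighted mix of all time steps. This is a simplified sketch, assuming spatially pooled bottleneck vectors and identity query/key/value projections (a real module would learn these projections); the 3D-convolution half is omitted.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(seq: List[Vector]) -> List[Vector]:
    """Self-attention over time; seq[t] is the bottleneck feature vector at step t.

    Queries, keys and values are the inputs themselves (identity projections).
    """
    d = len(seq[0])
    out: List[Vector] = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)  # how strongly this step attends to every step
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features each
mixed = temporal_self_attention(frames)
print([[round(x, 3) for x in row] for row in mixed])
```

Each output is a convex combination of all time steps, so motion cues seen early in the sequence can influence later ones, which a strictly frame-to-frame model cannot do.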
Generative Forecasting (DDPM):
Adds Gaussian noise to input, training the model to denoise and predict future
frames.
Enables the generation of sharp, high-fidelity cloud forecasts.
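In a DDPM, noise is added in closed form: x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with ε ~ N(0, I) and ᾱ_t = ∏(1 − β_s) (this β is the noise schedule, unrelated to FiLM's β). The network is trained to predict ε from x_t and the conditioning context. The sketch below assumes the linear schedule from 1e-4 to 0.02 over T = 1000 steps used in the original DDPM paper, since the team's schedule is not stated; x_0 stands for the clean frames the model learns to recover.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T, increasing linearly."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ q(x_t | x_0) in one step; also return eps (the training target)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise * e for x, e in zip(x0, eps)], eps


alpha_bars = cumulative_alphas(linear_beta_schedule())
x0 = [0.5, -1.0, 2.0]  # toy normalized future-frame pixels
rng = random.Random(0)
x_early, eps_early = q_sample(x0, 0, alpha_bars, rng)  # almost clean
x_late, eps_late = q_sample(x0, 999, alpha_bars, rng)  # almost pure noise
print(round(alpha_bars[0], 6), alpha_bars[-1] < 1e-3)  # 0.9999 True
```

A training step would sample t uniformly, noise the target frames with `q_sample`, and minimize the MSE between ε and the network's noise prediction.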
Optimization Techniques
1. Dynamic Batch Processing: Adjusts batch sizes during training and inference to maximize hardware utilization and speed.
2. Automated Hyperparameter Tuning: Uses tools and algorithms to automatically find the best learning rates, batch sizes, and other model parameters for optimal performance.
3. Efficient Model Architectures: Implements lightweight and scalable network designs (e.g., optimized UNet, FiLM, TCM) to reduce computation without sacrificing accuracy.
4. Model Compression: Applies pruning, quantization, or knowledge distillation to shrink model size, enabling faster inference and lower memory usage.
5. Adaptive Resource Allocation: Dynamically allocates computational resources based on workload, ensuring efficient processing even with fluctuating data volumes.
6. Early Stopping and Regularization: Uses early stopping, dropout, and other regularization techniques to prevent overfitting and unnecessary computation.
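Early stopping (item 6) reduces to a small amount of bookkeeping: stop once the validation loss has failed to improve for `patience` consecutive epochs. A minimal sketch follows; the class name and the `patience`/`min_delta` parameters are illustrative, and the loss values are toy numbers. PyTorch Lightning, listed among the technologies, provides a ready-made `EarlyStopping` callback for the same purpose.

```python
import math


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


val_losses = [1.0, 0.8, 0.81, 0.82, 0.79, 0.80, 0.80, 0.80]  # toy values
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 7 0.79
```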
Technologies to be used in the solution:
Deep Learning: PyTorch, TensorFlow, PyTorch Lightning
Data handling: NumPy, Pandas, h5py, Dask (for parallel computation on large arrays)
Geospatial: GDAL, Xarray, NumPy, OpenCV
Architectures: Conditional Diffusion Models (DDPM, LDM), UNet backbones
Visualization: Matplotlib, Plotly, Seaborn
Deployment and automation: Apache Airflow / Prefect, Docker
Cloud/hardware: AWS S3 / GCS (storing large datasets), NVIDIA CUDA/cuDNN (GPU acceleration)
Estimated implementation cost (optional):
Resize and Normalize Inputs: All input frames are resized to (4, 124, 124), and each channel is normalized using its dataset-wide mean and standard deviation. This reduces memory usage, accelerates processing, and ensures stable model training.
Efficient GPU Batching: The system leverages optimized GPU batch processing for both training and inference, maximizing hardware utilization and significantly speeding up computation when handling large satellite datasets.
Multi-Path Learning (Error Propagation through Specialized Modules): Prediction errors are backpropagated through multiple specialized modules:
The Temporal Spectral Module (TSM) processes temporal cloud dynamics.
FiLM layers inject spectral context at every UNet layer.
The Temporal Consistency Module (TCM) enforces motion-aware processing.
This multi-path strategy ensures the model learns from both direct errors and contextual trends.
Gradient-Based Learning (End-to-End Trainable Architecture): All core components, including the UNet convolutional layers, the FiLM modulation parameters (γ, β), the TCM's 3D convolution and attention, and the TSM's temporal-spectral fusion, are updated via gradient descent. This enables the system to jointly optimize spatial, spectral, and temporal representations for robust prediction.
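To make the gradient descent concrete for the FiLM parameters (γ, β), the sketch below fits a single FiLM-style affine map y = γx + β to a target under the MSE loss, using hand-derived gradients ∂L/∂γ = (2/N)Σ(y − t)x and ∂L/∂β = (2/N)Σ(y − t). In the full model, autograd would compute gradients through the UNet, TCM, and TSM jointly, and γ, β would come from a projection of Z_enc whose weights are what actually get updated. This isolated one-layer version, its toy data, and the learning rate are assumptions for illustration.

```python
from typing import List, Tuple


def film_mse_grads(x: List[float], target: List[float],
                   gamma: float, beta: float) -> Tuple[float, float, float]:
    """MSE of y = gamma * x + beta against `target`, with its gradients."""
    n = len(x)
    residuals = [gamma * xi + beta - ti for xi, ti in zip(x, target)]
    loss = sum(r * r for r in residuals) / n
    d_gamma = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
    d_beta = 2.0 / n * sum(residuals)
    return loss, d_gamma, d_beta


def sgd_fit(x: List[float], target: List[float], lr: float = 0.1, steps: int = 500,
            gamma: float = 1.0, beta: float = 0.0) -> Tuple[float, float]:
    """Plain gradient descent on (gamma, beta), starting from the identity modulation."""
    for _ in range(steps):
        _, d_gamma, d_beta = film_mse_grads(x, target, gamma, beta)
        gamma -= lr * d_gamma
        beta -= lr * d_beta
    return gamma, beta


x = [-1.0, -0.5, 0.0, 0.5, 1.0]
target = [2.0 * xi + 1.0 for xi in x]  # generated with gamma = 2, beta = 1
gamma, beta = sgd_fit(x, target)
print(round(gamma, 4), round(beta, 4))  # 2.0 1.0
```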
Combined Loss Functions - Balancing Pixel Accuracy and Structural Realism
The model uses a composite loss function combining Mean Squared Error (MSE) for pixel-level accuracy and Structural Similarity Index (SSIM)
for perceptual and structural fidelity.
The total loss is calculated as Loss = MSE + λ(1 − SSIM), with λ = 0.1.
This ensures predictions are both numerically precise and visually realistic.
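The composite loss can be computed directly once SSIM is available. Standard SSIM averages a local statistic over sliding (typically 11×11 Gaussian) windows; the sketch below uses a single global window over the whole frame for brevity, with the usual constants C1 = (0.01·L)² and C2 = (0.03·L)² for data range L. Treating frames as flat lists and assuming a data range of 1.0 are simplifications; only λ = 0.1 comes from the description above.

```python
import statistics
from typing import List


def global_ssim(x: List[float], y: List[float], data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as a single window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = statistics.fmean(x), statistics.fmean(y)
    var_x = statistics.fmean([(a - mx) * (a - mx) for a in x])
    var_y = statistics.fmean([(b - my) * (b - my) for b in y])
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2))


def combined_loss(pred: List[float], target: List[float],
                  lam: float = 0.1, data_range: float = 1.0) -> float:
    """Loss = MSE + lam * (1 - SSIM)."""
    mse = statistics.fmean([(p - t) ** 2 for p, t in zip(pred, target)])
    return mse + lam * (1.0 - global_ssim(pred, target, data_range))


target = [0.1, 0.4, 0.8, 0.3]
perfect = combined_loss(target, target)  # SSIM = 1 and MSE = 0
shifted = combined_loss([v + 0.1 for v in target], target)
print(perfect, shifted > perfect)
```

A uniform brightness shift leaves the structure term intact but lowers the luminance term, so the SSIM part penalizes it alongside the MSE. For offline evaluation, scikit-image offers a windowed implementation (`skimage.metrics.structural_similarity`).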