Authors: Maurits Bleeker, Markus Hennerbichler, and Pavel Kuksa
For questions, please create an issue in our GitHub repo.
Welcome to the Noether Framework tutorial!
This tutorial demonstrates how to use the Noether Framework through a practical project based on the experiments from Section 4.4 of the AB-UPT paper.
While this tutorial covers the core functionality of the framework, it does not cover every possible feature or use case.
Note
This tutorial presents recommended practices and patterns, but it is not a full blueprint on how to use the framework.
The Noether Framework is flexible and supports multiple approaches to implementing the same functionality.
The tutorial project uses the following directory structure:
├── callbacks/             # Callbacks for evaluation, logging, and monitoring during training
├── configs/               # YAML files for configuring experiments using Hydra
│   ├── callbacks/
│   ├── data_specs/
│   ├── dataset_normalizers/
│   ├── dataset_statistics/
│   ├── datasets/
│   ├── experiments/
│   ├── model/
│   ├── optimizer/
│   ├── pipeline/
│   ├── tracker/
│   ├── trainer/
│   ├── train_ahmedml.yaml
│   ├── train_caeml.yaml
│   ├── train_drivaerml.yaml
│   ├── train_drivaernet.yaml  # Additional dataset (not covered in tutorial)
│   ├── train_shapenet.yaml
│   └── train_wing.yaml        # Additional dataset (not covered in tutorial)
├── jobs/                  # SLURM job scripts for running experiments on clusters
├── model/                 # Model architecture definitions
├── pipeline/              # Data processing and collation pipeline
├── schemas/               # Pydantic schemas for configuration validation
│   ├── callbacks/
│   ├── datasets/
│   ├── models/
│   ├── pipelines/
│   ├── trainers/
│   └── config_schema.py
└── trainers/              # Trainer classes that manage the training loop
Minimal required structure for any Noether project:
├── callbacks/   # Can be empty if only using default callbacks
├── configs/     # Required: defines all configurations
├── datasets/    # Required only for custom datasets
├── pipeline/    # Required: defines data processing
├── model/       # Required: defines model architectures
└── trainers/    # Required: defines training logic
The configs/ directory roughly mirrors the root folder structure; for each module or class defined in the project, there is a corresponding configuration file.
This organizational pattern makes it easy to locate and manage configurations.
Every Noether project consists of the following core modules (in alphabetical order):
- Callbacks: Classes that compute metrics and statistics at specific points during training. Can be empty when using only the framework's default callbacks.
- Configs: YAML configuration files that define all hyperparameters, paths, and settings for the training pipeline.
- Dataset: Provides the interface between raw data on disk and the multi-stage pipeline. Defines how to load individual tensors for each data sample. This tutorial uses pre-implemented datasets, but you can create custom ones.
- Model: Defines the model architecture and its forward pass.
- Pipeline: Defines the multi-stage data pipeline that loads, processes, and collates individual samples into batches for training.
- Schemas: Pydantic schemas that define and validate the configuration of each class in the project.
- Trainer: The training loop takes batches from the pipeline, runs the model's forward pass, and computes the loss.
Prerequisites: Python 3.12
Clone the repository and set up the environment:
git clone https://github.com/Emmi-AI/noether.git
cd noether/
uv venv --python 3.12
source .venv/bin/activate
uv pip install emmiai-noether
If the built package is not available, you can build it from source:
uv pip install .
The boilerplate_project/ directory contains a minimal working example that demonstrates the essential components needed for training in the Noether Framework. This stripped-down project is useful for:
- Understanding the minimum required code for each module
- Quick prototyping of new projects
- Reference when building your own Noether applications
We recommend reviewing the boilerplate project alongside this tutorial to see what minimal implementations look like.
To run the boilerplate project, use the following command:
uv run noether-train --hp ./boilerplate_project/configs/base_experiment.yaml +seed=1 +devices=\"0\" tracker=disabled
Important
Run all the training commands from the root of the repository.
Important
The Noether Framework by default runs on GPU. If no GPU is available, please add either +accelerator=cpu or +accelerator=mps to the command.
This tutorial covers each module of the Noether Framework in a logical progression.
While we have organized the content to flow naturally, you may need to reference earlier or later sections every now and then, as some components interact with multiple parts of the pipeline.
Configuration (configs/)
Note
This section assumes familiarity with Hydra configuration management and Pydantic schemas. If you're new to these tools, we recommend reviewing their official documentation before proceeding.
The configuration is the backbone of the Noether Framework, enabling reproducible, modular, and type-safe experiment definitions.
All experiments are defined through YAML configuration files that use:
- Hydra for hierarchical composition and command-line overrides
- Pydantic for runtime data validation and type safety
The Noether Framework uses a hierarchical configuration pattern where:
- Base configurations define default settings for each component (datasets, models, trainers, etc.)
- Experiment configurations compose and override base configs for specific experiments
- Command-line overrides allow quick parameter sweeps without file changes
The main entry point for any experiment is a top-level configuration file like train_shapenet.yaml, which serves as the composition root that brings together all required components.
train_shapenet.yaml demonstrates the structure of a complete experiment configuration.
Let's break down its key components:
# @package _global_
# Define key values here that are used multiple times in the config files.
dataset_root: <path to your ShapeNet dataset root>
dataset_kind: noether.data.datasets.cfd.ShapeNetCarDataset # class path to the dataset
config_schema_kind: tutorial.schemas.config_schema.TutorialConfigSchema # class path to the config schema in your downstream project
excluded_properties:
- surface_friction
- volume_pressure
- volume_vorticity
defaults:
- data_specs: shapenet_car
- dataset_normalizers: shapenet_dataset_normalizers
- dataset_statistics: shapenet_car_stats
- model: ??? # Intentionally undefined - specified per experiment
- trainer: shapenet_trainer
- datasets: shapenet_dataset
- tracker: ??? # Intentionally undefined - specified per experiment
- callbacks: training_callbacks_shapenet
- pipeline: shapenet_pipeline
- optimizer: adamw
- _self_
Each entry like dataset_normalizers: shapenet_dataset_normalizers tells Hydra to load configs/dataset_normalizers/shapenet_dataset_normalizers.yaml and merge it into the final configuration.
The ??? marker indicates required fields that must be specified in experiment configs.
The _self_ marker controls when the current file's values override inherited ones (placing it last gives the current file the highest priority).
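The merge order can be illustrated with a toy, pure-Python sketch. This is not Hydra itself (Hydra performs recursive, typed merges over config groups); `compose` below is a hypothetical helper that only shows the "later entries win" rule that makes `_self_` placement matter.

```python
def compose(defaults: list[dict]) -> dict:
    """Toy illustration of Hydra-style merge order: configs are merged in
    the order they appear in the defaults list, so later entries override
    earlier ones. (Real Hydra merges nested keys recursively.)"""
    merged: dict = {}
    for cfg in defaults:
        merged.update(cfg)
    return merged

# With _self_ last, the current file's values have the highest priority:
final = compose([
    {"precision": "float32", "max_epochs": 100},  # inherited base config
    {"precision": "float16"},                      # current file (_self_)
])
```

After composition, `final["precision"]` is `"float16"`: the current file's value overrides the inherited one, while `max_epochs` survives from the base.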
Complete configuration structure:
To run an experiment, you need configurations for:
- Model: Architecture and hyperparameters
- Trainer: Trainer config
- Callbacks: Evaluation, logging, and monitoring
- Tracker: Experiment tracking (W&B or disabled)
- Dataset(s): Dataset config
- Pipeline: Data preprocessing and collation
- Optimizer: Optimization algorithm
Most components remain constant across experiments on the same dataset.
For example, when training different models on ShapeNet-Car, only the model and tracker configurations typically change, while dataset, pipeline, trainer, and callbacks remain fixed.
Example: Dataset configuration
The base dataset configuration configs/datasets/shapenet_dataset.yaml demonstrates config composition:
train:
root: ${dataset_root}
kind: ${dataset_kind}
split: train
pipeline: ${pipeline}
dataset_normalizers: ${dataset_normalizers}
excluded_properties: ${excluded_properties}
test:
root: ${dataset_root}
kind: ${dataset_kind}
split: test
pipeline: ${pipeline}
dataset_normalizers: ${dataset_normalizers}
excluded_properties: ${excluded_properties}
Notice the ${variable_name} references? These resolve to values defined in the top-level train_shapenet.yaml. This pattern avoids duplication: dataset_root is defined once, used everywhere.
To make training work, set dataset_root in train_shapenet.yaml to the folder where the preprocessed data is stored.
To preprocess the data, have a look at preprocessing.py of the ShapeNet-Car dataset.
Config groups and directory structure:
The configs/ directory roughly mirrors the component structure:
configs/
├── train_shapenet.yaml # Top-level composition
├── datasets/ # Dataset config group
│ ├── shapenet_dataset.yaml
│ ├── ahmedml_dataset.yaml
│ └── ...
├── model/ # Model config group
│ ├── transformer.yaml
│ ├── upt.yaml
│ └── ...
├── trainer/ # Trainer config group
│ └── shapenet_trainer.yaml
└── experiments/ # Experiment-specific overrides
└── shapenet/
├── transformer.yaml
├── upt.yaml
└── ...
When you specify datasets: shapenet_dataset in the defaults list, Hydra automatically loads configs/datasets/shapenet_dataset.yaml.
Experiment-specific configurations compose base configs and apply targeted overrides. An experiment file should:
- Select a specific model variant
- Choose a tracker (W&B or disabled)
- Override any experiment-specific hyperparameters
Example: Transformer experiment
The Transformer experiment configuration configs/experiments/shapenet/transformer.yaml:
# @package _global_
defaults:
- override /model: transformer
- override /tracker: development_tracker
- override /optimizer: lion
name: shapenet-car-transformer-float16
trainer:
precision: float16
Breaking down the experiment config:
- override /model: transformer: Use configs/model/transformer.yaml instead of the placeholder ??? in the base config
- override /tracker: development_tracker: Select the W&B tracker configuration
- override /optimizer: lion: Override the default AdamW optimizer with Lion
- trainer.precision: float16: Override the trainer's default float32 precision
The override keyword ensures the experiment's choice takes precedence over any defaults, preventing accidental config merging issues.
Creating new experiments:
To run a different model on the same dataset:
- Create a new experiment file (e.g., configs/experiments/shapenet/my_model.yaml)
- Specify the model config to use
- Add any model-specific overrides
- Keep tracker and other settings as needed
Basic execution:
To train a model with a specific configuration (from the root of the repository):
uv run noether-train --hp tutorial/configs/train_shapenet.yaml +experiment/shapenet=transformer tracker=disabled trainer.max_epochs=10
uv run noether-train --hp tutorial/configs/train_shapenet.yaml +experiment/shapenet=ab_upt tracker=disabled trainer.max_epochs=10
To enable experiment tracking, simply remove the tracker=disabled override:
uv run noether-train --hp tutorial/configs/train_shapenet.yaml +experiment/shapenet=transformer
Important
Run all the training commands from the root of the repository.
Warning
Make sure to either set dataset_root in train_shapenet.yaml or add it to the command line via dataset_root="<path to your dataset>".
You'll need to configure your W&B API key on first run and update configs/tracker/development_tracker.yaml with your project details.
Single parameter overrides:
uv run noether-train --hp tutorial/configs/train_shapenet.yaml \
+experiment/shapenet=transformer \
trainer.max_epochs=100
Multiple parameter overrides:
To modify multiple related parameters (e.g., changing Transformer dimensions):
uv run noether-train --hp tutorial/configs/train_shapenet.yaml \
+experiment/shapenet=transformer \
model.hidden_dim=256 \
model.transformer_block_config.num_heads=4
Note: When changing hidden_dim, ensure num_heads divides it evenly (i.e., hidden_dim % num_heads == 0).
While Hydra handles configuration composition, Pydantic schemas provide runtime validation and type safety.
Every class in the Noether Framework has a corresponding Pydantic schema that validates its configuration: it checks types, ranges, and constraints before training begins.
Schema hierarchy:
All schemas in the Noether Framework follow an inheritance pattern. For example, model schemas inherit from ModelBaseConfig:
class ModelBaseConfig(BaseModel):
kind: str
"""Kind of model to use, i.e. class path"""
name: str
"""Name of the model. Needs to be unique"""
optimizer_config: OptimizerConfig | None = None
"""The optimizer configuration to use for training."""
initializers: list[AnyInitializer] | None = Field(None)
"""List of initializers configs to use for the model."""
is_frozen: bool | None = False
"""Whether to freeze the model parameters."""
forward_properties: list[str] | None = []
"""List of properties to be used as inputs for the forward pass."""
model_config = {"extra": "forbid"}
The extra: "forbid" setting ensures that typos in YAML files are caught immediately, preventing silent configuration errors.
All models in Noether use schema composition and validation. The schema hierarchy for the Transformer models looks like:
ModelBaseConfig (base for all models)
└── TransformerConfig (Transformer-specific config)
└── TransformerBlockConfig (component config)
TransformerBlockConfig defines individual block parameters:
class TransformerBlockConfig(BaseModel):
"""Configuration for a transformer block."""
hidden_dim: int = Field(..., ge=1)
"""Hidden dimension of the transformer block."""
num_heads: int = Field(..., ge=1)
"""Number of attention heads."""
mlp_hidden_dim: int | None = Field(None)
"""Hidden dimension of the MLP layer."""
mlp_expansion_factor: int | None = Field(None, ge=1)
"""Expansion factor for MLP hidden dimension."""
drop_path: float = Field(0.0, ge=0.0, le=1.0)
"""Stochastic depth probability."""
attention_constructor: Literal[
"dot_product",
"perceiver",
"transolver",
"transolver_plusplus",
] = "dot_product"
"""Type of attention mechanism to use."""
use_rope: bool = Field(False)
"""Whether to use Rotary Positional Embeddings."""
# ... additional fields omitted for brevity
@model_validator(mode="after")
def set_mlp_hidden_dim(self):
if self.mlp_hidden_dim is None:
if self.mlp_expansion_factor is None:
raise ValueError(
"Either 'mlp_hidden_dim' or 'mlp_expansion_factor' must be provided."
)
self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor
return self
TransformerConfig extends the block config:
class TransformerConfig(TransformerBlockConfig, ModelBaseConfig):
"""Configuration for a Transformer model."""
model_config = ConfigDict(extra="forbid")
depth: int
"""Number of transformer blocks in the model."""
mlp_expansion_factor: int = 4
"""Override default: expansion factor for MLP hidden dimension."""
@model_validator(mode="after")
def set_mlp_hidden_dim(self):
if self.mlp_hidden_dim is None:
if self.mlp_expansion_factor is None:
raise ValueError(
"Either 'mlp_hidden_dim' or 'mlp_expansion_factor' must be provided."
)
self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor
return self
Through multiple inheritance, TransformerConfig:
- Inherits model management from ModelBaseConfig (optimizer, freezing, etc.)
- Inherits block parameters from TransformerBlockConfig (attention, MLP, etc.)
- Adds Transformer model parameters (depth)
- Overrides defaults (sets mlp_expansion_factor = 4)
Understanding the schema tells you which YAML fields are required and optional. For a minimal Transformer config:
kind: tutorial.model.Transformer
name: transformer
hidden_dim: 192
depth: 12
num_heads: 3
optimizer_config: ${optimizer}
UPT and AB-UPT models support automatic configuration injection of shared parameters from the parent config to its submodules.
When you set hidden_dim, num_heads, or mlp_expansion_factor at the top level of a UPT config (or just hidden_dim for AB-UPT), these values automatically propagate to submodules unless explicitly overridden. This reduces redundancy and keeps consistency across your model architecture.
Example - UPT configuration:
kind: tutorial.model.UPT
name: upt
hidden_dim: 192
num_heads: 3
mlp_expansion_factor: 4
approximator_depth: 12
use_rope: true
supernode_pooling_config:
input_dim: 3
radius: 9
# hidden_dim is automatically 192 (inherited from parent)
approximator_config:
use_rope: true
# hidden_dim, num_heads, mlp_expansion_factor all inherited from parent
decoder_config:
depth: 12
input_dim: 3
perceiver_block_config:
use_rope: true
# hidden_dim, num_heads, mlp_expansion_factor all inherited from parent
Your downstream project must define a top-level configuration schema that specifies the complete experiment structure. For this tutorial, the schema is:
class TutorialConfigSchema(ConfigSchema):
data_specs: AeroDataSpecs
model: AnyModelConfig = Field(..., discriminator="name")
trainer: AutomotiveAerodynamicsCfdTrainerConfig
datasets: dict[str, AeroDatasetConfig]
dataset_statistics: AeroStatsSchema | None = None
- Inherits from ConfigSchema: the base configuration schema from the Noether Framework
- data_specs: Defines the data structure (field names, dimensions, types) for aerodynamics tasks
- model: Union type using the discriminator pattern - accepts any model config defined in the project
- trainer: Specifies the trainer configuration (specific to automotive aerodynamics CFD)
- datasets: Dictionary of dataset configurations
- dataset_statistics: Optional normalization statistics for the dataset
AnyModelConfig is a union of all model configs we define in this project.
Pydantic uses the name field of the configured model as a discriminator, i.e., to determine which specific model schema to validate against:
AnyModelConfig = Union[
TransformerConfig,
TransolverConfig,
UPTConfig,
ABUPTConfig,
TransolverPlusPlusConfig,
CompositeTransformerConfig,
]
Where schemas are defined:
All tutorial schemas live in the schemas/ directory:
schemas/
├── callbacks/ # Callback configuration schemas
├── datasets/ # Dataset configuration schemas
├── models/ # Model configuration schemas
├── pipelines/ # Pipeline configuration schemas
└── trainers/ # Trainer configuration schemas
Each module in your project should have a corresponding schema that defines its configuration interface.
The kind field in most configs specifies the class path for instantiation. The Factory pattern uses this to dynamically import and instantiate the correct class with the validated configuration.
All such objects are instantiated via a factory: the config must contain a kind field (the class path), and the remaining fields are passed to the class constructor from the validated config object produced by Pydantic schema evaluation.
An example is given above in the Transformer config, where kind: tutorial.model.Transformer indicates which model class to instantiate.
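The kind-based instantiation can be sketched with Python's importlib. This is a hypothetical `build_from_config` helper illustrating the idea, not the framework's actual Factory class:

```python
import importlib


def build_from_config(config: dict):
    """Sketch of a kind-based factory: import the class named by the
    'kind' class path and pass the remaining config entries to its
    constructor as keyword arguments."""
    module_path, _, class_name = config["kind"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in config.items() if k != "kind"}
    return cls(**kwargs)


# Using a stdlib class as a stand-in for e.g. tutorial.model.Transformer:
td = build_from_config({"kind": "datetime.timedelta", "days": 2})
# td == datetime.timedelta(days=2)
```

The same mechanism lets a YAML file decide which model, dataset, or trainer class to build without any if/else dispatch in code.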
The Dataset (datasets/)
The Dataset class serves as the bridge between raw (or preprocessed) data stored on disk and the multi-stage pipeline that transforms individual samples into batches for model training (which we discuss in the next section).
It defines how to load and access individual data tensors for each sample.
The Dataset class allows you to:
- Load individual data samples from disk
- Provide tensor-level data access through modular methods
- Apply per-tensor normalization and transformations
- Support flexible data loading for different model requirements
This tutorial uses the pre-implemented ShapeNetCarDataset from the Noether package.
Dataset class hierarchy:
torch.utils.data.Dataset (PyTorch base)
└── noether.data.Dataset (Noether base with getitem_* pattern)
└── noether.data.datasets.cfd.AeroDataset (CFD aerodynamics API)
└── ShapeNetCarDataset (ShapeNet-Car implementation)
The AeroDataset provides a general API for CFD aerodynamics datasets (AhmedML, DrivAerML, DrivAerNet++, ShapeNet-Car, etc.), ensuring consistent interfaces across different automotive aerodynamics datasets.
Traditional PyTorch datasets use a single __getitem__ method to load all data for a sample. This approach has several limitations:
- Becomes complex when different models need different inputs from the same dataset
- Difficult to selectively load subsets of data
- Hard to maintain when adding new data fields
- Forces loading unused data for some experiments
The Noether Framework uses a modular getitem_* pattern where each data tensor has its own dedicated loading method. This enables:
- Modularity: Each method loads one specific tensor
- Flexibility: Selectively load only required tensors via configuration
- Maintainability: Easy to add new data fields without modifying existing code
- Clarity: Self-documenting through method names (e.g., getitem_surface_pressure)
Example implementation:
def _load(self, idx: int, filename: str) -> torch.Tensor:
"""
Loads a tensor from a file within a specific sample directory.
Args:
idx: Index of the sample to load.
filename: Name of the file to load from the sample directory.
Returns:
The loaded tensor.
"""
# Use modulo to handle dataset repetitions
idx = idx % len(self.uris)
sample_uri = self.uris[idx] / filename
return torch.load(sample_uri, weights_only=True)
def getitem_surface_position(self, idx: int) -> torch.Tensor:
"""Retrieves surface position coordinates (num_surface_points, 3)."""
return self._load(idx=idx, filename="surface_points.pt")
def getitem_surface_pressure(self, idx: int) -> torch.Tensor:
"""Retrieves surface pressure values (num_surface_points, 1)."""
return self._load(idx=idx, filename="surface_pressure.pt").unsqueeze(1)
Design pattern:
- Helper methods (e.g., _load) keep code DRY and handle common operations
- Descriptive names make it clear what each method loads
- Consistent signature: all getitem_* methods take idx and return a tensor
- Tensor-level operations: shape transformations (e.g., unsqueeze) are applied immediately
The ShapeNet-Car dataset contains CFD simulation data for 889 car geometries, with each data point consisting of preprocessed PyTorch tensors stored on disk.
Note
To download and preprocess the data, see the ShapeNet-Car dataset README.
Available data tensors:
Each simulation provides the following fields through corresponding getitem_* methods:
| Tensor | Method | Shape | Description |
|---|---|---|---|
| Surface Position | getitem_surface_position |
(N_surf, 3) |
3D coordinates of surface mesh points |
| Surface Pressure | getitem_surface_pressure |
(N_surf, 1) |
Pressure values at surface points |
| Surface Normals | getitem_surface_normals |
(N_surf, 3) |
Normal vectors at surface points |
| Volume Position | getitem_volume_position |
(N_vol, 3) |
3D coordinates of volume mesh points |
| Volume Velocity | getitem_volume_velocity |
(N_vol, 3) |
Velocity vectors at volume points |
| Volume Normals | getitem_volume_normals |
(N_vol, 3) |
Normal vectors (pointing to nearest surface) |
| Volume SDF | getitem_volume_sdf |
(N_vol, 1) |
Signed Distance Function to nearest surface |
Note on surface SDF:
There is no getitem_surface_sdf method because surface SDF values are always zero (points on the surface have zero distance to the surface). This constant tensor is created automatically in the multi-stage pipeline when needed, avoiding redundant disk storage.
Datasets in Noether are instantiated by the DatasetFactory, which uses configuration files to create dataset instances with appropriate settings.
Basic dataset configuration structure:
The configs/datasets/shapenet_dataset.yaml file defines dataset configurations for different splits:
train:
root: ${dataset_root}
kind: ${dataset_kind}
split: train
pipeline: ${pipeline}
dataset_normalizers: ${dataset_normalizers}
excluded_properties: ${excluded_properties}
test:
root: ${dataset_root}
kind: ${dataset_kind}
split: test
pipeline: ${pipeline}
dataset_normalizers: ${dataset_normalizers}
excluded_properties: ${excluded_properties}
Configuration parameters:
- root: Path to the dataset directory on disk
- kind: Full class path to the dataset class (e.g., noether.data.datasets.cfd.ShapeNetCarDataset)
- split: Data split identifier (train, test, val, etc.) used by the dataset to select appropriate samples
- pipeline: Reference to the multi-stage pipeline configuration
- dataset_normalizers: Reference to tensor normalization configurations
- excluded_properties: List of getitem_* methods to skip during data loading
Advanced: Multiple dataset configurations
You can define multiple dataset configurations for different evaluation scenarios:
test_repeat:
root: ${dataset_root}
kind: ${dataset_kind}
split: test
pipeline: ${pipeline}
dataset_normalizers: ${dataset_normalizers}
excluded_properties: ${excluded_properties}
dataset_wrappers:
- kind: noether.data.base.wrappers.RepeatWrapper
repetitions: 10
Dataset wrappers:
The RepeatWrapper loops over the dataset multiple times (10× in this example) to reduce variance during evaluation. Other useful wrappers include:
- SubsetWrapper: Select specific indices from the dataset
- ShuffleWrapper: Randomize sample order
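The repeat idea can be sketched in a few lines of plain Python. `RepeatWrapperSketch` is a hypothetical stand-in for noether.data.base.wrappers.RepeatWrapper, not its actual implementation:

```python
class RepeatWrapperSketch:
    """Sketch of a repeat wrapper: report a larger length and map
    out-of-range indices back onto the base dataset with modulo."""

    def __init__(self, dataset, repetitions: int):
        self.dataset = dataset
        self.repetitions = repetitions

    def __len__(self) -> int:
        # One "epoch" now covers the base dataset `repetitions` times.
        return len(self.dataset) * self.repetitions

    def __getitem__(self, idx: int):
        # Indices beyond the base length wrap around to the start.
        return self.dataset[idx % len(self.dataset)]


samples = ["car_0", "car_1", "car_2"]
repeated = RepeatWrapperSketch(samples, repetitions=10)
```

This matches the modulo trick used inside `_load` earlier: the dataset advertises a length of 30 here, while only 3 unique samples exist on disk.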
This flexibility allows you to:
- Use different pipelines for train vs. test datasets
- Create multiple evaluation sets with different sampling strategies
- Apply different normalizations to different splits
By default, all getitem_* methods are called when loading a sample.
However, different models often require different input tensors.
The excluded_properties configuration allows selective loading:
# Example: Exclude normal vectors for a model that doesn't use them
excluded_properties:
- surface_normals
- volume_normals
A point-based Transformer might only need positions, surface pressure, and volume velocity:
# Load only essential tensors
excluded_properties:
- surface_normals
- volume_normals
- volume_sdf
Now those additional features are excluded from data loading, while a more complex model uses all available features:
# Load everything
excluded_properties: []
This pattern enables using the same dataset class for different models without modifying code.
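A toy sketch of this selective-loading idea (not the framework's implementation; `DatasetSketch` and its plain-Python "tensors" are purely illustrative): every getitem_* method is called except those listed in excluded_properties.

```python
class DatasetSketch:
    """Toy dataset: assemble a sample by calling every getitem_* method
    whose property name is not excluded."""

    def __init__(self, excluded_properties):
        self.excluded_properties = set(excluded_properties)

    def getitem_surface_position(self, idx):
        return [0.0, 0.0, 0.0]  # stand-in for a (N_surf, 3) tensor

    def getitem_surface_normals(self, idx):
        return [1.0, 0.0, 0.0]  # stand-in for a (N_surf, 3) tensor

    def __getitem__(self, idx):
        sample = {}
        for attr in dir(self):
            if attr.startswith("getitem_"):
                prop = attr.removeprefix("getitem_")
                if prop not in self.excluded_properties:
                    sample[prop] = getattr(self, attr)(idx)
        return sample


sample = DatasetSketch(excluded_properties=["surface_normals"])[0]
```

Here the returned sample contains surface_position but no surface_normals, without any change to the dataset class itself.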
Beyond the getitem_* methods, dataset classes implement standard PyTorch dataset methods:
__len__ method:
Defines the total number of samples for one epoch:
def __len__(self) -> int:
"""Returns the total size of the dataset."""
return len(self.uris) * self.num_repeats
This calculation accounts for dataset repetitions, useful for oversampling small datasets during training.
Additional methods:
Most other methods follow standard PyTorch Dataset patterns. If you're unfamiliar with PyTorch datasets, review the official PyTorch dataset tutorial.
In the Noether Framework, most normalization happens at the tensor level immediately after loading, using a decorator pattern for clean, declarative code.
The @with_normalizers decorator:
Apply normalization to any getitem_* method by adding a decorator:
@with_normalizers("surface_position")
def getitem_surface_position(self, idx: int) -> torch.Tensor:
"""Retrieves surface positions (num_surface_points, 3)"""
return self._load(idx=idx, filename=self.filemap.surface_position)
How it works:
- The decorator identifies which normalizer(s) to apply using the key ("surface_position")
- Looks up the normalizer configuration in the dataset's dataset_normalizers config
- Applies the normalization transformation to the loaded tensor
- Returns the normalized tensor
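The steps above can be sketched as a minimal decorator. This is a hedged stand-in for @with_normalizers, not the framework's code; the attribute name `dataset_normalizers` and the callable normalizers are assumptions for illustration:

```python
import functools


def with_normalizers_sketch(key: str):
    """Sketch of a @with_normalizers-style decorator: after the getitem_*
    method returns, look up the normalizers registered under `key` on the
    dataset instance and apply them in order."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, idx):
            value = fn(self, idx)
            for normalizer in self.dataset_normalizers.get(key, []):
                value = normalizer(value)
            return value
        return wrapper
    return decorator


class ToyDataset:
    def __init__(self):
        # One mean/std-style normalizer with mean=2.0, std=2.0.
        self.dataset_normalizers = {
            "surface_pressure": [lambda v: (v - 2.0) / 2.0],
        }

    @with_normalizers_sketch("surface_pressure")
    def getitem_surface_pressure(self, idx):
        return 6.0  # stand-in for the raw tensor loaded from disk
```

Calling `ToyDataset().getitem_surface_pressure(0)` returns the normalized value (6.0 - 2.0) / 2.0 = 2.0, while the method body itself stays a plain load.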
Configuring normalizers:
All normalizers are defined in noether.data.preprocessors.normalizers.
surface_pressure:
- kind: noether.data.preprocessors.normalizers.MeanStdNormalization
mean: ${dataset_statistics.surface_pressure_mean}
std: ${dataset_statistics.surface_pressure_std}
surface_position:
- kind: noether.data.preprocessors.normalizers.MeanStdNormalization
mean: ${dataset_statistics.surface_position_mean}
std: ${dataset_statistics.surface_position_std}
Here, the surface_pressure key maps to a MeanStdNormalization with a configurable mean and std.
Composing multiple normalizers:
Note that each key configures a list of normalizers, allowing you to compose a chain of normalization methods.
All normalization preprocessors must be invertible so that we can denormalize the data for evaluation.
Each normalizer is wrapped by the noether.data.preprocessor.ComposePreProcess, which can contain multiple preprocessors applied sequentially.
Each noether.data.preprocessor.PreProcessor must implement the denormalize method.
The ComposePreProcess.inverse method calls the denormalize method of all normalizers in the ComposePreProcess in reverse order, ensuring that data normalization can be inverted correctly.
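The normalize/denormalize contract can be sketched as follows. These are illustrative stand-ins (`MeanStdSketch`, `ComposeSketch`) for MeanStdNormalization and ComposePreProcess, operating on plain floats instead of tensors:

```python
class MeanStdSketch:
    """Invertible mean/std normalizer sketch."""

    def __init__(self, mean: float, std: float):
        self.mean, self.std = mean, std

    def normalize(self, x: float) -> float:
        return (x - self.mean) / self.std

    def denormalize(self, x: float) -> float:
        return x * self.std + self.mean


class ComposeSketch:
    """Apply normalizers in order; invert by denormalizing in reverse order."""

    def __init__(self, preprocessors):
        self.preprocessors = preprocessors

    def __call__(self, x):
        for p in self.preprocessors:
            x = p.normalize(x)
        return x

    def inverse(self, x):
        # Reverse order is what makes the chain correctly invertible.
        for p in reversed(self.preprocessors):
            x = p.denormalize(x)
        return x


compose = ComposeSketch([MeanStdSketch(1.0, 2.0), MeanStdSketch(0.5, 4.0)])
```

For any input, `compose.inverse(compose(x))` recovers x, which is exactly the property evaluation relies on when converting predictions back to physical units.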
To use normalizers like MeanStdNormalization, you need to compute statistics from your training data.
Step 1: Compute statistics
Run the statistics calculation tool:
noether-dataset-stats \
--dataset_kind=noether.data.datasets.cfd.ShapeNetCarDataset \
--root=/path/to/shapenet_car/ \
--split=train \
--exclude_attributes=volume_velocity,volume_pressure,volume_vorticity,surface_normals,surface_friction
Parameters explained:
- --dataset_kind: Full class path to your dataset
- --root: Path to the dataset directory
- --split: Which split to compute statistics from (typically train)
- --exclude_attributes: Properties to skip (either unavailable or not used)
Note
We exclude certain properties because they're not available in ShapeNet-Car, even though the general AeroDataset interface defines getitem_* methods for them.
The statistics need to be manually added to a YAML file in configs/dataset_statistics/:
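Such a file might look roughly like the fragment below. The values are made up for illustration; the key names are assumed to match the ${dataset_statistics.*} references used by the normalizer configs, and the real numbers come from the noether-dataset-stats output:

```yaml
# configs/dataset_statistics/shapenet_car_stats.yaml -- illustrative values only
surface_pressure_mean: -36.5
surface_pressure_std: 48.7
surface_position_mean: [0.01, 0.08, 0.0]
surface_position_std: [0.98, 0.24, 0.32]
```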
The Noether Framework includes pre-implemented datasets for CFD aerodynamics in noether.data.datasets.cfd:
| Dataset | Class Path | Data processing README |
|---|---|---|
| ShapeNet-Car | noether.data.datasets.cfd.ShapeNetCarDataset | README.md |
| AhmedML | noether.data.datasets.cfd.AhmedMLDataset | README.md |
| DrivAerML | noether.data.datasets.cfd.DrivAerMLDataset | README.md |
| DrivAerNet++ | noether.data.datasets.cfd.DrivAerNetDataset | README.md |
| Wing Dataset | noether.data.datasets.cfd.EmmiWingDataset | README.md |
All datasets share the AeroDataset interface, ensuring consistent access patterns and easy switching between datasets.
Creating custom datasets:
To implement a custom dataset:
- Inherit from noether.data.Dataset (or noether.data.datasets.cfd.AeroDataset)
- Implement the required getitem_* methods for your data fields
- Override __init__ to discover and filter your data samples
- Add @with_normalizers decorators where normalization is needed
- Create a corresponding Pydantic schema in your schemas/datasets/ directory
- Configure the normalizers
See the boilerplate project for a minimal dataset implementation example.
The Multi-Stage Pipeline (pipeline/)
The multi-stage pipeline serves as the interface between the dataset class and the model/trainer (which we discuss later). It defines how to combine individual samples from the dataset into batches that are fed to the model. Each batch contains the model inputs for the forward pass and the corresponding targets needed to compute the loss.
The multi-stage pipeline has three sequential stages:
- Sample processor pipeline: Sample processors act on individual data samples (i.e., data points).
- Collation: The collator pipeline collates individual samples into a batch.
- Batch processor pipeline: Batch processors act on the entire batch.
This sequential processing gives the multi-stage pipeline its name. In this project, most of the computation occurs during the sample processing stage.
A basic implementation of a custom MultiStagePipeline looks like this:
from noether.data.pipeline import MultiStagePipeline
class CustomMultiStagePipeline(MultiStagePipeline):
def __init__(self, **kwargs):
super().__init__(
preprocessors=[],
collators=[],
postprocessors=[],
**kwargs,
)
You need to provide three lists to the multi-stage pipeline (which are all empty in the example above): one for sample processors, one for collators, and one for batch processors.
The MultiStagePipeline iterates through each list sequentially.
The output from one processor becomes the input for the next, making the order of operations crucial for all three stages.
To understand the AeroMultistagePipeline, it's essential to understand the data processing flow for this project.
We're dealing with CFD aerodynamic simulations that have both a surface and a volume mesh/field.
Each point in these fields has three coordinates (x, y, z), one or more target values (e.g., pressure, velocity, vorticity, wall-shear stress), and potentially additional features (e.g., SDF, surface/volume normals).
The target values and features can vary depending on whether the point belongs to the surface or volume and which dataset is used.
From now on, we'll refer to these additional features as physics features.
We do not consider global features for this project.
The data structure for our tasks is defined in, for example, configs/data_specs/shapenet_car.yaml, which corresponds to the AeroDataSpecs schema.
The models we use can be roughly divided into two classes:
- Point-based models, where the input points to the model's encoder are also the points used for predicting the output values (e.g., `Transformer`, `Transolver`).
- Query-based models, which use additional query points (distinct from the input points) for predicting output values (e.g., `UPT`, `AB-UPT`).
This means we have to build a multi-stage pipeline that works for both point-based and query-based models.
We will now outline the sample processor pipeline required for these models:
- Some input tensors have constant values. For example, the `SDF` for the surface mesh is always zero (as discussed earlier). Therefore, we first create default tensors if needed. Because this step occurs before batch collation, it's considered a sample processing step.
- Next, we subsample the entire simulation mesh to a specified number of surface and volume points and, if used, query points. For both input and query points, we define how many to sample from the surface and how many from the volume. If we train AB-UPT, we sample anchor points instead of input/query points.
- If we use query points, their corresponding physical quantities become the model's prediction targets. If we only use input points, their values are the output targets (labels). Hence, we need to rename the relevant values to `targets` based on whether the model uses input points or query points for its predictions.
The high-level pipeline is visualized in the image below:
This entire pipeline is implemented in the _build_sample_processor_pipeline method in the AeroMultistagePipeline class, which composes the list of sample processor classes based on the three steps listed above.
Please have a look at the code to understand what it is doing.
This method returns a list of individual SampleProcessor instances.
Each sample processor takes a sample as input (which is a dictionary with the result of all the getitem_* methods called by the dataset for one data point) and does some form of processing on one or more tensors of the sample.
Note that the order is important, as the sample processors are called sequentially.
When the multi-stage pipeline runs, the sample processors are called as follows:
```python
# pre-process on a sample level
samples = [deepcopy(sample) for sample in samples]  # copy to avoid changing method input
for sample_processor in self.sample_processors:
    for idx, sample in enumerate(samples):
        # sample = {'surface_pressure': torch.Tensor[...], 'surface_position': torch.Tensor[...], ...}
        # each key in the sample is the output of a getitem_* method of the dataset
        samples[idx] = sample_processor(sample)
```

Each sample processor takes a sample as input and returns the (pre)processed sample.
As mentioned, the order is crucial.
Each SampleProcessor must implement the __call__(self, sample: dict[str, Any]) -> dict[str, Any] method.
This method receives a dictionary containing the sample's tensors as input.
The SampleProcessor's goal is to apply a specific processing step to the corresponding values for one or more keys in the sample dictionary.
See individual sample processor implementations (e.g., PointSampling) for detailed examples.
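To make the `__call__` interface concrete, here is a framework-free sketch of a sample processor that renames a key (as in the target-renaming step described above). The class name and keys are illustrative, and the real `SampleProcessor` base class may add extra machinery:

```python
from typing import Any

class RenameKeyProcessor:
    """Simplified stand-in for a Noether SampleProcessor: renames one key
    of the sample dict, e.g. 'surface_pressure' -> 'surface_pressure_target'."""

    def __init__(self, source_key: str, target_key: str):
        self.source_key = source_key
        self.target_key = target_key

    def __call__(self, sample: dict[str, Any]) -> dict[str, Any]:
        sample[self.target_key] = sample.pop(self.source_key)
        return sample

# processors run sequentially; order matters
processors = [RenameKeyProcessor("surface_pressure", "surface_pressure_target")]
sample = {"surface_pressure": [0.1, 0.2], "surface_position": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]}
for processor in processors:
    sample = processor(sample)
```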
The code for calling the collators in the multi-stage pipeline looks as follows:
```python
batch = {}
for batch_collator in self.collators:
    sub_batch = batch_collator(samples)
    # make sure that there is no overlap between collators
    for key, value in sub_batch.items():
        if key in batch:
            raise ValueError(f"Key '{key}' already exists in batch. Collators must not overlap in keys.")
        batch[key] = value
```

Each collator defines how to merge certain keys from each sample into a batch.
In most cases, the DefaultCollator, where tensors are simply concatenated along the batch dimension, will suffice.
However, when creating sparse tensors, for example, a more sophisticated collation approach is required.
We define the collator pipeline in the _build_collator_pipeline method.
Only when dealing with supernodes do we require additional collator classes such as the SparseTensorOffsetCollator (e.g., for AB-UPT and UPT).
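As an illustration of the collation stage, here is a simplified, framework-free stand-in for a default-style collator. The real `DefaultCollator` concatenates tensors along the batch dimension; this sketch stacks plain Python lists instead:

```python
class StackCollator:
    """Simplified stand-in for a default-style collator: gathers the given
    keys from every sample and stacks them along a new batch axis."""

    def __init__(self, keys: list[str]):
        self.keys = keys

    def __call__(self, samples: list[dict]) -> dict:
        return {key: [sample[key] for sample in samples] for key in self.keys}

samples = [
    {"surface_position": [[0.0, 0.0, 0.0]], "surface_pressure": [1.0]},
    {"surface_position": [[1.0, 1.0, 1.0]], "surface_pressure": [2.0]},
]

# mirror the collator loop shown above: each collator owns disjoint keys
batch = {}
for collator in [StackCollator(["surface_position"]), StackCollator(["surface_pressure"])]:
    sub_batch = collator(samples)
    for key, value in sub_batch.items():
        if key in batch:
            raise ValueError(f"Key '{key}' already exists in batch.")
        batch[key] = value
```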
In this project, we do not use any batch processors. Nevertheless, they work in the same way as sample processors. However, instead of processing individual samples, they process the collated batch. Below is the code showing how batch processors are called:
```python
# process the batch
for batch_processor in self.batch_processors:
    batch = batch_processor(batch)
```

The Trainer (trainers/)
The AutomotiveAerodynamicsCFDTrainer is a specialized trainer designed for automotive Computational Fluid Dynamics (CFD) tasks, specifically for the AhmedML, DrivAerML, DrivAerNet++, ShapeNet-Car, and Emmi-Wing datasets.
Its primary role is to manage the training step by processing model outputs, computing a flexible weighted loss, and returning the results.
To implement a custom Trainer for a downstream project, you must extend the noether.training.trainers.BaseTrainer class.
The BaseTrainer handles the full training loop and provides the following two key methods:
```python
def loss_compute(
    self, forward_output: dict[str, torch.Tensor], targets: dict[str, torch.Tensor]
) -> LossResult | tuple[LossResult, dict[str, torch.Tensor]]:
    """
    Each trainer that extends this class needs to implement a custom loss computation using the targets and the model output.

    Args:
        forward_output: Output of the model after the forward pass.
        targets: Dict with target tensors needed to compute the loss for this trainer.

    Returns:
        A dict with the (weighted) sub-losses to log.
    """
    raise NotImplementedError("Subclasses must implement loss_compute.")

def train_step(self, batch: dict[str, Tensor], model: torch.nn.Module) -> TrainerResult:
    """Overriding this function is optional. By default, the `train_step` of the model will be called and is
    expected to return a TrainerResult. Trainers can override this method to implement custom training logic.

    Args:
        batch: Batch of data from which the loss is calculated.
        model: Model to use for processing the data.

    Returns:
        TrainerResult dataclass with the loss for backpropagation, (optionally) individual losses if multiple
        losses are used, and (optionally) additional information about the model forward pass that is passed
        to the callbacks (e.g., the logits and targets to calculate a training accuracy in a callback).
    """
    forward_batch, targets_batch = self._split_batch(batch)
    forward_output = model(**forward_batch)
    additional_outputs = None
    losses = self.loss_compute(forward_output=forward_output, targets=targets_batch)
    if isinstance(losses, tuple) and len(losses) == 2:
        losses, additional_outputs = losses
    if isinstance(losses, torch.Tensor):
        return TrainerResult(total_loss=losses, additional_outputs=additional_outputs, losses_to_log={'loss': losses})
    elif isinstance(losses, list):
        losses = {f"loss_{i}": loss for i, loss in enumerate(losses)}
    if len(losses) == 0:
        raise ValueError("No losses computed, check your output keys and loss function.")
    return TrainerResult(
        total_loss=sum(losses.values(), start=torch.zeros_like(next(iter(losses.values())))),
        losses_to_log=losses,
        additional_outputs=additional_outputs,
    )
```

Understanding the two key methods:
As an end-user, you need to implement loss_compute and, in some cases, override train_step.
The train_step method receives the batch from the multi-stage pipeline and the model being trained (which can be a DistributedDataParallel model when training on multiple GPUs).
In the base implementation, the batch is split into two sub-batches:
- Forward batch: Contains all tensors needed for the forward pass. The model receives the `forward_batch` as named keyword arguments, and the forward pass is computed.
- Targets batch: Contains tensors needed for loss computation. The `loss_compute` method computes the custom loss for your task.
For task-specific implementations, see the AutomotiveAerodynamicsCFDTrainer example.
Important
A warning is issued if the batch contains keys that end up in neither the forward batch nor the target batch. This means the collator returns tensors that are not used during the forward pass or loss computation.
Return value requirements:
The train_step method must always return the TrainerResult dataclass, which should contain:
- A scalar value of the total loss used to compute gradients (can be a weighted sum of multiple losses)
- A dictionary with the losses you want to log
- Optionally, a dictionary with additional output for logging
When to override train_step:
The train_step method defined in the BaseTrainer class fits most general deep learning forward passes.
However, you can decide whether this implementation is sufficient for your downstream training task.
If not, you can always implement a custom train_step method in the child trainer class (as has been done in the boilerplate project trainer).
When using the default train_step method, you must define both the forward_properties and the target_properties to define which tensors are part of the forward_batch and which tensors are part of the target_batch.
In this tutorial, the target properties are fixed per dataset, while the forward_properties depend on the model.
Therefore, we define them as follows:
```yaml
target_properties:
  - surface_pressure_target
  - volume_velocity_target
forward_properties: ${model.forward_properties}
```

Required BaseTrainer parameters:
The following parameters must be defined for the BaseTrainer:
```yaml
kind: tutorial.trainers.AutomotiveAerodynamicsCFDTrainer # which trainer to load
max_epochs: 500
effective_batch_size: 1
log_every_n_epochs: 1 # optional but best practice to define
callbacks: ${callbacks} # which callbacks to run
```

The most important variables in the __init__ method are the loss weights, which give you fine-grained control over the training objective.
Loss weight hierarchy:
The loss has two levels of weights:
- Individual weights: Parameters like `surface_pressure_weight` and `volume_velocity_weight` control the importance of a specific physical quantity in the total loss.
- Group weights: The `surface_weight` and `volume_weight` parameters apply an additional weight to all surface-related or volume-related losses, respectively.
During initialization, the trainer uses these weights to build an internal loss_items list.
The output_modes parameter (e.g., ['surface_pressure', 'volume_velocity']) specifies which of these potential losses should be computed during training.
Custom loss calculation (loss_compute):
This method contains the core logic of the trainer for computing the loss.
It calculates the final loss by iterating through the loss_items configured during initialization.
For each item (like surface_pressure), it first checks that its weight is non-zero and that the model produced a corresponding output key.
This flexible system allows you to easily experiment with different combinations of output objectives without changing the underlying code.
When using only a single loss value, a separate loss_compute method is not strictly needed; the loss can be computed directly in a custom train_step (by overriding the base train_step method, as done in the boilerplate project).
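The weighted-loss logic described above can be sketched in plain Python (lists of floats stand in for tensors, and `loss_items` here is a simple name-to-weight dict, not the trainer's internal representation):

```python
def weighted_loss(forward_output, targets, loss_items):
    """Sketch of the weighted-loss pattern: each loss item carries an output
    key and a weight; items with zero weight or without a corresponding
    model output are skipped."""
    losses = {}
    for key, weight in loss_items.items():
        if weight == 0.0 or key not in forward_output:
            continue
        pred, tgt = forward_output[key], targets[f"{key}_target"]
        mse = sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(pred)
        losses[key] = weight * mse
    if not losses:
        raise ValueError("No losses computed, check your output keys.")
    return sum(losses.values()), losses

# volume_velocity is configured but not produced by the model -> skipped
total, per_item = weighted_loss(
    forward_output={"surface_pressure": [1.0, 2.0]},
    targets={"surface_pressure_target": [1.0, 1.0]},
    loss_items={"surface_pressure": 1.0, "volume_velocity": 0.5},
)
# total == 0.5 (MSE of errors [0, 1] is 0.5, weight 1.0)
```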
Building models in the Noether Framework is straightforward and follows the same patterns as standard PyTorch models that inherit from torch.nn.Module.
To be compatible with the Noether Trainer, all models must inherit from noether.core.models.Model (or CompositeModel for multi-component architectures, discussed later).
Beyond this, a model is implemented just like any PyTorch model: define layers in the constructor (__init__) and implement the forward method.
The config schema for models is defined by ModelBaseConfig:
```python
class ModelBaseConfig(BaseModel):
    kind: str
    """Kind of model to use, i.e. class path"""

    name: str
    """Name of the model. Needs to be unique"""

    optimizer_config: OptimizerConfig | None = None
    """The optimizer configuration to use for training the model. When a model is used for inference only, this can be left as None."""

    initializers: list[AnyInitializer] | None = Field(None)
    """List of initializer configs to use for the model."""

    is_frozen: bool | None = False
    """Whether to freeze the model parameters (i.e., not trainable)."""

    forward_properties: list[str] | None = []
    """List of properties to be used as inputs for the forward pass of the model. Only relevant when the train_step of the BaseTrainer is used. When overridden in a class method, this property is ignored."""
```

Key configuration parameters:
- `kind`: The full class path to the model class (e.g., `tutorial.model.Transformer`).
- `name`: A unique identifier for the model, typically overridden in child config classes to match the correct model configuration.
- `optimizer_config`: The optimizer configuration for training. Can be `None` when loading a model for inference only.
- `initializers`: Optional list of initializer configs for loading pre-trained weights or custom weight initialization.
- `is_frozen`: Boolean flag to freeze all model parameters (useful for transfer learning or ensemble models).
- `forward_properties`: List of properties to be used as inputs for the model's forward pass. Only relevant when using the `BaseTrainer`'s default `train_step` method.
Note
In the Noether Framework, optimizers are attached to models rather than being global. This design allows different components of composite models to use different optimizers and learning rates.
A minimal custom model implementation (dummy code) looks as follows:
```python
from noether.core.models import Model

class CustomModel(Model):
    def __init__(self, model_config: CustomModelConfig, **kwargs):
        # the model config needs to be passed to the parent Model class
        super().__init__(model_config=model_config, **kwargs)
        self.config = model_config
        # Define your model layers here
        self.encoder = torch.nn.Linear(model_config.input_dim, model_config.hidden_dim)
        self.decoder = torch.nn.Linear(model_config.hidden_dim, model_config.output_dim)

    def forward(self, input_tensor: torch.Tensor) -> dict[str, torch.Tensor]:
        """
        Forward pass of the model.

        Args:
            input_tensor: torch tensor with data

        Returns:
            Dictionary containing model outputs.
        """
        # Example: extract inputs from batch
        x = input_tensor
        # Forward pass
        hidden = self.encoder(x)
        output = self.decoder(hidden)
        return {'output': output}
```

The corresponding YAML configuration:

```yaml
kind: path.to.CustomModel
name: custom_model
input_dim: 3
hidden_dim: 128
output_dim: 1
optimizer_config: ${optimizer} # Reference to optimizer defined elsewhere
forward_properties:
  - input_tensor
```

To unify input representation, output structure, and input conditioning across all baseline models in this tutorial, we provide a BaseModel class.
This BaseModel inherits from noether.core.models.Model and contains common utilities that can be reused across different model architectures:
- Surface and volume bias projection: An MLP projection layer to handle domain-specific biases.
- Physics feature projection: A linear layer to map physics features (e.g., SDF, normals) to the model's hidden dimension.
- Positional embeddings: Sine-cosine or linear positional embedding layers for input coordinates.
- Output projection: A final linear layer to project from the hidden dimension to the number of predicted physical quantities.
Benefits of using BaseModel in a downstream project:
- Reduces code duplication across different model implementations
- Ensures consistent input/output interfaces
- Simplifies distinguishing between surface and volume mesh coordinates
- Provides standardized feature processing
Physical quantities predicted for surface points often differ from those for volume points. For example:
- Surface predictions: pressure, wall shear stress
- Volume predictions: velocity, pressure, vorticity
The gather_outputs method in the BaseModel class handles this heterogeneity:
- Takes the entire output tensor and a surface mask
- Splits the output tensor to isolate surface predictions from volume predictions
- Returns a structured dictionary that maps to physical quantities
Example output structure:
```python
{
    'surface_pressure': tensor[...],  # dimension 0 of surface outputs
    'volume_velocity': tensor[...],   # dimensions 1:4 of volume outputs
    'surface_friction': tensor[...],  # dimensions 4:7 of surface outputs
    ...
}
```

By using gather_outputs consistently across all models, the output dictionary is structured in a way that the trainer's loss_compute method can process uniformly. This design allows the same trainer to work with all model architectures without modification.
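The splitting logic behind gather_outputs can be sketched in plain Python. Lists stand in for tensors, and the spec dictionaries mapping names to column ranges are hypothetical; the real method operates on the model's output tensor and the configured data specs:

```python
def gather_outputs(output_rows, surface_mask, surface_spec, volume_spec):
    """Sketch: split output rows into surface and volume rows via the mask,
    then slice each group's columns into named physical quantities."""
    surface_rows = [row for row, is_surf in zip(output_rows, surface_mask) if is_surf]
    volume_rows = [row for row, is_surf in zip(output_rows, surface_mask) if not is_surf]
    result = {}
    for name, (start, stop) in surface_spec.items():
        result[name] = [row[start:stop] for row in surface_rows]
    for name, (start, stop) in volume_spec.items():
        result[name] = [row[start:stop] for row in volume_rows]
    return result

# one surface point and one volume point, 4 output dimensions each
outputs = gather_outputs(
    output_rows=[[0.1, 1.0, 2.0, 3.0], [0.2, 4.0, 5.0, 6.0]],
    surface_mask=[True, False],
    surface_spec={"surface_pressure": (0, 1)},
    volume_spec={"volume_velocity": (1, 4)},
)
```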
A composite model consists of multiple noether.core.models.Model sub-modules, each potentially with its own:
- Optimizer and learning rate
- Learning rate schedule
- Weight initialization strategy
- Frozen/trainable status
Example: CompositeTransformer demonstrates a Transformer model with two sub-modules, each with independent configurations.
Configuration files:
- Schema: Composite Transformer config defines the overall structure
- Sub-module schemas: Sub-modules each have their own config schemas
- YAML example: composite_transformer.yaml shows how to configure different optimizers and learning rates per sub-module
Example configuration snippet (note that these classes do not exist):

```yaml
kind: tutorial.model.composite_transformer.CompositeTransformer
name: composite_transformer
encoder:
  kind: tutorial.model.transformer.TransformerEncoder
  hidden_dim: 192
  optimizer_config:
    kind: torch.optim.AdamW
    lr: 1e-3
decoder:
  kind: tutorial.model.transformer.TransformerDecoder
  hidden_dim: 192
  optimizer_config:
    kind: torch.optim.Lion
    lr: 5e-4
```

The Noether Framework includes base implementations for several state-of-the-art models in noether.modeling.models:
| Model | Paper | Tutorial Implementation | Notes |
|---|---|---|---|
| AB-UPT | arXiv:2502.09692 | tutorial/model/ab_upt.py | Wrapper around base implementation |
| Transformer | - | tutorial/model/transformer.py | Wrapper around base implementation and adding RoPE |
| Transolver | arXiv:2402.02366 | tutorial/model/transolver.py | Wrapper around base implementation |
| Transolver++ | arXiv:2502.02414 | Schema only: transolver.py | Extension of Transolver with different attention class |
| UPT | arXiv:2402.12365 | tutorial/model/upt.py | Extended forward method for tutorial compatibility |
Implementation approaches:
- Simple wrappers: AB-UPT, Transformer, and Transolver use the base implementations directly
- Custom extensions: UPT uses individual sub-modules from the base implementation with a modified `forward` method to adapt to tutorial-specific requirements
Callbacks (callbacks/)
A callback is an object that can perform actions at various stages of the training loop, such as at the beginning or end of training, an epoch, or an update step.
Callbacks are the most complex objects in the Noether Framework.
For a full understanding of callback implementation and utilities, refer to the documentation and [how-to](https://noether-docs.emmi.ai/html/guides/training/use_callbacks.html).
The SurfaceVolumeEvaluationMetricsCallback is a specific callback that runs the current model on a separate validation or test set, computes error metrics, and logs them.
This class inherits from PeriodicDataIteratorCallback, meaning its main logic is executed at regular intervals and iterates over a dataset.
In this tutorial, we focus only on PeriodicDataIteratorCallback.
However, you can also implement a PeriodicCallback, which does not iterate over a dataset but can be used, for example, to store an exponential moving average (EMA) of the model weights.
Callback access to training components:
Callbacks have access to the following (among others):
- The Trainer (`self.trainer`): Provides access to trainer properties
- The Model (`self.model`): The currently trained model
- The Data Container (`self.data_container`): Object containing all datasets, allowing normalizers to be accessed for denormalization
Callbacks that inherit from PeriodicDataIteratorCallback must implement two methods:
- `process_data(self, batch: dict[str, torch.Tensor], **_) -> dict[str, torch.Tensor]`: Receives a batch from the dataset as input and computes metrics (or tensors) that are returned.
- `process_results(self, results: dict[str, torch.Tensor], **_) -> None`: All computed metrics/tensors from the `process_data` method are aggregated into a dictionary and processed by this method.
For example, the process_results method can use self.writer to log metrics to Weights & Biases.
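The two-method contract can be sketched with a framework-free toy callback. `DummyWriter` stands in for the callback's `self.writer`, and the metric computed is purely illustrative:

```python
class DummyWriter:
    """Stand-in for the callback's self.writer (e.g. a WandB logger)."""
    def __init__(self):
        self.logged = {}
    def log(self, metrics):
        self.logged.update(metrics)

class MetricsCallbackSketch:
    """Sketch of the PeriodicDataIteratorCallback contract: process_data
    computes per-batch metrics, process_results receives the aggregated
    dictionary and logs it."""
    def __init__(self, writer):
        self.writer = writer

    def process_data(self, batch, **_):
        pred, tgt = batch["prediction"], batch["target"]
        mae = sum(abs(p - t) for p, t in zip(pred, tgt)) / len(pred)
        return {"mae/test": mae}

    def process_results(self, results, **_):
        self.writer.log(results)

writer = DummyWriter()
cb = MetricsCallbackSketch(writer)
results = cb.process_data({"prediction": [1.0, 2.0], "target": [1.0, 1.0]})
cb.process_results(results)
```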
In tutorial/configs/trainer/shapenet_trainer.yaml, we define the list of callbacks to use for the trainer class (for ShapeNet-Car).
Below are three callback configurations:
```yaml
- kind: noether.core.callbacks.BestCheckpointCallback
  every_n_epochs: 1
  metric_key: loss/test/total
  name: BestCheckpointCallback
# test loss
- kind: tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback
  batch_size: 1
  every_n_epochs: 1
  dataset_key: test
  name: SurfaceVolumeEvaluationMetricsCallback
  forward_properties: ${model.forward_properties}
- kind: tutorial.callbacks.SurfaceVolumeEvaluationMetricsCallback
  batch_size: 1
  every_n_epochs: ${trainer.max_epochs}
  dataset_key: test_repeat
  name: SurfaceVolumeEvaluationMetricsCallback
  forward_properties: ${model.forward_properties}
```

Periodic callback triggers:

To define how often a periodic callback should be triggered, set one of the following arguments in your configuration:
- `every_n_epochs`: Triggers the callback every N epochs
- `every_n_updates`: Triggers the callback every N model update steps
- `every_n_samples`: Triggers the callback after every N samples have been processed
Only one of these arguments may be set. In addition to the interval, you can also define the batch_size, which is usually set to 1 to compute metrics per sample.
Required callback parameters:
For all periodic callbacks, you must define:
- `dataset_key`: Indicates which dataset (configured earlier) should be used to run the callback
- `name`: Must match a name in the callback schemas so that the correct schema can be used for data validation
In tutorial.schemas.trainers.AutomotiveAerodynamicsCfdTrainerConfig, we define the following for callback validation:
```python
from noether.core.schemas.callbacks import CallbacksConfig
from tutorial.schemas.callbacks import TutorialCallbacksConfig

# custom callbacks need to be added here to one union type with the base Noether CallbacksConfig
AllCallbacks = Union[TutorialCallbacksConfig, CallbacksConfig]

class AutomotiveAerodynamicsCfdTrainerConfig(BaseTrainerConfig):
    ...
    callbacks: list[AllCallbacks] | None = Field(
        ...,
    )
```

You need to define a Union of your custom-implemented callbacks and the default callbacks implemented in the Noether Framework to ensure all callbacks have proper schema validation.
The process_data method of the SurfaceVolumeEvaluationMetricsCallback looks like this:
```python
def process_data(self, batch: dict[str, torch.Tensor], **_) -> dict[str, torch.Tensor]:
    """
    Execute forward pass and compute metrics.

    Args:
        batch: Input batch dictionary
        **_: Additional unused arguments

    Returns:
        Dictionary mapping metric names to computed values
    """
    model_outputs = self._run_model_inference(batch)
    metrics = {}
    for mode in self.evaluation_modes:
        metrics.update(self._compute_mode_metrics(batch, model_outputs, mode))
    return metrics
```

First, it computes the model outputs; next, it adds the desired metrics to an output dictionary. All the substeps are implemented as individual methods in the callback itself. Please have a look at the implementation.
Metrics are usually computed on unnormalized data.
To reverse the normalization steps executed by the dataset, we retrieve the data normalizers via the DataContainer.
In the __init__ method of the callback we implement, we use the available self.data_container to get the correct dataset used for this callback and retrieve the normalizers to denormalize the data for metric computation:
```python
self.dataset_key = callback_config.dataset_key
self.dataset_normalizers = self.data_container.get_dataset(self.dataset_key).normalizers
```

To denormalize surface_pressure, for example, you can use:
```python
normalizer = self.dataset_normalizers['surface_pressure']
denormalized_predictions = normalizer.inverse(predictions.cpu())
denormalized_targets = normalizer.inverse(targets.cpu())
```

For each output in the SurfaceVolumeEvaluationMetricsCallback, we calculate the following metrics:
- Mean Squared Error (MSE): The average of the squared differences between the prediction and the target
- Mean Absolute Error (MAE): The average of the absolute differences between the prediction and the target
- Relative L2 Error: The Euclidean norm of the error vector divided by the norm of the target vector, measuring the error relative to the magnitude of the ground truth
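The relative L2 error, the least standard of the three metrics, can be written out in a few lines. This is a plain-Python stand-in operating on lists; the callback itself computes it on tensors:

```python
import math

def relative_l2_error(prediction, target):
    """Relative L2 error: ||prediction - target||_2 / ||target||_2."""
    err = math.sqrt(sum((p - t) ** 2 for p, t in zip(prediction, target)))
    norm = math.sqrt(sum(t ** 2 for t in target))
    return err / norm

# prediction off by 1 in the first component of a unit-norm target
rel = relative_l2_error([1.0, 1.0], [0.0, 1.0])
# -> 1.0: the error magnitude equals the target magnitude
```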
At the end of training, we want to run the model one more time on the test set, looping 10 times over that set to reduce variance due to the point sampling.
Earlier, we configured the test_repeat dataset in shapenet_dataset.yaml, which uses the RepeatWrapper to loop over the dataset with 10 repetitions.
We can now use test_repeat with this custom dataset implementation for the final callback.
Moreover, we set every_n_epochs: ${trainer.max_epochs} to ensure that this callback is only executed at the very end.
Each metric is logged with the corresponding dataset_key to Weights & Biases.
For the CAEML dataset, we also implemented chunked inference, where we loop over the entire surface and volume mesh in chunks to do inference on the full mesh.
To enable this, we set chunked_inference: true, and we configured a dataset chunked_test which has a multi-stage pipeline that returns all points in the surface/volume mesh.
To run all the models for ShapeNet-Car, simply execute:
```shell
sbatch train_shapenet.job
```

The same applies to train_ahmedml.job and train_drivaerml.job, which can be found in the jobs/ directory.
We also provide config files to run the experiments for DrivAerNet++ (train_drivaernet.yaml) and Emmi-Wing (train_wing.yaml); however, those experiments are not part of this tutorial.
Warning
This assumes you have access to a SLURM-based system. If not, please review the job files to see the commands used to run the experiments.
Job arrays:
In the jobs/experiments/ folder, we define job arrays (i.e., arrays with different experiments/jobs) for all the experiments we want to run.
You can add extra rows with different seeds or experiment variants to these *.txt files as needed.
The flag #SBATCH --array=... defines how to run the job array:
- `#SBATCH --array=1-10`: Runs rows 1 to 10 from `./jobs/experiments/shapenet_experiments.txt`
- `#SBATCH --array=1,5,9`: Runs rows 1, 5, and 9
- `#SBATCH --array=1-10%5`: Runs rows 1 to 10, but with a maximum of 5 jobs running simultaneously. When one of the 5 jobs finishes, the next job in the array (e.g., row 6) will start. This is especially useful for large job arrays when you don't want to occupy the entire cluster.
To run a single experiment, execute the following command:
```shell
uv run noether-train --hp {user path to tutorial}/configs/train_shapenet.yaml +experiment/shapenet=transformer tracker=disabled +seed=1
```

Important

Please set the dataset_root either in the config files or via a command-line override.
When running outside of SLURM, use uv run noether-train as shown above. This will spawn one process for every GPU that is available on the system and visible via CUDA_VISIBLE_DEVICES.
You can also configure the devices by adding devices="0,1,2,4", for example, to the root config.
Important
If you train on more than 1 GPU, ensure that effective_batch_size is at least equal to the number of GPUs used.
Multi-node training is currently not supported.
Example of a multi-GPU SLURM job:
```shell
srun --nodes=1 --partition=compute --gpus-per-node=2 --mem=64GB --ntasks-per-node=2 --kill-on-bad-exit=1 --cpus-per-task=28 uv run noether-train --hp tutorial/configs/train_shapenet.yaml +experiment/shapenet=transformer tracker=disabled trainer.effective_batch_size=2
```

To run evaluation callbacks on trained models, use the noether-eval CLI tool.
For detailed instructions on running inference with trained models, refer to the documentation: https://noether-docs.emmi.ai/guides/inference/how_to_run_evaluation_on_trained_models.html
To resume training after an error or interruption, simply add resume_run_id: <RUN_ID> (and resume_stage_name if a stage_name was used in the previous run) to the training configuration (either in the YAML file or via the CLI). Training will continue from the last saved epoch checkpoint.
Example:
```shell
uv run noether-train --hp tutorial/configs/train_shapenet.yaml +experiment/shapenet=transformer resume_run_id=<run_id> resume_stage_name=<stage_name>
```

Optionally, you can change the stage_name to make it clear that checkpoints stored for this run are from a continued training run.
To initialize a model with weights from a previous training run, add an initializer configuration to the model config:
```yaml
model:
  # ... model configuration
  initializers:
    - kind: noether.core.initializers.PreviousRunInitializer
      run_id: <run_id>
      model_name: ab_upt
      checkpoint_tag: latest # Options: 'latest', 'best', or a specific checkpoint such as E10_U100_S200
      # model_info: ema=0.9999 # Optional: for EMA weights or specific checkpoint variants
```

Required parameters:
- `run_id`: The run identifier from the previous training run
- `model_name`: The name of the model to load weights from
- `checkpoint_tag`: Which checkpoint to use (`latest`, `best`, or a specific epoch number)
Optional parameters:
- `model_info`: Additional checkpoint metadata (e.g., `ema=0.9999` for exponential moving average weights, or specific loss metric identifiers for best checkpoints). Leave empty for standard checkpoints.
We implemented a Weights & Biases (WandB) tracker to log metrics during training and evaluation (also have a look at ./configs/tracker):
```yaml
kind: noether.core.trackers.WandBTracker
entity: <WandB entity>
project: <WandB project>
```

Simply add your own WandB entity and project to start logging.
- Output path: The output path is undefined by default and must be configured. In this tutorial, we set it to `./outputs`. The Noether Framework will use the generated `run_id` to store the checkpoints for each training run in subfolders.
- Physics features: You can set `physics_features` to `true` for the `AeroMultistagePipeline`. This only works for ShapeNet-Car and will add the SDF and normal vectors to the coordinate inputs. However, we never properly utilized these features in our experiments, and they are not implemented for other datasets. Therefore, this code is not fully polished or optimized.
- Code snapshots: By default, a snapshot of the codebase is stored as part of the checkpoints for reproducibility.
- Batch size considerations: Almost all experiments we ran for the AB-UPT paper use a batch size of 1. However, the data loading pipeline is implemented to work with batches larger than 1 (including with physics features). Note that we never thoroughly validated these results or checked for potential training/data loading instabilities with larger batch sizes.
- Effective batch size and gradient accumulation: The `effective_batch_size` parameter defines the total number of samples processed before performing an optimizer step (also known as the "global batch size"). In multi-GPU setups, the local batch size per device is calculated as `effective_batch_size / number of GPUs`. When gradient accumulation is enabled, the batch size is further divided by the number of accumulation steps. To enable gradient accumulation, set the `max_batch_size` parameter. For example, with `max_batch_size=2` and `effective_batch_size=8`, the framework will perform 4 gradient accumulation steps before updating the model weights.
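The batch-size arithmetic described above can be sketched as follows. This assumes the sizes divide evenly; the framework's exact bookkeeping may differ, and the helper function is purely illustrative:

```python
def accumulation_steps(effective_batch_size, num_gpus, max_batch_size=None):
    """Sketch of the batch-size rules above (assumes even division):
    the per-device batch is effective_batch_size / num_gpus, and
    max_batch_size caps it via gradient accumulation."""
    per_device = effective_batch_size // num_gpus
    if max_batch_size is None or per_device <= max_batch_size:
        return 1
    return per_device // max_batch_size

# example from the text: effective_batch_size=8, max_batch_size=2, one GPU
steps = accumulation_steps(effective_batch_size=8, num_gpus=1, max_batch_size=2)
# -> 4 accumulation steps before each optimizer update
```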
