feat: NoAuth for AML mlflow Logging by timothyas · Pull Request #200 · ecmwf/anemoi-utils

timothyas · 2025-08-07T17:49:00Z

Description

This is a support PR for ecmwf/anemoi-core#469. My basic understanding is that the authentication processes are applicable to AML mlflow logging, since lots of credentials are required to setup the workspace in the first place. So, this creates a PassiveAuth authenticator to allow the AnemoiAzureMlflowLogger to be a child class of the AnemoiMlflowLogger parent class, just silently ignoring all the TokenAuth steps.

What problem does this change solve?

New feature: supports AML mlflow logging.

What issue or task does this change relate to?

None open but happy to open one if necessary.

Additional notes

See ecmwf/anemoi-core#469 discussion

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

floriankrb · 2025-08-11T13:44:04Z

This seems legitimate to allow a different type of authentication.
I would not call your new class "PassiveAuth" though but "DummyAuth" or "NoAuth". (I also like Authentication better than Auth)

Also, could you please refactor the code to avoid inheriting from concrete class, maybe creating an additional generic class? Or maybe calling your class "GenericAuth" and making "TokenAuth" inherit from it?

timothyas · 2025-08-12T03:59:41Z

Thanks and that all sounds good to me @floriankrb. I'm currently taking some leave this week, but I'll make those changes next week when I'm back. It shouldn't take much but I just want to test that something doesn't go wrong.

gmertes

Thanks for this, agreed with Florian's suggestions. I propose something like:

AuthBase : ABC with init, save, login and authenticate.
NoAuth(AuthBase) : Your class that does nothing.
TokenAuth(AuthBase)

I originally went with Auth instead of Authentication because that's consistent with naming of similar components in other web packages like requests. But if we like Authentication more feel free to change.

I'd also add a test - it's a bit silly for a class that does nothing but just in case.

for more information, see https://pre-commit.ci

timothyas · 2025-08-18T17:27:39Z

Hi @floriankrb & @gmertes, thanks for the comments on this one. I updated the NoAuth class as suggested:

PassiveAuth -> NoAuth
NoAuth and TokenAuth inherit from AuthBase
Added 2 unit tests for NoAuth
Rebased to updated ecmwf/anemoi-utils main branch

This works for my purposes, please let me know if you'd like anything else to go in the PR. Thanks!

gmertes

Thanks!

🤖 Automated Release PR This PR was created by `release-please` to prepare the next release. Once merged: 1. A new version tag will be created 2. A GitHub release will be published 3. The changelog will be updated Changes to be included in the next release: --- ## [0.4.36](0.4.35...0.4.36) (2025-09-22) ### Features * Add aliases to registry ([#219](#219)) ([37267b5](37267b5)) * Debug imports ([#182](#182)) ([1eaa615](1eaa615)) * NoAuth for AML mlflow Logging ([#200](#200)) ([732182e](732182e)) * Rich logging ([#209](#209)) ([3c762a5](3c762a5)) * Speedup checkpoint editing - remove compression ([#218](#218)) ([b49120f](b49120f)) * Use obstore to access s3 buckets ([#210](#210)) ([da380be](da380be)) ### Bug Fixes * Add missing s3 function used by datasets ([#212](#212)) ([30589e8](30589e8)) --- > [!IMPORTANT] > Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process. > ⚠️ Merging this PR will: > - Create a new release > - Trigger deployment pipelines > - Update package versions **Before merging:** - Ensure all tests pass - Review the changelog carefully - Get required approvals [Release-please documentation](https://github.com/googleapis/release-please)

@mchantry

## Description I have been logging metrics to Azure ML, which requires Azure's special blend of mlflow. This PR adds those features, which I think might be useful to a broader audience of people who either run on Azure, Azure ML, or simply log and view metrics on AML. I'm currently running on an on-prem HPC machine but logging to AML with this code, however I think it will be necessary for those actually running on AML as well. cc @mchantry ## What problem does this change solve? Using mlflow to log to Azure ML requires some azure-specific code and packages. ## What issue or task does this change relate to? No open issue, please let me know if you'd like me to open one up to describe the situation, I'd be happy to do so. ## Additional notes ## I got this working *just enough* to my liking, but there are still some issues unfortunately. * tests: I'm not sure how to make tests with this, since it would require logging to AML... If anyone has any ideas I'm all ears * TokenAuth: is not necessary with AML, see necessary helper PR at ecmwf/anemoi-utils#200 * terminal logs: with regular mlflow, terminal logs get updated by simply overwriting the terminal_log.txt. Overwriting files is not allowed with azure-mlflow. Moreover this appears to happen on each process (?) that's running, so many times over and over. I put in some try/except logic to try writing these files, but just pass whenever a failure happens. This means the terminal_log.txt stops updating pretty early on in the training process. I'm not really sure what to recommend here. * diagnostic plots: these also seem to get written by each process (worker?). However, since each unique plot has a unique name, putting the try/except logic here doesn't really matter - each process tries to log the plot and if it succeeds, that means it was the first one and the plot gets stored. If the process fails, then it just moves on passively since a different one beat it to the job. no problem but maybe not the cleanest solution. * model checkpoints: I simply could not get model checkpoints to log to AML, and I don't know what is going on there. For my purposes, I don't really care though, since they get stored locally on the machine where I will use the checkpoints anyway. Also, @mariahpope at EPIC/NOAA has been kicking off some anemoi training runs on AML, and noticed the checkpoints are stored. So, this may not be a problem. * hyperparameter logging: in the second half of the method `log_hyperparams_in_mlflow`, the logger tries to log "expanded_params" after truncating them. I have no idea why but I could not get this to work for AML, despite lots of attempts with ChatGPT and Microsoft Copilot. This is the reason for overriding this method in the Azure subclass. * I think there are a lot of different ways to establish a connection to an AML workspace, I did it here with the following code block: ```python sp_auth = ServicePrincipalAuthentication( tenant_id=os.environ["AZURE_TENANT_ID"], service_principal_id=os.environ["AZURE_CLIENT_ID"], service_principal_password=os.environ["AZURE_CLIENT_SECRET"], ) ws = Workspace( subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"], resource_group=aml_resource_group, workspace_name=aml_workspace_name, auth=sp_auth, ) ``` Note there's a check before those lines to make sure the necessary environment variables are defined for the user. I would be very open to making this more flexible based on what true AML users think, or we could let this evolve as more users start to flex this feature. Let me know if these limitations are acceptable, and I'd be happy to put some warning flags for the user so it's clear what will and will not be logged to AML. If anyone wants these to be worked out in the PR, I'd love to hear some suggestions - I'm pretty new to AML and leaned heavily on some LLMs to make this work, but "we" all got stuck on these issues :) --- ***As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/*** By opening this pull request, I affirm that all authors agree to the [Contributor License Agreement.](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nathan Simpson <[email protected]> Co-authored-by: Nathan Simpson <[email protected]> Co-authored-by: Nathan Simpson <[email protected]> Co-authored-by: Ana Prieto Nemesio <[email protected]>

github-project-automation bot added this to Anemoi-dev Aug 7, 2025

github-project-automation bot moved this to To be triaged in Anemoi-dev Aug 7, 2025

github-actions bot added the contributor label Aug 7, 2025

timothyas mentioned this pull request Aug 7, 2025

feat: Logging to AzureML with ML Flow ecmwf/anemoi-core#469

Merged

gmertes reviewed Aug 15, 2025

View reviewed changes

passive auth for aml logging

dc92e82

timothyas force-pushed the feature/azure-mlflow branch from b33268b to dc92e82 Compare August 18, 2025 17:00

timothyas and others added 3 commits August 18, 2025 11:07

incorporate comments from @gmertes and @floriankrb

c1a4190

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e00fdd

for more information, see https://pre-commit.ci

unit tests for NoAuth

a613175

github-actions bot added the tests label Aug 18, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

019636c

for more information, see https://pre-commit.ci

timothyas changed the title ~~feat: PassiveAuth for AML mlflow Logging~~ feat: NoAuth for AML mlflow Logging Aug 18, 2025

gmertes approved these changes Aug 19, 2025

View reviewed changes

gmertes added the approved-for-ci label Aug 19, 2025

gmertes merged commit 732182e into ecmwf:main Aug 19, 2025
71 of 72 checks passed

github-project-automation bot moved this from To be triaged to Done in Anemoi-dev Aug 19, 2025

DeployDuck mentioned this pull request Aug 19, 2025

chore(main): Release 0.4.36 #207

Merged

timothyas added a commit to timothyas/anemoi-core that referenced this pull request Aug 19, 2025

PassiveAuth -> NoAuth a la ecmwf/anemoi-utils#200

485d312

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: NoAuth for AML mlflow Logging#200

feat: NoAuth for AML mlflow Logging#200
gmertes merged 5 commits intoecmwf:mainfrom
timothyas:feature/azure-mlflow

timothyas commented Aug 7, 2025

Uh oh!

floriankrb commented Aug 11, 2025 •

edited

Loading

Uh oh!

timothyas commented Aug 12, 2025

Uh oh!

gmertes left a comment •

edited

Loading

Uh oh!

timothyas commented Aug 18, 2025

Uh oh!

gmertes left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

timothyas commented Aug 7, 2025

Description

What problem does this change solve?

What issue or task does this change relate to?

Additional notes

Uh oh!

floriankrb commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timothyas commented Aug 12, 2025

Uh oh!

gmertes left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timothyas commented Aug 18, 2025

Uh oh!

gmertes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

floriankrb commented Aug 11, 2025 •

edited

Loading

gmertes left a comment •

edited

Loading