
Job CLI: create Job, submit Job, list_templates, show_variables #1888

Merged
chesterxgchen merged 60 commits into NVIDIA:main from chesterxgchen:nvflare_job on Aug 18, 2023


@chesterxgchen
Collaborator

@chesterxgchen chesterxgchen commented Aug 1, 2023

Description

This PR depends on #1923 (it duplicates the changes from that PR).

Goals:

Provide a set of job templates so that users can quickly learn the new job configuration format and adapt their own code to it.

nvflare job list_templates

nvflare job list_templates -d integration/job_templates/

or simply

nvflare job list_templates

if the job templates directory is specified in the hidden config or in an environment variable.

------------------------------------------------------------------------------------------------------------------------
   name            description                                                  controller type      client category     
------------------------------------------------------------------------------------------------------------------------
   cyclic_tf       NA                                                           NA                   NA                  
   sag_cross_pt    scatter & gather and cross-site validation workflow using py server               client_api          
   sag_pt          scatter & gather workflow using pytorch                      server               client_api          
   stats_df        FedStats: tabular data with pandas                           server               stats executor      
------------------------------------------------------------------------------------------------------------------------
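The directory lookup described above (an explicit -d argument, otherwise a directory configured elsewhere) can be sketched in bash as follows. This is not NVFlare's implementation: the precedence order, the variable name JOB_TEMPLATES_DIR, and the assumption that every sub-folder of the template directory is one template (as the name column above suggests) are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not NVFlare code: choose the template directory from an
# explicit argument first, then from JOB_TEMPLATES_DIR (an assumed name).
resolve_template_dir() {
  local explicit_dir="$1"
  if [ -n "$explicit_dir" ]; then
    printf '%s\n' "$explicit_dir"
  elif [ -n "${JOB_TEMPLATES_DIR:-}" ]; then
    printf '%s\n' "$JOB_TEMPLATES_DIR"
  else
    echo "no job template directory given" >&2
    return 1
  fi
}

# Assumption: each sub-folder of the template directory is one template
# (e.g. sag_pt, stats_df), so the folder names approximate the name column.
list_template_names() {
  local dir entry
  dir="$(resolve_template_dir "$1")" || return 1
  for entry in "$dir"/*/; do
    # An unmatched glob stays literal, so skip anything that is not a folder.
    if [ -d "$entry" ]; then
      basename "$entry"
    fi
  done
}
```

Under these assumptions, `list_template_names integration/job_templates/` prints one template name per line; the real command additionally reads each template's description, controller type and client category.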

nvflare job create

nvflare job create -j /tmp/nvflare/stats_job -w stats_df -force -f config_fed_server.conf bins=20 -f config_fed_client.conf min_noise_level=0.3 -f meta.conf min_clients=2
---------------------------------------------------------------------------------------------------------------------------------------

  job folder: /tmp/nvflare/stats_job

---------------------------------------------------------------------------------------------------------------------------------------
  file_name                      var_name                       value                               component
---------------------------------------------------------------------------------------------------------------------------------------
  config_exchange.conf           exchange_format                pytorch
  config_exchange.conf           transfer_type                  DIFF

  config_fed_client.conf         data_path                      data.csv                            DFStatistics
  config_fed_client.conf         max_bins_percent               10                                  HistogramBinsCleanser
  config_fed_client.conf         max_noise_level                0.3                                 AddNoiseToMinMax
  config_fed_client.conf         min_count                      10                                  MinCountCleanser
  config_fed_client.conf         min_noise_level                0.3                                 AddNoiseToMinMax
  config_fed_client.conf         precision                      4
  config_fed_client.conf         result_cleanser_ids            ['min_count_cleanser', 'min_max_noi

  config_fed_server.conf         bins                           20
  config_fed_server.conf         enable_pre_run_task            False                               StatisticsController
  config_fed_server.conf         json_encoder_path              nvflare.app_common.utils.json_utils JsonStatsFileWriter
  config_fed_server.conf         output_path                    statistics/adults_stats.json        JsonStatsFileWriter
  config_fed_server.conf         precision                      4                                   StatisticsController
  config_fed_server.conf         range                          [0, 120]
  config_fed_server.conf         result_wait_timeout            10                                  StatisticsController
  config_fed_server.conf         wait_time_after_min_received   1                                   StatisticsController

  meta.conf                      app                            ['@ALL']
  meta.conf                      mandatory_clients              []
  meta.conf                      min_clients                    2
  meta.conf                      scope                          []

---------------------------------------------------------------------------------------------------------------------------------------
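Every override in the create command follows the same pattern: -f names a config file and the next argument is a KEY=VALUE pair for a variable in that file. A minimal bash sketch that assembles such a command from FILE:KEY=VALUE specs (a notation invented for this sketch, not part of the CLI) and prints it as a dry run; it only uses the -j, -w, -force and -f options that appear in the example above.

```shell
#!/usr/bin/env bash
# Sketch: build (and print) an "nvflare job create" command from overrides
# written as FILE:KEY=VALUE. That notation is invented for this sketch; each
# spec becomes "-f FILE KEY=VALUE", the pattern used in the example above.
build_create_cmd() {
  local job_dir="$1" template="$2"
  shift 2
  local -a args=(nvflare job create -j "$job_dir" -w "$template" -force)
  local spec
  for spec in "$@"; do
    case "$spec" in
      *:*=*)
        # Split on the first ':' only, so values may themselves contain ':'.
        args+=(-f "${spec%%:*}" "${spec#*:}")
        ;;
      *)
        echo "bad override '$spec', expected FILE:KEY=VALUE" >&2
        return 1
        ;;
    esac
  done
  # Dry run: print the command; values containing spaces are not re-quoted.
  printf '%s\n' "${args[*]}"
}
```

To run the command instead of printing it, replace the final printf with `"${args[@]}"`, which keeps each argument intact.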

nvflare job show_variables

nvflare job show_variables -j /tmp/nvflare/stats_job 
The following are the variables you can change in the template:

----------------------------------------------------------------------------------------------------
                                                                                                    
    job folder: /tmp/nvflare/stats_job                                                                  
                                                                                                    
----------------------------------------------------------------------------------------------------
    file_name                           var_name                  value                    
----------------------------------------------------------------------------------------------------
    config_fed_client.conf              data_path                 data.csv
    config_fed_client.conf              max_bins_percent          10
    config_fed_client.conf              max_noise_level           0.3
    config_fed_client.conf              min_count                 10
    config_fed_client.conf              min_noise_level           0.3

    config_fed_server.conf              bins                      20
    config_fed_server.conf              enable_pre_run_task       False
    config_fed_server.conf              json_encoder_path         nvflare.app_common.utils.json_utils.ObjectEncoder
    config_fed_server.conf              output_path               statistics/adults_stats.json
    config_fed_server.conf              range                     [0, 120]

    meta.conf                           app                       ['@ALL']
    meta.conf                           mandatory_clients         []
    meta.conf                           min_clients               2
    meta.conf                           scope                     []

----------------------------------------------------------------------------------------------------
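The show_variables table pairs naturally with the -f FILE KEY=VALUE overrides: once you know a variable's current value, you can override it at create or submit time. A rough bash/awk sketch that extracts one file's variables as KEY=VALUE lines; it parses human-readable output, so it assumes the three-column layout shown above and may break if that format changes.

```shell
#!/usr/bin/env bash
# Sketch: read "nvflare job show_variables" output on stdin and print the rows
# for one config file as KEY=VALUE. It parses human-readable text, so it
# assumes the file_name / var_name / value column layout shown above.
vars_for_file() {
  local conf_file="$1"
  awk -v f="$conf_file" '
    $1 == f && NF >= 3 {
      name = $2
      # Blank the first two fields; awk rebuilds the line with single spaces,
      # so runs of spaces inside a value collapse to one.
      $1 = ""; $2 = ""
      sub(/^ +/, "")
      print name "=" $0
    }'
}
```

For example, `nvflare job show_variables -j /tmp/nvflare/stats_job | vars_for_file meta.conf` would list the meta.conf variables in a form close to what -f expects.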

nvflare job submit

  • requires a running POC or production system
nvflare job submit -j /tmp/nvflare/stats_job -f config_fed_server.conf bins=20 -f config_fed_client.conf min_noise_level=0.3 -f meta.conf min_clients=2

or simply

nvflare job submit -j /tmp/nvflare/stats_job 
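Because submit needs a running POC or production system and an existing job folder, a small pre-flight wrapper can catch a missing folder before anything is sent. The sketch below is hypothetical: it assumes the job folder keeps meta.conf at its top level (as the create output above suggests), and DRY_RUN is a variable of this sketch, not an NVFlare option.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight wrapper around "nvflare job submit". Assumes the job
# folder keeps meta.conf at its top level; DRY_RUN is this sketch's own
# switch, not an NVFlare option.
submit_job() {
  local job_dir="$1"
  shift
  if [ ! -d "$job_dir" ]; then
    echo "job folder not found: $job_dir" >&2
    return 1
  fi
  if [ ! -f "$job_dir/meta.conf" ]; then
    echo "no meta.conf in $job_dir; was it created with 'nvflare job create'?" >&2
    return 1
  fi
  # Remaining arguments (e.g. -f meta.conf min_clients=2) are passed through.
  local -a cmd=(nvflare job submit -j "$job_dir" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 submit_job /tmp/nvflare/stats_job` only prints the command, which is a cheap way to check overrides before the POC or production system is involved.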

Limitations known at this point

  • site-specific configurations are not handled (for example, when site-1 and site-2 each have a separate config_fed_client.json)

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@chesterxgchen chesterxgchen marked this pull request as draft August 1, 2023 05:30
@chesterxgchen chesterxgchen force-pushed the nvflare_job branch 5 times, most recently from 4fd5bd6 to c89ad08 August 3, 2023 16:27
@YuanTingHsieh YuanTingHsieh changed the base branch from dev to main August 4, 2023 21:05
@chesterxgchen chesterxgchen force-pushed the nvflare_job branch 4 times, most recently from 1673b45 to 6f65c78 August 10, 2023 15:46
@chesterxgchen chesterxgchen changed the title Job CLI: Create Job and submit Job Job CLI: create Job, submit Job, list_templates, show_variables Aug 12, 2023
@chesterxgchen chesterxgchen requested review from YuanTingHsieh, holgerroth and yanchengnv and removed request for YuanTingHsieh August 12, 2023 00:42
@chesterxgchen chesterxgchen marked this pull request as ready for review August 12, 2023 00:50
@chesterxgchen chesterxgchen force-pushed the nvflare_job branch 2 times, most recently from ed17c28 to a1610c7 August 12, 2023 05:35
Collaborator

@YuanTingHsieh YuanTingHsieh left a comment


Thanks for the PR.
Some comments and questions.

@chesterxgchen chesterxgchen marked this pull request as draft August 16, 2023 05:20
@chesterxgchen chesterxgchen marked this pull request as ready for review August 17, 2023 21:23
@chesterxgchen chesterxgchen marked this pull request as draft August 17, 2023 21:38
@chesterxgchen chesterxgchen marked this pull request as ready for review August 17, 2023 22:58
@chesterxgchen
Collaborator Author

/build

IsaacYangSLA
IsaacYangSLA previously approved these changes Aug 18, 2023
Collaborator

@IsaacYangSLA IsaacYangSLA left a comment


From user's point of view, this new command is useful and works well.

holgerroth
holgerroth previously approved these changes Aug 18, 2023
Collaborator

@holgerroth holgerroth left a comment


Very useful.

@chesterxgchen chesterxgchen dismissed stale reviews from holgerroth and IsaacYangSLA via 708d6dd August 18, 2023 18:28
@chesterxgchen
Collaborator Author

/build

@chesterxgchen chesterxgchen merged commit 4ba505a into NVIDIA:main Aug 18, 2023
@chesterxgchen chesterxgchen deleted the nvflare_job branch August 18, 2023 18:47
holgerroth pushed a commit to nanaHa1003/NVFlare that referenced this pull request Sep 6, 2023
Remove redundant files and update running script.

Add license header for research/condist_fl.

Wrap training scripts inside main method (NVIDIA#1939)

Fixed the recursive FLComponents creation. (NVIDIA#1934)

* Fixed the resursive FLComponents creation.

* Remove the temp_fl_ctx change.

* Removed no used import.

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Rename Cell to CoreCell (cell.py -> core_cell.py)
Rename NewCell to Cell (new_cell.py -> cell.py)

Remove comment and unused codes

Add experiment tracking with MONAI for MetricExchanger (NVIDIA#1925)

* add `stats_sender_id` in `ClientAlgoExecutor`

Signed-off-by: KumoLiu <[email protected]>

* add `NVFlareStatsHandler`

Signed-off-by: KumoLiu <[email protected]>

* add experiment tracking with MONAI for MetricExchanger

* fix ci

* remove log_writer_metrics_exchanger.py which was not supposed to be there

* make changes after discussion about PR

* fix ci

* make fixes from PR comments

* make fixes from PR comments

* make fixes from PR comments

---------

Signed-off-by: KumoLiu <[email protected]>
Co-authored-by: KumoLiu <[email protected]>

Re-add cli persistent history (NVIDIA#1938)

* re-add cli persistent history

* change history file

* change to pathlib

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Job CLI:  create Job,  submit Job, list_templates, show_variables (NVIDIA#1888)

* add nvflare job command
update the job config setup

* fix meta.json error

* fix few bugs

* change with the new format

* Working in Progress

* move import to the top

* ALl things worked, Stil need cleanup and unit tests

* add missing transfer type

* restore

* restore

* restore

* restore

* 1. move some code to cli_utils.py
2. add CROSS validation workflow
3. avoid duplicated components, and empty components and empty executors
4. add nvflare config

* restore the change

* restore the change

* restore

* add pyhocon as required dependency

* restore setup dev version ( separate PR will do this part)

* reduce number of files

* restore

* add unit test

* 1. ConfigTreeEx
2. add unit tests

* Now design WORKING in PROGRESS

* add nvflare job show_workflows

* update create job

* WIP

* working in progress

* WIP

* working in progress

* working in progress

* working in progress

* add variable values

* working in progress

* working in progress

* rebase

* remove download

* fix 1 unit test

* CLI complete ( todo need to remove simulator related changes after another PR is merged)

* add debug on ci/unit test failure ( only on jenkins)

* temp remove a unit test

* restore

* rebase

* make pyhocon required dependency

* remove un-used files

* remove un-used files

* 1. remove un-used files
2. show_variables support all alternative formats
3. replace hard-coded names with constant variables

* Fix a refactoring introduced bug

* Fix a refactoring introduced bug

* style formats

* update client scripts

* check python versions update cross-validation workflows to use numpy

* add class arguments to the list, still have a bug

* 1. restructure the indexer, introduce keyIndex data structure.
2. merge is not refactored
2. unit-tests are not working yet.

* redesign the indexer. the code worked. still need to fix unit tests

* fix the unit-tests

* update

* clean up

* style format

* tweak

* rename the job from sag_cross_pt to sag_cross_np

POC Upgrade 2 (NVIDIA#1944)

* save startup kit location
refactoring POC
format and dependency
change the logic of get poc workspace

* rebased main

Helper and manager

Working but with gpu resource exception

Remove cc.token from resource spec to avoid confusing resource manager

Improve private function names

Use command check with fl context

Add document and remove unused codes

Reword and improve control flow

Rewrite double quote to heredocs to avoid bash/zsh issues

Update template

Fix controller unit test timing (NVIDIA#1937)

make sure the Job CLI support multi-config formats (NVIDIA#1946)

* make sure the code support multi-config formats

* make sure the code support multi-config formats

* remove debug

* style format

add workspace to config command (NVIDIA#1948)

update POC tutorials (NVIDIA#1949)

* update POC tutorials

* remove "--" in few more places

update docs after change in nvflare poc command [skip ci] (NVIDIA#1945)

* update docs after change in nvflare poc command

* remove unintended files

* add note in docs

* add to POC config info and some small fixes

* fix ci

* add note

Add Sean to build command (NVIDIA#1950)

Vertical XGBoost with PSI integration (NVIDIA#1922)

* vertical xgboost with psi integration

* formatting

* simplifying user exp

* improvements, changes to use hist executor

* minor improvements

* remove unused func

* separate psi into another job

* remove job scripts, improve data scripts

* generalize app

* add explanation for site-1 label owner

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>
Co-authored-by: Chester Chen <[email protected]>

Allow metric negation in model selection (NVIDIA#1951)

* tensorboard logging and metric negation in model selection

* update license

* update license

* update license to 2023

* revert license header

* remove tb logging

add deprecation commands (NVIDIA#1952)

* add deprecation commands

* updaste style

---------

Co-authored-by: nvkevlu <[email protected]>

Add CellCipher for secure message encryption/decryption
Add SessionKeyManager to handle key exchange and management

Fix fl model utils (NVIDIA#1902)

* Use explicit argument name instead of kwargs

* Address comments

SFM Heartbeat Support (NVIDIA#1942)

* Removed WAIT_UNTIL from Cellnet

* Added heartbeat support to all drivers

* Revert grpc keepalive to 2 Min

* Renamed capability HEARTBEAT to SEND_HEART

Enhance ML2FL API (NVIDIA#1953)

Add example figures to README.md and fix issues regarding to the PR comments.

Fix research/condist-fl license headers and update README.

Update README.md

Fix markdown syntax error in README.md

Update README.md

Update README.md

Add captions to figures.

Update README.md

Remove fobs calls (NVIDIA#1960)

* Removed the extra fobs.dumps() calls.

* removed more fobs.dump().

* Removed more fobs.dumps().

* Removed additional Fobs.dumps().

* Removed more Fobs.dumps() calls.

* Removed no use import.

Removed the not used import in cell.py (temporary) (NVIDIA#1961)

Improved error handling and fixed memory leak (NVIDIA#1921)

* Added more error handling and fixed the memory leak

* Ignore late ACKs

* Check for no payload scenario

* Addressed the PR comments, added lock, moved pop to top

---------

Co-authored-by: Chester Chen <[email protected]>

Fix unit test and integration tests (NVIDIA#1962)

* Fix f3 communicator unit test

* Update dxo meta with FLModel meta

* Fix fl model util

Client controller (NVIDIA#1913)

* initial cut.

* WIP:

* WIP:

* added filters and task for client controller.

* Working version.

* Fixed the client_sag broadcast_tasks.

* Refacftored.

* Added error handling.

* WIP: client_controller change.

* Fixed the client controller _call_task_cb().

* Extracted the apply_data_filters() and apply_result_filters().

* refactored.

* Adjust the task result cb logic.

* Added server as the client_controller target.

* Added the client controller based cyclic example.

* codestyle changes.:

* codestyle changes for example.

* Removed no use import.

* Addressed the PR review feedbacks.

* Removed the cyclic example.

* Added direction support for the filters.

* minor fix.

* added target validation.

* optimized the task_utils.

* added direction control for Scope filters.

* Moved the constants to FilterKey.

* codestyle fix.

* license header year change.

* refactoried.

* further extract the common functions for task_utils.

* passed in the proper Scope field name.

* renamed a variable.

* Changed to use hard coded field name in the Scope.

---------

Co-authored-by: Chester Chen <[email protected]>
Co-authored-by: Yan Cheng <[email protected]>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Update POC tutorials and fix POC bugs (NVIDIA#1958)

* Update POC tutorials

* format style

* format style

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* fixing typos

* rename method

* update wordings

* update wordings

* update wordings

* update wordings

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add Job CLI Tutorials and step-by-step initial examples  (NVIDIA#1957)

* update job template and tutorials (WIP)

update POC tutorials: WIP

update POC tutorials: WIP

add tutorial for Job CLI

style formats

style formats

style formats

wording

wording

wording

update the tutorials

format style

* wording

* fix unit tests

* fix unit tests

* format

* fix timeout issue

* fix timeout issue

* fix timeout issue

* fix style and import related changes

* typos

* fixing typos

* fixing typos

* refactory main methods

* bug fixes

* update readme.md

Add more results in the README and fix some minor issues.

Refactor format_log_message with more readability (NVIDIA#1965)

Remove some POC stop message (NVIDIA#1966)

* 1. remove some message on nvflare poc stop
2. clean up the job CLI tutorial wordings

* remove output

* format

Add experiment tracking docs (NVIDIA#1963)

* add experiment tracking docs

* add missed docs

* remove paragraph

* make edits based on PR comments

* make consistent names of functions and variables with plural of metric

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add SimpleCellCipher to remove session key manager
Refactor common functions to serve both designs

Fix KiTS19 URL in README.md

Change dict key in the checkpoints.

Rename 'extract_tensor' function to 'array_to_list'.

Improve CLI command error handling (NVIDIA#1971)

* improve CLI command error handling

* improve CLI command error handling

* formats

polish notebook for Job CLI (NVIDIA#1975)

update readme
holgerroth added a commit that referenced this pull request Sep 6, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (#1940)

* Add implementation to ConDistFL research folder.

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <[email protected]>
holgerroth pushed a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
…IA#1888)

* add nvflare job command
update the job config setup

* fix meta.json error

* fix few bugs

* change with the new format

* Working in Progress

* move import to the top

* ALl things worked, Stil need cleanup and unit tests

* add missing transfer type

* restore

* restore

* restore

* restore

* 1. move some code to cli_utils.py
2. add CROSS validation workflow
3. avoid duplicated components, and empty components and empty executors
4. add nvflare config

* restore the change

* restore the change

* restore

* add pyhocon as required dependency

* restore setup dev version ( separate PR will do this part)

* reduce number of files

* restore

* add unit test

* 1. ConfigTreeEx
2. add unit tests

* Now design WORKING in PROGRESS

* add nvflare job show_workflows

* update create job

* WIP

* working in progress

* WIP

* working in progress

* working in progress

* working in progress

* add variable values

* working in progress

* working in progress

* rebase

* remove download

* fix 1 unit test

* CLI complete ( todo need to remove simulator related changes after another PR is merged)

* add debug on ci/unit test failure ( only on jenkins)

* temp remove a unit test

* restore

* rebase

* make pyhocon required dependency

* remove un-used files

* remove un-used files

* 1. remove un-used files
2. show_variables support all alternative formats
3. replace hard-coded names with constant variables

* Fix a refactoring introduced bug

* Fix a refactoring introduced bug

* style formats

* update client scripts

* check Python versions; update cross-validation workflows to use NumPy

* add class arguments to the list, still have a bug

* 1. restructure the indexer; introduce the keyIndex data structure.
2. merge is not refactored
3. unit tests are not working yet.

* redesign the indexer. The code works; still need to fix unit tests

* fix the unit tests

* update

* clean up

* style format

* tweak

* rename the job from sag_cross_pt to sag_cross_np
holgerroth added a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (NVIDIA#1940)

* Add implementation to ConDistFL research folder.

Remove redundant files and update running script.

Add license header for research/condist_fl.

Wrap training scripts inside main method (NVIDIA#1939)

Fixed the recursive FLComponents creation. (NVIDIA#1934)

* Fixed the recursive FLComponents creation.

* Remove the temp_fl_ctx change.

* Removed unused import.

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Rename Cell to CoreCell (cell.py -> core_cell.py)
Rename NewCell to Cell (new_cell.py -> cell.py)

Remove comment and unused code

Add experiment tracking with MONAI for MetricExchanger (NVIDIA#1925)

* add `stats_sender_id` in `ClientAlgoExecutor`

Signed-off-by: KumoLiu <[email protected]>

* add `NVFlareStatsHandler`

Signed-off-by: KumoLiu <[email protected]>

* add experiment tracking with MONAI for MetricExchanger

* fix ci

* remove log_writer_metrics_exchanger.py which was not supposed to be there

* make changes after discussion about PR

* fix ci

* make fixes from PR comments

* make fixes from PR comments

* make fixes from PR comments

---------

Signed-off-by: KumoLiu <[email protected]>
Co-authored-by: KumoLiu <[email protected]>

Re-add cli persistent history (NVIDIA#1938)

* re-add cli persistent history

* change history file

* change to pathlib

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Job CLI: create Job, submit Job, list_templates, show_variables (NVIDIA#1888)

* add nvflare job command
update the job config setup

* fix meta.json error

* fix few bugs

* change with the new format

* Working in Progress

* move import to the top

* All things worked; still need cleanup and unit tests

* add missing transfer type

* restore

* restore

* restore

* restore

* 1. move some code to cli_utils.py
2. add CROSS validation workflow
3. avoid duplicate components, empty components, and empty executors
4. add nvflare config

* restore the change

* restore the change

* restore

* add pyhocon as required dependency

* restore setup dev version (a separate PR will do this part)

* reduce number of files

* restore

* add unit test

* 1. ConfigTreeEx
2. add unit tests

* Now the design is a work in progress

* add nvflare job show_workflows

* update create job

* WIP

* work in progress

* WIP

* work in progress

* work in progress

* work in progress

* add variable values

* work in progress

* work in progress

* rebase

* remove download

* fix 1 unit test

* CLI complete (TODO: remove simulator-related changes after another PR is merged)

* add debug on CI/unit test failure (only on Jenkins)

* temp remove a unit test

* restore

* rebase

* make pyhocon required dependency

* remove unused files

* remove unused files

* 1. remove unused files
2. show_variables supports all alternative formats
3. replace hard-coded names with constant variables

* Fix a refactoring introduced bug

* Fix a refactoring introduced bug

* style formats

* update client scripts

* check Python versions; update cross-validation workflows to use NumPy

* add class arguments to the list, still have a bug

* 1. restructure the indexer; introduce the keyIndex data structure.
2. merge is not refactored
3. unit tests are not working yet.

* redesign the indexer. The code works; still need to fix unit tests

* fix the unit tests

* update

* clean up

* style format

* tweak

* rename the job from sag_cross_pt to sag_cross_np

POC Upgrade 2 (NVIDIA#1944)

* save startup kit location
refactoring POC
format and dependency
change the logic for getting the POC workspace

* rebased main

Helper and manager

Working, but with a GPU resource exception

Remove cc.token from resource spec to avoid confusing resource manager

Improve private function names

Use command check with fl context

Add documentation and remove unused code

Reword and improve control flow

Rewrite double quotes as heredocs to avoid bash/zsh issues

Update template

Fix controller unit test timing (NVIDIA#1937)

make sure the Job CLI supports multi-config formats (NVIDIA#1946)

* make sure the code supports multi-config formats

* make sure the code supports multi-config formats

* remove debug

* style format

add workspace to config command (NVIDIA#1948)

update POC tutorials (NVIDIA#1949)

* update POC tutorials

* remove "--" in a few more places

update docs after change in nvflare poc command [skip ci] (NVIDIA#1945)

* update docs after change in nvflare poc command

* remove unintended files

* add note in docs

* add to POC config info and some small fixes

* fix ci

* add note

Add Sean to build command (NVIDIA#1950)

Vertical XGBoost with PSI integration (NVIDIA#1922)

* vertical xgboost with psi integration

* formatting

* simplify user experience

* improvements, changes to use hist executor

* minor improvements

* remove unused func

* separate psi into another job

* remove job scripts, improve data scripts

* generalize app

* add explanation for site-1 label owner

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>
Co-authored-by: Chester Chen <[email protected]>

Allow metric negation in model selection (NVIDIA#1951)

* tensorboard logging and metric negation in model selection

* update license

* update license

* update license to 2023

* revert license header

* remove tb logging

add deprecation commands (NVIDIA#1952)

* add deprecation commands

* update style

---------

Co-authored-by: nvkevlu <[email protected]>

Add CellCipher for secure message encryption/decryption
Add SessionKeyManager to handle key exchange and management

Fix fl model utils (NVIDIA#1902)

* Use explicit argument name instead of kwargs

* Address comments

SFM Heartbeat Support (NVIDIA#1942)

* Removed WAIT_UNTIL from Cellnet

* Added heartbeat support to all drivers

* Revert grpc keepalive to 2 Min

* Renamed capability HEARTBEAT to SEND_HEART

Enhance ML2FL API (NVIDIA#1953)

Add example figures to README.md and fix issues raised in the PR comments.

Fix research/condist-fl license headers and update README.

Update README.md

Fix markdown syntax error in README.md

Update README.md

Update README.md

Add captions to figures.

Update README.md

Remove fobs calls (NVIDIA#1960)

* Removed the extra fobs.dumps() calls.

* removed more fobs.dump().

* Removed more fobs.dumps().

* Removed additional Fobs.dumps().

* Removed more Fobs.dumps() calls.

* Removed unused import.

Removed the unused import in cell.py (temporary) (NVIDIA#1961)

Improved error handling and fixed memory leak (NVIDIA#1921)

* Added more error handling and fixed the memory leak

* Ignore late ACKs

* Check for no payload scenario

* Addressed the PR comments, added lock, moved pop to top

---------

Co-authored-by: Chester Chen <[email protected]>

Fix unit test and integration tests (NVIDIA#1962)

* Fix f3 communicator unit test

* Update dxo meta with FLModel meta

* Fix fl model util

Client controller (NVIDIA#1913)

* initial cut.

* WIP:

* WIP:

* added filters and task for client controller.

* Working version.

* Fixed the client_sag broadcast_tasks.

* Refactored.

* Added error handling.

* WIP: client_controller change.

* Fixed the client controller _call_task_cb().

* Extracted the apply_data_filters() and apply_result_filters().

* refactored.

* Adjust the task result cb logic.

* Added server as the client_controller target.

* Added the client controller based cyclic example.

* codestyle changes.

* codestyle changes for example.

* Removed unused import.

* Addressed the PR review feedback.

* Removed the cyclic example.

* Added direction support for the filters.

* minor fix.

* added target validation.

* optimized the task_utils.

* added direction control for Scope filters.

* Moved the constants to FilterKey.

* codestyle fix.

* license header year change.

* refactored.

* further extract the common functions for task_utils.

* passed in the proper Scope field name.

* renamed a variable.

* Changed to use hard coded field name in the Scope.

---------

Co-authored-by: Chester Chen <[email protected]>
Co-authored-by: Yan Cheng <[email protected]>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Update POC tutorials and fix POC bugs (NVIDIA#1958)

* Update POC tutorials

* format style

* format style

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* fixing typos

* rename method

* update wording

* update wording

* update wording

* update wording

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add Job CLI Tutorials and step-by-step initial examples (NVIDIA#1957)

* update job template and tutorials (WIP)

update POC tutorials: WIP

update POC tutorials: WIP

add tutorial for Job CLI

style formats

style formats

style formats

wording

wording

wording

update the tutorials

format style

* wording

* fix unit tests

* fix unit tests

* format

* fix timeout issue

* fix timeout issue

* fix timeout issue

* fix style and import related changes

* typos

* fixing typos

* fixing typos

* refactor main methods

* bug fixes

* update readme.md

Add more results in the README and fix some minor issues.

Refactor format_log_message for readability (NVIDIA#1965)

Remove some POC stop messages (NVIDIA#1966)

* 1. remove some messages from nvflare poc stop
2. clean up the job CLI tutorial wording

* remove output

* format

Add experiment tracking docs (NVIDIA#1963)

* add experiment tracking docs

* add missing docs

* remove paragraph

* make edits based on PR comments

* make function and variable names consistently use the plural "metrics"

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add SimpleCellCipher to remove session key manager
Refactor common functions to serve both designs

Fix KiTS19 URL in README.md

Change dict key in the checkpoints.

Rename 'extract_tensor' function to 'array_to_list'.

Improve CLI command error handling (NVIDIA#1971)

* improve CLI command error handling

* improve CLI command error handling

* formats

polish notebook for Job CLI (NVIDIA#1975)

update readme

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <[email protected]>

4 participants