POC Upgrade 2 #1944

Merged
chesterxgchen merged 2 commits into NVIDIA:main from chesterxgchen:POC_update2
Aug 18, 2023

@chesterxgchen (Collaborator) commented Aug 18, 2023

Description

  1. Moved the POC code base from nvflare.lighter to the nvflare.tools.poc directory.
  2. Redesigned the parser so that action commands no longer take the "--" prefix, for example:
    nvflare poc --prepare ===> nvflare poc prepare
    nvflare poc --start ===> nvflare poc start
    etc.
  3. Changed how the default workspace (/tmp/nvflare/poc) is overridden with a different location. Previously, this was done with the NVFLARE_POC_WORKSPACE environment variable. Now the hidden nvflare config file (~/.nvflare/config.conf) is read first, and the NVFLARE_POC_WORKSPACE environment variable is used only if nothing is defined there. The hidden config file is created (if it does not exist) or updated during nvflare poc prepare. You can also edit the config file manually to set the desired paths.
cat ~/.nvflare/config.conf 

    startup_kit {
        path = /tmp/nvflare/poc1/example_project/prod_00
    }

    poc_workspace {
        path = /tmp/nvflare/poc
    }
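
The lookup order described in item 3 can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `resolve_poc_workspace` and its parameters are hypothetical names, the config is passed in as an already-parsed dictionary rather than read from ~/.nvflare/config.conf, and the final fallback to /tmp/nvflare/poc when neither source is set is an assumption based on the default mentioned above.

```python
import os
from typing import Mapping, Optional

# Default location named in the PR description.
DEFAULT_POC_WORKSPACE = "/tmp/nvflare/poc"


def resolve_poc_workspace(
    config: Mapping[str, Mapping[str, str]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: choose the POC workspace path.

    Order follows the PR text: the hidden config file first, then the
    NVFLARE_POC_WORKSPACE environment variable. Falling back to the
    default path is an assumption of this sketch.
    """
    env = os.environ if env is None else env

    # 1. Value parsed from the config file (poc_workspace.path).
    path = config.get("poc_workspace", {}).get("path")
    if path:
        return path

    # 2. Environment variable, used only when the config has no entry.
    path = env.get("NVFLARE_POC_WORKSPACE")
    if path:
        return path

    # 3. Assumed default when nothing is configured.
    return DEFAULT_POC_WORKSPACE
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.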

  4. Each sub-command now has its own parser.
nvflare poc -h
usage: nvflare poc [-h] {prepare,prepare-examples,start,stop,clean} ...

optional arguments:
  -h, --help            show this help message and exit

poc:
  {prepare,prepare-examples,start,stop,clean}
                        poc subcommand
    prepare             prepare poc
    prepare-examples    prepare examples
    start               start services in poc mode
    stop                stop services in poc mode
    clean               clean up poc workspace

  • nvflare poc prepare -h
 nvflare poc prepare -h
usage: nvflare poc prepare [-h] [-n [NUMBER_OF_CLIENTS]] [-c [CLIENTS [CLIENTS ...]]] [-e [EXAMPLES]] [-he] [-i [PROJECT_INPUT]] [-d [DOCKER_IMAGE]] [-debug]

optional arguments:
  -h, --help            show this help message and exit
  -n [NUMBER_OF_CLIENTS], --number_of_clients [NUMBER_OF_CLIENTS]
                        number of sites or clients, default to 2
  -c [CLIENTS [CLIENTS ...]], --clients [CLIENTS [CLIENTS ...]]
                        Space separated client names. If specified, number_of_clients argument will be ignored.
  -e [EXAMPLES], --examples [EXAMPLES]
                        examples directory
  -he, --he             enable homomorphic encryption.
  -i [PROJECT_INPUT], --project_input [PROJECT_INPUT]
                        project.yaml file path, If specified, 'number_of_clients','clients' and 'docker' specific options will be ignored.
  -d [DOCKER_IMAGE], --docker_image [DOCKER_IMAGE]
                        generate docker.sh based on the docker_image, used in '--prepare' command. and generate docker.sh 'start/stop' commands will start with docker.sh
  -debug, --debug       debug is on
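
The "one parser per sub-command" layout shown in the help output above could be built with Python's standard `argparse` sub-parsers along these lines. This is a sketch under assumptions, not the parser from this PR: `build_poc_parser` and `client_names` are hypothetical names, only the `-n` and `-c` options of `prepare` are reproduced, and the `site-N` naming used when no client names are given is an illustrative choice.

```python
import argparse
from typing import List


def build_poc_parser() -> argparse.ArgumentParser:
    """Illustrative parser: one sub-parser per poc action, no "--" prefix."""
    parser = argparse.ArgumentParser(prog="nvflare poc")
    subparsers = parser.add_subparsers(
        title="poc", dest="sub_command", help="poc subcommand"
    )

    # "prepare" carries its own options, independent of the other actions.
    prepare = subparsers.add_parser("prepare", help="prepare poc")
    prepare.add_argument(
        "-n", "--number_of_clients", type=int, nargs="?", const=2, default=2,
        help="number of sites or clients, default to 2",
    )
    prepare.add_argument(
        "-c", "--clients", nargs="*", default=[],
        help="Space separated client names. If specified, "
             "number_of_clients argument will be ignored.",
    )

    # The remaining actions take no options in this sketch.
    for name, text in [
        ("prepare-examples", "prepare examples"),
        ("start", "start services in poc mode"),
        ("stop", "stop services in poc mode"),
        ("clean", "clean up poc workspace"),
    ]:
        subparsers.add_parser(name, help=text)
    return parser


def client_names(args: argparse.Namespace) -> List[str]:
    """Explicit names win over the client count (naming scheme is assumed)."""
    if args.clients:
        return list(args.clients)
    return [f"site-{i}" for i in range(1, args.number_of_clients + 1)]
```

With this layout, `build_poc_parser().parse_args(["prepare", "-n", "3"])` selects the `prepare` sub-parser, and options that belong to `prepare` are rejected by the other actions.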


Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

refactoring POC
format and dependency
change the logic of get poc workspace
@chesterxgchen (Collaborator, Author)

/build

@chesterxgchen (Collaborator, Author)

/build

@nvkevlu (Collaborator) left a comment

Looks pretty good to me. Just need to make sure we update everywhere in the documentation and also make users aware.

@chesterxgchen chesterxgchen merged commit 94f1026 into NVIDIA:main Aug 18, 2023
@chesterxgchen chesterxgchen deleted the POC_update2 branch August 18, 2023 19:45
holgerroth pushed a commit to nanaHa1003/NVFlare that referenced this pull request Sep 6, 2023
Remove redundant files and update running script.

Add license header for research/condist_fl.

Wrap training scripts inside main method (NVIDIA#1939)

Fixed the recursive FLComponents creation. (NVIDIA#1934)

* Fixed the recursive FLComponents creation.

* Remove the temp_fl_ctx change.

* Removed unused import.

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Rename Cell to CoreCell (cell.py -> core_cell.py)
Rename NewCell to Cell (new_cell.py -> cell.py)

Remove comment and unused codes

Add experiment tracking with MONAI for MetricExchanger (NVIDIA#1925)

* add `stats_sender_id` in `ClientAlgoExecutor`

Signed-off-by: KumoLiu <[email protected]>

* add `NVFlareStatsHandler`

Signed-off-by: KumoLiu <[email protected]>

* add experiment tracking with MONAI for MetricExchanger

* fix ci

* remove log_writer_metrics_exchanger.py which was not supposed to be there

* make changes after discussion about PR

* fix ci

* make fixes from PR comments

* make fixes from PR comments

* make fixes from PR comments

---------

Signed-off-by: KumoLiu <[email protected]>
Co-authored-by: KumoLiu <[email protected]>

Re-add cli persistent history (NVIDIA#1938)

* re-add cli persistent history

* change history file

* change to pathlib

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Job CLI:  create Job,  submit Job, list_templates, show_variables (NVIDIA#1888)

* add nvflare job command
update the job config setup

* fix meta.json error

* fix few bugs

* change with the new format

* Working in Progress

* move import to the top

* All things worked, still need cleanup and unit tests

* add missing transfer type

* restore

* restore

* restore

* restore

* 1. move some code to cli_utils.py
2. add CROSS validation workflow
3. avoid duplicated components, and empty components and empty executors
4. add nvflare config

* restore the change

* restore the change

* restore

* add pyhocon as required dependency

* restore setup dev version ( separate PR will do this part)

* reduce number of files

* restore

* add unit test

* 1. ConfigTreeEx
2. add unit tests

* Now design WORKING in PROGRESS

* add nvflare job show_workflows

* update create job

* WIP

* working in progress

* WIP

* working in progress

* working in progress

* working in progress

* add variable values

* working in progress

* working in progress

* rebase

* remove download

* fix 1 unit test

* CLI complete ( todo need to remove simulator related changes after another PR is merged)

* add debug on ci/unit test failure ( only on jenkins)

* temp remove a unit test

* restore

* rebase

* make pyhocon required dependency

* remove un-used files

* remove un-used files

* 1. remove un-used files
2. show_variables support all alternative formats
3. replace hard-coded names with constant variables

* Fix a refactoring introduced bug

* Fix a refactoring introduced bug

* style formats

* update client scripts

* check python versions update cross-validation workflows to use numpy

* add class arguments to the list, still have a bug

* 1. restructure the indexer, introduce keyIndex data structure.
2. merge is not refactored
3. unit-tests are not working yet.

* redesign the indexer. the code worked. still need to fix unit tests

* fix the unit-tests

* update

* clean up

* style format

* tweak

* rename the job from sag_cross_pt to sag_cross_np

POC Upgrade 2 (NVIDIA#1944)

* save startup kit location
refactoring POC
format and dependency
change the logic of get poc workspace

* rebased main

Helper and manager

Working but with gpu resource exception

Remove cc.token from resource spec to avoid confusing resource manager

Improve private function names

Use command check with fl context

Add document and remove unused codes

Reword and improve control flow

Rewrite double quote to heredocs to avoid bash/zsh issues

Update template

Fix controller unit test timing (NVIDIA#1937)

make sure the Job CLI support multi-config formats (NVIDIA#1946)

* make sure the code support multi-config formats

* make sure the code support multi-config formats

* remove debug

* style format

add workspace to config command (NVIDIA#1948)

update POC tutorials (NVIDIA#1949)

* update POC tutorials

* remove "--" in few more places

update docs after change in nvflare poc command [skip ci] (NVIDIA#1945)

* update docs after change in nvflare poc command

* remove unintended files

* add note in docs

* add to POC config info and some small fixes

* fix ci

* add note

Add Sean to build command (NVIDIA#1950)

Vertical XGBoost with PSI integration (NVIDIA#1922)

* vertical xgboost with psi integration

* formatting

* simplifying user exp

* improvements, changes to use hist executor

* minor improvements

* remove unused func

* separate psi into another job

* remove job scripts, improve data scripts

* generalize app

* add explanation for site-1 label owner

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>
Co-authored-by: Chester Chen <[email protected]>

Allow metric negation in model selection (NVIDIA#1951)

* tensorboard logging and metric negation in model selection

* update license

* update license

* update license to 2023

* revert license header

* remove tb logging

add deprecation commands (NVIDIA#1952)

* add deprecation commands

* update style

---------

Co-authored-by: nvkevlu <[email protected]>

Add CellCipher for secure message encryption/decryption
Add SessionKeyManager to handle key exchange and management

Fix fl model utils (NVIDIA#1902)

* Use explicit argument name instead of kwargs

* Address comments

SFM Heartbeat Support (NVIDIA#1942)

* Removed WAIT_UNTIL from Cellnet

* Added heartbeat support to all drivers

* Revert grpc keepalive to 2 Min

* Renamed capability HEARTBEAT to SEND_HEART

Enhance ML2FL API (NVIDIA#1953)

Add example figures to README.md and fix issues regarding to the PR comments.

Fix research/condist-fl license headers and update README.

Update README.md

Fix markdown syntax error in README.md

Update README.md

Update README.md

Add captions to figures.

Update README.md

Remove fobs calls (NVIDIA#1960)

* Removed the extra fobs.dumps() calls.

* removed more fobs.dump().

* Removed more fobs.dumps().

* Removed additional Fobs.dumps().

* Removed more Fobs.dumps() calls.

* Removed no use import.

Removed the not used import in cell.py (temporary) (NVIDIA#1961)

Improved error handling and fixed memory leak (NVIDIA#1921)

* Added more error handling and fixed the memory leak

* Ignore late ACKs

* Check for no payload scenario

* Addressed the PR comments, added lock, moved pop to top

---------

Co-authored-by: Chester Chen <[email protected]>

Fix unit test and integration tests (NVIDIA#1962)

* Fix f3 communicator unit test

* Update dxo meta with FLModel meta

* Fix fl model util

Client controller (NVIDIA#1913)

* initial cut.

* WIP:

* WIP:

* added filters and task for client controller.

* Working version.

* Fixed the client_sag broadcast_tasks.

* Refactored.

* Added error handling.

* WIP: client_controller change.

* Fixed the client controller _call_task_cb().

* Extracted the apply_data_filters() and apply_result_filters().

* refactored.

* Adjust the task result cb logic.

* Added server as the client_controller target.

* Added the client controller based cyclic example.

* codestyle changes.:

* codestyle changes for example.

* Removed no use import.

* Addressed the PR review feedbacks.

* Removed the cyclic example.

* Added direction support for the filters.

* minor fix.

* added target validation.

* optimized the task_utils.

* added direction control for Scope filters.

* Moved the constants to FilterKey.

* codestyle fix.

* license header year change.

* refactored.

* further extract the common functions for task_utils.

* passed in the proper Scope field name.

* renamed a variable.

* Changed to use hard coded field name in the Scope.

---------

Co-authored-by: Chester Chen <[email protected]>
Co-authored-by: Yan Cheng <[email protected]>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Update POC tutorials and fix POC bugs (NVIDIA#1958)

* Update POC tutorials

* format style

* format style

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* typos

* fixing typos

* rename method

* update wordings

* update wordings

* update wordings

* update wordings

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add Job CLI Tutorials and step-by-step initial examples  (NVIDIA#1957)

* update job template and tutorials (WIP)

update POC tutorials: WIP

update POC tutorials: WIP

add tutorial for Job CLI

style formats

style formats

style formats

wording

wording

wording

update the tutorials

format style

* wording

* fix unit tests

* fix unit tests

* format

* fix timeout issue

* fix timeout issue

* fix timeout issue

* fix style and import related changes

* typos

* fixing typos

* fixing typos

* refactor main methods

* bug fixes

* update readme.md

Add more results in the README and fix some minor issues.

Refactor format_log_message with more readability (NVIDIA#1965)

Remove some POC stop message (NVIDIA#1966)

* 1. remove some message on nvflare poc stop
2. clean up the job CLI tutorial wordings

* remove output

* format

Add experiment tracking docs (NVIDIA#1963)

* add experiment tracking docs

* add missed docs

* remove paragraph

* make edits based on PR comments

* make consistent names of functions and variables with plural of metric

---------

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <[email protected]>

Add SimpleCellCipher to remove session key manager
Refactor common functions to serve both designs

Fix KiTS19 URL in README.md

Change dict key in the checkpoints.

Rename 'extract_tensor' function to 'array_to_list'.

Improve CLI command error handling (NVIDIA#1971)

* improve CLI command error handling

* improve CLI command error handling

* formats

polish notebook for Job CLI (NVIDIA#1975)

update readme
holgerroth added a commit that referenced this pull request Sep 6, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (#1940)

* Add implementation to ConDistFL research folder.

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <[email protected]>
holgerroth pushed a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
* save startup kit location
refactoring POC
format and dependency
change the logic of get poc workspace

* rebased main
holgerroth added a commit to holgerroth/NVFlare that referenced this pull request Dec 4, 2023
…stillation for Federated Learning from Partially Annotated Data" [skip ci] (NVIDIA#1940)

* Add implementation to ConDistFL research folder.

* remove old file

* formatting

---------

Co-authored-by: Holger Roth <[email protected]>