test error: ./modules/TorchIO_MONAI_PyTorch_Lightning.ipynb #987

@wyli

Description
[2022-10-13T00:27:52.977Z] Running ./modules/TorchIO_MONAI_PyTorch_Lightning.ipynb
[2022-10-13T00:27:52.977Z] Checking PEP8 compliance...
[2022-10-13T00:27:52.977Z] Running notebook...
[2022-10-13T00:27:55.501Z] MONAI version: 1.0.0+38.g32a237a7
[2022-10-13T00:27:55.501Z] Numpy version: 1.22.2
[2022-10-13T00:27:55.501Z] Pytorch version: 1.13.0a0+d0d6b1f
[2022-10-13T00:27:55.501Z] MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
[2022-10-13T00:27:55.501Z] MONAI rev id: 32a237a7959e4890a963481c23baff84b95a253b
[2022-10-13T00:27:55.501Z] MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
[2022-10-13T00:27:55.501Z] 
[2022-10-13T00:27:55.501Z] Optional dependencies:
[2022-10-13T00:27:55.501Z] Pytorch Ignite version: 0.4.10
[2022-10-13T00:27:55.501Z] Nibabel version: 4.0.2
[2022-10-13T00:27:55.501Z] scikit-image version: 0.19.3
[2022-10-13T00:27:55.501Z] Pillow version: 9.0.1
[2022-10-13T00:27:55.501Z] Tensorboard version: 2.10.0
[2022-10-13T00:27:55.501Z] gdown version: 4.5.1
[2022-10-13T00:27:55.501Z] TorchVision version: 0.14.0a0
[2022-10-13T00:27:55.501Z] tqdm version: 4.64.1
[2022-10-13T00:27:55.501Z] lmdb version: 1.3.0
[2022-10-13T00:27:55.501Z] psutil version: 5.9.2
[2022-10-13T00:27:55.501Z] pandas version: 1.4.4
[2022-10-13T00:27:55.501Z] einops version: 0.5.0
[2022-10-13T00:27:55.501Z] transformers version: 4.21.3
[2022-10-13T00:27:55.501Z] mlflow version: 1.29.0
[2022-10-13T00:27:55.501Z] pynrrd version: 1.0.0
[2022-10-13T00:27:55.501Z] 
[2022-10-13T00:27:55.501Z] For details about installing the optional dependencies, please visit:
[2022-10-13T00:27:55.501Z]     https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
[2022-10-13T00:27:55.501Z] 
[2022-10-13T00:27:56.870Z] /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
[2022-10-13T00:27:56.870Z]   warnings.warn(
[2022-10-13T00:29:09.859Z] 
Executing:   0%|          | 0/39 [00:00<?, ?cell/s]
Executing:   3%|▎         | 1/39 [00:01<00:49,  1.30s/cell]
Executing:  10%|█         | 4/39 [00:06<00:59,  1.69s/cell]
Executing:  15%|█▌        | 6/39 [00:38<04:17,  7.81s/cell]
Executing:  18%|█▊        | 7/39 [00:55<05:22, 10.07s/cell]
Executing:  21%|██        | 8/39 [00:59<04:23,  8.51s/cell]
Executing:  44%|████▎     | 17/39 [01:04<00:54,  2.47s/cell]
Executing:  62%|██████▏   | 24/39 [01:11<00:26,  1.76s/cell]
Executing:  62%|██████▏   | 24/39 [01:13<00:45,  3.05s/cell]
[2022-10-13T00:29:09.859Z] /opt/conda/lib/python3.8/site-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
[2022-10-13T00:29:09.859Z]   warnings.warn(
[2022-10-13T00:29:09.859Z] Traceback (most recent call last):
[2022-10-13T00:29:09.859Z]   File "/opt/conda/bin/papermill", line 8, in <module>
[2022-10-13T00:29:09.859Z]     sys.exit(papermill())
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[2022-10-13T00:29:09.859Z]     return self.main(*args, **kwargs)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1055, in main
[2022-10-13T00:29:09.859Z]     rv = self.invoke(ctx)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[2022-10-13T00:29:09.859Z]     return ctx.invoke(self.callback, **ctx.params)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[2022-10-13T00:29:09.859Z]     return __callback(*args, **kwargs)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
[2022-10-13T00:29:09.859Z]     return f(get_current_context(), *args, **kwargs)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/papermill/cli.py", line 250, in papermill
[2022-10-13T00:29:09.859Z]     execute_notebook(
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 128, in execute_notebook
[2022-10-13T00:29:09.859Z]     raise_for_execution_errors(nb, output_path)
[2022-10-13T00:29:09.859Z]   File "/opt/conda/lib/python3.8/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
[2022-10-13T00:29:09.859Z]     raise error
[2022-10-13T00:29:09.859Z] papermill.exceptions.PapermillExecutionError: 
[2022-10-13T00:29:09.859Z] ---------------------------------------------------------------------------
[2022-10-13T00:29:09.859Z] Exception encountered at "In [11]":
[2022-10-13T00:29:09.859Z] ---------------------------------------------------------------------------
[2022-10-13T00:29:09.859Z] RuntimeError                              Traceback (most recent call last)
[2022-10-13T00:29:09.859Z] Cell In [11], line 3
[2022-10-13T00:29:09.859Z]       1 start = datetime.now()
[2022-10-13T00:29:09.859Z]       2 print('Training started at', start)
[2022-10-13T00:29:09.859Z] ----> 3 trainer.fit(model=model, datamodule=data)
[2022-10-13T00:29:09.859Z]       4 print('Training duration:', datetime.now() - start)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:740, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, train_dataloader, ckpt_path)
[2022-10-13T00:29:09.859Z]     735     rank_zero_deprecation(
[2022-10-13T00:29:09.859Z]     736         "`trainer.fit(train_dataloader)` is deprecated in v1.4 and will be removed in v1.6."
[2022-10-13T00:29:09.859Z]     737         " Use `trainer.fit(train_dataloaders)` instead. HINT: added 's'"
[2022-10-13T00:29:09.859Z]     738     )
[2022-10-13T00:29:09.859Z]     739     train_dataloaders = train_dataloader
[2022-10-13T00:29:09.859Z] --> 740 self._call_and_handle_interrupt(
[2022-10-13T00:29:09.859Z]     741     self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
[2022-10-13T00:29:09.859Z]     742 )
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:685, in Trainer._call_and_handle_interrupt(self, trainer_fn, *args, **kwargs)
[2022-10-13T00:29:09.859Z]     675 r"""
[2022-10-13T00:29:09.859Z]     676 Error handling, intended to be used only for main trainer function entry points (fit, validate, test, predict)
[2022-10-13T00:29:09.859Z]     677 as all errors should funnel through them
[2022-10-13T00:29:09.859Z]    (...)
[2022-10-13T00:29:09.859Z]     682     **kwargs: keyword arguments to be passed to `trainer_fn`
[2022-10-13T00:29:09.859Z]     683 """
[2022-10-13T00:29:09.859Z]     684 try:
[2022-10-13T00:29:09.859Z] --> 685     return trainer_fn(*args, **kwargs)
[2022-10-13T00:29:09.859Z]     686 # TODO: treat KeyboardInterrupt as BaseException (delete the code below) in v1.7
[2022-10-13T00:29:09.859Z]     687 except KeyboardInterrupt as exception:
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:777, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
[2022-10-13T00:29:09.859Z]     775 # TODO: ckpt_path only in v1.7
[2022-10-13T00:29:09.859Z]     776 ckpt_path = ckpt_path or self.resume_from_checkpoint
[2022-10-13T00:29:09.859Z] --> 777 self._run(model, ckpt_path=ckpt_path)
[2022-10-13T00:29:09.859Z]     779 assert self.state.stopped
[2022-10-13T00:29:09.859Z]     780 self.training = False
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1199, in Trainer._run(self, model, ckpt_path)
[2022-10-13T00:29:09.859Z]    1196 self.checkpoint_connector.resume_end()
[2022-10-13T00:29:09.859Z]    1198 # dispatch `start_training` or `start_evaluating` or `start_predicting`
[2022-10-13T00:29:09.859Z] -> 1199 self._dispatch()
[2022-10-13T00:29:09.859Z]    1201 # plugin will finalized fitting (e.g. ddp_spawn will load trained model)
[2022-10-13T00:29:09.859Z]    1202 self._post_dispatch()
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1279, in Trainer._dispatch(self)
[2022-10-13T00:29:09.859Z]    1277     self.training_type_plugin.start_predicting(self)
[2022-10-13T00:29:09.859Z]    1278 else:
[2022-10-13T00:29:09.859Z] -> 1279     self.training_type_plugin.start_training(self)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:202, in TrainingTypePlugin.start_training(self, trainer)
[2022-10-13T00:29:09.859Z]     200 def start_training(self, trainer: "pl.Trainer") -> None:
[2022-10-13T00:29:09.859Z]     201     # double dispatch to initiate the training loop
[2022-10-13T00:29:09.859Z] --> 202     self._results = trainer.run_stage()
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1289, in Trainer.run_stage(self)
[2022-10-13T00:29:09.859Z]    1287 if self.predicting:
[2022-10-13T00:29:09.859Z]    1288     return self._run_predict()
[2022-10-13T00:29:09.859Z] -> 1289 return self._run_train()
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1311, in Trainer._run_train(self)
[2022-10-13T00:29:09.859Z]    1308 if not self.is_global_zero and self.progress_bar_callback is not None:
[2022-10-13T00:29:09.859Z]    1309     self.progress_bar_callback.disable()
[2022-10-13T00:29:09.859Z] -> 1311 self._run_sanity_check(self.lightning_module)
[2022-10-13T00:29:09.859Z]    1313 # enable train mode
[2022-10-13T00:29:09.859Z]    1314 self.model.train()
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1375, in Trainer._run_sanity_check(self, ref_model)
[2022-10-13T00:29:09.859Z]    1373 # run eval step
[2022-10-13T00:29:09.859Z]    1374 with torch.no_grad():
[2022-10-13T00:29:09.859Z] -> 1375     self._evaluation_loop.run()
[2022-10-13T00:29:09.859Z]    1377 self.call_hook("on_sanity_check_end")
[2022-10-13T00:29:09.859Z]    1379 # reset logger connector
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:145, in Loop.run(self, *args, **kwargs)
[2022-10-13T00:29:09.859Z]     143 try:
[2022-10-13T00:29:09.859Z]     144     self.on_advance_start(*args, **kwargs)
[2022-10-13T00:29:09.859Z] --> 145     self.advance(*args, **kwargs)
[2022-10-13T00:29:09.859Z]     146     self.on_advance_end()
[2022-10-13T00:29:09.859Z]     147     self.restarting = False
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py:110, in EvaluationLoop.advance(self, *args, **kwargs)
[2022-10-13T00:29:09.859Z]     105 self.data_fetcher = dataloader = self.trainer._data_connector.get_profiled_dataloader(
[2022-10-13T00:29:09.859Z]     106     dataloader, dataloader_idx=dataloader_idx
[2022-10-13T00:29:09.859Z]     107 )
[2022-10-13T00:29:09.859Z]     108 dl_max_batches = self._max_batches[dataloader_idx]
[2022-10-13T00:29:09.859Z] --> 110 dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
[2022-10-13T00:29:09.859Z]     112 # store batch level output per dataloader
[2022-10-13T00:29:09.859Z]     113 self.outputs.append(dl_outputs)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/base.py:145, in Loop.run(self, *args, **kwargs)
[2022-10-13T00:29:09.859Z]     143 try:
[2022-10-13T00:29:09.859Z]     144     self.on_advance_start(*args, **kwargs)
[2022-10-13T00:29:09.859Z] --> 145     self.advance(*args, **kwargs)
[2022-10-13T00:29:09.859Z]     146     self.on_advance_end()
[2022-10-13T00:29:09.859Z]     147     self.restarting = False
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py:122, in EvaluationEpochLoop.advance(self, data_fetcher, dataloader_idx, dl_max_batches, num_dataloaders)
[2022-10-13T00:29:09.859Z]     120 # lightning module methods
[2022-10-13T00:29:09.859Z]     121 with self.trainer.profiler.profile("evaluation_step_and_end"):
[2022-10-13T00:29:09.859Z] --> 122     output = self._evaluation_step(batch, batch_idx, dataloader_idx)
[2022-10-13T00:29:09.859Z]     123     output = self._evaluation_step_end(output)
[2022-10-13T00:29:09.859Z]     125 self.batch_progress.increment_processed()
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py:217, in EvaluationEpochLoop._evaluation_step(self, batch, batch_idx, dataloader_idx)
[2022-10-13T00:29:09.859Z]     215     self.trainer.lightning_module._current_fx_name = "validation_step"
[2022-10-13T00:29:09.859Z]     216     with self.trainer.profiler.profile("validation_step"):
[2022-10-13T00:29:09.859Z] --> 217         output = self.trainer.accelerator.validation_step(step_kwargs)
[2022-10-13T00:29:09.859Z]     219 return output
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:239, in Accelerator.validation_step(self, step_kwargs)
[2022-10-13T00:29:09.859Z]     234 """The actual validation step.
[2022-10-13T00:29:09.859Z]     235 
[2022-10-13T00:29:09.859Z]     236 See :meth:`~pytorch_lightning.core.lightning.LightningModule.validation_step` for more details
[2022-10-13T00:29:09.859Z]     237 """
[2022-10-13T00:29:09.859Z]     238 with self.precision_plugin.val_step_context():
[2022-10-13T00:29:09.859Z] --> 239     return self.training_type_plugin.validation_step(*step_kwargs.values())
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py:219, in TrainingTypePlugin.validation_step(self, *args, **kwargs)
[2022-10-13T00:29:09.859Z]     218 def validation_step(self, *args, **kwargs):
[2022-10-13T00:29:09.859Z] --> 219     return self.model.validation_step(*args, **kwargs)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] Cell In [9], line 29, in Model.validation_step(self, batch, batch_idx)
[2022-10-13T00:29:09.859Z]      27 def validation_step(self, batch, batch_idx):
[2022-10-13T00:29:09.859Z]      28     y_hat, y = self.infer_batch(batch)
[2022-10-13T00:29:09.859Z] ---> 29     loss = self.criterion(y_hat, y)
[2022-10-13T00:29:09.859Z]      30     self.log('val_loss', loss)
[2022-10-13T00:29:09.859Z]      31     return loss
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
[2022-10-13T00:29:09.859Z]    1098 # If we don't have any hooks, we want to skip the rest of the logic in
[2022-10-13T00:29:09.859Z]    1099 # this function, and just call forward.
[2022-10-13T00:29:09.859Z]    1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
[2022-10-13T00:29:09.859Z]    1101         or _global_forward_hooks or _global_forward_pre_hooks):
[2022-10-13T00:29:09.859Z] -> 1102     return forward_call(*input, **kwargs)
[2022-10-13T00:29:09.859Z]    1103 # Do not call functions when jit is used
[2022-10-13T00:29:09.859Z]    1104 full_backward_hooks, non_full_backward_hooks = [], []
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/losses/dice.py:732, in DiceCELoss.forward(self, input, target)
[2022-10-13T00:29:09.859Z]     729     raise ValueError("the number of dimensions for input and target should be the same.")
[2022-10-13T00:29:09.859Z]     731 dice_loss = self.dice(input, target)
[2022-10-13T00:29:09.859Z] --> 732 ce_loss = self.ce(input, target)
[2022-10-13T00:29:09.859Z]     733 total_loss: torch.Tensor = self.lambda_dice * dice_loss + self.lambda_ce * ce_loss
[2022-10-13T00:29:09.859Z]     735 return total_loss
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/losses/dice.py:715, in DiceCELoss.ce(self, input, target)
[2022-10-13T00:29:09.859Z]     709     warnings.warn(
[2022-10-13T00:29:09.859Z]     710         f"Multichannel targets are not supported in this older Pytorch version {torch.__version__}. "
[2022-10-13T00:29:09.859Z]     711         "Using argmax (as a workaround) to convert target to a single channel."
[2022-10-13T00:29:09.859Z]     712     )
[2022-10-13T00:29:09.859Z]     713     target = torch.argmax(target, dim=1)
[2022-10-13T00:29:09.859Z] --> 715 return self.cross_entropy(input, target)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
[2022-10-13T00:29:09.859Z]    1098 # If we don't have any hooks, we want to skip the rest of the logic in
[2022-10-13T00:29:09.859Z]    1099 # this function, and just call forward.
[2022-10-13T00:29:09.859Z]    1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
[2022-10-13T00:29:09.859Z]    1101         or _global_forward_hooks or _global_forward_pre_hooks):
[2022-10-13T00:29:09.859Z] -> 1102     return forward_call(*input, **kwargs)
[2022-10-13T00:29:09.859Z]    1103 # Do not call functions when jit is used
[2022-10-13T00:29:09.859Z]    1104 full_backward_hooks, non_full_backward_hooks = [], []
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/loss.py:1150, in CrossEntropyLoss.forward(self, input, target)
[2022-10-13T00:29:09.859Z]    1149 def forward(self, input: Tensor, target: Tensor) -> Tensor:
[2022-10-13T00:29:09.859Z] -> 1150     return F.cross_entropy(input, target, weight=self.weight,
[2022-10-13T00:29:09.859Z]    1151                            ignore_index=self.ignore_index, reduction=self.reduction,
[2022-10-13T00:29:09.859Z]    1152                            label_smoothing=self.label_smoothing)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] File /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2846, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
[2022-10-13T00:29:09.859Z]    2844 if size_average is not None or reduce is not None:
[2022-10-13T00:29:09.859Z]    2845     reduction = _Reduction.legacy_get_string(size_average, reduce)
[2022-10-13T00:29:09.859Z] -> 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
[2022-10-13T00:29:09.859Z] 
[2022-10-13T00:29:09.859Z] RuntimeError: Expected floating point type for target with class probabilities, got Byte
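
The traceback bottoms out in `F.cross_entropy` rejecting the target dtype: when the target has the same number of dimensions as the input, PyTorch treats it as class probabilities and requires a floating-point tensor, but the notebook's one-hot target arrives as Byte (uint8). A minimal sketch (dummy shapes, not the notebook's actual tensors) that reproduces the error and shows the likely fix of casting the target to float before the loss call:

```python
# Hypothetical repro of the failure mode, not the notebook's exact code.
import torch
import torch.nn.functional as F

logits = torch.randn(2, 3, 4, 4)  # (batch, classes, H, W)

# One-hot target stored as uint8 (Byte) with the same shape as the logits,
# so cross_entropy interprets it as class probabilities.
one_hot = torch.zeros(2, 3, 4, 4, dtype=torch.uint8)
one_hot[:, 0] = 1

try:
    F.cross_entropy(logits, one_hot)
except RuntimeError as e:
    # RuntimeError: Expected floating point type for target with class
    # probabilities, got Byte
    print("raised:", e)

# Casting the target to a floating-point dtype avoids the error:
loss = F.cross_entropy(logits, one_hot.float())
print("loss:", loss.item())
```

In the notebook this would mean ensuring the label tensor handed to `DiceCELoss` (e.g. inside `infer_batch`) is float rather than Byte, since `DiceCELoss.ce` forwards the target to `F.cross_entropy` unchanged on recent PyTorch versions.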
