How to shuffle HDF5 input data to prevent training failure #1249

I am currently training a multi-label regression model with the HDF5 data layer. However, I notice that my training loss goes up and down periodically (green curve). I merged two datasets to train the model, so my train_list.txt looks something like this:

```
datasetA_1.h5
datasetA_2.h5
datasetA_3.h5
...
datasetB_1.h5
datasetB_2.h5
datasetB_3.h5
```

![image](https://cloud.githubusercontent.com/assets/289501/4577557/86a308e4-4fbd-11e4-9a9b-df57ba75ac6b.png)

I find that the training loss rises to about 0.5 while training on the dataset A h5 files and drops to about 0.2 on dataset B. After all h5 files have been used, training goes back to the first h5 file. This is why the loss looks like a "square wave", even though I shuffled the data within each h5 file.

I think a **shuffle** across files might solve this problem. Unfortunately, there's no SHUFFLE support in the HDF5 data layer (unlike the leveldb layer).

Is there an alternative way to solve this problem?
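
One workaround I can think of is to shuffle the order of the entries in train_list.txt offline, so that dataset A and dataset B files are interleaved instead of grouped. Below is a minimal sketch, assuming the HDF5 data layer reads the files in the order they are listed. The helper names `shuffle_file_list` and `rewrite_list` and the output file name are hypothetical, not part of Caffe.

```python
import random


def shuffle_file_list(lines, seed=None):
    """Return the non-empty entries of an HDF5 source list in random order."""
    # Strip whitespace and drop blank lines so the output list stays clean.
    entries = [line.strip() for line in lines if line.strip()]
    # A private Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(entries)
    return entries


def rewrite_list(src_path, dst_path, seed=None):
    """Read a list file, shuffle its entries, and write them to dst_path."""
    with open(src_path) as f:
        entries = shuffle_file_list(f, seed)
    with open(dst_path, "w") as f:
        f.write("\n".join(entries) + "\n")
    return entries


if __name__ == "__main__":
    # Hypothetical usage: point the HDF5 layer at the shuffled list afterwards.
    rewrite_list("train_list.txt", "train_list_shuffled.txt", seed=0)
```

This only mixes at file granularity: each h5 file still holds samples from a single dataset, so the loss may still fluctuate from file to file. Repacking the h5 files so that each one contains samples from both datasets would probably be a stronger fix.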

