This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Add initial support for generic inference #189

Merged: lukeyeager merged 5 commits into master from generic-inference on Aug 7, 2015
Conversation

@lukeyeager (Member)

Adds a new type of task to DIGITS - Generic Inference.

Solves #97, #117, #177

DIGITS previously only supported "Image Classification," and made assumptions about the types of networks being used, the format of input data, and the way the output of the model should be interpreted.

The new task is more generalized, so you can do other things like object detection or per-pixel segmentation. The network can have one or more n-dimensional blobs, which DIGITS does not try to interpret in any way. I wrestled with a bunch of different names for this - Regression, General-Purpose Networks, Multi-blob Output Models, Dense Prediction, Other Networks, etc. None was quite accurate and simple enough, so I'm going with Generic Inference.

Remaining limitations

This is much more generic, but DIGITS still puts some restrictions on what you can do with your network:

  1. Only image data is supported
  2. The only supported database format is still LMDB
    • So the only supported data layer type is still Data
  3. Datasets consist of 1 or 2 LMDBs per phase (no more)

TODO before merging

  • Add tests

TODO after merging

  • Create a standard input data format for creating datasets
    • Currently, you have to create the LMDBs yourself [!]

lukeyeager added a commit to lukeyeager/DIGITS that referenced this pull request Aug 4, 2015
This gets rid of the "image_type" notion introduced in 4e48d71 that
required test_all functions with a bunch of yields.

Instead, I'm taking advantage of the fact that nose will run tests only
in classes that start with "Test". This lets me add a whole new set of
tests simply by adding a new class that defines a different value for
IMAGE_CHANNELS or CROP_SIZE, etc.

This will be helpful when creating tests for NVIDIA#189.
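The pattern this commit describes can be sketched in plain Python (the class and attribute names below are illustrative, not the actual DIGITS test classes): nose only collects classes whose names start with "Test", so each variant is just a subclass that overrides a class attribute.

```python
class BaseImageTestMixin(object):
    """Shared test logic. Its name doesn't start with 'Test',
    so nose skips it and only runs the subclasses below."""
    IMAGE_CHANNELS = 3
    CROP_SIZE = None

    def test_channels_are_valid(self):
        assert self.IMAGE_CHANNELS in (1, 3)


class TestColorImages(BaseImageTestMixin):
    pass  # collected by nose: runs with the default IMAGE_CHANNELS = 3


class TestGrayscaleCroppedImages(BaseImageTestMixin):
    IMAGE_CHANNELS = 1
    CROP_SIZE = 16
```

Adding a whole new set of tests is then just one more subclass with different attribute values.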
@lukeyeager force-pushed the generic-inference branch 2 times, most recently from 0a5e505 to 4cbc17a on August 6, 2015 at 00:38
@lukeyeager (Member Author)

Added tests in 4cbc17a. Can somebody test this and give me some feedback before merging?

Since DIGITS can't create the LMDBs for you yet, you can create a test dataset with the script at digits/dataset/images/generic/test_lmdb_creator.py and you can use the network here in digits/model/images/generic/test_views.py.

@gheinrich (Contributor)

@lukeyeager this looks very promising!

I had to make minor tweaks to digits/dataset/images/generic/test_lmdb_creator.py to get it to run in standalone mode (see).

In order to create the dataset, is it possible to have the user specify [train|val|test].txt files in the form of:
/path/to/file [y1,...,yn]
It would be nice if DIGITS could create the image and label databases from these files (in theory that would allow the user to use non-image files too).
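The proposed file format would be straightforward to parse. A minimal sketch, assuming one whitespace-separated file path followed by comma-separated numeric labels per line (this format is only a proposal at this point, and `parse_label_file` is a hypothetical helper, not DIGITS code):

```python
def parse_label_file(lines):
    """Parse lines of the form '/path/to/file y1,...,yn' into
    (path, [float labels]) pairs. Blank lines are skipped."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first run of whitespace: path, then label list
        path, labels = line.split(None, 1)
        records.append((path, [float(y) for y in labels.split(',')]))
    return records
```

The resulting (path, labels) pairs would then be written into the image and label LMDBs respectively.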

In the model creation page, can we have the user choose which loss function they want to use?

The 'infer many images' button didn't work for me (it just prints the file names in the .txt file).

It would be nice to have path completion working for all the fields where we expect the user to provide a server path. Perhaps we could create a custom wtforms.PathField that would automatically set everything up to enable autocomplete.
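The server-side half of such a PathField would just need a completion lookup for partial paths. A minimal standard-library sketch (the function name `complete_path` is hypothetical, not part of DIGITS or wtforms):

```python
import glob
import os


def complete_path(partial):
    """Return directory entries matching a partial server path,
    e.g. '/usr/lo' -> ['/usr/local/']. Directories get a trailing
    separator so the client can keep drilling down."""
    matches = []
    for entry in sorted(glob.glob(partial + '*')):
        if os.path.isdir(entry):
            entry += os.sep
        matches.append(entry)
    return matches
```

An autocomplete widget on the form field could then query an endpoint backed by this lookup and render the returned candidates.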

@lukeyeager (Member Author)

The 'infer many images' button didn't work for me (it just prints the file names in the .txt file).

Oh right, thanks. Fixed that.

I had to make minor tweaks to digits/dataset/images/generic/test_lmdb_creator.py to get it to run in standalone mode (see).

Looks good, thanks.

In order to create the dataset, is it possible to have the user specify [train|val|test].txt files in the form of:
/path/to/file [y1,...,yn]
It would be nice if DIGITS could create the image and label databases from these files (in theory that would allow the user to use non-image files too).

I think this can be a discussion that comes after merging this PR (see "TODO after merging" in my OP). I've branched this out into a new issue for discussion - #197.

In the model creation page, can we have the user choose which loss function they want to use?

What do you mean? They have to choose their loss function manually in their custom prototxt at the moment, since there aren't any standard networks.

It would be nice to have path completion working for all the fields where we expect the user to provide a server path. Perhaps we could create a custom wtforms.PathField that would automatically set everything up to enable autocomplete.

Great idea! I was hoping you would come along and add your autocomplete stuff to these fields as well. Doing it with a custom field sounds like a good idea to me.

@y22ma commented Aug 6, 2015

Not sure if I'm doing this properly at all, but I tried to use AlexNet as the deploy.prototxt and train it on the test data generated by test_lmdb_creator.py:

./test_lmdb_creator.py -x 256 -y 256 -c 5000 ~/dataset/test

I'm getting the following error:

2015-08-06 15:33:16 [20150806-153315-0597] [ERROR] TypeError: Parameter to MergeFrom() must be instance of same class: expected LayerParameter got NoneType.

Just wondering if I'm doing something obviously wrong?

@y22ma commented Aug 6, 2015

I just realized that AlexNet expects 3-channel input, so the test images I generate are probably not going to work?

@lukeyeager (Member Author)

@y22ma, thanks for the help reviewing this!

I just realized that AlexNet expects 3-channel input, so the test images I generate are probably not going to work?

No, you should be fine. The network is pretty flexible.

TypeError: Parameter to MergeFrom() must be instance of same class: expected LayerParameter got NoneType.

I ran into that error once, but I fixed it here. Apparently the issue has resurfaced somewhere else. I'm looking into this now ...

You can try running digits in debug mode to see if you get any information:

./digits-devserver --debug

@gheinrich (Contributor)

In the model creation page, can we have the user choose which loss function they want to use?

What do you mean? They have to choose their loss function manually in their custom prototxt at the moment, since there aren't any standard networks.

I overlooked the part where the loss function is specified. This looks OK, sorry.

The "infer many" menu is working on your latest commit, thanks. A possible enhancement (I suppose in the context of #197) would be to show the ground truth when specified in the text file.

@y22ma commented Aug 6, 2015

@lukeyeager no problem, really appreciate this functionality coming together.

I think I made a mistake in my previous test by creating only 50 images instead of the intended 5000. I suspect that with 50 images, the default batch size is too large. Now I'm running into another issue:

2015-08-06 16:38:15 [20150806-163811-619a] [ERROR] Train Caffe Model: Check failed: outer_num_ * inner_num_ == bottom[1]->count() (16 vs. 32) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

Likely due to test_lmdb_creator.py creating fewer labels than the output of the softmax layer...
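The numbers in that check fall directly out of the shapes involved: with a batch of 16 and two values stored per label, SoftmaxWithLoss sees 16 positions to classify but 32 label entries. A plain-Python restatement of the check (not Caffe code, just the arithmetic):

```python
def softmax_label_count_ok(pred_shape, label_count):
    """Mimic SoftmaxWithLoss's label-count check for softmax axis == 1:
    predictions of shape (N, C, H, W) need exactly N*H*W labels
    (one integer class index per spatial position per image)."""
    n, c, h, w = pred_shape
    expected = n * h * w  # outer_num_ * inner_num_
    return label_count == expected, expected


# fc8 outputs shape (16, 2, 1, 1); the label LMDB holds 2 values
# per image, so Caffe sees 32 labels where it expects 16.
ok, expected = softmax_label_count_ok((16, 2, 1, 1), 16 * 2)
```

This is also why switching to a regression loss like EuclideanLoss, which compares the two blobs element-wise instead of treating the labels as class indices, avoids the check entirely.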

@lukeyeager (Member Author)

I got AlexNet to work on that exact same set of images. But I had to adjust it to fit the specific problem at hand. Here's what I had to change:

  1. Remove the "label" top from the data layers
    • Because datum.label is not set
  2. Remove the batch size from the data layers
    • Because DIGITS sets a default batch of size 16 for the other data layers
  3. Set the inner_product_param.num_output to 2 for the "fc8" layer
    • Because we want the network to output 2 numbers
  4. Change the loss layer to EuclideanLoss
    • Because SoftmaxWithLoss doesn't make sense for a regression problem
  5. Set the name of the loss layer to "train_loss"
    • So it wouldn't be included in the deploy prototxt
  6. Remove the Accuracy layer
    • Because it doesn't make sense for a regression problem
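Put together, the affected layers end up looking roughly like this (a sketch of just the changed pieces, not a full working network; the exact wiring of the "label" blob depends on how the second LMDB is attached):

```protobuf
layer {
  name: "train-data"
  type: "Data"
  top: "data"               # no "label" top (datum.label is not set)
  include { phase: TRAIN }  # no batch_size: DIGITS supplies a default
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param { num_output: 2 }  # two regression outputs
}
layer {
  name: "train_loss"        # "train_" keeps it out of deploy.prototxt
  type: "EuclideanLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
# no Accuracy layer: it doesn't make sense for regression
```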

Don't require people to explicitly type train_ in the layer name for
these layers
More reliable than reading information from the Job or Task
@lukeyeager (Member Author)

e3354b4 makes steps (5) and (6) above unnecessary.

f3cee35 fixes an issue where DIGITS would crash during inference if the crop size was set.

@y22ma commented Aug 7, 2015

@lukeyeager unfortunately I can't reproduce your results after following your instructions. Here's my train_val.prototxt:

layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "/home/yanma/dataset/test/train_mean.binaryproto"
  }
  data_param {
    source: "/home/yanma/dataset/test/train_db"
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8"
  top: "loss"
}

And here's the error:

Setting up relu7
Top shape: 16 4096 (65536)
Creating layer drop7
Creating Layer drop7
drop7 <- fc7
drop7 -> fc7 (in-place)
Setting up drop7
Top shape: 16 4096 (65536)
Creating layer fc8
Creating Layer fc8
fc8 <- fc7
fc8 -> fc8
Setting up fc8
Top shape: 16 2 (32)
Creating layer loss
Creating Layer loss
loss <- fc8
loss -> loss
Setting up loss
Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EuclideanLoss Layer takes 2 bottom blob(s) as input.

It seems that the output configuration on fc8 is not taking effect? I'm on commit f3cee35. Note that I'm using the NVIDIA fork of Caffe.

@lukeyeager (Member Author)

Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EuclideanLoss Layer takes 2 bottom blob(s) as input.

Oh, you need to give the EuclideanLoss layer two bottoms, like this:

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

@lukeyeager (Member Author)

And you should remove the transform_param.mean_file and data_param.source values from your prototxt so that DIGITS can overwrite them for you with the correct values from your DIGITS dataset.

@lukeyeager (Member Author)

I'm going to go ahead and merge this PR. For any bugs or requests related to this new set of features, please create new issues or ask for help on the mailing list.

Thanks for the review help, @gheinrich and @y22ma!

lukeyeager added a commit that referenced this pull request Aug 7, 2015
Add initial support for generic inference
@lukeyeager lukeyeager merged commit f42f473 into master Aug 7, 2015
@lukeyeager lukeyeager deleted the generic-inference branch August 7, 2015 17:15
@y22ma commented Aug 7, 2015

All good, and thanks for all the tips to get it working. It would be awesome to see example use cases of this feature documented on the wiki page; it would definitely help a lot of people out.

@tmquan commented Aug 17, 2015

Hi, @lukeyeager , I looked through this issue #177 and have concerns about the pixel classifier or segmentation task.

We can construct the LMDB with a given folder structure like this:
images/
├── image1.png
├── image1.npy
├── image2.jpg
└── image2.npy

But do we really need to write them out as files?
For example, suppose I have a big image that needs to be segmented by a membrane feature, as in ISBI 2012.
In Dan Ciresan's method, a patch or window is extracted around each pixel to train the probability of the center pixel, across the whole image.
In DIGITS, writing all of those patches to files seems like a bad idea because of the storage required and the cost of reading the training data.

Is there any way to construct the LMDB so that it holds a portion of memory as a training instance? I mean that the training stack and training labels are already in memory; while training the model, a patch and its corresponding labels would be extracted on the fly.

@lukeyeager (Member Author)

Until we decide on a solution to #197, you'll have to create your LMDB for this task manually anyway. The .png, .npy format you described is just a proposal - it doesn't work yet.

Is there any way to construct the LMDB which holds a portion of memory as an training instance?

Not with LMDB, no. You might be able to do something like that with a MemoryData layer:
http://caffe.berkeleyvision.org/tutorial/layers.html#in-memory
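For reference, a MemoryData layer is declared roughly like this (a sketch based on the Caffe layer catalogue linked above; the data itself is pushed in from Python or C++ at runtime rather than read from a database):

```protobuf
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 16
    channels: 3
    height: 227
    width: 227
  }
}
```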

If you get that working and you'd like to use it in DIGITS, please open a separate issue with your request.

@y22ma commented Aug 18, 2015

@lukeyeager Do you happen to have any reference on creating an LMDB for object detection (e.g. R-CNN)? I'd appreciate any tips you have.

@lukeyeager (Member Author)

I do not, sorry. Your best bet would probably be the Caffe mailing list.
