This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Add initial support for generic inference #189

Merged: lukeyeager merged 5 commits into master from generic-inference on Aug 7, 2015
Conversation

@lukeyeager (Member)

Adds a new type of task to DIGITS - Generic Inference.

Solves #97, #117, #177

DIGITS previously only supported "Image Classification," and made assumptions about the types of networks being used, the format of input data, and the way the output of the model should be interpreted.

The new task is more generalized, so you can do other things like object detection or per-pixel segmentation. The network can have one or more n-dimensional blobs, which DIGITS does not try to interpret in any way. I wrestled with a bunch of different names for this - Regression, General-Purpose Networks, Multi-blob Output Models, Dense Prediction, Other Networks, etc. None was quite accurate and simple enough, so I'm going with Generic Inference.

Remaining limitations

This is much more generic, but DIGITS still puts some restrictions on what you can do with your network:

  1. Only image data is supported
  2. The only supported database format is still LMDB
    • So the only supported data layer type is still Data
  3. Datasets consist of 1 or 2 LMDBs per phase (no more)

TODO before merging

  • Add tests

TODO after merging

  • Create a standard input data format for creating datasets
    • Currently, you have to create the LMDBs yourself [!]

lukeyeager added a commit to lukeyeager/DIGITS that referenced this pull request Aug 4, 2015
This gets rid of the "image_type" notion introduced in 4e48d71 that
required test_all functions with a bunch of yields.

Instead, I'm taking advantage of the fact that nose will run tests only
in classes that start with "Test". This lets me add a whole new set of
tests simply by adding a new class that defines a different value for
IMAGE_CHANNELS or CROP_SIZE, etc.

This will be helpful when creating tests for NVIDIA#189.
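The pattern this commit describes can be sketched in plain Python (the class and attribute names below are illustrative, not the actual DIGITS test classes): nose only collects classes whose names start with "Test", so each variant is just a subclass that overrides a class attribute.

```python
class BaseImageTestMixin(object):
    """Shared test logic. Its name doesn't start with 'Test',
    so nose skips it and only runs the subclasses below."""
    IMAGE_CHANNELS = 3
    CROP_SIZE = None

    def test_channels_are_valid(self):
        assert self.IMAGE_CHANNELS in (1, 3)


class TestColorImages(BaseImageTestMixin):
    pass  # collected by nose: runs with the default IMAGE_CHANNELS = 3


class TestGrayscaleCroppedImages(BaseImageTestMixin):
    IMAGE_CHANNELS = 1
    CROP_SIZE = 16
```

Adding a whole new set of tests is then just one more subclass with different attribute values.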
@lukeyeager force-pushed the generic-inference branch 2 times, most recently from 0a5e505 to 4cbc17a on August 6, 2015 at 00:38
@lukeyeager (Member Author)

Added tests in 4cbc17a. Can somebody test this and give me some feedback before merging?

Since DIGITS can't create the LMDBs for you yet, you can create a test dataset with the script at digits/dataset/images/generic/test_lmdb_creator.py and you can use the network here in digits/model/images/generic/test_views.py.

@gheinrich (Contributor)

@lukeyeager this looks very promising!

I had to make minor tweaks to digits/dataset/images/generic/test_lmdb_creator.py to get it to run in standalone mode (see).

In order to create the dataset, is it possible to have the user specify [train|val|test].txt files in the form of:
/path/to/file [y1,...,yn]
It would be nice if DIGITS could create the image and label databases from these files (in theory that would allow the user to use non-image files too).
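The proposed file format would be straightforward to parse. A minimal sketch, assuming one whitespace-separated file path followed by comma-separated numeric labels per line (this format is only a proposal at this point, and `parse_label_file` is a hypothetical helper, not DIGITS code):

```python
def parse_label_file(lines):
    """Parse lines of the form '/path/to/file y1,...,yn' into
    (path, [float labels]) pairs. Blank lines are skipped."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first run of whitespace: path, then label list
        path, labels = line.split(None, 1)
        records.append((path, [float(y) for y in labels.split(',')]))
    return records
```

The resulting (path, labels) pairs would then be written into the image and label LMDBs respectively.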

In the model creation page, can we have the user choose which loss function they want to use?

The 'infer many images' button didn't work for me (it just prints the file names in the .txt file).

It would be nice to have path completion working for all the fields where we expect the user to provide a server path. Perhaps we could create a custom wtforms.PathField that would automatically set everything up to enable autocomplete.
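The server-side half of such a PathField would just need a completion lookup for partial paths. A minimal standard-library sketch (the function name `complete_path` is hypothetical, not part of DIGITS or wtforms):

```python
import glob
import os


def complete_path(partial):
    """Return directory entries matching a partial server path,
    e.g. '/usr/lo' -> ['/usr/local/']. Directories get a trailing
    separator so the client can keep drilling down."""
    matches = []
    for entry in sorted(glob.glob(partial + '*')):
        if os.path.isdir(entry):
            entry += os.sep
        matches.append(entry)
    return matches
```

An autocomplete widget on the form field could then query an endpoint backed by this lookup and render the returned candidates.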

@lukeyeager (Member Author)

The 'infer many images' button didn't work for me (it just prints the file names in the .txt file).

Oh right, thanks. Fixed that.

I had to make minor tweaks to digits/dataset/images/generic/test_lmdb_creator.py to get it to run in standalone mode (see).

Looks good, thanks.

In order to create the dataset, is it possible to have the user specify [train|val|test].txt files in the form of:
/path/to/file [y1,...,yn]
It would be nice if DIGITS could create the image and label databases from these files (in theory that would allow the user to use non-image files too).

I think this can be a discussion that comes after merging this PR (see "TODO after merging" in my OP). I've branched this out into a new issue for discussion - #197.

In the model creation page, can we have the user choose which loss function they want to use?

What do you mean? They have to choose their loss function manually in their custom prototxt at the moment, since there aren't any standard networks.

It would be nice to have path completion working for all the fields where we expect the user to provide a server path. Perhaps we could create a custom wtforms.PathField that would automatically set everything up to enable autocomplete.

Great idea! I was hoping you would come along and add your autocomplete stuff to these fields as well. Doing it with a custom field sounds like a good idea to me.

@y22ma commented Aug 6, 2015

Not sure if I'm doing this properly at all, but I tried to use AlexNet as the deploy.prototxt and train it on the test data generated by test_lmdb_creator.py:

./test_lmdb_creator.py -x 256 -y 256 -c 5000 ~/dataset/test

I'm getting the following error:

2015-08-06 15:33:16 [20150806-153315-0597] [ERROR] TypeError: Parameter to MergeFrom() must be instance of same class: expected LayerParameter got NoneType.

Just wondering if I'm doing something obviously wrong?

@y22ma commented Aug 6, 2015

I just realized that AlexNet expects 3-channel input, so the test images I generate are probably not going to work?

@lukeyeager (Member Author)

@y22ma, thanks for the help reviewing this!

I just realized that AlexNet expects 3-channel input, so the test images I generate are probably not going to work?

No, you should be fine. The network is pretty flexible.

TypeError: Parameter to MergeFrom() must be instance of same class: expected LayerParameter got NoneType.

I ran into that error once, but I fixed it here. Apparently the issue has resurfaced somewhere else. I'm looking into this now ...

You can try running digits in debug mode to see if you get any information:

./digits-devserver --debug

@gheinrich (Contributor)

In the model creation page, can we have the user choose which loss function they want to use?

What do you mean? They have to choose their loss function manually in their custom prototxt at the moment, since there aren't any standard networks.

I overlooked the part where the loss function is specified. This looks OK, sorry.

The "infer many" menu is working on your latest commit, thanks. A possible enhancement (I suppose in the context of #197) would be to show the ground truth when specified in the text file.

@y22ma commented Aug 6, 2015

@lukeyeager no problem, really appreciate this functionality coming together.

I think I made a mistake in my previous test by creating only 50 images instead of the intended 5000. I suspect that with 50 images, the default batch size is too large. Now I'm running into another issue:

2015-08-06 16:38:15 [20150806-163811-619a] [ERROR] Train Caffe Model: Check failed: outer_num_ * inner_num_ == bottom[1]->count() (16 vs. 32) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

Likely due to test_lmdb_creator.py creating fewer labels than the output of the softmax layer...
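The numbers in that check fall directly out of the shapes involved: with a batch of 16 and two values stored per label, SoftmaxWithLoss sees 16 positions to classify but 32 label entries. A plain-Python restatement of the check (not Caffe code, just the arithmetic):

```python
def softmax_label_count_ok(pred_shape, label_count):
    """Mimic SoftmaxWithLoss's label-count check for softmax axis == 1:
    predictions of shape (N, C, H, W) need exactly N*H*W labels
    (one integer class index per spatial position per image)."""
    n, c, h, w = pred_shape
    expected = n * h * w  # outer_num_ * inner_num_
    return label_count == expected, expected


# fc8 outputs shape (16, 2, 1, 1); the label LMDB holds 2 values
# per image, so Caffe sees 32 labels where it expects 16.
ok, expected = softmax_label_count_ok((16, 2, 1, 1), 16 * 2)
```

This is also why switching to a regression loss like EuclideanLoss, which compares the two blobs element-wise instead of treating the labels as class indices, avoids the check entirely.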

@lukeyeager (Member Author)

I got AlexNet to work on that exact same set of images. But I had to adjust it to fit the specific problem at hand. Here's what I had to change:

  1. Remove the "label" top from the data layers
    • Because datum.label is not set
  2. Remove the batch size from the data layers
    • Because DIGITS sets a default batch of size 16 for the other data layers
  3. Set the inner_product_param.num_output to 2 for the "fc8" layer
    • Because we want the network to output 2 numbers
  4. Change the loss layer to EuclideanLoss
    • Because SoftmaxWithLoss doesn't make sense for a regression problem
  5. Set the name of the loss layer to "train_loss"
    • So it wouldn't be included in the deploy prototxt
  6. Remove the Accuracy layer
    • Because it doesn't make sense for a regression problem
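Put together, the affected layers end up looking roughly like this (a sketch of just the changed pieces, not a full working network; the exact wiring of the "label" blob depends on how the second LMDB is attached):

```protobuf
layer {
  name: "train-data"
  type: "Data"
  top: "data"               # no "label" top (datum.label is not set)
  include { phase: TRAIN }  # no batch_size: DIGITS supplies a default
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param { num_output: 2 }  # two regression outputs
}
layer {
  name: "train_loss"        # "train_" keeps it out of deploy.prototxt
  type: "EuclideanLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
# no Accuracy layer: it doesn't make sense for regression
```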

Don't require people to explicitly type train_ in the layer name for
these layers
More reliable than reading information from the Job or Task
@lukeyeager (Member Author)

e3354b4 makes steps (5) and (6) above unnecessary.

f3cee35 fixes an issue where DIGITS would crash during inference if the crop size was set.

@y22ma commented Aug 7, 2015

@lukeyeager unfortunately I can't reproduce your results after following your instructions. Here's my train_val.prototxt:

layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "/home/yanma/dataset/test/train_mean.binaryproto"
  }
  data_param {
    source: "/home/yanma/dataset/test/train_db"
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8"
  top: "loss"
}

And here's the error:

Setting up relu7
Top shape: 16 4096 (65536)
Creating layer drop7
Creating Layer drop7
drop7 <- fc7
drop7 -> fc7 (in-place)
Setting up drop7
Top shape: 16 4096 (65536)
Creating layer fc8
Creating Layer fc8
fc8 <- fc7
fc8 -> fc8
Setting up fc8
Top shape: 16 2 (32)
Creating layer loss
Creating Layer loss
loss <- fc8
loss -> loss
Setting up loss
Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EuclideanLoss Layer takes 2 bottom blob(s) as input.

It seems that the output configuration on fc8 is not taking effect? I'm on commit f3cee35. Note that I'm using the NVIDIA fork of Caffe.

@lukeyeager (Member Author)

Check failed: ExactNumBottomBlobs() == bottom.size() (2 vs. 1) EuclideanLoss Layer takes 2 bottom blob(s) as input.

Oh, you need to give the EuclideanLoss layer two bottoms, like this:

layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

@lukeyeager (Member Author)

And you should remove the transform_param.mean_file and data_param.source values from your prototxt so that DIGITS can overwrite them for you with the correct values from your DIGITS dataset.

@lukeyeager (Member Author)

I'm going to go ahead and merge this PR. For any bugs or requests related to this new set of features, please create new issues or ask for help on the mailing list.

Thanks for the review help, @gheinrich and @y22ma!

lukeyeager added a commit that referenced this pull request Aug 7, 2015
Add initial support for generic inference
@lukeyeager lukeyeager merged commit f42f473 into master Aug 7, 2015
@lukeyeager lukeyeager deleted the generic-inference branch August 7, 2015 17:15
@y22ma commented Aug 7, 2015

All good, and thanks for all the tips to get it working. It would be awesome to see example use cases of this feature documented on the wiki page; it would definitely help a lot of people out.

@tmquan commented Aug 17, 2015

Hi, @lukeyeager , I looked through this issue #177 and have concerns about the pixel classifier or segmentation task.

We can construct the LMDB with a given folder structure like this:
images/
├── image1.png
├── image1.npy
├── image2.jpg
└── image2.npy

But do we really need to write them out as files?
For example, suppose I have a big image that needs to be segmented by a membrane feature, as in ISBI 2012.
In Dan Ciresan's method, a patch or window is extracted around each pixel to train the probability of the center pixel, across the whole image.
In DIGITS, writing all of those patches to files seems like a bad idea because of the storage required and the cost of reading the training data.

Is there any way to construct the LMDB so that it holds a portion of memory as a training instance? I mean that the training stack and training labels are already in memory; while training the model, a patch and its corresponding labels would be extracted on the fly.

@lukeyeager (Member Author)

Until we decide on a solution to #197, you'll have to create your LMDB for this task manually anyway. The .png, .npy format you described is just a proposal - it doesn't work yet.

Is there any way to construct the LMDB which holds a portion of memory as an training instance?

Not with LMDB, no. You might be able to do something like that with a MemoryData layer:
http://caffe.berkeleyvision.org/tutorial/layers.html#in-memory
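For reference, a MemoryData layer is declared roughly like this (a sketch based on the Caffe layer catalogue linked above; the data itself is pushed in from Python or C++ at runtime rather than read from a database):

```protobuf
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 16
    channels: 3
    height: 227
    width: 227
  }
}
```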

If you get that working and you'd like to use it in DIGITS, please open a separate issue with your request.

@y22ma commented Aug 18, 2015

@lukeyeager Do you happen to have any reference on creating an LMDB for object detection (e.g. R-CNN)? I'd appreciate any tips you have.

@lukeyeager (Member Author)

I do not, sorry. Your best bet would probably be the Caffe mailing list.
