Optim API: per-layer learning rates etc.

Apart from working out the API changes needed for freezing parts of the graph, the optim API has another open problem:
- optionally specifying per-layer learning rates.

This is a major pain point in Torch, even though it is a common use case for many users.
The new API should cover this use case properly, and we should provide an example.
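
One possible shape for this, as a rough sketch only: the optimizer takes a list of parameter groups, and each group may override the default learning rate. Nothing here is an existing API. `Param` and this toy `SGD` are hypothetical plain-Python stand-ins, and the group keys `"params"` and `"lr"` are assumptions made for illustration.

```python
class Param:
    """Hypothetical stand-in for a parameter tensor and its gradient."""

    def __init__(self, values):
        self.values = list(values)
        self.grad = [0.0] * len(self.values)


class SGD:
    """Toy SGD that accepts parameter groups with optional per-group options."""

    def __init__(self, groups, lr):
        # Each group is a dict with a "params" list. A group without its own
        # "lr" falls back to the default passed to the constructor.
        self.groups = []
        for group in groups:
            merged = {"lr": lr}
            merged.update(group)
            self.groups.append(merged)

    def step(self):
        # Plain gradient descent, using each group's own learning rate.
        for group in self.groups:
            for p in group["params"]:
                p.values = [v - group["lr"] * g
                            for v, g in zip(p.values, p.grad)]


# Two "layers": the first uses the default rate, the second overrides it.
features = Param([1.0, 2.0])
classifier = Param([1.0, 2.0])
opt = SGD(
    [
        {"params": [features]},                # default lr
        {"params": [classifier], "lr": 0.5},   # per-layer lr
    ],
    lr=0.1,
)

# Pretend a backward pass filled in the gradients.
features.grad = [1.0, 1.0]
classifier.grad = [1.0, 1.0]
opt.step()
```

With identical gradients, `classifier` moves five times further than `features` in one step. The per-layer rate is optional: callers who do not need it pass a single group and one global `lr`.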