
Conversation

@rlorigro
Contributor

This is meant to resolve #18249, where I pointed out a few things that could improve the CTCLoss docs.

My main goal was to clarify:

  • Target sequences are sequences of class indices, excluding the blank index
  • Lengths of target and input are needed for masking unequal-length sequences, and do not necessarily equal S, which is the length of the longest target sequence in the batch.
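The two points above can be sketched in code. This is a minimal illustrative example, not taken from the PR itself; the sizes (T, N, C, S) are made-up, C counts classes including the blank, and the blank index is assumed to be the default 0:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: T = input length, N = batch, C = classes incl. blank, S = longest target
T, N, C, S = 50, 4, 21, 10

ctc = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Targets are class indices in 1..C-1; the blank index (0) is excluded
targets = torch.randint(1, C, (N, S), dtype=torch.long)

# Per-sequence lengths; each target_length may be < S, since S is just
# the length of the longest target in the batch (shorter rows are padding)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(1, S + 1, (N,), dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

The lengths tensors are what tell the loss where each padded row actually ends.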

I thought about Thomas's suggestion to link the distill.pub article, but I'm not sure about it. I think that should be up to y'all to decide.

I have no experience with .rst, so it might not render as expected :)

@t-vi @ezyang

targets: Tensor of size :math:`(N, S)` or `(sum(target_lengths))`.
Targets (cannot be blank). In the second form, the targets are assumed to be concatenated.
targets: Tensor of size :math:`(N, S)` or `(sum(target_lengths))`, where `N = batch size`, and
`S = maximum target length`.
Collaborator


What's the effect of changing the indent here?

Contributor Author


Did you mean on line 1238?

The indent means that the remaining words in that sentence will belong to the header block when rendered. I noticed that some of the other docs don't do this, so I have actually changed it again to something else.

such that :math: `target_n = targets[n,:s_n]` for each target in a batch.
target_lengths: Tuple or tensor of size :math:`(N)`.
Lengths of the targets
Lengths of the targets (may be :math: < S)
Collaborator


Maybe it should be <= S if targets is a two-dimensional tensor (and writing it with :math: would be clearer).

>>>
>>> min_target_length = 10
>>> max_target_length = S
>>>
Collaborator


I must admit that the blank lines are just too much. Also, I wonder, if we need to introduce a second name for S.

Contributor Author


I redid the naming in the latest update. I also removed some of the self-explanatory comments and some extra spaces.

@t-vi
Collaborator

t-vi commented Mar 25, 2019

Thank you for your patch! It will be nice to have the improved documentation.
I have a few minor comments. Also, did you build the documentation to see how it is rendered after your changes?

targets: Tensor of size :math:`(N, S)` or `(sum(target_lengths))`, where `N = batch size`, and
`S = maximum target length`.
Each element in the target sequence is a class index. Target index cannot be blank (default=0).
In the second form, the targets are assumed to be concatenated.

Are the targets concatenated with or without the padding to make it a fixed (N, S)?
Maybe it would be useful to explicitly comment on that.

Contributor Author


I think the answer is "with padding". I actually still don't understand how the (sum(target_lengths)) form works... it doesn't seem like that actually describes a shape. Can @t-vi comment on this?

Collaborator


That's a 1-d shape. (sum(target_lengths),) would probably be more pythonesque.
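The 1-d form being discussed can be sketched as follows; the sizes and values are illustrative (not from the PR), and blank=0 is assumed:

```python
import torch
import torch.nn as nn

# Illustrative sizes: T = input length, N = batch, C = classes incl. blank
T, N, C = 50, 3, 21

ctc = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2)

# Per-sequence target lengths for the batch
target_lengths = torch.tensor([5, 2, 3])

# The un-padded targets are concatenated into a single 1-d tensor,
# so its shape is (sum(target_lengths),) -- here (10,)
targets = torch.randint(1, C, (int(target_lengths.sum()),), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
```

No padding is involved in this form: target_lengths alone tells the loss where each sequence begins and ends inside the flat tensor.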

Contributor Author


Ah, no, I see it now. Not sure why I was confused before. The comma is probably not necessary.

@rlorigro
Contributor Author

rlorigro commented Mar 26, 2019

Ok I made some changes, hopefully for the better.

Since you asked about rendering: I didn't want to build the docs locally because of the dependencies and errors I ran into when I tried, but I did find a cheap renderer online (missing the math plugin) to give a sense of the appearance:

[screenshot of the rendered docstring]

and the example now looks like this:
[screenshot of the rendered example]

Collaborator

@t-vi t-vi left a comment


Thank you for checking the rendering! (I had half-hoped that you'd go through the trouble of installing the doc dependencies and make a habit of improving those things you find imperfect. 😉)

From my point of view, this can go in, unless you want to do something about the less than perfect explanation of concatenated targets.

@rlorigro
Contributor Author

No problem. I will make one more edit tomorrow to address the lengths/concatenation, since there is no rush. Also, does the alternative concatenated form not also apply to inputs?

@t-vi
Collaborator

t-vi commented Mar 26, 2019

I will make one more edit tomorrow to address the lengths/concatenation, since there is no rush.
Great, thanks!

Also, does the alternative concatenated form not also apply to inputs?
No.

@ezyang
Contributor

ezyang commented Mar 26, 2019

OK, I'll wait for the last change before landing this.

@rlorigro
Contributor Author

rlorigro commented Mar 28, 2019

@ezyang @t-vi

Ok, here is the final edit. It is still less than perfect, but a lot less less-than-perfect than it was before. I decided to just explicitly break out the explanations for both forms of input. I used the term "stacked" vs. "concatenated" because it makes sense considering the difference between these two operations in torch.

This is how it renders (roughly):
[screenshot of the rendered docstring]

Looks like it says there is now a conflict in loss.py; not sure what caused that.

@t-vi
Collaborator

t-vi commented Mar 28, 2019

Thanks! Two quick comments:

  • In the explanation of target lengths, you write <= T when it should be <= S. I missed that earlier, sorry!
  • I must admit I don't understand the explanation for concatenated lengths. If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor. maybe?
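t-vi's consistency condition for the concatenated form can be checked directly. This is an illustrative sketch (the lengths and class range are made up); torch.split also shows how the flat tensor maps back to per-sequence targets:

```python
import torch

# If targets is the 1-d concatenation of the individual targets, then
# target_lengths must add up to the total length of that tensor.
target_lengths = torch.tensor([5, 2, 3])
targets = torch.randint(1, 21, (int(target_lengths.sum()),))  # shape (10,)

assert int(target_lengths.sum()) == targets.numel()

# torch.split recovers the individual target sequences from the flat tensor
per_sequence = torch.split(targets, target_lengths.tolist())
assert [t.numel() for t in per_sequence] == [5, 2, 3]
```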

@rlorigro
Contributor Author

Ah sorry about that, I will fix it, and use your wording for target_lengths

@rlorigro
Contributor Author

@t-vi @ezyang

Documentation should be fixed now

@t-vi
Collaborator

t-vi commented Mar 29, 2019

Looks good! Thanks! But there is a merge conflict for the formatting of the reduction strings.

@soumith
Contributor

soumith commented Mar 29, 2019

I fixed the merge conflict. Thanks a lot @rlorigro !

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@soumith merged this pull request in 96456bf.

@rlorigro
Contributor Author

rlorigro commented Apr 3, 2019

Thanks @soumith for fixing the conflict. Does this page get updated whenever the next version of pytorch is officially released: https://pytorch.org/docs/stable/nn.html#ctcloss ?

@soumith
Contributor

soumith commented Apr 3, 2019

@rlorigro
Contributor Author

rlorigro commented Apr 3, 2019

Awesome, thanks. Ah it looks like the kerning on the "math" font is making it almost unreadable in some places. That was unfortunately not rendered in the renderer I used for testing.

Also the underscore is being parsed as a subscript in target_lengths

@edchengg

edchengg commented May 13, 2019

Why is the input size for classes C = 20, which excludes the blank? If the network does not predict the blank in its output, why would this loss make sense?

# Initialize random batch of input vectors, for *size = (T, N, C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
????

instead of

# Initialize random batch of input vectors, for *size = (T, N, C)
input = torch.randn(T, N, C + 1).log_softmax(2).detach().requires_grad_()

@rlorigro
Contributor Author

I think you are right: it should actually be C + 1 for the input and C for the targets.
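Under that convention, a corrected version of the example might look like the following sketch (sizes are illustrative, blank=0 assumed); here C counts only the non-blank classes, so the network emits C + 1 scores per step:

```python
import torch
import torch.nn as nn

# Illustrative sizes: T = input length, N = batch, S = longest target,
# C = number of *non-blank* classes (so the input has C + 1 channels:
# index 0 for the blank plus indices 1..C for the real classes)
T, N, C, S = 50, 2, 20, 10

ctc = nn.CTCLoss(blank=0)

# Initialize random batch of input vectors, for *size = (T, N, C + 1)
input = torch.randn(T, N, C + 1).log_softmax(2).detach().requires_grad_()

# Target indices are drawn from 1..C and never include the blank index
targets = torch.randint(1, C + 1, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(1, S + 1, (N,), dtype=torch.long)

loss = ctc(input, targets, input_lengths, target_lengths)
```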


Successfully merging this pull request may close these issues.

CTCLoss documentation is unclear
