Make force alignment accessible from pocketsphinx_batch and the ps_decoder API #144
nshmyrev merged 1 commit into cmusphinx:master from dhdaines:dev/force_align_api
…imal command line interface in pocketsphinx_batch which allows you to do force alignment of transcripts. Test it out on test/data/librivox; there is also a unit test.
Welcome back, David!
Hi! Thanks!
On Sun, Sep 9, 2018 at 3:02 AM, Nickolay V. Shmyrev wrote:
> Welcome back, David!
Hi! I realized that this branch doesn't exactly do what the user would expect for force alignment. The issue is that state_align search wasn't actually designed to do force alignment - it really just aligns a state sequence to a feature sequence. The reason why I wrote it in the first place was for two purposes:
So, there are some things that force alignment should do which it doesn't do, specifically:
On the other hand, I have gotten very good results by using FSG search for force alignment at the word level, because it is already equipped to do the things mentioned above. The other issue is that VAD and noise removal must be turned off for force alignment, because otherwise the output timestamps won't necessarily correspond to the input.

Since the state alignment search isn't useful on its own, I would like to switch the meaning of the "alignment" interface in pocketsphinx_batch to do traditional force alignment with FSG search. In addition, I would probably add something to force noise removal and VAD off if -adcin is enabled, because this behaviour is very unexpected to the user in this case (even if it is super useful for ASR).
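As a rough illustration of word-level force alignment via FSG search, one way to constrain the decoder to a known transcript is a grammar that accepts exactly that word sequence. The sketch below builds such a grammar in JSGF form. The function name `transcript_to_jsgf` and the grammar name `align` are hypothetical, and this is not necessarily how the branch constructs its FSG; it also assumes every word is a plain token already present in the pronunciation dictionary.

```python
def transcript_to_jsgf(transcript, grammar_name="align"):
    """Return a JSGF grammar accepting exactly the words of `transcript`, in order.

    Hypothetical helper for illustration only; not part of pocketsphinx.
    """
    words = transcript.split()  # whitespace-separated tokens
    if not words:
        raise ValueError("transcript is empty")
    # A single public rule whose expansion is the word sequence itself,
    # so the only path through the grammar is the transcript.
    rule = " ".join(words)
    return (
        "#JSGF V1.0;\n"
        "grammar {0};\n"
        "public <{0}> = {1};\n"
    ).format(grammar_name, rule)


if __name__ == "__main__":
    print(transcript_to_jsgf("hello world"))
```

The resulting text could be saved to a file and loaded as a grammar search; the word timings would then come from the segmentation of the single path the grammar allows.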
This provides a simple (maybe too simple) API for doing force alignment as well as a command-line interface for it via pocketsphinx_batch. This works like any other kind of search; you do:
pocketsphinx_batch has the -alignctl, -aligndir and -alignext options, which specify a control file listing the transcription files (one file per utterance), the directory containing them, and their file extension.
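To make the relationship between the three options concrete, here is a minimal sketch of how a transcription path might be derived for each utterance. It assumes the path is built as directory + entry + extension, that the extension includes its leading dot, and that only the first field of each control line matters; these are assumptions for illustration, not confirmed behaviour of pocketsphinx_batch. The function name and the example values are hypothetical.

```python
import posixpath


def transcript_paths(ctl_lines, aligndir, alignext):
    """Map control-file entries to transcription file paths.

    Hypothetical sketch: assumes path = aligndir / entry + alignext.
    """
    paths = []
    for line in ctl_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = fields[0]  # assumed: first field identifies the utterance
        paths.append(posixpath.join(aligndir, entry + alignext))
    return paths


if __name__ == "__main__":
    # Example values are made up for illustration.
    for path in transcript_paths(["utt1", "sub/utt2"], "transcripts", ".txt"):
        print(path)
```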
The transcription is expected to be whitespace-separated tokens. It will add the <s> and </s> tokens for you, which may or may not be the right thing to do (perhaps we should just add them if they aren't present).

This will only do word alignments, even though it is capable of doing more than that, because that's all the ps_seg_iter interface allows. We should probably fix that. In the near term I will add output to TextGrid files to the batch interface so we can get the phone segmentation that way, and also be drop-in compatible with the Montreal Forced Aligner.
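On the question of the <s> and </s> tokens, the alternative floated above (adding them only if they aren't present) could look roughly like the following. This is a sketch of the idea rather than the code in this branch, and `add_sentence_markers` is a hypothetical name.

```python
from typing import List


def add_sentence_markers(transcript: str) -> List[str]:
    """Split a transcript on whitespace and ensure it is wrapped in <s> ... </s>.

    Markers are added only when missing, so pre-marked input is left alone.
    """
    tokens = transcript.split()
    if not tokens or tokens[0] != "<s>":
        tokens.insert(0, "<s>")
    if tokens[-1] != "</s>":
        tokens.append("</s>")
    return tokens


if __name__ == "__main__":
    print(add_sentence_markers("hello world"))
    print(add_sentence_markers("<s> hello world </s>"))
```

With this behaviour, transcripts that already contain the markers and transcripts that omit them produce the same token sequence.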