I'm eager to integrate spaCy 3 into our recipes. Any estimate of when we might see support (even beta support) for spaCy 3 in Prodigy?
Ha, I was actually going to open a thread about this today, but you beat me to it. For context and completeness, here's an overview of what's coming in spaCy v3:
What already works with Prodigy
You can already train spaCy v3 models using annotations collected in Prodigy by exporting them with `data-to-spacy` and then running `spacy convert` to convert the corpus to the new and compact binary `.spacy` format.
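To make the export step a bit more concrete, here is a minimal sketch of the kind of filtering that happens before annotations become training data. It is not what `data-to-spacy` does internally; it only assumes JSONL records with `text`, `spans` (`start`, `end`, `label`) and `answer` keys, and the function name `accepted_ner_examples` is made up for illustration.

```python
import json


def accepted_ner_examples(jsonl_lines):
    """Yield (text, entities) pairs for accepted NER records.

    Assumption: each line is a JSON object with "text", "spans" and
    "answer" keys, and only records answered "accept" are kept.
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # drop rejected and ignored records
        entities = [
            (span["start"], span["end"], span["label"])
            for span in record.get("spans", [])
        ]
        yield record["text"], entities


# Hypothetical sample records, only for demonstration
sample = [
    '{"text": "Apple hired Ines", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}',
    '{"text": "Skip me", "spans": [], "answer": "reject"}',
]
examples = list(accepted_ner_examples(sample))
```

The resulting character offsets are what you would then turn into spaCy `Doc` objects and serialize; in practice the two commands above take care of all of this for you.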
What's coming for Prodigy nightly
There are only a handful of internals (like imports, calls to `add_pipe`) that have to be adjusted to make Prodigy run with spaCy v3. I've already been working on this in the background and we'll probably have a beta program for users who are interested in testing it (no ETA yet, though, depends on how things go). The trickier parts are the active learning annotation models and updating from binary annotations in the loop, so that might take a bit more work and we might not have that ready for the nightly.
Cool stuff that will be possible in the future
- Easily use transformer models in the loop during annotation (also to semi-automatically create a dataset that you can then use to train a more lightweight downstream model).
- Integrate Prodigy into end-to-end workflows using spaCy projects with tracked changes – for example, you could have a step `spacy project run annotate` that updates your corpus, and then re-run `spacy project run train` if the data has changed, package your model, deploy it, visualize it, whatever.
- Prodigy can expose custom data readers that load and convert annotations from a dataset or an exported JSONL file and you can use them in your `config.cfg`. Ideally, I'd love to deprecate Prodigy's `train` wrapper and just make it very easy to use `spacy train` with Prodigy instead.
- Support for dependency matcher patterns.
- Possibly a bunch of other things I haven't thought of yet
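The "re-run training only if the data has changed" idea from the list above can be sketched roughly like this. spaCy projects handle change tracking for you, so this is only a hand-rolled illustration of the principle, not how spaCy implements it. The function names, the lock file and the choice of SHA-256 are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path):
    """Return a SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_rerun(corpus_path, lock_path):
    """Return True if the corpus changed since the last recorded checksum.

    The new checksum is written to lock_path whenever a change is seen,
    so a second call on unchanged data returns False.
    """
    current = file_checksum(corpus_path)
    lock = Path(lock_path)
    previous = lock.read_text().strip() if lock.exists() else None
    if current == previous:
        return False
    lock.write_text(current)
    return True


# Demonstration with a temporary, hypothetical corpus file
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "train.jsonl"
    lock = Path(tmp) / "train.lock"
    corpus.write_text('{"text": "first"}\n')
    first = needs_rerun(corpus, lock)   # no checksum recorded yet
    second = needs_rerun(corpus, lock)  # corpus unchanged
    corpus.write_text('{"text": "first"}\n{"text": "second"}\n')
    third = needs_rerun(corpus, lock)   # new annotations were added
```

A training step would then only be triggered when `needs_rerun` returns True, which is the behaviour you'd get from a project workflow without writing any of this yourself.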
First of all, congratulations on the new library - you guys rock!
Are you saying no ETA on the beta program or the final release? I'll happily sign up as a beta tester if you need any.
Thanks!
I meant for Prodigy nightly (but obviously the final release as well) because there are still a few things to do and it's hard to predict how long that takes. But that's good to know, thanks! I'll keep updating this thread.
We'd be interested in trying out any nightly Prodigy builds with spaCy 3.0 support. We have a custom recipe that doesn't do training in the loop but periodically batch trains and swaps out the model. Potentially, that could alleviate some of the concerns with training transformer models in the loop.
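For anyone curious what "batch train and swap" could look like, here is a rough sketch of the swapping part only. It is not the actual recipe described above: `ModelHolder` is a hypothetical helper, and the "models" are plain callables standing in for a loaded pipeline.

```python
import threading


class ModelHolder:
    """Hold the current model and allow replacing it safely at runtime."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def predict(self, text):
        # Grab a reference under the lock, then predict outside it so a
        # slow prediction never blocks a swap.
        with self._lock:
            model = self._model
        return model(text)

    def swap(self, new_model):
        # Called by the periodic batch-training job once a new model is ready.
        with self._lock:
            self._model = new_model


# Stand-in models: each returns its version tag plus the input text
holder = ModelHolder(lambda text: ("v1", text))
before = holder.predict("hello")
holder.swap(lambda text: ("v2", text))
after = holder.predict("hello")
```

The annotation stream keeps calling `predict` while training runs elsewhere, so the expensive update never happens inside the annotation loop itself.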
Hi. Wondering if there is any update here, given that spaCy 3.0 has now been released?
Now that spaCy v3.0 stable is out, we can build Prodigy against it and get the nightly ready. Keep an eye on this thread for the announcement of the Prodigy nightly program.
Update