Add py-rouge implementation in tests and remove py-rouge/setuptools dependencies #3575
vfdev-5 merged 4 commits into pytorch:master
Conversation
@omkar-334 thanks a lot for taking the initiative to fix the py-rouge dependency issue, I appreciate that! About bundling the wordnet data, I think we can put it elsewhere; for example, I can create a special repo in the https://github.com/pytorch-ignite org to host the files.
I realized it was because I didn't install
Do you think that doing this would introduce a network dependency?
We just download these files. I do not want to store them in ignite repo. |
Sure, you can upload them to a repo and give me the URLs. I can change the current test to download from there and read them instead.
For now, use the URLs from the py-rouge repo; we'll update them once we have the files hosted somewhere else. I wonder whether we could find an already existing storage location for these files.
Done, all tests are passing. |
vfdev-5
left a comment
Thanks a lot @omkar-334 !

`py-rouge` is unmaintained (last release in 2018) and difficult to use: it relies on the deprecated `pkg_resources` API, which required pinning `setuptools<82`. To remove this fragile dependency while preserving test behavior, this PR adds the minimal `Rouge` implementation used in tests.

Changes

Vendoring
- Vendored `py-rouge`'s `Rouge` class into `tests/ignite/metrics/nlp/_pyrouge.py`
- Removed the use of `pkg_resources`; data files are resolved via `__file__`
- Removed `py-rouge`, `setuptools<82`, and the related TODO from `requirements-dev.txt`

Test fixes
- Removed the `worker_id` parameter from the `download_nltk_punkt` fixture (it caused failures without `pytest-xdist`)
- Added `nltk.download("punkt_tab")` to support newer NLTK versions

There are other implementations, such as `rouge-score` by Google and `torchmetrics` by Lightning AI, but I didn't find them suitable. Ignite's tests rely on corpus-level multi-reference aggregation (summing raw n-gram counts before computing precision/recall). `rouge-score` and `torchmetrics` instead average per-reference scores and do not expose raw counts, leading to different results for unequal-length references.

All tests but one are passing.
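To make the aggregation difference concrete, here is a minimal ROUGE-1 recall sketch. The helper names and the toy sentences are mine (not Ignite's or py-rouge's API); it only illustrates why summing raw n-gram counts across references gives a different number than averaging per-reference scores when references have unequal lengths.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(hyp, ref, n):
    """Return (matched count, hypothesis n-gram total, reference n-gram total)."""
    h, r = ngrams(hyp, n), ngrams(ref, n)
    match = sum(min(c, r[g]) for g, c in h.items())
    return match, sum(h.values()), sum(r.values())

hyp = "the cat sat on the mat".split()
refs = ["the cat sat".split(),
        "a cat was sitting on the mat yesterday".split()]

# Corpus-level aggregation (the behavior Ignite's tests rely on):
# sum raw unigram counts over all references, then compute recall once.
matched = sum(overlap(hyp, r, 1)[0] for r in refs)
ref_total = sum(overlap(hyp, r, 1)[2] for r in refs)
recall_agg = matched / ref_total          # 7 / 11 ≈ 0.636

# Per-reference averaging (rouge-score / torchmetrics style):
# compute recall per reference, then average the scores.
recalls = [overlap(hyp, r, 1)[0] / overlap(hyp, r, 1)[2] for r in refs]
recall_avg = sum(recalls) / len(recalls)  # (1.0 + 0.5) / 2 = 0.75
```

The short reference scores a perfect recall and dominates the average, while the corpus-level sum weights each reference by its length, so the two conventions disagree whenever references differ in length.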
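The `pkg_resources`-to-`__file__` switch mentioned under Vendoring can be sketched as follows. The file name and directory layout here are illustrative, not the actual Ignite layout.

```python
from pathlib import Path

# Before (py-rouge style, needs setuptools/pkg_resources at runtime):
#   import pkg_resources
#   path = pkg_resources.resource_filename(__name__, "data/wordnet.txt")

# After: resolve data files relative to the module itself, which works
# for vendored test helpers without any packaging machinery.
DATA_DIR = Path(__file__).resolve().parent / "data"

def data_file(name: str) -> Path:
    """Return the path to a bundled data file (illustrative helper)."""
    return DATA_DIR / name
```

This keeps the data lookup working whether the module is installed or simply sits in the test tree, which is the point of vendoring it.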
