Improve copyrights detection #3752

Merged: AyanSinhaMahapatra merged 29 commits into develop from misc-copyrights on Jun 26, 2024

@pombredanne pombredanne commented Apr 26, 2024

This PR improves copyright detection

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Reported-by: Anton Augsburg @vw-anton
Reference: #3655
Signed-off-by: Philippe Ombredanne <[email protected]>
Reported-by: Dimitris Iliou @dimitris-iliou
Reference: #3735
Signed-off-by: Philippe Ombredanne <[email protected]>
Spotted in some common python libraries such as numpy and scipy

Signed-off-by: Philippe Ombredanne <[email protected]>
Use an input file where each line is either:
- a URL to fetch
- a text to test

Then generate a pair of test data files accordingly

Signed-off-by: Philippe Ombredanne <[email protected]>
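
As an illustration of the input-file workflow described in the commit message above, here is a minimal sketch. It is not the script from this PR: the function names, the file naming scheme, and the JSON layout of the expected-results file are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path
from urllib.request import urlopen


def is_url(line):
    # Assumption: a line is a URL only if it starts with an HTTP(S) scheme.
    return line.startswith(("http://", "https://"))


def fetch_text(url):
    # Download the content to test; undecodable bytes are replaced.
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def build_test_pairs(input_lines, output_dir, fetch=fetch_text):
    """
    For each non-empty input line, write a pair of files: the text to scan
    and a placeholder expected-results file (hypothetical JSON layout).
    Return a list of (data_file, expected_file) paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for line in input_lines:
        line = line.strip()
        if not line:
            continue
        # A URL is fetched; anything else is used verbatim as the test text.
        text = fetch(line) if is_url(line) else line
        # Derive a short, stable file name from the input line itself.
        name = hashlib.sha1(line.encode("utf-8")).hexdigest()[:10]
        data_file = output_dir / f"{name}.txt"
        expected_file = output_dir / f"{name}.txt.json"
        data_file.write_text(text, encoding="utf-8")
        # Expected detections start empty, to be reviewed and filled in later.
        expected = {"source": line, "copyrights": []}
        expected_file.write_text(json.dumps(expected, indent=2), encoding="utf-8")
        pairs.append((data_file, expected_file))
    return pairs
```

Passing the fetcher as a parameter keeps the sketch testable without network access, for example `build_test_pairs(lines, "out", fetch=lambda url: "stub")`.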
- Start detecting "is held by"
- Do not include some trailing junk

Signed-off-by: Philippe Ombredanne <[email protected]>
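
To show what the "is held by" form from the commit above looks like in practice, here is a small standalone sketch using a plain regular expression. It does not reproduce how the detector in this PR is implemented; the pattern, the function name, and the trimming rules are assumptions chosen only for illustration.

```python
import re

# Hypothetical pattern: "copyright", an optional "(c)", an optional year or
# year range, then "is held by" followed by the holder up to the first
# period, semicolon, or newline.
HELD_BY = re.compile(
    r"copyright\s+(?:\(c\)\s*)?"
    r"(?:\d{4}(?:\s*-\s*\d{4})?\s+)?"
    r"is\s+held\s+by\s+(?P<holder>[^\n.;]+)",
    re.IGNORECASE,
)


def find_held_by(text):
    """Return the holder names found in "copyright is held by ..." statements."""
    holders = []
    for match in HELD_BY.finditer(text):
        # Trim surrounding whitespace, quotes, and punctuation (trailing junk).
        holder = match.group("holder").strip(" \t,:()\"'")
        if holder:
            holders.append(holder)
    return holders
```

For example, `find_held_by("The copyright is held by Example Corp. All rights reserved.")` yields the single holder `Example Corp`, while a conventional `Copyright (c) 2024 Foo` line is not matched by this pattern.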
Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <[email protected]>
Make detection of copyright with a single lowercase name more specific

Reference: #3764
Reported-by: Anton Augsburg @vw-anton
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
This makes copyright detection more specific

Signed-off-by: Philippe Ombredanne <[email protected]>
Also improve NOTICEs, and other misc. variants
Do not detect "The Initial Developer"

Signed-off-by: Philippe Ombredanne <[email protected]>
Reference: #3797
Reported-by: Jörg Arndt @Joerki
Signed-off-by: Philippe Ombredanne <[email protected]>
Handle corner cases with markup
Detect new copyright forms.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
* Better handle various parens, markup and quotes

Signed-off-by: Philippe Ombredanne <[email protected]>

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

@pombredanne we need to fix the test failures here. After regenerating, some of these look like potential regressions to me, so we need more review of these failures.

Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne pombredanne changed the title from "Apply small copyrights detection improvements" to "Improve copyrights detection" on Jun 22, 2024
Signed-off-by: Philippe Ombredanne <[email protected]>
@pombredanne pombredanne commented

@AyanSinhaMahapatra ready for your review, all green

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

LGTM! I have a couple small questions and fixes here.

Signed-off-by: Philippe Ombredanne <[email protected]>

Co-authored-by: Ayan Sinha Mahapatra <[email protected]>

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

LGTM! Thanks++ @pombredanne This improves copyright detection a lot!
Merging!

@AyanSinhaMahapatra AyanSinhaMahapatra merged commit 1242518 into develop Jun 26, 2024
@AyanSinhaMahapatra AyanSinhaMahapatra deleted the misc-copyrights branch June 26, 2024 10:57