Description
Short Description
In many license expression rules, certain words or phrases carry much more significance than others. For example, in a rule that describes an MIT license notice, "This software is distributed under a" seems far less important than "MIT License". I think it would be valuable if these "defining words" could optionally be marked and, when declared, would prevent a match unless those defining phrases are present in the matched text.
Possible Labels
- new feature
- license scan
Select Category
- Enhancement
- Add License/Copyright
- Scan Feature
- Packaging
- Documentation
- Expand Support
- Other
Describe the Update
This is a feature proposal that I haven't found any prior discussion of when searching through the issues (it's quite possible that I have just missed it!). In short, I think it would be really valuable if a license expression rule could mark certain "defining words" (or phrases) as required to be present in the scanned text for a match to be reported.
I've seen quite a lot of false positives that I believe could be eliminated if these crucial phrases were enforced. For example, the following text
## License
This SDK is distributed under the Apache License, Version 2.0, see LICENSE for more information.
was matched by mit_923.RULE:
License
Distributed under the MIT License. See LICENSE for more information.
even though it's clear to a human reader that crucial parts of this rule do not match. In other words, some words in the rule are more significant than others and essentially define the license expression. In this case, the defining phrase is of course MIT.
This is clearly visible in the scan match:
"matched_rule": {
"identifier": "mit_923.RULE",
..
},
"matched_text": "License\n\n[This] [SDK] [is] distributed under the [Apache] License, [Version] [2].[0], see LICENSE for more information."
the defining aspect of the license expression (MIT) is missing from the match.
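The overlap is easy to quantify. The Python sketch below counts how many of the mit_923.RULE tokens also occur in the scanned text, using a naive lowercase word tokenizer and multiset overlap. Both are assumptions for illustration only; this is not how ScanCode actually matches or scores.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_coverage(rule_text: str, query_text: str) -> tuple[float, list[str]]:
    """Return the fraction of rule tokens also found in the query
    (multiset overlap) and the rule tokens that were not found."""
    rule = Counter(tokenize(rule_text))
    query = Counter(tokenize(query_text))
    missing = rule - query  # Counter subtraction keeps only positive counts
    total = sum(rule.values())
    covered = total - sum(missing.values())
    return covered / total, sorted(missing.elements())


rule_text = "License\n\nDistributed under the MIT License. See LICENSE for more information."
query_text = ("## License\n\nThis SDK is distributed under the Apache License, "
              "Version 2.0, see LICENSE for more information.")

score, missing = naive_coverage(rule_text, query_text)
print(f"{score:.0%} of rule tokens covered, missing: {missing}")
# prints: 91% of rule tokens covered, missing: ['mit']
```

Even this crude measure covers 10 of the 11 rule tokens, and the single token left over is exactly the defining one.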
I think it would be really useful, and I imagine it would reduce false positives significantly, if these defining words/phrases (mandatory words, keywords, or whatever one might want to call them) could be marked in a rule.
How This Feature will help you/your organization
I believe it would reduce the number of false positives quite substantially, although that remains to be seen.
Possible Solution/Implementation Details
I'm unsure how feasible this would be to implement, but it would be nice if these words could be marked directly in the license RULE file, for example by surrounding them with some markup:
License
Distributed under the {{MIT}} License. See LICENSE for more information.
The semantics of this would be that the rule would never match (irrespective of score) if MIT wasn't present in the matched text.
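The proposed `{{...}}` semantics can be sketched in a few lines of Python. The helper names (`parse_rule`, `passes_required_phrases`), the naive tokenizer, and the choice to require each marked phrase as a contiguous token run in the matched text are all hypothetical choices for illustration, not existing ScanCode APIs.

```python
import re

# Hypothetical markup for required phrases: {{...}}
REQUIRED = re.compile(r"\{\{(.+?)\}\}")


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_rule(marked_text: str) -> tuple[str, list[list[str]]]:
    """Split a rule using the proposed {{...}} markup into its plain text
    and the token sequences of each required phrase."""
    phrases = [tokenize(p) for p in REQUIRED.findall(marked_text)]
    plain = REQUIRED.sub(lambda m: m.group(1), marked_text)  # strip the braces
    return plain, phrases


def contains_sequence(tokens: list[str], phrase: list[str]) -> bool:
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def passes_required_phrases(phrases: list[list[str]], matched_text: str) -> bool:
    # Proposed semantics: reject the match, whatever its score,
    # unless every required phrase appears in the matched text.
    tokens = tokenize(matched_text)
    return all(contains_sequence(tokens, p) for p in phrases)


rule = "License\n\nDistributed under the {{MIT}} License. See LICENSE for more information."
plain, phrases = parse_rule(rule)
print(phrases)  # [['mit']]

apache = ("## License\n\nThis SDK is distributed under the Apache License, "
          "Version 2.0, see LICENSE for more information.")
mit = "## License\n\nDistributed under the MIT License. See LICENSE for more information."
print(passes_required_phrases(phrases, apache))  # False
print(passes_required_phrases(phrases, mit))     # True
```

A multi-word phrase such as `{{Apache License, Version 2.0}}` would be handled the same way, by requiring the contiguous token run `apache license version 2 0`.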
A less appealing alternative would be to list these mandatory keywords/phrases as an attribute in the .yml file:
license_expression: mit
is_license_notice: yes
relevance: 100
referenced_filenames:
- LICENSE
key_phrases:
- MIT

or something to that effect.
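The .yml variant could be applied as a post-filter over candidate matches. In this Python sketch a plain dictionary stands in for the parsed .yml file (so no YAML library is needed), `key_phrases` is the proposed attribute rather than an existing one, and `filter_matches` is a hypothetical helper. The first candidate reuses the matched text from the scan result above; the second is a made-up genuine MIT notice.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sequence(tokens: list[str], phrase: list[str]) -> bool:
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# Stand-in for the parsed mit_923 .yml data; `key_phrases` is the proposed,
# hypothetical attribute.
RULES = {
    "mit_923.RULE": {
        "license_expression": "mit",
        "is_license_notice": True,
        "relevance": 100,
        "referenced_filenames": ["LICENSE"],
        "key_phrases": ["MIT"],
    },
}


def filter_matches(matches: list[dict]) -> list[dict]:
    """Drop every match whose rule declares key_phrases that are not all
    present in the matched text; rules without key_phrases are unaffected."""
    kept = []
    for match in matches:
        rule = RULES.get(match["rule"], {})
        tokens = tokenize(match["matched_text"])
        phrases = [tokenize(p) for p in rule.get("key_phrases", [])]
        if all(contains_sequence(tokens, p) for p in phrases):
            kept.append(match)
    return kept


matches = [
    {"rule": "mit_923.RULE",
     "matched_text": "License\n\n[This] [SDK] [is] distributed under the [Apache] "
                     "License, [Version] [2].[0], see LICENSE for more information."},
    {"rule": "mit_923.RULE",
     "matched_text": "License\n\nDistributed under the MIT License. "
                     "See LICENSE for more information."},
]
print(len(filter_matches(matches)))  # 1
```

Keeping the phrases in metadata leaves the rule text untouched, at the cost of the phrases drifting out of sync with that text if the rule is later edited.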
Example/Links if Any
Can you help with this Feature
I have spent zero time in the codebase, but if given a proper introduction I suppose I might be able to help out.