YaraX Analyzer with Yara-Forge Rule Repository integration #2980

Merged
mlodic merged 5 commits into develop from yarax
Nov 5, 2025

Conversation

Member

@spoiicy spoiicy commented Aug 29, 2025

closes #2592 and #2589

Description

This PR adds a new YaraX analyzer along with integration of the yara-forge rule repository for enhanced ruleset selection.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist

  • I have read and understood the rules about how to Contribute to this project
  • The pull request is for the branch develop
  • A new plugin (analyzer, connector, visualizer, playbook, pivot or ingestor) was added or changed, in which case:
    • I strictly followed the documentation "How to create a Plugin"
    • Usage file was updated. A link to the PR to the docs repo has been added as a comment here.
    • Advanced-Usage was updated (in case the plugin provides additional optional configuration). A link to the PR to the docs repo has been added as a comment here.
    • I have dumped the configuration from Django Admin using the dumpplugin command and added it in the project as a data migration. ("How to share a plugin with the community")
    • If a File analyzer was added and it supports a mimetype which is not already supported, you added a sample of that type inside the archive test_files.zip and you added the default tests for that mimetype in test_classes.py.
    • If you created a new analyzer and it is free (does not require any API key), please add it in the FREE_TO_USE_ANALYZERS playbook by following this guide.
    • Check if it could make sense to add that analyzer/connector to other freely available playbooks.
    • I have provided the resulting raw JSON of a finished analysis and a screenshot of the results.
    • If the plugin interacts with an external service, I have created an attribute called precisely url that contains this information. This is required for Health Checks (HEAD HTTP requests).
    • If the plugin requires mocked testing, _monkeypatch() was used in its class to apply the necessary decorators.
    • I have added that raw JSON sample to the MockUpResponse of the _monkeypatch() method. This serves us to provide a valid sample for testing.
    • I have created the corresponding DataModel for the new analyzer following the documentation
  • I have inserted the copyright banner at the start of the file: # This file is a part of IntelOwl https://github.com/intelowlproject/IntelOwl # See the file 'LICENSE' for copying permission.
  • Please avoid adding new libraries as requirements whenever it is possible. Use new libraries only if strictly needed to solve the issue you are working for. In case of doubt, ask a maintainer permission to use a specific library.
  • If external libraries/packages with restrictive licenses were added, they were added in the Legal Notice section.
  • Linters (Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved (see tests folder). All the tests (new and old ones) gave 0 errors.
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.
  • After you have submitted the PR, if DeepSource, Django Doctors or other third-party linters have triggered any alerts during the CI checks, I have solved those alerts.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • Every time you make changes to the PR and you think the work is done, you should explicitly ask for a review by using GitHub's reviewing system detailed here.

Member Author

spoiicy commented Aug 29, 2025

JSON result when a valid match is found

{
  "report": [
    {
      "rule_metadata": {
        "id": "09a400f5-e837-58c2-9b51-9213c8ab0883",
        "date": "2024-01-01",
        "hash": "3a9ee09ed965e3aee677043ba42c7fdbece0150ef9d1382c518b4b96bbd0e442",
        "tags": "FILE",
        "score": 50,
        "author": "Jonathan Peters",
        "quality": 80,
        "modified": "2024-01-03",
        "reference": "https://www.gapotchenko.com/eazfuscator.net",
        "logic_hash": "5f3f3358e3cfb274aa2e8465dde58a080f9fb282aa519885b9d39429521db6d9",
        "source_url": "https://github.com/cod3nym/detection-rules//blob/5939dadd34ebd3c111f97ba0bc0085b639e142a5/yara/dotnet/obf_eazfuscator.yar#L1-L28",
        "description": "Detects .NET images obfuscated with Eazfuscator string encryption. Eazfuscator is a widely used commercial obfuscation solution used by both legitimate software and malware.",
        "license_url": "https://github.com/cod3nym/detection-rules//blob/5939dadd34ebd3c111f97ba0bc0085b639e142a5/LICENSE.md"
      },
      "pattern_details": [
        {
          "match_details": [
            {
              "match_length": 10,
              "match_offset": 249641,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$sa1"
        },
        {
          "match_details": [
            {
              "match_length": 10,
              "match_offset": 249652,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$sa2"
        },
        {
          "match_details": [
            {
              "match_length": 5,
              "match_offset": 271822,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$sa3"
        },
        {
          "match_details": [
            {
              "match_length": 4,
              "match_offset": 263900,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$sa4"
        },
        {
          "match_details": [
            {
              "match_length": 27,
              "match_offset": 44444,
              "match_xor_key": null
            },
            {
              "match_length": 27,
              "match_offset": 44873,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$op1"
        },
        {
          "match_details": [],
          "pattern_identifier": "$op2"
        },
        {
          "match_details": [
            {
              "match_length": 9,
              "match_offset": 43142,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43165,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43202,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43232,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43262,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43292,
              "match_xor_key": null
            },
            {
              "match_length": 9,
              "match_offset": 43322,
              "match_xor_key": null
            }
          ],
          "pattern_identifier": "$op3"
        },
        {
          "match_details": [],
          "pattern_identifier": "$op4"
        }
      ],
      "rule_identifier": "COD3NYM_SUSP_OBF_NET_Eazfuscator_String_Encryption_Jan24"
    }
  ],
  "data_model": null,
  "errors": [],
  "parameters": {
    "rule_set": "full"
  }
}

Member

@mlodic mlodic left a comment

:) good job, few things to address

"name": "YaraX",
"description": "[YaraX](https://virustotal.github.io/yara-x/docs/intro/getting-started/) is a re-incarnation of YARA, a pattern matching tool designed with malware researchers in mind. This new incarnation intends to be faster, safer and more user-friendly than its predecessor.",
"disabled": False,
"soft_time_limit": 60,
Member

I don't know how it performs. It should be fast, but we could set this to a higher value to be cautious.

Member Author

In most of my testing the analyzer finished in around 30 seconds, but sure, I can raise this to a higher value.

Member

yeah please do that if you can

return True

except Exception as e:
    logger.error(f"Failed to update yara-forge rules. Error: {e}")
Member

Use logger.exception so that we can have the traceback. Also, please add this message to self.report as suggested in the other PRs. (The addition to self.report is automatically handled by AnalyzerRunException. Another option could be to just raise the AnalyzerRunException directly here.)

Member Author

Done
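
The suggestion in this thread can be sketched roughly as follows. This is a minimal illustration, not the merged code: `AnalyzerRunException` is a local stand-in for the project's exception class, and `download_rules` and `update_rules` are hypothetical names used only to exercise the error path.

```python
import logging

logger = logging.getLogger(__name__)


class AnalyzerRunException(Exception):
    """Local stand-in for the project's exception class (assumption)."""


def download_rules(rule_set: str) -> None:
    """Hypothetical download step; it always fails here to show the error path."""
    raise ConnectionError(f"could not fetch the {rule_set} rule set")


def update_rules(rule_set: str) -> bool:
    try:
        download_rules(rule_set)
        return True
    except Exception as e:
        # logger.exception logs at ERROR level and appends the traceback,
        # which logger.error with an f-string does not.
        logger.exception("Failed to update yara-forge rules")
        # Raising instead of returning False surfaces the failure to the caller.
        raise AnalyzerRunException(f"Failed to update yara-forge rules: {e}") from e
```

Chaining with `from e` keeps the original exception available as `__cause__`, so the root error is not lost when the wrapper is raised.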


rule_dir = f"{BASE_RULES_LOCATION}/{self.rule_set}"
if not os.path.isdir(rule_dir) and not self.update(rule_set=self.rule_set):
    logger.info(f"Failed to update {self.rule_set} rule set")
Member

this logger.info is not necessary because the AnalyzerRunException will already trigger a log.error message

Member Author

Removed it.

)

rule_dir = f"{BASE_RULES_LOCATION}/{self.rule_set}"
if not os.path.isdir(rule_dir) and not self.update(rule_set=self.rule_set):
Member

Is there a way to know whether these rules have a new version, like for yara, or is that not possible?

If it is not possible, it would be nice to add an additional parameter called "force packages download" or something similar to force the download of these packages even if they are already present. Otherwise there is no chance to get the new content. This parameter would be false by default. Thoughts?

Member Author

@spoiicy spoiicy Aug 29, 2025

I think this can be done by creating an entry in a simple model to track the last_downloaded_version. When the rules are due to be updated in the future, the latest version can be cross-checked against the db entry.

If you visit this URL, you can see the tag_name, which we can use to achieve this.

Let me know your thoughts on this implementation :)

Member

good proposal, feel free to do that thanks ;)
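
The version check proposed in this thread could look roughly like the sketch below. It assumes the release information has already been fetched and decoded into a dict carrying the `tag_name` field mentioned above. `needs_update`, `last_downloaded_version` as a function argument, and `force_download` are hypothetical names, and the storage of the last version (for example the small model proposed above) is left out.

```python
from typing import Optional


def needs_update(
    release: dict,
    last_downloaded_version: Optional[str],
    force_download: bool = False,
) -> bool:
    """Decide whether the rule packages should be downloaded again.

    `release` is assumed to be decoded release JSON with a `tag_name` key.
    """
    if force_download:
        # Mirrors the "force packages download" idea: skip the comparison.
        return True
    latest = release.get("tag_name")
    if latest is None:
        # Nothing to compare against; re-download rather than risk stale rules.
        return True
    return latest != last_downloaded_version
```

Keeping the decision in a pure function makes it easy to test without network access; the caller would store the new `tag_name` only after a successful download.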

rules = compiler.build()
logger.info("Successfully compiled and built rules")

logger.info(f"Starting scanning {self.filename} with {self.rule_set} rules")
Member

can you log the self.md5 too please?

Member Author

Done


logger.info(f"Successfully scanned {self.filename} with hash {self.md5}")

return "No Match" if not result else result
Member

I would always return the same type, specifically a JSON object. You can create a dict with a single key called "results"; when there are no matches, an empty list would be enough.

Member Author

I can definitely create a dict with key "results" but I would like to return the following

{"results": "No Match"}

as the result, since this would explicitly inform the user that there were no matches instead of passing an empty list, which can be vague.
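
For comparison, the reviewer's variant (same type in every case) can be sketched as below. This is only an illustration of that suggestion under the assumption that matches are collected as a list of dicts; `build_report` is a hypothetical helper, and the thread does not state which variant was finally merged.

```python
from typing import Any, Dict, List


def build_report(matches: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap scan matches so the report always has the same shape."""
    # An empty list means "no match"; the value is never a string.
    return {"results": list(matches)}
```

With this shape a consumer can write `if report["results"]:` without first checking whether the value is a string or a list, which is the trade-off against the more explicit "No Match" string.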

Contributor

@fgibertoni fgibertoni left a comment

Small comments from me also, nice work overall!

for rule in scan_results.matching_rules:
    logger.info(f"Rule Identifier: {rule.identifier}")
    rule_metadata = {}
    for detail in rule.metadata:
Contributor

Maybe unpack them in the for loop to improve readability?

Suggested change
- for detail in rule.metadata:
+ for identifier, value in rule.metadata:

Contributor

Plus, if rule.metadata is a list of two-element tuples, you can remove the for loop by calling dict() directly:

>>> test = [("value", "data"), ("value2", "data2")]
>>> dict(test)
{'value': 'data', 'value2': 'data2'}

Member Author

Sounds good, I'll do it this way.
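
Applied to the loop under review, the `dict()` idea might look like the sketch below, which builds one entry in the shape of the JSON result posted earlier. The `Rule`, `Pattern` and `Match` dataclasses are stand-ins written for this example (their attribute names are assumptions, not a confirmed library API), and `rule_to_dict` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Match:
    offset: int
    length: int
    xor_key: Optional[int] = None


@dataclass
class Pattern:
    identifier: str
    matches: List[Match] = field(default_factory=list)


@dataclass
class Rule:
    identifier: str
    metadata: List[Tuple[str, Any]]
    patterns: List[Pattern]


def rule_to_dict(rule: Rule) -> dict:
    return {
        "rule_identifier": rule.identifier,
        # dict() consumes the (identifier, value) pairs directly, no loop needed.
        "rule_metadata": dict(rule.metadata),
        "pattern_details": [
            {
                "pattern_identifier": pattern.identifier,
                "match_details": [
                    {
                        "match_offset": m.offset,
                        "match_length": m.length,
                        "match_xor_key": m.xor_key,
                    }
                    for m in pattern.matches
                ],
            }
            for pattern in rule.patterns
        ],
    }


# Values taken from the JSON result above, trimmed to two patterns.
example = Rule(
    identifier="COD3NYM_SUSP_OBF_NET_Eazfuscator_String_Encryption_Jan24",
    metadata=[("score", 50), ("author", "Jonathan Peters")],
    patterns=[Pattern("$sa1", [Match(offset=249641, length=10)]), Pattern("$op2")],
)
entry = rule_to_dict(example)
```

A pattern with no matches, such as `$op2`, ends up with an empty `match_details` list, as in the posted result.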

@spoiicy spoiicy marked this pull request as ready for review September 7, 2025 13:44
Contributor

@code-review-doctor code-review-doctor bot left a comment

Some food for thought. View full project report here.

@spoiicy spoiicy requested review from fgibertoni and mlodic September 7, 2025 13:45
Contributor

@fgibertoni fgibertoni left a comment

Great improvement! I'll also let @mlodic and @drosetti have a look at this before merging 😄

Member

@mlodic mlodic left a comment

there is some shared code with the Capa PR. It could make sense to merge that one first, then import the changes here and re-use the same functions (with a Mixin as helper). Other than that, changes are fine, good job!

Member Author

spoiicy commented Sep 15, 2025

> there is some shared code with the Capa PRs. It could make sense to merge that one first, then import the changes here and re-use the same functions (with a Mixin as helper). Other than that, changes are fine, good job!

Sure, this can definitely be done. I'll create the helper methods and make the necessary changes to the existing code, so that we can have more modular code.

@github-actions

This pull request has been marked as stale because it has had no activity for 10 days. If you are still working on this, please provide some updates or it will be closed in 5 days.

@github-actions github-actions bot added the stale label Sep 26, 2025
@fgibertoni fgibertoni added keep-open To avoid workflow closing PRs and removed stale labels Sep 29, 2025
Contributor

@code-review-doctor code-review-doctor bot left a comment

Worth considering. View full project report here.

rules_file_path = self.get_rule_location()
logger.info(f"Found rules at {rules_file_path}")

with open(rules_file_path, mode="r") as f:
Contributor

UnicodeDecodeError can occur if the content of the file has characters incompatible with the OS's default encoding. Python uses the OS's default text encoding on the content because encoding is not set. Read more.
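
A minimal sketch of the fix this comment points at: pass `encoding` explicitly so the read does not depend on the OS default. It assumes the rules file is UTF-8 encoded; `read_rules` and the temporary `example.yar` file are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile


def read_rules(path: str) -> str:
    # An explicit encoding makes the read independent of the OS default.
    with open(path, mode="r", encoding="utf-8") as f:
        return f.read()


# Write a rule containing a non-ASCII character, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    rules_path = os.path.join(tmp, "example.yar")
    with open(rules_path, mode="w", encoding="utf-8") as f:
        f.write('rule example { meta: author = "José" condition: true }\n')
    content = read_rules(rules_path)
```

If the rule files are not guaranteed to be UTF-8, reading in binary mode or choosing an error handler such as `errors="replace"` are alternatives, at the cost of possibly altering the text.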

@spoiicy spoiicy requested a review from fgibertoni November 3, 2025 13:13
@mlodic mlodic self-requested a review November 4, 2025 14:49
@mlodic mlodic merged commit eb813e3 into develop Nov 5, 2025
11 checks passed