
Consolidate PRs into single branch #219

Merged: plusvic merged 10 commits into VirusTotal:master from wxsBSD:next on Dec 12, 2022


@wxsBSD (Contributor) commented Dec 8, 2022

This PR consolidates #217 (add a modules list to the yara object) and #210 (support xor_value in returned strings) to make merging them easier. The xor_value work will break many existing scripts that use yara-python, because string matches are no longer returned as tuples; each match now has its own object. I think this is worth doing because it allows for better extensibility in the future.

It also updates the yara submodule to 65feb41d, which is the latest in master as of this writing.

I also fixed up a test that was broken after a change to non-ascii bytes in regex in yara.
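For scripts affected by the move from tuples to match objects described above, a small compatibility helper may ease migration. The following is only a sketch under assumptions, not code from this PR: it assumes the old entries were `(offset, identifier, data)` tuples, takes the `identifier`, `instances`, `offset` and `matched_data` names from the commit messages, and uses `SimpleNamespace` stand-ins so it runs without yara-python installed. `iter_string_hits` is a hypothetical helper name.

```python
from types import SimpleNamespace


def iter_string_hits(strings):
    """Yield (identifier, offset, data) from either result shape.

    Assumption: old-style entries are (offset, identifier, data) tuples;
    new-style entries expose .identifier and .instances.
    """
    for entry in strings:
        if isinstance(entry, tuple):
            offset, identifier, data = entry[:3]
            yield identifier, offset, data
        else:
            for inst in entry.instances:
                yield entry.identifier, inst.offset, inst.matched_data


# Stand-ins for the two shapes (not real yara-python objects).
old_style = [(10, "$a", b"abc")]
new_style = [
    SimpleNamespace(
        identifier="$a",
        instances=[SimpleNamespace(offset=10, matched_data=b"abc")],
    )
]

for identifier, offset, data in iter_string_hits(new_style):
    print(identifier, offset, data)
```

Calling code that goes through such a helper would see the same triples regardless of which yara-python version produced the results.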

wxsBSD added 8 commits July 22, 2022 21:30
Extend the tuple that represents an instance of a match to include the xor key.
This breaks all existing scripts that are unpacking the tuple, which I'm not
very happy with.

This also updates the submodule to use the latest master so that I can get the
new xor key values.

Also adds a fix to get yara building here by defining BUCKETS_128 and
CHECKSUM_1B, as needed by the new tlsh stuff (discussed with @metthal).

Add a StringMatch object, which represents a matched string. It has an
identifier member (the string identifier, e.g. $a) and an instances
member, which contains a list of matched string instances.

It also keeps track of the string flags internally but does not expose them
directly, as the string flags contain things that are internal to YARA (e.g.
STRING_FLAGS_FITS_IN_ATOM). It keeps track of the string flags so that it
can be extended to let users take action based upon certain flags. For
example, there is an "is_xor()" method on StringMatch which returns True if
the string is using the xor modifier. This way users can call another method
(discussed below) to get the plaintext string back.

Add a StringMatchInstance object which represents an instance of a matched
string. It contains the offset, matched data and the xor key used to match the
string (this is ALWAYS set, even to 0 if the string is not an xor string).

There is a "plaintext()" method on the StringMatchInstance objects which will
return a new bytes object with the xor key applied. This allows users to do
something like this:

```
print(instance.plaintext() if string.is_xor() else instance.matched_data)
```

Technically, the plaintext() method returns the matched_data unchanged if the
xor_key is 0, so the conditional is not required, but it gives users a nice way
to know whether the xor_key is worth recording along with the plaintext.
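The behaviour described for plaintext() can be modelled in a few lines of pure Python. This is a minimal sketch under assumptions, not the C implementation from this PR: `InstanceSketch` is a hypothetical stand-in for StringMatchInstance, and it assumes the xor key is a single byte applied to every byte of the matched data.

```python
from dataclasses import dataclass


@dataclass
class InstanceSketch:
    """Hypothetical stand-in for StringMatchInstance (not the real class)."""

    offset: int
    matched_data: bytes
    xor_key: int = 0  # described as always set; 0 for non-xor strings

    def plaintext(self) -> bytes:
        # XOR every byte with the single-byte key; a key of 0 is a no-op.
        return bytes(b ^ self.xor_key for b in self.matched_data)


encoded = bytes(b ^ 0x41 for b in b"hello")
xored = InstanceSketch(offset=16, matched_data=encoded, xor_key=0x41)
plain = InstanceSketch(offset=32, matched_data=b"hello")

print(xored.plaintext())  # b'hello'
print(plain.plaintext())  # b'hello'
```

Because a key of 0 leaves the bytes untouched, calling plaintext() unconditionally gives the same result as the conditional form shown in the commit message.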

I decided not to implement richcompare for these new objects, as it isn't
entirely clear what the comparison should be based on.

Add a "matched_length" member to match instances. This is useful when the
"matched_data" member holds only part of the data that actually matched.

Add a test for this that sets the max_match_data config to 2 and then checks
that the "matched_length" and "matched_data" members are correct.
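One way a caller might use the two members together is to detect when the stored bytes were cut short. The sketch below assumes only the relationship described in this commit (matched_length is the real length, matched_data may be shorter); `is_truncated` is a hypothetical helper, and the objects are `SimpleNamespace` stand-ins rather than real match instances.

```python
from types import SimpleNamespace


def is_truncated(instance) -> bool:
    """Return True when fewer bytes were stored than actually matched."""
    return instance.matched_length > len(instance.matched_data)


# Stand-ins: a 4-byte match stored in full, and the same match with the
# stored data capped at 2 bytes (as in the test described above).
full = SimpleNamespace(matched_data=b"abcd", matched_length=4)
capped = SimpleNamespace(matched_data=b"ab", matched_length=4)

print(is_truncated(full))    # False
print(is_truncated(capped))  # True
```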

Add support for getting the list of available modules. It is exposed through
the yara.modules attribute, which contains a list of the available module
names.

>>> print('\n'.join(yara.modules))
tests
pe
elf
math
time
console
>>>

Note: This commit also brings in the defines needed to build the authenticode
parser, which is also done in the xor_value branch. It also updates the yara
submodule, which will likely overwrite the changes made in the xor_value
branch, so I recommend updating the submodule after both are merged.
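The module list added in this commit could be used to check, before compiling rules, whether the modules they import are present in the build. This is a sketch under assumptions: `missing_modules` is a hypothetical helper, the `available` list is copied from the sample output above instead of being read from a live `yara.modules`, and `hash` is used only as an example of a module name that is absent from that list.

```python
def missing_modules(required, available):
    """Return the required module names that are not in the available list."""
    available = set(available)
    return [name for name in required if name not in available]


# Copied from the sample output above; a real script would pass yara.modules.
available = ["tests", "pe", "elf", "math", "time", "console"]

print(missing_modules(["pe", "hash"], available))   # ['hash']
print(missing_modules(["pe", "math"], available))   # []
```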
@plusvic (Member) left a comment

The tests are failing on Windows. The build fails with:

c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(61): error C2065: 'TLSH_CHECKSUM_LEN': undeclared identifier
c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(61): error C2057: expected constant expression
c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(62): error C2229: struct '<unnamed-tag>' has an illegal zero-sized array
c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(72): error C2065: 'CODE_SIZE': undeclared identifier
c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(72): error C2057: expected constant expression
c:\projects\yara-python\yara\libyara\tlshc\tlsh_impl.h(81): error C2229: struct 'TlshImpl' has an illegal zero-sized array

The errors don't look related to this PR; the build was already failing with previous commits.

@wxsBSD (Contributor, Author) commented Dec 8, 2022

The compiler needs whatever the Windows equivalent of -DBUCKETS_128=1 and -DCHECKSUM_1B=1 is. These defines are required to build the TLSH and authenticode parser code that Avast contributed.
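For reference, MSVC's `cl` spells the macro-definition option as `/D`, so the same two defines can be written per compiler as sketched below. `define_flags` is a hypothetical helper written for illustration, and nothing here is taken from this repository's build scripts.

```python
def define_flags(macros, msvc):
    """Render (name, value) pairs as compiler options.

    Uses /D for MSVC's cl and -D for GCC/Clang-style drivers.
    """
    prefix = "/D" if msvc else "-D"
    return [f"{prefix}{name}={value}" for name, value in macros]


MACROS = [("BUCKETS_128", "1"), ("CHECKSUM_1B", "1")]

print(define_flags(MACROS, msvc=False))  # ['-DBUCKETS_128=1', '-DCHECKSUM_1B=1']
print(define_flags(MACROS, msvc=True))   # ['/DBUCKETS_128=1', '/DCHECKSUM_1B=1']
```

In a setuptools-based build, passing the pairs through the `define_macros` argument of `Extension` is a compiler-independent alternative; whether this project's setup.py is wired that way is not stated in this thread.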

@plusvic plusvic merged commit 65378d4 into VirusTotal:master Dec 12, 2022
@wxsBSD wxsBSD deleted the next branch December 12, 2022 17:06
plusvic added a commit that referenced this pull request Mar 31, 2023
The previous example was out of date due to changes in the API implemented in version 4.3.0 (#219)