Skip to content

Support xor_value in returned strings.#210

Closed
wxsBSD wants to merge 3 commits intoVirusTotal:masterfrom
wxsBSD:xor_value
Closed

Support xor_value in returned strings.#210
wxsBSD wants to merge 3 commits intoVirusTotal:masterfrom
wxsBSD:xor_value

Conversation

@wxsBSD
Copy link
Contributor

@wxsBSD wxsBSD commented Jul 23, 2022

Extend the tuple that represents an instance of a match to include the xor key.
This breaks all existing scripts that are unpacking the tuple, which I'm not
very happy with.

This also updates the submodule to use the latest master so that I can get the
new xor key values.

Also, adds a fix to get yara building here by defining BUCKETS_128 and
CHECKSUM_1B as needed by the new tlsh stuff (discussed with @metthal).

Extend the tuple that represents an instance of a match to include the xor key.
This breaks all existing scripts that are unpacking the tuple, which I'm not
very happy with.

This also updates the submodule to use the latest master so that I can get the
new xor key values.

Also, adds a fix to get yara building here by defining BUCKETS_128 and
CHECKSUM_1B as needed by the new tlsh stuff (discussed with @metthal).
@wxsBSD
Copy link
Contributor Author

wxsBSD commented Jul 23, 2022

I'm not super happy with just extending the tuple here as it will break existing scripts that are unpacking the tuple in assignment. They will have to go from (offset, identifier, data) = ... to (offset, identifier, data, xor) = .... The only scripts that won't break are those that do tup = ...; tup[2] which I don't think would be very common.

Since this is going to break a lot of scripts, I wonder if it makes sense to completely remove the tuple entirely and replace it with an actual object with members instead. Doing so would make it more extensible in the future. I could even support a plaintext method that will take the matched data, apply the xor key and return the plaintext string automatically.

Assuming this PR (or some variant of it) is a good idea I'll update the docs with whatever is decided after it is merged.

Copy link
Contributor

@metthal metthal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I understand keeping the rules backwards compatible at all cost, but I wouldn't be afraid of breaking compatibility with scripts or other libyara based tools if it's properly reflected in the version number.

On that note, I really like your idea of replacing the tuple with an actual object, or at least a named tuple. It would make it much more easier to add new fields in the future without worrying about breaking user scripts.

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Jul 25, 2022

I'm going to update this to use an actual object so it is more extensible in the future.

Add a StringMatch object, which represents a matched string. It has an
identifier member (this is the string identifier, eg: $a) and an instances
member which contains a list of matched string instances.

It also keeps track of the string flags internally but does not expose them
directly as the string flags contain things that are internal to YARA (eg:
STRING_FLAGS_FITS_IN_ATOM). The reason it keeps track of the string modifiers
is so that it can be extended to allow users to take action based upon certain
flags. For example, there is a "is_xor()" member on StringMatch which will
return True if the string is using the xor modifier. This way users can call
another method (discussed below) to get the plaintext string back.

Add a StringMatchInstance object which represents an instance of a matched
string. It contains the offset, matched data and the xor key used to match the
string (this is ALWAYS set, even to 0 if the string is not an xor string).

There is a "plaintext()" method on the StringMatchInstance objects which will
return a new bytes object with the xor key applied. This allows users to do
something like this:

```
print(instance.plaintext() if string.is_xor() else instance.matched_data)
```

Technically, the plaintext() method will return the matched_data if the xor_key
is 0 so they don't need to do the conditional but this allows them a nice way to
know if the xor_key is worth recording along with the plaintext.

I decided not to implement richcompare for these new objects as it isn't
entirely clear what I would want to do the comparison on.
@wxsBSD
Copy link
Contributor Author

wxsBSD commented Jul 29, 2022

The commit I just made gives more detail on the changes. I'd love to hear more about what I should do with richcompare for the new objects. It is unclear to me how I want to compare two strings, or two string instances, so I left them out for now. I can revisit that after some more discussion I think.

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Jul 29, 2022

One more thing, I noticed the tests were always using self.assertTrue() when it is much nicer to use the other assertions. I switched my tests in this PR to use self.assertEqual() (or other things if needed) and will go through all the other tests in a future PR if desired. It makes it much easier to debug failed tests when it tells you the failed values (instead of just "False is not true" messages =b).

Add a "matched_length" member to match instances. This is useful when the
"matched_data" member is a subset of the actually matched data.

Add a test for this that sets the max_match_data config to 2 and then checks to
make sure the "matched_length" and "matched_data" members are correct.
@wxsBSD
Copy link
Contributor Author

wxsBSD commented Aug 22, 2022

I'll update the docs in the main yara repo once this is merged.

@wxsBSD
Copy link
Contributor Author

wxsBSD commented Dec 8, 2022

Closing out as it has been merged into my "next" branch for inclusion.

@wxsBSD wxsBSD closed this Dec 8, 2022
@wxsBSD wxsBSD deleted the xor_value branch December 12, 2022 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants