Support xor_value in returned strings. #210
Extend the tuple that represents an instance of a match to include the xor key. This breaks all existing scripts that are unpacking the tuple, which I'm not very happy with. This also updates the submodule to use the latest master so that I can get the new xor key values. Also, adds a fix to get yara building here by defining BUCKETS_128 and CHECKSUM_1B as needed by the new tlsh stuff (discussed with @metthal).
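To make the breakage concrete, here is a minimal sketch of what happens to a script that unpacks the match tuple positionally. The sample values are made up, and the position of the xor key as a fourth element is an assumption for illustration rather than something stated in this PR.

```python
# Hypothetical match instances; the values are invented for illustration.
old_instance = (81, "$a", b"abc")        # (offset, identifier, data)
new_instance = (81, "$a", b"abc", 0x20)  # assumed layout with the xor key appended

# Existing scripts typically unpack all three fields in one assignment.
offset, identifier, data = old_instance

# The same assignment against the extended tuple raises ValueError.
try:
    offset, identifier, data = new_instance
    unpack_error = None
except ValueError as exc:
    unpack_error = exc

print(unpack_error)
```

Scripts that index into the tuple (`instance[0]`, `instance[2]`) would keep working; only unpacking in assignment is affected.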
|
I'm not super happy with just extending the tuple here as it will break existing scripts that are unpacking the tuple in assignment. They will have to go from Since this is going to break a lot of scripts, I wonder if it makes sense to completely remove the tuple entirely and replace it with an actual object with members instead. Doing so would make it more extensible in the future. I could even support a Assuming this PR (or some variant of it) is a good idea I'll update the docs with whatever is decided after it is merged. |
metthal
left a comment
Personally, I understand keeping the rules backwards compatible at all costs, but I wouldn't be afraid of breaking compatibility with scripts or other libyara-based tools if it's properly reflected in the version number.
On that note, I really like your idea of replacing the tuple with an actual object, or at least a named tuple. It would make it much easier to add new fields in the future without worrying about breaking user scripts.
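As a rough sketch of the named tuple idea: code that reads fields by name is unaffected when a field is added. The type names `MatchV1` and `MatchV2` and the helper `describe` are hypothetical and not part of yara-python.

```python
from collections import namedtuple

# Hypothetical types standing in for the match tuple before and after the change.
MatchV1 = namedtuple("MatchV1", ["offset", "identifier", "data"])
MatchV2 = namedtuple("MatchV2", ["offset", "identifier", "data", "xor_key"])


def describe(match):
    # Uses attribute access only, so it does not care how many fields exist.
    return f"{match.identifier} at {match.offset}: {match.data!r}"


v1 = MatchV1(81, "$a", b"abc")
v2 = MatchV2(81, "$a", b"abc", 0x20)

print(describe(v1))
print(describe(v2))
```

Positional unpacking of a named tuple would still break when a field is added, which is one argument for a full object instead.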
|
I'm going to update this to use an actual object so it is more extensible in the future. |
Add a StringMatch object, which represents a matched string. It has an identifier member (this is the string identifier, e.g. $a) and an instances member which contains a list of matched string instances. It also keeps track of the string flags internally but does not expose them directly, as the string flags contain things that are internal to YARA (e.g. STRING_FLAGS_FITS_IN_ATOM).

The reason it keeps track of the string modifiers is so that it can be extended to allow users to take action based upon certain flags. For example, there is an "is_xor()" member on StringMatch which will return True if the string is using the xor modifier. This way users can call another method (discussed below) to get the plaintext string back.

Add a StringMatchInstance object which represents an instance of a matched string. It contains the offset, matched data and the xor key used to match the string (this is ALWAYS set, even to 0 if the string is not an xor string).

There is a "plaintext()" method on the StringMatchInstance objects which will return a new bytes object with the xor key applied. This allows users to do something like this:

```
print(instance.plaintext() if string.is_xor() else instance.matched_data)
```

Technically, the plaintext() method will return the matched_data if the xor_key is 0, so they don't need to do the conditional, but this allows them a nice way to know if the xor_key is worth recording along with the plaintext.

I decided not to implement richcompare for these new objects as it isn't entirely clear what I would want to do the comparison on.
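The behaviour described in this commit message can be modelled in pure Python as below. This is only an illustrative model, not the extension's real implementation: the constructor signatures and the `uses_xor` argument are assumptions, standing in for the string flags the real object tracks internally.

```python
class StringMatchInstance:
    """Model of one matched instance: offset, matched data and xor key."""

    def __init__(self, offset, matched_data, xor_key=0):
        self.offset = offset
        self.matched_data = matched_data
        self.xor_key = xor_key  # 0 when the string is not an xor string

    def plaintext(self):
        # XOR every byte with the key; a key of 0 leaves the data unchanged.
        return bytes(b ^ self.xor_key for b in self.matched_data)


class StringMatch:
    """Model of a matched string: identifier plus its instances."""

    def __init__(self, identifier, instances, uses_xor=False):
        self.identifier = identifier
        self.instances = instances
        self._uses_xor = uses_xor  # assumed stand-in for the internal flags

    def is_xor(self):
        return self._uses_xor


# Build sample data by XOR-encoding a known plaintext with key 0x41.
encoded = bytes(b ^ 0x41 for b in b"secret")
string = StringMatch("$a", [StringMatchInstance(10, encoded, 0x41)], uses_xor=True)

for instance in string.instances:
    print(instance.plaintext() if string.is_xor() else instance.matched_data)
```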
|
The commit I just made gives more detail on the changes. I'd love to hear more about what I should do with richcompare for the new objects. It is unclear to me how I want to compare two strings, or two string instances, so I left them out for now. I can revisit that after some more discussion I think. |
|
One more thing, I noticed the tests were always using |
Add a "matched_length" member to match instances. This is useful when the "matched_data" member is a subset of the actually matched data. Add a test for this that sets the max_match_data config to 2 and then checks to make sure the "matched_length" and "matched_data" members are correct.
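The relationship between the two members can be sketched as follows. The helper `record_match` is hypothetical and only mimics the truncation described in this commit message; it does not call into YARA.

```python
def record_match(full_match, max_match_data):
    # Hypothetical helper: keep at most max_match_data bytes of the match,
    # but remember how long the full match actually was.
    return {
        "matched_data": full_match[:max_match_data],
        "matched_length": len(full_match),
    }


instance = record_match(b"abcdef", max_match_data=2)
print(instance["matched_data"], instance["matched_length"])
```

With a limit of 2, the stored data is shorter than the real match, so "matched_length" is the only way to recover the true size.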
|
I'll update the docs in the main yara repo once this is merged. |
|
Closing out as it has been merged into my "next" branch for inclusion. |