fix the TypeAssigner/TypeResolver#869
Merged
swissiety merged 27 commits intosoot-oss:developfrom Feb 27, 2024
Merged
Conversation
d8f0673 to
cd247e7
Compare
streams > loops
Since locals with the same name can't be differentiated in Jimple, they need to be treated the same, even when they are actually different locals. This is currently only possible because the bytecode to Jimple conversion tries to use the provided debug names of locals, but different locals (meaning local ids) might have the same name.
The `TypeAssigner`/`TypeResolver` previously wouldn't function correctly when a variable got assigned both primitive and non-primitive values. The top type fixes this. The top type is a superclass of all other types. This is similar to `java.lang.Object` but also includes primitive types. This type can't exist in Java source code, but it can implicitly exist in bytecode. This happens when the compiler re-uses local variables with the same id, but different types. If you see this type when you didn't expect it, you probably need to **turn on the `LocalSplitter`**. The `LocalSplitter` will remove all situations where a `TopType` could be created by the `TypeAssigner` (at least when the bytecode has been generated from Java source code).
Consider the statement `l1 = l2[l3]`, which contains the expression `l2[l3]`. When `l2` has a non-array type, it can't be known what the type of the array ref expression is. Because the result of the array access could be an object or a primitive, the top type has to be chosen here.
The `TypeResolver` needs this because it might create `TopType` array types.
…lue in one statement
`[int[]][0] = [char]` would previously try to find the least common ancestor between `int[]` and `char[]` which doesn't exist (or is just `TopType`). Now it correctly finds the common ancestor between `int` and `char`, which is `int`, and then turns that into an array type to arrive at `int[]`.
Correctly handling arrays that are `null` or get `null` assigned to an index.
`Integer1` can be used where a `boolean` is expected.
`replaceLocal` is really slow (especially when there are a lot of statements), but we can just replace all locals manually, because the `TypeResolver` already updates all relevant statements.
The old implementation was unnecessarily complicated and also produced a suboptimal number of casts in some cases.
How this has ever worked, I truly don't know.
`short` is an ancestor of `byte`. This is also seen in Figure 8 of the paper.
Can't upgrade the types of primitive arrays when they get assigned to, because primitive arrays don't form a type hierarchy (e.g. `byte[]` can't be assigned to `int[]` even though `byte` can be assigned to `int`).
- restrictions on the base type of an array now work correctly for all right-hand sides and not just locals (e.g. `array[index] = constant` will properly restrict the type of the array to something that can be assigned the type of the constant) - using `TypePromotionVisitor` multiple times works correctly now (`typingChanged` wasn't getting reset); not sure how this ever worked - the type promoter is now run before the cast counter and the underspecified type conversion; the type promoter now also promotes type restrictions that don't appear as assignments and were therefore ignored before; e.g. `if l0 == 0.0` requires `l0` to be a float/double even when `l0` never directly gets a float/double assigned to it; this should only be relevant without the `LocalSplitter` and for non-Java bytecode
aab00de to
9d13b1b
Compare
These could fail because of non-deterministic handling in the order of statements
1dd55d0 to
6c32446
Compare
swissiety
approved these changes
Feb 27, 2024
| @@ -99,7 +106,8 @@ public boolean isAncestor(@Nonnull Type ancestor, @Nonnull Type child) { | |||
| // TODO: [ms] check: the dimension condition check as it seems weird? | |||
Collaborator
There was a problem hiding this comment.
as you are currently in the topic - does this check make sense i.e. can we remove my comment?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR introduces a
TopTypethat gets used when no possible Java type can be determined for a variable. This generally happens when local variables get re-used by the compiler, which means this shouldn't happen when using theLocalSplitter1.E.g. a variable gets both
java.lang.Stringandbytevalues assigned to it; orintandboolean. In these cases there is no common type that the variable could be typed with (including casts), so it results inTopType.It is also possible to get a
TopType[]when a local with incompatible array types overlap. For that to work, I had to allow all types as base types forArrayTypebecause theTypeResolvermight need to create aTopTypearray type.Most of the issues with the
TypeResolverwere related to primitive types, augmented integer types, and arrays.In bytecode
boolean,byte,short,char, andintaren't different types. Figuring out correct and tight types of variables that fall in this category requires keeping track of the value ranges that get assigned to the variables. This caused a bunch of issues, since the augmented integers types that are used to track these value ranges weren't always handled correctly. Particularly with variables with abooleantype caused problems, since they aren't assignment-compatible with the rest of the small integer types.Another issue that combines primitives and arrays was that you can't upgrade the types of primitive arrays when they get assigned to. For normal reference types a
java.lang.String[]can be assigned into ajava.lang.Object[]becausejava.lang.Stringget be assigned tojava.lang.Object, but primitive arrays don't form a type hierarchy, e.g.byte[]can't be assigned toint[]even thoughbytecan be assigned toint.Here's an example that requires
TopType[]:What's the type of
null[index]?The following example can actually occur in real code because control flow might change the runtime order of the instructions compared to the ordering in bytecode
The type of
null[index]is nowBottomType(other types are not possible because the array might be an array of primitives or of reference types).When a type never gets a non-null value assigned to it, it will now get upgraded to
java.lang.Objectafter all other assignment constraints are satisfied. This is the most conservative choice. A possible future enhancement would be to look at the usage of the variable instead of just assignments to it to get a more accurate typing.Interestingly, this choice means because both these examples can't be differentiated in bytecode:
ain the second example can't easily be typed asObject[], since that would be incompatible with the first example.In these cases the
TypeResolverwill type bothaandbasObjectand insert a cast toObject[]before the array access.Possible future work:
[0..127] + 1currently results inintbut it could just upgrade to[0..32767])Fixes #853
Footnotes
It turns out there are actually parts of the Java Class Library that contain bytecode which has no valid typing. One example is
<java.lang.invoke.DirectMethodHandle$Holder: int getBooleanVolatile(java.lang.Object,java.lang.Object)>which returns abooleanas anint. This is possible in bytecode sincebooleans are just numbers there, but these types are incompatible in the Java type hierarchy. Other bytecode that was not generated from Java might have these issues too. ↩