Skip to content

fix the TypeAssigner/TypeResolver#869

Merged
swissiety merged 27 commits intosoot-oss:developfrom
Timbals:fix/type-assigner
Feb 27, 2024
Merged

fix the TypeAssigner/TypeResolver#869
swissiety merged 27 commits intosoot-oss:developfrom
Timbals:fix/type-assigner

Conversation

@Timbals
Copy link
Copy Markdown
Contributor

@Timbals Timbals commented Feb 22, 2024

This PR introduces a TopType that gets used when no possible Java type can be determined for a variable. This generally happens when local variables get re-used by the compiler, which means this shouldn't happen when using the LocalSplitter1.

E.g. a variable gets both java.lang.String and byte values assigned to it; or int and boolean. In these cases there is no common type that the variable could be typed with (including casts), so it results in TopType.

It is also possible to get a TopType[] when a local with incompatible array types overlap. For that to work, I had to allow all types as base types for ArrayType because the TypeResolver might need to create a TopType array type.

Most of the issues with the TypeResolver were related to primitive types, augmented integer types, and arrays.

In bytecode boolean, byte, short, char, and int aren't different types. Figuring out correct and tight types of variables that fall in this category requires keeping track of the value ranges that get assigned to the variables. This caused a bunch of issues, since the augmented integers types that are used to track these value ranges weren't always handled correctly. Particularly with variables with a boolean type caused problems, since they aren't assignment-compatible with the rest of the small integer types.

Another issue that combines primitives and arrays was that you can't upgrade the types of primitive arrays when they get assigned to. For normal reference types a java.lang.String[] can be assigned into a java.lang.Object[] because java.lang.String get be assigned to java.lang.Object, but primitive arrays don't form a type hierarchy, e.g. byte[] can't be assigned to int[] even though byte can be assigned to int.


Here's an example that requires TopType[]:

local = null;
local[0] = "";
local = null;
local[0] = 0;

What's the type of null[index]?
The following example can actually occur in real code because control flow might change the runtime order of the instructions compared to the ordering in bytecode

int[] a = null;
int b = a[0];
a = new int[1];

The type of null[index] is now BottomType (other types are not possible because the array might be an array of primitives or of reference types).

When a type never gets a non-null value assigned to it, it will now get upgraded to java.lang.Object after all other assignment constraints are satisfied. This is the most conservative choice. A possible future enhancement would be to look at the usage of the variable instead of just assignments to it to get a more accurate typing.

Interestingly, this choice means because both these examples can't be differentiated in bytecode:

int[] a = null
int b = a[0]
Object[] a = null
Object b = a[0]

a in the second example can't easily be typed as Object[], since that would be incompatible with the first example.
In these cases the TypeResolver will type both a and b as Object and insert a cast to Object[] before the array access.

Possible future work:

  • the biggest enhancement would be to take into account not only assignments but also the uses of variables as constraints. This would enhance some edge cases with arrays. The original paper only has a small section on arrays that describes a very crude method of dealing with arrays that would result in very loosely typed arrays. The current implementation is already significantly better than the paper, but could still do better.
  • arithmetic operations with augmented integer types don't result in the tighest possible types ([0..127] + 1 currently results in int but it could just upgrade to [0..32767])
  • performance: most of the time is spent querying the type hierarchy; this can probably be significantly improved with caching and/or a custom data structure

Fixes #853

Footnotes

  1. It turns out there are actually parts of the Java Class Library that contain bytecode which has no valid typing. One example is <java.lang.invoke.DirectMethodHandle$Holder: int getBooleanVolatile(java.lang.Object,java.lang.Object)> which returns a boolean as an int. This is possible in bytecode since booleans are just numbers there, but these types are incompatible in the Java type hierarchy. Other bytecode that was not generated from Java might have these issues too.

Since locals with the same name can't be differentiated in Jimple, they need to be treated the same, even when they are actually different locals.

This is currently only possible because the bytecode to Jimple conversion tries to use the provided debug names of locals, but different locals (meaning local ids) might have the same name.
The `TypeAssigner`/`TypeResolver` previously wouldn't function correctly when a variable got assigned both primitive and non-primitive values. The top type fixes this.

The top type is a superclass of all other types. This is similar to `java.lang.Object` but
also includes primitive types.

This type can't exist in Java source code, but it can implicitly exist in bytecode. This happens
when the compiler re-uses local variables with the same id, but different types.

If you see this type when you didn't expect it, you probably need to **turn on the `LocalSplitter`**. The `LocalSplitter` will remove all situations where a `TopType` could be created by
the `TypeAssigner` (at least when the bytecode has been generated from Java source code).
Consider the statement `l1 = l2[l3]`, which contains the expression `l2[l3]`.

When `l2` has a non-array type, it can't be known what the type of the array ref expression is. Because the result of the array access could be an object or a primitive, the top type has to be chosen here.
The `TypeResolver` needs this because it might create `TopType` array types.
`[int[]][0] = [char]` would previously try to find the least common ancestor between `int[]` and `char[]` which doesn't exist (or is just `TopType`). Now it correctly finds the common ancestor between `int` and `char`, which is `int`, and then turns that into an array type to arrive at `int[]`.
Correctly handling arrays that are `null` or get `null` assigned to an index.
`Integer1` can be used where a `boolean` is expected.
`replaceLocal` is really slow (especially when there are a lot of statements), but we can just replace all locals manually, because the `TypeResolver` already updates all relevant statements.
The old implementation was unnecessarily complicated and also produced a suboptimal number of casts in some cases.
How this has ever worked, I truly don't know.
`short` is an ancestor of `byte`.
This is also seen in Figure 8 of the paper.
Can't upgrade the types of primitive arrays when they get assigned to, because primitive arrays don't form a type hierarchy (e.g. `byte[]` can't be assigned to `int[]` even though `byte` can be assigned to `int`).
- restrictions on the base type of an array now work correctly for all right-hand sides and not just locals (e.g. `array[index] = constant` will properly restrict the type of the array to something that can be assigned the type of the constant)
- using `TypePromotionVisitor` multiple times works correctly now (`typingChanged` wasn't getting reset); not sure how this ever worked
- the type promoter is now run before the cast counter and the underspecified type conversion; the type promoter now also promotes type restrictions that don't appear as assignments and were therefore ignored before; e.g. `if l0 == 0.0` requires `l0` to be a float/double even when `l0` never directly gets a float/double assigned to it; this should only be relevant without the `LocalSplitter` and for non-Java bytecode
@Timbals Timbals marked this pull request as ready for review February 23, 2024 16:33
These could fail because of non-deterministic handling in the order of statements
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 70.65868% with 49 lines in your changes are missing coverage. Please review.

Project coverage is 68.26%. Comparing base (87d4748) to head (6c32446).

Files Patch % Lines
...tecode/interceptors/typeresolving/TypeChecker.java 38.09% 6 Missing and 7 partials ⚠️
...interceptors/typeresolving/PrimitiveHierarchy.java 46.66% 4 Missing and 4 partials ⚠️
...terceptors/typeresolving/TypePromotionVisitor.java 42.85% 5 Missing and 3 partials ⚠️
.../interceptors/typeresolving/BytecodeHierarchy.java 61.53% 2 Missing and 3 partials ⚠️
...ecode/interceptors/typeresolving/TypeResolver.java 90.56% 3 Missing and 2 partials ⚠️
...tecode/interceptors/typeresolving/CastCounter.java 87.87% 1 Missing and 3 partials ⚠️
...code/interceptors/typeresolving/types/TopType.java 40.00% 3 Missing ⚠️
...tup/core/jimple/visitor/ReplaceUseStmtVisitor.java 71.42% 2 Missing ⚠️
...va/bytecode/interceptors/typeresolving/Typing.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #869      +/-   ##
=============================================
+ Coverage      67.91%   68.26%   +0.34%     
- Complexity      3848     3877      +29     
=============================================
  Files            310      311       +1     
  Lines          15225    15186      -39     
  Branches        2590     2587       -3     
=============================================
+ Hits           10340    10366      +26     
+ Misses          3965     3911      -54     
+ Partials         920      909      -11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Collaborator

@swissiety swissiety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 excellent job!
Issue for fixing labels -> branching/trap+info in combination with insertBefore(...) is created #873.

@@ -99,7 +106,8 @@ public boolean isAncestor(@Nonnull Type ancestor, @Nonnull Type child) {
// TODO: [ms] check: the dimension condition check as it seems weird?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you are currently in the topic - does this check make sense i.e. can we remove my comment?

@swissiety swissiety merged commit d645ed6 into soot-oss:develop Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix TypeAssigner

2 participants