Feature: implement single memory object resolution for symbolic addre…#1380
Feature: implement single memory object resolution for symbolic addre…#1380MartinNowack merged 4 commits intoklee:masterfrom
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1380 +/- ##
==========================================
+ Coverage 70.48% 70.50% +0.02%
==========================================
Files 154 154
Lines 18870 18914 +44
Branches 4424 4440 +16
==========================================
+ Hits 13301 13336 +35
- Misses 3654 3656 +2
- Partials 1915 1922 +7
|
|
Hi, it seems that the tests for MacOs and FreeBSD are failing due to missing support for uclibc and posix-runtime there. I wasn't sure if that's something expected. Is there perhaps a way to skip the test under those OSs? Alternatively I could try to make a test case that doesn't use these features. |
|
@tkuchta Thanks for the PR, it's great to finally see work on this feature! I haven't reviewed the changes (might take a bit, as it's a busy period), but looking quickly over the test you added, I don't think you need uclibc nor the POSIX runtime, do you? |
|
@ccadar yes, indeed, the use of uclibc and posix runtime doesn't seem to be necessary. I have submitted two more commits with test case simplification and one fix (I skipped one solver->setTimeout call by accident). One of the FreeBSD checks is failing but it seems unrelated to my change: the failure is in other test cases and the change is enabled only under a new command line flag. |
8eb5994 to
abca75b
Compare
…sses. This feature implements tracking of and resolution of memory objects in the presence of symbolic addresses. For example, an expression like the following: int x; klee_make_symbolic(&x, sizeof(x), "x"); int* tmp = &b.y[x].z; For a concrete array object "y", which is a member of struct "b", a symbolic offset "x" would normally be resolved to any matching memory object - including the ones outside of the object "b". This behaviour is consistent with symbex approach of exploring all execution paths. However, from the point of view of security testing, we would only be interested to know if we are still in-bounds or there is a buffer overflow. The implemented feature creates and tracks (via the GEP instruction) the mapping between the current symbolic offset and the base object it refers to: in our example we are able to tell that the reference should happen within the object "b" (as the array "y" is inside the same memory blob). As a result, we are able to minimize the symbolic exploration to only two paths: one within the bounds of "b", the other with a buffer overflow bug. The feature is turned on via the single-object-resolution command line flag. A new test case was implemented to illustrate how the feature works.
abca75b to
7f1f6c9
Compare
MartinNowack
left a comment
There was a problem hiding this comment.
@tkuchta Thanks a lot for your PR and sorry for the long delay.
I added several comments that would improve the code.
While I see the positive impact this PR can have, there are still two main shortcomings:
base_addrsis not copied as part of theExecutionState's copy constructor. After each state fork, the entries for the new state are empty and the optimisation cannot be used in the forked state- No entries are removed from the
base_addrs, i.e. if objects are freed essentially adding more memory pressure over time.
With the current state of this PR, missing the copy constructor entry reduces the impact that the second point has.
Solving the second point properly would take more effort.
As this PR has a positive impact for your use case and probably the community, and is guarded by a flag, we could go for it and fix the two main shortcomings in a subsequent PR. Alternatively, of course, we can fix this with this PR already but that requires more code to be added.
Once again, sorry for the delay. And, looking forward to get this one in.
lib/Core/ExecutionState.h
Outdated
| bool forkDisabled = false; | ||
|
|
||
| /// @brief Mapping symbolic address expressions to concrete base addresses | ||
| typedef std::map<ref<Expr>, ref<ConstantExpr>> base_addrs_t; |
There was a problem hiding this comment.
Please use the newer using syntax to declare aliases
| typedef std::map<ref<Expr>, ref<ConstantExpr>> base_addrs_t; | |
| using base_addrs_t = std::map<ref<Expr>, ref<ConstantExpr>> ; |
lib/Core/Executor.cpp
Outdated
|
|
||
| bool const_base = isa<ConstantExpr>(base); | ||
| bool update = false; | ||
| ref<ConstantExpr> original_base = 0; |
There was a problem hiding this comment.
Move that up and make a copy of base directly.
There was a problem hiding this comment.
Just to clarify: when you refer to making a copy of base directly, what do you mean? Does is relate to the copy constructor?
There was a problem hiding this comment.
No, in this case just a:
| ref<ConstantExpr> original_base = 0; | |
| ref<ConstantExpr> original_base = base; |
The goal would be to bring both guarded code blocks together.
lib/Core/Executor.cpp
Outdated
| ref<Expr> key = 0; | ||
| ref<ConstantExpr> value = 0; | ||
|
|
||
| if (SingleObjectResolution) { |
There was a problem hiding this comment.
Please merge this block into the follow-up block.
lib/Core/Executor.cpp
Outdated
| KGEPInstruction *kgepi = static_cast<KGEPInstruction*>(ki); | ||
| ref<Expr> base = eval(ki, 0, state).value; | ||
|
|
||
| bool const_base = isa<ConstantExpr>(base); |
There was a problem hiding this comment.
All this definitions are only used inside of the SingleObjectResolution block, please move them there.
lib/Core/Executor.cpp
Outdated
| value = base_it->second; | ||
| } | ||
| } else { | ||
| original_base = dyn_cast<ConstantExpr>(base); |
There was a problem hiding this comment.
No need for this we already made the copy earlier.
lib/Core/Executor.cpp
Outdated
| ref<ConstantExpr> value = 0; | ||
|
|
||
| if (SingleObjectResolution) { | ||
| ExecutionState::base_addrs_t::iterator base_it; |
There was a problem hiding this comment.
Use auto and move it into the the next if branch.
| // dereference of a pointer within a struct | ||
| int *tmp = &b.y1[x].z; | ||
|
|
||
| // CHECK: SingleObjectResolution.c:26: memory error: out of bound pointer |
There was a problem hiding this comment.
Put the CHECK above the line where the error is expected. Instead of encoding the line number directly, use [[@LINE+1]], which makes it less fragile for future changes.
There was a problem hiding this comment.
Add additional checks for the solver queries to number you expect, i.e. use:
// RUN: not %klee-stats --print-columns 'Queries' --table-format=csv %t.klee-out | FileCheck %s --prefix CHECKSTATS`
// CHECK-STATS: {{.*\.klee-out}},42
|
Hi Martin, thank you very much for the feedback. I will try to address the mentioned points and follow up with more commits on the PR branch. |
|
@MartinNowack I came across something interesting while testing the follow-up changes. |
|
@tkuchta I've recently run into a similar example and @danielschemmel was able to quickly spot the issue. Try replacing your But yes, in some cases, due to its design, KDAlloc decreases the impact of this useful feature you are implementing. |
…e more map added to ExecutionState); now storing addresses of MemoryObjects for easier cleanup
|
@ccadar thank you for the help! indeed this solved the issuse @MartinNowack I've pushed a reworked version to my branch. |
|
I can see that the check for the number of queries failed. On my local setup that number I specified in the test passes - probably there is a difference in the STP version. Given the difference I was wondering whether to keep that check in the test - what do you think? |
|
Given that the number of queries depends e.g. on the generated bitcode the test is quite brittle its current form. You can try to use |
|
@251 Thanks for the suggestion. I've modified the test case (dfs + SolverQueries), but the number on my machine is unfortunately still different than in the CI tests |
|
I can see that for example on mac os tests from the CI the number of queries is also different while most other tests passed. Shall I remove that check from the test? What would you advise? |
|
I didn't check the cause for the different number of queries but on Linux it's this one: for the It seems to be a bit of a chicken and egg problem after having a look at the query dumps. The difference is caused in Line 3607 in fc83f06 processTestcase decides not to write a test (due to write-no-tests). So the solver calls are made unnecessarily.
So we could either analyse the root cause or get rid of the useless queries. |
|
Hi @251, thanks for the analysis. If my understanding is correct, the discrepancy that we see is not related to my change but more fundamental - is that right? If that's the case I was wondering if mabye I should remove that check from the test case in order to prevent introducing false alarms in the tests. |
|
Hi @tkuchta, we had an internal discussion and since a fix would require another PR and the test would still be fragile due to LLVM version differences, it's a reasonable decision to remove the check for now. |
|
@251 Thank you, I removed the check from the test case |
|
@tkuchta Thanks for the changes. We still need to finish the other PR before this one gets merged. But it's the next one in line. In general, changes look great. The only thing left would be to merge the commits as fits but this can be done as part of a final rebase. So, almost there 😄 . |
MartinNowack
left a comment
There was a problem hiding this comment.
@tkuchta Thanks a lot for this PR. It looks great now. Let's merge this.
|
Great, thanks for the reviews! |
…sses.
This feature implements tracking of and resolution of memory objects in the presence of
symbolic addresses.
For example, an expression like the following:
int x;
klee_make_symbolic(&x, sizeof(x), "x");
int* tmp = &b.y[x].z;
For a concrete array object "y", which is a member of struct "b", a symbolic offset "x" would normally be resolved to any matching memory object - including the ones outside of the object "b".
This behaviour is consistent with symbex approach of exploring all execution paths.
However, from the point of view of security testing, we would only be interested to know if we are still
in-bounds or there is a buffer overflow.
The implemented feature creates and tracks (via the GEP instruction) the mapping between the current
symbolic offset and the base object it refers to: in our example we are able to tell that the reference
should happen within the object "b" (as the array "y" is inside the same memory blob). As a result, we are able to minimize the symbolic exploration to only two paths: one within the bounds of "b", the other with a buffer overflow bug.
The feature is turned on via the single-object-resolution command line flag.
A new test case was implemented to illustrate how the feature works.
Thank you for contributing to KLEE. We are looking forward to reviewing your PR. However, given the small number of active reviewers and our limited time, it might take a while to do so. We aim to get back to each PR within one month, and often do so within one week.
To help expedite the review please ensure the following, by adding an "x" for each completed item: