Feature: implement single memory object resolution for symbolic addre… by tkuchta · Pull Request #1380 · klee/klee

tkuchta · 2021-02-03T08:02:33Z

…sses.

This feature implements tracking of and resolution of memory objects in the presence of
symbolic addresses.
For example, an expression like the following:
int x;
klee_make_symbolic(&x, sizeof(x), "x");

int* tmp = &b.y[x].z;

For a concrete array object "y", which is a member of struct "b", a symbolic offset "x" would normally be resolved to any matching memory object - including the ones outside of the object "b".
This behaviour is consistent with symbex approach of exploring all execution paths.
However, from the point of view of security testing, we would only be interested to know if we are still
in-bounds or there is a buffer overflow.

The implemented feature creates and tracks (via the GEP instruction) the mapping between the current
symbolic offset and the base object it refers to: in our example we are able to tell that the reference
should happen within the object "b" (as the array "y" is inside the same memory blob). As a result, we are able to minimize the symbolic exploration to only two paths: one within the bounds of "b", the other with a buffer overflow bug.

The feature is turned on via the single-object-resolution command line flag.

A new test case was implemented to illustrate how the feature works.

Thank you for contributing to KLEE. We are looking forward to reviewing your PR. However, given the small number of active reviewers and our limited time, it might take a while to do so. We aim to get back to each PR within one month, and often do so within one week.

To help expedite the review please ensure the following, by adding an "x" for each completed item:

The PR addresses a single issue. In other words, if some parts of a PR could form another independent PR, you should break this PR into multiple smaller PRs.
There are no unnecessary commits. For instance, commits that fix issues with a previous commit in this PR are unnecessary and should be removed (you can find documentation on squashing commits here).
Larger PRs are divided into a logical sequence of commits.
Each commit has a meaningful message documenting what it does.
The code is commented. In particular, newly added classes and functions should be documented.
The patch is formatted via clang-format (see also git-clang-format for Git integration). Please only format the patch itself and code surrounding the patch, not entire files. Divergences from clang-formatting are only rarely accepted, and only if they clearly improve code readability.
Add test cases exercising the code you added or modified. We expect system and/or unit test cases for all non-trivial changes. After you submit your PR, you will be able to see a Codecov report telling you which parts of your patch are not covered by the regression test suite. You will also be able to see if the Github Actions CI and Cirrus CI tests have passed. If they don't, you should examine the failures and address them before the PR can be reviewed.
Spellcheck all messages added to the codebase, all comments, as well as commit messages.

codecov · 2021-02-03T08:21:42Z

Codecov Report

Merging #1380 (8fe9fa1) into master (9edf8e8) will increase coverage by 0.02%.
Report is 1 commits behind head on master.
The diff coverage is 77.50%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1380      +/-   ##
==========================================
+ Coverage   70.48%   70.50%   +0.02%     
==========================================
  Files         154      154              
  Lines       18870    18914      +44     
  Branches     4424     4440      +16     
==========================================
+ Hits        13301    13336      +35     
- Misses       3654     3656       +2     
- Partials     1915     1922       +7

Files	Coverage Δ
lib/Core/ExecutionState.cpp	`71.42% <100.00%> (+1.02%)`	⬆️
lib/Core/ExecutionState.h	`84.00% <ø> (ø)`
lib/Core/Executor.cpp	`76.42% <74.64%> (-0.03%)`	⬇️

... and 3 files with indirect coverage changes

tkuchta · 2021-02-03T10:01:55Z

Hi, it seems that the tests for MacOs and FreeBSD are failing due to missing support for uclibc and posix-runtime there. I wasn't sure if that's something expected. Is there perhaps a way to skip the test under those OSs? Alternatively I could try to make a test case that doesn't use these features.

ccadar · 2021-02-03T12:08:10Z

@tkuchta Thanks for the PR, it's great to finally see work on this feature! I haven't reviewed the changes (might take a bit, as it's a busy period), but looking quickly over the test you added, I don't think you need uclibc nor the POSIX runtime, do you?

tkuchta · 2021-02-03T15:07:39Z

@ccadar yes, indeed, the use of uclibc and posix runtime doesn't seem to be necessary. I have submitted two more commits with test case simplification and one fix (I skipped one solver->setTimeout call by accident). One of the FreeBSD checks is failing but it seems unrelated to my change: the failure is in other test cases and the change is enabled only under a new command line flag.

…sses. This feature implements tracking of and resolution of memory objects in the presence of symbolic addresses. For example, an expression like the following: int x; klee_make_symbolic(&x, sizeof(x), "x"); int* tmp = &b.y[x].z; For a concrete array object "y", which is a member of struct "b", a symbolic offset "x" would normally be resolved to any matching memory object - including the ones outside of the object "b". This behaviour is consistent with symbex approach of exploring all execution paths. However, from the point of view of security testing, we would only be interested to know if we are still in-bounds or there is a buffer overflow. The implemented feature creates and tracks (via the GEP instruction) the mapping between the current symbolic offset and the base object it refers to: in our example we are able to tell that the reference should happen within the object "b" (as the array "y" is inside the same memory blob). As a result, we are able to minimize the symbolic exploration to only two paths: one within the bounds of "b", the other with a buffer overflow bug. The feature is turned on via the single-object-resolution command line flag. A new test case was implemented to illustrate how the feature works.

MartinNowack

@tkuchta Thanks a lot for your PR and sorry for the long delay.

I added several comments that would improve the code.

While I see the positive impact this PR can have, there are still two main shortcomings:

base_addrs is not copied as part of the ExecutionState's copy constructor. After each state fork, the entries for the new state are empty and the optimisation cannot be used in the forked state
No entries are removed from the base_addrs, i.e. if objects are freed essentially adding more memory pressure over time.

With the current state of this PR, missing the copy constructor entry reduces the impact that the second point has.

Solving the second point properly would take more effort.

As this PR has a positive impact for your use case and probably the community, and is guarded by a flag, we could go for it and fix the two main shortcomings in a subsequent PR. Alternatively, of course, we can fix this with this PR already but that requires more code to be added.

Once again, sorry for the delay. And, looking forward to get this one in.

MartinNowack · 2023-11-12T11:05:18Z

lib/Core/ExecutionState.h

  bool forkDisabled = false;

+  /// @brief Mapping symbolic address expressions to concrete base addresses
+  typedef std::map<ref<Expr>, ref<ConstantExpr>> base_addrs_t;


Please use the newer using syntax to declare aliases

Suggested change

typedef std::map<ref<Expr>, ref<ConstantExpr>> base_addrs_t;

using base_addrs_t = std::map<ref<Expr>, ref<ConstantExpr>> ;

MartinNowack · 2023-11-12T11:08:44Z

lib/Core/Executor.cpp


+    bool const_base = isa<ConstantExpr>(base);
+    bool update = false;
+    ref<ConstantExpr> original_base = 0;


Move that up and make a copy of base directly.

Just to clarify: when you refer to making a copy of base directly, what do you mean? Does is relate to the copy constructor?

No, in this case just a:

Suggested change

ref<ConstantExpr> original_base = 0;

ref<ConstantExpr> original_base = base;

The goal would be to bring both guarded code blocks together.

MartinNowack · 2023-11-12T11:11:04Z

lib/Core/Executor.cpp

+    ref<Expr> key = 0;
+    ref<ConstantExpr> value = 0;
+
+    if (SingleObjectResolution) {


Please merge this block into the follow-up block.

MartinNowack · 2023-11-12T11:11:39Z

lib/Core/Executor.cpp

    KGEPInstruction *kgepi = static_cast<KGEPInstruction*>(ki);
    ref<Expr> base = eval(ki, 0, state).value;

+    bool const_base = isa<ConstantExpr>(base);


All this definitions are only used inside of the SingleObjectResolution block, please move them there.

MartinNowack · 2023-11-12T11:13:11Z

lib/Core/Executor.cpp

+          value = base_it->second;
+        }
+      } else {
+        original_base = dyn_cast<ConstantExpr>(base);


No need for this we already made the copy earlier.

MartinNowack · 2023-11-12T11:14:57Z

lib/Core/Executor.cpp

+    ref<ConstantExpr> value = 0;
+
+    if (SingleObjectResolution) {
+      ExecutionState::base_addrs_t::iterator base_it;


Use auto and move it into the the next if branch.

MartinNowack · 2023-11-12T11:20:41Z

test/Feature/SingleObjectResolution.c

+  // dereference of a pointer within a struct
+  int *tmp = &b.y1[x].z;
+
+  // CHECK: SingleObjectResolution.c:26: memory error: out of bound pointer


Put the CHECK above the line where the error is expected. Instead of encoding the line number directly, use [[@LINE+1]], which makes it less fragile for future changes.

Add additional checks for the solver queries to number you expect, i.e. use:

// RUN: not %klee-stats --print-columns 'Queries' --table-format=csv %t.klee-out | FileCheck %s --prefix CHECKSTATS` // CHECK-STATS: {{.*\.klee-out}},42

tkuchta · 2023-11-13T05:54:41Z

Hi Martin, thank you very much for the feedback. I will try to address the mentioned points and follow up with more commits on the PR branch.

tkuchta · 2023-11-14T13:57:02Z

@MartinNowack I came across something interesting while testing the follow-up changes.
It seems that when kdalloc is turned on, my test case immediately terminates with a pointer error and a single path even without turning on the new feature. Do you have any thoughts why that would be the case? The test case spawns multiple memory objects which we access via a symbolic pointer and in principle KLEE should fork to reach each of the objects. When running with kdalloc==false I see the past behaviour, i.e. large malloc and long execution which I assume means that the forking happens.

ccadar · 2023-11-14T14:15:49Z

@tkuchta I've recently run into a similar example and @danielschemmel was able to quickly spot the issue. Try replacing your int index by a size_t and you should see a similar behaviour.

But yes, in some cases, due to its design, KDAlloc decreases the impact of this useful feature you are implementing.

…e more map added to ExecutionState); now storing addresses of MemoryObjects for easier cleanup

tkuchta · 2023-11-17T15:02:58Z

@ccadar thank you for the help! indeed this solved the issuse

@MartinNowack I've pushed a reworked version to my branch.
I have also implemented a cleanup that should work both for malloc/free as well as for stack variables.
I've modified the test case to include both types of vars + make sure that state forking works now (thanks for spotting the missing copy constructor part!).
Please let me know if you have any further comments.

tkuchta · 2023-11-17T17:42:01Z

I can see that the check for the number of queries failed. On my local setup that number I specified in the test passes - probably there is a difference in the STP version. Given the difference I was wondering whether to keep that check in the test - what do you think?

251 · 2023-11-17T18:00:18Z

Given that the number of queries depends e.g. on the generated bitcode the test is quite brittle its current form. You can try to use SolverQueries instead and use dfs to get a slightly more stable version that at least does not depend on caching effects due to the search heuristic.

tkuchta · 2023-11-17T20:54:02Z

@251 Thanks for the suggestion. I've modified the test case (dfs + SolverQueries), but the number on my machine is unfortunately still different than in the CI tests

/tmp/cirrus-ci-build/test/Feature/SingleObjectResolution.c:6:17: error: CHECK-STATS: expected string not found in input
// CHECK-STATS: 193
                ^
<stdin>:1:1: note: scanning from here
SolverQueries
^
<stdin>:2:1: note: possible intended match here
186
^

tkuchta · 2023-12-01T08:50:46Z

I can see that for example on mac os tests from the CI the number of queries is also different while most other tests passed. Shall I remove that check from the test? What would you advise?

251 · 2023-12-01T17:57:42Z

I didn't check the cause for the different number of queries but on Linux it's this one:

Constraints [
(Eq false
     (Eq 0
         (ReadLSB w32 0 y)))
(Eq false
     (Ult (Add w64 508
                   (Mul w64 24
                            (ReadLSB w64 0 x)))
          477))
]
Query [
(Add w64 139382106358268
          (Mul w64 24
                   (ReadLSB w64 0 x)))
]

for the return *ptr in bar().

It seems to be a bit of a chicken and egg problem after having a look at the query dumps. The difference is caused in

klee/lib/Core/Executor.cpp

Line 3607 in fc83f06

solver->getRange(state.constraints, address, state.queryMetaData);

when the state is terminated. KLEE solves, calls the termination function and than processTestcase decides not to write a test (due to write-no-tests). So the solver calls are made unnecessarily.

So we could either analyse the root cause or get rid of the useless queries.

tkuchta · 2023-12-03T14:53:21Z

Hi @251, thanks for the analysis. If my understanding is correct, the discrepancy that we see is not related to my change but more fundamental - is that right? If that's the case I was wondering if mabye I should remove that check from the test case in order to prevent introducing false alarms in the tests.

251 · 2023-12-06T14:30:20Z

Hi @tkuchta, we had an internal discussion and since a fix would require another PR and the test would still be fragile due to LLVM version differences, it's a reasonable decision to remove the check for now.

tkuchta · 2023-12-09T13:58:04Z

@251 Thank you, I removed the check from the test case

MartinNowack · 2023-12-17T09:11:00Z

@tkuchta Thanks for the changes. We still need to finish the other PR before this one gets merged. But it's the next one in line. In general, changes look great. The only thing left would be to merge the commits as fits but this can be done as part of a final rebase. So, almost there 😄 .

MartinNowack

@tkuchta Thanks a lot for this PR. It looks great now. Let's merge this.

tkuchta · 2024-01-12T14:28:36Z

Great, thanks for the reviews!

tkuchta force-pushed the base_objects_tracking branch from 8eb5994 to abca75b Compare February 20, 2023 10:48

tkuchta force-pushed the base_objects_tracking branch from abca75b to 7f1f6c9 Compare October 12, 2023 07:54

ccadar mentioned this pull request Oct 30, 2023

Support for LLLVM 15 16 #1664

Merged

8 tasks

MartinNowack requested changes Nov 12, 2023

View reviewed changes

Follow-up: applied review comments, implemented meta-data cleanup (on…

91bd489

…e more map added to ExecutionState); now storing addresses of MemoryObjects for easier cleanup

Make test more deterministic

7b5eb5e

Remove check for the number of solver queries

8fe9fa1

tkuchta requested a review from MartinNowack December 9, 2023 13:59

MartinNowack approved these changes Jan 12, 2024

View reviewed changes

MartinNowack merged commit dd4bf4c into klee:master Jan 12, 2024

	typedef std::map<ref<Expr>, ref<ConstantExpr>> base_addrs_t;
	using base_addrs_t = std::map<ref<Expr>, ref<ConstantExpr>> ;

	ref<ConstantExpr> original_base = 0;
	ref<ConstantExpr> original_base = base;

Conversation

tkuchta commented Feb 3, 2021

Uh oh!

codecov bot commented Feb 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tkuchta commented Feb 3, 2021

Uh oh!

ccadar commented Feb 3, 2021

Uh oh!

tkuchta commented Feb 3, 2021

Uh oh!

MartinNowack left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tkuchta commented Nov 13, 2023

Uh oh!

tkuchta commented Nov 14, 2023

Uh oh!

ccadar commented Nov 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tkuchta commented Nov 17, 2023

Uh oh!

tkuchta commented Nov 17, 2023

Uh oh!

251 commented Nov 17, 2023

Uh oh!

tkuchta commented Nov 17, 2023

Uh oh!

tkuchta commented Dec 1, 2023

Uh oh!

251 commented Dec 1, 2023

Uh oh!

tkuchta commented Dec 3, 2023

Uh oh!

251 commented Dec 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tkuchta commented Dec 9, 2023

Uh oh!

MartinNowack commented Dec 17, 2023

Uh oh!

MartinNowack left a comment

Choose a reason for hiding this comment

Uh oh!

tkuchta commented Jan 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Feb 3, 2021 •

edited

Loading

ccadar commented Nov 14, 2023 •

edited

Loading

251 commented Dec 6, 2023 •

edited

Loading