Skip to content

Conversation

@avanhatt
Copy link
Contributor

@avanhatt avanhatt commented Oct 1, 2021

Description of changes:

RMC can currently take a long time to verify code with lots of virtual function pointer calls (or, never terminate). This is in part because CBMC uses a very general heuristic to reason about function pointers: it assumes any function of the right type signature could be a target. CBMC does allow you to restrict possible function targets via a command line option and auxiliary json file: http://cprover.diffblue.com/md__home_travis_build_diffblue_cbmc_doc_architectural_restrict-function-pointer.html

Note, using this functionality requires this branch of CBMC, diffblue/cbmc#6376, tracking issue to update RMC once that merges: #671

For virtual function pointer through a vtable, we can often do much better than CBMC's default heuristic. In particular, we can virtually guarantee (unless the user transmutes the vtable memory) that a call through a vtable will be one of the trait-defined method for that trait type that we have previously seen in codegen. Because CBMC asserts(false) if the function pointer is not actually within the restricted set, this optimization is sound.

This is especially helpful for drop_in_place destructor calls through a vtable. Many functions have types that match drop since it takes a single self argument and returns unit. In a Firecracker example, we have seen 300+ function targets identified by the default CBMC heuristic for a single virtual drop call, which otherwise renders CBMC verification basically intractable (think: a massive, 300+ arm if block after CBMC performs function pointer elimination).

This PR add a VtableCtx object that build a map from virtual call sites to possible trait-defined method implementations, then writes out the restrictions as a JSON file.

You can invoke this on single Rust files by passing a --restrict-vtables flag to rmc, or on full crates with dependencies by adding the following to your harness script (which should eventually be handled by cargo rmc:

RUST_BACKTRACE=full RUSTFLAGS=" -Z restrict_vtable_fn_ptrs ...
... 
rmc-link-restrictions . <output dir>
...
goto-instrument  --function-pointer-restrictions-file restrictions.json ... 

Example:

Say we have two traits that both implement a noise method, Animal and Other. Without the vtable restriction flag, any call to noise and any implicit drop on a dynamic trait object would unroll to every implementation of either trait:

  if(!(var_21.vtable->3 == (int (*)(void *))Sheep::noise))
  {
    if(var_21.vtable->3 == (int (*)(void *))i32::noise)
      goto __CPROVER_DUMP_L11;

    if(var_21.vtable->3 == (int (*)(void *))Cow::noise)
      goto __CPROVER_DUMP_L12;

    __CPROVER_assume(0);
  }

Now, we see each call site only considers the implementations that match that trait type:

// Animal::noise 
  if(!(main_1_function_pointer_call_1 == (int (*)(void *))Sheep::noise))
  {
    if(main_1_function_pointer_call_1 == (int (*)(void *))Cow::noise)
      goto __CPROVER_DUMP_L2;

    /* dereferenced function pointer at must be one of [Sheep::noise, Cow::noise] */
    assert(0);
    __CPROVER_assume(0);
  }
  
// Other::noise
  if(!(main_2_function_pointer_call_1 == (int (*)(void *))i32::noise))
  {
    /* dereferenced function pointer at must be i32::noise */
    assert(0);
    __CPROVER_assume(0);
  }

We also get the benefit of CBMC inserting an assert(0) at the end of the block instead of just an assume(0).

Resolved issues:

Resolves #521

Call-outs:

Testing:

  • How is this change tested?
  • Manually improved performance on the Firecracker block device example.
  • Is this a refactor change?

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch from 7f075ae to 400eb8b Compare October 1, 2021 21:02
@avanhatt avanhatt requested a review from danielsn October 1, 2021 21:14
Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make this change less CBMC dependent? Split between collecting potential candidates and doing the CBMC translation after that.

@avanhatt
Copy link
Contributor Author

avanhatt commented Oct 1, 2021

@celinval Agreed on the goal. I tried to keep the totally CBMC-specific logic to within get_virtual_function_restrictions, do you see other place that it would make sense to split off?

@celinval
Copy link
Contributor

celinval commented Oct 4, 2021

@avanhatt I was thinking about whether we should split this file: compiler/rustc_codegen_rmc/src/context/vtable_ctx.rs

@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch 2 times, most recently from 500fb23 to 9c09cd8 Compare October 20, 2021 17:37
@avanhatt avanhatt marked this pull request as ready for review October 20, 2021 17:38
@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch from 9c09cd8 to 4b33922 Compare October 29, 2021 16:37
@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch from fe4fb66 to 4c60685 Compare November 8, 2021 19:42
@avanhatt avanhatt requested a review from a team as a code owner November 11, 2021 15:03
@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch from ad92aca to 06d823d Compare November 11, 2021 15:10
@avanhatt avanhatt requested a review from celinval November 12, 2021 00:44
@avanhatt
Copy link
Contributor Author

@celinval many rebases and switching to serde later, this is now ready for re-review! The CBMC specifics are now in an entirely different crate.


/// Add a given call site for a virtual function, incremementing the call
/// site index.
fn add_call_site(&mut self, trait_name: String, method: usize, function_location: String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of things of type "String"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the tostring needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anymore at least

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interned string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the format of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the name now, renamed the variable to make clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right place to put the clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we talked about this in a chime sync, this is just the set of assumptions on the type, so returning None is no less safe than unimplemented! but let's us not crash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. Can you please add a comment to the code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you intern the string these clones become unnecessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a serde_json Value instead of the actual struct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS this the best name for this flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you like emit_vtable_restrictions better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment makes me expect a list of files, not the data from inside them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect linking and serializing to be independent steps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a single call to serde_json::to_value(&output).unwrap() so I don't feel strongly about that, but happy to move if you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just feels weird given the function name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to be stable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but there is no safe/stable way to mangle the vtable AFAIK, and the test would fail if it were to change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.restrictions.json

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InternedString

scripts/rmc.py Outdated
Comment on lines 61 to 62
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the new whitespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check python formatting with autopep8 on changed files, it inserted the whitepace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah didn't realize quite how copious it was, might need to move some of these to doc comments to make it happy

Comment on lines 1287 to 1288
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a strong reason to even make this a flag? Can we emit these files unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reasons are that currently emitting restrictions currently (1) requires wrapper functions at ever vtable call, because CBMC uses fragile naming conventions for specifying the function call site, and (2) the restrictions could be incomplete if the Rust code is doing certain types of casts that we don't handle, meaning the restriction set would be too small.

The 3rd even more transient reason is that the CBMC PR has not landed, so restrictions fail on our minimum required CBMC version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get that broken into an issue, with e.g. links to CBMC issues or PRs we depend on?

This change is awesome, and I'd like to make sure we have a well-lit path to making it the always-on default. e.g. do we know what the "certain types of casts that we don't handle" are, and can we document that?

(This is obviously not a blocker to merging, I just don't want it accidentally dropped or anything!)

@avanhatt avanhatt force-pushed the vtable-fn-ptr-restrictions branch from ed04501 to 19bb530 Compare November 17, 2021 18:32
@avanhatt avanhatt requested a review from danielsn November 17, 2021 20:25
Copy link
Contributor

@danielsn danielsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with comments

}
}

struct InternedStringVisitor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really go with InternedString itself.

version = "0.1.0"
dependencies = [
"rmc_restrictions",
"rustc_data_structures",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we using from here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler's FxHashMap, not strictly necessary/could be replaced with a vanilla HashMap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If easy, could do that here; if not, lets just do that as a cleanup PR

) {
assert!(self.emit_vtable_restrictions);
let site = CallSite {
trait_method: TraitDefinedMethod { trait_name: trait_name, vtable_idx: method },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to say trait_name: trait_name, trait_name would do

)
.into();

// We only have the Gotoc type, we need to normalize to match the MIR type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk here if MIR changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less so now that we have the type_map, we are normalizing the MIR on both sides instead of relying on string editing.

)
.as_stmt(Location::none());

// Declare symbol for the single, self parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but this feels like something that would happen often enough to be worth a helper function

let key = TraitDefinedMethod { trait_name, vtable_idx: method };

if let Some(possibilities) = self.possible_methods.get_mut(&key) {
possibilities.push(imp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we might want to keep this list ordered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently.

}

/// Add a given call site for a virtual function, incremementing the call
/// site index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it increment the index??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was stale, removed.

Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished reviewing this but so far I haven't seen anything critical. Once @danielsn approves this, I'm ok with merging and making the changes as a follow up PR.

Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. Can you please add a comment to the code?

if let Some(fn_symbol) = self.symbol_table.lookup(&fn_name) {
if self.vtable_ctx.emit_vtable_restrictions {
// Add to the possible method names for this trait type
self.vtable_ctx.add_possible_method(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the if statement to be inside the add_possible_method() implementation. This will make the code cleaner and less error prone.

.codegen_unimplemented(
format!("drop_in_place for {}", drop_sym_name).as_str(),
Type::empty(),
let drop_sym = self.ensure(&drop_sym_name, |ctx, name| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the drop case. Would it make more sense for us to replace the function call with the unimplemented assertion instead of generating the function body?


struct InternedStringVisitor;

impl<'de> serde::Deserialize<'de> for InternedString {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a unit test for this?

@avanhatt
Copy link
Contributor Author

Will address followups in #671, merging to unblock Ted's linked PR.

@avanhatt avanhatt merged commit 7820fa5 into model-checking:main Nov 29, 2021
tedinski pushed a commit to tedinski/rmc that referenced this pull request Apr 26, 2022
Add vtable restriction option, need CBMC PR to use
tedinski pushed a commit that referenced this pull request Apr 27, 2022
Add vtable restriction option, need CBMC PR to use
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use vtable cache to restrict CBMC function pointers

4 participants