And here's an implementation in Python that I think provides that capability:
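The listing itself isn't reproduced here; a minimal sketch of such a script, consistent with the transcript below (naming and structure are assumptions), might look like:

```python
# add_one.py - minimal sketch of a script that adds one to a
# user-provided value; details are assumed, not the author's code.
import sys


def add_one(value):
    """Return the input plus one."""
    return value + 1


def main() -> None:
    if len(sys.argv) != 2:
        print("Usage: python add_one.py <number>")
        return
    raw = sys.argv[1]
    try:
        # Try int first so "4" prints "5" rather than "5.0".
        number = int(raw)
    except ValueError:
        try:
            number = float(raw)
        except ValueError:
            print(f"Error: '{raw}' is not a valid float. "
                  "Usage: python add_one.py <number>")
            return
    print(f"{raw} plus one is {add_one(number)}")


if __name__ == "__main__":
    main()
```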
When I run the above script, I get:
```
$ python3 add_one.py 4
4 plus one is 5
$ python3 add_one.py 4.2
4.2 plus one is 5.2
$ python3 add_one.py cat
Error: 'cat' is not a valid float. Usage: python add_one.py <number>
```
Next I'm going to intentionally add a few bugs and then ask how to prove the implementation has no bugs:
I've added three bugs in v2: a deterministic bug, a random bug, and a bug that depends on the user's environment. A brute-force test would be expensive but could identify the first two bugs.
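A brute-force check of the sort mentioned above might look like this sketch (`add_one` here is a stand-in for the implementation under test, and the range bounds are arbitrary):

```python
# Brute-force test sketch: check add_one over a large range of ints.
# This is expensive for wide ranges. It can catch a deterministic bug;
# a random bug needs many repetitions per input, and an
# environment-dependent bug may never show up on this machine.

def add_one(value):
    return value + 1  # stand-in for the implementation under test


def brute_force_check(lo=-10**6, hi=10**6):
    for i in range(lo, hi + 1):
        assert add_one(i) == i + 1, f"failed for input {i}"
    return True
```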
There are a couple of problems with v1 of the program requirements ("add one to a user-provided value"):
- The input range (unstated) is negative infinity to positive infinity.
- Python does not have a built-in limit for the size of integers. The maximum integer value is restricted only by the available memory of the system.
- For floats there is an upper bound:
```
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
```
- Time-out conditions are unspecified. So if the program doesn't respond for 5 minutes, the requirements have nothing to say about that.
Rewrite the program requirements to be more specific:
If the "computer" doing this calculation has a 1 Hz CPU clock frequency and 1 byte of RAM, the Python program might be "right" while the hardware is inadequate.
Also, let's make explicit the assumption that we are operating on base-10 numbers.
To be safe with the input string, let's bound that to be less than 1000 characters.
The revised implementation is:
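The revised listing isn't reproduced here; a sketch that enforces the tightened requirements (base-10 input, input string under 1000 characters) might look like this, with all naming assumed:

```python
# add_one_v2.py - sketch of a revised implementation reflecting the
# tightened requirements: base-10 numbers only, and the input string
# must be under 1000 characters. Details are assumptions.
import sys

MAX_INPUT_LENGTH = 1000  # bound chosen in the revised requirements


def add_one(value):
    return value + 1


def parse_number(raw: str):
    """Parse a base-10 int or float, rejecting oversized input."""
    if len(raw) >= MAX_INPUT_LENGTH:
        raise ValueError(f"input longer than {MAX_INPUT_LENGTH} characters")
    try:
        return int(raw, 10)  # base 10, explicitly
    except ValueError:
        return float(raw)    # raises ValueError if not a valid float


def main() -> None:
    if len(sys.argv) != 2:
        print("Usage: python add_one.py <number>")
        return
    try:
        number = parse_number(sys.argv[1])
    except ValueError as exc:
        print(f"Error: {exc}")
        return
    print(f"{sys.argv[1]} plus one is {add_one(number)}")


if __name__ == "__main__":
    main()
```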
Normal testing involves evaluating pre-determined cases, like "input 5, get 6" and "input 5.4, get 6.4" and "input 'cat', get error" and "input (nothing), get error."
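Those pre-determined cases can be written as ordinary pytest-style unit tests; a sketch (the `parse_number` and `add_one` helper names are invented here for illustration):

```python
# Example-based tests for the pre-determined cases listed above.
# Run with: pytest name_of_file.py

def add_one(value):
    return value + 1


def parse_number(raw: str):
    try:
        return int(raw, 10)
    except ValueError:
        return float(raw)  # ValueError for non-numeric input


def test_int_input():
    assert add_one(parse_number("5")) == 6


def test_float_input():
    # Compare with a tolerance: float arithmetic may not be exact.
    assert abs(add_one(parse_number("5.4")) - 6.4) < 1e-9


def test_invalid_input():
    try:
        parse_number("cat")
        assert False, "expected ValueError"
    except ValueError:
        pass
```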
Property-based testing (e.g., https://hypothesis.readthedocs.io/en/latest/) is where you "write tests which should pass for all inputs in whatever range you describe, and let Hypothesis randomly choose which of those inputs to check - including edge cases you might not have thought about."
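A property-based test for `add_one` might look like this sketch (assumes the `hypothesis` package is installed; the properties chosen are illustrative):

```python
# Property-based test sketch using Hypothesis: instead of fixed
# examples, state properties that should hold for all inputs and let
# Hypothesis choose the cases, including edge cases.
from hypothesis import given, strategies as st


def add_one(value):
    return value + 1


@given(st.integers())
def test_add_one_increments_any_int(x):
    assert add_one(x) == x + 1


@given(st.floats(allow_nan=False, allow_infinity=False))
def test_add_one_never_decreases(x):
    # For huge floats, x + 1 may round back to x, so assert a weaker
    # property that still holds for all finite inputs.
    assert add_one(x) >= x
```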
To run the above script, use `pytest name_of_file.py`.
Similarly, https://github.com/zlorb/PyModel is a model checker that generates test cases based on constraints.
This blog post is about formal methods. There are a few options:
- formal design specification
  - https://github.com/fizzbee-io/FizzBee is a design specification language. FizzBee is meant to be used before you start coding.
- formal implementation verification
  - https://github.com/marcoeilers/nagini, a static verification tool for Python built on the Viper infrastructure. (Talk: https://www.youtube.com/watch?v=PIwP3SuWLb0)
  - https://deal.readthedocs.io/basic/verification.html is an implementation verification tool.
  - https://github.com/pschanely/CrossHair: a static verification tool for Python using symbolic execution. It "repeatedly calls your functions with symbolic inputs. It uses an SMT solver (a kind of theorem prover) to explore viable execution paths and find counterexamples for you."
  - https://github.com/formal-land/coq-of-python translates Python code to Coq code for formal verification. "formal-land" is a commercial company selling verification-as-a-service.
  - https://github.com/arsalan0c/dafny-of-python translates a program written in a subset of typed Python, along with its specification, to the Dafny verification language. It uses Dune, a build system for OCaml projects (see https://dune.build/ and https://dune.readthedocs.io/en/latest/index.html). The last update was 4 years ago.
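As a taste of the CrossHair approach, contracts can be written directly in docstrings using its `pre:`/`post:` syntax; a sketch for `add_one` (the conditions are illustrative):

```python
# Sketch of a CrossHair-checkable function: `crosshair check` reads
# the pre:/post: lines in the docstring and searches symbolically for
# inputs that satisfy the precondition but violate the postcondition.
def add_one(x: int) -> int:
    """
    pre: x >= 0
    post: __return__ == x + 1
    """
    return x + 1
```

The function is still plain Python, so it runs (and can be unit-tested) with or without CrossHair installed.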
For Dafny, you write the program in Dafny and compile it to Python.
For Deal, you write Python and add contract decorators.
Dafny
Dafny was created by Rustan Leino at Microsoft Research.
Dafny uses the Z3 automated theorem prover and Boogie.
Boogie is a simple programming language that is meant to be (source: https://www.reddit.com/r/ProgrammingLanguages/comments/tc55ld/how_does_the_dafny_programming_language/):
- an easy compile target (think "like JVM bytecode, but for proving code correct")
- easy to analyze soundly
- not actually intended to be executable
Instead of running Boogie programs, the Boogie compiler looks through the Boogie code to find assertions. For each assertion, the compiler generates a "verification condition", which is a formula based on a (symbolic) analysis of the program; the verification condition formula is constructed so that if the verification condition is true, the assertion holds.
It then hands those verification conditions, along with annotations in the program like assumptions, preconditions, postconditions, and loop invariants, to an SMT solver (Boogie uses Z3 by default). The SMT solver determines whether or not the assumptions definitely ensure the verification condition holds; Boogie complains about the assertions whose verification-conditions haven't been shown to hold.
- @requires(...): Preconditions -- what must be true before the function is called.
- @ensures(...): Postconditions -- what must be true after the function returns normally.
Deal
python3 -m deal prove name_of_file.py