benchmark: port benchmark.sh to Python, add multicore, multiple runs, persistent+shmem #1853
Conversation
I don’t mind merging this.
@cjb please see my previous comment :)
@vanhauser-thc Ready for another look, thanks! I don't think the GitHub diff view allows you to render a preview of the .ipynb notebook properly, so here's a link to it: https://github.com/cjb/AFLplusplus/blob/dev-benchmark-py/benchmark/benchmark.ipynb
I am not a fan of jsonlines unless the benchmark tool also shows how it compares to other CPU setups. What I mean is: look at the COMPARISON file. A user can just look at the file and see how their setup compares. Also, whatever format there is (text, json, ...), you should add some results there. I will then also add a few.
Ah, these are already part of this PR -- the diff collapsed it because the changes are large, but there's a full set of experiment results with different parameters for an Intel desktop CPU and an AWS 192 vCPU instance in this PR, in benchmark-results.jsonl, and a Python notebook discussing the results and performing analysis on them live. Data in this PR: https://github.com/cjb/AFLplusplus/blob/dev-benchmark-py/benchmark/benchmark-results.jsonl
Makes sense, I think doing both can work: write the raw data to the JSON Lines file, and also write a one-line summary of it to the COMPARISON file after each run for easy textual viewing; I'll work on that. The reason to have the JSON Lines version is to be able to answer more complex questions than "How fast is this machine?" -- the Jupyter notebook analysis gives answers to "How much faster is persistent mode with shared memory? How much faster is multicore, with and without persistent mode? How much faster does it get if you boot with
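For illustration, a minimal sketch of what writing both outputs could look like; the field names, example values, and COMPARISON layout here are assumptions, not the exact schema benchmark.py uses:

```python
# Minimal sketch only: field names, values, and the COMPARISON layout are
# assumptions for illustration, not the exact schema benchmark.py uses.
import json

result = {
    "cpu_model": "Intel(R) Core(TM) i7-9700K",  # hypothetical example entry
    "parallel_fuzzers": 8,
    "afl_persistent_config": True,
    "execs_per_sec": 123456.7,
}

# Raw record: one JSON object per line, appended to the JSON Lines file.
with open("benchmark-results.jsonl", "a") as f:
    f.write(json.dumps(result) + "\n")

# Human-readable one-line summary, appended to COMPARISON.
with open("COMPARISON", "a") as f:
    f.write(f"{result['cpu_model']} | {result['parallel_fuzzers']} fuzzers | "
            f"{result['execs_per_sec']:.0f} execs/s\n")
```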
perfect.
I will merge this once this is added :)
@vanhauser-thc Ready for another look! Please could you re-run this on your own machines (which will add them to COMPARISON), now that it tracks multi-core perf too?
@cjb
And finally, there is no duplicate check before writing entries to the COMPARISON file.
On a different system I get a script error from python3.8:
Thanks!
It should be on the first line of output:
Fixed.
Ah,
I suppose I'd prefer not to add this -- I'd consider this as a human-readable file, not a machine-readable one.
Fixed.
please use execs_per_sec from fuzzer_stats for all values and do not calculate your own :-) it can only be less correct :)
otherwise a line is added every time the user executes it, duplicating existing entries. this is not helpful :) also please make the string longer, it is too short to document which specific processor it is :)
Done. A reason I was feeling distrustful of
This doesn't answer the point about the file being intended as human-readable, yet now we're presupposing a parser for it. I could add a parser anyway, but it would raise the question of what counts as a duplicate entry. What if I'm experimenting with system settings and trying to see their effect on the numbers? Do you just want to compare the CPU model on the last line of the
Done -- I was trying to keep the whole file near a standard terminal width, but no big deal.
just check if '^PROCESSORNAME' is present in the file and do not write if so. very simple. (with a warning to remove the line if they want to save it)
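Something along these lines could implement that check; this is only a sketch, assuming the CPU model string appears at the start of each COMPARISON line:

```python
# Sketch of the suggested duplicate check; assumes the CPU model string
# appears at the start of each COMPARISON line (illustrative, not verified).
import os

def already_recorded(comparison_path: str, cpu_model: str) -> bool:
    """Return True if some line in COMPARISON already starts with this CPU model."""
    if not os.path.exists(comparison_path):
        return False
    with open(comparison_path) as f:
        return any(line.startswith(cpu_model) for line in f)

cpu_model = "AMD Ryzen 9 5950X 16-Core Processor"  # hypothetical example
if already_recorded("COMPARISON", cpu_model):
    print(f"WARNING: {cpu_model} already has an entry in COMPARISON; "
          "remove that line if you want to record a new result.")
else:
    with open("COMPARISON", "a") as f:
        f.write(f"{cpu_model} | ...\n")  # summary fields elided in this sketch
```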
Done.
Sorry, I found one more issue: not sure what is going wrong, this is the exec data in the fuzzer stats:
btw I don't get why you calculate from total_execs / runtime ... this value is already present in fuzzer_stats and called execs_per_sec :)
I'm not sure what's going wrong either. Perhaps
That isn't at play in this crash, since the crashing section is not doing that, but I tried to explain why I did this above:
Here is another explanation of the same problem. If I run with

But again, the flow we're talking about, and the number being printed to COMPARISON, does not divide by total runtime at all. It should just be summing
This is only supposed to print the sum of all of the
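A sketch of that summing approach, assuming the usual afl-fuzz output layout with one subdirectory (and one fuzzer_stats file) per instance:

```python
# Sketch: sum execs_per_sec across all fuzzer instances by reading each
# instance's fuzzer_stats file. Assumes the usual -M/-S layout with one
# subdirectory per instance under the output directory.
from pathlib import Path

def total_execs_per_sec(output_dir: str = "out") -> float:
    total = 0.0
    for stats_file in Path(output_dir).glob("*/fuzzer_stats"):
        for line in stats_file.read_text().splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "execs_per_sec":
                total += float(value.strip())
    return total

print(f"total execs/sec across instances: {total_execs_per_sec():.2f}")
```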
OK this is solved :) it comes from fuzzer_stats. I will do a fix.
fixed the inf bug in dev

only the first instance has real results, all others are not really starting up correctly. maybe pipe stdout + stderr of these somewhere in python and see what is going wrong (I am not a python guy ...)
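A possible way to do that from the Python side; the afl-fuzz arguments here are placeholders, not the exact invocation benchmark.py uses:

```python
# Sketch: capture stdout/stderr of each secondary afl-fuzz instance so that
# startup failures can be inspected afterwards. Arguments are placeholders.
import os
import subprocess

def start_secondary(idx: int) -> subprocess.Popen:
    # One log file per instance; left open for the lifetime of the process.
    logfile = open(f"secondary-{idx}.log", "w")
    env = dict(os.environ, AFL_NO_UI="1")
    return subprocess.Popen(
        ["afl-fuzz", "-i", "in", "-o", "out", "-S", f"s{idx}", "--", "./target"],
        stdout=logfile,
        stderr=subprocess.STDOUT,
        env=env,
    )

procs = [start_secondary(i) for i in range(1, 4)]
```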
fixed the benchmark.py
I think so -- the Jupyter notebook contains examples of measuring the perf difference due to afl-system-config and afl-persistent-config, and how perf scales with the number of cores used, and none of those experiments would be possible if we refused to write more than one line per CPU to the JSON output. |
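The notebook itself isn't rendered in this view, but as an illustration of the kind of question the JSON Lines data can answer (the column names here are assumed for the example):

```python
# Illustration of the kind of analysis the JSON Lines output enables;
# the column names used here are assumptions, not the exact schema.
import pandas as pd

df = pd.read_json("benchmark-results.jsonl", lines=True)

# e.g. average throughput with and without afl-system-config applied,
# broken down by number of parallel fuzzers.
print(
    df.groupby(["afl_system_config", "parallel_fuzzers"])["execs_per_sec"]
      .mean()
      .unstack("afl_system_config")
)
```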
Did you mean to pass

We should also lower the number of runs now that each run in each mode takes 10 seconds -- the runtime of the script is now 60s. I guess two runs (40 seconds total) should be okay. (I'll also need to redo the analysis in the Jupyter script.)
@vanhauser-thc I think this should be ready to merge now; I removed the self-calculation of execs_per_sec and re-ran on my machines. (Want to add your own machines to COMPARISON?)
Thanks! Will add my machines tomorrow |
Hi @vanhauser-thc, thanks for benchmark.sh!
I've been hacking on it towards a goal of being able to compare execs-per-dollar across cloud instances and consumer machines, and I'd love to get some feedback. The first commit in the series is a straight port from shell to Python, and you can still get that original behavior from this version, with:
But the default arguments are now instead equivalent to:
The defaults:
benchmark-results.jsonl file for you, in JSON Lines format.

Since each run is recorded, it's possible to do some basic data analysis and graphs. Here I did n=36 different campaigns, with n as the number of parallel afl-fuzz workers:

And used jq to see which campaign resulted in the maximum execs_per_sec value:

Thanks again! I'm especially interested in feedback about whether it seems valid to test and compare numbers for a multicore campaign in this way.
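(For anyone without jq, a roughly equivalent query over the JSON Lines file in plain Python might look like the sketch below; the execs_per_sec field name is assumed.)

```python
# Roughly equivalent to the jq query over the JSON Lines results: find the
# campaign record with the highest execs_per_sec (field name assumed).
import json

with open("benchmark-results.jsonl") as f:
    results = [json.loads(line) for line in f if line.strip()]

best = max(results, key=lambda r: r["execs_per_sec"])
print(best)
```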