deploy

How do I Reproduce Your Key Result in the Paper?

We assume you have the latest Ansible installed on your work computer (that is, your laptop or home computer).
On your work computer, you have cloned the latest libhotstuff repo and updated all submodules (if not sure, run git submodule update --init --recursive). Finally, you have already built the repo so binaries hotstuff-keygen and hotstuff-tls-keygen are available in the root directory of the repo.
Right now, your shell should be in the scripts/deploy directory (cd <path-to-your-libhotstuff-repo>/scripts/deploy).

In this example, we use a typical Linux image, Ubuntu 18.04, on Amazon EC2, but in general any machine with Ubuntu 18.04 installed may work.
We assume you have already properly configured the internal network for the machines that participate in the experiment. These include some replica machines (machines dedicated to running replica processes) and several client machines.
- Replica machines should be able to talk to each other via TCP ports starting from 10000 (the default value generated by gen_conf.py, which can be changed).
- Each client machine should be able to talk to all replica machines via TCP ports starting from 20000.
- All machines should be accessible from your work computer given an ssh private key.
- NOTE: In our paper, we used c5.4xlarge to match the configuration of our baselines.

Edit both replicas.txt and clients.txt:
- replicas.txt: each line is the external IP and the local IP, separated by one or more spaces. The external IP is used for control actions between your work computer and the replica machines, whereas the local IP is the address in your inter-replica network, which replicas use to establish TCP connections with each other.
- clients.txt: each line is a single external IP.
- The same IP can appear multiple times in either file. In this case, different processes share the same machine (not recommended for replicas, for performance reasons).
Generate node.ini and hotstuff.gen.*.conf by running ./gen_all.sh.
Change the ssh key configuration in group_vars/all.yml.
Build libhotstuff on all remote machines by ./run.sh setup.
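As an illustration of the replicas.txt and clients.txt formats described above, the following sketch writes sample files into a scratch directory and checks that every line has the expected number of fields. The IP addresses are placeholders, and example_dir and check_fields are hypothetical names that are not part of libhotstuff.

```shell
#!/usr/bin/env bash
# Sketch only: sample input files, written to a scratch directory so that
# real replicas.txt / clients.txt files are not overwritten.

example_dir="$(mktemp -d)"   # hypothetical scratch location

# replicas.txt: "<external IP> <local IP>" per line (placeholder addresses).
cat > "$example_dir/replicas.txt" <<'EOF'
203.0.113.11   10.0.0.11
203.0.113.12   10.0.0.12
203.0.113.13   10.0.0.13
203.0.113.14   10.0.0.14
EOF

# clients.txt: one external IP per line (placeholder addresses).
cat > "$example_dir/clients.txt" <<'EOF'
203.0.113.21
203.0.113.22
EOF

# check_fields FILE N: succeed only if every non-empty line has exactly N
# whitespace-separated fields.
check_fields() {
    awk -v n="$2" 'BEGIN { bad = 0 } NF > 0 && NF != n { bad = 1 } END { exit bad }' "$1"
}

check_fields "$example_dir/replicas.txt" 2 && echo "replicas.txt: ok"
check_fields "$example_dir/clients.txt" 1 && echo "clients.txt: ok"
```

The same check can be pointed at your real files (for example, check_fields replicas.txt 2) before running ./gen_all.sh.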

(optional) Change the parameters in hotstuff.gen.conf to your liking.
(optional) Change the parameters in group_vars/clients.yml to your liking.
(for replicas) Create a new experiment run and start all replica processes by ./run.sh new myrun1.
(Wait a while until all replica processes settle down; on a good network such as EC2, 10 seconds should be more than enough.)
(for clients) Create a new experiment run and start all client processes by ./run_cli.sh new myrun1_cli.
(Wait until all commands are submitted, or until you would simply like to end the experiment.)
To collect the results, run ./run_cli.sh stop myrun1_cli followed by ./run_cli.sh fetch myrun1_cli.
To analyze the results, run cat myrun1_cli/remote/*/log/stderr | python ../thr_hist.py.
- With all default settings on c5.4xlarge, I got the following results:
```
[349669, 367520, 371855, 370391, 366159, 367565, 365957, 322690]
lat = 6.955ms # mean end-to-end latency
lat = 6.970ms # after removing outliers
```
Finally, stop replicas: ./run.sh stop myrun1.
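The run steps above can be strung together in one script. This is a minimal sketch rather than part of the repo: the step and run_experiment helpers and the DRY_RUN, SETTLE_SECS and LOAD_SECS variables are hypothetical names, and the 60-second load duration is an arbitrary placeholder. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: one replica run plus one client run, using the commands from
# the steps above. Prints commands by default (DRY_RUN=1); set DRY_RUN=0 to
# actually execute them from the scripts/deploy directory.

DRY_RUN="${DRY_RUN:-1}"
SETTLE_SECS="${SETTLE_SECS:-10}"   # hypothetical: time for replicas to settle
LOAD_SECS="${LOAD_SECS:-60}"       # hypothetical: how long clients submit commands

# step CMD...: print the command, and run it unless in dry-run mode.
step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# run_experiment RUN_ID: replica run RUN_ID and client run RUN_ID_cli.
run_experiment() {
    local run="$1" cli="${1}_cli"
    step ./run.sh new "$run"          # start all replica processes
    step sleep "$SETTLE_SECS"         # wait for replicas to settle down
    step ./run_cli.sh new "$cli"      # start all client processes
    step sleep "$LOAD_SECS"           # let the experiment run
    step ./run_cli.sh stop "$cli"     # stop the clients
    step ./run_cli.sh fetch "$cli"    # collect the results
    step ./run.sh stop "$run"         # finally, stop the replicas
}

run_experiment myrun1
```

After a real (non-dry) run, the fetched logs under myrun1_cli can be analyzed with thr_hist.py as described in the analysis step.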

Each ./run.sh new (and likewise ./run_cli.sh new) creates a folder that contains everything for the run (chosen parameters, raw results). A good practice is to always use a new name for each run, so that all of your previous experiments are preserved.
The run.sh script does NOT detect whether some other run is still unfinished (it does, however, prevent you from messing up the state of the same run, given an id like "myrun1"), so to start fresh you need to make sure you always stop (gracefully exit, with all results available) or reset (simply kill all processes) any previous runs.
To check whether the processes are still alive: ./run.sh check myrun1.
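Since each run gets its own folder, one way to follow the "new name for each run" practice is to pick the first unused name automatically. A minimal sketch, assuming run folders are named after the run id and live in the current directory; next_run_id and the myrun prefix are hypothetical and not part of run.sh.

```shell
#!/usr/bin/env bash
# next_run_id PREFIX [DIR]: print the first PREFIX<N> (N = 1, 2, ...) for
# which nothing named PREFIX<N> exists in DIR (default: current directory).
next_run_id() {
    local prefix="$1" dir="${2:-.}" n=1
    while [ -e "$dir/$prefix$n" ]; do
        n=$((n + 1))
    done
    echo "$prefix$n"
}

run_id="$(next_run_id myrun)"
echo "next free run id: $run_id"
# Then, for example: ./run.sh new "$run_id"
```

This only avoids reusing a name; it does not stop or reset an earlier run, which still has to be done explicitly.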