Rashish Tandon edited this page Sep 7, 2017 · 30 revisions

Welcome to the gradient_coding wiki!

This page contains instructions for running the associated implementation on Amazon EC2. We use a cluster management toolkit called StarCluster (http://star.mit.edu/cluster/) to manage a cluster of EC2 machines.

StarCluster setup for Amazon EC2

  • Install the StarCluster toolkit (http://star.mit.edu/cluster/)

  • To configure StarCluster, edit the config file (found in the ~/.starcluster folder) as follows:

    • Add your AWS security keys in the fields AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (see the StarCluster documentation for details).
    • Add your AWS user ID in the field AWS_USER_ID.
    • Generate an EC2 key pair (see the AWS documentation for details) and add its location to the StarCluster config file by defining a "key" section, e.g.
      [key myrandomkey]   
      KEY_LOCATION = ~/myrandomlocation/myrandomkey.pem
      
    • Define plugin templates in the config file to install MPI and mpi4py on the EC2 cluster machines; these are used in our implementation and are not part of the standard AMIs provided by StarCluster.
      [plugin mpich2]
      SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup
      
      [plugin mpi4py]
      SETUP_CLASS = starcluster.plugins.pypkginstaller.PyPkgInstaller
      PACKAGES = mpi4py
      
    • Define volume templates in the config file to attach an EBS volume to the EC2 cluster via an NFS share; such a volume may be used to store data. You will need the EBS volume's ID (vol-...) to attach it.
      [volume mydata]
      VOLUME_ID = vol-123123123123
      MOUNT_PATH = /mydatapath
      
    • Define cluster templates in the config file to launch a cluster on EC2. Here is a sample configuration we used:
      [cluster myclusterconf]
      KEYNAME = myrandomkey
      CLUSTER_SIZE = 21
      CLUSTER_USER = sgeadmin
      CLUSTER_SHELL = bash
      NODE_IMAGE_ID = ami-6b211202
      NODE_INSTANCE_TYPE = t2.micro
      MASTER_IMAGE_ID = ami-3393a45a
      MASTER_INSTANCE_TYPE = m1.small
      PLUGINS = mpi4py, mpich2
      SPOT_BID = 0.5
      SUBNET_ID = subnet-9999a99b9
      PUBLIC_IPS = True
      VOLUMES = mydata
      
    • Some EC2 instance types can only be launched inside a VPC subnet; see the AWS documentation for how to create one. SPOT_BID specifies the maximum bid price (in USD per hour) when requesting spot instances.
  • Once you have edited the config file and added a cluster template (myclusterconf in the above example), you can launch an EC2 cluster as:

    starcluster start -c myclusterconf mynewcluster
    
  • You may SSH into the master node of your cluster as:

    starcluster sshmaster mynewcluster
    
  • Finally, you may terminate your cluster as:

    starcluster terminate mynewcluster
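
The configuration snippets above all live together in one ~/.starcluster/config file. Here is a minimal consolidated sketch reusing the example names from this page; the [aws info] section is standard StarCluster syntax, and all IDs, keys, and paths are placeholders you must replace with your own:

```ini
[aws info]
AWS_ACCESS_KEY_ID = <your access key>
AWS_SECRET_ACCESS_KEY = <your secret key>
AWS_USER_ID = <your user id>

[key myrandomkey]
KEY_LOCATION = ~/myrandomlocation/myrandomkey.pem

[plugin mpich2]
SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup

[plugin mpi4py]
SETUP_CLASS = starcluster.plugins.pypkginstaller.PyPkgInstaller
PACKAGES = mpi4py

[volume mydata]
VOLUME_ID = vol-123123123123
MOUNT_PATH = /mydatapath

[cluster myclusterconf]
KEYNAME = myrandomkey
CLUSTER_SIZE = 21
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-6b211202
NODE_INSTANCE_TYPE = t2.micro
MASTER_IMAGE_ID = ami-3393a45a
MASTER_INSTANCE_TYPE = m1.small
PLUGINS = mpi4py, mpich2
SPOT_BID = 0.5
SUBNET_ID = subnet-9999a99b9
PUBLIC_IPS = True
VOLUMES = mydata
```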
    

Usage Instructions

  • Launch an EC2 cluster using StarCluster.
  • Clone this repository onto the master node of your cluster.
  • Edit the Makefile as needed. Below are some pointers:
    • Specify the folder containing data in the field DATA_FOLDER
    • Specify the data set path and size in the fields DATASET, N_ROWS and N_COLS, and set IS_REAL to 1.
    • You may use make arrange_real_data to preprocess the data and break it into partitions. (This may have to be rewritten for your specific use case, but some examples are provided.)
    • To work with random data instead, use make generate_random_data to generate an artificial dataset (of size specified in the Makefile, in the fields N_ROWS and N_COLS) and set IS_REAL to 0.
    • Specify the total number of workers and the number of stragglers in the fields N_PROCS and N_STRAGGLERS, respectively.
    • If using the partial coding schemes (see the paper for details), specify the number of partitions each worker processes in the field N_PARTITIONS, and set PARTIAL_CODED to 1.
  • Edit the number of iterations, the regularization coefficient, and the learning-rate schedule in the file main.py through the variables num_itrs, alpha and learning_rate_schedule, respectively.
  • Now, you can run (accelerated) gradient descent for various schemes as follows:
    • make naive for the Naive (uncoded) scheme
    • make avoidstragg for the Ignoring Stragglers scheme
    • make cyccoded for the Cyclic Repetition scheme
    • make repcoded for the Fractional Repetition scheme
    • make partialcyccoded for the Partial Cyclic Repetition scheme
    • make partialrepcoded for the Partial Fractional Repetition scheme
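
As a sanity check on the Makefile fields above: in gradient coding, tolerating s stragglers requires each worker to process s + 1 of the data partitions (see the paper). The sketch below illustrates this arithmetic; the variable names mirror the Makefile fields, and the assumption that N_PROCS counts the master plus the workers is ours, not stated on this page:

```shell
# Hypothetical sanity check for the coded schemes.
# Tolerating N_STRAGGLERS stragglers means each worker processes
# N_STRAGGLERS + 1 partitions (per the gradient coding paper).
N_PROCS=21        # example value; assumed to include the master process
N_STRAGGLERS=4    # example value: number of stragglers to tolerate

n_workers=$((N_PROCS - 1))      # worker processes, excluding the master
load=$((N_STRAGGLERS + 1))      # partitions each worker must process

echo "workers=${n_workers} partitions_per_worker=${load}"
```

Per the paper, the fractional repetition scheme additionally requires the number of workers to be divisible by N_STRAGGLERS + 1, so check that before picking these values.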
