Home
This page contains instructions for running the associated implementation on Amazon EC2. We use a cluster management toolkit called StarCluster (http://star.mit.edu/cluster/) to manage a cluster of EC2 machines.
- Launch an EC2 cluster using StarCluster:
  - Install the StarCluster toolkit (http://star.mit.edu/cluster/).
  - Configure StarCluster by editing its config file (found in the `.starcluster` folder) as follows:
    - Add your AWS security keys in the fields `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. See here for more details.
    - Add your AWS user ID in the field `AWS_USER_ID`. See here for more details.
    - Generate an EC2 key pair (see here for more details). Add the key location to the StarCluster config file by defining a `key` section, e.g.:

          [key myrandomkey]
          KEY_LOCATION = ~/myrandomlocation/myrandomkey.pem

    - Define plugin templates in the config file to add MPI and mpi4py to the EC2 cluster machines. Both are used in our implementation and are not part of the standard AMIs provided by StarCluster:

          [plugin mpich2]
          SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup

          [plugin mpi4py]
          SETUP_CLASS = starcluster.plugins.pypkginstaller.PyPkgInstaller
          PACKAGES = mpi4py

    - Define volume templates in the config file to attach an EBS volume to the EC2 cluster via an NFS share; such a volume can be used to store data. You will need the volume ID of the EBS volume to attach it:

          [volume mydata]
          VOLUME_ID = vol-123123123123
          MOUNT_PATH = /mydatapath

    - Define cluster templates in the config file to launch a cluster on EC2. Here is a sample configuration we used:

          [cluster myclusterconf]
          KEYNAME = myrandomkey
          CLUSTER_SIZE = 21
          CLUSTER_USER = sgeadmin
          CLUSTER_SHELL = bash
          NODE_IMAGE_ID = ami-6b211202
          NODE_INSTANCE_TYPE = t2.micro
          MASTER_IMAGE_ID = ami-3393a45a
          MASTER_INSTANCE_TYPE = m1.small
          PLUGINS = mpi4py, mpich2
          SPOT_BID = 0.5
          SUBNET_ID = subnet-9999a99b9
          PUBLIC_IPS = True
          VOLUMES = mydata

    - Some EC2 instance types can only be launched in a subnet. See here to create one. `SPOT_BID` sets the bid price for spot instances.

  - Once you have edited the config file and added a cluster template (`myclusterconf` in the above example), you can launch an EC2 cluster as:

        starcluster start -c myclusterconf mynewcluster

  - You may SSH to the master node of your cluster as:

        starcluster sshmaster mynewcluster

  - When you are done, you may terminate your cluster as:

        starcluster terminate mynewcluster

- Clone this repository into the Master node of your cluster
- Edit the Makefile as needed. Below are some pointers:
  - Specify the folder containing data in the field `DATA_FOLDER`.
  - Specify the dataset path and size in the fields `DATASET`, `N_ROWS` and `N_COLS`, and set `IS_REAL` to 1.
  - You may use `make arrange_real_data` to preprocess the data and break it into partitions (this may have to be rewritten for your specific use case, but some examples are provided).
  - To work with random data instead, use `make generate_random_data` to generate an artificial dataset (of the size specified in the Makefile fields `N_ROWS` and `N_COLS`), and set `IS_REAL` to 0.
  - Specify the total number of workers and the number of stragglers in the fields `N_PROCS` and `N_STRAGGLERS`, respectively.
  - If using partial coding schemes (see the paper for details), specify the number of partitions each worker processes in `N_PARTITIONS`, and also set `PARTIAL_CODED` to 1.
- Edit the number of iterations, regularization coefficient and learning rate schedule in the file `main.py` through the variables `num_itrs`, `alpha` and `learning_rate_schedule`, respectively.
- Now, you can run (accelerated) gradient descent for various schemes as follows:
  - `make naive` for the Naive (uncoded) scheme
  - `make avoidstragg` for the Ignoring Stragglers scheme
  - `make cyccoded` for the Cyclic Repetition scheme
  - `make repcoded` for the Fractional Repetition scheme
  - `make partialcyccoded` for the Partial Cyclic Repetition scheme
  - `make partialrepcoded` for the Partial Fractional Repetition scheme
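
Before running `starcluster start`, the StarCluster configuration from the first step can be sanity-checked. The following Python sketch reads config text with the standard-library `configparser` and verifies that a cluster template only refers to `key`, `plugin` and `volume` sections defined in the same file. `check_cluster_template` is a hypothetical helper, not part of StarCluster, and it does not check AWS credentials or any other StarCluster-specific semantics.

```python
import configparser

# Trimmed version of the sample templates from the configuration step.
SAMPLE = """\
[key myrandomkey]
KEY_LOCATION = ~/myrandomlocation/myrandomkey.pem

[plugin mpich2]
SETUP_CLASS = starcluster.plugins.mpich2.MPICH2Setup

[plugin mpi4py]
SETUP_CLASS = starcluster.plugins.pypkginstaller.PyPkgInstaller
PACKAGES = mpi4py

[volume mydata]
VOLUME_ID = vol-123123123123
MOUNT_PATH = /mydatapath

[cluster myclusterconf]
KEYNAME = myrandomkey
CLUSTER_SIZE = 21
PLUGINS = mpi4py, mpich2
VOLUMES = mydata
"""


def check_cluster_template(config_text, cluster_name):
    """Return a list of problems found in the named cluster template.

    Hypothetical helper for illustration; not part of StarCluster.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = f"cluster {cluster_name}"
    if not parser.has_section(section):
        return [f"missing section [{section}]"]
    cluster = parser[section]
    problems = []
    # KEYNAME must refer to a [key <name>] section.
    keyname = cluster.get("KEYNAME")
    if keyname is None or not parser.has_section(f"key {keyname}"):
        problems.append(f"KEYNAME {keyname!r} has no [key ...] section")
    # PLUGINS and VOLUMES are comma-separated lists of template names.
    for option, kind in (("PLUGINS", "plugin"), ("VOLUMES", "volume")):
        for name in cluster.get(option, "").split(","):
            name = name.strip()
            if name and not parser.has_section(f"{kind} {name}"):
                problems.append(f"{option} entry {name!r} has no [{kind} {name}] section")
    return problems


print(check_cluster_template(SAMPLE, "myclusterconf"))  # prints []
```

To check a real setup, pass the contents of your `~/.starcluster/config` file instead of `SAMPLE`.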
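
The Makefile step mentions breaking the data into partitions (`make arrange_real_data`). A minimal sketch of one way to compute such a split, assuming contiguous row blocks of nearly equal size; `partition_rows` is a hypothetical name, and the repository's preprocessing may split or store partitions differently.

```python
def partition_rows(n_rows, n_parts):
    """Return half-open (start, end) row ranges, one per partition.

    Hypothetical sketch: contiguous blocks of nearly equal size.
    """
    if not 1 <= n_parts <= n_rows:
        raise ValueError("need 1 <= n_parts <= n_rows")
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        # The first `extra` partitions receive one additional row.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(partition_rows(10, 3))  # prints [(0, 4), (4, 7), (7, 10)]
```

How many partitions to create (for example, one per worker, or more when `N_PARTITIONS` is used with the partial schemes) is an assumption to check against the Makefile.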
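
The `main.py` step sets `num_itrs`, `alpha` and `learning_rate_schedule`. The sketch below shows how parameters like these could drive Nesterov-accelerated gradient descent in a single process. The L2-regularized least-squares loss, the momentum rule t/(t+3), the function names, and treating `learning_rate_schedule` as a list with one step size per iteration are all assumptions; `main.py` may use a different model and update, and in the coded schemes the gradient would come from the workers' results rather than being computed locally.

```python
def reg_grad(w, rows, labels, alpha):
    """Gradient of 0.5*sum_i (x_i.w - y_i)^2 + 0.5*alpha*||w||^2."""
    g = [alpha * wj for wj in w]
    for x, y in zip(rows, labels):
        residual = sum(xj * wj for xj, wj in zip(x, w)) - y
        for j, xj in enumerate(x):
            g[j] += residual * xj
    return g


def accelerated_gd(rows, labels, num_itrs, alpha, learning_rate_schedule):
    """Hypothetical sketch of Nesterov-accelerated gradient descent."""
    w = [0.0] * len(rows[0])
    w_prev = list(w)
    for t in range(num_itrs):
        # Momentum weight t / (t + 3): 0 on the first step, tending to 1.
        mu = t / (t + 3)
        lookahead = [wj + mu * (wj - pj) for wj, pj in zip(w, w_prev)]
        g = reg_grad(lookahead, rows, labels, alpha)
        eta = learning_rate_schedule[t]
        # Gradient step taken from the look-ahead point.
        w_prev, w = w, [lj - eta * gj for lj, gj in zip(lookahead, g)]
    return w


if __name__ == "__main__":
    rows, labels = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
    w = accelerated_gd(rows, labels, num_itrs=200, alpha=1.0,
                       learning_rate_schedule=[0.05] * 200)
    # Closed-form minimizer here: sum(x*y) / (sum(x*x) + alpha) = 28 / 15.
    print(w[0], 28 / 15)
```

A per-iteration list makes decaying schedules easy to express, e.g. `[0.05 / (1 + 0.01 * t) for t in range(num_itrs)]`.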
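
To illustrate what the Fractional Repetition scheme (`make repcoded`) computes, here is a sketch of its assignment and decoding as described in the gradient coding paper: with n workers and s stragglers, where s + 1 divides n, workers are split into groups of s + 1 that all process the same s + 1 partitions, so any n - s surviving workers include at least one member of every group. The function names are hypothetical, gradients are scalars for brevity, how n relates to `N_PROCS` (e.g. whether it counts the master) is not assumed, and the repository's MPI implementation is not reproduced here.

```python
def frc_assignment(n_workers, n_stragglers):
    """Return, for each worker, the list of partitions it processes."""
    group = n_stragglers + 1
    if n_workers % group != 0:
        raise ValueError("Fractional Repetition needs (s + 1) to divide n")
    # Worker w is in group w // (s + 1); every worker in group g handles
    # the same s + 1 partitions g*(s+1), ..., g*(s+1) + s.
    return [list(range((w // group) * group, (w // group + 1) * group))
            for w in range(n_workers)]


def worker_message(partitions, partition_grads):
    """Each worker sends the sum of its partitions' gradients."""
    return sum(partition_grads[p] for p in partitions)


def frc_decode(messages, n_workers, n_stragglers):
    """Recover the full gradient from {worker_id: message} of non-stragglers."""
    group = n_stragglers + 1
    total = 0.0
    for g in range(n_workers // group):
        received = [w for w in range(g * group, (g + 1) * group) if w in messages]
        if not received:
            raise ValueError(f"no result received from group {g}")
        # Any one member's message already covers the group's partitions.
        total += messages[received[0]]
    return total


if __name__ == "__main__":
    n, s = 6, 2
    grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    assignment = frc_assignment(n, s)
    stragglers = {0, 4}
    messages = {w: worker_message(assignment[w], grads)
                for w in range(n) if w not in stragglers}
    print(frc_decode(messages, n, s))  # prints 21.0, equal to sum(grads)
```

Here workers 0 and 4 straggle, yet workers 1 and 3 together still cover all six partitions, so the master recovers the full gradient sum.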