CS6410 Fall 2009
TR 10:10-11:25
Upson 211
CS6410: Advanced Systems
Getting Started on AWS

The CS6410 projects will use the Amazon Web Services (AWS) to give us practical experience in using web services and especially clusters. Amazon has graciously offered to provide us with a prepaid AWS account to support the course. This is a single account, to be shared among all the CS6410 students.

1. Conventions for Sharing a Single AWS Account

Because we will all be sharing a single Amazon AWS account, it will be possible (with some effort) for students to decrypt and read one another's Amazon machine image files, or to steal one another's static content files. This is not fundamentally different from the bad old days when CS programming projects were done on a batch computing system and printouts were delivered in sorted piles in public terminal rooms. The University Academic Integrity Policy applies to everything we put into AWS, and you are expected to follow this policy scrupulously.

Also as a consequence of sharing a single account, we will need to modify some of the procedures from the Amazon documentation so different students' projects will not interfere with one another. We have the following issues:
The following sections describe what you need to do to set up your system to use the shared AWS account, in particular pointing out which parts of the Amazon "Getting Started" documentation you should skip. The examples follow UNIX syntax and filename conventions, with occasional hints for Windows users. Long commands may be split across several lines in this documentation for readability; your command interpreter won't let you do that, so type each command on a single line.

2. Downloading and Installing AWS Account Information
To get the shared AWS account information, log on to CMS, go to lab0, and download the file. Once it is installed you should have a directory

    ~/.aws

or (for Windows users)

    c:\aws

containing two files: pk-xxxxxxxx.pem and the accompanying X.509 certificate file. Values for xxxxxxxx will come from the download. These files contain the private key and X.509 certificate used to authenticate to Amazon EC2. You will also have modified your environment to contain

    AWS_ACCOUNT_ID=xxxxxxxx

or (for Windows users)

    AWS_ACCOUNT_ID=xxxxxxxx

The actual values for xxxxxxxx will come from the download. All done! In the following sections we will discuss getting started with S3 and EC2 in some detail.

3. General Instructions

Amazon's top-level documentation page has links to "Getting Started" guides for each of the many services included in AWS. We discuss two of them -- S3 and EC2 -- in the next two sections of this document. The Amazon guides are simple and informative, but they are not designed for a shared AWS account. As you work through them, you will sometimes be instructed to use the Amazon web site to sign up for a service or to create a new X.509 certificate. In other places you will be instructed to create a new keypair, to modify the rules of the default network security group, or to do some other thing that affects the global state of the AWS account. Clearly, if several students were to try this concurrently it would be a Bad Thing. So,

    You should ignore such instructions!

The AWS account has already been set up; the AccountID, KeyID, Secret Key, and X.509 certificate have already been created, and you have installed them on your machine. Your bucket names, keypair names, image names and security group names should always be constructed according to the conventions described in Section 1, and you should avoid the default network security group altogether.

4. Getting Started with S3

Download the preferred set of tools to use in this class, Tim Kay's command line aws shell. It runs on pretty much any platform, including Windows, Linux, or Mac OS X, with minimal external dependencies.
Make sure your ~/.awssecret file is properly set up before issuing any commands. All examples in this document use the aws tool; however, you may use any other tools that offer the same functionality. To start, type aws in the shell; the program will return a list of available actions and help syntax.

You can ignore the Amazon "Getting Started" document for S3 if you have already installed the aws tools as described above. There are also a couple of S3 shells available for Linux.

5. Getting Started with EC2

You can likewise ignore the Amazon "Getting Started" document for EC2 if you have already installed the aws tools as described above. The Amazon online EC2 Developer Guide includes documentation for the command line tools. It is a Good Idea to read the manual page for each ami/api command as you are about to use it, to make sure you understand what it is about to do.

Running an Instance

This section of the Amazon guide includes a step for "Generating a Keypair", which must be changed to conform to our shared account naming conventions. The Amazon instructions tell you to name your keypair "gsg-keypair". (The "gsg" part presumably stands for "Getting Started Guide.") Instead, you should use a netid-specific name following our naming conventions; for example,

    kp-netid-xxx

where the netid part should be replaced by your own netid, and xxx is a suffix that helps you identify the key.

The Amazon document instructs you to store the private key of the keypair in a local file. The logon step where you connect to your instance using an ssh client requires the name of this file, so put it someplace where you can find it, for example

    aws add-keypair kp-hw228-test > ~/.aws/id-rsa-kp-hw228-test

Again, the name you use for this file is arbitrary, but you need some convention that will enable you to remember the name of the RSA private key file associated with each of your EC2 keypair names.
ssh refuses to use a private key file unless its permissions are set appropriately, that is, readable only by you (on UNIX the command chmod 600 ~/.aws/id-rsa-kp-hw228-test should do the trick).
To debug your ssh connections, use the -v or -vv flag.
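The keypair steps above can be collected into a small shell sketch. The helper names (kp_name, kp_file, create_keypair) and the NETID variable are made up for illustration and are not part of the aws tool; only the add-keypair command, the kp-netid-xxx pattern, and the key file location come from the examples in this section.

```shell
# Hypothetical helpers; NETID is an assumed variable holding your own netid.
NETID="${NETID:-hw228}"

# kp_name SUFFIX -> keypair name following the kp-netid-xxx convention.
kp_name() {
    printf 'kp-%s-%s\n' "$NETID" "$1"
}

# kp_file SUFFIX -> local file that holds the matching RSA private key.
kp_file() {
    printf '%s/.aws/id-rsa-%s\n' "$HOME" "$(kp_name "$1")"
}

# create_keypair SUFFIX: create the keypair and save its private key.
# This calls the real aws tool, so only run it when you mean to.
create_keypair() {
    key_file=$(kp_file "$1")
    # umask 077 keeps the new file private from the moment it is created;
    # chmod 600 then sets exactly the permissions ssh expects.
    ( umask 077 && aws add-keypair "$(kp_name "$1")" > "$key_file" ) &&
        chmod 600 "$key_file"
}
```

With these definitions, create_keypair test would create kp-hw228-test and store its key in ~/.aws/id-rsa-kp-hw228-test, and "$(kp_file test)" gives the path to hand to ssh -i later.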
The Network Security Group is another important issue that is not well covered in the Amazon document. Every EC2 instance runs in a named security group that you specify when you start the instance. The security group has a set of firewall rules that control network connectivity between instances in the group and instances outside it. If you start an instance without explicitly specifying a security group, the instance runs in a predefined group named "default". Clearly, it would be a Bad Idea for concurrent users of a shared account to have instances running in the (same) default security group, so we use the naming convention described above for security groups. You should create a new security group for the remainder of this exercise using a command like

    aws add-group gp-netid-xxx -d "yyyyyyyy"

where, as above, you should replace netid by your own netid, xxx by a string that makes the security group name unique among all the security group names you define, and yyyyyyyy by a short description of this security group. For example,

    aws add-group gp-hw228-test -d "test group for getting started"

You can check that this worked by typing

    aws describe-groups gp-netid-xxx

or just

    aws describe-groups

which will list all groups that have been defined by anyone using the shared account.

For some of the later steps in this exercise you will need to specify the group name explicitly rather than allowing it to default to "default". The command that actually starts your instance is the first of these. For example,

    aws run-instances ami-7fd83816 -g gp-hw228-test -k kp-hw228-test

starts an instance in the specified group gp-hw228-test. The Amazon Machine Image used in the example is called ami-7fd83816. For a complete list of available images, you can run ec2-describe-images. In general we will provide you with an image that contains all the libraries required for the lab assignments -- ami-7fd83816 is one such image.
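Because forgetting the -g option silently puts an instance in the shared "default" group, it may help to wrap the run command. The following is only a sketch: start_instance, AWS_CMD, and NETID are hypothetical names introduced here, and the wrapper assumes your keypair and group share the same suffix, as they do in the hw228 examples.

```shell
# Hypothetical wrapper; AWS_CMD and NETID are assumed names, not aws features.
AWS_CMD="${AWS_CMD:-aws}"
NETID="${NETID:-hw228}"

# start_instance AMI SUFFIX
# Starts AMI in group gp-NETID-SUFFIX with keypair kp-NETID-SUFFIX.
# It refuses to run unless both arguments are given, so an instance cannot
# fall back to the shared "default" group by accident.
start_instance() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: start_instance AMI SUFFIX" >&2
        return 1
    fi
    $AWS_CMD run-instances "$1" -g "gp-$NETID-$2" -k "kp-$NETID-$2"
}
```

Setting AWS_CMD=echo makes the wrapper print the command instead of running it, which is a cheap way to check the names before anything is charged to the course account.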
The command

    aws authorize gp-hw228-test -P tcp -p 22 -s 0.0.0.0/0

opens the standard TCP port for ssh (22) to any address (0.0.0.0/0) in the group gp-hw228-test. Repeating the command with -p 80 does the same for HTTP (80).
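If you need more than one port open, the authorize command can be repeated in a loop. This is a minimal sketch: open_ports and AWS_CMD are hypothetical names, and the authorize arguments are copied from the example above rather than from the tool's reference.

```shell
# Hypothetical helper; AWS_CMD is an assumed variable so the command can be
# swapped for echo when you only want to see what would be run.
AWS_CMD="${AWS_CMD:-aws}"

# open_ports GROUP PORT...
# Authorizes inbound TCP traffic from any address (0.0.0.0/0) on each port,
# stopping at the first failure.
open_ports() {
    group=$1
    shift
    for port in "$@"; do
        $AWS_CMD authorize "$group" -P tcp -p "$port" -s 0.0.0.0/0 || return 1
    done
}
```

For the ssh and HTTP ports mentioned above, the call would be open_ports gp-hw228-test 22 80.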
At this point, you should be able to connect to your instance with the ssh command from the Amazon documentation, using the name of the RSA private key file you saved when you created your EC2 keypair and the external network address assigned to your running instance (to find it out, use aws describe-instances):

    ssh -i ~/.aws/id-rsa-kp-hw228-test root@ec2-67-202-33-73.compute-1.amazonaws.com

Well, "Congratulations!" You have started an instance. As the Amazon docs warn you, DO NOT go away without remembering to shut down your instance -- the CS6410 course account is charged for the instance for as long as it continues to run.

Bundling an AMI

For this part of the exercise there is a convention that may not be obvious: examples that use the prompt string

    prompt>

are commands you execute on your local machine, but examples using the prompt string

    #

are commands you execute on a running EC2 instance to which you have logged in as the root user with ssh.
It is important to keep this straight.
In addition, we need to worry about naming conventions for Amazon Machine Image (AMI) bundles. The procedure used in the Amazon document gives the image files default names, and so does not support bundling more than one AMI into the same bucket.

First, you need to copy the private key and X.509 certificate associated with the (shared) account up to the running machine instance you want to bundle. Using our naming and environment conventions, a command something like

    prompt> scp -i ~/.aws/id-rsa-kp-hw228-test ${EC2_PRIVATE_KEY} ${EC2_CERT} root@ec2-67-202-33-73.compute-1.amazonaws.com:/mnt

will upload these files to the /mnt directory of the running instance. The last argument is the destination: root at your instance's external address (the address from the ssh example is reused here), followed by the target directory.
At this point you are ready to ask the EC2 instance to bundle itself. The default bundling command creates a number of files with names of the form

    image.foo.bar

in the /mnt directory of the instance.
These names appear in the S3 bucket where the AMI is stored,
preventing you from creating more than one AMI in the same bucket.
This is a Bad Thing.
To avoid it, you need to add a common prefix to the name of each file
in the bundle using the -p option to the ec2-bundle-vol command.
This is the "Image Name Prefix" discussed in our naming conventions in Section 1.
The command

    # ec2-bundle-vol -d /mnt -p im-hw228-test

bundles the image to a collection of files on /mnt; all the file names will begin with the prefix "im-hw228-test" rather than "image". The full command also has to identify the private key, the certificate, and the account (the -k, -c, and -u options of ec2-bundle-vol). In those arguments xxxxxxxx must be replaced by the names of the .pem files that were uploaded using scp above, and iiiiiiii must be replaced by your AWS Account ID, the value of the environment variable AWS_ACCOUNT_ID on your local machine. Sadly, these values don't appear in the environment of the instance, where you are executing the ec2-bundle-vol command, so you will need to cut and paste.
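One way to make the cut and paste less error-prone is to let your local shell assemble the command text. The sketch below assumes that EC2_PRIVATE_KEY, EC2_CERT, and AWS_ACCOUNT_ID are set on your local machine, that the two .pem files were copied to /mnt on the instance, and that -k, -c, and -u are the key, certificate, and account options of ec2-bundle-vol (check its manual page before relying on this). The function name bundle_command is hypothetical.

```shell
# Hypothetical helper, meant to be run on your LOCAL machine.
# It only prints text; nothing is executed on the instance.
# bundle_command PREFIX -> the ec2-bundle-vol line to paste at the # prompt.
bundle_command() {
    prefix=$1
    # basename strips the local directory, because the files sit directly
    # in /mnt on the instance.
    printf 'ec2-bundle-vol -d /mnt -k /mnt/%s -c /mnt/%s -u %s -p %s\n' \
        "$(basename "$EC2_PRIVATE_KEY")" \
        "$(basename "$EC2_CERT")" \
        "$AWS_ACCOUNT_ID" \
        "$prefix"
}
```

Running bundle_command im-hw228-test locally prints a single line that you can paste into the ssh session on the instance.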
The next step is to upload the AMI to S3. For example, the command

    # ec2-upload-bundle -b edu-cornell-cs-cs6410-hw228 -m /mnt/im-hw228-test.manifest.xml

was used to upload an AMI to my own personal bucket. The command also needs your access credentials (the -a and -s options of ec2-upload-bundle), where aws-key-id and aws-secret-key need to be replaced by their true values. These are the values of $AWS_KEY_ID and $AWS_SECRET_KEY on your own machine, but again, as the upload command is being run on the EC2 instance, the environment variables will not be available and you will have to cut and paste.

Once you have successfully uploaded your AMI you no longer need your running instance; you can shut it down with the command

    # /sbin/shutdown -h now

on the instance itself, or you can use the AWS command

    prompt> aws terminate-instances i-nnnnn

which is probably more reliable.

The final step of this lengthy process is to register your AMI so you can start it in a new instance. Again the Amazon documentation needs to be modified to get the manifest file name right. For example, the command

    prompt> aws register edu-cornell-cs-cs6410-hw228/im-hw228-test.manifest.xml

could be used to register the image we uploaded in the previous step.

At this point you have a registered AMI and can try running it. The command given in the Amazon document has the same issue we discussed when running a public AMI: for a shared account, you should never start an instance in the "default" group, so the command to run your instance should specify one of your own group names, for example

    prompt> aws run-instances ami-5bae4b32 -g gp-hw228-test -k kp-hw228-test
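The upload and register steps both depend on the same bucket name and image prefix, so it can help to derive them in one place. This is a hedged sketch: bucket_name, upload_command, and manifest_location are hypothetical helpers, the bucket pattern is inferred from the single edu-cornell-cs-cs6410-hw228 example, and -a and -s are assumed to be the access key and secret key options of ec2-upload-bundle.

```shell
# Hypothetical helpers for your LOCAL machine; they only print text.
NETID="${NETID:-hw228}"

# bucket_name -> per-student bucket, following the pattern of the example.
bucket_name() {
    printf 'edu-cornell-cs-cs6410-%s\n' "$NETID"
}

# upload_command PREFIX -> the ec2-upload-bundle line to paste at the # prompt.
# The output contains your secret key, so do not save or share it.
upload_command() {
    printf 'ec2-upload-bundle -b %s -m /mnt/%s.manifest.xml -a %s -s %s\n' \
        "$(bucket_name)" "$1" "$AWS_KEY_ID" "$AWS_SECRET_KEY"
}

# manifest_location PREFIX -> the bucket/manifest argument for aws register.
manifest_location() {
    printf '%s/%s.manifest.xml\n' "$(bucket_name)" "$1"
}
```

After the upload finishes, aws register "$(manifest_location im-hw228-test)" on your local machine would register the same image that upload_command im-hw228-test uploaded.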
Cleaning Up

This subsection of the Amazon documentation describes how to deregister your AMI and remove it from S3, so it won't consume space unnecessarily. S3 file space is not free, but it is not nearly as expensive as instances. So

    Don't leave any EC2 instances running!

The command

    prompt> aws describe-instances

will show you a list of all running instances with enough information that you can determine the instance ids of all instances that are yours (those that contain your netid). You can then terminate them by running

    prompt> aws terminate-instances i-nnnnn ...

Please don't forget to do this step.
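Picking your own instances out of a long shared listing is easy to get wrong by eye. The filter below is a sketch under stated assumptions: it assumes each instance appears on one line of the describe-instances output together with a keypair or group name containing your netid, and that fields are separated by spaces, tabs, or | characters. The function name my_instance_ids and the NETID variable are hypothetical.

```shell
# Hypothetical filter; NETID is an assumed variable holding your own netid.
NETID="${NETID:-hw228}"

# my_instance_ids: reads describe-instances output on stdin and prints the
# instance ids (i-...) found on lines that mention $NETID, one per line.
my_instance_ids() {
    grep -- "$NETID" |
        tr -s ' |\t' '\n\n\n' |
        grep -E '^i-[0-9a-f]+$' |
        sort -u
}
```

A typical use would be aws describe-instances | my_instance_ids. Look over the ids it prints before passing them to aws terminate-instances, since the output format is an assumption and other students' instances share the account.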