CS 6464: Advanced Distributed Storage Systems
Lab1 - TCP proxy
Due: Tuesday, February 10th at 11:59pm

Introduction

The first programming project is meant to introduce you to some programming tools you'll be using for the rest of the course, particularly the Unix software development environment (gcc/g++, make, etc.) and the C++ asynchronous I/O library (essentially a wrapper on top of select). You may find it useful to refer to the source of the C++ multifinger program described in Using TCP through sockets (section 6.7). More trivial examples can be found here. Warning: if you choose to use the libasync suio::output function instead of the conventional write library function, be aware that its description is incorrect. In particular, it MAY return 0 in cases other than EAGAIN. You have been warned!

Your task will be to write a TCP Proxy using the same C++ asynchronous library. You'll learn how to write both client and server code in this lab.

A TCP proxy server is a server that acts as an intermediary between a client and another server, called the destination server. Clients establish connections to the TCP proxy server, which then establishes a connection to the destination server. The proxy server sends data received from the client to the destination server and forwards data received from the destination server to the client. Interestingly, the TCP proxy server is actually both a server and a client. It is a server to its client and a client to its destination server.

A TCP proxy server can be useful to get around services that restrict connections based on network addresses. For example, the web page http://fireless.cs.cornell.edu/courses/2009sp/cs6464/restricted/ is only accessible from EC2 XXX.compute-1.amazonaws.com hosts. If you try to access it from elsewhere, you will receive an access denied error. However, you can view this page from a web browser anywhere on the Internet by running a proxy server on one of the EC2 instance machines. The web server will think it is serving the data to a web client on the machine running the proxy. However, the proxy is forwarding the data out of the class network, thus subverting the protection mechanism.

The assignment

The proxy server you will build for this lab will be invoked at the command line as follows:

# ./tcp-proxy destination-host destination-port listen-port

For example, to redirect all connections to port 3000 on your local machine to Yahoo's web server, run:

# ./tcp-proxy www.yahoo.com 80 3000 

As another example, to view the restricted web page mentioned above, you might run the following command on your EC2 machine:

# ./tcp-proxy fireless.cs.cornell.edu 80 4000 
Then you can view the restricted web page by typing the URL http://ec2-75-101-184-233.compute-1.amazonaws.com:4000/courses/2009sp/cs6464/restricted/ into your browser window, provided that ec2-75-101-184-233.compute-1.amazonaws.com is the public DNS name returned by the ec2-runinstances command, and that you have authorized network access on the proxy listen-port (-p 4000).
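
As a rough illustration of handling the three command-line arguments shown above, the following sketch validates them before any sockets are opened. The names ProxyArgs, parse_port, and parse_args are hypothetical and are not part of the provided skeleton, which may already parse its arguments differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical container for the three arguments of
//   ./tcp-proxy destination-host destination-port listen-port
struct ProxyArgs {
    std::string dest_host;
    int dest_port = 0;
    int listen_port = 0;
};

// Parse a decimal TCP port in [1, 65535]; returns -1 on any error.
int parse_port(const char* s) {
    assert(s != nullptr);
    if (*s == '\0') return -1;
    errno = 0;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 1 || v > 65535) return -1;
    return static_cast<int>(v);
}

// Fill `out` from argv; returns false if the argument count or a port is invalid.
bool parse_args(int argc, const char* const* argv, ProxyArgs& out) {
    if (argc != 4) return false;
    int dport = parse_port(argv[2]);
    int lport = parse_port(argv[3]);
    if (argv[1][0] == '\0' || dport < 0 || lport < 0) return false;
    out.dest_host = argv[1];
    out.dest_port = dport;
    out.listen_port = lport;
    return true;
}
```

A main function could call parse_args(argc, argv, args) and print the usage line from above when it returns false.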

The proxy server will accept connections from multiple clients and forward them using multiple connections to the server. No client or server should be able to hang the proxy server by refusing to read or write data on its connection. For instance, if one client suddenly stops reading from the socket to the proxy, other clients should not notice interruptions of service through the proxy. You will need asynchronous behavior, described in "Using TCP Through Sockets".
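
To make the "never hang" requirement concrete, here is a minimal sketch using plain POSIX calls rather than libasync (whose own helpers you would normally use instead). It puts a descriptor into non-blocking mode and writes pending data until the kernel would block. The names make_nonblocking, FlushResult, and flush_pending are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Put fd into non-blocking mode so read/write return EAGAIN instead of stalling.
bool make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

enum class FlushResult { Drained, WouldBlock, Error };

// Write as much of `pending` as the kernel accepts right now.
// Bytes that were written are removed from the front of `pending`.
FlushResult flush_pending(int fd, std::string& pending) {
    assert(fd >= 0);
    while (!pending.empty()) {
        ssize_t n = write(fd, pending.data(), pending.size());
        if (n > 0) {
            pending.erase(0, static_cast<std::size_t>(n));
            continue;
        }
        if (n == -1 && errno == EINTR) continue;  // interrupted, retry
        if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            return FlushResult::WouldBlock;       // no progress, try again later
        return FlushResult::Error;                // real write error
    }
    return FlushResult::Drained;
}
```

On WouldBlock, the caller would wait for the descriptor to become writable (through the event loop) instead of retrying in a loop, so one stalled peer never delays the others.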

The proxy must also handle hung clients and servers. In particular, if one end keeps transmitting data but the other stops reading, the proxy must not buffer an unlimited amount of data. Once the amount of buffered data in a given direction reaches some high water mark (e.g., 8K), the proxy must stop reading in that direction until the buffer drains.
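
The high water mark rule can be sketched as a small per-direction buffer that tells the event loop whether reading is currently allowed. This is only one possible design, and the names DirectionBuffer and kHighWater are hypothetical. The 8K value is the example figure from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Example high water mark from the assignment text (8K).
constexpr std::size_t kHighWater = 8 * 1024;

// Bytes read from one side that have not yet been written to the other side.
struct DirectionBuffer {
    std::string pending;

    // Reading is allowed only while we are below the high water mark.
    bool can_read() const { return pending.size() < kHighWater; }

    // Largest read that cannot push the buffer past the mark.
    std::size_t read_budget() const {
        return can_read() ? kHighWater - pending.size() : 0;
    }

    void append(const char* data, std::size_t n) { pending.append(data, n); }

    // Drop n bytes from the front after a successful write.
    void consume(std::size_t n) {
        assert(n <= pending.size());
        pending.erase(0, n);
    }
};
```

When can_read() turns false, the proxy would stop watching the source socket for readability, and resume once consume() brings the buffer back under the mark.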

Connection termination

The proxy must handle end-of-file conditions as transparently as possible. If it reads end-of-file from one socket, it should pass the condition along to the other socket (using shutdown) after writing any remaining buffered data. However, the proxy should continue to forward data in the other direction. The proxy should terminate a connection pair and close the file descriptors under either of the following two circumstances:

  1. The proxy has read an end-of-file (or experienced a read error other than EAGAIN) in both directions and has written all remaining buffered data.
  2. The proxy experiences a write error (other than EAGAIN) in either direction.
The reason for giving up more easily on write errors is that they signify some failure of the higher-level protocol. A read end-of-file can be a legitimate part of a protocol, whereas when a program writes data to the network, it indicates a serious problem if no one is there to read it.
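
The two termination rules above can be written down as a pair of predicates over per-direction state. This is a sketch under assumed names (Direction, should_send_shutdown, should_close_pair), not the skeleton's actual data structures.

```cpp
#include <cassert>
#include <cstddef>

// State for one direction of a connection pair (e.g. client to server).
struct Direction {
    bool read_done = false;     // saw EOF or a read error other than EAGAIN
    bool write_failed = false;  // saw a write error other than EAGAIN
    bool shutdown_sent = false; // already passed EOF along with shutdown
    std::size_t buffered = 0;   // bytes read but not yet written
};

// Record that n buffered bytes were written to the destination socket.
void on_bytes_written(Direction& d, std::size_t n) {
    assert(n <= d.buffered);
    d.buffered -= n;
}

// EOF is passed along only after all remaining buffered data is written.
bool should_send_shutdown(const Direction& d) {
    return d.read_done && d.buffered == 0 && !d.shutdown_sent && !d.write_failed;
}

// Rule 2: any write error ends the pair immediately.
// Rule 1: otherwise both directions must be finished and fully drained.
bool should_close_pair(const Direction& a, const Direction& b) {
    if (a.write_failed || b.write_failed) return true;
    return a.read_done && b.read_done && a.buffered == 0 && b.buffered == 0;
}
```

When should_send_shutdown returns true, the proxy would call shutdown on the write side of the destination socket and keep forwarding data in the opposite direction.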

The proxy will enforce an upper limit on the number of active connections. Once this limit is reached, the proxy accepts no new connections; when a connection closes, it accepts a pending connection, if any.
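
One way to implement the limit is a small counter that decides when the listening socket should be watched for new connections. The class name ConnectionLimiter is hypothetical; the sketch assumes that pending connections simply wait in the kernel's listen queue while the proxy is not accepting.

```cpp
#include <cassert>
#include <cstddef>

// Tracks active connection pairs against a fixed upper limit.
class ConnectionLimiter {
public:
    explicit ConnectionLimiter(std::size_t max_active)
        : max_(max_active), active_(0) {}

    // True while another connection may be accepted.
    bool can_accept() const { return active_ < max_; }

    // Call after a successful accept.
    void on_accept() {
        assert(can_accept());
        ++active_;
    }

    // Call when a connection pair is torn down. Returns true if the proxy
    // was at the limit, i.e. the listener should be re-enabled now.
    bool on_close() {
        assert(active_ > 0);
        bool was_full = !can_accept();
        --active_;
        return was_full;
    }

    std::size_t active() const { return active_; }

private:
    std::size_t max_;
    std::size_t active_;
};
```

When can_accept() is false, the proxy would stop asking the event loop for readability on the listening socket; when on_close() returns true, it would start asking again so that a waiting client gets accepted.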

Extra Credit

For extra credit, if the proxy has buffered data in one direction and is unable to write any of it for 10 seconds, it should abort the connection pair.
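
The 10-second rule can be tracked with one timestamp per direction that is reset whenever a write makes progress. The sketch below keeps the timing logic separate from the event loop so it can be tested with made-up time points; WriteStallTimer and kStallLimit are hypothetical names, and libasync's own timer facilities could be used in place of std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;
constexpr std::chrono::seconds kStallLimit(10);

// Detects "data buffered but nothing written for 10 seconds" in one direction.
struct WriteStallTimer {
    bool armed = false;              // true while data is waiting to be written
    Clock::time_point stalled_since; // time of the last write progress

    // Call when data is added to the buffer for this direction.
    void on_data_buffered(Clock::time_point now) {
        if (!armed) {
            armed = true;
            stalled_since = now;
        }
    }

    // Call after every write attempt on this direction.
    void on_write(std::size_t bytes_written, bool still_buffered,
                  Clock::time_point now) {
        assert(armed || !still_buffered);  // buffered data must have armed the timer
        if (!still_buffered) {
            armed = false;                 // nothing left to time out on
            return;
        }
        if (bytes_written > 0) stalled_since = now;  // progress resets the clock
    }

    // True when the connection pair should be aborted.
    bool expired(Clock::time_point now) const {
        return armed && now - stalled_since >= kStallLimit;
    }
};
```

A periodic callback could call expired(Clock::now()) for each direction and abort the pair when it returns true.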

Fetching and building the source

Start by uploading the skeletal tcp-proxy build tree to your home directory on one of your EC2 instances (recall that prompt> denotes your own machine, while # denotes the EC2 instance). For this assignment we have prepared an image that you can use out of the box. The image is called ami-efd83886, and you can either use it or bundle a new image yourself, provided that ami-efd83886 is the ancestor.

prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 tcp-proxy.tar.gz root@ec2-75-101-184-233.compute-1.amazonaws.com:
prompt> ssh -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com
# tar --no-same-owner -xzf tcp-proxy.tar.gz
# cd tcp-proxy
# make
g++ -Wall -g -O2 -I/usr/local/include/sfslite -L/usr/local/lib/sfslite -o tcp-proxy tcp-proxy.cc -lm -lasync -ldmalloc -lresolv
g++ -Wall -g -O2 -I/usr/local/include/sfslite -L/usr/local/lib/sfslite -o test-tcpproxy test-tcpproxy.cc -lm -lasync -ldmalloc -lresolv
It is very important that you use the --no-same-owner flag for tar. You are acting as root on the EC2 machine, so without the flag tar will change the ownership of the extracted files (sadly propagating up the entire /root/tcp-proxy path) to the user ID of your local machine (the machine the scp command was initiated from), which does not necessarily exist on the EC2 instance (unless you fancy being root on your box). As a result, subsequent SSH connections will fail, since the /root home folder of user root is now owned by a rogue user ID.

Make sure that you save your work before terminating your instance, either by placing it in a bucket, using a version control system (CVS, SVN, Git, darcs, etc.), or simply fetching it back to your machine. To fetch it back to your machine, follow these steps:
# cd ~/tcp-proxy
# make dist
prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com:tcp-proxy/tcp-proxy.tar.gz .

Note that the code is built with dmalloc support for debugging (just in case you need it). Alternatively, you can develop, test and debug your proxy on your own Linux / UNIX machine (e.g. BSD, Darwin), provided that you have installed sfslite beforehand. Sfslite depends on the GNU Multiple Precision Arithmetic Library, which is typically bundled with distributions. For example, on Ubuntu 8.04 you can install it by typing apt-get install libgmp3-dev (for MacOSX instructions, contact the TA). The caveat is that you will not be able to test by connecting to the restricted URL http://fireless.cs.cornell.edu/courses/2009sp/cs6464/restricted/.

That's it! You've now built tcp-proxy. To test it, type, for example:
# ./tcp-proxy www.yahoo.com 80 1234
Now, in another window, run:
# telnet localhost 1234
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
# 
The message "Connected to localhost" says that your proxy accepted a TCP connection, but then immediately closed it, since the proxy is not fully implemented. You must finish implementing the proxy. For your convenience you have also been provided with a basic list implementation, found in the list.h file. This is in fact the list used throughout the Linux kernel, so it may take a while to get used to it. However, you are free to use any basic C/C++ library (the STL, for example), or you can design your own data structures.

Testing

You should test your proxy to make sure that it continues to forward data even when some connections aren't responding. Here's one test you should be able to pass.

First, run the proxy and point it at fireless.cs.cornell.edu's HTTP port:

# ./tcp-proxy fireless.cs.cornell.edu 80 1234
Now, in another window, use telnet to fetch /courses/2009sp/cs6464/big through the proxy:
# telnet localhost 1234
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /courses/2009sp/cs6464/big
Watch the data go by for a while, then interrupt the output by typing control-], after which telnet should stop and print telnet>. Now check that the proxy hasn't been hung because telnet isn't reading data; suspend your telnet by typing "z RETURN" and fetch something else:
telnet> z

Suspended
# telnet localhost 1234
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /courses/2009sp/cs6464/small
You got it!
Connection closed by foreign host.
If you see "You got it!", your program passes the test.

Now try to access the restricted page from your web browser with a URL like http://ec2-75-101-184-233.compute-1.amazonaws.com:1234/courses/2009sp/cs6464/restricted/. Again, make sure ec2-75-101-184-233.compute-1.amazonaws.com is your EC2 machine running your tcp-proxy, and that you have authorized access on port 1234 to it.

Next, lower the maximum number of allowed concurrent proxied connections to something like 2, and test by pointing your proxy to fireless.cs.cornell.edu, port 80, just like in the first test. Start by opening 3 telnet connections, but without issuing the HTTP GET. The third connection should not be accepted. Now issue GET /courses/2009sp/cs6464/small in one of your two connected telnet prompts. Once you have received the response from the server, your third connection should be accepted, because HTTP web servers terminate the connection in an orderly way to indicate that the end of file was reached.

Once your proxy passes some basic tests, you can test it with the automated program test-tcpproxy. This program is bundled inside the lab1 source tarball and is built along with tcp-proxy when you issue the make command. Assuming your proxy is in ./tcp-proxy, you can test it as follows:

# ./test-tcpproxy ./tcp-proxy
Single echo connection: passed
Two echo connections: passed
20 echo connections: passed
Bulk data, 20 connections: passed
Mix of blocked and normal: passed
One-way shutdown: passed
Early close: passed
Non-timeout of active client: passed
Timeout of lazy client: passed
# 
Your program should pass all phases of the tests, except for the last one (which you should interrupt by typing control-c). For extra credit, it should also pass the last test.


How/What to hand in

TCP proxy

You should submit two things:

First, use the script command to create a typescript file. When you run script, everything you type gets saved in a file called typescript. Press CTRL-D to finish the script. The typescript file should be included on CMS with the software distribution. For example:

# script
Script started, output file is typescript
# ./test-tcpproxy ./tcp-proxy
Single echo connection: passed
Two echo connections: passed
20 echo connections: passed
Bulk data, 20 connections: passed
Mix of blocked and normal: passed
One-way shutdown: passed
Early close: passed
Non-timeout of active client: passed
Timeout of lazy client: passed
# ^D Script done, output file is typescript
# 
Second, you should build the software distribution with the make dist command, as follows:
# make dist
rm -fr .DS_Store *.tar.gz *.ps *.pdf *.o *.dSYM *~ tcp-proxy test-tcpproxy
tar -czf /tmp/tcp-proxy.tar.gz ../sol --exclude=tcp-proxy.tar.gz --exclude=".svn" && mv /tmp/tcp-proxy.tar.gz . 
tar: Removing leading `../' from member names
# 
Make sure the typescript file is included in the tcp-proxy.tar.gz bundle created by make dist (by having it in the tcp-proxy directory ahead of time). To turn in your distribution, upload the tcp-proxy.tar.gz file on CMS.

If you have any problems with submission, please contact the TA.


Useful tips