CS 6464: Advanced Distributed Storage Systems
Lab2 - S3FS (Filesystem on top of S3)

Introduction

In this lab you will implement a file system on top of Amazon's S3 storage backend. You will use FUSE and the libs3 library:

----------------------     -------------
|                    |     |           |
|   App   s3fs--libs3|-----| Amazon S3 |
|     |    |         |  |  |           |
|--------------------|  |  -------------
|     |    |         |  |  -------------
|     Kernel         |  |  |           |
|  FUSE module       |  ---| Amazon S3 |
|                    |     |           |
----------------------     -------------

Your job is to implement the s3fs component---we have provided you with a skeleton implementation, as well as the put/get/remove operations over the lowlevel libs3 library interface. You will spend most of the time implementing the code missing from the s3fs.h and s3fs.c files; refer to the s3ops.h header file for the s3 operations that you may use. Currently, the s3fs contains a toy ``hello world'' filesystem implementation.

The assignment

The assignment is split into several parts; however, you may always work ahead!

Part A (Due: Monday February 16th 11:59pm and Thursday, February 19th at 11:59pm)

In this lab you will have to choose the format for file and directory data and meta-data. Meta-data includes per-file information (e.g. file length) --- refer to the struct stat data structure (man 2 stat) for more details. This information typically corresponds to an i-node in an on-disk UNIX file system. FUSE requires a file system to store such generic information for every file and directory (ideally you would store the entire contents of the struct stat data structure; in practice we do not care about the user ID of the owner, the device ID, the group ID, or the blocksize for filesystem I/O, so you are not required to maintain those values). You will have to maintain a mapping on S3 between i-node number (of type fuse_ino_t) and meta-information (which could be something as simple as a struct stat).
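One possible layout, shown purely as a sketch: serialize the struct stat for each i-node under an S3 key derived from its i-node number. The bucket name, the key naming scheme, and the s3_put_object / s3_get_object wrappers below are placeholders; substitute the bucket(s) you created and the actual put/get helpers from s3ops.h.

    #define FUSE_USE_VERSION 26
    #include <stdio.h>
    #include <sys/stat.h>
    #include <fuse_lowlevel.h>

    /* Placeholder wrappers: use the real put/get helpers from s3ops.h. */
    int s3_put_object(const char *bucket, const char *key, const void *buf, size_t len);
    int s3_get_object(const char *bucket, const char *key, void *buf, size_t maxlen);

    /* Store / fetch the struct stat of an i-node under the key "inode_<number>". */
    static int put_inode(fuse_ino_t ino, const struct stat *st)
    {
        char key[64];
        snprintf(key, sizeof(key), "inode_%llu", (unsigned long long) ino);
        return s3_put_object("my-metadata-bucket", key, st, sizeof(*st));
    }

    static int get_inode(fuse_ino_t ino, struct stat *st)
    {
        char key[64];
        snprintf(key, sizeof(key), "inode_%llu", (unsigned long long) ino);
        return s3_get_object("my-metadata-bucket", key, st, sizeof(*st));
    }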

Data comes in two flavors: file contents and directory contents. File contents are plain streams of bytes. A directory's content (similar to the VFS common file model) is also a stream of bytes, but it represents a list of (name, inode number) tuples. This allows you to handle operations like creating / removing a file / directory, looking up an entry in a directory, etc.
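As an illustration only (you are free to pick any encoding you can parse back), a directory object could be stored as an array of fixed-size records:

    #define FUSE_USE_VERSION 26
    #include <fuse_lowlevel.h>

    /* One record per directory entry; an all-zero name marks an unused slot.
       A newline-separated text format ("name inode\n") would work just as well. */
    struct s3fs_dirent {
        char       name[256];   /* NUL-terminated entry name       */
        fuse_ino_t ino;         /* i-node number the name maps to  */
    };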

Your job is to provide a design document (.txt ASCII file format) that clearly explains how the data and meta-data are stored in your S3 bucket(s) (you will be able to use at most two buckets, one for meta-data and one for data). Your document should specify under what keys you will store the data and meta-data, and what the precise format of the directory contents is. You will then implement the s3fs_format function, which resets your S3 bucket(s) to a clean, empty, mount-able file system.
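A minimal sketch of what s3fs_format could look like, assuming the put_inode / s3_put_object helpers and the bucket and key names from the sketches above (all of them placeholders), and leaving out the code that deletes any existing objects:

    #define FUSE_USE_VERSION 26
    #include <string.h>
    #include <time.h>
    #include <sys/stat.h>
    #include <fuse_lowlevel.h>

    #define S3FS_ROOT_INO 1   /* FUSE expects the root directory at i-node 1 */

    int s3fs_format(void)
    {
        struct stat st;

        /* First wipe both buckets using the remove/list helpers from s3ops.h
           (not shown), then write a fresh root directory i-node ... */
        memset(&st, 0, sizeof(st));
        st.st_ino   = S3FS_ROOT_INO;
        st.st_mode  = S_IFDIR | 0777;
        st.st_nlink = 2;
        st.st_atime = st.st_mtime = st.st_ctime = time(NULL);
        if (put_inode(S3FS_ROOT_INO, &st) != 0)
            return -1;

        /* ... and an empty directory listing for it (a zero-length object). */
        return s3_put_object("my-data-bucket", "dir_1", "", 0);
    }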

You will also implement the s3fs_init, s3fs_getattr, s3fs_lookup, s3fs_readdir, s3fs_create/s3fs_mknod and s3fs_mkdir operations, located in s3fs.h/s3fs.c.

s3fs_getattr: returns the meta-data of a file / directory.

s3fs_create/s3fs_mknod, s3fs_mkdir: add an entry to the relevant directory list, and possibly create empty data mappings. For s3fs_mkdir you may have to decide what to do with the ``.'' and ``..'' entries.

s3fs_lookup, s3fs_readdir: lookup must search the directory list, while readdir must return each entry from the directory list.
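To make the shape of these handlers concrete, here is a hedged sketch of s3fs_lookup; read_dirents() is a hypothetical helper that loads a directory listing (in the s3fs_dirent format sketched earlier) and get_inode() is the hypothetical meta-data helper from above, while fuse_reply_entry and fuse_reply_err are the real lowlevel reply functions.

    #define FUSE_USE_VERSION 26
    #include <errno.h>
    #include <string.h>
    #include <fuse_lowlevel.h>

    void s3fs_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
    {
        struct s3fs_dirent entries[128];
        int n = read_dirents(parent, entries, 128);   /* hypothetical helper */

        for (int i = 0; i < n; i++) {
            if (strcmp(entries[i].name, name) == 0) {
                struct fuse_entry_param e;
                memset(&e, 0, sizeof(e));
                e.ino = entries[i].ino;
                e.attr_timeout = e.entry_timeout = 1.0;
                get_inode(e.ino, &e.attr);            /* fill in the struct stat */
                fuse_reply_entry(req, &e);
                return;
            }
        }
        fuse_reply_err(req, ENOENT);                  /* every path must reply */
    }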

Tips on implementing a FUSE file system:

For these labs you will be interfacing with FUSE via its "lowlevel" API. We have provided you with startup code in the main() method of s3fs_main.cc that handles the lowlevel setup. You will, however, have to add a method handler for each new operation you'd like to support; you can do this by assigning method pointers to the appropriate fields in the s3fs_ops_init function that can be found in the s3fs.h file. We have already done this for the toy filesystem example, but you will need to add / replace handlers as the lab assignment progresses.
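The registration might look roughly like the following; this is only a sketch, since the exact shape of s3fs_ops_init in the skeleton (and which handlers the toy filesystem already wires up) may differ:

    #define FUSE_USE_VERSION 26
    #include <string.h>
    #include <fuse_lowlevel.h>

    /* Handlers such as s3fs_getattr are assumed to be declared in s3fs.h. */
    void s3fs_ops_init(struct fuse_lowlevel_ops *ops)
    {
        memset(ops, 0, sizeof(*ops));
        ops->init    = s3fs_init;
        ops->getattr = s3fs_getattr;
        ops->lookup  = s3fs_lookup;
        ops->readdir = s3fs_readdir;
        ops->mkdir   = s3fs_mkdir;
        ops->mknod   = s3fs_mknod;
        ops->create  = s3fs_create;
        /* Part B: ops->unlink, ops->rmdir, ops->setattr, ops->read,
           ops->write, ops->rename, ... */
    }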

You should study fuse_lowlevel.h for what these lowlevel API function definitions must be, and what functions are used in turn to reply back with information to the FUSE subsystem. Study the toy implementation to get a sense of how a full FUSE operation handler works, and how it communicates its results and errors back to FUSE. A lowlevel FUSE handler function is required to return a reply on all return code paths; failure to do so will result in your s3fs application hanging indefinitely. As a corollary, each function is required to issue replies of certain types (there can be more than one type). Additionally, in s3fs.h we point you to the relevant man page for each FUSE function.
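For example, a getattr handler obeying that rule might look like this (a sketch; get_inode() is the hypothetical S3-backed helper from above, while fuse_reply_attr and fuse_reply_err are the real reply functions):

    #define FUSE_USE_VERSION 26
    #include <errno.h>
    #include <sys/stat.h>
    #include <fuse_lowlevel.h>

    void s3fs_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
    {
        struct stat st;
        (void) fi;                         /* unused by getattr */

        if (get_inode(ino, &st) != 0) {
            fuse_reply_err(req, ENOENT);   /* the error path replies too */
            return;
        }
        fuse_reply_attr(req, &st, 1.0);    /* success: attributes + timeout */
    }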

You are free to choose any i-node number identifier you like for newly created files and directories, however, FUSE assumes that the i-node number for the root directory (i.e. /) is 1. Therefore you'll need to ensure that when S3FS mounts it is ready to export an empty directory stored under that i-node number. Moreover, each file and directory in the file system must have a unique i-node number. For example, you may keep a superblock holding a monotonically increasing next free i-node counter.
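One way to hand out unique numbers is a tiny superblock object, sketched below; get_superblock() and put_superblock() are placeholders for helpers you would write on top of the s3ops.h put/get calls (note that this simple scheme ignores concurrent mounts):

    #define FUSE_USE_VERSION 26
    #include <fuse_lowlevel.h>

    struct s3fs_super {
        fuse_ino_t next_free_ino;   /* starts at 2; i-node 1 is the root */
    };

    static fuse_ino_t alloc_ino(void)
    {
        struct s3fs_super sb;
        fuse_ino_t ino;

        get_superblock(&sb);        /* hypothetical: fetch the superblock object */
        ino = sb.next_free_ino++;
        put_superblock(&sb);        /* hypothetical: write it back               */
        return ino;
    }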

Sending back directory information for the s3fs_readdir operation is a bit tricky, so we've provided you with much of the necessary code in the dirbuf_add, reply_buf_limited, and s3fs_readdir methods (to be fair, the author of FUSE wrote that code in the ``hello-world'' example we provided you with). All that's left for you to do is to fetch the corresponding directory listing from your S3 backend storage and add it to the b data structure using dirbuf_add.
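The missing piece might look roughly like this, assuming the dirbuf helpers keep the signatures (and the p / size fields) from the FUSE hello-world example, and reusing the hypothetical read_dirents() helper from the lookup sketch:

    #define FUSE_USE_VERSION 26
    #include <stdlib.h>
    #include <string.h>
    #include <fuse_lowlevel.h>

    void s3fs_readdir(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
                      struct fuse_file_info *fi)
    {
        struct dirbuf b;                    /* provided by the skeleton */
        struct s3fs_dirent entries[128];
        int n, i;
        (void) fi;

        memset(&b, 0, sizeof(b));
        n = read_dirents(ino, entries, 128);          /* hypothetical helper */
        for (i = 0; i < n; i++)
            dirbuf_add(req, &b, entries[i].name, entries[i].ino);

        reply_buf_limited(req, b.p, b.size, off, size);
        free(b.p);
    }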

Beware: unless you are tracking the user ID of the owner, you should set the mode of all directories to (S_IFDIR | 0777) and of all files to (S_IFREG | 0666).

What to turn in

There are two deadlines for this part (part A) of the lab. First, you will have to turn in the design document (via CMS) by Monday, February 16th 11:59pm. The implementation of the FUSE handlers (init, getattr, create/mknod, mkdir, lookup, and readdir) are due Thursday, February 19th 11:59pm. Check the How/What to hand in subsection for instructions.

Part B (Due: Thursday, February 26th at 11:59pm)

Your job is to implement s3fs_unlink, s3fs_rmdir, s3fs_setattr, s3fs_read, s3fs_write and s3fs_rename functions in s3fs.h/s3fs.c. As always, refer to the FUSE lowlevel header file for all necessary function specifications.

s3fs_unlink, s3fs_rmdir: remove the entry from the parent directory list and clean up the associated data and meta-data.
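A sketch of unlink along those lines; remove_dirent(), remove_inode_object() and remove_data_object() are placeholder helpers built on the s3ops.h remove calls:

    #define FUSE_USE_VERSION 26
    #include <errno.h>
    #include <fuse_lowlevel.h>

    void s3fs_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
    {
        fuse_ino_t ino;

        /* Drop the (name, ino) entry from the parent listing; the helper
           also reports which i-node the name referred to. */
        if (remove_dirent(parent, name, &ino) != 0) {
            fuse_reply_err(req, ENOENT);
            return;
        }
        remove_inode_object(ino);   /* delete the meta-data object */
        remove_data_object(ino);    /* delete the data object      */
        fuse_reply_err(req, 0);     /* 0 signals success for replies with no payload */
    }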

s3fs_setattr: is issued by the OS to set one or more attributes. The to_set argument to your handler is a bitmask that informs the function which of the attributes should be set. There are in fact only three attributes you will have to handle, each corresponding to one of the following bitmask flags: FUSE_SET_ATTR_SIZE, FUSE_SET_ATTR_ATIME, and FUSE_SET_ATTR_MTIME (this Wikipedia page can give you a rough idea of when atime and ctime should be updated). Note that setting the size attribute of a file can correspond to truncating it completely to zero bytes, truncating it to a subset of its current length, or even padding bytes onto the file to make it bigger. Your system should handle all these cases correctly --- refer to the manpage of truncate(2) for details.
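A sketch of the size case (the time flags are handled analogously); get_inode() and put_inode() are the earlier hypothetical helpers and truncate_data() is a placeholder that shrinks or zero-pads the data object:

    #define FUSE_USE_VERSION 26
    #include <errno.h>
    #include <sys/stat.h>
    #include <fuse_lowlevel.h>

    void s3fs_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                      int to_set, struct fuse_file_info *fi)
    {
        struct stat st;
        (void) fi;

        if (get_inode(ino, &st) != 0) {
            fuse_reply_err(req, ENOENT);
            return;
        }
        if (to_set & FUSE_SET_ATTR_SIZE) {
            truncate_data(ino, attr->st_size);   /* shrink, or grow with '\0' padding */
            st.st_size = attr->st_size;
        }
        if (to_set & FUSE_SET_ATTR_ATIME)
            st.st_atime = attr->st_atime;
        if (to_set & FUSE_SET_ATTR_MTIME)
            st.st_mtime = attr->st_mtime;

        put_inode(ino, &st);
        fuse_reply_attr(req, &st, 1.0);   /* setattr replies with the new attributes */
    }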

s3fs_read/s3fs_write: should be pretty straightforward. A non-obvious situation can arise if the client tries to write at a file offset past the current end of the file. Linux expects the file system to return '\0's for any reads of unwritten bytes in these ``gaps.'' For details, consult the manpages for lseek(2) and write(2). Also make sure that s3fs_write returns the number of bytes written without accounting for the ``gaps'' created --- otherwise the FUSE library passes this incorrect value down to the kernel module, which may or may not handle it properly (for example, in stock 2.6.24 Linux the file system believes it has encountered a kernel BUG if such a value is encountered).
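A sketch of a whole-object write that honors both rules; get_file_size(), read_data() and write_data() are placeholders, and a real implementation would also update st_size in the i-node:

    #define FUSE_USE_VERSION 26
    #include <stdlib.h>
    #include <string.h>
    #include <fuse_lowlevel.h>

    void s3fs_write(fuse_req_t req, fuse_ino_t ino, const char *buf,
                    size_t size, off_t off, struct fuse_file_info *fi)
    {
        size_t old_size, new_size;
        char *data;
        (void) fi;

        old_size = get_file_size(ino);
        new_size = ((size_t) off + size > old_size) ? (size_t) off + size : old_size;

        data = calloc(new_size, 1);      /* zero-filled, so any gap reads back as '\0' */
        read_data(ino, data, old_size);  /* fetch the existing contents                */
        memcpy(data + off, buf, size);   /* splice in the caller's bytes               */
        write_data(ino, data, new_size); /* push the whole object back to S3           */
        free(data);

        fuse_reply_write(req, size);     /* report only the bytes written, not the gap */
    }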

s3fs_rename: should also be pretty straightforward, albeit more intricate due to all the special cases. Refer to the manpage of rename(2) for details.
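A sketch of the common case only (overwriting an existing target and the other corner cases from rename(2) are glossed over); remove_dirent() and add_dirent() are placeholder directory-list helpers:

    #define FUSE_USE_VERSION 26
    #include <errno.h>
    #include <fuse_lowlevel.h>

    void s3fs_rename(fuse_req_t req, fuse_ino_t parent, const char *name,
                     fuse_ino_t newparent, const char *newname)
    {
        fuse_ino_t ino;

        if (remove_dirent(parent, name, &ino) != 0) {
            fuse_reply_err(req, ENOENT);
            return;
        }
        /* If newname already exists in newparent, its old entry (and, for a
           file, its data / meta-data) must be removed first; not shown here. */
        add_dirent(newparent, newname, ino);
        fuse_reply_err(req, 0);
    }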

What to turn in

You will have to turn in the implementation of the FUSE handlers (unlink, rmdir, setattr, read, write and rename) by Thursday, February 26th 11:59pm. Check the How/What to hand in subsection for instructions.

Fetching and building the source

Start by uploading the skeletal S3FS build tree to your home directory on one of your EC2 instances (recall that prompt> denotes your own machine, while # denotes the EC2 instance). Note that you are expected to develop / test on the EC2 instances; if you choose another environment, you do so at your own risk. Moreover, you will suffer from relatively large latencies, since the S3 put/get HTTPS requests would travel over the WAN (libs3 timeouts have also been observed to be quite common over the WAN).

Open the s3fs_main.cc file and replace the current values of the inode_bucket and the data_bucket variables with appropriate values --- you will probably have to create those buckets ahead of time. Make sure that the S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables are also set ahead of time. For this assignment we have prepared an image that you can use out of the box. The image is called ami-efd83886, and you can either use it or bundle a new image yourself, provided that ami-efd83886 is its ancestor.

prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 s3fs.tar.gz root@ec2-75-101-184-233.compute-1.amazonaws.com:
prompt> ssh -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com
# tar --no-same-owner -xzf s3fs.tar.gz
# cd s3fs
# make
if [ `uname` = "Darwin" ]; then \
                g++ -Wall -g -O2  -I/usr/include/libxml2 -D_FILE_OFFSET_BITS=64 -I/usr/local/include/fuse -I/usr/include/fuse -D__FreeBSD__=10 -o s3fs s3fs_main.cc s3ops.c s3fs.c list.h s3fs.h s3ops.h /usr/lib/libs3.a -lfuse -lcurl -lssl -lcrypto -lz -L/usr/lib -lxml2 -lz -lpthread -licucore -lm -lpthread; \
        else \
                g++ -Wall -g -O2  -I/usr/include/libxml2 -D_FILE_OFFSET_BITS=64 -I/usr/local/include/fuse -I/usr/include/fuse -o s3fs s3fs_main.cc s3ops.c s3fs.c list.h s3fs.h s3ops.h -lfuse -lcurl -lssl -lcrypto -lz -L/usr/lib -lxml2 -lz -lpthread -licucore -lm -lpthread -ls3; \
        fi
It is very important that you use the --no-same-owner flag for tar: since you are acting as root on the EC2 machine, tar would otherwise change the ownership (sadly propagating up the entire /root/s3fs path) to the user ID from your local machine (the machine the scp command was initiated from). That user ID does not necessarily exist on the EC2 instance (unless you fancy being root on your box), so subsequent SSH connections will fail, since the /root home folder of the root user has just become owned by a rogue user ID.

Make sure that you save your work before terminating your instance, either by placing it in a bucket, using a version control system (CVS, SVN, Git, darcs, etc.), or simply fetching it back to your machine. To fetch it back, follow these steps:
# cd ~/s3fs
# make dist
prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com:s3fs/s3fs.tar.gz .

Now that you've built s3fs, you can test it by typing:
# mkdir /tmp/fuse
# ./s3fs /tmp/fuse -d
The -d flag is for debugging purposes; it will show you which of the FUSE filesystem functions are being called, and their exit status (e.g. error conditions). To test this toy file system, open an additional terminal and type:
# ls -l /tmp/fuse/
total 1
-r--r--r--  1 root  wheel  13 Dec 31  1969 hello
# cat /tmp/fuse/hello 
Hello World!
#
If your application hangs and stubbornly refuses to exit, you can typically force it to terminate by unmounting the initial mount point:
# umount /tmp/fuse/
That's it. Now you have to implement your file system.

Testing

For each stage you can test your filesystem by writing an automated tool that issues file system operations like mkdir, mknod, write, etc. against your FUSE mountpoint (/tmp/fuse in the example above). For example, you can use Python's os package to issue (indirectly) all the operations you've implemented. You can also simply issue shell commands like echo "test write and setattr" > /tmp/fuse/file.
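For instance, a small C program along the following lines exercises most of the handlers from both parts (a sketch; the mount point and file names are just examples, and each assert corresponds to one FUSE operation):

    /* Build with: gcc -Wall -o fstest fstest.c   (run it after mounting s3fs). */
    #include <assert.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dir  = "/tmp/fuse/testdir";
        const char *file = "/tmp/fuse/testdir/hello.txt";
        char buf[64] = {0};
        int fd;

        assert(mkdir(dir, 0777) == 0);                 /* s3fs_mkdir        */
        fd = open(file, O_CREAT | O_RDWR, 0666);       /* s3fs_create/mknod */
        assert(fd >= 0);
        assert(write(fd, "hello s3fs", 10) == 10);     /* s3fs_write        */
        assert(lseek(fd, 0, SEEK_SET) == 0);
        assert(read(fd, buf, sizeof(buf) - 1) == 10);  /* s3fs_read         */
        assert(strcmp(buf, "hello s3fs") == 0);
        close(fd);
        assert(unlink(file) == 0);                     /* s3fs_unlink       */
        assert(rmdir(dir) == 0);                       /* s3fs_rmdir        */
        printf("all checks passed\n");
        return 0;
    }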


How/What to hand in

For each of the parts of the lab, you should submit the current distribution of your s3fs program on CMS. You should build the software distribution with the make dist command, as follows:
# make dist
rm -fr .DS_Store *.tar.gz *.ps *.pdf *.o *.dSYM *~ s3fs core
tar -czf /tmp/s3fs.tar.gz ../s3fs --exclude=s3fs.tar.gz --exclude=".svn" && mv /tmp/s3fs.tar.gz .
tar: Removing leading `../' from member names
# 

If you have any problems with submission, please contact the TA.


Useful tips