In this lab you will implement a file system on top of Amazon's S3 storage backend, using the FUSE and libs3 libraries:
 ----------------------               -------------
|       |              |             |             |
|  App  |  s3fs--libs3 |-------------|  Amazon S3  |
|       |              |             |             |
|----------------------|              -------------
|                      |
|        Kernel        |
|      FUSE module     |
|                      |
 ----------------------
Your job is to implement the s3fs component---we have provided you with a skeleton implementation, as well as the put/get/remove operations over the lowlevel libs3 library interface. You will spend most of your time implementing the code missing from the s3fs.h and s3fs.c files; refer to the s3ops.h header file for the S3 operations that you may use. Currently, s3fs contains a toy ``hello world'' filesystem implementation.
The assignment is split into several parts; however, you may always work ahead!
In this lab you will have to choose the format for file and directory data and meta-data. Meta-data includes per-file information (e.g. file length) --- refer to the struct stat data structure (man 2 stat) for more details. This information typically corresponds to an i-node in an on-disk UNIX file system. FUSE requires a file system to store such generic information for every file and directory. Ideally you would store the entire contents of the struct stat data structure; in practice, we do not care about the owner's user ID, the device ID, the group ID, or the blocksize for filesystem I/O, so you are not required to maintain those values. You will have to maintain a mapping on S3 between i-node numbers (of type fuse_ino_t) and meta-information (which could be something as simple as a struct stat).
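For instance, here is one possible way to derive the S3 key for an i-node's meta-data --- purely a sketch, not a required layout; the key naming scheme and the inode_key helper are hypothetical:

#include <stdio.h>
#include <sys/stat.h>
#include <fuse_lowlevel.h>

/* Hypothetical layout: the meta-data of i-node N lives in the
 * meta-data bucket under the key "inode_N"; the value could simply
 * be the raw bytes of that i-node's struct stat. */
static void inode_key(fuse_ino_t ino, char *key, size_t keylen)
{
    snprintf(key, keylen, "inode_%llu", (unsigned long long)ino);
}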
Data comes in two flavors: file contents and directory contents. File contents are plain streams of bytes. A directory's contents (as in the VFS common file model) are also a stream of bytes, but they represent a list of (name, inode number) tuples. This allows you to handle operations like creating / removing a file / directory, looking up an entry in a directory, etc.
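For example, one possible (purely illustrative) directory format is a stream of fixed-size records, which makes the (name, inode number) tuples trivial to scan:

#include <fuse_lowlevel.h>

/* Hypothetical on-S3 directory format: the directory's byte stream
 * is an array of fixed-size records, one per entry.  A zero i-node
 * number could mark a deleted slot. */
struct s3_dirent {
    fuse_ino_t ino;        /* i-node number of this entry */
    char       name[256];  /* NUL-terminated entry name   */
};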
Your job is to provide a design document (.txt ASCII file format) that clearly explains how the data and meta-data are stored in your S3 bucket(s) (you will be able to use at most two buckets, one for meta-data and one for data). Your document should specify under what keys you will store the data and meta-data, and what the precise format of the directory contents is. You will then implement the s3fs_format function, which resets your S3 bucket(s) to a clean, empty, mount-able file system.
You will also implement the s3fs_init, s3fs_getattr, s3fs_lookup, s3fs_readdir, s3fs_create/s3fs_mknod and s3fs_mkdir operations, located in s3fs.h/s3fs.c.
s3fs_getattr: returns the meta-data of a file / directory.
s3fs_create/s3fs_mknod, s3fs_mkdir: add an entry to the relevant directory list, and possibly create empty data mappings. For s3fs_mkdir you may have to decide what to do with the ``.'' and ``..'' entries.
s3fs_lookup, s3fs_readdir: lookup must search the directory list, while readdir must return each entry from the directory list (a minimal lookup sketch follows this list).
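As promised, here is a minimal lookup-style scan, assuming the fixed-size s3_dirent records sketched earlier (your own format may well differ):

#include <string.h>
#include <fuse_lowlevel.h>

struct s3_dirent { fuse_ino_t ino; char name[256]; };

/* Linear scan of a directory's (already fetched) entry array;
 * returns the matching i-node number, or 0 if the name is absent. */
static fuse_ino_t dir_find(const struct s3_dirent *ents, size_t n,
                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (ents[i].ino != 0 && strcmp(ents[i].name, name) == 0)
            return ents[i].ino;
    return 0;
}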
For these labs you will be interfacing with FUSE via its "lowlevel" API. We have provided you with startup code in the main() method of s3fs_main.cc that handles the lowlevel setup. You will, however, have to add a method handler for each new operation you'd like to support; you can do this by assigning method pointers to the appropriate fields in the s3fs_ops_init function that can be found in the s3fs.h file. We have already done this for the toy filesystem example, but you will need to add / replace handlers as the lab assignment progresses.
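As a sketch (the skeleton's exact signature for s3fs_ops_init may differ --- check s3fs.h), registering handlers amounts to filling in fields of a struct fuse_lowlevel_ops:

#include <string.h>
#include <fuse_lowlevel.h>

/* Handlers implemented in s3fs.c; signatures follow fuse_lowlevel.h. */
void s3fs_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi);
void s3fs_lookup(fuse_req_t req, fuse_ino_t parent, const char *name);
void s3fs_readdir(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
                  struct fuse_file_info *fi);

void s3fs_ops_init(struct fuse_lowlevel_ops *ops)
{
    memset(ops, 0, sizeof(*ops));   /* unset operations fail with ENOSYS */
    ops->getattr = s3fs_getattr;
    ops->lookup  = s3fs_lookup;
    ops->readdir = s3fs_readdir;
    /* ...add or replace handlers as the lab progresses... */
}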
You should study fuse_lowlevel.h to learn what these lowlevel API function definitions must be, and what functions are used in turn to reply back with information to the FUSE subsystem. Study the toy implementation to get a sense of how a full FUSE operation handler works, and how it communicates its results and errors back to FUSE. A lowlevel FUSE handler function is required to return a reply on all return code paths; failure to do so will result in your s3fs application hanging indefinitely. As a corollary, each function is required to issue replies of certain types (there can be more than one valid type). Additionally, in s3fs.h we point you to the relevant man page for each FUSE function.
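For example, a getattr handler must end every code path in exactly one reply (the load_stat_from_s3 helper below is hypothetical --- it stands in for whatever you write to fetch an i-node's meta-data):

#include <errno.h>
#include <sys/stat.h>
#include <fuse_lowlevel.h>

int load_stat_from_s3(fuse_ino_t ino, struct stat *st);  /* hypothetical */

void s3fs_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
{
    struct stat st;
    (void)fi;
    if (load_stat_from_s3(ino, &st) < 0) {
        fuse_reply_err(req, ENOENT);   /* failure path still replies */
        return;
    }
    fuse_reply_attr(req, &st, 1.0);    /* success path replies once  */
}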
You are free to choose any i-node number identifier you like for newly created files and directories, however, FUSE assumes that the i-node number for the root directory (i.e. /) is 1. Therefore you'll need to ensure that when S3FS mounts it is ready to export an empty directory stored under that i-node number. Moreover, each file and directory in the file system must have a unique i-node number. For example, you may keep a superblock holding a monotonically increasing next free i-node counter.
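One way to do this (again only a sketch; the key name is an assumption) is a tiny superblock object:

#include <fuse_lowlevel.h>

/* Hypothetical superblock, stored under a well-known key (say,
 * "superblock") in the meta-data bucket.  FUSE_ROOT_ID (1) is
 * reserved for the root directory, so allocation starts above it. */
struct s3_superblock {
    fuse_ino_t next_ino;   /* next free i-node number, e.g. starts at 2 */
};

static fuse_ino_t alloc_ino(struct s3_superblock *sb)
{
    return sb->next_ino++; /* remember to write the superblock back to S3 */
}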
Sending back directory information for the s3fs_readdir operation is a bit tricky, so we've provided you with much of the necessary code in the dirbuf_add, reply_buf_limited, and s3fs_readdir methods (to be fair, the author of FUSE wrote that code in his ``hello world'' example, which we provided you with). All that's left for you to do is to get the corresponding directory listing from your S3 backend storage and add it to the b data structure using dirbuf_add.
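Putting this together, a readdir body might look roughly like the following; the dirbuf and reply_buf_limited shapes are assumed to match the hello-world example, and the S3 fetch itself is elided:

#include <stdlib.h>
#include <string.h>
#include <fuse_lowlevel.h>

/* Provided by the skeleton (shapes assumed from the hello example). */
struct dirbuf { char *p; size_t size; };
void dirbuf_add(fuse_req_t req, struct dirbuf *b, const char *name,
                fuse_ino_t ino);
int reply_buf_limited(fuse_req_t req, const char *buf, size_t bufsize,
                      off_t off, size_t maxsize);

struct s3_dirent { fuse_ino_t ino; char name[256]; };

void s3fs_readdir(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
                  struct fuse_file_info *fi)
{
    struct dirbuf b;
    struct s3_dirent *ents = NULL;
    size_t n = 0;
    (void)fi;
    /* ...fetch the directory listing for `ino` from S3 into ents/n;
     * if you chose to store ``.'' and ``..'' they appear here too... */
    memset(&b, 0, sizeof(b));
    for (size_t i = 0; i < n; i++)
        dirbuf_add(req, &b, ents[i].name, ents[i].ino);
    reply_buf_limited(req, b.p, b.size, off, size);
    free(b.p);
    free(ents);
}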
Beware: unless you are maintaining the owner's user ID, you should set the mode of all directories to (S_IFDIR | 0777) and of all files to (S_IFREG | 0666).
There are two deadlines for this part (part A) of the lab. First, you will have to turn in the design document (via CMS) by Monday, February 16th, 11:59pm. The implementation of the FUSE handlers (init, getattr, create/mknod, mkdir, lookup, and readdir) is due Thursday, February 19th, 11:59pm. Check the How/What to hand in subsection for instructions.
Your job is to implement the s3fs_unlink, s3fs_rmdir, s3fs_setattr, s3fs_read, s3fs_write and s3fs_rename functions in s3fs.h/s3fs.c. As always, refer to the FUSE lowlevel header file for all necessary function specifications.
s3fs_unlink, s3fs_rmdir: remove the entry from the parent directory list and clean up the associated data and meta-data.
s3fs_setattr: is issued by the OS to set one or more attributes. The to_set argument to your handler is a bitmask that tells the function which of the attributes should be set. There are in fact only three attributes you will have to handle, each corresponding to one of the following bitmask flags: FUSE_SET_ATTR_SIZE, FUSE_SET_ATTR_ATIME, and FUSE_SET_ATTR_MTIME (this Wikipedia page can give you a rough idea of when atime and ctime should be updated). Note that setting the size attribute of a file can correspond to truncating it completely to zero bytes, truncating it to a subset of its current length, or even padding bytes onto the file to make it bigger. Your system should handle all these cases correctly --- refer to the manpage of truncate(2) for details. A minimal setattr sketch follows this list.
s3fs_read/s3fs_write: should be pretty straightforward. A non-obvious situation can arise if the client tries to write at a file offset past the current end of the file. Linux expects the file system to return '\0's for any reads of unwritten bytes in these ``gaps.'' For details, consult the manpages for lseek(2) and write(2). Also make sure that s3fs_write returns the number of bytes written without accounting for the ``gaps'' created --- otherwise the FUSE library passes this incorrect value down to the kernel module, which may or may not handle it properly (for example, in stock 2.6.24 Linux the file system code believes it has encountered a kernel BUG if such a value is encountered). A read sketch follows this list.
s3fs_rename: should also be pretty straightforward, albeit more intricate due to all the special cases. Refer to the manpage of rename(2) for details.
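Here is the setattr sketch promised above. It only shows the dispatch on the to_set bitmask; loading and storing the meta-data, and resizing the file data itself, are elided and depend on your layout:

#include <fuse_lowlevel.h>
#include <sys/stat.h>

void s3fs_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
                  int to_set, struct fuse_file_info *fi)
{
    struct stat st;
    (void)fi;
    /* ...load the current struct stat for `ino` from S3 into st... */
    if (to_set & FUSE_SET_ATTR_SIZE)
        st.st_size = attr->st_size;   /* truncate or pad the data too */
    if (to_set & FUSE_SET_ATTR_ATIME)
        st.st_atime = attr->st_atime;
    if (to_set & FUSE_SET_ATTR_MTIME)
        st.st_mtime = attr->st_mtime;
    /* ...write st (and any resized file data) back to S3... */
    fuse_reply_attr(req, &st, 1.0);
}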
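And the read sketch: pre-zeroing the reply buffer means any unwritten bytes in the ``gaps'' automatically read as '\0' (the S3 fetch and the clamp at end-of-file are elided):

#include <errno.h>
#include <stdlib.h>
#include <fuse_lowlevel.h>

void s3fs_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t off,
               struct fuse_file_info *fi)
{
    char *buf = calloc(1, size);   /* zero-filled: gaps come for free */
    (void)fi;
    if (!buf) {
        fuse_reply_err(req, ENOMEM);
        return;
    }
    /* ...copy the stored bytes that overlap [off, off+size) into buf,
     * clamping the reply length at the file's current st_size... */
    fuse_reply_buf(req, buf, size);
    free(buf);
}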
You will have to turn in the implementation of the FUSE handlers (setattr, read, write and rename) by Thursday, February 26th 11:59pm. Check the How/What to hand in subsection for instructions.
Start by uploading the skeletal S3FS build tree to your home directory on one of your EC2 instances (recall that prompt> denotes your own machine, while # denotes the EC2 instance). Note that you are expected to develop / test on the EC2 instances; if you choose another environment, you do so at your own risk. Moreover, you will suffer from relatively large latencies, since the S3 put/get HTTPS requests would travel over the WAN (libs3 timeouts have also been observed to be quite common over the WAN).
Open the s3fs_main.cc file and replace the current values of the inode_bucket and data_bucket variables with appropriate values --- you will probably have to create those buckets ahead of time. Make sure that the S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables are also set ahead of time. For this assignment we have prepared an image that you can use out of the box. The image is called ami-efd83886, and you can either use it or bundle a new image yourself, provided that ami-efd83886 is its ancestor.
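For example, on the EC2 instance (with placeholder values --- substitute your own AWS credentials):

# export S3_ACCESS_KEY_ID=<your access key id>
# export S3_SECRET_ACCESS_KEY=<your secret access key>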
prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 s3fs.tar.gz root@ec2-75-101-184-233.compute-1.amazonaws.com:
prompt> ssh -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com
# tar --no-same-owner -xzf s3fs.tar.gz
# cd s3fs
# make
if [ `uname` = "Darwin" ]; then \
    g++ -Wall -g -O2 -I/usr/include/libxml2 -D_FILE_OFFSET_BITS=64 -I/usr/local/include/fuse -I/usr/include/fuse -D__FreeBSD__=10 -o s3fs s3fs_main.cc s3ops.c s3fs.c list.h s3fs.h s3ops.h /usr/lib/libs3.a -lfuse -lcurl -lssl -lcrypto -lz -L/usr/lib -lxml2 -lz -lpthread -licucore -lm -lpthread; \
else \
    g++ -Wall -g -O2 -I/usr/include/libxml2 -D_FILE_OFFSET_BITS=64 -I/usr/local/include/fuse -I/usr/include/fuse -o s3fs s3fs_main.cc s3ops.c s3fs.c list.h s3fs.h s3ops.h -lfuse -lcurl -lssl -lcrypto -lz -L/usr/lib -lxml2 -lz -lpthread -licucore -lm -lpthread -ls3; \
fi

It is very important that you use the --no-same-owner flag for tar. Since you are acting as root on the EC2 machine, tar would otherwise change the ownership (sadly propagating up the entire /root/s3fs path) to the user ID from your local machine (the machine the scp command was initiated from); this user ID does not necessarily exist on the EC2 instance (unless you fancy being root on your box). As a result, subsequent SSH connections will fail, since the /root home folder of user root has just been taken over by a rogue user ID.

To create a tarball of your tree and copy it back to your own machine:

# cd ~/s3fs
# make dist
prompt> scp -i ~/.aws/id-rsa-kp-tm255-lab1 root@ec2-75-101-184-233.compute-1.amazonaws.com:s3fs/s3fs.tar.gz .
Once you have built the toy s3fs, you can test it by typing:
# mkdir /tmp/fuse
# ./s3fs /tmp/fuse -d

The -d flag is for debugging purposes; it will show you which of the FUSE filesystem functions are being called, as well as the exit status (e.g. error conditions).
To test this toy file system, open an additional terminal and type:
# ls -l /tmp/fuse/
total 1
-r--r--r--  1 root  wheel  13 Dec 31  1969 hello
# cat /tmp/fuse/hello
Hello World!
#

If your application hangs and stubbornly refuses to exit, you can typically force it to terminate by unmounting the initial mount point:
# umount /tmp/fuse/
That's it. Now you have to implement your file system.
For each stage you can test your filesystem by writing an automated tool that issues file system operations like mkdir, mknod, write, etc., pointing them at your FUSE mountpoint (/tmp/fuse in the example above). For example, you can use Python's os package to issue (indirectly) all the operations you've implemented. You can also simply issue shell commands like echo "test write and setattr" > /tmp/fuse/file.
When you are ready to hand in, create the submission tarball by typing:

# make dist
rm -fr .DS_Store *.tar.gz *.ps *.pdf *.o *.dSYM *~ s3fs core
tar -czf /tmp/s3fs.tar.gz ../s3fs --exclude=s3fs.tar.gz --exclude=".svn" && mv /tmp/s3fs.tar.gz .
tar: Removing leading `../' from member names
#
If you have any problems with submission, please contact the TA.
If your s3fs crashes, you may have to raise the core file size limit with the ulimit command before you will see the core files being created (e.g. ulimit -c unlimited). You can examine the core files with gdb in order to learn what went wrong --- this is an invaluable tool. You can start by typing gdb program program.core, and then typing the gdb command bt or backtrace. GDB will in turn print a trace pointing to where your program has crashed.