Serving Web Pages from AFS

Introduction

It's often convenient to serve web pages and web content out of AFS rather than storing it all on the same system as the web server. By putting the pages in AFS, they can be easily updated by anyone using an AFS client, they are automatically backed up, and they gain the additional security benefits of AFS. All of the central campus web servers are configured this way.

Putting web pages in AFS poses a few challenges, however. AFS is an authenticated file system, so the web server may need to authenticate with Kerberos before it can read (and therefore serve) the web pages and content. This page outlines the problems and several possible solutions.

Of course, the easiest case is where all of your web content is world-visible and doesn't require any protection. In this case, just make sure that system:anyuser can read all of the directories you're making available via the web and configure your web server to serve content out of an AFS path just like you'd configure it to serve content out of any other path. But the more common case is when the content should be protected so that only the web server and the maintainers can read it. If you're in that situation, read on.
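As a rough sketch of the world-visible case, the helper below grants system:anyuser read and lookup rights (rl) on a directory and everything beneath it. AFS ACLs are set per directory, so each existing subdirectory is handled explicitly. fs setacl is the standard AFS command; the function name and the example path are made up for illustration.

```shell
# Grant anonymous users read (r) and lookup (l) rights on a directory tree.
# AFS ACLs apply per directory, so every existing subdirectory is visited.
make_world_readable() {
    find "$1" -type d -exec fs setacl -dir {} -acl system:anyuser rl \;
}

# Example (hypothetical path):
#   make_world_readable /afs/example.org/www/public
#   fs listacl /afs/example.org/www/public
```

Running fs listacl afterwards is an easy way to confirm that the ACL ended up as intended.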

AFS PAGs

In order to understand how to authenticate your web server to AFS, you have to understand how AFS does authentication. The internal details can be ignored when you're using AFS as a normal user, but are very important for understanding AFS authentication in long-running daemon processes like web servers.

AFS authentication tokens (the credentials that tell the AFS server who you are) are kept in the kernel and are associated with two things: a Unix UID and a PAG (process authentication group). A PAG is basically an additional high-numbered Unix group that the user is given temporarily. It can be thought of as something that is given to a set of processes and that is inherited (so if a process inside a PAG starts a new process, that process is also inside the same PAG). Normally, PAGs are created automatically by login whenever a user logs on to the system. A new PAG can also be created on demand by running a process under /usr/bin/pagsh (possibly /usr/afsws/bin/pagsh on Solaris), for example by replacing /bin/sh with /usr/bin/pagsh in the first line of a shell script. A process can only belong to one PAG at a time.

Any tokens obtained within a particular PAG are available to all processes contained in that PAG and are only usable by processes inside that PAG. This is why, when you log on multiple times to a system, each one of your logins will have a separate AFS token and reauthenticating in one login won't affect the other logins. Even though they're all under the same Unix user ID, each login is in a separate PAG.

Processes can also exist outside of any PAG. This is true of any processes started by the system on boot, such as daemons started by system init scripts. Any processes outside of PAGs cannot use a token associated with a PAG, and if they obtain AFS tokens, those tokens will only be available to other processes that also are not in a PAG and that are running as the same Unix user ID.

To summarize, AFS tokens are associated with either a PAG or with the lack of a PAG and a particular Unix user ID. Processes inside a PAG cannot share tokens with processes outside of PAGs, and vice versa. Processes started by the system at boot will normally be running outside of any PAG, and processes started by users will always be inside a PAG since a new PAG is created on login.
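A quick way to see this isolation on a machine with an AFS client is to compare the tokens visible in the current shell with those visible inside a fresh PAG. The sketch below assumes that pagsh passes its arguments through to the underlying shell (so that -c works) and that the standard tokens command is on the PATH; the function name is made up for illustration.

```shell
PAGSH=/usr/bin/pagsh    # possibly /usr/afsws/bin/pagsh on Solaris

# Run a single command inside a newly created PAG.
run_in_new_pag() {
    "$PAGSH" -c "$1"
}

# Example:
#   tokens                    # tokens held by the current login's PAG
#   run_in_new_pag tokens     # new PAG: no tokens until it authenticates
```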

Authenticating Daemons to AFS

First, any server that needs authenticated access to AFS will need its own service principal, essentially a Kerberos identity and password for a machine or service rather than for a user. Generally, this Kerberos identity is of the form service/service-name (examples include service/coursework, service/stanfordwho, or service/netdb). For more information, see An Introduction to Keytabs. You will need to request a service principal for your application following the instructions on that page if it needs to authenticate to AFS.

Once a keytab for that service principal has been downloaded, the basic command to obtain AFS credentials is:

k5start -t -U -f /path/to/keytab -l 600 -K 30

This obtains Kerberos credentials and AFS tokens for whatever service principal has its key stored in /path/to/keytab. k5start then keeps running, waking up every 30 minutes (the -K 30 option) to check the credentials and obtaining new ones when necessary.
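For reference, the same invocation can be wrapped in a small function with each option annotated. The descriptions follow the k5start manual as best understood here; check the man page for the installed version, particularly for the units that -l expects. The function name is illustrative only.

```shell
# Obtain and maintain Kerberos tickets and AFS tokens from a keytab.
start_afs_credentials() {
    keytab=$1
    # -t         obtain an AFS token (by running aklog) after getting tickets
    # -U         use the principal found in the keytab
    # -f FILE    authenticate with the keys stored in this keytab
    # -l 600     requested ticket lifetime
    # -K 30      wake up every 30 minutes and get new tickets when needed
    k5start -t -U -f "$keytab" -l 600 -K 30
}

# Example:
#   start_afs_credentials /path/to/keytab
```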

k5start can be obtained from its distribution page.

Now, there are two basic strategies for authenticating a web server (or any other long-running daemon) to AFS: the first runs the web server outside of any PAG, and the second runs it inside a PAG.

Strategy 1 (outside any PAG): Run the above k5start command out of an init script or some similar process that runs at boot, and also only run your web server through similar means. This puts k5start and the web server both outside of any PAG, so the web server can use the PAG-free AFS token that k5start obtains.

Advantage: k5start and the web server process are independent. You can run k5start directly out of inittab so that it will be restarted if it dies for any reason, or run it out of /service if you're using svscan from daemontools.
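For the daemontools case, the service directory could be set up with something like the following sketch. The function name and the directories in the example are hypothetical. The generated run script uses no backgrounding option, on the assumption that supervise needs k5start to stay in the foreground in order to notice when it exits.

```shell
# Create a daemontools service directory whose run script starts k5start.
install_k5start_service() {
    svc_dir=$1
    keytab=$2
    mkdir -p "$svc_dir" || return 1
    cat > "$svc_dir/run" <<EOF
#!/bin/sh
# Started and restarted by supervise; k5start stays in the foreground.
exec k5start -t -U -f "$keytab" -l 600 -K 30
EOF
    chmod 755 "$svc_dir/run"
}

# Example (hypothetical paths): build the directory elsewhere, then link it
# into /service so svscan only ever sees a complete service.
#   install_k5start_service /etc/k5start-web /path/to/keytab
#   ln -s /etc/k5start-web /service/k5start-web
```

If k5start is not on the PATH that supervise provides, the run script would need the full path to the binary.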

Disadvantage: If anyone ever logs on to the machine and starts the web server by hand, the web server will lose its credentials. This is because the logged-in user will be inside a PAG, so when they run the web server, the web server will be inside a PAG, and won't be able to get at k5start's token. Instead, if you need to restart the web server, you need to do it via an at job or some other mechanism that causes the process to be started by some other system process (like atd) that is already running outside of a PAG.

Strategy 2 (inside a PAG): Modify your web server init script to use /usr/bin/pagsh instead of /bin/sh as the shell and then run k5start inside that init script right before you start the web server. You will want to set the environment variable KRB5CCNAME to the path to a ticket cache specific to that application so that k5start doesn't overwrite root's ticket cache. This puts the web server and k5start inside a PAG (that's the function of pagsh; it creates a PAG and then passes the rest of the script to the regular shell), and as long as the web server is always started via that init script, they'll always be in the same PAG. To the stop section of the init script, add a command to kill the running k5start.

Advantage: You don't have to fiddle with weird ways of starting the web server and can just use the init script as normal. Both processes run inside a PAG, which is a superior configuration from an AFS security standpoint (although not by enough to matter greatly). In addition, if a person logs in and starts the server by hand through the init script, the web server won't inherit their tokens and end up running with the AFS privileges of that (possibly privileged) user, since the init script creates a new PAG.

Disadvantage: You can't run k5start from inittab or svscan, so nothing monitors it. If k5start dies for any reason (some temporary weirdness in Kerberos, network problems, or the like), it won't be restarted and the web server's authentication will eventually expire.
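A minimal sketch of the PAG-based setup follows, written as two functions to call from the start and stop sections of an existing init script whose first line has been changed to use pagsh. It assumes a k5start version that supports -b (detach into the background) and -p (write its PID to a file). The ticket cache path, PID file path, and the apachectl commands are placeholders for whatever your service actually uses.

```shell
#!/usr/bin/pagsh
# The pagsh interpreter line puts everything started below into a new PAG.

KEYTAB=/path/to/keytab
K5START_PIDFILE=/var/run/k5start-web.pid    # placeholder path

start_web_with_afs() {
    # Private ticket cache so root's own cache is not overwritten.
    KRB5CCNAME=/tmp/krb5cc_web              # placeholder path
    export KRB5CCNAME
    # -b detaches k5start; -p records its PID for the stop section.
    k5start -b -p "$K5START_PIDFILE" -t -U -f "$KEYTAB" -l 600 -K 30 || return 1
    apachectl start                         # placeholder web server command
}

stop_web_with_afs() {
    apachectl stop                          # placeholder web server command
    # Kill the running k5start so it does not outlive the web server.
    if [ -f "$K5START_PIDFILE" ]; then
        kill "$(cat "$K5START_PIDFILE")"
        rm -f "$K5START_PIDFILE"
    fi
}
```

Because k5start and the web server are both started from the same pagsh-run script, they share one PAG and therefore one set of tokens.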

Either of these methods will work, and both methods are currently in use for different services. We recommend the second solution slightly over the first, but it may also be worthwhile to have some sort of monitoring in place to make sure that k5start hasn't died. With the first solution, everyone maintaining that server needs to have a pretty strong understanding of PAGs and tokens and how AFS authentication works, or the web server will end up running with the tokens of some random user, apparently working (with more privileges than it should have), and then mysteriously failing a day later when those tokens expire.
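One simple form such monitoring could take is a periodic check, for example from cron, that the PID recorded by k5start still refers to a live process. The sketch assumes k5start was started with an option that writes a PID file (-p in versions that support it); the function name and the path in the example are placeholders.

```shell
# Succeed if the PID recorded in the given file belongs to a live process.
k5start_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

# Example check (placeholder path):
#   k5start_alive /var/run/k5start-web.pid || echo "k5start is not running" >&2
```

A PID check like this can be fooled if the PID has been reused by another process, so it is a first line of defense rather than proof that the tokens are still valid.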