Tom Offermann

Running Minecraft on Amazon EC2

Before we're ready to run Minecraft in the cloud, we first need to chat about environment variables, init.d scripts, and automatic backups to permanent storage. Part 2 of a three-part series.

We concluded our Introduction to Minecloud having decided to break the project into two parts: 1) an on-demand, ephemeral Minecraft server; and, 2) a web application to start and stop the server.

Today, I'm going to discuss the first part—the server. (Code can be found here.)

Where to Host?

Since Amazon Web Services has been the leading pioneer in the cloud computing arena, they were the obvious choice for hosting the server. I'm sure there are plenty of other worthy vendors, but hey, if it's good enough for Netflix, Dropbox, etc., then it's good enough for me.

In order to run a server on Amazon's EC2 service, you launch a virtual machine (an "instance") from a disk image that Amazon calls an Amazon Machine Image (AMI). You have a ton of flexibility in the type of virtual machine you can run. If you want to run Linux, you can run any flavor you choose. If Windows is your thing, you can do that, too.

One strategy for EC2 users is to launch an AMI with a stock install of their preferred operating system, and then install and configure software after they launch it. Obviously, this adds time until a server is ready to use. However, if you're running the same type of server over and over, it makes more sense to use a customized AMI that is already pre-configured with your choice of software, so that it is ready to go as soon as it is launched. You can choose from AMIs built by others that are available in their AMI Marketplace, or you can create your own custom AMI.

Not surprisingly, a search for Minecraft in the Marketplace turned up nothing, so I had to create my own.

What Needs to Run?

What do I need to install to get a fully operational Minecraft server that can be launched by the Minecloud web application? I started with an Ubuntu 12.04 base image and added the following pieces:

Last requirement: All of the above had to be installed and configured to run automatically upon startup.

How to Build?

How do you create a custom AMI? The short answer is you take a snapshot of an already running EC2 instance, and that becomes your new, bootable AMI. That means that our AMI creation process consists of three main tasks: launch an Ubuntu 12.04 server, install and configure software, and then take a snapshot. We could do the installation and configuration by hand, but since this is a process that we'll need to do on a fairly regular basis (to keep up with both Ubuntu updates as well as Minecloud updates), it's best to automate it.

The tool I chose for automation is Puppet. There are plenty of other configuration management options (Chef, Salt, and Ansible, to name but a few), but the official Learning Puppet Tutorial for beginners is fantastic, so I went with that. To use Puppet, you write "manifests" in a custom domain-specific language describing the state you want the server to reach. You can install packages, change configuration files, start and stop services, and much more.

While the Puppet manifests I wrote for Minecloud may not be completely idiomatic (I'm still a Puppet beginner, after all!), I loved having a complete set of instructions for building a server to my exact specifications in a reproducible way. Super useful.

Of course, Puppet doesn't handle the launching of an EC2 instance, nor the snapshotting and termination of it, so to that end, I included a build-ami.py script that handles all the steps necessary to create a custom Minecloud AMI. It launches an Ubuntu 12.04 instance, uploads the Puppet manifests, applies the manifests, takes the snapshot, and finally shuts down the instance.
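In outline, the flow looks roughly like this (a simplified sketch rather than the actual build-ami.py; the AMI ID, key pair, security group, and paths are all placeholders):

import time

import boto.ec2
from fabric.api import env, put, sudo

conn = boto.ec2.connect_to_region('us-east-1')

# 1. Launch a stock Ubuntu 12.04 instance and wait for it to come up.
reservation = conn.run_instances('ami-xxxxxxxx', instance_type='m1.small',
                                 key_name='minecloud', security_groups=['default'])
instance = reservation.instances[0]
while instance.state != 'running':
    time.sleep(10)
    instance.update()

# 2. Upload the Puppet manifests and apply them over SSH.
#    (A real script also has to wait for SSH to start accepting connections.)
env.host_string = 'ubuntu@%s' % instance.public_dns_name
env.key_filename = '~/.ssh/minecloud.pem'
put('manifests', '/tmp/manifests')
sudo('puppet apply /tmp/manifests/site.pp')

# 3. Snapshot the configured instance; the result is a new, bootable AMI.
ami_id = conn.create_image(instance.id, 'minecloud-ami')

# 4. Terminate the build instance (after waiting for the new AMI to become available).
conn.terminate_instances([instance.id])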

It's designed to be quite simple to use, but unfortunately, it requires Python, Boto, and Fabric to be installed (preferably in a virtualenv) on your local machine, which can be a bit daunting for a non-Python developer. A better option that should be ready in the near future is Mitchell Hashimoto's latest project, Packer, which builds AMIs (and images for other platforms) from a single binary that is much simpler to install than a complete Python virtual environment. Packer doesn't have a Puppet provisioner just yet, but one is coming soon, and once it lands, I plan to switch from my build-ami.py script to a Packer-based solution.

The Lifecycle of a Minecloud Server

Now that we've gone through the process of building an AMI, what happens when you run it? Let's examine all the tasks that a Minecloud server performs from bootup to shutdown.

Bootup

Right after launch, the Minecloud server completes a number of tasks before it is ready to accept Minecraft players. (The names of the init scripts are in parentheses.)

Running

Once the bootup process completes and Minecraft is up and running, the following ongoing maintenance tasks are run as cron jobs. (Names of cron jobs in parentheses.)

Shutdown

Shutdown is the simplest of all.

Implementation Notes

While that's a pretty good overview of how the Minecloud server works, it's possible that it's not quite detailed enough for some. Well, if that describes your thinking, you're in luck! After all, I've got a reputation for exhaustive blog posts to maintain, so let's dive into some of the details of the implementation.

Setting Environment Variables...Always

When a Minecloud EC2 instance is running, it needs access to at least three pieces of configuration data: the AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY, and the name of the S3 bucket. While it's possible to hard code these values into the AMI during the AMI creation phase, that didn't seem like the most flexible solution. Instead, following the lead of both the Twelve-Factor App methodology and Heroku's extensive use of configuration variables, I decided that all scripts on the Minecloud EC2 instance would read their configuration from the environment.
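Reading that configuration is then trivial in any language. Here's the kind of lookup a Minecloud helper script might perform (a hypothetical sketch, not actual Minecloud code; the variable names match the ones above):

import os

# All configuration comes from the environment, never from hard-coded values.
AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
MSM_S3_BUCKET = os.environ['MSM_S3_BUCKET']

# Boto, conveniently, picks up the two AWS_* variables from the
# environment on its own, so they rarely need to be passed explicitly.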

Once I made that decision, the next step was to figure out how to set those variables system-wide.

Some googling showed that there was a lot of disagreement on how to best set environment variables on a system-wide basis in Linux so that they were available for all users. Some recommended editing /etc/profile, while others suggested /etc/bash.bashrc. Red Hat-based distributions favored adding custom .sh scripts under /etc/profile.d. But, for Ubuntu systems, the approved solution appears to be to add them to an /etc/environment file.

This worked great...except for init.d scripts.

Many of the scripts that I run as part of the booting up process rely on the variables I set in /etc/environment, but unfortunately, I discovered that the init.d process runs with a very restricted environment and doesn't source that file. Only the LANG and TERM variables are present in the init environment.

For that reason, you'll see this little snippet in my init scripts that require access to /etc/environment variables:

# Include environment variables from /etc/environment to
# set AWS variables and MSM_S3_BUCKET.
if [ -f /etc/environment ]; then
    while read LINE
    do
        export $LINE
    done < /etc/environment
fi

I was anticipating having to do a similar workaround for cron jobs, since I came across many pleas for help in setting environment variables for cron, but in Ubuntu, it was a non-event, since "cron supports the pam_env module, and loads the environment specified by /etc/environment."

Don't Dilly-Dally During Shutdown

Using init scripts to handle all the boot up tasks keeps things simple. To start playing, you boot up a server and it automatically runs all the init.d scripts to set everything up and then notifies you when it's ready to play.

Originally, I thought that shutting down could be handled the same way. Since I download the Minecraft game data files from S3 by running an init.d script during boot up, I figured I could upload those files back to S3 from the same init.d script during shutdown. And, it turns out I could...as long as it didn't take too long.

Once I started testing the shutdown process with realistic game data, I kept experiencing truncated data and incomplete backups on S3. Which was weird, since I never saw that when I was running my tiny test world. After scratching my head about this for a while, I posted a question on Stack Overflow, and AWS guru Eric Hammond helped set me straight: the size of the data I was backing up was the problem, because it prolonged the shutdown process beyond what Amazon allows.

Here's an excerpt from his helpful answer:

When you stop or terminate an EC2 instance, Amazon sends a soft shutdown request to the operating system to let it wrap up in a clean, safe manner. If the system does not indicate it is powering down within a short time (minutes) then Amazon effectively pulls the power plug forcing a hard shutdown.

Backups are Pretty Important

This whole idea of using temporary servers falls apart if you can't maintain the game world in a consistent state. That means that you need to back up the game data files when one server shuts down, and restore the game data files when the next server starts up. The go-to solution for highly reliable, permanent storage is Amazon's Simple Storage Service (S3), so that's where I decided to store all game data files and backups.

What to Back Up

Here's the file hierarchy for the Minecraft data files:

/opt/msm/servers
└── default
    ├── banned-ips.txt
    ├── banned-players.txt
    ├── ops.txt
    ├── server.jar
    ├── server.log
    ├── server.properties
    ├── white-list.txt
    ├── world -> /opt/msm/servers/default/worldstorage/world
    └── worldstorage
        ├── readme.txt
        └── world
            ├── data/
            ├── DIM1
            │   └── region/
            ├── DIM-1
            │   └── region/
            ├── level.dat
            ├── level.dat_old
            ├── players/
            ├── region/

To preserve the exact state of the Minecraft server, we need to save everything under /opt/msm/servers/default, and conveniently enough, MSM has a command to zip up that directory:

$ msm default backup

Ideally, we would like to back up the directory frequently, say every hour, so that if the server should crash, then at most you would lose 1 hour of game play. The problem is, if we save the MSM backup .zip file every hour, then we're going to consume a lot of storage space. For instance, my backup .zip file is over 300 MB in size.
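To put that in perspective: at roughly 300 MB per archive, a month of hourly backups would add up to something on the order of 300 MB × 24 × 30 ≈ 216 GB of .zip files on S3, most of it redundant copies of data that never changed.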

That suggests a hybrid backup strategy. First, save the directory as individual files to S3 once an hour, overwriting the last files from the previous hour. Since many of the files won't change, it should be more efficient to perform an rsync-like backup, rather than save the .zip file. We'll call these backed up files the "working files." Second, save the backup .zip file at the end of a playing session right before the server shuts down. These are called "archives," and are only used if there should ever be a problem with the working files and we need to roll back to a previous point in time.
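Conceptually, the hourly "working files" sync boils down to something like the following sketch, written directly against the Boto library (the bucket name and paths are placeholders; as the next section explains, the actual Minecloud scripts lean on existing command-line tools rather than code like this):

import hashlib
import os

import boto

LOCAL_ROOT = '/opt/msm/servers/default'

conn = boto.connect_s3()    # credentials come from the environment
bucket = conn.get_bucket('my-minecloud-bucket')

def md5(path):
    """Return the hex MD5 digest of a local file."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Hourly "working files" sync: upload only the files that have changed.
for dirpath, dirnames, filenames in os.walk(LOCAL_ROOT):
    for name in filenames:
        local_path = os.path.join(dirpath, name)
        key_name = os.path.relpath(local_path, LOCAL_ROOT)
        key = bucket.get_key(key_name)
        # S3 reports a file's MD5 as its ETag (for non-multipart uploads),
        # which makes the changed-or-not check cheap.
        if key is None or key.etag.strip('"') != md5(local_path):
            key = key or bucket.new_key(key_name)
            key.set_contents_from_filename(local_path)

# The end-of-session archive is simpler: upload the .zip produced by
# "msm default backup" whole, under a separate archives prefix.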

How to Back Up

As I mentioned previously, the Boto Python library is fabulous for interacting with the AWS API. But for reading and writing files to S3, it isn't necessary to work directly with the low-level functionality that Boto provides. Instead, there are plenty of higher-level scripts that have already been written to provide this functionality, and I evaluated three of them to see which suited my needs best:

  1. s3put: This script is included in the Boto package, and it provides a simple interface for writing files to S3. However, it doesn't provide the ability to download files from S3.

  2. boto-rsync: This script is also built on the Boto library, and what's great about boto-rsync is that not only can it read and write to S3, it also partially emulates the behavior of rsync. Why only partially? Because, like rsync, it will consider a source file and a destination file to differ if they are different sizes. However, unlike rsync, it doesn't compare MD5 checksums, which means that if the source and destination files differ but happen to be the same size, boto-rsync won't recognize them as different.

  3. s3cmd: In many ways, this is the best option of all. It can read and write to S3. And it does a better job of rsync-style backups, since it verifies MD5 checksums. Unfortunately, it isn't based on Boto, so it has its own separate configuration.

Given its comprehensiveness, s3cmd certainly seemed like the best choice, but I went in another direction. When I began working on the Minecloud project, s3cmd hadn't been updated in a long time, and I worried that the project was dying a slow death. More minor nits: I knew that I might be interested in using IAM roles for AWS permissions in the future, which Boto supported but s3cmd didn't. And I was annoyed by having to maintain two different configuration files for storing my AWS credentials.

For those reasons, I decided to use a combination of s3put (for writing to S3) and boto-rsync (for reading from S3) in my backup scripts. However, development has picked up again on s3cmd, and once the new 1.5 version is officially released on PyPI, I will most likely switch over to it.

How to Back Up in the Future

The design of my backup plan assumes that the single source of truth for Minecraft game data is the data stored on S3, and what exists on the instance filesystem is temporary and can be thrown away with impunity. However, I now realize that Elastic Block Storage (EBS) volumes also provide permanent storage, and they may form the basis for a better solution in the future.

An EBS volume is block-level storage that can be attached to an EC2 instance. That means I can keep the /opt/msm filesystem on an EBS volume, attach it to the EC2 instance while the Minecraft server is running, and then detach it when the instance shuts down. What's great (and what I didn't quite grasp as I began working on this project) is that while the volume sits detached, you pay only the per-gigabyte storage charge for it (cheap!), with no instance charges.
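Attaching and detaching a volume is, again, just a couple of API calls. Here is a rough sketch with Boto (the volume ID, instance ID, and device name are placeholders, and a real script would also have to mount and unmount the filesystem at the OS level):

import time

import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Attach the persistent game-data volume when the instance starts up...
conn.attach_volume('vol-1234abcd', 'i-1234abcd', '/dev/sdf')
volume = conn.get_all_volumes(['vol-1234abcd'])[0]
while volume.status != 'in-use':
    time.sleep(5)
    volume.update()

# ...and detach it again just before the instance is terminated.
conn.detach_volume('vol-1234abcd')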

Thus, rather than copying working files to S3, I could just store them on the EBS volume, and only use S3 for the backed-up archives. While it's true that the reliability of EBS volumes isn't as good as S3's (be sure to read their discussion of EBS Volume Durability), it is still pretty darn great...and I always have the backups available on S3 in case an EBS volume should fail.

What would this solve? Two things:

  1. Much faster launch and termination times. Copying a large Minecraft world (>500 MB) to S3 takes several minutes, but if we attach an EBS volume instead, we can get rid of that waiting time altogether.

  2. Simpler syncing of game data to permanent storage. Copying to disk is simpler than copying to S3.

Finis

That covers the basics of how the Minecloud AMI works. Coming soon in Part 3: an overview of the Minecloud web application.