Python Deployment Chronicles: Pinning virtualenv, setuptools and pip

In my previous post I talked about how to ensure that none of your Python project's dependencies are being downloaded from the Internet when you create a fresh virtualenv and install them. This is good for deployments: each deployment is completely reproducible, since every dependency is installed from a specific source distribution versioned alongside the code you're deploying, and deployments don't require external network access to succeed.

There's one piece that's still missing, though: isolating and pinning the installation/bootstrapping tools themselves -- virtualenv, pip, and setuptools.

Virtualenv / Pip Anatomy

First, a quick run-through of how these tools relate to one another.  Pip can be made to invoke virtualenv, and invoking virtualenv will install both pip and setuptools, so it's a bit hard to keep it all straight.

When you create a new virtualenv, the virtualenv tool will pre-install pip and setuptools (or, if you prefer, a setuptools fork called distribute) in your new environment.  I'll come back to this fact later in the post.
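
For example, creating a fresh environment looks roughly like this (the exact output, and the contents of bin/, vary a bit from version to version):

  $ python virtualenv.py ./ve
  New python executable in ./ve/bin/python
  Installing setuptools............done.
  Installing pip...............done.
  $ ls ./ve/bin
  activate  easy_install  pip  python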

Meanwhile, you can also create a new virtualenv with pip:

  $ pip -E /path/to/virtualenv install SomePackage

That command will install SomePackage into the virtualenv at /path/to/virtualenv, creating that virtualenv first if it doesn't already exist.  At CCNMTL, this has been the entry point for our bootstrap script; basically we run

  $ rm -rf ./ve && pip -E ./ve install -r requirements.txt

on every deployment to create a completely new virtualenv with the needed set of packages installed.

Where do the pip and virtualenv commands come from, though?

Virtualenv comes in a convenient single-file version which you can run with the python command of your choice.  Of course you can install it too (using easy_install, pip, your OS packaging system, etc.), but I find it's easier to keep track of if you just grab the file and invoke it with python:

  $ wget https://github.com/pypa/virtualenv/raw/master/virtualenv.py
  $ python virtualenv.py --help

One advantage here is that you can invoke this file with any version of Python, rather than needing a separate virtualenv installation for each Python on your system.  Another is that you can check the file into your project's code repository, isolating it from the rest of the system and versioning it along with your code.  In other words, you get all the usual advantages of per-project containment.
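
For instance, the same checked-in file can bootstrap environments for whatever interpreters happen to be on the machine (the interpreter names here are just examples):

  $ python2.4 virtualenv.py ./ve24
  $ python2.6 virtualenv.py ./ve26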

The same used to be true of pip -- it came in a single file, pip.py, which you could just grab a copy of, check in to your source repository, and run in your bootstrap script.  So this is how we've been doing our deployments at CCNMTL -- each project has its own copy of that pip.py file, and uses it to create a virtualenv and install our project's dependencies.
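
With that setup, a deployment boils down to running the checked-in file directly -- the single-file pip.py takes the same arguments as an installed pip:

  $ python pip.py -E ./ve install -r requirements.txt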

Around version 0.7, pip stopped being a single file and became a package instead.  (If you're really interested, that refactoring was discussed here and implemented starting here.)  So now you really have to install it somewhere.  Of course, as I mentioned above, you'll get a version of pip conveniently installed for you in each new virtualenv you create, so usually you won't even notice.

At CCNMTL we've avoided this problem so far: all of our projects are still using the old single-file version of pip.  Since it's contained and versioned with each project, it's easy not to upgrade, and so far we haven't run into any show-stopping bugs or missing features that would force us to upgrade.

There are a few messy details, though.  Unlike virtualenv, pip isn't entirely self-contained -- it relies on a few other non-stdlib Python modules being importable when it runs, including virtualenv, setuptools and pkg_resources (which comes with setuptools).  At CCNMTL, we've handled that in various ways -- we check a virtualenv.py file into the source repository next to the pip.py file, and we usually have setuptools and pkg_resources installed system-wide, even though we rarely use those system-wide installations.

But, because pip is no longer distributed as a single file, and because it requires precisely the modules that virtualenv provides for you in a fresh environment, it makes more sense to just start deployments from a local, single-file virtualenv.py rather than from pip:

  $ rm -rf ./ve && python virtualenv.py ./ve
  $ ./ve/bin/pip install -r requirements.txt 

This way you get everything you need, isolated in your virtualenv, without even needing setuptools installed globally.  (Indeed, since I started writing this post, this recommendation was added to the pip docs.)

Pinning setuptools and pip

So, you've checked a copy of virtualenv.py into your source repository, and your deployments start by invoking it.  So far, so good -- virtualenv.py is now effectively pinned to a known-good version (whichever version you downloaded), and, since the file is checked in with your project's source, it doesn't require Internet access and is properly versioned alongside your project and all its other dependencies.

But at this point you may be wondering how virtualenv installs setuptools and pip if you aren't relying on any system-level packages.

The answer?  It downloads them from the Internet.

So now we're sort of back where we started -- everything about the bootstrapping process is completely version-pinned, isolated from network dependencies, and self-contained, except for setuptools and pip!  They'll still be fetched from PyPI (better hope PyPI's up, and you have network access, during your deployment) and they'll end up installed at their latest released versions (better hope your deployment wasn't relying on some now-unsupported edge-case behavior).

Luckily, virtualenv has a somewhat hidden feature to get around this: before going to the network to install setuptools and pip, it will look for local distributions of each of them in a couple of places, including a directory named virtualenv_support alongside your virtualenv.py file.

This gives us an easy way to pin and localize setuptools and pip as well.  Grab an appropriate setuptools .egg (yes, it has to be a .egg, not a tarball) and a pip tarball from their PyPI pages, drop them in a new virtualenv_support directory, and check that directory in to your project's source repository next to the virtualenv.py file.  If for some reason you need an older version of setuptools or pip, no problem -- just fetch the version you need instead of the latest release.
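
For example (treat the version numbers and URLs below as placeholders -- fetch whatever versions you actually want to pin):

  $ mkdir virtualenv_support
  $ wget -P virtualenv_support http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg
  $ wget -P virtualenv_support http://pypi.python.org/packages/source/p/pip/pip-0.8.1.tar.gz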

Now, finally, your entire project deployment, with all of its dependencies, is versioned alongside your code, pinned to specific versions, and completely isolated from external network access -- including the dependencies of the deployment bootstrapper itself.

Here's an example of what this might look like in a project.  The virtualenv_support directory contains local copies of setuptools and pip; a copy of virtualenv.py is located in the same directory as virtualenv_support; and a simple bootstrap script creates a fresh virtualenv (implicitly using those local versions of setuptools and pip) and then uses the new virtualenv's pip and easy_install scripts to install the project's requirements, also from local source distributions.
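
For concreteness, here's a minimal sketch of that layout and script -- the names (bootstrap.sh, requirements/src, the pinned version numbers) are my own conventions for illustration, not anything these tools require:

  myproject/
  |-- bootstrap.sh
  |-- requirements.txt
  |-- requirements/
  |   `-- src/                      <- local sdists of your dependencies
  |-- virtualenv.py
  `-- virtualenv_support/
      |-- pip-0.8.1.tar.gz
      `-- setuptools-0.6c11-py2.6.egg

And the bootstrap script itself:

  #!/bin/sh
  # Rebuild the project's virtualenv from local, pinned files only.
  rm -rf ./ve
  # virtualenv.py picks up the setuptools egg and pip tarball
  # from ./virtualenv_support automatically.
  python virtualenv.py ./ve
  # --no-index keeps pip off PyPI; --find-links points it at the
  # local source distributions checked in with the project.
  ./ve/bin/pip install --no-index --find-links=requirements/src -r requirements.txt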