Topic: Process

At CCNMTL we focus on pedagogical innovation, but we continue to work on projects that involve delivering static educational materials in traditional sequential formats. We work hard to carve out places for study in a world of instruction, but there is plenty of important knowledge that people want to acquire, and training people on skills continues to be an important component of education and often a precondition of concept formation.

In many of our projects, we've explored the boundaries of what we call "Serial Directed Learning Modules". The key properties of these projects include:

  • Nested, hierarchical, rich content with idiosyncratic navigation and access rules
  • Rich interactive activities (quizzes, drag/drop, planning, mapping)
  • Detailed reporting on the learner's performance and completion

In our partnership with the Columbia University Medical Center and our strategic Triangle Initiative we've worked on several multimedia behavioral interventions that conform to this delivery pattern. We've worked on direct interventions relating to HIV couples counseling, childhood diabetes and cavity prevention, and treatment adherence, and we've developed directed learning modules for teaching practitioners about tobacco cessation, child abuse, and more.

While similar in the abstract, these projects vary in their devilish details. Some of these environments are mediated by a service provider, such as a social worker working with a patient, while others are self-directed. Some require multiple modes, with additional notes available only to the facilitator. A few lessons are completed in a single sitting, while others must preserve state and pick up where the learner left off.

We try to balance the effort of creating unique works of art with churning out boilerplate, cookie-cutter sites. We've explored the use of general-purpose content management systems (CMS) for these projects and are regularly stymied by the mismatch between these styles of interaction and the sweet spots of the CMS platforms we know well. CMS platforms are great for creating collections of random-access content, and organizing and relating it in a variety of ways. The business rules around the directed learning projects often left us wrestling with CMS environments, wishing we had developed them using a lightweight MVC framework with less overhead for introducing the customized workflows these projects demand.

After building a few of these sites à la carte, we began to generalize our approach and developed the PageTree hierarchical menu-creation system for Django. PageTree evolved into a lightweight, open-source, domain-specific content management system, and we introduced a modular architecture for embedding and assembling PageBlocks, which add elements like text, media, or custom JavaScript activities within pages. The source code for PageTree and a basic set of PageBlocks are available on our 'ccnmtl' GitHub account. We have also released the code and content powering the childhood diabetes intervention; it is available here.
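For readers who haven't seen this pattern, here is a minimal sketch of the "hierarchical pages plus pluggable blocks" idea in plain Django. It is an illustration only; the model names and fields are hypothetical and are not PageTree's actual API.

    # Illustrative sketch, not PageTree's actual API: a tree of pages, each
    # holding an ordered set of blocks (text, media, custom activities).
    from django.db import models

    class Page(models.Model):
        parent = models.ForeignKey('self', null=True, blank=True,
                                   related_name='children',
                                   on_delete=models.CASCADE)
        slug = models.SlugField()
        title = models.CharField(max_length=256)
        ordinality = models.IntegerField(default=0)  # position among siblings

        def url_path(self):
            # Walk up the tree to build the page's hierarchical URL.
            parts = [self.slug]
            node = self.parent
            while node is not None:
                parts.append(node.slug)
                node = node.parent
            return '/' + '/'.join(reversed(parts)) + '/'

    class TextBlock(models.Model):
        # One kind of block; video players, quizzes, or custom JavaScript
        # activities follow the same shape and render in order on the page.
        page = models.ForeignKey(Page, related_name='blocks',
                                 on_delete=models.CASCADE)
        ordinality = models.IntegerField(default=0)
        body = models.TextField()

In a scheme like this, the idiosyncratic navigation and access rules can hang off the tree structure itself rather than being re-implemented in each project's views.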

As the demand for these sites has grown, we've recently created a system for "farming" these PageTree sites -- aptly named "Forest" -- that allows our project managers to very quickly set up their own PageTree sites (called "Stands") and get a skeletal site up and running without the bottleneck of developer intervention. You can see a self-documenting demo of Forest here.

This approach allows us to collect content as early as possible. The features can be developed around the content, instead of vice versa. If the site requires custom functionality that goes beyond the generic features of the Forest farm, we can spin off an independent Django site from the Forest farm and begin development from the outset with the site's content already in place.

This system has helped us achieve a nice balance between customization and efficiency, and we are pleased with the flexibility this approach has enabled for this class of projects. We're in the process of conceptualizing a roadmap for PageTree sites, and have been imagining a collaborative authoring platform that supports versioning, SCORM authoring/publishing, BasicLTI compliance, and more.

For years, I’ve watched our video team do amazing work shooting, editing, and encoding video for the web. I think most production companies would be shocked at how much high quality work our team produces with so few staff, a tight budget, and tighter time constraints.

When I look closely at how they do what they do, I’m impressed and just a little frightened at how many manual steps are involved in getting a video online: steps that take time, attention to detail, and expertise, and that are ripe for mistakes.

I try to automate everything I touch. I can’t help it. It’s the curse of being a programmer.

At CCNMTL most of our new Python projects are written in Django, but we still support a number of older projects that were written with TurboGears 1.0.4. They've continued to be stable, and we don't do a ton of new development on them, so it hasn't been worthwhile to upgrade them to newer versions of TurboGears.

But we do occasionally make changes to their code, and recently we've begun migrating them to newer servers.  So I recently spent some time updating their deployment processes to CCNMTL's current best practices:

  • Installation with pip instead of easy_install
  • Fully pinned local source distributions versioned alongside the code
  • No Internet access required anywhere in the deployment
  • Containment with virtualenv

I ended up with a package that you can use to create an isolated TurboGears 1.0.4 environment to run legacy projects in, or (if for some reason you want to) to create new TurboGears 1.0.4 projects.  You can get it on GitHub here: https://github.com/ccnmtl/turbogears_pip_bootstrapper
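To make those four practices concrete, here is a minimal sketch of what such a bootstrap can look like. It is not the actual script from turbogears_pip_bootstrapper, and the paths and file names (requirements/virtualenv.py, requirements.txt, ve/) are illustrative.

    #!/usr/bin/env python
    # Sketch of an offline, pinned bootstrap (illustrative paths only).
    import os
    import subprocess
    import sys

    ROOT = os.path.dirname(os.path.abspath(__file__))
    VE = os.path.join(ROOT, 've')

    def run(cmd):
        print(' '.join(cmd))
        subprocess.check_call(cmd)

    # Containment: build a fresh virtualenv from a copy of virtualenv.py
    # that is checked into the repository alongside the project code.
    run([sys.executable,
         os.path.join(ROOT, 'requirements', 'virtualenv.py'), VE])

    # Pinned, offline installs: every dependency is a source tarball checked
    # into the repository and listed in requirements.txt, so pip can install
    # with --no-index and never needs to reach PyPI.
    run([os.path.join(VE, 'bin', 'pip'), 'install', '--no-index',
         '--requirement', os.path.join(ROOT, 'requirements.txt')])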

In this post I'll go into detail about what it does, and the hurdles I ran into along the way.

Earlier this week, I wrote about how to make virtualenv install pip and setuptools from local source distributions, instead of fetching unpinned copies of them from the Internet, which it does (somewhat silently) by default. The approach relied on a somewhat buried feature of virtualenv: looking for appropriate distributions in a virtualenv_support directory before downloading them.
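Concretely, that buried feature amounts to keeping a layout like the following next to the copy of virtualenv.py you run (the file names and version numbers here are illustrative):

    requirements/
        virtualenv.py
        virtualenv_support/
            pip-0.8.1.tar.gz
            setuptools-0.6c11.tar.gz

With those distributions in place, creating a virtualenv installs pip and setuptools from the local copies rather than fetching them from the Internet.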

In a future release of virtualenv, this will be easier, and also more apparent.  I submitted patches for two new features which were accepted by virtualenv's maintainers:

  • a --never-download option, which makes virtualenv fail outright instead of silently falling back to downloading setuptools or pip from the Internet
  • an --extra-search-dir option, which tells virtualenv to look for setuptools and pip distributions in additional local directories that you specify

These new features are documented in the source here.  If you want to start using them now, you can fetch a copy of virtualenv.py from the "develop" branch: https://github.com/pypa/virtualenv/raw/develop/virtualenv.py

In my previous post I talked about how to ensure that none of your Python project's dependencies are being downloaded from the Internet when you create a fresh virtualenv and install them. This is good for deployments: each deployment is completely reproducible since every package's source is installed from a specific version of the codebase that's versioned alongside the code you're deploying, and deployments don't require external network access to succeed.

There's one piece that's still missing, though: isolating and pinning the installation/bootstrapping tools themselves -- virtualenv, pip, and setuptools.

Anders has written several times about our deployment strategy for Django apps at CCNMTL. Aside from containment of each project with virtualenv, we also try to make sure that deployments never depend on anything external, and can be done without access to the wider Internet. We do this with an aggressive form of version pinning: in each project's repository, we check in source tarballs of all the project's dependencies, including Django itself. We then have a pip requirements file that points to each of these local files in order. (Here's an example, and the bootstrap script that uses it.)
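As a rough sketch of what one of those requirements files can look like (the package names, versions, and paths here are illustrative, not the contents of an actual CCNMTL file):

    # requirements/apps.txt -- illustrative example.
    # Each line points at a source tarball that is checked into the
    # repository, listed in dependency order.
    requirements/src/setuptools-0.6c11.tar.gz
    requirements/src/MySQL-python-1.2.3.tar.gz
    requirements/src/Django-1.0.4.tar.gz
    requirements/src/south-0.7.tar.gz

A bootstrap script can then run pip against that file (with --no-index, for example) so that every install resolves to one of the checked-in tarballs and nothing needs to be fetched from PyPI.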

There are two benefits to this approach. First, it removes our deployments' dependencies on external web services, like PyPI, being online. Second, it ensures that we know exactly what versions we're using of all the Python code in a project's deployment. That makes deployments trivially repeatable, and gives us the ability to roll back a deployment to any earlier version -- so if a new deployment doesn't work properly for some reason, we can re-deploy the last tagged deployment and know that (barring system-level changes) it'll work exactly as expected.

The other week, we made a new deployment to one of our Django projects, and the site stopped working. It turned out that the wrong version of Django was installed somehow: the project was built on Django 1.0, but this broken deployment ended up with Django 1.2 instead. And, oddly, rolling back to the previous deployment didn't fix the problem.

I see programmers as inherently helpful people. Given a 57-step flowchart describing the steps some poor soul has to execute manually, most programmers get a little gleam in their eye and set about providing a streamlined solution. Programmers truly love removing those inefficiencies. Meanwhile, the customer stops wrestling with a frustrating system and gets on with his job.

I recently had the opportunity to untangle a complicated little process knot. The technical details are applicable solely to the Columbia community, but I think the story is a reminder that a core engineering duty is to tackle real world inefficiencies.

One of the primary tenets of agile development is test first, test often. After working in a small XP shop doing mobile development, I came to believe strongly that quality code hinges on a test-driven approach.

Coders, impatient with paper specs and endless product meetings, often rush to their keyboards and push out half-baked, poorly implemented solutions that don't meet anyone's needs. Writing tests -- especially in a test-first approach -- provides time for thoughtful inquiry into an application's overall design and specific functionality. The coder can express herself in her own comfortable environment and language. The resulting tests become permanent artifacts, able to verify functionality as the application is enhanced and refactored.

And, in less altruistic, more self-serving terms: good tests mean good code, and good code makes the coder look good. Why wouldn't you want to write tests?

Still, I was a little apprehensive when asked to set up a test infrastructure for the Mondrian JavaScript components. (Mondrian is our snazzy new web-based multimedia annotation environment.) I've tackled many server-side testing tasks, but have managed to circumvent the swampy land of JavaScript. JavaScript generally does not lend itself to testing. Most JavaScript code I've seen is poorly organized, fragmentary and tightly bound to the browser. I've often lamented the lack of good JavaScript testing tools, but was also loath to tackle the seemingly messy, difficult task.

Everything is speeding up these days, even the authoring of books. Some information society researchers we know (including some of our friends from Eyebeam, Creative Commons and Shift Space) locked themselves up for a week in Berlin, and came out the other end with a print-ready book on the future of collaboration -- Collaborative Futures.

Even though I didn't travel to Berlin, the book's authorship was radically distributed, and some of my writing made it into the final cut. A portion of an essay I wrote last fall for a sociology seminar on the future -- a (brief) history of version control systems and the significance of distributed version control systems -- made the cut.

The book will be released under a Creative Commons license, but they are also doing a print run of hard copies, which will be available starting at the launch party on March 4th. Pre-order a hard copy here (a digital copy is available here).