Annotating Conversations

For thousands of years, critical and scholarly discourse around text has revolved around citation and reference. What might this kind of discourse look like around multimedia: HTML text, images, audio, and video?

This question is a central theme in our technical work here at CCNTML, and a variety of our projects have taken a pass at it from one angle or another. My colleagues have also taught me the importance of designing these kinds of features in ways that encourage students to critically engage with the source materials they are studying. How can we facilitate the marshaling of multimedia sources as evidence to support an argument or hypothesis?

Over the years I have become convinced that the 'annotation' abstraction represents the heart of participatory media and social networking systems. Central to these systems is the symmetry of the relationships. It may be just as important to look up all the annotations by a particular user as it is to look up all the annotations on a particular object, or to look up all the annotations that have something in common and tie them back to their contributors and original objects.
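To make that symmetry concrete, here is a minimal sketch of an annotation store supporting the three lookups described above. The names and structure are my own illustration, not code from any of our projects.

```python
from collections import namedtuple

# An annotation ties together a user, a target object, and the attached data.
Annotation = namedtuple("Annotation", ["user", "target", "tags", "note"])

class AnnotationStore:
    """Toy in-memory store illustrating the symmetric lookups."""

    def __init__(self):
        self.annotations = []

    def add(self, annotation):
        self.annotations.append(annotation)

    def by_user(self, user):
        # Everything a particular contributor has annotated.
        return [a for a in self.annotations if a.user == user]

    def on_object(self, target):
        # Everything attached to a particular object.
        return [a for a in self.annotations if a.target == target]

    def sharing_tag(self, tag):
        # Annotations that have something in common, still tied back
        # to their contributors and original objects.
        return [a for a in self.annotations if tag in a.tags]
```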

Technically speaking, fine-grained, user-created content annotations are an abstraction intended to capture engagements which involve attaching information to content on a per-user, per-object basis. Many metadata schemes/vocabularies are per-object (e.g. categories in Wikipedia), while folksonomic systems like flickr and del.icio.us annotate content (images and links, respectively) on a per-user, per-object basis.
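The distinction matters for how the data is keyed. A per-object scheme stores one record per object, while a folksonomic scheme stores one record per (user, object) pair. A hypothetical sketch (the data and URLs are invented for illustration):

```python
# Per-object metadata: one set of categories per object,
# regardless of who assigned them (in the spirit of Wikipedia categories).
categories = {
    "Article:Vietnam_War": {"Military history", "Cold War"},
}

# Per-user, per-object annotations: keyed by the (user, object) pair,
# in the spirit of tagging on flickr or del.icio.us.
tags = {
    ("alice", "http://example.org/photo/123"): {"vietnam", "archive"},
    ("bob",   "http://example.org/photo/123"): {"war", "documentary"},
}

# The same object carries different annotations for different users,
# so both the object view and the user view can be reconstructed.
```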

A robust user-contributed content annotation framework would provide the core services to support a number of increasingly critical features that community-driven sites rely upon. Over the years, we have come across various solutions to these kinds of problems, as well as developed some of our own.

This semester we partnered with WGBH on our latest take on a video clipping and analysis environment, Project Vietnam, which launched this week (after a pilot run this summer). We plan to continue advancing and freely releasing this work.

In this analysis environment, a "clip" is just the annotation of an in-point and an out-point on the source stream. Project Vietnam is very similar to our VITAL software but is designed as a mashup, meaning we can analyze clips from across the internet: in theory, any video at the other end of a URL (in practice, only QuickTime streams so far, but the future looks bright). Our faculty partners on Project Vietnam are also very interested in social interactions between and across student submissions, so the environment aggregates and synthesizes student work and allows for student collaboration on some of their essays.
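Representing a clip this way keeps it lightweight: it is just a pointer into someone else's stream, plus whatever analysis the student attaches. A rough sketch of the shape of the data (the field names and example values are illustrative, not the actual Project Vietnam schema):

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    """A clip is an annotation: a source URL plus an in-point and out-point."""
    source_url: str      # any video at the other end of a URL
    in_point: float      # seconds from the start of the stream
    out_point: float     # seconds from the start of the stream
    author: str          # the student who made the clip
    note: str = ""       # the analysis attached to the clip
    tags: list = field(default_factory=list)

# A hypothetical clip of a QuickTime stream:
clip = Clip(
    source_url="http://example.org/archive/interview.mov",
    in_point=72.5,
    out_point=94.0,
    author="student_a",
    note="Interviewee describes the evacuation.",
)
```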

This summer's Open Video Conference attempted to bring together interested stakeholders in this problem space. We have decided to start a conversation in their tent, holding our discussions with our prospective partners (including our collaboration with Georgetown's CNDLS on Project Rebirth) in public, in the hope that other groups working on similar problems might stumble across our conversation and join in.

At the Open Video Conference, a lot of people were talking about standards for "time-based metadata," "isochronic metadata," or "fine-grained metadata." There is a good chance that the web community will soon consolidate around these standards. In the meantime, our group has essentially bracketed the conversation around standards and tried to proceed as if that problem were already solved, investigating what applications might look like once these standards are nailed down. While we need to track the standards conversations closely to leverage their gains, it is also exciting to bootstrap our way into the future of video annotations by iterating over design approaches that presume those standards conversations are resolved.