A Look at YouTube's Automatic Captioning Features

The Columbia on YouTube EDU channel was selected as one of a limited set of partner channels to feature YouTube's new automatic captioning services. See our news posting announcing this release. These new features are Automatic Captions & Automatic Timing. This entry will review these features in detail.

Automatic Captions

Automatic Captions uses automated speech recognition (ASR) technology, the same voice recognition algorithms found in Google Voice, another Google service that transcribes audio recordings (in that case, phone voicemails). The channel administrator can select any video uploaded to the channel for machine-generated transcription. Machine transcription takes a few hours, and we have found that some videos are not accepted for it; we believe rejection occurs because of poor-quality audio. Once the audio track has been transcribed, the closed caption option becomes available to the viewer via the video controller.

For the viewer, a tricky maneuver is necessary to view the automatic captions: (1) mouse over the Options button on the controller, the "up" arrow, to reveal the Closed Captions (CC) button; (2) toggle on the Closed Captions button. Turning on Closed Captions will activate a menu when you mouse over the arrow just to the left of the button. (Hint: when the arrow is black, the menu can be activated; when it is gray, the menu is not available.) Now select the "Transcribe Audio" command under Caption Actions. An alert will notify you that this feature is experimental and that, by Google's estimates, the transcription is only about 70% accurate. Confirm to dismiss the alert box, and the captions will be shown at the bottom of the video. You can turn off captions by toggling the Closed Captions button. If the "Transcribe Audio" command is not available, then the available captions are not machine-generated. See "Improving Automated Captions", below.

Another interesting option is available once captions are on. Revisiting the menu activated by the black arrow next to the Closed Captions button reveals a Translate Caption option. Selecting the translate command allows an on-the-fly translation of the captions into any of 51 languages.

Try out the machine-generated captions using the above instructions with this video from the Columbia University Libraries. (You may need to start playing the video to have the Closed Caption button appear.) Make sure to try out the translation option:

After watching for no more than a few seconds, it is easy to spot the errors in the transcription, but the text is sufficient to understand what is happening and for use in search tools. Given the recent release of this feature, not all existing Columbia on YouTube videos have been submitted for machine transcription. All new uploads will be automatically submitted for machine transcription, unless there is a request to opt out.

Improving Automated Captions

The machine-generated transcription file can be downloaded by the channel administrator for editing. This text file (.sbv format), which contains the timecoded transcription, can be edited with any simple text editor to fix transcription mistakes. Once the timecoded transcription file is fixed, it can be re-uploaded as a new captions track for the video. Again, only the channel administrator can perform this task. At that point, the Closed Captions button becomes a toggle for the captions track and the "Transcribe Audio" option is no longer available. The "Translate Caption" option continues to be available.
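
For files with many repeated mistakes, the editing step described above can also be scripted. The following is a minimal sketch, not part of YouTube's tooling. It assumes the downloaded .sbv file is made of cues separated by blank lines, each starting with a "start,end" timecode line (such as 0:00:01.500,0:00:04.000) followed by one or more lines of caption text. The function names and the corrections dictionary are hypothetical and chosen only for illustration.

```python
import re

# Assumed cue layout in a .sbv file:
#   0:00:01.500,0:00:04.000
#   caption text (one or more lines)
#   <blank line>
TIMING = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def parse_sbv(text):
    """Split SBV text into a list of (timing_line, caption_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if lines and TIMING.match(lines[0].strip()):
            cues.append((lines[0].strip(), "\n".join(lines[1:])))
    return cues


def apply_corrections(cues, corrections):
    """Replace known mis-transcriptions in caption text; timecodes are untouched."""
    fixed = []
    for timing, caption in cues:
        for wrong, right in corrections.items():
            caption = caption.replace(wrong, right)
        fixed.append((timing, caption))
    return fixed


def format_sbv(cues):
    """Serialize cues back into SBV text, ready to be saved and re-uploaded."""
    return "\n\n".join(f"{timing}\n{caption}" for timing, caption in cues) + "\n"


if __name__ == "__main__":
    # Hypothetical sample standing in for a downloaded caption file.
    sample = (
        "0:00:01.000,0:00:04.000\n"
        "welcome to the colombia university libraries\n"
        "\n"
        "0:00:04.500,0:00:07.250\n"
        "today we look at the new catalog\n"
    )
    cues = apply_corrections(parse_sbv(sample), {"colombia": "Columbia"})
    print(format_sbv(cues), end="")
```

Because only the caption text is rewritten, the timing produced by the machine transcription is preserved. A scripted pass like this is best followed by a manual read-through, since simple string replacement cannot catch errors it was not told about.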

Try out this example with an edited caption file. Make sure to try out the translation option:

Automatic Timing

The automatic timing feature allows the uploading of a transcript file so that it can be timecoded. This is useful in the case where an existing transcript is available, but does not contain timing information. The channel administrator can upload the text file. YouTube will process it using the same ASR technology and return a timecoded transcript that can be used for captioning. Only English-language transcripts are supported.


The Automatic Captions feature can be used as a simple and straightforward transcription tool for any video that would benefit from transcription. For example, a transcript can be generated for any lecture video. The translation feature adds another dimension to existing videos for reaching wider audiences and, of course, for language courses. The timecoded transcripts, machine-generated or not, can be used in environments beyond YouTube, such as for creating captioned DVDs or QuickTime files. The only caveat at the moment is that much of the work has to be done by the channel administrator. While all of the heavy work of editing transcripts can be delegated, the downloading and uploading of transcript files must be centralized, creating a possible bottleneck.
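
As an illustration of reusing a timecoded transcript outside YouTube, the sketch below converts .sbv text to the SubRip (.srt) layout, which many captioning and authoring tools can import. It is a rough example under stated assumptions: that the .sbv file uses "H:MM:SS.mmm,H:MM:SS.mmm" timing lines with cues separated by blank lines, and that the target tool accepts SubRip. The function name sbv_to_srt is hypothetical.

```python
import re

# Assumed SBV timing line: H:MM:SS.mmm,H:MM:SS.mmm
TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def sbv_to_srt(sbv_text):
    """Convert SBV caption text to SubRip (.srt) text."""
    entries = []
    for block in re.split(r"\n\s*\n", sbv_text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        match = TIMING.match(lines[0].strip())
        if not match:
            continue  # skip anything that does not look like a cue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = match.groups()
        # SRT uses zero-padded hours and a comma before the milliseconds.
        start = f"{int(h1):02d}:{m1}:{s1},{ms1}"
        end = f"{int(h2):02d}:{m2}:{s2},{ms2}"
        caption = "\n".join(lines[1:])
        # SRT cues are numbered sequentially, starting at 1.
        entries.append(f"{len(entries) + 1}\n{start} --> {end}\n{caption}")
    return "\n\n".join(entries) + "\n" if entries else ""


if __name__ == "__main__":
    # Hypothetical sample standing in for a downloaded caption file.
    sample = (
        "0:00:01.000,0:00:04.000\n"
        "Welcome to the Columbia University Libraries\n"
        "\n"
        "0:00:04.500,0:00:07.250\n"
        "Today we look at the new catalog\n"
    )
    print(sbv_to_srt(sample), end="")
```

The conversion changes only the cue numbering and the timecode punctuation; the caption text and timing values pass through unchanged.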

(Additional contributor: Maurice Matiz.)