Tuesday, December 22, 2009

Machine generated captions for BSD conference videos

One of the most frequent requests I've received since launching the BSD Conferences YouTube channel last year has been for captions in Spanish, Russian, Chinese, and other languages. I was excited last month when Google announced automatic captions for YouTube videos, generated with machine speech recognition. This feature is still highly experimental, but I am happy to report that it has been enabled for the BSD Conferences channel. In combination with the much more mature automatic translation feature, this means that captions are now available in over 50 languages, from Afrikaans to Vietnamese, for most of the 73 videos in the BSD Conferences channel.

The automatic captions are still highly experimental, and accurately transcribing highly technical content spoken by a diverse set of international speakers is a significant challenge. If you are interested in helping to correct any of the English transcripts, I would be happy to provide you with a simple text file of the transcription, with each line offering the start and end time for the caption to be displayed, and the caption text. One advantage of the machine transcription is that the most time-consuming part of manually creating captions, synchronizing the timing of the text with the speech, has been done automatically. Even when the technical words are mangled, the timing information in the automatic caption files can be used to make manually improving the captions much easier.
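For anyone curious what correcting such a file might look like in practice, here is a minimal Python sketch. The exact layout of the transcript file is not spelled out above, so the sketch assumes a hypothetical one: each line holds a start time, an end time, and the caption text, separated by tabs, with times written as H:MM:SS.mmm. The names `Caption`, `parse_timestamp`, `parse_captions`, and `replace_text` are made up for illustration. The point is only that a correction replaces the text and leaves the machine-generated timing alone.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def parse_timestamp(stamp: str) -> float:
    """Convert an 'H:MM:SS.mmm' timestamp (assumed format) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_captions(lines):
    """Parse lines of the assumed form 'start<TAB>end<TAB>text'."""
    captions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        start, end, text = line.split("\t", 2)
        captions.append(Caption(parse_timestamp(start), parse_timestamp(end), text))
    return captions


def replace_text(caption: Caption, corrected: str) -> Caption:
    """Return a new caption with corrected text and the original timing."""
    return Caption(caption.start, caption.end, corrected)
```

A volunteer's workflow would then be to parse the file, call something like `replace_text` on each mangled caption, and write the result back out with the start and end times unchanged.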

The experimental automatic captions are only available directly from the video watch pages, and not from channel pages or other views. For example, visit www.youtube.com/watch?v=nwbqBdghh6E to see one of our most popular videos, Kirk McKusick speaking on FreeBSD Kernel Internals. Hover over the triangle at the bottom right of the video, then over the CC submenu and select "Transcribe Audio". You can then choose to "Translate Captions" into a different language as well.