Sunday, January 10, 2010

Improved Conference Captions from Amazon Mechanical Turk

Just a quick note that three of the popular videos from the BSD Conferences YouTube channel have been updated with human-edited English-language caption files. These are a significant improvement over the machine-generated captions I wrote about last month.

The following videos have been updated:

I've also posted three plain-text caption files, which give the times and text in a very simple ASCII format, in case anyone wants to send a diff that fixes any remaining mistakes in the captions.
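For anyone who would rather script the diff than run a command-line tool, here is a minimal sketch using Python's standard difflib module. The file name, the helper name caption_diff, and the sample caption lines are made up for illustration; the actual layout of the posted caption files may differ, and the code only assumes they are plain text with one caption per line.

```python
import difflib


def caption_diff(original: str, corrected: str, name: str = "captions.txt") -> str:
    """Return a unified diff between two versions of a captions text file.

    `name` is a hypothetical file name used only in the diff headers.
    """
    old_lines = original.splitlines(keepends=True)
    new_lines = corrected.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=name + ".orig", tofile=name
        )
    )


if __name__ == "__main__":
    # Invented sample data: a timestamp followed by the caption text.
    original = (
        "0:00:01 hello and welcome to the conference\n"
        "0:00:05 thanks for coming\n"
    )
    corrected = (
        "0:00:01 Hello, and welcome to the conference.\n"
        "0:00:05 thanks for coming\n"
    )
    print(caption_diff(original, corrected), end="")
```

The resulting text is an ordinary unified diff, so it can be pasted into an email or applied with patch.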

The transcriptions were done with the help of the industrious workers behind Amazon Mechanical Turk. The three transcripts above represent at least 6 person-hours of work, and quite possibly twice that. They were completed for less than $50 by taking the timing information from the free machine-generated captions and using Mechanical Turk for the editing. This is less than 1/10th of the cost of a commercial transcription service.

How good are these captions in other languages when automatically translated by YouTube? Are there any other videos for which captions would be particularly useful?

AsiaBSDCon is coming up in March, and I hope to have the process streamlined by then so that videos with both Japanese and English captions can be added to the channel shortly after the conference.