CaptionHub

Building the world's most advanced subtitling platform.

2014-2019

Previously known as:: Selworthy
Roles:: programming, web development, javascript, ruby on rails, infrastructure, technology strategy

CaptionHub is an online platform for subtitling video. It allows teams to generate and edit captions inside a web browser, previewing them in a real-time editor. They can edit and adjust timing of captions using a slippy timeline, similar to desktop video editors. When they’re happy with a preview, users can emit caption data in industry-standard file formats, or render out video with the captions burned-in. It also allows translators to easily translate original captions, editing subtitle timings themselves to reflect language-specific changes. Because it all runs in the browser, it can be accessed from anywhere in the world, letting teams with in-territory staff collaborate around the clock - and it has clear workflow tools to enable that collaboration. CaptionHub can even generate your original captions in one of 28 languages using speech recognition, producing captions that break at natural points in your video, such as on scene transitions or sentence breaks.

I was technology lead on CaptionHub from its inception until the end of 2018; it made frequent appearances in my Weeknotes as a project called ‘Selworthy’. I began as the sole developer and designer on the prototype application, and worked on it as it developed into a successful commercial product, leading the development team that grew around it, and developing the technology capacity of my client.

The project began in 2015, with a very specific problem that needed solving.

The Problem

Neon are a London-based VFX and motion graphics studio. Some of their business-to-business work included producing foreign-language versions of client content, with foreign-language subtitles. The process they used at the time - common to many studios - was taking translated content from a translation agency, letting a video editor apply the subtitles to video content, and then taking feedback from the translation agency and client on the final video, until it was signed off.

One of their clients came to them with a new request: would they be able to subtitle an internal corporate video into eleven languages, with a 24-hour turnaround time, every week? Early tests with a roomful of translators suggested that it might be possible… but also that with the tools they had, it was a high-effort solution. Their suspicions were aroused: surely there was a better way of performing this task?

Would it be possible to build some kind of browser-based tool that would let translators work quicker, remotely, and empower them to make decisions around text and timing that they probably ought to be making anyway? Frequently, errors that a video editor won’t notice - spelling errors in a language they don’t speak (especially one with a non-roman character set such as Arabic, Hebrew, or Japanese), or captions that once translated into a new language exceed the house style for characters-per-line - are exactly the sort of thing a translator is equipped to do.

Neon approached me with this idea, and I concurred: the state of the HTML5 web browser in 2015 was such that we had a good chance of being able to do this inside Chrome or Safari, using off-the-shelf tools and libraries. They initially hired me to build a proof-of-concept.

The Prototype

Our initial prototype focused on the specific needs of the initial target client. For the prototype, we only needed to accept the video and timed text formats that they specified, and we only had to produce the file formats they required, so we built the system around those assumptions. Where possible, I leaned on cloud services to streamline the prototype development process, deploying code to Heroku, and using cloud-based image processors and video encoders. That left me free to focus on the meat of the prototype.

During this prototyping phase, I built the very first pass at our ‘rich editor’: the system of a timeline and a table to let users edit captions, adjust their timing on a timeline, and see a realtime preview, the timeline staying in sync with video playback.

This was built out of a Backbone-driven set of views, all rendering the same caption data. The table of captions was a Backbone view… and so was the canvas element that made our timeline. requestAnimationFrame triggered re-renders, meaning our canvas was a view rendering at around 60fps. (Why Backbone? Well, it was 2015, and at that point, it seemed a mature and stable framework for a data-bound product).

I also built our first pass at workflow tools for this process. In doing so, began to develop our understanding of the problem domain, and grow some of our own domain language around the project (much of which proved to be surprisingly robust, and survives internally to this day.)

The prototype worked end-to-end, making translation of videos straightforward, and emitting files in standard formats that were easily imported into After Effects or Final Cut.

1.0: Turning it into a Product

We had definitely proved our system worked, and the client was enthusiastic, but there was more to be done to make a real product - and appetite within Neon to deliver this. I stayed on to continue working on the project, and we started on the long list of tasks to take our prototype to a 1.0, including:

replacing our reliance on cloud services so the software could operate on corporate hosting, inside the firewall. Crucially, that meant replacing the cloud video encoders with our own video encoding service. I built an internal webservice to handle queueing and processing these taks in the background, wrapping ffmpeg and integrating seamlessly with our existing primary application. This service encoded video for the realtime preview, generated thumbnails, and similar, notifying the end-user when their encodes were complete.
adding a visible audio waveform on the timeline, generated by our encoder tool. This made cutting captions to video much easier - it was instantly possible to see speech starting or stopping, and mark the ins and outs of captions to match.
adding to our feature set, from new tweaks and functionality in the editor, to better handling of validating captions against client specifications (such as number of characters per line, or minimum duration on-screen of a caption)
greatly increasing the number of data and video types we could import and export
supporting ways of adding original captions beyond ‘already having a timed text file’; sometimes, users might need to start from a blank slate, or a plain text file.

Beyond 1.0: growing functionality, becoming CaptionHub

Our 1.0 was a success: we licensed it to the client, hosted on their premises, and the product started being used in production. Now Neon were keen to develop this product into a software-as-a-service offering, primarily hosted in the cloud, for a much wider range of clients.

This was the point where our previously un-named product became CaptionHub proper, both in terms of the maturity of the product, and in terms of a new company being developed to support it.

Whilst aiding translators was a key feature for us, for many potential users, supportin single-language captions was enough (and - enough to sell them on the utility of the product). We improved the experience of single-language use, refining the UI and UX for users working in single languages.

Around the core captioning features, we also added new flexibility to our workflow features, making it easier for teams to integrate CaptionHub with their own processes.

I also developed two key features that became integral to CaptionHub.

Firstly, I rewrote the entire internals of the captioning system to guarantee that the captions we produced were frame-accurate. That’s a challenge, given the HTML5 video element only ever handles timings in milliseconds. But it’s important to ensure that the captions generated are accurate to frames, because often, captions break on scene transitions - and if the caption hangs around a frame too long, something is visibly wrong, even if you don’t know what it is. By ensuring that the smallest unit of time CaptionHub’s internals ever dealt with was a frame, and by making sure the UI enforced that logic, we could guarantee that CaptionHub would produce frame-accurate captions - a hugely important feature for many video producers.

The second feature I worked on, that visibly transformed the experience of using the tool, was Speech-to-Text transcription of video.

One of the slowest parts of the process is creating the ‘original’ set of captions. Typing them out takes long enough, but even slower is timing them: making sure they all begin and end at the right timestamps in the video. We partnered with a white-label speech-to-text provider whose software was producing remarkable results. Using their API, we could get accurate transcriptions of speech in a matter of minutes. (At this point we were primarily only able to transcribe English in a number of accents; by 2019, 28 languages would be supported for transcription).

We then wrote our own logic to produce captions from those transcriptions: making sure that captions broke on sentence breaks, didn’t leave punctuation hanging, and would ‘snap’ to scene-changes if it looked like they were around a camera transition. I also added the scene detection data to the editor timeline, highlighting edit points in orange. Caption in- and out-points would automatically snap to edit points once they were a couple of frames away.

The end result was that users could upload a video, and have it automatically captioned in a few minutes - already to a high quality. Most notably, because the timing of the captions needed little alteration, correcting this transcription was a matter of some swift text edits, rather than having to drag all the caption boxes around. This massively improved end-user experience. Turnaround on caption production got much faster - especially for single-language captions. Captionhub continues to add support for more languages and features around speech-to-text. For instance, it also supports aligning known text to video: if you have a script, and a video of someone speaking that script, we can produce captions that are 100% correct, but perfectly timed to their speech.

And as ever, we continued to improve on smaller features and workflow tools, as well as handling the needs of supporting multiple clients at once.

We continued to add to import/export formats as clients approached us with needs. In particular, I developed code for handling formats popular in the broadcast world despite their venerable age, such as SCC and STL. That required investing time reading documentation, poking in the hex-editor mines, and using libraries such as bindata to produce suitable output. But the work paid off, leading to integrating with broadcast tool workflows - vital for some of our clients.

We improved our infrastructure, bringing in a dedicated sysadmin, building up automated deployment tooling to make maintaining all our environments - including local development environments - straightforward, reliable, and easy to alter.

We also grew the development team, and my new colleagues were instrumental in adding all manner of functionality - such as handling invoicing and payment online, building our own API-based services and integrating with increasing numbers of third-party tools.

I had gone from sole developer and pathfinder to tech lead: helping shape the team, guiding the CEO and working with them on developing the product, and trying to disseminate my deep domain knowledge that I had grown with the highly talented team we’d built.

Increasingly, I moved further away from the coalface of implementation. Now that CaptionHub knew their product and users better, there was much better insight into the UI users required, and so the team ended up redesigning and rebuilding the front-end, incrementally replacing code with component-based architecture in Vue. The last piece to be updated was the rich video editor itself, and I was hugely pleased to see other developers taking my Backbone code and bringing it up-to-date within a reactive framework. This effort paid off: the code is more performant, clearer to work with for the development team, and delivered much better user experience to clients.

CaptionHub Today

CaptionHub continues to grow its userbase, and makes a strong impression with all the clients and agencies who use it. The team has grown and matured, and the product still grows in several directions at once - from new integrations and partnerships to new core functionality, such as captioning live video automatically.

I’m proud of all my work on it - but I’m equally proud of the amount of my code which has now been rewritten or mothballed. There is a particular delight in seeing a product grow to the point that you have slowly been superseded. And, in parallel with my code being replaced, refactored, and improved, I have stepped back from working on the product. I’m still in touch with the team, though, and consult from time-to-time, on product and technology strategy, as well as the nuance of broadcast data formats.

I’m also proud of the impact it has. I enjoy making tools for other people to do their work with. It was hugely rewarding to hear linguists and captioning teams tell you it’s the best tool they’ve ever used, or that they’re twice as productive as they used to be.

It was exciting to share in the journey of building a software team and a product. The VFX industry is highly reliant on cutting-edge technology, and full of technology savvy - it’s an industry where many companies end up building their own tools and products. CaptionHub took the Neon team on their own journey to building technology products, and they were a delight to work with; I enjoyed my close relationship with Tom Bridges, their CEO, who became a great product owner and CEO of CaptionHub.

My work on CaptionHub reflects the development of my personal practice. It is rooted in thinking-through-making: using the act of prototyping as a way of understanding a domain and the right thing to build, not just for myself as a developer, but also for a client.

For a long while, I was the developer and designer of the whole system, working across the stack from browser to infrastructure. As the product grew, we began to grow the team to spread the load, and a lot of my work became shaping features out of requirements, and leading on architecture. By the end, my work was spread out vertically across the product: at one end, I was implementing code to legacy broadcast specifications; at the other, advising on technology strategy; in the middle was a deeply collaborative relationship with the team, all of whom were hugely capable. I’m somewhat loath to describe what I did as leadership; at the time, it felt very much like listening, sharing, and frequently accepting the insight of others, and the team worked well as trusting equals.

I’m delighted to see CaptionHub in the world, thriving, and being put to such good use by so many teams.

Relevant Blogposts

Worknotes - February 2021

24 February 2021
Worknotes - November 2020

20 November 2020
Yearnotes, 2019

24 January 2020
Latest work case studies, summer 2019 edition

11 July 2019
Week 317

27 January 2019
Weeks 312-313

21 December 2018
Week 311

8 December 2018
Week 310

2 December 2018
Weeks 306-309

25 November 2018
Weeks 298-305

29 October 2018
Weeks 296-297

3 September 2018
Weeks 294-295

20 August 2018
Week 293

5 August 2018
Weeks 290-292

29 July 2018
Weeks 286-289

8 July 2018
Weeks 282-285

10 June 2018
Week 279

16 May 2018
Week 278

23 April 2018
Week 277

17 April 2018
Weeks 275-276

9 April 2018
Week 274

26 March 2018
Weeks 272-273

19 March 2018
Week 271

5 March 2018
Week 270

26 February 2018
Weeks 268-269

19 February 2018
Week 267

5 February 2018
Januarynotes (Weeks 261-266)

1 February 2018
Week 260

18 December 2017
Week 259

11 December 2017
Weeks 257-258

4 December 2017
Weeks 255-256

20 November 2017
Week 254

6 November 2017
Week 253

1 November 2017
Week 252

23 October 2017
Week 251

16 October 2017
Week 250

9 October 2017
Weeks 246-249

2 October 2017
Weeks 244-245

4 September 2017
Weeks 241-243

20 August 2017
Week 240

31 July 2017
Weeks 236-239

24 July 2017
Week 234-235

23 June 2017
Week 233

12 June 2017
Weeks 229-232

5 June 2017
Week 228

8 May 2017
Week 227

3 May 2017
Week 223

4 April 2017
Week 222

29 March 2017
Week 221

20 March 2017
Week 220

12 March 2017
Week 218-219

6 March 2017
Weeks 216-217

20 February 2017
Week 215

6 February 2017
Week 214

1 February 2017
Week 213

23 January 2017
Weeks 206-209

23 December 2016
Weeks 202-205

30 November 2016
Weeks 197-201

2 November 2016
Weeks 195-196

2 October 2016
Weeks 183-186

25 July 2016
Weeks 175-176

16 May 2016
Weeks 173-174

25 April 2016
Weeks 167-169

18 March 2016
Weeks 165-166

3 February 2016
Weeks 163-164

19 January 2016
Weeks 161-162

4 January 2016
Weeks 159-160

16 December 2015
Weeks 157-158

24 November 2015
Weeks 155-156

9 November 2015
Weeks 154-155

26 October 2015
Week 152-153

5 October 2015
Weeks 150-151

15 September 2015
Weeks 148-149

1 September 2015
Weeks 146-147

18 August 2015
Weeks 144-145

3 August 2015
Weeks 142-143

22 July 2015
Weeks 140-141

3 July 2015
Week 139

15 June 2015
Week 138

9 June 2015
Week 137

1 June 2015
Weeks 134-136

20 May 2015
Week 133

5 May 2015
Week 132

29 April 2015
Week 131

20 April 2015
Week 129

7 April 2015
Week 128

30 March 2015
Weeks 126-127

22 March 2015

External Links

Project Website

CaptionHub

Building the world's most advanced subtitling platform.

The Problem

The Prototype

1.0: Turning it into a Product

Beyond 1.0: growing functionality, becoming CaptionHub

CaptionHub Today

Relevant Blogposts

24 February 2021

20 November 2020

24 January 2020

11 July 2019

27 January 2019

21 December 2018

8 December 2018

2 December 2018

25 November 2018

29 October 2018

3 September 2018

20 August 2018

5 August 2018

29 July 2018

8 July 2018

10 June 2018

16 May 2018

23 April 2018

17 April 2018

9 April 2018

26 March 2018

19 March 2018

5 March 2018

26 February 2018

19 February 2018

5 February 2018

1 February 2018

18 December 2017

11 December 2017

4 December 2017

20 November 2017

6 November 2017

1 November 2017

23 October 2017

16 October 2017

9 October 2017

2 October 2017

4 September 2017

20 August 2017

31 July 2017

24 July 2017

23 June 2017

12 June 2017

5 June 2017

8 May 2017

3 May 2017

4 April 2017

29 March 2017

20 March 2017

12 March 2017

6 March 2017

20 February 2017

6 February 2017

1 February 2017

23 January 2017

23 December 2016

30 November 2016

2 November 2016

2 October 2016

25 July 2016

16 May 2016

25 April 2016

18 March 2016

3 February 2016

19 January 2016

4 January 2016

16 December 2015

24 November 2015

9 November 2015

26 October 2015

5 October 2015