Starting Scribe

Scribe is a voice transcription and annotation tool that I created as a side project. Though existing speech recognition technology can't yet transcribe full conversations perfectly, the concept has potential uses for students, user researchers, journalists, and more.

Whenever I've conducted user research in the past, I've always found myself frustrated by one thing in particular: note-taking. Staying engaged and asking the right questions while simultaneously taking notes is a difficult skill that takes years to master. Why couldn't I design a voice annotation tool for user researchers?


Reducing the scope

When it comes to a voice transcription and annotation tool, there are myriad possible use cases and users. Students can take notes on a professor's lecture as it is transcribed in real-time, journalists can refer back to their interviews for quotations, and user researchers can focus on user behavior instead of frantically taking notes while testing.

That being said, with so many possibilities come countless questions. How might students record their lectures and take notes simultaneously? How would journalists incorporate other media like photos and videos into interview recordings? Could user researchers link their voice recordings to specific tests? To move forward with a reasonable scope, I had to add some constraints:

The tool would not support recording or real-time transcription.

While students especially could benefit from real-time transcription of professors' lectures, the fact remains that current speech recognition libraries are not accurate enough yet. As for recording, plenty of dedicated voice recording apps already exist.

The tool would be purely audio and text-focused.

While users might want to incorporate photos or videos alongside their voice recordings, the app could quickly become cluttered with superfluous functionality. I decided to concentrate on creating the simplest, most polished experience possible.

The tool would be web-based, and for all kinds of users.

By choosing not to specialize for a single profession, the app could be useful to a wider array of users as a pure voice annotation tool. And without a recording function, a mobile companion app would be unnecessary.


Choosing a layout

Early on, I looked into existing interactive transcripts and annotations. Examples from TED and Vox offered inspiration: TED's interface makes phrases clickable to jump to specific moments in the talk, while Vox's annotations let multiple correspondents interject their responses to ideas in Trump's inauguration speech.
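A TED-style interactive transcript can be modeled as a list of timed segments, where each phrase knows when it starts in the recording. Here's a minimal TypeScript sketch of that idea; the `Segment` shape and `activeSegment` helper are my own illustration, not taken from TED's or Scribe's actual code:

```typescript
// Each transcript phrase carries the time it begins in the recording,
// so clicking it can seek playback to that moment.
interface Segment {
  text: string;
  startSec: number; // when this phrase begins, in seconds
}

// Given the current playback position, return the segment being spoken,
// so the UI can highlight it. Segments are assumed sorted by start time.
function activeSegment(segments: Segment[], positionSec: number): Segment | null {
  let current: Segment | null = null;
  for (const seg of segments) {
    if (seg.startSec <= positionSec) current = seg;
    else break; // later segments haven't started yet
  }
  return current;
}
```

Clicking a rendered segment would then simply set the audio element's playback position to that segment's `startSec`.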

Wanting to allow annotations without disrupting the vertical flow of the transcript itself, I opted for a style similar to the Cornell method of note-taking: transcript emphasized in a left column, and annotations grouped in the right column.

With such a text-heavy interface, I wanted reading to be as effortless as possible, whether users were reading the transcript or their own annotations. To aid the reader, I did the following:

Optimize the text size and whitespace

I took typography best practices into account: relative to the body text size, line spacing should be 120-145%, line length should be 45-90 characters, and spacing between paragraphs should be 50-100%.
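Those percentage guidelines translate directly into pixel values once a body text size is chosen. A small sketch of that arithmetic, using a hypothetical `typographyRanges` helper of my own invention:

```typescript
// Turn the percentage-based typography guidelines into concrete pixel
// ranges for a given body text size.
function typographyRanges(bodyPx: number) {
  return {
    lineHeightPx: [bodyPx * 1.2, bodyPx * 1.45],      // 120-145% of body size
    lineLengthChars: [45, 90],                         // characters per line
    paragraphSpacingPx: [bodyPx * 0.5, bodyPx * 1.0],  // 50-100% of body size
  };
}
```

For an 18px body size, for example, this yields a line height between 21.6px and 26.1px and paragraph spacing between 9px and 18px.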

Contrast by color and typeface

I selected Minion, among the most legible serif typefaces available, for the transcript text, and San Francisco, an equally legible sans serif, for the annotations and menu. I also chose a blue for highlights and annotations, leaving black purely for the transcript and its playback controls.