In March 2018, in issue 4-39, we published the article “Transcribe Your Meetings in Real Time,” which introduced Otter.ai, a start-up based in Los Altos, CA. Now the company is expanding its technology so that speakers on a Zoom call can see their words turned into accurate captions in real time. This should help reduce the unfortunate miscommunications that remote collaboration tools frequently cause.
So, there should be no more excuses for misreporting the numbers presented by your sales team or missing the list of targets put forward by your manager.
Captions will appear directly within the call, with a couple of seconds of lag. Presumably, they will be accurate enough for crucial information to come out in plain text consistently.
The new feature should be especially helpful to users with accessibility needs and to non-native English speakers who struggle to make out the meaning of a sentence. Otter.ai currently supports only English but can handle a variety of accents, including Southern American, Indian, British, Scottish, Chinese, and various European accents.
Two years ago, Otter.ai launched its popular speech-to-text software. Available as a mobile app or as a web-based tool, the technology soon started supporting online conferences, offering users the option to turn Zoom cloud recordings into written conversations to keep a record of their virtual meetings.
Earlier this year, Otter.ai launched Live Notes, a feature that lets users open a live transcript during a video conference. The transcript lives in a separate shared file and records what is being said in real time.
Live Notes can separate human voices to identify different speakers, and it adds each speaker's name to the transcript to show when a given participant has started talking. Users can then go back to the file to check a detail if they missed a sentence or joined the call late.
The new announcement builds on Live Notes, integrating the transcribed text directly into Zoom’s platform during a virtual meeting. In a demo call showcasing the technology, Otter.ai’s founder Sam Liang said, “Now, you will have Live Notes still going on in the background, but then you will also have the captions put down in the call. And there’s a pretty broad range of people that this will be helpful to.”
“It’s definitely a great help for people with a hearing disability, but also for international, distributed workforces who don’t speak English as their native language. And education as well: online classes could benefit from captions, on top of the Live Notes that they can go back to, to facilitate learning.”
The transcription is not perfect: some sentences don’t make sense, and words occasionally come out garbled. Overall, however, Otter.ai’s algorithm appears to be reasonably accurate, especially given the tool’s ease of use and accessibility. Most online reviews and user reports echo that assessment.
Liang is confident that the technology’s accuracy is only improving as more users get on board, providing more training data for the speech-to-text algorithm and helping the AI work its way through background noise and strong accents.
By considering the context of an entire sentence, rather than working on a word-by-word basis, the AI can make more accurate decisions.
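To illustrate the difference between word-by-word and sentence-level decisions, here is a toy sketch in Python. It is not Otter.ai's algorithm, which the company has not described in detail. The candidate words, the scores, and the function names are all hypothetical, chosen only to show how surrounding words can overrule the best guess for a single word.

```python
from itertools import product
from math import prod

# Hypothetical acoustic scores: how well each candidate word matches
# the audio at each position in a three-word utterance.
ACOUSTIC = [
    {"meat": 0.6, "meet": 0.4},
    {"at": 1.0},
    {"for": 0.55, "four": 0.45},
]

# Hypothetical word-pair scores: how plausible one word is right after another.
BIGRAM = {
    ("meet", "at"): 0.8,
    ("meat", "at"): 0.2,
    ("at", "four"): 0.7,
    ("at", "for"): 0.3,
}
UNSEEN = 0.01  # fallback score for word pairs not listed above


def decode_word_by_word(slots):
    """Pick the best-sounding word at each position, ignoring its neighbors."""
    return [max(slot, key=slot.get) for slot in slots]


def sentence_score(words, slots):
    """Combine the acoustic scores with the word-pair scores for a whole sentence."""
    acoustic = prod(slot[word] for slot, word in zip(slots, words))
    context = prod(BIGRAM.get(pair, UNSEEN) for pair in zip(words, words[1:]))
    return acoustic * context


def decode_with_context(slots):
    """Try every combination of candidates and keep the best-scoring sentence."""
    return list(max(product(*slots), key=lambda words: sentence_score(words, slots)))


print(decode_word_by_word(ACOUSTIC))   # ['meat', 'at', 'for']
print(decode_with_context(ACOUSTIC))   # ['meet', 'at', 'four']
```

Taken one word at a time, “meat” and “for” sound slightly more likely. Scored as a whole sentence, “meet at four” wins because its word pairs are far more plausible. Trying every combination only works for tiny examples like this one; real systems rely on more efficient search and far richer language models.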
Similar methods have attracted the industry’s most prominent players. IBM now offers a cloud-based speech-to-text platform as part of Watson’s services, and Amazon Transcribe provides an API for automatic speech recognition.
However, Otter.ai is arguably the most consumer-facing of these technologies. Liang confirmed that the company is now working on smoother integration with platforms like Microsoft Teams, Google Meet, and Cisco Webex to open up access to the transcription and live-caption features.
In Zoom, live captions are available now for Otter customers paying for a Business plan, as well as for Zoom Pro customers.