Professionals across many industries have long transcribed speech to text by hand, trying a variety of methods to save time and money. But over the last decade, advances in machine learning and speech recognition have revolutionized the way we process audio files and given rise to a new wave of automated transcription services.
Once the pros and cons of each approach are weighed, the decision is clear for enterprise-level businesses and individual professionals alike.
Manual transcription is essentially data entry: users take an audio or video recording, listen to it in short sections and type out everything that is said.
This process has been standard practice for professionals in law, journalism and medicine for many decades. Because hiring in-house employees to transcribe content is time-consuming and expensive, many businesses outsourced their transcription work, paying an hourly premium for the convenience of having a recording put on paper without tying up too much of their workload or their workforce.
Although some transcribers can type out audio and video in real time, it is an extremely rare skill that takes years of training and practice. And even with extensive training, not everyone can do the job: real-time transcribers often need to type at a rate of at least 200 words per minute, which is over four times the rate of a person who types every day for their profession.
A technology that has recently become commonplace in homes across the world is disrupting traditional transcription methods. Amazon’s Alexa, Apple’s Siri and Google Home may not seem like likely culprits, but the technology at their core, which allows computers to decipher and respond to language, has also worked wonders for the transcription industry. Automated speech recognition (ASR) has given rise to digital transcription that automatically converts voice to text, and it’s already starting to revolutionize the way we interpret interviews, speeches and lectures.
How does ASR work?
ASR works by analyzing a passage of audio or video that's uploaded by a user, then matching the sounds to hundreds of thousands of recognizable words in its dictionary. It converts the speech from the audio or video passages into text, which can then be searched, edited and verified. Not only that, but each word is stitched to the corresponding section of audio, making it easy to export important parts of the files.
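To make the idea of words "stitched" to audio concrete, here is a minimal Python sketch. It is not Trint's data format or API: the `TimedWord` class, the `find_phrase` helper and the sample timings are all hypothetical, chosen only to show how a transcript with per-word timestamps can be searched and mapped back to a span of audio.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcribed word plus the audio span it came from (hypothetical format)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def find_phrase(words, phrase):
    """Return (start, end) audio spans, in seconds, where the phrase is spoken."""
    target = phrase.lower().split()
    if not target:
        return []
    n = len(target)
    spans = []
    for i in range(len(words) - n + 1):
        # Compare case-insensitively and ignore trailing punctuation.
        window = [w.text.lower().strip(".,!?") for w in words[i:i + n]]
        if window == target:
            spans.append((words[i].start, words[i + n - 1].end))
    return spans


# Made-up sample transcript, for illustration only.
transcript = [
    TimedWord("Speech", 0.0, 0.4),
    TimedWord("recognition", 0.4, 1.1),
    TimedWord("saves", 1.1, 1.5),
    TimedWord("time.", 1.5, 1.9),
]

print(find_phrase(transcript, "saves time"))  # [(1.1, 1.9)]
```

Because every word carries its own start and end time, finding a quote in the text also tells you which slice of the audio file to play back or export.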
The beauty of digital transcription is how willing the automated service is to do your grunt work. For instance, it can add captions or subtitles to videos in moments, where the same job would normally take a human hours or days. Captioning and subtitling are particularly valuable when marketing to today’s on-the-go audience, who often watch video without sound. Digital transcription can also distinguish between multiple speakers and start a new paragraph each time someone new begins speaking.
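The two features described above, captions and speaker-separated paragraphs, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular service works: the input shapes (tuples of start, end and text, and pairs of speaker and word) and the function names are hypothetical. The caption output follows the widely used SubRip (.srt) layout of an index, a time range and the caption text.

```python
from itertools import groupby


def format_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT caption text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


def paragraphs_by_speaker(words):
    """Start a new paragraph whenever the speaker changes.

    `words` is a list of (speaker, word) pairs in spoken order.
    """
    return [
        (speaker, " ".join(word for _, word in group))
        for speaker, group in groupby(words, key=lambda pair: pair[0])
    ]


print(to_srt([(0.0, 1.5, "Hello there."), (1.5, 3.0, "Hi.")]))
print(paragraphs_by_speaker(
    [("A", "Hello"), ("A", "there."), ("B", "Hi."), ("A", "Welcome.")]
))
```

Note that `groupby` only merges consecutive words from the same speaker, so a speaker who returns later in the conversation gets a fresh paragraph, which matches the behavior described above.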
How do digital and manual transcription measure up? Well, when it comes to time-efficiency, digital transcripts have a huge advantage over their manual counterparts. Unless your transcriber is one of the few extremely gifted typists who can transcribe in real time, they have to adopt the clunky workflow of “Play. Pause. Type. Rewind.” (repeated ad nauseam). Not only is this inefficient, it’s dull enough to put even the most well-rested person to sleep in minutes.
Digital transcription’s strength is that it processes files in a fraction of the time a manual transcriber would need: a 45-minute video file usually takes far less than 45 minutes to convert to text, a feat that's impossible for almost all manual transcription services.
What about accuracy?
Are digital transcriptions any more accurate than traditional methods? While many outsourced manual transcription services boast a 99% accuracy rate, they often add premiums for the extra effort it takes human transcribers to decipher poor-quality recordings. Machines can't yet produce flawless speech-to-text conversions, but Trint’s services operate with a margin of error of approximately 5-10% on clear recordings. That is a respectable figure for a service that charges a flat hourly rate and delivers in a fraction of the time.
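For a sense of what an error figure like this can mean in practice, one common yardstick for speech recognition is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the machine transcript into a correct reference transcript, divided by the number of words in the reference. The article does not say how the 5-10% figure is measured, so treating it as WER is an assumption, and the function name and sample sentences below are purely illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing (deletion)
                curr[j - 1] + 1,     # an extra word was transcribed (insertion)
                prev[j - 1] + cost,  # words match, or one was substituted
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of ten.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog today"
print(word_error_rate(reference, hypothesis))  # 0.1
```

By this measure, a 10% error rate corresponds to roughly one word in ten needing correction, which is the kind of clean-up an editor can make directly in the transcript.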
Trint also offers customers helpful tips on how to improve the quality of recordings to maximize accuracy. What’s more, all transcripts are displayed in the time-stamped Trint Editor, where edits can be made quickly and easily, a feature that's unavailable with a manual transcription service.
Digital transcription also compares favorably on cost. While transcribing a simple, high-quality, hour-long audio file might cost around £39 with other services, we charge a flat rate of £13.20 (US$15) per hour for our digital solutions. This is a significant saving, which increases at scale for larger transcripts and for customers using our Basic and Supercharged subscription plans.
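As a rough worked example, the short Python sketch below compares the two hourly figures quoted above for a given amount of audio. It assumes both services bill a simple flat rate per hour of audio and ignores subscription plans and any premiums for poor recordings; the function and constant names are illustrative.

```python
from decimal import Decimal

# Hourly rates in GBP, taken from the figures quoted in the text.
MANUAL_RATE = Decimal("39.00")   # "around £39" with other services
DIGITAL_RATE = Decimal("13.20")  # flat rate for digital transcription


def savings(hours_of_audio):
    """Return (amount saved in GBP, saving as a percentage of the manual cost)."""
    hours = Decimal(str(hours_of_audio))
    manual_cost = MANUAL_RATE * hours
    digital_cost = DIGITAL_RATE * hours
    saved = manual_cost - digital_cost
    return saved, saved / manual_cost * 100


saved, percent = savings(10)
print(f"Saved £{saved:.2f} ({percent:.1f}% less than manual)")
```

At these rates the saving per hour of audio is £25.80, about two-thirds of the manual price. The percentage stays the same at any volume, while the amount saved grows with every hour transcribed.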
The immersive qualities of Trint’s digital transcription service further set it apart from manual transcription. While manual transcripts can tell you what's being said and by whom, only digital transcription connects the text to the corresponding parts of the original audio file, allowing for instant reference. It's also an extremely powerful tool for video editing, letting filmmakers add captions to their movies with ease; for example, Trint’s free extension for Adobe's Premiere Pro video editing software means transcribing and embedding captions and subtitles happens in moments.