Transcription Options

If you have an audio or video file that you need to transcribe, this page describes some automated options to consider. Keep in mind, though, that none of these options (including NVivo Transcription) produces a perfectly accurate transcript. You will always have to go back and correct your transcript.

Note: If you want to pay a human to transcribe things accurately (so no or minimal correction is needed later), you could try Rev or Transcript Divas.

First is a comparison table of the automated options. Below the table are instructions for each option.

Option	Cost	Transcription Formats	File Type Generated	Languages Supported	Ethics Considerations
Microsoft 365 – Word Transcription	$0 (but could potentially cost money to unlock unlimited minutes in the future)	just text; text and speakers; timestamps and text; timestamp, text and speakers	Word document	80+ languages/dialects supported	UofT’s Research Ethics Board has approved this approach in the past if you stated that you were keeping all files on OneDrive using multifactor authentication (but of course that depends on your particular situation and what you wrote in your research ethics protocol). Read more information on how the service works under the About Transcription heading.
aTrain	$0	text and speakers; timestamp, text and speakers	Text file	57 languages: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.	This is a program you run locally on your computer (with no need for internet access), so it is very likely that the Research Ethics Board would approve this approach.
Zoom	$0	timestamps and text; timestamp, text and speakers	VTT file; Text file	English only	Keep in mind that the recording and transcript are stored on Zoom’s servers. This may or may not be acceptable for research ethics.
YouTube	$0	timestamps and text	VTT file	Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese	Keep in mind that the recording and transcript are stored on YouTube's servers. This may or may not be acceptable for research ethics.
Microsoft 365 – Stream Transcription	$0 for UofT Faculty and Staff	timestamps and text	VTT file	English, Chinese, French, German, Italian, Japanese, Portuguese, and Spanish	This is similar to using the Microsoft 365 Word solution above.
NVivo Transcription	$25USD/hour; cheaper bulk purchases available	timestamp, text and speakers	Word document; Text file	28 languages	Keep in mind that the recording and transcript are stored on QSR’s servers. This may or may not be acceptable for research ethics. Read more about their data security.

A. Microsoft 365 – Word Transcription

Summary: You can upload audio and video files to Microsoft Word 365 online and use the transcribe feature to get a Word document transcript with text, timestamps and speaker names.

Instructions:

Go to Word 365 online.
Enter your UofT email address. You will then be taken to a page where you log in using your UTORID credentials.
Create a new blank document.
Click on the drop-down arrow next to the Dictate icon (i.e., the microphone icon) in the Home ribbon menu, then select Transcribe.
From the right-side box, pick your language and upload your audio or video file.
You will see a progress bar while it transcribes. This takes a little time but is fairly fast. Wait for it to finish.
When done, you should see a preview of the transcription.
Below the transcription, click on the drop-down arrow next to the Add to Document button.
Select the option you want: just text; text and speakers; timestamps and text; timestamps, text and speakers.
The transcript is then added to the Word document, which you can edit and/or download.

Note: Currently, Microsoft is offering unlimited minutes. In the past, there was a limit of 300 minutes per month per user.

B. aTrain

Summary: This is a free, open source application you run locally on your computer. It will transcribe your audio and video files, creating text files with speaker names and timestamps.

Instructions: You can download and install it from the Microsoft store. Then you just select your audio or video file, and optionally provide information on what language the file is in, and how many speakers there are. You can also decide what level of model you'd like to use to run the transcription. The larger the model, the more accurate the transcription, but the longer the process takes. aTrain provides a paper with more details on its use.

C. Zoom

Summary: This is an easy option that works for both video and audio files and is available to UofT students right now. You have two options:

a) Host a meeting for one, share your screen, including system audio, record your meeting to the cloud, and play your video or audio files (as if presenting a webinar). Once the recording has been saved to the cloud, Zoom will also auto-caption it, and you can use the Zoom interface to edit it and add speaker names. This results in a VTT file you can download.
b) Host a meeting for one, share your screen, including system audio, turn on live captioning, and play your video or audio files (as if presenting a webinar). Zoom will auto-caption it live. When it is done, you can download the transcript by selecting View Full Transcript from the Captions menu option and then using the Save Transcript button. This results in a text file you can download. As you are the only person in the meeting, it will tag all the text with your Zoom name. You would have to edit it if you wanted appropriate speaker names.

Note: If you use Zoom to conduct your interviews live instead of working with a recording, you could use approach a) or b), and it would automatically detect the speakers and label them appropriately in the transcript.

Instructions:

Option a) instructions: To get the VTT file, click on the recording name in your list of recordings in Zoom, hover over the Audio Transcript item, and click on the small download icon (an arrow) to download it.
Option b) instructions: Note that in newer Zoom clients, the menu option might just say Show Captions. You have to click on that to start the captions and get the transcript going, but you can then click on Hide Captions so that they do not appear on your own screen.

D. YouTube

Summary: If the video is not sensitive, you could use a free auto transcription service, such as YouTube, to create your transcription. You don't need to share the video publicly because you can upload it as a private file to your account. After uploading the video, you can use the caption service to generate a VTT file, which you can then download. But note that there won’t be any speakers’ names in the file; you would have to add those manually if you wanted them. Also, note that this only works for video files – see the Notes on Converting Audio Only Files to Videos section, if needed.

Instructions:

Follow the instructions to upload your video (or the instructions for videos longer than 15 minutes).
Then follow the instructions to use the caption service and generate a VTT file.
Finally, follow the instructions to download the VTT file.

E. Microsoft 365 - Stream Transcription

Summary: An alternative available to faculty and staff at UofT is to use Stream. You upload your video, and then you can ask it to generate captions as a VTT file. Note that this only works for video files – see the Notes on Converting Audio Only Files to Videos section, if needed.

Instructions:

Go to Stream 365 online.
Enter your UofT email address. You will then be taken to a page where you log in using your UTORID credentials.
Once logged in, click on Upload on the right, next to the Filter button. Browse to your video file and select it to upload.
Once uploaded and showing up in the content list, click on it to play in Stream.
Click on Video Settings on the right.
Expand the Transcript and captions section by clicking on its drop-down arrow.
Click on Generate and select the language to generate captions.
Once finished, you should see the captions listed in that section, showing the language with “Generated by Microsoft” below it.
If you click on the Transcript option that is now available on the right, you can view the video and transcript side-by-side if you want to make any edits.
When done, go back to Video Settings and open the Transcript and captions section. Next to the generated captions, click on the … icon and select Download to download the VTT file. Note that there won’t be any speakers’ names in the file; you would have to add those manually if you wanted them.

F. NVivo Transcription

Summary: QSR offers a paid automated transcription service where you upload audio and video files to transcribe. You are able to edit the files in the online interface and download the transcripts as text or Word files when they are ready.

Instructions:

First sign up for the NVivo transcription service
Then follow QSR’s step-by-step instructions and how-to video for more information

Notes on Converting Audio Only Files to Videos

YouTube and Stream work with video files only. For an audio-only file, you will have to turn it into a simple video first (add a still image and save it as an MP4 file). One way to do that is to create one slide in PowerPoint, add the audio, and use PowerPoint to export the slideshow as an MP4 video file. Read more details on how to create videos from slideshows in MS PowerPoint.
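
If you are comfortable with scripting, the same audio-to-video conversion can be sketched in Python. This is only an illustrative alternative to the PowerPoint approach, not something the tools above require. It assumes that the free FFmpeg command-line program is installed and on your PATH, and the file names (slide.png, interview.mp3, interview.mp4) and function names are hypothetical placeholders.

```python
import subprocess


def build_ffmpeg_command(image: str, audio: str, output: str) -> list[str]:
    """Build an FFmpeg command that pairs one still image with an audio file."""
    return [
        "ffmpeg",
        "-loop", "1",            # keep repeating the still image as the video track
        "-i", image,             # first input: the still image
        "-i", audio,             # second input: the audio recording
        "-c:v", "libx264",       # encode video as H.264, which fits in an MP4 file
        "-tune", "stillimage",   # encoder tuning intended for static pictures
        "-c:a", "aac",           # encode audio as AAC
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-shortest",             # stop when the shorter input (the audio) ends
        output,
    ]


def audio_to_video(image: str, audio: str, output: str) -> None:
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(image, audio, output), check=True)


# Example call (assumes these hypothetical files exist in the current folder):
# audio_to_video("slide.png", "interview.mp3", "interview.mp4")
```

With these settings the still image generally needs an even width and height in pixels. If the conversion fails, try resizing the image first.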

Cleaning VTT files

If you want an automated way to strip out timestamps and numbering in Zoom transcript files, you can follow these instructions using REGEX and a text editor, such as Notepad++. (Generally, REGEX is a powerful way to identify patterns in text and could be used in a variety of ways to clean up transcripts.)
Another automated way to strip out timestamps and other information from a VTT file and keep just the text (and sometimes speakers’ names) is the CleanVTT online tool. Use it only when your transcript does not contain sensitive information. Note that it was designed to work with Stream VTT files, so it has varying success with VTT files created in other tools.
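
As a rough illustration of the REGEX idea, here is a minimal Python sketch that runs locally (so the transcript never leaves your computer) and keeps only the spoken lines of a VTT file. It is not the method used by the linked instructions or by CleanVTT. The function name vtt_to_text and the sample transcript are hypothetical, and the sketch assumes a simple layout: an optional cue number, a timing line, then the caption text.

```python
import re

# A WebVTT timing line, e.g. "00:00:01.000 --> 00:00:04.000" (hours are optional).
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt: str) -> str:
    """Return the caption text of a VTT transcript, one caption line per line."""
    lines = [ln.strip() for ln in vtt.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue  # blank line, file header, or timestamp line
        if line.startswith("NOTE"):
            continue  # single-line comment (multi-line notes are not handled)
        if TIMING.match(following):
            continue  # cue number or identifier sitting directly above a timestamp
        kept.append(line)
    return "\n".join(kept)


sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Jane Doe: Hello there.

2
00:00:04.500 --> 00:00:06.000
John Roe: Hi.
"""

print(vtt_to_text(sample))
```

To use it on a real file, read the file into a string first, for example with pathlib.Path("transcript.vtt").read_text(encoding="utf-8"), and always spot-check the result against the original, since VTT files from different tools vary in layout.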

Advice on Workflows for using VTT files in NVivo

The article “Auto-Creating, Correcting and Coding Transcripts from Microsoft Teams or Zoom in CAQDAS Software (ATLAS.ti, NVivo or MAXQDA)” discusses the general process of creating your own transcripts, cleaning up VTT files, and then bringing in those files along with your audio/video files into NVivo to work with them there.

Other Resources on Captions/Transcripts

Centre for Teaching Support & Innovation Guide to Captioning Videos

Also, visit our Getting Started page for more information, tutorials, and workshops on NVivo!

Technique: Qualitative Data Analysis | Tools: NVivo

Date Created: 2023-01-27 Updated: 2024-11-07
