First is a comparison table of the automated options. Below the table are instructions for each option.
Option | Cost | Transcription Formats | File Type Generated | Languages Supported | Ethics Considerations |
---|---|---|---|---|---|
Microsoft 365 – Word Transcription | $0 (but could potentially cost money to unlock unlimited minutes in the future) | just text; text and speakers; timestamps and text; timestamp, text and speakers | Word document | 80+ languages/dialects supported | UofT’s Research Ethics Board has approved this approach in the past if you stated that you were keeping all files on OneDrive using multifactor authentication (but of course that depends on your particular situation and what you wrote in your research ethics protocol). Read more information on how the service works under the About Transcription heading. |
aTrain | $0 | text and speakers; timestamp, text and speakers | Text file | 57 languages: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh. | This is a program you run locally on your computer (with no need for internet access), so it is very likely that the Research Ethics Board would approve this approach. |
Zoom | $0 | timestamps and text; timestamp, text and speakers | VTT file; Text file | English only | Keep in mind that the recording and transcript are stored on Zoom’s servers. This may or may not be acceptable for research ethics. |
YouTube | $0 | timestamps and text | VTT file | Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Turkish, Ukrainian, and Vietnamese | Keep in mind that the recording and transcript are stored on YouTube's servers. This may or may not be acceptable for research ethics. |
Microsoft 365 – Stream Transcription | $0 for UofT Faculty and Staff | timestamps and text | VTT file | English, Chinese, French, German, Italian, Japanese, Portuguese, and Spanish | This is similar to using the Microsoft 365 Word solution above. |
NVivo Transcription | $25 USD/hour; cheaper bulk purchases available | timestamp, text and speakers | Word document; Text file | 28 languages | Keep in mind that the recording and transcript are stored on QSR’s servers. This may or may not be acceptable for research ethics. Read more about their data security. |
A. Microsoft 365 – Word Transcription
Summary: You can upload audio and video files to Microsoft Word 365 online and use the transcribe feature to get a Word document transcript with text, timestamps and speaker names.
Instructions:
- Go to Word 365 online.
- Enter your UofT email address. You will then be taken to a page where you log in with your UTORID credentials.
- Create a new blank document.
- Click on the drop-down arrow next to the Dictate icon (i.e., the microphone icon) in the Home ribbon menu, then select Transcribe.
- From the right-side box, pick your language and upload your audio or video file.
- You will see a progress bar as it transcribes. This takes a little time but is fairly fast. Wait for it to finish.
- When done, you should see a preview of the transcription.
- Below the transcription, click on the drop-down arrow next to the Add to Document button.
- Select the option you want: just text; text and speakers; timestamps and text; timestamps, text and speakers.
- The transcript is then added to the Word document, which you can edit and/or download.
Note: Currently, Microsoft is offering unlimited minutes. In the past, there was a limit of 300 minutes per month per user.
B. aTrain
Summary: This is a free, open source application you run locally on your computer. It will transcribe your audio and video files, creating text files with speaker names and timestamps.
Instructions: You can download and install it from the Microsoft Store. Then select your audio or video file and, optionally, specify the language of the file and the number of speakers. You can also choose which model size to use for the transcription: the larger the model, the more accurate the transcription, but the longer the process takes. aTrain provides a paper with more details on its use.
C. Zoom
Summary: This is an easy option that works for both video and audio files and is available to UofT students right now. You have two options:
- a) Host a meeting for one, share your screen, including system audio, record your meeting to the cloud, and play your video or audio files (as if presenting a webinar). Once the recording has saved to the cloud, Zoom will also auto caption it, and you can use the Zoom interface to edit the captions and add speaker names. This results in a VTT file you can download.
- b) Host a meeting for one, share your screen, including system audio, turn on live captioning, and play your video or audio files (as if presenting a webinar). Zoom will auto caption it live. When it is done, you can download the transcript by selecting View Full Transcript from the Captions menu option and then using the Save Transcript button. This results in a text file you can download. As you are the only person in the meeting, all the text will be tagged with your Zoom name. You would have to edit it if you wanted appropriate speaker names.
Note: If you use Zoom to conduct your interviews live instead of working with a recording, you could use approach a) or b), and it would automatically detect the speakers and label them appropriately in the transcript.
Instructions:
- Option a) instructions. Then, to get the VTT file, click on the recording name in your list of recordings in Zoom, hover over the Audio Transcript item, and click on the small download icon (an arrow) to download it.
- Option b) instructions. Note that in newer Zoom clients, the menu option might just say Show Captions. You have to click on that to start the captions and get the transcript going, but you can then click on Hide Captions so that they do not appear on your own screen.
D. YouTube
Summary: If the video is not sensitive, you could use a free auto transcription service, such as YouTube, to create your transcription. You don't need to share the video publicly because you can upload it as a private file to your account. After uploading the video, you can use the caption service to generate a VTT file, which you can then download. But note that there won’t be any speakers’ names in the file; you would have to add those manually if you wanted them. Also, note that this only works for video files – see the Notes on Converting Audio Only Files to Videos section, if needed.
Instructions:
- Follow the upload your video instructions (or the instructions if your video is longer than 15 minutes)
- Then follow the instructions to use the caption service and generate a VTT file
- Finally follow the instructions to download the VTT file
E. Microsoft 365 - Stream Transcription
Summary: An alternative available to faculty and staff at UofT is to use Stream. You upload your video, and then you can ask it to generate captions as a VTT file. Note that this only works for video files – see the Notes on Converting Audio Only Files to Videos section, if needed.
Instructions:
- Go to Stream 365 online.
- Enter your UofT email address. You will then be taken to a page where you log in with your UTORID credentials.
- Once logged in, click on Upload on the right, next to the Filter button. Browse to your video file and select it to upload.
- Once uploaded and showing up in the content list, click on it to play in Stream.
- Click on Video Settings on the right.
- Expand the Transcript and captions section by clicking on its drop-down arrow.
- Click on Generate and select the language to generate captions.
- Once finished, you should see the captions listed in that section, showing the language with “Generated by Microsoft” below it.
- If you click on the Transcript option that is now available on the right, you can view the video and transcript side-by-side if you want to make any edits.
- When done, go back to Video Settings and open the Transcript and captions section. Next to the generated captions, click on the … icon and select Download to download the VTT file. Note that there won’t be any speakers’ names in the file; you would have to add those manually if you wanted them.
F. NVivo Transcription
Summary: QSR offers a paid automated transcription service where you upload audio and video files to transcribe. You are able to edit the files in the online interface and download the transcripts as text or Word files when they are ready.
Instructions:
- First sign up for the NVivo transcription service
- Then follow QSR’s step-by-step instructions and how-to video for more information
Notes on Converting Audio Only Files to Videos
YouTube and Stream work with video files only. To use them with an audio-only file, you will first have to turn it into a simple video (i.e., add a still image and save it as an mp4 file). One way to do that is to create one slide in PowerPoint, add the audio, and use PowerPoint to export the slideshow as an mp4 video file. Read more details on how to create videos from slideshows in MS PowerPoint.
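If you are comfortable with the command line, the same still-image video can also be produced without PowerPoint. The Python sketch below is only an illustration of that idea, not part of the PowerPoint method above: it assumes the free ffmpeg tool is installed and on your PATH, and the function names and the file names `cover.png`, `interview.mp3` and `interview.mp4` are made up for this example.

```python
import subprocess


def build_ffmpeg_command(image: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that pairs one still image with an audio file."""
    return [
        "ffmpeg",
        "-loop", "1",           # repeat the still image for the whole video
        "-i", image,            # first input: the image
        "-i", audio,            # second input: the audio recording
        "-c:v", "libx264",      # encode the video stream as H.264
        "-tune", "stillimage",  # encoder tuning intended for static pictures
        "-c:a", "aac",          # encode the audio stream as AAC
        "-pix_fmt", "yuv420p",  # widely supported pixel format (needs even image dimensions)
        "-shortest",            # stop when the shorter input (the audio) ends
        output,
    ]


def convert(image: str, audio: str, output: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(image, audio, output), check=True)


# Hypothetical usage (file names are placeholders):
# convert("cover.png", "interview.mp3", "interview.mp4")
```

Because this runs locally, the recording is not uploaded anywhere during the conversion step itself.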
Cleaning VTT files
- If you want an automated way to strip out timestamps and numbering in Zoom transcript files, you can follow these instructions using REGEX and a text editor, such as Notepad++. (Generally, REGEX is a powerful way to identify patterns in text and could be used in a variety of ways to clean up transcripts.)
- Another automated way to strip out timestamps and other information in a VTT file, leaving just the text (and sometimes speakers' names), is to try this CleanVTT online tool. Only use it when your transcript does not contain sensitive information. Note that it was designed to work with Stream VTT files, so it has varying success with VTT files created in other tools.
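As a rough illustration of the REGEX approach, the Python sketch below keeps only the text lines of a VTT file and drops the header, cue numbering and timestamps. The function name `vtt_to_text` and the sample transcript are made up for this example. It assumes timestamps in the standard `hh:mm:ss.mmm --> hh:mm:ss.mmm` form and does not handle NOTE or STYLE blocks, so check the output against your own files.

```python
import re

# A WebVTT timing line, e.g. "00:00:01.000 --> 00:00:04.000".
# The hours part is optional, and cue settings may follow the end time.
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}.*$"
)


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken text from the contents of a VTT file."""
    lines = vtt.lstrip("\ufeff").splitlines()  # drop a byte-order mark if present
    kept = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        # Skip blank lines, the "WEBVTT" header and timestamp lines.
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        # Skip cue identifiers (often just a number): they sit directly
        # above a timestamp line.
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if TIMING.match(following):
            continue
        kept.append(line)
    return "\n".join(kept)


sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Jane Doe: Hello there.

2
00:00:04.500 --> 00:00:06.000
How are you?
"""

print(vtt_to_text(sample))
```

Speaker labels such as `Jane Doe:` are kept because they are part of the text line. Since the script runs on your own computer, it may also suit transcripts that you would not want to paste into an online tool.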
Advice on Workflows for using VTT files in NVivo
The article “Auto-Creating, Correcting and Coding Transcripts from Microsoft Teams or Zoom in CAQDAS Software (ATLAS.ti, NVivo or MAXQDA)” discusses the general process of creating your own transcripts, cleaning up VTT files, and then bringing in those files along with your audio/video files into NVivo to work with them there.
Other Resources on Captions/Transcripts
Also, visit our Getting Started page for more information, tutorials, and workshops on NVivo!