Speech to text
Speech to text is an AI-based automatic transcription service developed at NTNU. It may be used for data classified as public or internal. This page explains what you can use the service for and how to use it.
What is Speech to text?
Speech to text is an automatic transcription service based on Whisper from OpenAI. Whisper recognizes 98 different languages, including Norwegian and English. You can either transcribe a recording in its original language or have it translated into English.
Speech to text can be used to transcribe most types of audio and video files containing public (green) or internal (yellow) data, such as audio recordings and recordings from Zoom or Panopto. Files in .opus format unfortunately do not work. The service can also be used to streamline the subtitling of video. Read more about tools for captioning video.
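If you are preparing many files for upload, a small script can flag the ones that are known not to work. The following is a minimal sketch only: the function name `find_unsupported` is hypothetical, and it assumes that .opus is the only problematic format, since that is the only one named on this page.

```python
from pathlib import Path

# Assumption: .opus is the only format this page names as not working.
# The service's full list of accepted formats is not documented here.
UNSUPPORTED_SUFFIXES = {".opus"}


def find_unsupported(paths):
    """Return the paths whose file extension is known not to work."""
    return [p for p in paths if Path(p).suffix.lower() in UNSUPPORTED_SUFFIXES]


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    for name in find_unsupported(["lecture.mp4", "interview.opus"]):
        print(f"Convert before uploading: {name}")
```

Files that are flagged would need to be converted to another audio format before upload.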
The quality of the automatic transcription with Whisper will vary, and the text should be reviewed and corrected manually. Both the sound quality of the recording itself and the language or dialect to be transcribed will affect the accuracy of the transcript.
Get started
- Log in to Speech to text. You must be connected to NTNU's network, either directly on campus or via VPN. Log in with your NTNU user account (Feide login).
- Choose the language of the recording, or select automatic language recognition. If you want the audio translated into English, remember to check the box for this.
- Select the audio or video files you want to transcribe by dragging and dropping them or by pressing "browse files". Remember to read the terms of use and to check that you have classified your data. Transcription may take some time, depending on how long the queue is. You will receive an email when all transcriptions are complete.
- Download the files in the desired format (txt, vtt or srt). Choose srt format if you are going to subtitle the video.
Uploaded audio and video files are deleted as soon as the transcription is complete. Transcribed files are automatically deleted after 14 days, unless you delete them yourself before then.
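Because the transcript should be reviewed manually, it can help to load a downloaded srt file into a script and go through it cue by cue. The sketch below assumes the file follows the common SubRip layout (a cue number, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then the text, with blank lines between cues). The names `Cue`, `parse_srt` and `SAMPLE` are hypothetical and are not part of the service.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


# Timing line as used in SubRip files, e.g. "00:00:02,500 --> 00:00:05,000".
TIMING = re.compile(r"(\d{2,}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2},\d{3})")


def parse_srt(content):
    """Split SRT text into cues; blocks that do not look like cues are skipped."""
    cues = []
    # Drop a possible byte-order mark and normalise Windows line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        text = "\n".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


# Made-up sample content, for illustration only.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Welcome to the lecture.

2
00:00:02,500 --> 00:00:05,000
Today we look at
research data.
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(f"[{cue.start} -> {cue.end}] {cue.text}")
```

Keeping the timing strings untouched means corrected text can be written back in the same layout without shifting the subtitles.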
Guidelines for use
- The service is free to use for students and staff at NTNU.
- The service should only be used for public or internal data, not confidential (red) or highly confidential (black) data.
Classification of personal data
Personal data, including research data containing personal data, is usually classified as internal or confidential. Speech to text may NOT be used on material containing special categories of personal data, as this type of data is classified as confidential. Special categories include information about health, religion, race or political opinion. In addition, trade secrets and research subject to export controls will usually be classified as confidential.
Please note that all audio in uploaded files will be transcribed. Check whether confidential or sensitive topics have been recorded, even if they came up unexpectedly or only briefly. See NTNU's guidelines for information classification and data storage for more information.
Data management and privacy
Speech to text runs on the servers and infrastructure of NTNU's IT department. Your audio or video content does not leave NTNU and is not shared with others. The service is provided by the Research Data Project at NTNU (2021-2025), which has also conducted a risk assessment and a data protection impact assessment (DPIA).
See the privacy statement for Speech to text.
Contact us
Contact NTNU Hjelp if you have questions about using Speech to text.
If you need help or have questions about research data, visit Forskningsdatahjelpen (Research Data @NTNU) for more information.