
Speech to text

Speech to text is an NTNU-developed service for automatic transcription using artificial intelligence. On this page you will find information about what you can use the service for and how to use it.

Topic page on research data | Topic page on software

Norwegian version: Tale til tekst

Illustration with laptop, speech bubble and transcribed document

How does Speech to text work?

Speech to text is an automatic transcription service based on Whisper from OpenAI. Whisper recognizes 98 different languages, including Norwegian and English. You can either transcribe the recording in its original language or have it translated into English.

Speech to text can be used to transcribe most types of audio and video files containing public or internal data, such as audio recordings and recordings from Zoom or Panopto. Files in .opus format do not work. The service can also be used to streamline the subtitling of video. Read more about tools for captioning video.
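As a small illustration of the file-format note above, the following Python sketch sorts a list of files into those that can be uploaded and those that cannot, based on the file extension alone. The function name and the example file names are made up for this illustration. Only .opus is named on this page as a format that does not work, so the sketch rejects nothing else.

```python
from pathlib import Path

# Only .opus is named on this page as not working. Extend this set only
# if you have confirmed that another format fails.
UNSUPPORTED_SUFFIXES = {".opus"}


def split_uploadable(paths):
    """Return (ok, rejected) lists of Path objects, judged by extension only."""
    ok, rejected = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in UNSUPPORTED_SUFFIXES:
            rejected.append(path)
        else:
            ok.append(path)
    return ok, rejected


# Hypothetical file names, used only to show the behaviour.
ok, rejected = split_uploadable(["interview.mp3", "lecture.mp4", "voice_note.OPUS"])
print("can be uploaded:", [p.name for p in ok])
print("will not work:", [p.name for p in rejected])
```

The check is case-insensitive, so a file ending in .OPUS is also caught. It looks only at the file name and does not inspect the audio data inside the file.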

The quality of the automatic transcription with Whisper varies, depending both on the sound quality of the recording and on the language or dialect being transcribed. Manual review and correction is therefore always required.

Get started


  1. Log in to Speech to text with your NTNU user account (Feide login). You must be connected to NTNU's network, either directly on campus or via VPN.
  2. Choose the language of the recording you want to transcribe, or choose automatic language recognition. If you want the audio files translated into English, check the box for translation.
  3. Select the audio or video files you want to transcribe by dragging and dropping them or by pressing "browse files". Remember to read the terms of use and to check that you have classified your data. The transcription itself may take some time, depending on how long the queue is. You will receive an email when all transcriptions are complete.
  4. Download the files in the desired format (txt, vtt or srt). Choose srt format if you are going to subtitle video.
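If you want to review a downloaded srt file outside a video editor, the general SubRip (srt) layout is simple to read programmatically: each cue consists of a number, a timing line and one or more lines of text, followed by a blank line. The Python sketch below parses that layout. The sample text and the function name are invented for this illustration, and the exact output of Speech to text has not been checked against it.

```python
import re

# Invented sample in the general SubRip layout; not actual service output.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:03,500
Welcome to the lecture.

2
00:00:03,500 --> 00:00:07,250
Today we look at
research data.
"""

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text):
    """Parse srt text into a list of (start, end, caption) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Line 0 is the cue number, line 1 the timing, the rest is caption text.
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        caption = " ".join(line.strip() for line in lines[2:])
        cues.append((match.group(1), match.group(2), caption))
    return cues


cues = parse_srt(SAMPLE_SRT)
for start, end, caption in cues:
    print(f"[{start}] {caption}")
```

Blocks that do not match the expected layout are skipped rather than raising an error, which keeps the sketch tolerant of stray lines while you correct the transcript by hand.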

Uploaded audio and video files are deleted as soon as the transcription is complete. Transcribed files are deleted automatically after 14 days, unless you delete them yourself before then.

Guidelines for use

  • Free use for students and staff at NTNU.
  • The service should only be used on public or internal data, not confidential or highly confidential data.
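The classification rule above can be written as a simple check, for example in a script that prepares files for upload. The Python sketch below is only an illustration: the function name is hypothetical, and the four level names are taken from the wording on this page. It does not replace classifying your data according to NTNU's guidelines.

```python
# Level names as worded on this page.
LEVELS = ("public", "internal", "confidential", "highly confidential")

# Only these levels may be used with Speech to text.
ALLOWED_LEVELS = {"public", "internal"}


def may_use_speech_to_text(classification):
    """Return True if data with this classification may be uploaded."""
    level = classification.strip().lower()
    if level not in LEVELS:
        # Refuse to guess: unknown labels must be classified first.
        raise ValueError(f"Unknown classification: {classification!r}")
    return level in ALLOWED_LEVELS


for level in LEVELS:
    verdict = "allowed" if may_use_speech_to_text(level) else "not allowed"
    print(f"{level}: {verdict}")
```

An unknown label raises an error instead of returning False, so that unclassified material is not silently treated as a known case.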

Classification of personal data

Personal data, including research data that contains personal data, is usually classified as internal or confidential. This means, among other things, that you may NOT use Speech to text on material containing special categories of personal data. Special categories include information about health, religion, race or political opinion. In addition, trade secrets and research subject to export controls will usually be classified as confidential.

Please note that all audio in uploaded files will be transcribed. Check whether confidential or sensitive topics have been recorded, even if they came up unexpectedly or make up only a small part of the recording. See NTNU's guidelines for information classification and data storage for more information.

Data management and privacy

Speech to text runs on servers and infrastructure at NTNU IT. Your audio or video content does not leave NTNU and is not shared with others. The service is provided by the Research Data Project at NTNU (2021-25), which has also conducted a risk assessment and a privacy impact assessment (DPIA).

See the privacy statement for Speech to text.

Contact us

Contact NTNU Hjelp if you have questions about using Speech to text.

If you need help or have questions about research data, visit Forskningsdatahjelpen (Research Data @NTNU) for more information.