How to transcribe audio to text? Audio to text converter | Free | Python 2023

Hey, Let's Learn Something
27 Jan 2023 · 10:24

TLDR: This tutorial video demonstrates how to install 'whisper', a free Python audio-to-text transcriber by OpenAI, on a Windows computer. The process involves installing Anaconda, setting up a Python environment, and downloading the necessary application files. The video walks through activating the environment, installing the additional required packages, and using the medium model for transcription. It also shows how to navigate to the audio file location, execute the transcription command, and obtain the transcribed text. It concludes with instructions on how to run the transcription process again and points to the help command for further exploration.

Takeaways

  • 😀 Install Python using Anaconda for the audio to text transcribing application.
  • 🔍 Use the Anaconda prompt to create and activate a Python environment named 'text_test_speech' with Python 3.9.
  • 📁 Download the Python application files for the transcribing tool from the provided link and extract them to the desktop.
  • 📝 Navigate to the Python folder using the Anaconda prompt and install required files by pasting the provided code.
  • 🛠 Install additional packages such as ffmpeg (from the conda-forge channel) and setuptools-rust using the Anaconda prompt.
  • 🎧 Choose a model size (tiny, small, medium, or large) for the transcription; medium is recommended for most audio.
  • 🔍 The transcription process auto-detects the language if not specified; however, it can be set manually with the '--language' flag.
  • 📑 The transcription results are saved as text files and can also be found in the sample folder.
  • ⏱️ The first transcription may take longer due to downloading the model file, but subsequent transcriptions will be quicker.
  • 🔄 To run the transcription again, open the Anaconda prompt, activate the environment, navigate to the sample audio folder, and execute the command with the appropriate file name and settings.
  • 📚 For more details and options, refer to the help command provided in the script or visit the whisper documentation.

Q & A

  • What is the purpose of today's video?

    -The purpose of the video is to teach viewers how to install a free audio-to-text transcriber, specifically the Whisper application by OpenAI, on their Windows computer.

  • Which operating systems does the Whisper transcriber support?

    -According to the video, the setup works on both Windows 10 and Windows 11; the demonstration itself uses Windows 11.

  • What is the first step to install the Whisper transcriber?

    -The first step is to install Python, and the video recommends using Anaconda for this installation.

  • How can one download and install Anaconda?

    -To download and install Anaconda, one should Google 'Anaconda', click on the link, and then click on 'Get Additional Installers' to download the installer for Windows, Mac OS, or Linux.

  • What is the environment name used in the video for installing Python?

    -The environment name used in the video is 'text_test_speech' with Python version 3.9.

  • How do you activate the Python environment created for the transcriber?

    -To activate the environment, one should use the command 'conda activate text_test_speech' in the Anaconda prompt.

  • What is the process to navigate to the Python folder using the Anaconda prompt?

    -To navigate to the Python folder, one should open the folder, copy its address, and then in the Anaconda prompt type 'cd' followed by a space and paste the copied address.

  • What additional files are required to be installed for the Whisper application?

    -The additional files required for the Whisper application include 'ffmpeg' and 'setuptools-rust', which can be installed using specific commands in the Anaconda prompt.

  • What are the different model sizes available for the Whisper transcriber?

    -The different model sizes available for the Whisper transcriber are tiny, small, medium, and large, with each size offering varying levels of accuracy and speed.

  • How does one transcribe a sample audio file using the Whisper transcriber?

    -To transcribe a sample audio file, one should navigate to the folder containing the audio file, use the appropriate command in the Anaconda prompt with the desired model size and language specified, and execute the command.

  • What are the output files generated after transcribing an audio file?

    -After transcribing an audio file, the output files include a text file with the transcribed content, an SRT file with timestamps, and a VTT file.

  • How can one run the transcription process again after the initial setup?

    -To run the transcription process again, one should open the Anaconda prompt, activate the 'text_test_speech' environment, navigate to the folder with the sample audio, and execute the appropriate command with the necessary parameters.

Outlines

00:00

💻 Installing Python and Anaconda for Transcriber Setup

The video begins with an introduction to installing a free audio-to-text transcriber, Whisper by OpenAI, on a Windows computer. The process starts with downloading and installing Anaconda for Python, which is available for multiple operating systems including Windows, Mac OS, and Linux. The viewer is guided to create a Python environment using the Anaconda prompt with a specific command, and then to activate this environment. The video also includes a step to download the Python application files and navigate to the folder containing them. The installation of the application's required files is explained, followed by the installation of ffmpeg and setuptools-rust, which are necessary for the transcriber to function properly.

05:00

🎧 Transcribing Audio to Text Using OpenAI Whisper

This section demonstrates the process of transcribing audio to text using the Whisper model. It starts with a sample audio clip of a speech by a U.S. president, which the user intends to transcribe. The video then shows how to navigate to the folder containing the sample audio and execute a command in the Anaconda prompt to transcribe the audio using the 'medium' model. The command requires replacing placeholders with the actual audio file name and specifies the language as English. The video mentions that the first transcription might take longer because a model file has to be downloaded, but subsequent transcriptions will be quicker. The result is a transcribed text file, along with an SRT file that includes timestamps. The section closes with instructions on how to run the transcription process again, including activating the environment and navigating to the audio file location.

10:01

🔍 Detecting Language and Finalizing the Transcription Process

The final paragraph of the script confirms that the language has been detected during the transcription process. It reiterates that the transcription process will take some time, but once completed, all the necessary files will be available in the designated folder. The video ends with a note of thanks and an encouragement for viewers to explore and experiment with the transcription tool further. It also provides a help command for users who might need additional information or assistance with the tool.

Keywords

💡Audio to Text Transcriber

An audio-to-text transcriber is a software application that converts spoken language from audio files into written text. In the context of the video, the transcriber in question is a Python application called 'whisper' by OpenAI. It is used to transcribe audio files, making them searchable and accessible, which is particularly useful for tasks such as creating subtitles or closed captions for videos.

💡Anaconda

Anaconda is a popular distribution of the Python programming language for scientific computing that aims to simplify package management and deployment. In the video script, Anaconda is used to install Python and manage the required environment for running the 'whisper' transcriber. It includes the Anaconda Prompt, a command-line window that is used throughout the installation process.

💡Python Environment

A Python environment is a self-contained directory that includes a Python interpreter, libraries, and other necessary components. In the script, creating a Python environment named 'text_test_speech' with Python version 3.9 is one of the initial steps. This environment allows for the isolation of dependencies for different projects, ensuring that the 'whisper' application runs with the correct version of Python and its dependencies.
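As a rough illustration of what "activating the right environment" means in practice, the sketch below reports which conda environment and Python version a script is running under. The function name `describe_environment` is made up for this example; the only assumption about conda is that it sets the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def describe_environment(expected_env="text_test_speech", expected_version=(3, 9)):
    """Compare the running interpreter with the environment named in the video."""
    # conda exports CONDA_DEFAULT_ENV after `conda activate <name>`;
    # it is None when no conda environment is active.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    version = tuple(sys.version_info[:2])
    return {
        "active_env": active_env,
        "env_matches": active_env == expected_env,
        "python_version": version,
        "version_matches": version == tuple(expected_version),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside the Anaconda prompt before transcribing is one quick way to confirm that the intended environment, rather than the base installation, is active.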

💡Conda

Conda is a package manager and an environment management system that is used in the video to create and manage the Python environment. It is a command-line tool that allows users to install, update, and manage packages and environments. The script mentions using 'conda activate' to activate the created environment, which is necessary before running the 'whisper' transcriber.

💡FFmpeg

FFmpeg is a free and open-source software project consisting of a vast software suite of libraries and programs for handling video and audio. In the context of the video, FFmpeg is installed using the command 'conda install -c conda-forge ffmpeg-python' to handle multimedia files, which is essential for the audio to text transcription process.

💡Setup Tools

Setup Tools is a collection of enhancements to the Python 'distutils' that allow developers to more easily build and distribute Python software. In the video, 'setuptools-rust' is installed, which is a Setup Tools plugin for building Python extensions written in Rust; some of the packages that the 'whisper' transcriber depends on may need it during installation.

💡Model Sizes

The 'whisper' transcriber offers different model sizes, such as tiny, small, medium, and large. These sizes refer to the complexity and capacity of the machine learning model used for transcription. The video script mentions using the 'medium' model, which is a balance between accuracy and speed, suitable for most audio transcription tasks.
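The video works from the command line, but the same model choice can also be made from Python. The following is a minimal sketch, assuming the `whisper` package is installed in the active environment; `transcribe_file` and `MODEL_SIZES` are names invented here, and the size list simply mirrors the one given in this summary.

```python
# Model sizes as listed in this summary (the package may offer others).
MODEL_SIZES = ("tiny", "small", "medium", "large")


def transcribe_file(audio_path, model_size="medium", language=None):
    """Transcribe one audio file and return the text (sketch, not the video's method)."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {MODEL_SIZES}, got {model_size!r}")

    # Imported here so the size check above works even without whisper installed.
    import whisper

    # The first call for a given size downloads the model file, which is why
    # the first transcription takes longer than later ones.
    model = whisper.load_model(model_size)

    # language=None lets the model detect the language itself.
    result = model.transcribe(audio_path, language=language)
    return result["text"]
```

Smaller models load and run faster at the cost of accuracy, so trying 'tiny' first on a short clip is a cheap way to check that the installation works before switching to 'medium'.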

💡Language Detection

Language detection is the process of automatically identifying the language of a given text or speech input. The 'whisper' transcriber can auto-detect the language of the audio, but the script also provides an example of how to specify the language manually using the '--language English' option, which can be helpful if the automatic detection is not accurate.
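To make the optional nature of the language flag concrete, here is a small sketch that assembles the command-line arguments. `build_whisper_command` is a hypothetical helper, and `sample.mp3` is a placeholder file name; the `--model` and `--language` flags are the ones described above.

```python
import shlex


def build_whisper_command(audio_file, model="medium", language=None):
    """Return the whisper command as an argument list."""
    command = ["whisper", audio_file, "--model", model]
    # Omitting --language leaves detection to the tool.
    if language is not None:
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # Show the command with and without an explicit language.
    print(shlex.join(build_whisper_command("sample.mp3")))
    print(shlex.join(build_whisper_command("sample.mp3", language="English")))
```

The list form can be passed directly to `subprocess.run`, which avoids quoting problems with file names that contain spaces. Note that `shlex.join` uses POSIX-style quoting, so the printed string is for display rather than for pasting into a Windows prompt.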

💡Transcription

Transcription in the context of the video refers to the process of converting audio speech into written text. The video demonstrates how to use the 'whisper' transcriber to transcribe a sample audio file. The transcription result is then saved as a text file, which can be used for various purposes such as creating subtitles or analyzing speech content.

💡SRT File

An SRT file, or SubRip file, is a subtitle file format that includes timestamps to synchronize the text with the audio or video it is meant to accompany. In the script, it is mentioned that the 'whisper' transcriber can create an SRT file, which can be opened with Notepad to view the transcribed text with timestamps, making it useful for video subtitling.

💡VTT File

A VTT file, or WebVTT file, is a text track file format used on the web for subtitles, captions, and other types of timed metadata. The video script mentions that the 'whisper' transcriber can also create a VTT file, which is similar to an SRT file but is designed to work with HTML5 video elements, providing a standardized way to add captions to online video content.
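The two subtitle formats differ mainly in small details of the cue layout. The sketch below, with helper names invented for illustration, formats one cue each way: SRT numbers its cues and uses a comma before the milliseconds, while WebVTT uses a period (a real VTT file also starts with a `WEBVTT` header line, not shown here).

```python
def format_timestamp(seconds, decimal_marker):
    """Format seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def srt_cue(index, start, end, text):
    """One SRT cue: a counter line, a timing line, then the text."""
    timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
    return f"{index}\n{timing}\n{text}\n"


def vtt_cue(start, end, text):
    """One WebVTT cue: a timing line, then the text."""
    timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
    return f"{timing}\n{text}\n"


if __name__ == "__main__":
    print(srt_cue(1, 0.0, 2.5, "Hello"))
    print(vtt_cue(0.0, 2.5, "Hello"))
```

Because both files are plain text, either can be opened in Notepad to check the timing of individual lines.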

Highlights

Installing a free audio-to-text transcriber using the Python application 'whisper' by OpenAI.

The application is compatible with Windows 10 and 11, demonstrated on Windows 11.

First step is to install Python using Anaconda.

Anaconda can be downloaded for Windows, Mac OS, and Linux.

Using Anaconda prompt to create a Python environment.

Creating an environment named 'text_test_speech' with Python 3.9.

Activating the created Python environment.

Downloading the Python application files as a zip.

Extracting the downloaded files to the desktop.

Navigating to the Python folder using the Anaconda prompt.

Installing required files for the application.

Installing 'ffmpeg' using conda.

Installing 'setuptools-rust' for additional functionality.

Selecting the 'medium' model for transcription accuracy and speed.

Using a sample audio clip for demonstration.

Transcribing the sample audio to text using the 'medium' model.

The transcription process may download a model file if required.

First transcription may take longer due to model file size.

Viewing the transcribed text and SRT file.

Transcription also creates a VTT file.

Options for conversion include different model sizes and language specification.

Using the 'help' command for more information on usage.

Process of running the transcription after initial setup.

Auto-detection of language if not specified.

Final transcription results and file availability.