The BEST Text to Speech Software | This is WILD

Jenn Jager
5 Dec 2022 · 09:09

TLDR

In this video, the host explores Descript Overdub, a text-to-speech feature of Descript, an all-in-one podcasting and audio editing tool. The host is excited to test Overdub's ability to create a personalized AI voice, which could save time in the editing process. After recording a sample of their voice and submitting it for training, they receive an AI-generated voice that sounds remarkably like their own. The host then tests the voice by replacing a misspoken line in a YouTube video and finds the result natural and impressive, despite some minor inflection differences. They also criticize the Descript user interface as hard to navigate and in need of improvement. The video concludes with anticipation for a future video featuring a custom avatar paired with the AI voice.

Takeaways

  • Descript Overdub is a text-to-speech feature within Descript, an all-in-one podcasting and audio editing tool.
  • Overdub lets users create a personalized AI voice by recording their own voice and submitting the sample to Descript.
  • The reviewer is excited to try Overdub because it could save editing time by avoiding re-recorded voiceovers for YouTube videos.
  • There is a free trial with a thousand-word vocabulary limit, plus paid Creator and Pro tiers; the Pro tier offers unlimited vocabulary.
  • Users need to download the Descript app to use Overdub, where they can record their voice or upload an audio file to create their AI voice.
  • Training scripts are provided, but the reviewer used 11 minutes of their own voice from YouTube to train the AI.
  • Training the AI voice can take between 2 and 24 hours; the reviewer's voice took about 15 hours to be ready.
  • The generated voice was surprisingly similar to the reviewer's actual voice, with some minor differences in inflection.
  • The reviewer tested the AI voice by replacing a misspoken line in a YouTube video and found the result quite seamless.
  • The generated line was shorter than the original, which the reviewer saw as a positive because it tightened up the speech.
  • The reviewer criticized the Descript app's user interface as hard to navigate and lacking color cues to guide user actions.
  • Despite the UI issues, the reviewer is impressed with the technology and considers it a potential game-changer for their video editing workflow.

Q & A

  • What is the main feature of Descript Overdub discussed in the video?

    - The main feature discussed is the text-to-speech functionality, which lets users generate a voiceover using either their own recorded voice or stock voices provided by the platform.

  • What is the purpose of the Overdub feature for the video creator?

    - The creator wants to save time during editing by not having to re-record voiceovers to fix mistakes made in the initial recording.

  • How does Overdub create a custom voice?

    - Users can record their own voice directly in the app or upload an existing audio file. The app then uses AI to generate a text-to-speech voice based on the provided voice samples.

  • How long are the training scripts provided by Descript for training the AI voice?

    - The training scripts provided by Descript can run up to 30 minutes, and Descript recommends at least 10 minutes of reading for AI voice training.

  • What is the pricing structure for the Overdub feature?

    - Pricing includes a free trial with a thousand-word vocabulary, a Creator tier at $12 a month that is also limited to a thousand words, and a Pro tier at $24 a month with unlimited vocabulary.

  • How long does it take for Overdub to generate a custom voice after submitting training data?

    - After training data is submitted, it can take between 2 and 24 hours for Overdub to generate a custom voice.

  • What was the creator's experience with the Descript interface?

    - The creator found the interface hard to navigate and lacking in color contrast, which made it difficult to identify clickable areas.

  • How does the generated voice from Overdub compare to the original voice recording?

    - The generated voice was very similar to the original voice recording, although the creator noticed some differences in inflection.

  • What is the video creator's plan for using the Overdub voice with Synthesia's custom avatar?

    - The video creator plans to pair the voice generated by Overdub with a custom avatar created by Synthesia, which will be featured in an upcoming video.

  • How does the video creator intend to use the Overdub feature in their YouTube videos?

    - The video creator intends to use Overdub to replace misspoken lines in their YouTube videos without having to re-record the entire voiceover.

  • What are the potential benefits of using Overdub for video editing?

    - The potential benefits include saving time by avoiding re-recorded voiceovers and producing a more polished final product with fewer retakes.

Outlines

00:00

Descript Overdub: A Text-to-Speech Tool for Podcasters

The video introduces Descript Overdub, a feature within the Descript app that lets users create a text-to-speech voiceover using their own voice. The host is excited to explore this tool, which could save time during editing by eliminating the need to re-record voiceovers. Descript offers a free trial with a thousand-word vocabulary limit, along with paid Creator and Pro tiers. The host guides viewers through creating a new voice, submitting training data, and waiting for the AI to generate their custom voice. Once training completes, the host listens to the AI-generated voice and is impressed by its resemblance to their own, despite minor inflection differences.

05:05

Testing Descript Overdub's Integration with Real Video Footage

In this section, the host tests Overdub by integrating its generated voiceover into an existing video project. They compare the AI-generated voice to the original audio from a camera microphone to assess how well it matches in sound quality and naturalness. The host notes that the generated voiceover is shorter than the original, which could help tighten up the speech. After making minor adjustments to the script, the host imports the AI-generated voiceover into their video editing software, Final Cut Pro, and finds that, while they can detect slight differences in inflection, the overall quality and naturalness are impressive. The host concludes that Overdub could be a game-changing tool for their video editing process, despite some criticisms of the app's user interface and navigation.

Keywords

Text to Speech

Text to Speech (TTS) is a technology that converts written text into audible speech. In the context of the video, TTS is used to generate a voiceover for videos without the need for manual recording. Descript's Overdub feature lets users create a TTS version of their own voice, which can be used to replace misspoken words or to create voiceovers for videos efficiently.

Descript

In the video, Descript is an all-in-one podcasting and audio editing tool that also offers voice transcription and a TTS feature called Overdub. The tool is used to create custom voiceovers and is highlighted for its ability to save time during video editing by letting users generate voiceovers from text.

Overdub

Overdub is the feature within Descript that provides text-to-speech functionality. It is particularly exciting for the video creator because it offers a way to correct mistakes in voiceovers without re-recording. The feature uses AI to generate a voice that can be dropped into videos, aiming to match the original recording's natural flow and tone.

Custom Avatar

A custom avatar in the video refers to a digital representation of the creator that can be used in videos or other media. The creator is excited about getting a custom avatar made by Synthesia, which will be paired with the voice generated by Descript to create a deepfake video. This indicates a combination of visual and audio personalization technologies.

AI Voice

AI Voice denotes the artificial intelligence-generated voice that mimics the user's own voice. In the video, the creator tests Descript's AI voice feature to see how closely it can replicate their natural speaking voice. The AI voice is used to create a text-to-speech voiceover that can be integrated into the creator's YouTube videos.

Pro Plan

The Pro Plan mentioned in the video is a Descript subscription tier that offers additional features and capabilities compared to the free trial or the Creator tier. It includes an unlimited vocabulary for the Overdub feature, which benefits users who need to generate longer voiceovers.

Transcription

Transcription in the context of the video is the process of converting spoken language into written form. Descript provides voice transcription, which is essential for creating AI-generated voices. The creator uses transcription to train the AI to mimic their voice accurately.

Training Data

Training data is the audio or text data used to teach an AI model how to perform a specific task. In the video, the creator provides training data by recording their voice and submitting it to Descript. This data helps the AI learn the nuances of the creator's speech patterns to generate a more accurate AI voice.

User Interface (UI)

The user interface, or UI, is the part of a software program or application that allows users to interact with it. The creator of the video criticizes Descript's UI for being difficult to navigate and for lacking the color cues that would make the tool more user-friendly.

Synthesia

Synthesia is a company that specializes in creating custom avatars, which can be used for various purposes, including video production. In the video, the creator is getting a custom avatar made by Synthesia to be paired with the AI voice generated by Descript, indicating an integration of different technologies for a unique video production experience.

Voiceover

A voiceover is a production technique where a voice is recorded and added to a video, typically to narrate or provide additional information. In the video, the creator tests Descript's Overdub feature to generate a voiceover that can be seamlessly integrated into their YouTube videos, streamlining the editing process.

Highlights

Introduction to Descript Overdub, a text-to-speech feature within the Descript platform.

Descript is an all-in-one podcasting and audio editing tool with voice transcription capabilities.

Overdub allows users to create a text-to-speech version of their own voice using AI.

The reviewer is excited to try Overdub as part of a larger project involving a custom avatar.

Overdub could save time by eliminating the need to re-record voiceovers for YouTube videos.

The free trial of Descript's Pro Plan includes a thousand-word vocabulary for Overdub.

Pricing for Descript's Creator tier is $12 per month for a thousand words, with the Pro version at $24 per month for unlimited vocabulary.

Instructions on how to sign up for Descript and download the app are provided.

Users can record their own voice directly in the Descript app or upload an audio file.

Descript recommends at least 10 minutes of reading for training the AI voice.

The transcription process takes only about 30 seconds, and users can submit training data for their voice.

Training the AI voice can take between 2 and 24 hours.

The reviewer's AI voice was ready after approximately 15 hours.

The AI-generated voice is remarkably similar to the reviewer's real voice, with some differences in inflection.

A real-life test is conducted by replacing a misspoken line in a YouTube video with the AI voice.

The AI-generated line from Descript is slightly shorter than the original, which can be beneficial for editing.

The reviewer finds the AI voice to be uncannily similar and a potential game-changer for video editing.

Critique of Descript's user interface as being hard to navigate and in need of improvement.

The technology behind Overdub is praised despite the UI issues.

Upcoming video will feature the AI voice paired with a custom avatar from Synthesia.

A link to try Descript Overdub is provided in the description for interested viewers.