
JavaScript Text-to-Speech - The Easy Way

Learn how to build a simple JavaScript Text-to-Speech application using JavaScript's Web Speech API in this step-by-step beginner's guide.


When building an app, you may want to implement a Text-to-Speech feature for accessibility, convenience, or some other reason. In this tutorial, we will learn how to build a very simple JavaScript Text-to-Speech application using the Web Speech API, which is built into most modern browsers.

For your convenience, the code for this tutorial application is available to fork and play around with on Replit, or to clone from GitHub. A live version of the app is also available.

Step 1 - Setting Up The App

First, we set up a very basic application using a simple HTML file called index.html and a JavaScript file called script.js.

We'll also use a CSS file called style.css to add some margins and to center things, but it’s entirely up to you if you want to include this styling file.

The HTML file index.html defines our application's structure, which we will add functionality to with the JavaScript file. We add an <h1> element which acts as a title for the application, an <input> field in which we will enter the text we want spoken, and a <button> which we will use to submit this input text. Finally, we wrap all of these elements inside a <form>. Remember, the input and the button have no functionality yet; we'll add that in later using JavaScript.

Inside the <head> element, which contains metadata for our HTML file, we import style.css. This tells our application to style itself according to the contents of style.css. At the bottom of the <body> element, we import our script.js file. This tells our application the name of the JavaScript file that stores the functionality for the application.

Now that we have finished the index.html file, we can move on to creating the script.js JavaScript file.

Since we imported the script.js file to our index.html file above, we can test its functionality by simply sending an alert.

To add an alert to our code, we add a single line that calls alert("It works!") to our script.js file. Make sure to save the file and refresh your browser; you should now see a little window popping up with the text "It works!".
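
A minimal sketch of that alert follows. The console.log fallback is an addition (not part of the tutorial) so that the snippet also runs outside a browser.

```javascript
// script.js
const message = "It works!";

if (typeof alert === "function") {
  // In a browser, this opens the small pop-up window described above.
  alert(message);
} else {
  // Assumption: outside a browser (e.g. Node.js) there is no alert(), so log instead.
  console.log(message);
}
```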

If everything went OK, you should be left with something like this:

[Screenshot: JavaScript Text to Speech application]

Step 2 - Checking Browser Compatibility

To create our JavaScript Text-to-Speech application, we are going to utilize the browser's built-in Web Speech API. Since this API isn’t compatible with all browsers, we'll need to check for compatibility. We can perform this check in one of two ways.

The first way is by checking our browser and version on caniuse.com.

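The second way is by performing the check right inside of our code, with a simple conditional statement that tests whether 'speechSynthesis' exists in window and logs a message either way.

This check can be written as a shorthand if/else (a ternary expression) or as the equivalent full if/else statement.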

If you now run the app and check your browser console, you should see one of those messages. You can also choose to pass this information on to the user by rendering an HTML element.
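
The check described above can be sketched as follows. Both forms are shown; the function names and the exact message strings are illustrative assumptions, and the functions take a window-like object as a parameter so they can also be tried outside a browser.

```javascript
// Shorthand (ternary) form: returns a status message for any window-like object.
function checkSupportTernary(win) {
  return "speechSynthesis" in win
    ? "Web Speech API supported!"
    : "Web Speech API not supported :-(";
}

// Equivalent full if/else form.
function checkSupportIfElse(win) {
  if ("speechSynthesis" in win) {
    return "Web Speech API supported!";
  } else {
    return "Web Speech API not supported :-(";
  }
}

// In a browser, check the real window; elsewhere, fall back to an empty object.
const target = typeof window !== "undefined" ? window : {};
console.log(checkSupportTernary(target));
```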

Step 3 - Testing JavaScript Text-to-Speech

Next up, let’s write some static code to test if we can make the browser speak to us.

Add the following code to the script.js file.

Code Breakdown

Let’s look at a code breakdown to understand what's going on:

  • With const synth = window.speechSynthesis we declare the synth variable to be an instance of the SpeechSynthesis object, which is the entry point to using JavaScript's Web Speech API. The speak method of this object is what ultimately converts text into speech.
  • let ourText = "Hey there what's up!!!!" defines the ourText variable, which holds the string of text that we want to be uttered.
  • const utterThis = new SpeechSynthesisUtterance(ourText) defines the utterThis variable to be a SpeechSynthesisUtterance object, into which we pass ourText.
  • Putting it all together, we call synth.speak(utterThis), which utters the string inside ourText.

Save the code and refresh the browser window in which your app runs in order to hear a voice saying “Hey there what’s up!!!!”.
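
The four statements in the breakdown can be put together as in the sketch below. The speakText helper is a hypothetical wrapper, added only so the synthesis object and utterance constructor can be swapped out for testing; the tutorial itself describes the statements running directly in script.js. Some browsers may also require a user interaction (such as a click) before they will speak.

```javascript
// Speaks `text` using the given synthesis object and utterance constructor.
function speakText(text, synth, Utterance) {
  const utterThis = new Utterance(text); // wrap the text in an utterance
  synth.speak(utterThis);                // queue it for speaking
  return utterThis;
}

// Only touch the real API when it exists (i.e. in a supporting browser).
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const synth = window.speechSynthesis;
  let ourText = "Hey there what's up!!!!";
  speakText(ourText, synth, window.SpeechSynthesisUtterance);
}
```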

Step 4 - Making Our App Dynamic

Our code currently provides us with a good understanding of how the Text-to-Speech aspect of our application works under the hood, but the app at this point only converts the static text which we defined with ourText into speech. We want to be able to dynamically change what text is being converted to speech when using the application. Let’s do that now utilizing a <form>.

  • First, we add the const textInputField = document.querySelector("#text-input") variable, which allows us to access the value of the <input> tag that we have defined in the index.html file in our JavaScript code. We select the <input> field by its id: #text-input.
  • Secondly, we add the const form = document.querySelector("#form") variable, which selects our form by its id #form so we can later handle the submission of the <form> using an onsubmit handler.
  • We initialize ourText as an empty string instead of a static sentence.
  • We wrap our browser compatibility logic in a function called checkBrowserCompatibility and then immediately call this function.

Finally, we create an onsubmit handler that executes when we submit our form. This handler does several things:

  • event.preventDefault() prevents the browser from reloading after submitting the form.
  • ourText = textInputField.value sets our ourText string to whatever we enter in the <input> field of our application.
  • utterThis.text = ourText sets the text to be uttered to the value of ourText.
  • synth.speak(utterThis) utters our text string.
  • textInputField.value = "" resets the value of our input field to an empty string after submitting the form.
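
A sketch of how these pieces might fit together is shown below. The setUpForm helper is a hypothetical wrapper that receives the form, input, synthesis object, and utterance constructor as parameters, and checkBrowserCompatibility here returns a boolean rather than logging; both are assumptions made to keep the sketch testable, not necessarily how the tutorial's script.js is organized.

```javascript
// Wires a form so that submitting it speaks the text in the input field.
function setUpForm(form, textInputField, synth, Utterance) {
  let ourText = "";                  // starts empty instead of a static sentence
  const utterThis = new Utterance(); // the text is assigned on each submit

  form.onsubmit = (event) => {
    event.preventDefault();          // stop the browser from reloading the page
    ourText = textInputField.value;  // read what the user typed
    utterThis.text = ourText;        // hand the text to the utterance
    synth.speak(utterThis);          // speak it
    textInputField.value = "";       // clear the input field
  };
}

// Reports whether the Web Speech API's synthesis entry point is available.
function checkBrowserCompatibility() {
  return typeof window !== "undefined" && "speechSynthesis" in window;
}

if (checkBrowserCompatibility()) {
  setUpForm(
    document.querySelector("#form"),
    document.querySelector("#text-input"),
    window.speechSynthesis,
    window.SpeechSynthesisUtterance
  );
}
```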

Step 5 - Testing Our JavaScript Text-to-Speech App

To test our JavaScript Text-to-Speech application, simply enter some text in the input field and hit “Submit” in order to hear the text converted to speech.

Additional Features

There are a lot of properties that can be modified when working with the Web Speech API. For instance, the language, pitch, rate, and volume of an utterance can all be changed.

You can try playing around with these properties to tailor the application to your needs.
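
As a rough illustration, the sketch below sets a few SpeechSynthesisUtterance properties. The specific values and the configureUtterance helper name are arbitrary choices for demonstration.

```javascript
// Applies a handful of example settings to an utterance-like object.
function configureUtterance(utterance) {
  utterance.lang = "en-US"; // language tag of the speech
  utterance.pitch = 1.5;    // 0 to 2, default 1
  utterance.rate = 0.8;     // 0.1 to 10, default 1
  utterance.volume = 0.9;   // 0 to 1, default 1
  return utterance;
}

if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const utterThis = configureUtterance(new SpeechSynthesisUtterance("Hello!"));
  window.speechSynthesis.speak(utterThis);
}
```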

This simple example provides an outline of how to use the Web Speech API for JavaScript Text-to-Speech.

While Text-to-Speech is useful for accessibility, convenience, and other purposes, there are a lot of use-cases in which the opposite functionality, i.e. Speech-to-Text, is useful. If you want to learn more, we have built a couple of example projects using AssemblyAI’s Speech-to-Text API that you can check out.

Some of them are:

  • React Speech Recognition with React Hooks
  • How To Convert Voice To Text Using JavaScript


Code Boxx

Javascript Text To Speech (Simple Examples)


All right, let us now get into more examples of using text-to-speech in Javascript.

1) SIMPLE TEXT TO SPEECH

1A) THE HTML

Yes, that’s just a single button for this simple demo.

1B) THE JAVASCRIPT

This is the same as the introduction snippet, except that it does a feature check before enabling the test button: if ("speechSynthesis" in window). At the time of writing, speechSynthesis is not supported in every browser and operating system, so it’s good to add a few lines of code and do compatibility checks.
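
A minimal sketch of that pattern is below. The initSpeakButton helper, the "demo" id, and the spoken text are hypothetical; the sketch assumes the button starts out disabled in the HTML and is only enabled once the feature check passes.

```javascript
// Enables the button only when speech synthesis is available.
function initSpeakButton(win, button, text) {
  if (!("speechSynthesis" in win)) {
    return false; // unsupported: leave the button disabled
  }
  button.onclick = () => {
    win.speechSynthesis.speak(new win.SpeechSynthesisUtterance(text));
  };
  button.disabled = false;
  return true;
}

if (typeof window !== "undefined") {
  // Assumes markup such as <button id="demo" disabled>Speak</button> (hypothetical id).
  const demoButton = document.getElementById("demo");
  if (demoButton) initSpeakButton(window, demoButton, "Hello world!");
}
```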



Using the Web Speech API

Speech recognition

Speech recognition involves receiving speech through a device's microphone, which is then checked by a speech recognition service against a list of grammar (basically, the vocabulary you want to have recognized in a particular app.) When a word or phrase is successfully recognized, it is returned as a result (or list of results) as a text string, and further actions can be initiated as a result.

The Web Speech API has a main controller interface for this — SpeechRecognition — plus a number of closely-related interfaces for representing grammar, results, etc. Generally, the default speech recognition system available on the device will be used for the speech recognition — most modern OSes have a speech recognition system for issuing voice commands. Think about Dictation on macOS, Siri on iOS, Cortana on Windows 10, Android Speech, etc.

Note: On some browsers, such as Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

To show simple usage of Web speech recognition, we've written a demo called Speech color changer. When the screen is tapped/clicked, you can say an HTML color keyword, and the app's background color will change to that color.

The UI of an app titled Speech Color changer. It invites the user to tap the screen and say a color, and then it turns the background of the app that color. In this case it has turned the background red.

To run the demo, navigate to the live demo URL in a supporting mobile browser (such as Chrome).

HTML and CSS

The HTML and CSS for the app are really trivial. We have a title, instructions paragraph, and a div into which we output diagnostic messages.

The CSS provides a very simple responsive styling so that it looks OK across devices.

Let's look at the JavaScript in a bit more detail.

Prefixed properties

Some browsers, such as Chrome, currently expose speech recognition only through prefixed properties (for example, webkitSpeechRecognition). Therefore, at the start of our code we alias both the prefixed and unprefixed names, so that unprefixed versions are picked up wherever they are supported.
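
The aliasing might look like the sketch below. The resolveSpeechInterfaces helper is a hypothetical wrapper that takes a window-like object, so the same logic can be tried with a stand-in object outside a browser.

```javascript
// Prefers the unprefixed interface names and falls back to the webkit-prefixed ones.
function resolveSpeechInterfaces(win) {
  return {
    SpeechRecognition: win.SpeechRecognition || win.webkitSpeechRecognition,
    SpeechGrammarList: win.SpeechGrammarList || win.webkitSpeechGrammarList,
    SpeechRecognitionEvent:
      win.SpeechRecognitionEvent || win.webkitSpeechRecognitionEvent,
  };
}

const speechInterfaces = resolveSpeechInterfaces(
  typeof window !== "undefined" ? window : {}
);
```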

The grammar

The next part of our code defines the grammar we want our app to recognize. A single string variable is defined to hold our grammar.

The grammar format used is JSpeech Grammar Format (JSGF); you can find a lot more about it in its spec. However, for now let's just run through it quickly:

  • The lines are separated by semicolons, just like in JavaScript.
  • The first line — #JSGF V1.0; — states the format and version used. This always needs to be included first.
  • The second line indicates a type of term that we want to recognize. public declares that it is a public rule, the string in angle brackets defines the recognized name for this term (color), and the list of items that follow the equals sign are the alternative values that will be recognized and accepted as appropriate values for the term. Note how each is separated by a pipe character.
  • You can have as many terms defined as you want on separate lines following the above structure, and include fairly complex grammar definitions. For this basic demo, we are just keeping things simple.
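
A grammar string with the structure described in the list above could be built as in this sketch. The short list of colors is an arbitrary sample, and the grammar name declaration (grammar colors;) placed after the header is an assumption about how the full string is laid out.

```javascript
// A small sample of HTML color keywords to recognize.
const colors = ["aqua", "azure", "beige", "bisque", "black", "blue", "brown"];

// Header, grammar name, then one public rule whose alternatives are joined by pipes.
const grammar = `#JSGF V1.0; grammar colors; public <color> = ${colors.join(" | ")};`;

console.log(grammar);
```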

Plugging the grammar into our speech recognition

The next thing to do is define a speech recognition instance to control the recognition for our application. This is done using the SpeechRecognition() constructor. We also create a new speech grammar list to contain our grammar, using the SpeechGrammarList() constructor.

We add our grammar to the list using the SpeechGrammarList.addFromString() method. This accepts as parameters the string we want to add, plus optionally a weight value that specifies the importance of this grammar in relation to other grammars available in the list (can be from 0 to 1 inclusive). The added grammar is available in the list as a SpeechGrammar object instance.

We then add the SpeechGrammarList to the speech recognition instance by setting it to the value of the SpeechRecognition.grammars property. We also set a few other properties of the recognition instance before we move on:

  • SpeechRecognition.continuous: Controls whether continuous results are captured (true), or just a single result each time recognition is started (false).
  • SpeechRecognition.lang: Sets the language of the recognition. Setting this is good practice, and therefore recommended.
  • SpeechRecognition.interimResults: Defines whether the speech recognition system should return interim results, or just final results. Final results are good enough for this simple demo.
  • SpeechRecognition.maxAlternatives: Sets the number of alternative potential matches that should be returned per result. This can sometimes be useful, say if a result is not completely clear and you want to display a list of alternatives for the user to choose the correct one from. But it is not needed for this simple demo, so we are just specifying one (which is actually the default anyway).
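
The setup steps above can be sketched as follows. The createRecognizer helper is hypothetical and takes the constructors as parameters; it also skips the grammar list when no SpeechGrammarList constructor is available, which is an added precaution rather than something the demo is described as doing. The "en-US" language tag is an example value.

```javascript
// Builds a recognition instance configured for single, final results.
function createRecognizer({ SpeechRecognition, SpeechGrammarList }, grammar) {
  const recognition = new SpeechRecognition();

  if (SpeechGrammarList) {
    const speechRecognitionList = new SpeechGrammarList();
    speechRecognitionList.addFromString(grammar, 1); // weight 1 = most important
    recognition.grammars = speechRecognitionList;
  }

  recognition.continuous = false;     // one result per start()
  recognition.lang = "en-US";         // example language tag
  recognition.interimResults = false; // final results only
  recognition.maxAlternatives = 1;    // a single alternative per result
  return recognition;
}

if (typeof window !== "undefined") {
  const interfaces = {
    SpeechRecognition: window.SpeechRecognition || window.webkitSpeechRecognition,
    SpeechGrammarList: window.SpeechGrammarList || window.webkitSpeechGrammarList,
  };
  if (interfaces.SpeechRecognition) {
    createRecognizer(
      interfaces,
      "#JSGF V1.0; grammar colors; public <color> = red | green | blue;"
    );
  }
}
```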

Starting the speech recognition

After grabbing references to the output <div> and the HTML element (so we can output diagnostic messages and update the app background color later on), we implement an onclick handler so that when the screen is tapped/clicked, the speech recognition service will start. This is achieved by calling SpeechRecognition.start(). The forEach() method is used to output colored indicators showing what colors to try saying.
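
A sketch of both parts is shown below: building the colored indicators with forEach(), and starting recognition on click. The helper names, the markup of the indicators, and the log message are illustrative assumptions.

```javascript
// Builds one colored <span> per color keyword as a hint for the user.
function buildColorHints(colors) {
  let html = "";
  colors.forEach((color) => {
    html += `<span style="background-color:${color};"> ${color} </span>`;
  });
  return html;
}

// Starts the recognition service whenever the target is tapped/clicked.
function startOnClick(target, recognition, log = console.log) {
  target.onclick = () => {
    recognition.start();
    log("Ready to receive a color command.");
  };
}
```

In a browser, target would typically be document.body and recognition an instance created with the SpeechRecognition() constructor.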

Receiving and handling results

Once the speech recognition is started, there are many event handlers that can be used to retrieve results, and other pieces of surrounding information (see the SpeechRecognition events). The most common one you'll probably use is the result event, which is fired once a successful result is received.

The lookup event.results[0][0].transcript used in the result handler is a bit complex-looking, so let's explain it step by step. The SpeechRecognitionEvent.results property returns a SpeechRecognitionResultList object containing SpeechRecognitionResult objects. It has a getter so it can be accessed like an array, so the first [0] returns the SpeechRecognitionResult at position 0. Each SpeechRecognitionResult object contains SpeechRecognitionAlternative objects that contain individual recognized words. These also have getters so they can be accessed like arrays; the second [0] therefore returns the SpeechRecognitionAlternative at position 0. We then read its transcript property to get the recognized result as a string, set the background color to that color, and report the color recognized as a diagnostic message in the UI.
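
The lookup described above can be sketched as a small handler. The handleResult name and the wording of the diagnostic message are assumptions; root stands for the HTML element and diagnostic for the output <div>.

```javascript
// Reads the top alternative of the first result and applies it as a color.
function handleResult(event, root, diagnostic) {
  // results[0] is the first SpeechRecognitionResult,
  // results[0][0] is its first SpeechRecognitionAlternative.
  const color = event.results[0][0].transcript;
  diagnostic.textContent = `Result received: ${color}.`;
  root.style.backgroundColor = color;
  return color;
}

// In a browser this would be attached with something like:
// recognition.onresult = (event) => handleResult(event, document.documentElement, outputDiv);
```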

We also use the speechend event to stop the speech recognition service from running (using SpeechRecognition.stop()) once a single word has been recognized and it has finished being spoken.

Handling errors and unrecognized speech

The last two handlers are there to handle cases where speech was recognized that wasn't in the defined grammar, or an error occurred. The nomatch event is intended to handle the first case, although note that at the moment it doesn't seem to fire correctly; it just returns whatever was recognized anyway.

The error event handles cases where an actual error prevents recognition from completing successfully. The SpeechRecognitionErrorEvent.error property contains the actual error returned.
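
The speechend, nomatch, and error handlers might be attached as in this sketch. The helper name and the message texts are illustrative assumptions.

```javascript
// Attaches the stop, no-match, and error handlers to a recognition instance.
function attachLifecycleHandlers(recognition, diagnostic) {
  recognition.onspeechend = () => {
    recognition.stop(); // stop listening once the speaker has finished
  };

  recognition.onnomatch = () => {
    diagnostic.textContent = "I didn't recognize that color.";
  };

  recognition.onerror = (event) => {
    diagnostic.textContent = `Error occurred in recognition: ${event.error}`;
  };
}
```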

Speech synthesis

Speech synthesis (aka text-to-speech, or TTS) involves synthesizing text contained within an app into speech, and playing it out of a device's speaker or audio output connection.

The Web Speech API has a main controller interface for this — SpeechSynthesis — plus a number of closely-related interfaces for representing text to be synthesized (known as utterances), voices to be used for the utterance, etc. Again, most OSes have some kind of speech synthesis system, which will be used by the API for this task as available.

To show simple usage of Web speech synthesis, we've provided a demo called Speak easy synthesis. This includes a set of form controls for entering text to be synthesized, and setting the pitch, rate, and voice to use when the text is uttered. After you have entered your text, you can press Enter / Return to hear it spoken.

UI of an app called speak easy synthesis. It has an input field in which to input text to be synthesized, slider controls to change the rate and pitch of the speech, and a drop down menu to choose between different voices.

To run the demo, navigate to the live demo URL in a supporting mobile browser.

The HTML and CSS are again pretty trivial, containing a title, some instructions for use, and a form with some simple controls. The <select> element is initially empty, but is populated with <option>s via JavaScript (see later on).

Let's investigate the JavaScript that powers this app.

Setting variables

First of all, we capture references to all the DOM elements involved in the UI, but more interestingly, we capture a reference to Window.speechSynthesis. This is the API's entry point; it returns an instance of SpeechSynthesis, the controller interface for web speech synthesis.

Populating the select element

To populate the <select> element with the different voice options the device has available, we've written a populateVoiceList() function. We first invoke SpeechSynthesis.getVoices(), which returns a list of all the available voices, represented by SpeechSynthesisVoice objects. We then loop through this list: for each voice we create an <option> element, set its text content to display the name of the voice (grabbed from SpeechSynthesisVoice.name), the language of the voice (grabbed from SpeechSynthesisVoice.lang), and -- DEFAULT if the voice is the default voice for the synthesis engine (checked by seeing if SpeechSynthesisVoice.default returns true).

We also create data- attributes for each option, containing the name and language of the associated voice, so we can grab them easily later on, and then append the options as children of the select.

Older browsers don't support the voiceschanged event, and just return a list of voices when SpeechSynthesis.getVoices() is called. On others, such as Chrome, you have to wait for the event to fire before populating the list. To allow for both cases, we call the function once directly and also register it as the voiceschanged handler when that event is supported.
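
A sketch of such a function is below. It takes the synthesis object, the <select> element, and the document as parameters (an assumption made for testability), and it clears existing options first so that running it twice does not duplicate entries, which is an added precaution.

```javascript
// Fills the <select> with one <option> per available voice.
function populateVoiceList(synth, voiceSelect, doc) {
  const voices = synth.getVoices();
  voiceSelect.replaceChildren(); // remove options from any earlier run

  for (const voice of voices) {
    const option = doc.createElement("option");
    option.textContent = `${voice.name} (${voice.lang})`;
    if (voice.default) {
      option.textContent += " -- DEFAULT";
    }
    option.setAttribute("data-lang", voice.lang);
    option.setAttribute("data-name", voice.name);
    voiceSelect.appendChild(option);
  }
  return voices;
}

if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const synth = window.speechSynthesis;
  const voiceSelect = document.querySelector("select");
  if (voiceSelect) {
    populateVoiceList(synth, voiceSelect, document); // works where voices load synchronously
    if (synth.onvoiceschanged !== undefined) {
      // Works where voices arrive later (e.g. Chrome).
      synth.onvoiceschanged = () => populateVoiceList(synth, voiceSelect, document);
    }
  }
}
```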

Speaking the entered text

Next, we create an event handler to start speaking the text entered into the text field. We are using an onsubmit handler on the form so that the action happens when Enter / Return is pressed. We first create a new SpeechSynthesisUtterance() instance using its constructor — this is passed the text input's value as a parameter.

Next, we need to figure out which voice to use. We use the HTMLSelectElement selectedOptions property to return the currently selected <option> element. We then use this element's data-name attribute, finding the SpeechSynthesisVoice object whose name matches this attribute's value. We set the matching voice object to be the value of the SpeechSynthesisUtterance.voice property.

Finally, we set the SpeechSynthesisUtterance.pitch and SpeechSynthesisUtterance.rate to the values of the relevant range form elements. Then, with all necessary preparations made, we start the utterance being spoken by invoking SpeechSynthesis.speak(), passing it the SpeechSynthesisUtterance instance as a parameter.

In the final part of the handler, we include a pause event to demonstrate how SpeechSynthesisEvent can be put to good use. When SpeechSynthesis.pause() is invoked, this returns a message reporting the character number and name that the speech was paused at.

Finally, we call blur() on the text input. This is mainly to hide the keyboard on Firefox OS.
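
The handler's core steps can be sketched as below. The speakSelection helper and its parameter shape are hypothetical; in the demo, the values would come from the text input, the selected option's data-name attribute, and the two range inputs inside the form's onsubmit handler, followed by a call to blur() on the text input.

```javascript
// Creates an utterance from form values, picks the matching voice, and speaks it.
function speakSelection({ text, voiceName, pitch, rate }, voices, synth, Utterance) {
  const utterThis = new Utterance(text);

  // Find the voice whose name matches the selected option's data-name.
  const voice = voices.find((v) => v.name === voiceName);
  if (voice) {
    utterThis.voice = voice;
  }

  utterThis.pitch = Number(pitch); // range inputs report strings
  utterThis.rate = Number(rate);

  // Report where the speech was paused when SpeechSynthesis.pause() is invoked.
  utterThis.onpause = (event) => {
    const char = event.utterance.text.charAt(event.charIndex);
    console.log(
      `Speech paused at character ${event.charIndex} of "${event.utterance.text}", which is "${char}".`
    );
  };

  synth.speak(utterThis);
  return utterThis;
}
```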

Updating the displayed pitch and rate values

The last part of the code updates the pitch / rate values displayed in the UI, each time the slider positions are moved.
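
A small sketch of that update logic follows. The bindValueLabel helper is hypothetical; slider stands for one of the range inputs and label for the element that displays its current value.

```javascript
// Keeps a label in sync with a range input's current value.
function bindValueLabel(slider, label) {
  label.textContent = slider.value; // show the initial value
  slider.onchange = () => {
    label.textContent = slider.value; // update whenever the slider is moved
  };
}
```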

Javascript Text To Speech: A Guide To Converting TTS With Javascript

Want to add a unique feature to your website? Learn how to integrate Javascript text-to-speech capabilities with this helpful guide.

Unreal Speech


JavaScript text to speech converts written content into spoken words in the browser. With it, developers can add accessibility features such as reading page content aloud, give users audible feedback, and make applications more interactive. This guide covers how to integrate text-to-speech capabilities into your projects.

Table of Contents

  • Introduction to JavaScript Text to Speech (TTS)
  • Understanding JavaScript TTS APIs
  • Getting Started with JavaScript Text to Speech
  • Customizing the Speech to Your Preferences
  • Best Practices for JavaScript TTS Development
  • Try Unreal Speech for Free Today: Affordably and Scalably Convert Text into Natural-Sounding Speech with Our Text-to-Speech API

Introduction to JavaScript Text to Speech (TTS)

Text-to-Speech (TTS) technology enables computers to convert written text into spoken words. In the context of JavaScript, TTS allows developers to integrate speech synthesis capabilities directly into web applications. With TTS, websites and applications can give users audible feedback; voice commands are handled by the separate speech recognition part of the Web Speech API.

Importance of TTS in Web Development

TTS plays a crucial role in enhancing the accessibility and usability of web content. It enables users with visual impairments or reading difficulties to access online information more effectively by listening to the content instead of reading it. TTS can provide a more engaging and interactive user experience, especially in applications that require hands-free operation or communication with users in diverse environments. Integrating TTS into web development projects can improve inclusivity, usability, and overall user satisfaction.

Cutting-Edge and Cost-Effective Text-to-Speech Solution

Unreal Speech offers a low-cost, highly scalable text-to-speech API with natural-sounding AI voices, which we believe is among the cheapest and highest-quality solutions on the market. Unreal Speech can cut your text-to-speech costs by up to 90%. Get human-like AI voices with their super-fast, low-latency API, with the option for per-word timestamps. The simple, easy-to-use API allows you to give your LLM a voice with ease and offer this functionality at scale. If you are looking for cheap, scalable, realistic TTS to incorporate into your products, try our text-to-speech API for free today. Convert text into natural-sounding speech at an affordable and scalable price.

Understanding JavaScript TTS APIs

Web Speech API Overview

The Web Speech API provides a straightforward way to incorporate text-to-speech functionality into web applications using JavaScript. With just a few lines of code, developers can enable their browsers to convert text into speech. This API opens up a plethora of opportunities for enhancing user experiences and accessibility on the web.

TTS Functionality with Web Speech API

The Web Speech API offers developers various methods and functions to synthesize speech from plain text. Developers can further control speech synthesis parameters like voice selection and rate and handle related events effectively. While the Web Speech API's functionality largely depends on the voices available in the browser or the user's operating system, it provides a quick and cost-effective solution for integrating TTS capabilities into web applications.

Browser Compatibility and Support

Although the Web Speech API is supported by many modern web browsers such as Chrome, Firefox, and Safari, developers should be mindful of discrepancies in implementation across different platforms. Ensuring consistent behavior and fallback options is crucial for a seamless user experience. Testing browser compatibility and having fallback options ready can help maintain accessibility and functionality across various devices and platforms.


Getting Started with JavaScript Text to Speech

To utilize the Web Speech API for text-to-speech functionality in your JavaScript project, you do not need to include any external library: the API is built into supporting browsers. Your own script reaches it through the SpeechSynthesis interface, exposed as window.speechSynthesis.

The speechSynthesis Interface

Begin by creating an HTML file and linking your JavaScript file with a <script src> tag. In your JavaScript file, get the speech synthesis object and set up an event listener for when the voices are ready.

```javascript
const synth = window.speechSynthesis;
let voices = []; // shared with the snippets below

// Wait for voices to be loaded
synth.onvoiceschanged = () => {
  voices = synth.getVoices();
  // Do something with the available voices
};
```

Once the voices are loaded, you can access them using the synth.getVoices() method. This will return a list of available voices that you can use for speech synthesis. You can loop through the voices using forEach and display them in your HTML.

```javascript
const voiceSelect = document.getElementById('voice-select');

// Call this once the voices have loaded, e.g. from the onvoiceschanged handler.
const renderVoiceOptions = () => {
  voices.forEach((voice) => {
    const option = document.createElement('option');
    option.textContent = `${voice.name} (${voice.lang})`;
    option.setAttribute('value', voice.lang);
    voiceSelect.appendChild(option);
  });
};
```

Next, you can create a function to synthesize speech from the selected voice. This function takes the text input from a textarea element and uses the selected voice to generate speech.

```javascript
const speak = () => {
  const text = document.getElementById('text-input').value;
  const voice = voices[voiceSelect.selectedIndex];
  const utterance = new SpeechSynthesisUtterance(text);
  if (voice) utterance.voice = voice; // fall back to the default voice if none is selected
  synth.speak(utterance);
};
```

Add an event listener to the button or form submit to trigger the speak function.

```javascript
const button = document.getElementById('speak-button');
button.addEventListener('click', speak);
```

With these few lines of code, you can convert text to speech in real time.

Customizing the Speech to Your Preferences

To customize the speech rate, pitch, and volume, you can set properties on the SpeechSynthesisUtterance object. The rate is a decimal value between 0.1 (slowest) and 10 (fastest), with 1 being the default. The pitch is a decimal value between 0 (lowest) and 2 (highest; almost audible only for felines), with 1 being the default. The volume is a decimal value between 0 (silent) and 1 (loudest), with 1 being the default. For example:

  • utterance.rate = 0.8;
  • utterance.pitch = 1;
  • utterance.volume = 1;
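
When these values come from user input, it can help to keep them inside the valid ranges. The sketch below shows one way to do that; the clamp and applySpeechSettings helpers are hypothetical, not part of the Web Speech API.

```javascript
// Restricts a number to the inclusive range [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Applies rate, pitch, and volume to an utterance, clamped to their valid ranges.
function applySpeechSettings(utterance, { rate = 1, pitch = 1, volume = 1 } = {}) {
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  return utterance;
}
```

For example, applySpeechSettings(utterance, { rate: 0.8 }) slows the speech slightly and leaves pitch and volume at their defaults.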

Best Practices for JavaScript TTS Development

Efficient Algorithms for Text-to-Speech Processing

When developing a JavaScript TTS system, I always make sure to implement efficient algorithms that can handle large volumes of text quickly and accurately. By using efficient algorithms, I can drastically reduce latency and resource consumption, resulting in a smoother and more responsive user experience.

Caching Frequently Used Data for Improved Performance

Caching frequently accessed or processed text data is vital for optimizing the performance of my JavaScript TTS applications. By storing commonly used text data in a cache, I can avoid repetitive computations during speech synthesis, thereby reducing the overhead and enhancing the overall efficiency of the system.

Testing Across Multiple Browsers for Compatibility

One of the key best practices I follow for JavaScript TTS development is to ensure that my implementation works seamlessly across various web browsers and versions. By validating my JavaScript TTS code across multiple browsers, I can identify and address any compatibility issues, guaranteeing a consistent user experience across different platforms.

Handling Browser-Specific Behavior with Fallbacks

To account for differences in browser behavior and feature support, I always incorporate fallback mechanisms or polyfills in my JavaScript TTS applications. By doing so, I can ensure that my TTS system functions correctly even in browsers that lack full support for the Web Speech API, thereby enhancing the overall reliability and robustness of the application.
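
One simple fallback shape is sketched below: speak when the API is present, otherwise hand the text to a caller-supplied alternative (for example, showing it on screen). The speakWithFallback helper and its return values are hypothetical.

```javascript
// Speaks the text if the environment supports it; otherwise runs the fallback.
function speakWithFallback(text, env, fallback) {
  const supported =
    env &&
    "speechSynthesis" in env &&
    typeof env.SpeechSynthesisUtterance === "function";

  if (supported) {
    env.speechSynthesis.speak(new env.SpeechSynthesisUtterance(text));
    return "spoken";
  }
  fallback(text);
  return "fallback";
}

// In a browser, pass window; elsewhere, an empty object triggers the fallback.
speakWithFallback(
  "Hello",
  typeof window !== "undefined" ? window : {},
  (t) => console.log(`[speech unavailable] ${t}`)
);
```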

Comprehensive Testing for Enhanced TTS Functionality

Thorough testing is a crucial aspect of developing a successful JavaScript TTS system. I test my TTS functionality across a wide range of scenarios, including varying text inputs, speech settings, and user interactions. Through comprehensive testing, I can identify and address any potential issues or limitations, ensuring that my JavaScript TTS implementation delivers a seamless and flawless user experience.

Try Unreal Speech for Free Today: Affordably and Scalably Convert Text into Natural-Sounding Speech with Our Text-to-Speech API

Unreal Speech is a game-changer in the world of text-to-speech API solutions. By using Unreal Speech, you can experience a drastic reduction in the costs associated with text-to-speech, potentially up to a whopping 90%. A unique aspect of Unreal Speech is its capability to provide you with a highly scalable text-to-speech API. Thus, whether your project is small or large-scale, Unreal Speech has your back. And the cherry on top? The quality of the speech generated by this tool is top-notch, with AI voices that sound very natural. These voices are so close to human-like that you might have to remind yourself that it's actually a machine speaking!

Precision and Ease of Use with Per-Word Timestamps

Unreal Speech also boasts a feature that is incredibly beneficial for many projects: per-word timestamps. Developers like me often need such precision when it comes to generating speech from text. Thus, having per-word timestamps as a feature is not only convenient but also a necessity. Speaking of convenience, Unreal Speech is designed with a simple API that is easy to use. When working with JavaScript text-to-speech, I can vouch for the fact that a simple interface is a blessing on projects that involve generating speech.

Unreal Speech's Fast Operation

Another amazing feature of Unreal Speech is the speed at which it operates. With super low latency, you can expect the tool to work fast, getting you your results in no time. This is a huge plus point, particularly when working on time-sensitive projects. Therefore, if you’re looking for an affordable and scalable text-to-speech solution that offers high-quality output and is easy to integrate into your products, Unreal Speech is the way to go. Give it a try today!


Using Google Text-To-Speech in Javascript

I need to play Google text-to-speech in JavaScript. The idea is to use the web service:

http://translate.google.com/translate_tts?tl=en&q=This%20is%20just%20a%20test

And play it on a certain action, e.g. a button click.

But it seems that it is not like loading a normal wav/mp3 file:

How can I do this?

  • google-text-to-speech

Betamoo's user avatar

7 Answers

Another option now may be HTML5 text to speech, which is in Chrome 33+ and many others.

Here is a sample:

With this, perhaps you do not need to use a web service at all.
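The original sample did not survive on this page. A minimal sketch of what such a sample might look like follows; it is not the answer's original code, and the `env` parameter is an assumption added so the function can be tried outside a browser.

```javascript
// Speak `text` with the browser's built-in synthesizer.
// Returns false when the Web Speech API is unavailable (e.g. in Node.js).
function speakText(text, env = globalThis) {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new env.SpeechSynthesisUtterance(text);
  env.speechSynthesis.speak(utterance);
  return true;
}

const spoke = speakText('This is just a test');
console.log(spoke ? 'Speaking...' : 'Speech synthesis is not available here.');
```

Some browsers only allow speech after a user gesture, so calling `speakText` from a click handler is the safer choice.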

Brian M. Hunt's user avatar

  • 1 The great thing about the HTML5 Speech Synthesis is that you can adjust the voice pitch, fluctuation, etc –  Jake Wilson Commented Nov 15, 2014 at 3:01
  • 1 Supported by Safari, IE? –  Jasper Commented May 5, 2016 at 13:34
  • You should add properties such as pitch, rate, speed, and etc. –  user8903269 Commented Mar 15, 2018 at 0:17
  • @Rejaul Works in Chrome 76 –  Brian M. Hunt Commented Jul 29, 2019 at 18:43
  • This works on Chrome, Firefox, Edge, Opera and not on IE –  Nanju Commented Mar 18, 2020 at 5:44

Here is the code snippet I found:

  • 3 It seems that if you remove the 'dot' at the end it works fine, otherwise it's not playing the sound. –  Diego Commented Sep 17, 2013 at 15:53
  • 5 Keep in mind that Google Translate has a limit of ~100 letters. –  niutech Commented Jan 8, 2014 at 1:25
  • 8 It seems that Google will ban requests with a Referrer in the HTTP header. Is there a way to bypass that issue? –  jichi Commented Jan 31, 2014 at 20:03
  • @jichi Take a look on this answer and comments to this answer . –  Piotr Sobczyk Commented Aug 3, 2014 at 7:13
  • 22 As of today this doesn't seem to be working any more; I get 302 and then 403. –  icedwater Commented Jan 28, 2016 at 13:59
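The comments above mention a limit of roughly 100 characters per request. For any service with such a limit, long text has to be split on word boundaries before it is sent. The sketch below shows one way to do that; `splitIntoChunks` is a hypothetical helper, and the 100-character default is taken from the comment rather than from any official documentation. Note that, per the last comment, the Google Translate endpoint itself reportedly stopped accepting these requests.

```javascript
// Split text into chunks of at most `maxLength` characters,
// breaking on whitespace where possible.
function splitIntoChunks(text, maxLength = 100) {
  const chunks = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    let rest = word;
    // A single word longer than the limit is cut into pieces.
    while (rest.length > maxLength) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(rest.slice(0, maxLength));
      rest = rest.slice(maxLength);
    }
    if (!rest) continue;
    const candidate = current ? current + ' ' + rest : rest;
    if (candidate.length <= maxLength) {
      current = candidate;
    } else {
      chunks.push(current);
      current = rest;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(splitIntoChunks('aaa bbb ccc', 7)); // [ 'aaa bbb', 'ccc' ]
```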

You can use the SpeechSynthesisUtterance with a function like say :

Then you only need to call say(msg) when using it.
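The body of `say` is missing from this page. The following is a sketch of how such a function could be written, not the answer's original code; the `options` and `env` parameters and the default values are assumptions.

```javascript
// Speak `message` and return the utterance, or null if the API is missing.
// `options` may override lang, pitch, rate and volume.
function say(message, options = {}, env = globalThis) {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) return null;
  const utterance = new env.SpeechSynthesisUtterance(message);
  utterance.lang = options.lang || 'en-US'; // assumed default language
  utterance.pitch = options.pitch ?? 1;
  utterance.rate = options.rate ?? 1;
  utterance.volume = options.volume ?? 1;
  env.speechSynthesis.speak(utterance);
  return utterance;
}

say('Hello world'); // speaks in a browser, returns null elsewhere
```

Setting `lang` (for example `say('Hej', { lang: 'da-DK' })`) only changes the spoken language if the device actually has a voice installed for it.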

Update: See the Google Developers blog post "Voice Driven Web Apps: Introduction to the Web Speech API".

KyleMit's user avatar

  • 1 are you the same user as the JudahRR who wrote say.js? If so, you should disclose that affiliation. Also, probably no need to consume an external library for a 10 line function that just consumes the browser's native API anyway and doesn't need all of the default settings. –  KyleMit ♦ Commented Dec 23, 2018 at 1:01
  • 1 As this uses the google voices, is there any way to load other languages, like Danish? I tried 'da' and 'da-DK' but she still spoke English. I could use the other solutions here, but this options seem like the easiest one. –  Leif Neland Commented Apr 7, 2021 at 0:51

Very easy with responsive voice. Just include the js and voila!

ManuelC's user avatar

  • 4 Important note here: Although this has interesting immediate results, it is using their API via https://code.responsivevoice.org/getvoice.php to generate voices and send them back as rendered audio. Their service is however not free for non commercial projects. –  Jankapunkt Commented Sep 5, 2019 at 10:50

The below JavaScript code sends "text" to be spoken/converted to mp3 audio to google cloud text-to-speech API and gets mp3 audio content as response back.
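The answer's code is not preserved here. As a rough sketch, a request to the Cloud Text-to-Speech REST endpoint could be built like this. The function names are hypothetical, the API key is a placeholder you must supply, and the voice settings are assumptions chosen for illustration.

```javascript
// Build the URL and fetch options for a text:synthesize call.
function buildSynthesizeRequest(text, apiKey, languageCode = 'en-US') {
  return {
    url: 'https://texttospeech.googleapis.com/v1/text:synthesize?key=' +
      encodeURIComponent(apiKey),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode },
        audioConfig: { audioEncoding: 'MP3' },
      }),
    },
  };
}

// Send the request and return a data URI that an <audio> element can play.
// `fetchImpl` is injectable so the function can be tested without a network.
async function synthesizeToDataUri(text, apiKey, fetchImpl = fetch) {
  const { url, options } = buildSynthesizeRequest(text, apiKey);
  const response = await fetchImpl(url, options);
  if (!response.ok) throw new Error('TTS request failed: ' + response.status);
  const { audioContent } = await response.json(); // base64-encoded MP3
  return 'data:audio/mpeg;base64,' + audioContent;
}
```

In a browser, the returned data URI can be played with `new Audio(dataUri).play()`. Embedding an API key in client-side code exposes it to every visitor, so a server-side proxy is usually the safer design.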

jkr's user avatar

  • 2 this is the only answer that answers the question exactly –  Sam Commented Feb 11, 2022 at 19:38
  • This answer is great, but how should I decode the given data, and how to use it –  Богуслав Павлишинець Commented Jun 12, 2022 at 12:23
  • You can play the result for instance like this: const base64MP3 = res.audioContent; const audioSrc = `data:audio/mpeg;base64,${base64MP3}`; const audioElement = new Audio(audioSrc); audioElement.play(); –  Petr Commented Mar 12 at 14:32
  • Does anyone happen to know what billing this answer falls under? I'm confused about all of the options when I go to the TTS api page. –  Patrick Commented Jul 3 at 18:45

I don't know about Google's voice service, but with JavaScript's SpeechSynthesisUtterance you can add a click event listener to the element you are referring to, e.g.:

const listenBtn = document.getElementById('myvoice');
listenBtn.addEventListener('click', (e) => {
  e.preventDefault();
  const msg = new SpeechSynthesisUtterance("Hello, hope my code is helpful");
  window.speechSynthesis.speak(msg);
});

<button type="button" id='myvoice'>Listen to me</button>

h3t1's user avatar

  • as of 2023 it doesnt work on chrome mobile. on firefox mobile it works. –  Michal - wereda-net Commented Dec 13, 2023 at 12:30

Run this code: it takes audio input from the microphone, converts it to text, and then plays that text back as audio.

Speech to text converter in JS var r = document.getElementById('result');

Varun Rajkumar's user avatar


How to Convert Text to Speech using HTML, CSS and JavaScript


By Faraz - Last Updated: August 09, 2024

Learn how to implement text to speech in JavaScript using Speech Synthesis API. Follow our step-by-step guide and add this exciting feature to your website!


Table of Contents

  • Project Introduction
  • JavaScript Code

Welcome to our guide on how to convert text to speech in JavaScript using the Speech Synthesis API. Text-to-speech technology has become increasingly popular, and implementing it on your website can improve the user experience and accessibility for users with visual impairments or learning disabilities. In this guide, we will cover everything you need to know to add this feature to your website. We'll start with an introduction to text-to-speech and its importance, followed by an overview of the Speech Synthesis API and its features and limitations.

The Speech Synthesis API is a web browser API that enables developers to generate speech from the text on a webpage. The API provides a range of settings that can be used to customize the speech output, such as voice, pitch, rate, and volume.

One of the key features of the Speech Synthesis API is that it is easy to use and requires only a few lines of code to get started. It is also supported by most modern web browsers, making it a reliable choice for adding text to speech functionality to a website.

However, there are some limitations to the Speech Synthesis API. For example, the available voices and languages depend on the user's operating system and browser. Additionally, the quality of the speech output can vary depending on the voice used, and some voices may not sound natural or be easy to understand.
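Because the available voices differ from device to device, code that asks for a particular language should fall back gracefully when no exact match exists. The sketch below shows one possible fallback order; `pickVoice` is a hypothetical helper, and it only relies on the `lang` and `default` properties that voice objects expose.

```javascript
// Normalize tags such as "en_US" to "en-us" for comparison.
const normalizeLang = (tag) => String(tag).toLowerCase().replace('_', '-');

// Pick the best voice for `lang`: exact match, then same base language,
// then the default voice, then the first voice. Returns null if none exist.
function pickVoice(voices, lang) {
  if (!voices || voices.length === 0) return null;
  const wanted = normalizeLang(lang);
  const base = wanted.split('-')[0];
  return (
    voices.find((v) => normalizeLang(v.lang) === wanted) ||
    voices.find((v) => normalizeLang(v.lang).split('-')[0] === base) ||
    voices.find((v) => v.default) ||
    voices[0]
  );
}

// Plain objects stand in for SpeechSynthesisVoice instances here.
const sampleVoices = [
  { name: 'Anna', lang: 'de-DE', default: true },
  { name: 'Daniel', lang: 'en-GB', default: false },
];
console.log(pickVoice(sampleVoices, 'en-US').name); // Daniel
```

In a browser, the list would come from `speechSynthesis.getVoices()`, which can be empty until the voices have finished loading.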

Despite these limitations, the Speech Synthesis API remains a powerful tool for adding text to speech functionality to a website, and it can greatly enhance the user experience for many users.

When creating an application, you may want to enable text-to-speech features for accessibility, convenience, or some other reason. In this tutorial, we'll learn how to create a very simple JavaScript text-to-speech application using JavaScript's built-in Web Speech API.

Let's start making an amazing text-to-speech converter Using HTML, CSS, and JavaScript step by step.

Prerequisites:

Before starting this tutorial, you should have a basic understanding of HTML, CSS, and JavaScript . Additionally, you will need a code editor such as Visual Studio Code or Sublime Text to write and save your code.

Source Code

Step 1 (HTML Code):

To get started, we will first need to create a basic HTML file. In this file, we will include the main structure for our text to speech converter.

The first line of the code, <!DOCTYPE html>, declares the document type as HTML. This line is required for all HTML documents.

The html tag wraps around the entire document, and the lang attribute specifies the language of the document, in this case, English.

The head section contains meta information about the document, including the character encoding, which is set to UTF-8, and the viewport, which defines how the website should be displayed on different devices. It also includes a title tag which sets the title of the document, which appears on the tab of the browser.

In the head section, there is a link to an external stylesheet, which is used to define the visual styling of the document, and a script tag that includes an external JavaScript file. The type attribute in the script tag is set to "module" to indicate that the script uses ES6 modules.

The body section is where the actual content of the web page is contained. In this case, it consists of a div element with a class of "container" that wraps around a label and a textarea element. The label tag is used to associate a label with an input element, and the textarea is where the user can enter the text they want to be spoken. There is also a button element with an id attribute of "speak" that triggers the text-to-speech functionality when clicked.

After creating the file, paste the following code into it. Make sure to save your HTML document with a .html extension so that it can be properly viewed in a web browser.

This is the basic structure of our text to speech converter using HTML, and now we can move on to styling it using CSS.


Step 2 (CSS Code):

Once the basic HTML structure of the text to speech converter is in place, the next step is to style it. We will create a CSS file and use some basic CSS rules to lay out the converter.

The first section, *{} , is a universal selector that applies to all elements on the webpage. It sets the box-sizing property to border-box, which includes any padding and border in the element's total width and height, and sets the margin of all elements to 0, removing any default margin.

The body selector sets the display property to flex, which creates a flexible container for the page's content. The justify-content and align-items properties are set to center, which horizontally and vertically centers the content in the container. The min-height property is set to 100vh, which makes the container take up the entire viewport height.

The .container selector applies styles to a specific div element with a class of "container". It sets the background-color to #4FBDBA, which is a teal color. The display property is set to grid, which allows the container to be laid out in a grid structure. The gap property sets the gap between grid items to 20 pixels. The width is set to 500 pixels, and the max-width property uses a calc() function to set the maximum width to be the width of the viewport minus 40 pixels. The padding property sets the padding inside the container, while the border-radius property sets the rounded corners of the container. The font-size property sets the size of the text inside the container.

The #text selector applies styles to a specific textarea element with an id of "text". It sets the display property to block, which makes the element take up the entire width of its container. The height is set to 100 pixels, while the border-radius property sets the rounded corners of the element. The font-size property sets the size of the text inside the element, while the border property sets the border around the element. The resize property is set to none, which disables resizing of the element by the user. The padding property sets the padding inside the element, while the outline property sets a visible border around the element when it is in focus.

The button selector applies styles to all button elements. It sets the padding inside the button, while the background property sets the background color of the button. The color property sets the color of the text inside the button, while the border-radius property sets the rounded corners of the button. The cursor property sets the type of cursor that appears when the button is hovered over. The border property sets the border around the button, while the font-size and font-weight properties set the size and weight of the text inside the button.

This will give our text to speech converter an upgraded presentation. Create a CSS file named styles.css and paste the given code into it. Remember that the file must have the .css extension.

Step 3 (JavaScript Code):

Finally, we need to create a speechSynthesis function in JavaScript.

The code uses the Document Object Model (DOM) to select and manipulate HTML elements, and it uses the Web Speech API to synthesize speech from text.

The first line of the code uses the getElementById method to select an HTML element with an id of "text". The textEL variable is then assigned to this element.

The second line of the code uses the getElementById method to select an HTML element with an id of "speak". The speakEL variable is then assigned to this element.

The third line of the code uses the addEventListener method to add a click event listener to the speakEL element. The speakText function is passed as the event handler for this listener. This means that when the "speak" button is clicked, the speakText function will be executed.

The speakText function first calls the cancel method on the window.speechSynthesis object. This stops any speech that is currently being synthesized.

The function then retrieves the text that has been entered into the textEL element by accessing its value property. This text is then used to create a new SpeechSynthesisUtterance object, which is a type of speech request.

The speak method of the window.speechSynthesis object is then called with the utterance object as a parameter. This causes the Web Speech API to synthesize speech from the text in the utterance object and speak it through the device's speakers.
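The script itself is not reproduced on this page. The walkthrough above can be sketched roughly as follows; this is a reconstruction from the description, not the author's exact code. The `createSpeakText` wrapper is an added assumption that lets the handler be tried outside a browser.

```javascript
// Build the click handler described above. Dependencies are passed in so
// the logic can also be exercised without a DOM.
function createSpeakText(textEL, synth, Utterance) {
  return function speakText() {
    synth.cancel(); // stop any speech that is currently in progress
    const text = textEL.value; // read what the user typed
    const utterance = new Utterance(text);
    synth.speak(utterance); // queue the utterance for speaking
    return utterance;
  };
}

// Browser wiring: runs only when a DOM and the Web Speech API exist.
if (typeof document !== 'undefined' && 'speechSynthesis' in globalThis) {
  const textEL = document.getElementById('text');
  const speakEL = document.getElementById('speak');
  if (textEL && speakEL) {
    const speakText = createSpeakText(
      textEL,
      window.speechSynthesis,
      window.SpeechSynthesisUtterance
    );
    speakEL.addEventListener('click', speakText);
  }
}
```

Because the HTML loads the script with type="module", it runs after the document has been parsed, so the elements with ids "text" and "speak" should already exist when `getElementById` is called.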

Create a JavaScript file named script.js and paste the given code into it. Make sure it is linked properly to your HTML document so that the script is executed on the page. Remember that the file must have the .js extension.


Conclusion:

In conclusion, adding text to speech functionality to a website can greatly improve the user experience for users with visual impairments, learning disabilities, or anyone who prefers to listen to content instead of reading it. With the Speech Synthesis API, implementing this feature is simple and straightforward, making it accessible to developers of all skill levels.

By following the steps outlined in this guide, you can add text to speech functionality to your website and customize it to fit your users' needs. Remember to test your implementation thoroughly to ensure that it works correctly and provides a high-quality user experience.

Overall, the Speech Synthesis API is a powerful tool that can greatly enhance the accessibility and usability of your website. We hope that this guide has been helpful and that you can use these tips to create a more inclusive and accessible online experience for your users.

That’s a wrap!

I hope you enjoyed this post. Now, with these examples, you can create your own amazing page.


Thanks! Faraz 😊


Text to speech with JavaScript

Text-to-speech (TTS) is an assistive technology that has gained much popularity over recent years. It is also referred to as 'read aloud' technology because TTS pronounces all the written words.

TTS has been incorporated into many websites, apps, and digital devices. It is a notable alternative to plain text, extending the reach of content and broadening the audience. Today, TTS has grown beyond being an alternative to text and has gained, among other functions, educational uses. Though written text still reigns supreme in classical teaching, TTS's popularity is largely based on its advantages over static text:

  • Helps people with reading difficulties
  • Convenience
  • Complements alternative learning styles
  • Accessibility to read text aloud

How Text to Speech works

Most TTS functionality is built into browsers, apps, and other software. For example, Google Docs has an accessibility setting that gives readers the option to 'Turn on screen reader support'. You can also download TTS software to your device or enable it on a browser page on demand, which is mainly useful for pages or apps without built-in TTS. TTS comes in various forms. Some tools highlight words as they read them. Options like start, stop, pause and cancel give you, as a reader, control over how it aids you. You can also switch between a list of male and female reading voices.

For the sake of this article, we will look at text-to-speech API on websites using JavaScript.

Why JavaScript?

JavaScript is a modern programming language that extensively participates in all web-related technology solutions. It is also called the language of the web . JavaScript, fused with HTML5 has a broad reach of DOMs and APIs . This synergy makes it easier for writing functionalities into a website, including a text-to-speech functionality powered by a Web Speech API .

Web Speech API

Web Speech API allows us to incorporate voice data or speech into web apps . It has two distinct functionalities – Speech Synthesis (text-to-speech) and Speech Recognition .

Speech Synthesis is the synthesizer that allows apps to read text aloud from a device or app. It is the control interface of the Web Speech API text-to-speech service.

Speech recognition is different from text-to-speech. In TTS the program reads the text for you, while speech recognition allows you to interface with your application using direct voice commands.


Getting Started with SpeechSynthesis

The SpeechSynthesis interface is the controller for the speech service: its methods regulate how and when text is converted into speech. To convert text to speech, we create an instance of the SpeechSynthesisUtterance class, configure it with its properties, and hand it to SpeechSynthesis.

SpeechSynthesisUtterance has six main properties:

  • lang : Gets and sets the language of the utterance.
  • pitch : Sets the pitch of the utterance. It ranges from 0 to 2 (0 is the lowest and 2 the highest). We can adjust it using a slider.
  • rate : Sets the rate of the utterance. The rate ranges from 0.1 to 10 (0.1 is the lowest and 10 the highest). Visually, we can set it using a slider.
  • volume : Sets the volume of the utterance. The volume ranges from 0 to 1 (0 is the lowest value and 1 the highest). We will set it visually using a slider.
  • text : Gets and sets the text for synthesizing.
  • voice : Gets and sets the speaking voice.

SpeechSynthesis provides methods like these:

  • .cancel(): Like stop; it removes all the utterances from the utterance queue
  • .getVoices(): Gets the voices available to the Web Speech API synthesizer on the current device
  • .pause(): Pauses an utterance
  • .resume(): Resumes a paused utterance
  • .speak(): Adds an utterance to the queue to be read aloud
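The properties above can be applied to an utterance in one place, with values kept inside their documented ranges. The sketch below is illustrative only; `configureUtterance` and `clamp` are hypothetical helpers, not part of the Web Speech API.

```javascript
// Keep `value` within [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Copy the given settings onto an utterance, clamping numeric values
// to the ranges listed above (pitch 0-2, rate 0.1-10, volume 0-1).
function configureUtterance(utterance, settings = {}) {
  if (settings.lang !== undefined) utterance.lang = settings.lang;
  if (settings.voice !== undefined) utterance.voice = settings.voice;
  if (settings.pitch !== undefined) utterance.pitch = clamp(settings.pitch, 0, 2);
  if (settings.rate !== undefined) utterance.rate = clamp(settings.rate, 0.1, 10);
  if (settings.volume !== undefined) utterance.volume = clamp(settings.volume, 0, 1);
  return utterance;
}

// Works on any object with these properties; in a browser you would pass
// `new SpeechSynthesisUtterance('Hello')` instead of a plain object.
const demo = configureUtterance({ text: 'Hello' }, { pitch: 5, rate: 0.01, volume: 0.5 });
console.log(demo.pitch, demo.rate, demo.volume); // 2 0.1 0.5
```

Clamping is a design choice for slider-driven interfaces, where a bad value should degrade quietly instead of breaking speech output.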

To simply convert a text to speech , use:

Since not all browsers support the API, we do a check for this:
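The snippets referred to here are missing from this page. A minimal sketch of a support check followed by a simple conversion might look like this; `isSpeechSynthesisSupported` is a hypothetical helper name.

```javascript
// True when both the synthesizer and the utterance class are present.
function isSpeechSynthesisSupported(env = globalThis) {
  return 'speechSynthesis' in env && 'SpeechSynthesisUtterance' in env;
}

if (isSpeechSynthesisSupported()) {
  // Simple conversion: create an utterance and hand it to the synthesizer.
  const utterance = new SpeechSynthesisUtterance('Hello world');
  speechSynthesis.speak(utterance);
} else {
  console.log('Sorry, this environment does not support speech synthesis.');
}
```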

Next, we will create a simple demo with HTML, CSS, and JS to show how you can implement Web Speech API in browsers and websites.

Although we have populated the voices in the drop-down, they won't change to the selected voice unless we use the onchange function to target that.
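As a sketch of that step, a change handler can look up the voice whose name matches the selected option and remember it for the next utterance. The helper names and the `#voices` selector below are assumptions, since the demo's markup is not shown here.

```javascript
// Find a voice by its name, or return null if it is not available.
function findVoiceByName(voices, name) {
  return voices.find((voice) => voice.name === name) || null;
}

// Create a change handler that reports the newly selected voice.
function createVoiceChangeHandler(getVoices, onVoice) {
  return function handleChange(event) {
    const voice = findVoiceByName(getVoices(), event.target.value);
    if (voice) onVoice(voice);
    return voice;
  };
}

// Browser wiring: runs only when a DOM and the Web Speech API exist.
if (typeof document !== 'undefined' && 'speechSynthesis' in globalThis) {
  const settings = { voice: null }; // read settings.voice when speaking
  const select = document.querySelector('#voices'); // hypothetical id
  if (select) {
    select.onchange = createVoiceChangeHandler(
      () => speechSynthesis.getVoices(),
      (voice) => { settings.voice = voice; }
    );
  }
}
```

Before calling `speak()`, the stored voice would be assigned to the utterance's `voice` property.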

Browser Compatibility

The SpeechSynthesis Web API is supported by Chrome, Edge, Firefox, Opera, and Safari. Internet Explorer does not support this API. The onvoiceschanged event handler is the only member not supported by Safari and Opera.

ResponsiveVoice JS

ResponsiveVoice is a text-to-speech API supported in over 51 languages.

ResponsiveVoice JS, which is powered by LearnBrite, defines a selection of smart voice profiles. It knows which voice to enable on which device, creating a consistent experience wherever the user decides to use this functionality. To get started with ResponsiveVoice, we have to add the following line of JS to the <head> of our HTML page:

ResponsiveVoice has functions like speak(), cancel(), voiceSupport(), getVoices(), isPlaying(), pause(), resume() and setDefaultVoice(). Besides the text to speak, the speak() method takes optional parameters: [string voice] and [object parameters].

To simply use the speak() function:

This line brings up a prompt box in the browser that asks for permission to speak.

To check for browser support, we use the voiceSupport() function:
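The two snippets are missing from this page. A sketch that combines the support check with a call to speak() could look like the following. The wrapper function is hypothetical, and the voice name and parameters are example values. It assumes the ResponsiveVoice script has already been loaded and has defined the global `responsiveVoice` object.

```javascript
// `rv` is the global `responsiveVoice` object defined by the library script.
// Returns false when the library is missing or reports no voice support.
function speakWithResponsiveVoice(rv, text, voice = 'UK English Female', params = {}) {
  if (!rv || !rv.voiceSupport()) return false;
  rv.speak(text, voice, params);
  return true;
}

// Runs only when the ResponsiveVoice script is present on the page.
if (typeof responsiveVoice !== 'undefined') {
  speakWithResponsiveVoice(responsiveVoice, 'Hello world', 'UK English Female', { rate: 1 });
}
```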

We will create a demo HTML page, with a text area and buttons to test out the functionalities of ResponsiveVoice and demonstrate how you can implement ResponsiveVoice with TTS on websites.

ResponsiveVoice HTML

ResponsiveVoice CSS

ResponsiveVoice JavaScript

Mobile devices sometimes prevent browsers from playing audio without a user gesture. ResponsiveVoice can listen for a click and take it as the user gesture required by the browser. This ClickHook() is enabled with:

If listening for a click is not possible, the responsiveVoice.clickEvent() can be called directly from any user gesture and it will grant ResponsiveVoice the required permission.

We call this event using:

For more on ResponsiveVoice, please check here .

Additional Scope

setTextReplacements(array replacements)

ResponsiveVoice adds a setTextReplacements() that takes an array of words to be replaced, text to be replaced with, the voice profile, and system voices. This command is useful for specifying words or expressions with several pronunciations.

  • searchvalue: defines the text to be replaced; supports regular expressions (required)
  • newvalue: the replacement text (required)
  • collectionvoices: Voice name (from ResponsiveVoice collection ) for which the replacement will be applied; it can be a unique name or an array of names (optional)
  • systemvoices: Voice name (from System voices collection ) for which the replacement will be applied. Can be a unique name or an array of names (optional)

The text to be replaced must be in the original text content, text area, or document. So if a text does not contain the word 'man', it would not be replaced with 'boy'.

To specify replacements only on certain voice profiles:
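The example is missing from this page. A sketch using the fields listed above, with the 'man' to 'boy' replacement mentioned earlier limited to one voice profile, might look like this. The `applyReplacements` wrapper and its validation are assumptions added for illustration; the voice name is an example value.

```javascript
// Validate the entries, then pass them to ResponsiveVoice.
// `rv` is the global `responsiveVoice` object defined by the library script.
function applyReplacements(rv, replacements) {
  for (const entry of replacements) {
    if (entry.searchvalue === undefined || entry.newvalue === undefined) {
      throw new Error('Each replacement needs searchvalue and newvalue');
    }
  }
  rv.setTextReplacements(replacements);
  return replacements.length;
}

// Replace "man" with "boy", but only for one voice profile.
const replacements = [
  { searchvalue: 'man', newvalue: 'boy', collectionvoices: 'UK English Female' },
];

// Runs only when the ResponsiveVoice script is present on the page.
if (typeof responsiveVoice !== 'undefined') {
  applyReplacements(responsiveVoice, replacements);
}
```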

This API is compatible with modern browsers that support HTML5.

Limitation of TTS in Javascript

  • Not available in every language
  • The speech synthesis consumes more processing power
  • Voices are emotionless and unnatural.
  • Pronunciation depends on the speaker and cannot be adjusted

In this article, we looked at two contemporary methods of implementing TTS in a website using JavaScript. These two methods cover the basic features of TTS. They are convenient for users and give them more control over how a simple text could be digested . You can also implement these methods with ease and build an online story reader of your own. These are some of the basics of how to work TTS into your website, webshop or blog page. Some apps require something fancier and more extensive , and if that is the case — you may need the robust skills of a serious developer.

Ivan Georgiev , Software Engineer

Ivan is an extremely intelligent and knowledgeable professional with a hunger for improvement and continuous learning. Being versatile in a number of programming languages, he picks up on new tech in a matter of hours. He is a person that masters challenges by investigation and clear, structured execution, bringing in his experience and innovative ideas to solve problems quickly and efficiently. Ivan is also a Certified Shopware Engineer.


Text To Speech using JavaScript


Code Snippet:Speech Synthesis
Author: Aleksandar Sandro Cvetković
Published: January 14, 2024
Last Updated: January 22, 2024
Downloads: 779
License: MIT

This JavaScript code snippet helps you to create Text-to-Speech functionality on a webpage. It comes with a basic interface with input options for text, voice, pitch, and rate. When you press the “Speak” button, the text you enter is converted to speech using the selected voice, pitch, and rate. The “Stop” button halts speech output. It’s helpful for adding a text-to-speech feature to web applications.

You can use this code to implement text-to-speech functionality on your website. It adds an interactive and engaging feature, making your site more inclusive.

How to Create Text To Speech Functionality Using JavaScript

1. In your HTML file, create a structure for the TTS interface. You can use the following HTML code as a starting point. It includes input fields for text, voice selection, pitch, and rate, along with buttons for speaking and stopping the speech.

2. Apply the following CSS code to style the TTS interface. This will make it visually appealing and user-friendly. You can further customize the CSS to match your website’s design.

3. Finally, add the following JavaScript code to your project. It sets up the SpeechSynthesis API to handle text-to-speech. It also populates the voice selection dropdown with available voices. You can further customize the code to modify voice options or enhance user interactions.
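The original listings are not reproduced here. As a rough sketch of what step 3 describes, the script below fills a voice dropdown and speaks the entered text with the chosen pitch and rate. The element IDs (`text`, `voice`, `pitch`, `rate`, `speak`, `stop`) and the helper names `clamp` and `buildSettings` are assumptions made for illustration, not the snippet author's actual markup or code.

```javascript
// Clamp a numeric input to a range, falling back to a default when it is not a number.
function clamp(value, min, max, fallback) {
  const n = Number.parseFloat(value);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Turn raw form values into settings the Web Speech API accepts:
// pitch ranges from 0 to 2 and rate from 0.1 to 10, both defaulting to 1.
function buildSettings(form) {
  return {
    text: String(form.text ?? "").trim(),
    pitch: clamp(form.pitch, 0, 2, 1),
    rate: clamp(form.rate, 0.1, 10, 1),
  };
}

// Browser-only wiring; skipped where speech synthesis is unavailable.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const synth = window.speechSynthesis;
  const byId = (id) => document.getElementById(id); // hypothetical element IDs
  let voices = [];

  const populateVoices = () => {
    voices = synth.getVoices();
    const options = voices.map(
      (voice, index) => new Option(`${voice.name} (${voice.lang})`, String(index)),
    );
    byId("voice").replaceChildren(...options);
  };
  populateVoices();
  // Some browsers load voices asynchronously and fire this event later.
  if (synth.onvoiceschanged !== undefined) {
    synth.onvoiceschanged = populateVoices;
  }

  byId("speak").addEventListener("click", () => {
    const settings = buildSettings({
      text: byId("text").value,
      pitch: byId("pitch").value,
      rate: byId("rate").value,
    });
    if (!settings.text) return;

    const utterance = new SpeechSynthesisUtterance(settings.text);
    utterance.voice = voices[Number(byId("voice").value)] || null;
    utterance.pitch = settings.pitch;
    utterance.rate = settings.rate;
    synth.cancel(); // drop anything still queued before speaking
    synth.speak(utterance);
  });

  // The "Stop" button halts speech output.
  byId("stop").addEventListener("click", () => synth.cancel());
}
```

Keeping the value handling in `buildSettings` means out-of-range slider values are corrected before they reach the utterance, and that part can be checked without a browser.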

That’s all! Hopefully, you have successfully created a Text-to-Speech feature using JavaScript. If you have any questions or suggestions, feel free to comment below.



text-to-speech-javascript

Here are 12 public repositories matching this topic:

henryhale / ttspeech

🔊 A fully basic voice synthesizer in vanillaJS

  • Updated Apr 30, 2023

nomaanulhasan / JavaScript-TTS

JavaScript - Text to Speech

  • Updated Feb 22, 2023

Mikulew / js-text-to-speech

Speech tools built into the browser to make a web page that can speak anything.

  • Updated Aug 10, 2022

brayanjeshua / chatgpt-to-speech

CHATGPT Text-to-Speech Application

  • Updated May 25, 2023

Cosaslearning / Text-To-Speech

This project is a web-based Text-To-Speech converter that allows users to enter text and convert it into spoken words.

  • Updated Nov 6, 2023

KOUISAmine / text-to-speech

Use the Google translator API to generate text to speech audio.

  • Updated Jan 3, 2024

Khokon9363 / vanilla_javascript_text-to-speech-converter_v0

Vanilla JavaScript Text to Speech Converter Version 0

  • Updated Mar 17, 2021

Khokon9363 / vanilla_javascript_text-to-speech-converter_v1

Vanilla JavaScript Text to Speech Converter Version 1

eelayoubi / aws-serverless-text-to-speech

AWS Serverless Text To Speech Application

  • Updated Jun 18, 2024

Thahee02 / text-voice-convertor

This project is a text-to-voice converter app built with HTML, Tailwind CSS, and JavaScript.

  • Updated Jan 20, 2024

Ajmal112 / texttospeech

The Text-to-Speech website is a testing API project that enables users to effortlessly convert text or sentences into MP3 audio files. With its user-friendly interface, users can simply input their desired text, initiate the conversion process, and obtain an audio file in seconds, facilitating convenient access to spoken content from written text.

  • Updated Jun 30, 2023

Himanshu157 / TEXT-TO-SPEECH-CONVERTOR

A text-to-voice converter. Its main benefit is improved user interactivity, since many users prefer listening to content over reading it. The converter takes words or other text input from the user and converts it into audio.

  • Updated Aug 18, 2024


How to convert speech into text using JavaScript?

In this article, we will learn to convert speech into text using HTML and JavaScript. 

Approach: We add a content-editable “div”; the contenteditable attribute makes any HTML element editable.

We use the SpeechRecognition object to convert speech into text and then display the text on the screen.

We also use webkitSpeechRecognition so that speech recognition works in Google Chrome and Apple Safari.

The interimResults property defaults to false, so set interimResults = true to receive results while the user is still speaking.

Use the appendChild() method to append a node as the last child of a node.

Add an event listener for recognition results. Inside it, the map() method creates a new array with the results of calling a function for every array element.

Note: map() does not change the original array.

Use the join() method to return the array as a string.
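The approach above can be sketched roughly as follows. This is a minimal illustration rather than the article's original listing; the `words` container class and the `transcriptFrom` helper are hypothetical names.

```javascript
// Build one string from a recognition result list: take the top alternative
// of every result with map(), then glue the pieces together with join().
function transcriptFrom(results) {
  return Array.from(results)
    .map((result) => result[0].transcript)
    .join("");
}

// Browser-only wiring; Chrome and Safari expose the prefixed constructor.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;

  if (Recognition) {
    const recognition = new Recognition();
    recognition.interimResults = true; // the default is false

    // Hypothetical container: <div class="words" contenteditable></div>
    const words = document.querySelector(".words");
    const paragraph = document.createElement("p");
    words.appendChild(paragraph); // append as the last child of the container

    recognition.addEventListener("result", (event) => {
      // event.results is array-like, so it is converted before map() is used.
      paragraph.textContent = transcriptFrom(event.results);
    });

    recognition.start();
  }
}
```

Because `map()` returns a new array, the browser's own result list is left untouched and can be read again on the next `result` event.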

 

Final Code:

                 

Output: 

If the user says “Hello World” after running the file, it shows the following on the screen.


DEV Community


ℵi✗✗

Posted on Jan 2

Building a Real-time Speech-to-text Web App with Web Speech API

Happy New Year, everyone! In this short tutorial, we will build a simple yet useful real-time speech-to-text web app using the Web Speech API. Feature-wise, it will be straightforward: click a button to start recording, and your speech will be converted to text, displayed in real-time on the screen. We'll also play with voice commands; saying "stop recording" will halt the recording. Sounds fun? Okay, let's get into it. 😊

Web Speech API Overview

The Web Speech API is a browser technology that enables developers to integrate speech recognition and synthesis capabilities into web applications. It opens up possibilities for creating hands-free and voice-controlled features, enhancing accessibility and user experience.

Some use cases for the Web Speech API include voice commands, voice-driven interfaces, transcription services, and more.

Let's Get Started

Now, let's dive into building our real-time speech-to-text web app. I'm going to use Vite to scaffold the project, but feel free to use any build tool of your choice or none at all for this mini demo project.

  • Create a new Vite project:
  • Choose "Vanilla" on the next screen and "JavaScript" on the following one. Use arrow keys on your keyboard to navigate up and down.

HTML Structure

CSS Styling

JavaScript Implementation

This simple web app utilizes the Web Speech API to convert spoken words into text in real-time. Users can start and stop recording with the provided buttons. Customize the design and functionalities further based on your project requirements.
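The code listings for this tutorial are not reproduced above. As a hedged sketch of the behavior it describes (live transcription, start and stop buttons, and a spoken "stop recording" command), something like the following could work. The element IDs `output`, `start`, and `stop`, and the helpers `isStopCommand` and `splitResults`, are assumptions for illustration only.

```javascript
// Decide whether a recognized phrase is the voice command that ends recording.
function isStopCommand(transcript) {
  return transcript.trim().toLowerCase().includes("stop recording");
}

// Split a result list into finalized text and still-changing interim text.
function splitResults(results) {
  let finalText = "";
  let interimText = "";
  for (const result of Array.from(results)) {
    if (result.isFinal) finalText += result[0].transcript;
    else interimText += result[0].transcript;
  }
  return { finalText, interimText };
}

// Browser-only wiring; skipped where speech recognition is unavailable.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;

  if (Recognition) {
    const recognition = new Recognition();
    recognition.continuous = true; // keep listening across pauses
    recognition.interimResults = true; // show text while the user is speaking

    // Hypothetical element IDs.
    const output = document.getElementById("output");
    const startButton = document.getElementById("start");
    const stopButton = document.getElementById("stop");

    // Track the session so start() is never called twice in a row,
    // which would throw an InvalidStateError.
    let listening = false;
    recognition.addEventListener("start", () => { listening = true; });
    recognition.addEventListener("end", () => { listening = false; });

    recognition.addEventListener("result", (event) => {
      const { finalText, interimText } = splitResults(event.results);
      output.textContent = finalText + interimText;

      const latest = event.results[event.results.length - 1];
      if (latest.isFinal && isStopCommand(latest[0].transcript)) {
        recognition.stop();
      }
    });

    startButton.addEventListener("click", () => {
      if (!listening) recognition.start();
    });
    stopButton.addEventListener("click", () => recognition.stop());
  }
}
```

Checking the command only on final results avoids stopping on a half-recognized phrase that the engine later revises.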

Final demo: https://stt.nixx.dev

Feel free to explore the complete code on the GitHub repository .

Now, you have a basic understanding of how to create a real-time speech-to-text web app using the Web Speech API. Experiment with additional features and enhancements to make it even more versatile and user-friendly. 😊 🙏


Adding Voice Recognition To A Web App

Jonathan ARNAULT

How to build an AI assistant that supports voice recognition? In this article, we will explain how we added speech-to-text support on an existing app.

We performed audio speech recognition at the edge using the Whisper model from OpenAI. This article will also explain how we recorded the user's microphone using the MediaStream Recording API provided by all major browsers.

This is the second part of a series about using modern tooling to build an AI assistant. The first article focuses on Text-to-Text generation: Building An AI Assistant at the Edge.

As a reminder, the objective of this series is to build a simple Aqua clone using Cloudflare Workers AI and Vue 3. The source code of the project is available on our GitHub at marmelab/cloudflare-ai-assistant.

The result of our work is a simple AI assistant that can listen to the user's voice and transcribe it into a prompt for a text editor:

Speech-to-Text using Cloudflare AI SDK

The Cloudflare Workers AI toolkit provides access to the Whisper voice recognition model from OpenAI. The model takes an audio file as input and returns the recognized voice as a text string. The model supports multiple audio formats, such as MP3, MP4, WAV or WebM.

An example of an API endpoint for the Nuxt framework that does speech-to-text inference is available below.

The Cloudflare AI API takes the raw audio byte array as input and returns the inferred words along with the word count. The Cloudflare endpoint does not support bi-directional streaming. Streaming would have made our task easier had we wanted real-time audio recognition.
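The endpoint listing is not reproduced here. The sketch below only illustrates the call shape described above: it assumes `ai` is a Workers AI binding exposing `run(model, inputs)`, that the Whisper model is addressed as `@cf/openai/whisper`, and that the response carries `text` and `word_count` fields. How the binding is obtained inside a Nuxt handler depends on the project setup and is deliberately left out; `toByteArray` and `transcribe` are hypothetical helper names.

```javascript
// The model input is a plain array of byte values, not an ArrayBuffer.
function toByteArray(arrayBuffer) {
  return Array.from(new Uint8Array(arrayBuffer));
}

// `ai` is assumed to be a Workers AI binding exposing run(model, inputs).
async function transcribe(ai, arrayBuffer) {
  const result = await ai.run("@cf/openai/whisper", {
    audio: toByteArray(arrayBuffer),
  });
  // Assumed response fields: the recognized text and its word count.
  return { text: result.text, wordCount: result.word_count };
}
```

Passing the binding in as a parameter keeps the function independent of the server framework and lets it be exercised with a stub.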

Speech-to-Text In The Browser

Another option we considered for speech recognition was the Web Speech API, built into modern browsers. This experimental feature provides a high-level framework for speech-to-text. The SpeechRecognition interface is especially interesting for us as it can stream recognized text on the fly.

However, this API is only available behind the webkit vendor prefix on Chrome and has not been implemented yet by Firefox. Furthermore, the model depends on the browser vendor and the quality of the results varies greatly. Finally, we wanted to test the Whisper model provided with Cloudflare Workers AI.

Calling the API Endpoint from the Front-end

Nowadays, all major browsers support the MediaStream Recording API. This low-level API supports capturing video and/or audio from the user device directly in the browser. This was especially interesting in our use case, as we want to record users' prompts from their microphone.

Here is an example of how the user microphone can be recorded using JavaScript / TypeScript:

While this feature is great, we found that every browser has a different set of supported audio mime types. As there is no standard regarding the audio format, these sets do not intersect across browsers: Chrome supports WebM and Safari supports MP3 for recording. As we are building a Proof of Concept, we only used WebM format with Chrome in the rest of this article to simplify the code.

During our development, we also learned the hard way that the Blob must have the same mime type as the MediaRecorder. If they are not the same, the audio cannot be transcribed by Whisper.
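The two points above, choosing a mime type the browser can actually record and reusing that same type for the Blob, can be sketched as follows. This is an illustration under assumptions, not the article's listing: `pickMimeType` and `recordMicrophone` are hypothetical helpers and the candidate list is only an example.

```javascript
// Return the first candidate the browser can record, or "" to let it choose.
function pickMimeType(candidates, isSupported) {
  return candidates.find((type) => isSupported(type)) ?? "";
}

// Start recording the microphone and return a stop() function.
// stop() resolves with a Blob whose type matches the recorder's mime type.
async function recordMicrophone() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType(
    ["audio/webm", "audio/mp4"], // illustrative candidates only
    (type) => MediaRecorder.isTypeSupported(type),
  );
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});

  const chunks = [];
  recorder.addEventListener("dataavailable", (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  });
  recorder.start();

  return function stop() {
    return new Promise((resolve) => {
      recorder.addEventListener(
        "stop",
        () => {
          stream.getTracks().forEach((track) => track.stop()); // release the mic
          // Reuse the recorder's own mime type so the Blob and recorder agree.
          resolve(new Blob(chunks, { type: recorder.mimeType }));
        },
        { once: true },
      );
      recorder.stop();
    });
  };
}
```

In a page, `const stop = await recordMicrophone();` starts the capture and `const blob = await stop();` yields the audio to upload.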

A Vue Composable for Voice Input

The low-level nature of the MediaStream Recording API makes it framework-agnostic, so it can be wrapped in a Vue composable. This composable is responsible for abstracting away the difficulties of the API.

The composable exposes the microphone state and the functions that control it:

  • microphoneDisabled is true if the user has not granted access to their microphone;
  • recording is true when the microphone is on and recording the user's prompt;
  • startRecording initializes the media recorder and its event listeners;
  • stopRecording stops the media recorder and cleans up the internal state.

Furthermore, the composable is also in charge of transcribing the audio once recorded. It calls the API we implemented in the first part, and returns the following state:

  • loading is true when the audio is currently transcribing;
  • recordedText that holds the last transcribed text.

The complete useRecorder() composable is available below:

Putting it All Together

Now that our useRecorder composable has been set up, we can use it from a Vue component. To trigger startRecording and stopRecording, we relied on pointerdown and pointerup events respectively on a microphone button. An example of how to use the composable is available below.

To avoid loss of information, we chose to display the microphone button only if the user did not type any prompt on the text field. Otherwise, the recorded text would supersede the user prompt.

Results, Limitations, and Future Directions

We tested the speech-to-text API with various prompts and found that the Whisper model is great even if we are not native English speakers. We must conduct a more detailed evaluation of the model performance with various prompts in different languages.

Since latency is a concern, we noticed that speech-to-text evaluation at the edge does not provide a significant advantage here. The voice detection is still the bottleneck and can take up to a few seconds.

Our application still lacks some improvements that would make it a better AI assistant. Firstly, on-the-fly recognition while the microphone button is held has not been developed yet. To do this, we could have used either the SpeechRecognition API or a VAD library such as the ones provided by ricky0123. An example of a successful implementation of the latter is the Swift AI Assistant.

Secondly, we replace the previous text with the new one after each LLM call. Users have to spot the differences themselves and read the whole text to check the changes. A solution for this problem could be the implementation of a diffing algorithm to highlight the changes on the generated text.
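To give an idea of what such a diffing step could look like, here is a minimal word-level diff based on the longest common subsequence. It is only a sketch of one possible approach, not something implemented in the project; `diffWords` is a hypothetical helper name.

```javascript
// Word-level diff based on the longest common subsequence (LCS).
// Returns a list of { type: "same" | "added" | "removed", word } entries.
function diffWords(before, after) {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both word lists, following the table to keep common words aligned.
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      changes.push({ type: "same", word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ type: "removed", word: a[i] });
      i++;
    } else {
      changes.push({ type: "added", word: b[j] });
      j++;
    }
  }
  while (i < a.length) changes.push({ type: "removed", word: a[i++] });
  while (j < b.length) changes.push({ type: "added", word: b[j++] });
  return changes;
}
```

A component could then render `added` and `removed` entries with different styles so that only the changed words stand out.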

Diffing in Aqua

Conclusion

Relying on Cloudflare Workers AI, we built a simple Aqua clone in three developer-days. Their API abstracts away most of the difficulties of both text-to-text and speech-to-text tasks. Even if we faced some challenges such as timeouts for text generation or invalid mime types for audio transcription, the overall developer experience was great.

Regarding the performance of the provided models, we are impressed with how well Whisper and Llama 3 models perform overall. While we face some missed detections, especially on Speech-to-Text tasks as we are not native English speakers, we would definitely use them again in future projects.



COMMENTS

  1. JavaScript Text-to-Speech

    Step 1 - Setting Up The App. First, we set up a very basic application using a simple HTML file called index.html and a JavaScript file called script.js . We'll also use a CSS file called style.css to add some margins and to center things, but it's entirely up to you if you want to include this styling file.

  2. Javascript Text To Speech (Simple Examples)

    Yes, the Stone Age of the Internet is long over. Javascript has a native speechSynthesis text-to-speech API, and it will work so long as the browser and operating system support it. var msg = new SpeechSynthesisUtterance("MESSAGE"); speechSynthesis.speak(msg); That covers the quick basics, read on for more examples!

  3. Using the Web Speech API

    Using the Web Speech API. The Web Speech API provides two distinct areas of functionality — speech recognition, and speech synthesis (also known as text to speech, or tts) — which open up interesting new possibilities for accessibility, and control mechanisms. This article provides a simple introduction to both areas, along with demos.

  4. Build a Text to Speech Converter using HTML, CSS & Javascript

    A text-to-speech converter should have a text area at the top so that the user can enter a long text to be converted into speech, followed by a button that converts the entered text into speech and plays the sound when clicked. In this article, we will build a fully responsive text-to-speech converter using HTML, CSS, and JavaScript. Approach

  5. Text To Speech In 3 Lines Of JavaScript

    Line 1: We created a variable msg, and the value assigned to it is a new instance of the SpeechSynthesisUtterance class. Line 2: The .text property is used to specify the text we want to convert to speech. And finally, the code on the 3rd (last) line is what actually makes our browser talk.

  6. Javascript Text To Speech: A Guide To Converting TTS With Javascript

    Text-to-Speech (TTS) technology enables computers to convert written text into spoken words. In the context of JavaScript, TTS allows developers to integrate speech synthesis capabilities directly into web applications. With TTS, users can interact with websites and applications using voice commands and receive audible feedback.

  7. How To Build a Text-to-Speech App with Web Speech API

    We will now start building our text-to-speech application. Before we begin, ensure that you have Node and npm installed on your machine. Run the following commands on your terminal to set up a project for the app and install the dependencies. Create a new project directory: mkdir web-speech-app.

  8. Convert Text to Speech in Javascript using Speech Synthesis API

    The Speech Synthesis API is a JavaScript API that allows you to integrate text-to-speech (TTS) capabilities into web applications. It provides control over the voice, pitch, rate, and volume of the synthesized speech, offering flexibility in how the spoken output sounds. This API is supported in modern browsers like Chrome, Firefox, and Edge.

  9. Convert Text to Speech Using Web Speech API in JavaScript

    This will begin the process of transforming the text into speech. Before calling this function, the text property must be set. If you start another text-to-speech instance when one is already running, the new one will be queued behind the current one. document.querySelector("#start").addEventListener("click", () => {.

  10. Text To Speech Converter with JavaScript

    JavaScript Code. // Get the text area and speak button elements let textArea = document.getElementById("text"); let speakButton = document.getElementById("speak-button"); // Add an event listener to the speak button speakButton.addEventListener("click", function() { // Get the text from the text area let text = textArea.value; // Create a new ...

  11. Using Google Text-To-Speech in Javascript

    You can use the SpeechSynthesisUtterance with a function like say:. function say(m) { var msg = new SpeechSynthesisUtterance(); var voices = window.speechSynthesis ...

  12. How to Convert Text to Speech using HTML, CSS and JavaScript

    Step 2 (CSS Code): Once the basic HTML structure of the text to speech converter is in place, the next step is to add styling to the text to speech converter using CSS. Next, we will create our CSS file. In this file, we will use some basic CSS rules to create our text to speech converter. The first section, * {}, is a universal selector that ...

  13. Text to speech with Javascript

    Text to speech with Javascript. Text-to-speech (TTS) is an assistive technology that has gained much popularity over recent years. It is also referred to as 'read aloud' technology because TTS pronounces all the written words. TTS has been incorporated into many websites, apps, and digital devices. It is a notable alternative to plain text ...

  14. Building a Text-to-Speech Application with JavaScript

    Text-to-Speech (TTS) systems convert normal language text into speech. From now on we'll create a simple Text-to-Speech application using JavaScript, specifically using the Speech Synthesis interface of the Web Speech API. This interface is supported in almost all modern browsers and it's perfect for our application.

  15. Building a Text-to-Speech Application in JavaScript: 5 Easy Steps

    We'll leverage this API to convert text into speech. Create a new JavaScript file named main.js and link it to your HTML file as shown in the previous code snippet. In main.js, write the ...

  16. How To Convert Text to Speech With JavaScript

    Playing and Pausing the Text to Speech. When the user clicks on the play button we check if the browser's SpeechSynthesis object is currently in a paused state. If it is, we simply resume speaking. If it is not in a paused state we create a new instance of SpeechSynthesisUtterance. We pass in the text and selected voice, and call the speak method.

  17. Text To Speech using JavaScript

    Download (5 KB) This JavaScript code snippet helps you to create Text-to-Speech functionality on a webpage. It comes with a basic interface with input options for text, voice, pitch, and rate. When you press the "Speak" button, the text you enter is converted to speech using the selected voice, pitch, and rate.

  18. Building a Simple Voice-to-Text Web App Using JavaScript and Speech

    Step 3: JavaScript Magic — Making It Work 🪄 Here comes the fun part — adding the JavaScript code that brings our Voice-to-Text functionality to life! We'll break down the code into ...

  19. JavaScript Text to Speech with Code Example

    In this video, I have demonstrated with a code example how the Web Speech Api of JavaScript can be used to convert text to speech in web sites and web pages....

  20. text-to-speech-javascript · GitHub Topics · GitHub

    The Text-to-Speech website is a testing API project that enables users to effortlessly convert text or sentences into MP3 audio files. With its user-friendly interface, users can simply input their desired text, initiate the conversion process, and obtain an audio file in seconds, facilitating convenient access to spoken content from written ...

  21. How to convert speech into text using JavaScript

    A text-to-speech converter is an application that is used to convert the text content entered by the user into speech with a click of a button. A text-to-speech converter should have a text area at the top so that, the user can enter a long text to be converted into speech followed by a button that converts the entered text into speech and plays th

  22. Building a Real-time Speech-to-text Web App with Web Speech API

    In this short tutorial, we will build a simple yet useful real-time speech-to-text web app using the Web Speech API. Feature-wise, it will be straightforward: click a button to start recording, and your speech will be converted to text, displayed in real-time on the screen. We'll also play with voice commands; saying "stop recording" will halt ...

  23. Adding Voice Recognition To A Web App

    A solution for this problem could be the implementation of a diffing algorithm to highlight the changes on the generated text. Conclusion. Relying on Cloudflare Workers AI, we built a simple Aqua in 3 developers' day. Their API abstracts away most of the difficulties of both text-to-text and speech-to-text tasks.