Build speech-to-text app in JS using speechRecognition API

In this article, we will build a speech-to-text app in JavaScript without using any external API or libraries.

We will make use of the built-in speechRecognition API which lives in the browser.

Prerequisites

Basic understanding of HTML and CSS
Good understanding of JavaScript

We will use Live-server to sever our application. To do that we need to first of all install a live server using npm. To do that open the terminal and type the following:

npm i live-server -g

The -g flag will install Live-server globally on our local machine.

Building speech-to-text app in JavaScript

Now let’s set up the folder structure for our application.

Basically, our application will have just 3 files, index.html for our application markup,style.css for the application styles and app.js for all the javascript functions.

To create the files open up the terminal and type the following:

cd desktop
mkdir speech-app && cd speech-app
touch index.js && touch app.js && touch style.css
code .

The cd command is used to move into a directory, The mkdir command is used to create a new directory, and the touch command is used to create a new file. The code . is used to open up our application visual studio code.

Now let’s setup our index.html file we need to create a custom HTML boilerplate and then link our style.css and our app.js file.

So now our index.html should look like this:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Speech Application</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div class="words"></div>
<script src="app.js"></script>
</body>
</html>

Now let’s head over to our app.js and start writing some code.

To implement the speech recognition we will make use of the SpeechRecognition api which is a global variable that lives In the browser. If you are using a Chrome browser it is called webkitSpeechRecognition .

So we can create a single variable for this by doing this:

window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition

The next thing we have to do is to create a new instance of speech recognition:

const recognition = new SpeechRecognition();

After doing this, we will take the recognition variable and set a property called interimResults and set this to true.

What this does is give us a result as we are speaking, if we set this to false it will return a result after the user must have finished speaking.

recognition.interimResults = true;

Now we want a situation whereby when we start speaking it creates a paragraph and then when we stop it ends that paragraph and if we start talking again it creates a new paragraph. So to do that we need to create a new paragraph using the document.createElement method:

let p = document.createElement("p");
p.classList.add("para");
let words = document.querySelector(".words");
words.appendChild(p)

What we just did was create a new paragraph element and then append it to the div with the class of words.

After doing this we will add an event listener to the recognition variable which returns an event:

recognition.addEventListener("result", e => {
console.log(e)
});
recognition.start();

This will prompt the user that the browser needs access to the system’s microphone.

Open up the console and refresh the page, It should now prompt you to allow access to the system’s microphone.

Grant the permission by clicking the allow button and start talking, while talking you will see some results on the console:

The console.log(e) result returns a SpeechRecognitionEvent with property results which is a two-dimensional array. The first object of this response contains the transcript property.

This property holds the recognized speech in text format. If you haven’t set the property of interimResults to true then you will get only one result with only one alternative back.

To get the actual words of the speaker, we can store the result of the transcript in a variable:

recognition.addEventListener("result", e => {
  const speechToText = e.results[0][0].transcript;
  console.log(speechToText)
})

Setting interimResults to true means that you can start to render results before the user has stopped talking. Saying “hey I'm a JavaScript developer” I got the following interim results back:

The last result has a property isFinal and a value of true. This indicates that the text recognition has finished.

Now that we have the transcript of what we said and the isFinal property when the recognition has finished we can now display the transcript on the browser.

We can loop over the interimTranscript so that it gives us a result while we speak and then while we watch for the isFinal property to be set to true.

When it is set to true we now display the final result. With that, we can modify the code to this:

let speechToText = "";
recognition.addEventListener("result", e => {
  let interimTranscript = '';
  for (let i = e.resultIndex, len = e.results.length; i < len; i++) {
    let transcript = e.results[i][0].transcript;
    console.log(transcript)
    if (e.results[i].isFinal) {
      speechToText += transcript;
    } else {
      interimTranscript += transcript;
    }
  }
  document.querySelector(".para").innerHTML = speechToText + interimTranscript
});
recognition.addEventListener('end', recognition.start);
recognistion.start();

So now when we start speaking the interimTranscript result is displayed than when we are done and the isFinal property is set to true we then display the speechToText value.

Modifying languages and accents

If we have users that don’t speak English, we could modify the language preference by specifying a language parameter with recognition.lang . The default language is en-US.

For more info on the language option click here.

Modifying the user interface.

now that we have the core feature of our application, let’s now modify the user interface. Our Focus will be on mobile devices.

We will add a microphone svg in our markup below the div with the class of word:

<div class="obj">
        <div class="button" id="circlein">
            <svg class="mic-icon" version="1.1" xmlns="http://www.w3.org/2000/svg"
                xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" viewBox="0 0 1000 1000"
                enable-background="new 0 0 1000 1000" xml:space="preserve" style="fill:#1E2D70">
                <g>
                    <path
                        d="M500,683.8c84.6,0,153.1-68.6,153.1-153.1V163.1C653.1,78.6,584.6,10,500,10c-84.6,0-153.1,68.6-153.1,153.1v367.5C346.9,615.2,415.4,683.8,500,683.8z M714.4,438.8v91.9C714.4,649,618.4,745,500,745c-118.4,0-214.4-96-214.4-214.4v-91.9h-61.3v91.9c0,141.9,107.2,258.7,245,273.9v124.2H346.9V990h306.3v-61.3H530.6V804.5c137.8-15.2,245-132.1,245-273.9v-91.9H714.4z" />
                </g>
            </svg>
        </div>
    </div>

And then add some simple styles in the style.css file :

body {
    background-color: black;
    padding: 15px;
    font-family: cursive;
    color: white;
    font-size: 30px
}
#circlein {
    width: 70px;
    height: 70px;
    border-radius: 50%;
    justify-content: center;
    box-shadow: 0px -2px 15px #E0FF94;
}
.obj {
    display: flex;
    justify-content: center;
    height: 100%;
}
.mic-icon {
    height: 30px;
    position: absolute;
    margin: 21px;
}

With this simple markup and styles, we will have a result like this:

Now we want a situation whereby when we click on the mic element it starts recording and when we stop talking the mic element is disabled:

let mic = document.querySelector("#circlein")
let speechToText = "";
recognition.addEventListener("result", e => {
  let interimTranscript = '';
  for (let i = e.resultIndex, len = e.results.length; i < len; i++) {
    let transcript = e.results[i][0].transcript;
    console.log(transcript)
    if (e.results[i].isFinal) {
      speechToText += transcript;
    } else {
      interimTranscript += transcript;
    }
  }


  recognition.addEventListener('soundend', () => {
    mic.style.backgroundColor = null;
  });


  document.querySelector(".para").innerHTML = speechToText + interimTranscript
})



mic.addEventListener("click", () => {
  recognition.start();
  mic.style.backgroundColor = "#6BD6E1"
})

With this, we just built a speech recognition application without any dependency or package.

Conclusion

In this article, we learned how to use the javascript speechregonition API to build a simple speech-to-text app in JavaScript. Not that you have to be connected to the internet for this to work and also note that the application has to be served on a server too.

Build speech-to-text app in JavaScript using speechRecognition API

Build speech-to-text app in JavaScript using speechRecognition API

Prerequisites

Building speech-to-text app in JavaScript

Modifying languages and accents

Modifying the user interface.

Conclusion

Tags:

Deven Rathore

Leave a Reply Cancel reply

Top 15 Valheim game server hosting providers

Top Rust Server hosting for low latency gaming

Top 7 Gatsby App Hosting Providers

20 Awesome Video Hosting And Distribution Platforms

OpenShift Vs Kubernetes: Core Differences

Top OpenSource AI website generator with source code

22 Top Open source Text To Speech Software

35 Best Next.js Opensource Projects

35 Best GraphQL Open-Source Projects for faster development

Top 19 Reinforcement learning projects on Github

Why is Zero Touch Provisioning (ZTP) Needed?

Introduction to NetReputation on Reddit: Building Online Presence

Understanding the Importance of a Good Interview Guide

Maximizing Efficiency with Core App Dashboard: Your Ultimate Guide

Smart Square HMH Guide: Efficient Healthcare Management

Build speech-to-text app in JavaScript using speechRecognition API

Build speech-to-text app in JavaScript using speechRecognition API

Prerequisites

Building speech-to-text app in JavaScript

Modifying languages and accents

Modifying the user interface.

Conclusion

Share Article:

Tags:

How to setup GoLang Authentication with JWT

Creating a Python socket server with multiple clients

Leave a Reply Cancel reply