INSTITUTE OF
PROFESSIONAL STUDIES
AND RESEARCH
Project
TEXT TO SPEECH CONVERTER (TTS)
Submitted By:
NAME                    ROLL NUMBER
SHAKTI PRASAD NANDA     2104000622071028
UDAYA SHARMA            2104000622071032
CERTIFICATE
This is to certify that SHAKTI PRASAD NANDA, Roll No: 2104000622071028,
and UDAYA SHARMA, Roll No: 2104000622071032, are bona fide students.
They have done the project work titled “TEXT TO SPEECH CONVERTER” (TTS)
under my supervision in partial fulfillment of the requirements of a
project for the BCA degree. This is an original piece of work and it has
not been submitted anywhere else.
They were found to be very regular, sincere and hardworking students and
have taken a lot of trouble over the completion of the project.
Signature
(PROJECT GUIDE)
DECLARATION
We declare that this written submission represents our ideas in
our own words and that, where others’ ideas or words have been
included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles
of academic honesty and integrity and have not misrepresented,
fabricated or falsified any idea/fact/source in our submission.
We understand that any violation of the above will be cause for
disciplinary action by the University and can also invite penal
action from the sources which have not been properly cited or
from whom proper permission has not been taken when needed.
Signature :
Roll No :
Signature :
Roll No :
Signature :
Roll No :
ACKNOWLEDGEMENT
All that is written or mentioned in this sheet will hardly be
adequate in return for the amount of help and cooperation we
have received from all the people who contributed to making this
project a reality. We are grateful to each one of them.
First of all, we wish to express our sincere thanks to Bibhu
Prasad Bhoi, our project guide in this undertaking, who was
always there for us to offer help in all possible ways. He
provided assistance which resulted in the successful completion
of the project within the allotted time.
Thank You
Signature
REPORT OF APPROVAL
This project report entitled “TEXT TO SPEECH CONVERTER (TTS)”,
submitted by SHAKTI PRASAD NANDA, Roll No: 2104000622071028,
and UDAYA SHARMA, Roll No: 2104000622071032, is approved for the
degree of Bachelor of Computer Applications (BCA) of Utkal
University, BBSR.
TABLE OF CONTENTS
Introduction
Objectives
Analysis and Design
Overview of speech synthesis
Domain specific synthesis
Unit selection synthesis
Use case diagram
Graph
Implementations
Coding
HTML
JavaScript
CSS
Testing
Output
Future scope
Conclusion
ABSTRACT
A text-to-speech synthesizer is an application that converts
text into spoken words by analyzing and processing the text
using Natural Language Processing (NLP) and then using Digital
Signal Processing (DSP) technology to convert this processed
text into a synthesized speech representation of the text. Here,
we developed a useful text-to-speech synthesizer in the form of
a simple application that converts input text into synthesized
speech and reads it out to the user; the speech can then be
saved as an MP3 file. The development of a text-to-speech
synthesizer will be of great help to people with visual
impairment and will make going through large volumes of text
easier.
INTRODUCTION
Text-to-speech synthesis (TTS) is the automatic conversion of
a text into speech that resembles, as closely as possible, a
native speaker of the language reading that text. A text-to-speech
synthesizer is the technology which lets a computer speak to
you. The TTS system receives text as input, and a computer
algorithm called the TTS engine analyzes the text, pre-processes
it and synthesizes the speech with mathematical models. The TTS
engine usually generates sound data in an audio format as the
output.
The text-to-speech (TTS) synthesis procedure consists of two
main phases. The first is text analysis, where the input text is
transcribed into a phonetic or some other linguistic
representation; the second is the generation of speech
waveforms, where the output is produced from this phonetic and
prosodic information. These two phases are usually called
high-level and low-level synthesis. A simplified version of this
procedure is presented in figure 1 below. The input text might
be, for example, data from a word processor, standard ASCII from
e-mail, a mobile text message, or scanned text from a newspaper.
The character string is then pre-processed and analyzed into a
phonetic representation, which is usually a string of phonemes
with some additional information for correct intonation,
duration and stress. Speech sound is finally generated by the
low-level synthesizer using the information from the high-level
one. The artificial production of speech-like sounds has a long
history, with documented mechanical attempts dating to the
eighteenth century.
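The high-level phase described above can be sketched in a few lines of JavaScript. This is a toy illustration, not the engine used in this project: the lexicon, the ARPAbet-style phoneme symbols and the millisecond durations are hypothetical values, chosen only to show how a character string is normalized and then turned into a phoneme string annotated with stress and duration.

```javascript
// Sketch of the "high-level" (text analysis) phase of TTS.
// The lexicon, phoneme symbols and durations are hypothetical examples.
const DIGITS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"];

// Hypothetical pronunciation lexicon (ARPAbet-style; "1" marks primary stress).
const LEXICON = {
  call: ["K", "AO1", "L"],
  me: ["M", "IY1"],
  at: ["AE1", "T"],
  nine: ["N", "AY1", "N"],
  two: ["T", "UW1"],
};

// Step 1: pre-processing. Lower-case, spell out digits, drop punctuation.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/\d/g, (d) => ` ${DIGITS[Number(d)]} `)
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Step 2: phonetic transcription plus a crude prosody annotation:
// stressed vowels get a longer (made-up) duration in milliseconds.
function analyze(text) {
  return normalize(text).map((word) => {
    const phonemes = LEXICON[word] ?? []; // unknown words would need letter-to-sound rules
    return {
      word,
      phonemes: phonemes.map((p) => ({
        symbol: p.replace(/\d/, ""),
        stressed: p.endsWith("1"),
        durationMs: p.endsWith("1") ? 120 : 80,
      })),
    };
  });
}

console.log(analyze("Call me at 92!").map((w) => w.word).join(" "));
// call me at nine two
```

A real engine would add letter-to-sound rules for words missing from the lexicon and derive intonation from sentence structure; the sketch only shows the shape of the data handed to the low-level synthesizer.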
Objectives
The objective of this project is to enable text to be converted
into speech sounds by running the program. The project is
developed using HTML, CSS and JavaScript. In this project, we
enter a message that we want to convert into voice and run the
program to play that text message aloud.
● Importing the modules
● Creating the display window
● Defining functions
ANALYSIS AND DESIGN
Overview of speech synthesis:
Speech synthesis can be described as the artificial
production of human speech. A computer system used
for this purpose is called a speech synthesizer, and it can
be implemented in software or hardware. A text-to-speech
(TTS) system converts normal language text into
speech. Synthesized speech can be created by
concatenating pieces of recorded speech that are stored
in a database. Systems differ in the size of the stored
speech units; a system that stores phones or diphones
provides the largest output range, but may lack clarity.
For specific usage domains, the storage of entire words
or sentences allows for high-quality output. Alternatively,
a synthesizer can incorporate a model of the vocal tract
and other human voice characteristics to create a
completely "synthetic" voice output. The quality of a
speech synthesizer is judged by its similarity to the
human voice and by its ability to be understood. An
intelligible text-to-speech program allows people with
visual impairments or reading disabilities to listen to
written works on a home computer.
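To make the idea of concatenating stored units concrete, the sketch below joins unit waveforms (arrays of audio samples) end to end and blends each boundary with a short linear crossfade so the joins are less audible. The function name `concatenateUnits`, the crossfade length and the sample values are illustrative assumptions; many real systems also adjust pitch and duration around the joins.

```javascript
// Sketch of concatenative synthesis: stored speech units (arrays of audio
// samples) are joined end to end, and each boundary is blended with a
// short linear crossfade so the joins are less audible.
function concatenateUnits(units, fadeLength) {
  if (units.length === 0) return new Float32Array(0);
  let out = Float32Array.from(units[0]);
  for (const raw of units.slice(1)) {
    const unit = Float32Array.from(raw);
    // The overlap cannot be longer than either piece being joined.
    const fade = Math.min(fadeLength, out.length, unit.length);
    const joined = new Float32Array(out.length + unit.length - fade);
    const start = out.length - fade; // where the overlap begins
    joined.set(out.subarray(0, start), 0);
    // Overlap: the previous audio fades out while the new unit fades in.
    for (let i = 0; i < fade; i++) {
      const w = (i + 1) / (fade + 1); // weight of the incoming unit
      joined[start + i] = (1 - w) * out[start + i] + w * unit[i];
    }
    joined.set(unit.subarray(fade), out.length);
    out = joined;
  }
  return out;
}

// Usage with two tiny made-up "units" and a 2-sample crossfade.
const speechSamples = concatenateUnits([[1, 1, 1, 1], [0, 0, 0, 0]], 2);
console.log(speechSamples.length); // 6
```

Overlapping the units shortens the output by the crossfade length at every join, which is why the result above has 6 samples rather than 8.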
Domain-specific Synthesis:
Domain-specific synthesis concatenates pre-recorded words
and phrases to create complete utterances. It is used in
applications where the variety of texts the system will output is
limited to a particular domain, like transit schedule
announcements or weather reports. The technology is very
simple to implement and has been in commercial use for a long
time, in devices like talking clocks and calculators. The level of
naturalness of these systems can be very high because the
variety of sentence types is limited, and they closely match the
prosody and intonation of the original recordings. Because
these systems are limited by the words and phrases in their
databases, they are not general-purpose and can only
synthesize the combinations of words and phrases with which
they have been pre-programmed. The blending of words within
naturally spoken language, however, can still cause problems
unless many variations are taken into account. For example, in
non-rhotic dialects of English the "r" in words like "clear" /ˈklɪə/
is usually only pronounced when the following word has a
vowel as its first letter (e.g. "clear out" is realized as /ˌklɪəɾ
ˈʌʊt/). Likewise in French, many final consonants are no
longer silent if followed by a word that begins with a vowel, an
effect called liaison. This alternation cannot be reproduced by a
simple word-concatenation system, which would require
additional complexity to be context-sensitive. Domain-specific
synthesis involves recording the voice of a person speaking the
desired words and phrases. It is useful only when a restricted
set of phrases and sentences is used and the variety of texts the
system will output is limited to a particular domain, e.g.
announcements in a train station, weather reports or checking a
telephone subscriber’s account balance.
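In the simplest case, a word-concatenation system can be made context-sensitive by looking one word ahead. The sketch below assumes a hypothetical recording inventory in which words with a linking form (such as the non-rhotic linking "r" in "clear") have a second clip recorded for use before a vowel; the clip file names and the `selectClips` helper are illustrative. Unknown words raise an error, reflecting the closed vocabulary of domain-specific systems.

```javascript
// Hypothetical recording inventory: each word maps to clip files. Words with
// a linking form (non-rhotic linking "r", French liaison) have two variants.
const CLIPS = {
  clear: { plain: "clear.wav", beforeVowel: "clear_r.wav" },
  out: { plain: "out.wav" },
  the: { plain: "the.wav" },
  way: { plain: "way.wav" },
};

// The rule in the text: the next word "has a vowel as its first letter".
const startsWithVowel = (word) => /^[aeiou]/.test(word);

// Naive concatenation would always use the plain clip; this version looks
// one word ahead and picks the linking variant before a vowel.
function selectClips(words) {
  return words.map((word, i) => {
    const entry = CLIPS[word];
    if (!entry) throw new Error(`No recording for "${word}"`); // closed vocabulary
    const next = words[i + 1];
    return next && entry.beforeVowel && startsWithVowel(next)
      ? entry.beforeVowel
      : entry.plain;
  });
}

console.log(selectClips(["clear", "out"]).join(" ")); // clear_r.wav out.wav
console.log(selectClips(["clear", "the", "way"]).join(" ")); // clear.wav the.wav way.wav
```

Every word that can link needs an extra recording, which is part of the "additional complexity" the text refers to.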
Unit Selection Synthesis:
Unit selection synthesis uses large
databases of recorded speech. During database creation,
each recorded utterance is segmented into some or all of the
following: individual phones, diphones, half-phones, syllables,
morphemes, words, phrases, and sentences. Typically, the
division into segments is done using a specially modified speech
recognizer set to a "forced alignment" mode with some
manual correction afterward, using visual representations such
as the waveform and spectrogram. An index of the units in the
speech database is then created based on the segmentation
and acoustic parameters like the fundamental frequency
(pitch), duration, position in the syllable, and neighboring
phones. At runtime, the desired target utterance is created
by determining the best chain of candidate units from the
database (unit selection). This process is typically achieved
using a specially weighted decision tree.
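The selection step can be illustrated with a small sketch. It swaps in a different technique from the weighted decision tree mentioned above: a dynamic-programming (Viterbi) search that picks one candidate unit per target position so that the total of target costs (how far a unit's pitch is from the desired pitch) and join costs (the pitch jump between neighbouring units) is as small as possible. Using pitch as the only feature, the cost definitions and the `joinWeight` parameter are simplifying assumptions for illustration.

```javascript
// Unit selection as a Viterbi search over target and join costs.
// targets: [{ pitch }]; candidates[i]: units available for position i.
// The pitch-only costs and joinWeight are simplifying assumptions.
function selectUnits(targets, candidates, joinWeight = 1) {
  const targetCost = (t, u) => Math.abs(t.pitch - u.pitch);
  const joinCost = (a, b) => joinWeight * Math.abs(a.pitch - b.pitch);

  // best[i][j]: cheapest total cost of a chain ending in candidate j at position i
  const best = [candidates[0].map((u) => targetCost(targets[0], u))];
  const back = [candidates[0].map(() => -1)];

  for (let i = 1; i < targets.length; i++) {
    best.push([]);
    back.push([]);
    candidates[i].forEach((u, j) => {
      let bestPrev = 0;
      let bestCost = Infinity;
      candidates[i - 1].forEach((prev, k) => {
        const c = best[i - 1][k] + joinCost(prev, u);
        if (c < bestCost) { bestCost = c; bestPrev = k; }
      });
      best[i][j] = bestCost + targetCost(targets[i], u);
      back[i][j] = bestPrev;
    });
  }

  // Trace back from the cheapest candidate at the last position.
  const last = targets.length - 1;
  let j = best[last].indexOf(Math.min(...best[last]));
  const chain = [];
  for (let i = last; i >= 0; i--) {
    chain.unshift(candidates[i][j]);
    j = back[i][j];
  }
  return chain;
}

// Usage: two target positions, with made-up candidate units.
const targets = [{ pitch: 100 }, { pitch: 130 }];
const candidates = [
  [{ id: "p", pitch: 100 }, { id: "q", pitch: 120 }],
  [{ id: "r", pitch: 130 }],
];
console.log(selectUnits(targets, candidates, 0.5).map((u) => u.id).join(",")); // p,r
console.log(selectUnits(targets, candidates, 2).map((u) => u.id).join(",")); // q,r
```

With a small `joinWeight` the search keeps "p", which matches its target exactly; with a larger one it switches to "q", whose pitch is closer to the following unit, trading target accuracy for a smoother join.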
Use case diagram:
Graph:
IMPLEMENTATION
Modules Description:
In this work, the system is implemented for the
recognition of the capital English characters A to Z and
the digits 0 to 9. One character is recognized at a
time, and the recognized characters are saved as text.
There are two portions in the program. In the first
portion, the program produces text output from an input
image and then converts that text into speech. In the
second portion, e-text is input directly into the computer
and then converted into speech. For the first portion, an
input image of bold Times New Roman characters in
12-point size is taken and converted into text.
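The report does not say how each character is recognized. As one possible approach, the sketch below assumes simple template matching: each binarized character image is compared pixel by pixel with one stored template per class, and the class with the fewest differing pixels (Hamming distance) wins. The 3x5 templates are hypothetical toy glyphs for only four classes; a real recognizer for A to Z and 0 to 9 would use templates rendered from the actual font at a higher resolution.

```javascript
// Template matching for single characters ("#" = ink, "." = background).
// These 3x5 templates are hypothetical toy glyphs, not real font data.
const TEMPLATES = {
  I: ["###", ".#.", ".#.", ".#.", "###"],
  L: ["#..", "#..", "#..", "#..", "###"],
  T: ["###", ".#.", ".#.", ".#.", ".#."],
  "7": ["###", "..#", "..#", "..#", "..#"],
};

// Number of pixels that differ between two same-sized bitmaps.
function hamming(a, b) {
  let d = 0;
  for (let r = 0; r < a.length; r++)
    for (let c = 0; c < a[r].length; c++)
      if (a[r][c] !== b[r][c]) d++;
  return d;
}

// Pick the class whose template is closest to the input bitmap.
function recognizeChar(bitmap) {
  let bestChar = null;
  let bestDist = Infinity;
  for (const [ch, tpl] of Object.entries(TEMPLATES)) {
    const d = hamming(bitmap, tpl);
    if (d < bestDist) { bestDist = d; bestChar = ch; }
  }
  return bestChar;
}

// Characters are recognized one at a time and appended to the output text.
const recognizeText = (bitmaps) => bitmaps.map(recognizeChar).join("");

// A slightly noisy "L" (one extra ink pixel) is still matched correctly.
console.log(recognizeText([TEMPLATES.T, ["#..", "#..", "#..", "#.#", "###"]])); // TL
```

The resulting string is what the second stage would pass to the speech synthesizer.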
Tools used:
● HTML
● CSS
● JavaScript
Coding:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text To Speech Converter</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <div class="hero">
    <h1>Text to Speech <span>Converter</span></h1>
    <!-- Text the user wants to hear -->
    <textarea placeholder="Write anything here..."></textarea>
    <div class="row">
      <!-- Filled with the browser's available voices by script.js -->
      <select></select>
      <button><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/YouTube_Diamond_Play_Button.png/1024px-YouTube_Diamond_Play_Button.png" alt="">Listen</button>
      <!-- Loaded after the textarea, select and button so it can find them -->
      <script src="script.js"></script>
    </div>
  </div>
</body>
</html>
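JavaScript
// Text To Speech Converter: reads the textarea aloud with the Web Speech API.
const speech = new SpeechSynthesisUtterance();
let voices = [];
const voiceSelect = document.querySelector("select");

// Fill the <select> with the voices the browser provides.
function loadVoices() {
  voices = window.speechSynthesis.getVoices();
  speech.voice = voices[0] || null;
  voices.forEach((voice, i) => (voiceSelect.options[i] = new Option(voice.name, i)));
}

// Some browsers load voices asynchronously and fire "voiceschanged";
// others return them immediately, so do both.
loadVoices();
window.speechSynthesis.onvoiceschanged = loadVoices;

// Use the voice the user picks from the list.
voiceSelect.addEventListener("change", () => {
  speech.voice = voices[voiceSelect.value];
});

// Speak the current contents of the textarea when "Listen" is clicked.
document.querySelector("button").addEventListener("click", () => {
  speech.text = document.querySelector("textarea").value;
  window.speechSynthesis.cancel(); // stop anything that is still being spoken
  window.speechSynthesis.speak(speech);
});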
CSS
.hero {
  width: 100%;
  min-height: 100vh;
  background: linear-gradient(45deg, #010758, #490d61);
  display: flex;
  align-items: center;
  justify-content: center;
  flex-direction: column;
  color: #fff;
}
.hero h1 {
  font-size: 45px;
  font-weight: 500;
  margin-top: -50px;
  margin-bottom: 50px;
}
.hero h1 span {
  color: #ff2963;
}
textarea {
  width: 600px;
  height: 250px;
  background: #403d84;
  color: #fff;
  font-size: 15px;
  border: 0;
  outline: 0;
  padding: 20px;
  border-radius: 10px;
  resize: none;
  margin-bottom: 30px;
}
textarea::placeholder {
  font-size: 16px;
  color: #ddd;
}
.row {
  width: 600px;
  display: flex;
  align-items: center;
  gap: 20px;
}
/* "Listen" button: pink pill with the play icon beside the label */
button {
  background-color: #ff2963;
  color: #fff;
  font-size: 16px;
  padding: 10px 30px;
  border-radius: 35px;
  border: 0;
  outline: 0;
  cursor: pointer;
  display: flex;
  align-items: center;
}
button img {
  width: 16px;
  margin-right: 10px;
}
/* Voice list: native arrow hidden and replaced by an icon on the right */
select {
  flex: 1;
  color: #fff;
  background: #403d84;
  height: 50px;
  padding: 0 20px;
  outline: 0;
  border: 0;
  border-radius: 35px;
  appearance: none;
  background-image: url(https://static.thenounproject.com/png/1123247-200.png);
  background-repeat: no-repeat;
  background-size: 15px;
  /* calc() needs spaces around "-" to be valid */
  background-position-x: calc(100% - 20px);
  background-position-y: 20px;
}
Testing:
● In this phase, we test our code and debug it if necessary.
● After debugging, we make corrections to the code and run it again.
● The program now displays the results, and we move to the next
phase.
The Voice Processing Module
In this module, text is converted to speech. The output of OCR
is the text, which is stored in a file (speech.txt). Here, the
Festival software is used to convert the text to speech. Festival
is an open-source text-to-speech (TTS) system [7, 8], which is
available in many languages. In this project, the English TTS
system [9–11] is used for reading the text.
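As a sketch of how the stored OCR output could be handed to Festival from Node.js, assuming the `festival` command-line program is installed and on the PATH: `festival --tts <file>` reads a text file aloud. The helper names `festivalCommand` and `speakFile` are hypothetical and not part of the project code.

```javascript
// Build the command line separately so it can be checked without
// actually running Festival.
function festivalCommand(textFile) {
  return { cmd: "festival", args: ["--tts", textFile] };
}

// Run Festival synchronously on the text file; returns true on success.
// Assumes the festival binary is installed and on the PATH.
function speakFile(textFile) {
  const { spawnSync } = require("child_process");
  const { cmd, args } = festivalCommand(textFile);
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  if (result.error) {
    // e.g. ENOENT when Festival is not installed
    console.error(`Could not start ${cmd}: ${result.error.message}`);
    return false;
  }
  return result.status === 0;
}

// Usage: speakFile("speech.txt");
```

Festival also ships a `text2wave` script (for example `text2wave speech.txt -o speech.wav`) for writing the speech to a WAV file instead of playing it.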
Output
FUTURE SCOPE
One area for further work is the implementation of a
text-to-speech system on other platforms, such as telephony
systems, ATMs, video games and any other platform where
text-to-speech technology would be an added advantage and
increase functionality.
Conclusion:
Text-to-speech synthesis is a rapidly growing area of
computer technology and plays an increasingly important role
in the way we interact with systems and interfaces across a
variety of platforms. We have identified the various operations
and processes involved in text-to-speech synthesis. We have
also developed a very simple and attractive graphical user
interface which allows the user to type his/her text into the
text field of the application. Our system interfaces with a
text-to-speech engine developed for American English. Similar
synthesis systems already exist for other languages, e.g.
Konkani, Vietnamese and Telugu.
REFERENCES
1. Dutoit, T., Pagel, V., Pierret, N., Bataille, F., van der
Vrecken, O., 1996. The MBROLA Project: Towards a set of
high quality speech synthesizers free of use for non-commercial
purposes. ICSLP Proceedings.
2. Text-to-speech (TTS) Overview. In Voice RSS Website.
Retrieved February 21, 2014, from http://www.voicerss.org/tts/
3. Text-to-speech technology. In Linguatec Language Technology
Website. Retrieved February 21, 2014, from
http://www.linguatec.net/products/tts/information/technology
4. Dutoit, T., 1997. High-Quality Text-to-Speech Synthesis: An
Overview. Journal of Electrical and Electronics Engineering,
Australia 17, 25–36.