Madhuben & Bhanubhai Patel Institute of Technology
(A Constituent College of CVM University)
New V. V. Nagar
INFORMATION TECHNOLOGY DEPARTMENT
Seminar Report
on
Voice Assistant
Submitted By
Name of Student : BARIYA RIYABEN ANILBHAI
Enrolment Number : 12102080701102
SEMINAR (102040404)
A.Y. 2022-23 EVEN TERM
CERTIFICATE
This is to certify that the seminar report submitted along with the seminar entitled
“VOICE ASSISTANT” has been carried out by “RIYA BARIYA” under my
guidance in partial fulfillment of the requirements of the 4th Semester of Information
Technology at Madhuben and Bhanubhai Patel Institute of Technology, under Charutar
Vidya Mandal University, Vallabh Vidyanagar, during the academic year 2022-23.
Table of Contents
Acknowledgements
Abstract
1. Introduction
1.1. Context
1.2. Aim and Purpose
1.3. Method and Resources
2. Purpose, Scope and Applicability
2.1. Purpose
2.2. Scope
2.3. Applicability
3. Requirement and Analysis
3.1. Problem Definition
3.2. Feasibility Study
3.3. Technical Feasibility
4. Technology Behind Voice Assistants
4.1. Voice Recognition
4.2. Artificial Intelligence
4.3. Machine Learning
5. Benefits of Voice Assistants
5.1. Efficiency and Safety
5.2. Quick Learning Curve
5.3. Wider Device Integration
6. Popular Voice Assistants
References
ACKNOWLEDGEMENTS
The success and outcome of this project were made possible by the guidance and
support of many people. I am incredibly privileged to have received this support
throughout my project. It required a lot of effort from each individual involved in
this project with me, and I would like to thank them.
I appreciate and thank Prof. Pratik Soni for granting me the opportunity to do this
activity and for providing all the support and help that enabled me to complete my
work duly. I am thankful to our respected Principal and all faculty members who
directly or indirectly helped me with this.
I am thankful and fortunate to have received consistent encouragement, support and
supervision from all the technical staff, which helped me complete my activity work.
I would also like to extend my sincere gratitude to all the laboratory staff for
their timely support.
Riya Bariya
ABSTRACT
In today's world, most Internet applications still authenticate users with a
traditional text-based password. Designing a password-based method that is both
secure and user-friendly has long been on the agenda of security researchers. For
example, password manager programs make it easy to create strong site-specific
passwords from a single user's password, eliminating the memory burden caused by
multiple passwords.
We offer different levels of authentication, namely Textual Authentication, Image
Authentication and Audio Authentication, to provide better security for
applications. In the textual step, the user selects a username and password while
registering. During login, the user has to enter the registered username and
password; if they match the database, the user can log in to the system.
In the Image Authentication model, the user provides an image at the time of
registration and selects parts of it, called cued points.
At the time of login, the user has to select the same image and the same parts of
it that he/she chose at the time of registration.
The most common computer authentication method is to use alphanumerical usernames
and passwords. This method has been shown to have significant drawbacks. For example,
users tend to pick passwords that can be easily guessed. On the other hand, if a
password is hard to guess, then it is often hard to remember.
With increasing technical advancements, the world is becoming digital at a
high pace and everything is happening online. From paying your bills to
booking tickets to paying the person sitting next to you, you prefer to pay
online. Not only payments but all activities, be it communication through e-mails
and messaging apps or keeping your documents in a digital locker, happen online.
With everything turning online, the risk of cybercrimes and privacy breaches is
also increasing. Passwords play a huge role in keeping your data safe on online as
well as offline platforms. Passwords are the default method of authentication to
get access to our accounts, and various types of authentication are available for
users to secure their accounts.
CHAPTER 1: INTRODUCTION
1.1:- Context
This project is based on Android application development and provides a personal
assistant that works through voice recognition or a text mode. The program includes
the following functions and services: calling services, text message transformation,
mail exchange, alarm, event handler, location services, music player service,
weather checking, Google search engine, Wikipedia search engine, robot chat, camera,
Bing translator, Bluetooth headset support, help menu and Windows Azure cloud
computing. As it integrates most of the mobile phone services used daily, it could
make life more convenient, and it will be helpful for people with disabilities that
make manual operation difficult. This is also part of the reason why it was chosen
as the degree project.
A voice assistant is a digital assistant that uses voice recognition, language
processing algorithms, and voice synthesis to listen to specific voice commands and
return relevant information or perform specific functions as requested by the user.
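To make "listen to specific voice commands and perform specific functions" concrete, the following Python sketch routes an already-transcribed command to a handler by keyword. It is a minimal illustration under stated assumptions, not the project's implementation: the handler names (call_service, alarm_service, weather_service) and the keyword table are hypothetical, and speech recognition is assumed to have already turned the voice command into text.

```python
import re
from typing import Callable, Dict

# Hypothetical service handlers. In a real Android app these would call the
# telephony, alarm or weather services instead of returning strings.
def call_service(command: str) -> str:
    return "Starting a phone call"

def alarm_service(command: str) -> str:
    return "Setting an alarm"

def weather_service(command: str) -> str:
    return "Fetching the weather forecast"

# Illustrative keyword-to-handler table (not the project's actual mapping).
HANDLERS: Dict[str, Callable[[str], str]] = {
    "call": call_service,
    "alarm": alarm_service,
    "weather": weather_service,
}

def dispatch(command: str) -> str:
    """Route an already-transcribed command to the first matching handler."""
    words = set(re.findall(r"[a-z']+", command.lower()))
    for keyword, handler in HANDLERS.items():
        if keyword in words:
            return handler(command)
    return "Sorry, I did not understand that."

print(dispatch("Call mom"))                    # Starting a phone call
print(dispatch("What is the weather today?"))  # Fetching the weather forecast
```

Keyword matching keeps the example short but is brittle ("Don't call me" would still start a call); assistants typically rely on a trained intent classifier for this step.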
1.2:-Aim and Purpose
According to the overall description in the context, the purpose of the project is to
develop an Android application that provides an intelligent voice assistant with the
following functionalities: calling services, message transformation, mail exchange,
alarm, event handler, location services, music play service, weather checking, search
engines (Google, Wikipedia), camera, Bing translator, Bluetooth headset support, help
menu and Windows Azure cloud computing. Many years ago, software programs were
developed for and run on the computer. Nowadays, smartphones are widely used. About
35 percent of Americans have some sort of smartphone. This shows that the market is
growing fast and that smartphones are gaining more capabilities because of this wide
use.
Therefore, software development for the smartphone is very promising. The
smartphone is operated through gestures and the keyboard, and relying entirely on
manual input is not convenient for all users.
1.3:-Method and resources
This project mainly concerns Android application development: requesting calls
between different Android applications, human-mobile phone interaction, and
database creation and management. The program will reference many APIs from Google
and Wikipedia and draw on Android development skills.
The project also includes an investigation of existing products in this area and of
trends in voice products and personal assistant development. Two popular and
representative products were mainly investigated: the English product “Siri” and
the Chinese product “iFly”.
The investigation focused on how those ideas originated; what functionalities and
services the products have; how they provide these services to customers; testing
the products and related functions to understand their architecture, structure and
logical algorithms; how they are spread and promoted in marketing; and how they are
refined and upgraded across versions. Table-1 shows a comparison of some basic
functions of “Siri” and “iFly”.
Table-1: Comparison of basic functions between Siri and iFly
Function               Siri                                  iFly
Call Service           Yes                                   Yes
SMS Message Service    Yes                                   Yes
Open Application       No                                    Yes
Web Search Service     Google Search Engine                  Baidu Search Engine
Reminder               24 h                                  Unlimited
Music Play             Local library                         Local + remote library
Command Text Modify    Yes                                   No
Language               English, French, German, Japanese     Chinese
… have. This may be done in a business environment, for example on a business
website with a chat interface. Virtual assistants can save you a tremendous amount
of time: we spend hours on online research and then on writing the report in our
own words.
One of the main advantages of voice searches is their rapidity. In fact, voice is
reputed to be about four times faster than a written search: whereas we can write
about 40 words per minute, we are capable of speaking around 150 (150 / 40 ≈ 3.75)
during the same period of time.15 In this respect, the ability of personal
assistants to accurately recognize spoken words is a prerequisite for them to be
adopted by consumers.
CHAPTER 2: PURPOSE, SCOPE AND APPLICABILITY
2.1:-PURPOSE
The purpose of a virtual assistant is to be capable of voice interaction, music
playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks,
and providing weather, traffic, sports, and other real-time information, such as news.
Virtual assistants enable users to speak natural-language voice commands in order to
operate the device and its apps. Consumers, and millennials in particular, show
increased overall awareness of voice assistants and a higher level of comfort with
them. In this ever-evolving digital world where speed, efficiency, and convenience
are constantly being optimized, we appear to be moving towards less screen
interaction.
2.1.1:-PURPOSE OF VOICE ASSISTANTS
The primary purpose of developing voice recognition software is to make
everyday life easier in business and private life. Voice assistants can perform
many tasks for your business while you are occupied with something else. In
addition, they can replace a human being in completing specific tasks, such as:
answering questions, making calls, creating to-do lists, and much more. They,
therefore, allow you to save time and help you multitask.
An essential aspect of speech recognition software is that it can significantly
facilitate the functioning of people with disabilities. The assistant can search for
and read information, run various functions, or even order a taxi using only
voice commands. This significantly facilitates accessibility for people with
disabilities, as they do not need to be able to operate a screen or see specific
information at all.
2.1.2:-HOW VOICE ASSISTANTS WORK
Communicating with a voice assistant is so simple it almost makes us forget
how impressive the technology behind it actually is.
Fig.2.1.2
If you’ve ever used a voice assistant on your phone, then you know that it first
needs a keyword that activates it, like ‘OK, Google’. When you ask your voice
assistant a question or give it a command, Automatic Speech Recognition (ASR)
recognises your speech and converts it to text.
Now that the machine knows what was said and (presumably) what your intention was,
it can search for a valid answer and respond accordingly. The response is converted
from written to spoken form using text-to-speech (TTS) technology, and voilà: your
voice assistant speaks.
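The three stages described above (ASR, finding an answer, TTS) can be sketched as a single request/response turn. This Python sketch is illustrative only: the stage functions are passed in as parameters, and fake_asr, fake_answer and fake_tts are hypothetical placeholders that treat "audio" as UTF-8 text so the example runs. A real assistant would plug in actual speech recognition and speech synthesis engines, and keyword activation is assumed to have already happened.

```python
from typing import Callable

def run_turn(audio: bytes,
             asr: Callable[[bytes], str],
             answer: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """One request/response turn: speech -> text -> answer -> speech."""
    text = asr(audio)     # Automatic Speech Recognition: speech to text
    reply = answer(text)  # search for a valid answer to the request
    return tts(reply)     # Text-to-Speech: written reply to spoken form

# Toy stand-ins so the sketch runs: "audio" is just UTF-8 text here.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_answer(text: str) -> str:
    return "It is sunny." if "weather" in text.lower() else "Sorry?"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

print(run_turn(b"What is the weather?", fake_asr, fake_answer, fake_tts))
# b'It is sunny.'
```

Passing the stages in as functions keeps the pipeline order explicit and lets each engine be swapped or tested on its own.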
2.2:-SCOPE
Voice assistants will continue to offer more individualized experiences as they get
better at differentiating between voices. However, it’s not just developers who need
to address the complexity of developing for voice: brands also need to understand
the capabilities of each device and integration, and whether it makes sense for
their specific brand. They will also need to focus on keeping the user experience
consistent in the coming years as complexity becomes more of a concern, because
voice assistants lack a visual interface. Users simply cannot see or touch a voice
interface.
There is a range of things that voice assistants can do, which include:
Control thermostats, lights and locks
Send email/text messages or initiate calls
Buy items online
Locate lost smartphones or other devices
Check traffic conditions and map travel routes
2.2.1:-SCOPE OF VOICE ASSISTANTS IN EVERYDAY LIFE
Voice assistants are bots powered by artificial intelligence, voice recognition,
and natural language processing (NLP) to answer questions and hold
conversations audibly. While text-based interfaces require machines to process
text, analyze it, and map out a response, voice assistants do this audibly. In
simple terms, you could speak to voice assistants out loud instead of having to
click on call-to-action buttons or type out your question.
2.2.2:-SOME BOTS USE PASSIVE LISTENING
Voice assistants like Alexa, Cortana, and other consumer-facing bots are
considered passive listening devices. This essentially means the assistant is
constantly monitoring its surroundings for trigger words. Once the trigger word is
detected, the assistant is activated. Other voice assistants like Siri or Google
Assistant have options to either be passive listeners or be tap/touch activated.
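The difference between passive listening and tap/touch activation can be sketched as a small piece of state. This is a simplified illustration, not how any named product works internally: the Activation class, the Mode enum and the trigger phrase "hey assistant" are hypothetical, and transcribed text stands in for audio, whereas real trigger-word detection typically works on the audio signal itself.

```python
from enum import Enum

class Mode(Enum):
    PASSIVE = "passive"  # keeps monitoring its surroundings for a trigger word
    TAP = "tap"          # ignores speech until the user taps or touches

class Activation:
    def __init__(self, mode: Mode, trigger: str = "hey assistant"):
        self.mode = mode
        self.trigger = trigger  # hypothetical trigger phrase
        self.active = False

    def hear(self, snippet: str) -> bool:
        """Feed a snippet of background speech (text stands in for audio)."""
        if self.mode is Mode.PASSIVE and self.trigger in snippet.lower():
            self.active = True
        return self.active

    def tap(self) -> bool:
        """A tap/touch activates the assistant in either mode."""
        self.active = True
        return self.active

passive = Activation(Mode.PASSIVE)
print(passive.hear("music in the background"))     # False
print(passive.hear("Hey assistant, set a timer"))  # True

tap_only = Activation(Mode.TAP)
print(tap_only.hear("hey assistant"))  # False
print(tap_only.tap())                  # True
```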
2.2.3:-VOICE RECOGNITION FOLLOWS:-
The bot has been activated and now it’s ready to listen, but how exactly does it
know what it’s listening to? This is made possible with voice recognition
software, a subset of artificial intelligence and deep learning. Sound waves are
converted into structured, more understandable data for the machine to
process. Everything from tone, pitch, volume, and the precision of speech will
be factored in with voice recognition.
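As a rough illustration of turning sound waves into structured data, the sketch below splits a digitized signal into short overlapping frames and computes two simple per-frame features: RMS amplitude (a volume measure) and zero-crossing rate (a crude cue related to pitch and noisiness). The 400-sample frames and 160-sample hop (25 ms and 10 ms at 16 kHz) are common choices assumed here, and the 200 Hz tone is a stand-in for recorded speech; practical recognizers usually compute richer features such as MFCCs or filter-bank energies.

```python
import math
from typing import List, Tuple

def frames(samples: List[float], size: int, hop: int) -> List[List[float]]:
    """Split a digitized signal into overlapping fixed-length frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]

def rms(frame: List[float]) -> float:
    """Root-mean-square amplitude: a simple measure of volume."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def features(samples: List[float], size: int = 400,
             hop: int = 160) -> List[Tuple[float, float]]:
    """One (volume, zero-crossing rate) pair per frame."""
    return [(rms(f), zero_crossing_rate(f)) for f in frames(samples, size, hop)]

# 0.1 s of a 200 Hz tone sampled at 16 kHz, standing in for recorded speech.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 10)]
feats = features(tone)
print(len(feats))             # 8 frames of 400 samples, 160 apart
print(round(feats[0][0], 3))  # 0.354, i.e. 0.5 / sqrt(2) for a pure sine
```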
2.3:-APPLICABILITY
The mass adoption of artificial intelligence in users’ everyday lives is also
fuelling the shift towards voice. The growing number of IoT devices, such as smart
thermostats and speakers, is giving voice assistants more utility in a connected
user’s life. Smart speakers are the number one way we are seeing voice being used.
Many industry experts even predict that nearly every application will integrate
voice technology in some way in the next 5 years. The use of virtual assistants can
also enhance IoT (Internet of Things) systems. Some predict that, twenty years from
now, Microsoft and its competitors will be offering personal digital assistants that
provide the services of a full-time employee, services usually reserved for the rich
and famous.
Fig.2.3
2.3.1:-VOICE SEARCH
Voice search is, arguably, the most common use of voice recognition. Reportedly, in
2022, 135.6M users in the US alone were expected to use a digital assistant at least
once a month.
Moreover, according to a PwC survey, using a voice assistant to search for
something was the preferred method of 71% of participants (Figure 1).
Source: PwC.
2.3.2:-SPEECH TO TEXT
Voice recognition enables hands-free computing. Its use cases include, but are not
limited to:
Writing emails
Composing a document on Google Docs
Automatic closed captioning with speech recognition (e.g., YouTube)
Automatic translation
Sending texts (Figure 2)
Source: PwC
CHAPTER 3: REQUIREMENT AND ANALYSIS
System analysis is about completely understanding existing systems and finding
where they fail; a solution is then determined to resolve those issues in the
proposed system. System analysis defines the system and divides it into smaller
parts, whose functions and interrelations are then studied.
The complete analysis follows below.
3.1:-Problem definition
Usually, a user needs to manually manage multiple applications to complete one
task. For example, a user trying to make a travel plan needs to check the airport
codes of nearby airports and then check travel sites for tickets between
combinations of airports to reach the destination. There is a need for a system that
can manage such tasks effortlessly. We already have multiple virtual assistants, but
we hardly use them. In addition, a number of people have problems with voice
recognition.
3.1.1:-FISHBONE DIAGRAM
The fishbone methodology allows you to visualize an issue as having multiple
root causes that you can classify together. This diagram assesses all the possible
causes of a problem and breaks down its components and symptoms. These
aspects of an issue may relate to controls, technology, culture, procedures,
processes, and the environment.
While these are general concepts, they apply uniquely to different problems. A
fishbone analysis is usually the final output of a problem analysis. You can use
the diagram to develop strategies and make recommendations about how to fix
a problem. For instance, in the case of a power outage, the fishbone diagram
may represent the various causes of the problem, including failed incident
management.
3.1.2:-CAUSE AND EFFECT ANALYSIS
This method of analysis approaches a problem by examining its causes and
effects. It explores both the direct and the indirect causes and effects
of the problem. By establishing a linear connection from the root cause of a
problem to its subsequent effects, you can better understand it.
This analysis is highly comprehensive, as it requires you to identify all the causes
of an issue while also determining their level of contribution to the problem.
Using a cause-and-effect analysis means considering that one problem may
have multiple effects that may be difficult to trace. This method of analysis is
most beneficial if you have proper access to and understand all the material
factors surrounding a problem.
3.2:-FEASIBILITY STUDY
A feasibility study can help you determine whether or not you should proceed with
your project. It is essential to evaluate the costs and benefits of the proposed
system. Five types of feasibility study are taken into consideration.
3.3:-TECHNICAL FEASIBILITY
It includes identifying the technologies for the project, both hardware and
software. For a virtual assistant, the user must have a microphone to convey their
message and a speaker to listen when the system speaks. These are very cheap
nowadays, and almost everyone owns them. Besides, the system needs an internet
connection, so while using it, make sure you have a steady connection. This is also
not an issue in an era where almost every home or office has Wi-Fi.
3.3.1:-PURPOSE OF A TECHNICAL FEASIBILITY STUDY
A technical feasibility study helps find the answers to the following questions:
Is it possible to develop the product with the available technology in the
company?
Is the organisation equipped with the necessary technology for project
completion?
Are there technically strong employees who can deliver the product on time
and within budget using the available technology?
Is there scope in the company's budget to add more technical resources?
CHAPTER 4: TECHNOLOGY BEHIND VOICE ASSISTANTS
Voice assistants use Artificial Intelligence and Voice recognition to
accurately and efficiently deliver the result that the user is looking
for. While it may seem simple to ask a computer to set a timer, the
technology behind it is fascinating.
4.1:-VOICE RECOGNITION
Voice recognition works by taking an analog signal from a user's voice and turning it
into a digital signal. After doing this, the computer takes the digital signal and
attempts to match it to words and phrases to recognize the user's intent. To do this,
the computer requires a database of pre-existing words and syllables in a given
language against which the digital signal can be closely matched. Checking the input
signal against this database is known as pattern recognition, and is the primary
force behind voice recognition.
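Matching an input against a database of stored words can be illustrated with dynamic time warping (DTW), a classic template-matching technique for isolated-word recognition that tolerates a word being spoken faster or slower than the stored version. DTW is swapped in here for illustration; the text does not say which matching method any product uses, and the one-number-per-frame templates for "yes" and "no" are hypothetical.

```python
from typing import Dict, List

def dtw_distance(a: List[float], b: List[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # b[j-1] reused (a is slower)
                                 cost[i][j - 1],      # a[i-1] reused (b is slower)
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(features: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the stored word whose template is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical "database" of word templates, e.g. per-frame loudness values.
templates = {
    "yes": [0.1, 0.8, 0.9, 0.2],
    "no": [0.1, 0.3, 0.3, 0.1],
}

# A slower utterance of "yes": the same shape, stretched in time.
print(recognize([0.1, 0.1, 0.7, 0.8, 0.9, 0.9, 0.2], templates))  # yes
```

Modern recognizers generally replace fixed templates with statistical or neural acoustic models, but the core idea of scoring the input against known patterns is the same.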
4.1.1:-SPEECH RECOGNITION ALGORITHMS
The vagaries of human speech have made development challenging. It’s
considered to be one of the most complex areas of computer science –
involving linguistics, mathematics and statistics. Speech recognizers are made
up of a few components, such as the speech input, feature extraction, feature
vectors, a decoder, and a word output. The decoder leverages acoustic models,
a pronunciation dictionary, and language models to determine the appropriate
output. Speech recognition technology is evaluated on its accuracy rate, i.e.
word error rate (WER), and speed.
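Word error rate is normally computed as WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, and N is the number of words in the reference. The Python sketch below computes it with a word-level edit distance; the example sentences are made up for illustration. Because insertions count, WER can exceed 1.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("an") and one substitution ("seven" -> "eleven") over 5 words.
print(word_error_rate("set an alarm for seven", "set alarm for eleven"))  # 0.4
```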
4.2:-ARTIFICIAL INTELLIGENCE
Artificial intelligence is the use of machines to simulate and replicate human
intelligence.
In 1950, Alan Turing published his paper “Computing Machinery and Intelligence”,
which asked the question: can machines think? In it, Turing proposed what became
known as the Turing Test, a method of evaluating whether a computer can exhibit
behavior indistinguishable from that of a human. Four approaches to defining AI were
later developed: thinking humanly, thinking rationally, acting humanly and acting
rationally. The first two deal with reasoning, while the last two deal with actual
behavior.
4.3:-Machine Learning
Machine learning refers to the subset of Artificial Intelligence in which programs are
not written entirely by hand by human coders. Instead of writing out the complete
program on their own, programmers give the AI “patterns” to recognize and learn from,
and then give it large amounts of data to sift through and study. So instead of
following specific hand-written rules, the AI searches for patterns within this data
and uses them to improve its existing functions.
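To illustrate learning from examples rather than from hand-written rules, the sketch below trains a tiny multinomial naive Bayes classifier that maps sentences to intents. Naive Bayes is chosen only because it is short and self-contained; it is not claimed to be what any voice assistant uses, and the training sentences and intent labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Model = Tuple[DefaultDict[str, Counter], Counter, Set[str]]

def train(examples: List[Tuple[str, str]]) -> Model:
    """Learn word counts per intent from labelled (sentence, intent) pairs."""
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    intent_counts: Counter = Counter()
    for sentence, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(sentence.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocab

def predict(model: Model, sentence: str) -> str:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, in log space."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best, best_score = "", -math.inf
    for intent, n in intent_counts.items():
        denom = sum(word_counts[intent].values()) + len(vocab)
        score = math.log(n / total)  # prior: how common the intent is
        for w in sentence.lower().split():
            score += math.log((word_counts[intent][w] + 1) / denom)
        if score > best_score:
            best, best_score = intent, score
    return best

examples = [
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
    ("wake me up at six", "alarm"),
    ("set an alarm for seven", "alarm"),
]
model = train(examples)
print(predict(model, "set an alarm for tomorrow"))  # alarm
print(predict(model, "will it rain today"))         # weather
```

No rule says that "alarm" means the alarm intent; the association comes entirely from the labelled examples, and adding more examples changes the predictions without editing any code.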
CHAPTER 5: BENEFITS OF VOICE ASSISTANTS
Some examples of what a Voice Assistant can do include:
Check the weather
Turn on/off connected smart devices
Search databases
One of the main reasons for the growing popularity of Voice User Interfaces (VUI) is
that mobile software keeps growing more complex while screen sizes do not, which puts
the Graphical User Interface (GUI) at a significant disadvantage.
5.1:-EFFICIENCY AND SAFETY
While typing has become much faster as people have gotten used to standard
keyboards, speaking is usually quicker and more natural, and it leads to fewer
spelling errors. This makes for a much more efficient and natural workflow.
5.2:-QUICK LEARNING CURVE
One of the greatest benefits of voice assistants is a quick learning curve. Instead of
having to learn how to use devices like mice and touch screens and get used to using
specific physical devices, you can just use your natural conversation tendencies and
use your voice.
5.3:-WIDER DEVICE INTEGRATION
Since a screen or keyboard isn’t necessary, it’s easy to build voice integration into
a much wider array of devices. In the future, smart glasses, furniture and appliances
could all come with voice assistants already integrated into them.
CHAPTER 6: POPULAR VOICE ASSISTANTS
There’s a variety of voice assistants currently on the market, but some of them have become so
widely used, it’s pretty much a given you’ve heard about them in one way or another.
Let’s take a closer look at what they have to offer.
Siri
Apple’s Siri was the first voice assistant to become available on such a large scale.
Being a mobile-based voice assistant, Siri is able to perform lots of different tasks,
like making phone calls, sending text messages, taking photos, answering questions and showing
weather forecasts. Additionally, Siri is also used for devices compatible with HomeKit – Apple’s
home automation system.
Google Assistant
Google Assistant was introduced into the market in 2016, and deployed to Android devices a
year afterwards, making it a natural rival to Siri. The tasks Google Assistant can perform don’t
vary much from what Siri is capable of, though it is sometimes referred to as ‘the smarter’ out of
the two. Google Assistant can also be used to control smart devices compatible with Google
Home.
Amazon Alexa
Unlike Google Assistant and Siri, Alexa is primarily speaker-based: rather than being
built into your smartphone, it mainly receives your commands through Amazon Echo
speakers. It is often considered the best voice assistant for home automation, as it
integrates with a large variety of smart home devices.
Microsoft Cortana
Developed in 2014, Cortana is an assistant created mainly for the purpose of Windows 10 and
Microsoft 365 products, making it possible for users to organise and manage their work more
efficiently. Even though it is similar to Siri and Google Assistant in its abilities, Microsoft has
been struggling to keep up with the big players, and as of 2021 Cortana is no longer supported
for iOS and Android.