THRUVOX: Innovative Language App Report
“ THRUVOX ”
Group Leader : Thilakesan Vishnujan
Group Members
I solemnly declare that the content presented in this individual report is the result of my own work and
research. The information provided is accurate and authentic to the best of my knowledge and it has not
been submitted before nor is it being submitted for any other degree programme. This report has been
prepared as part of SDGP at University of Westminster, UK.
THRUVOX - SE- 80 2
Abstract + Keywords
THRUVOX is an application that translates English PDF documents into Tamil and summarizes them.
This report explains how the tool works, how it overcomes language barriers and how it makes
information easier to access.
The application combines natural language processing (NLP) methods with machine learning
models to perform translation and summarization. Beyond the technical components, the report
describes a user-focused design intended to make the experience easy and enjoyable, and shows
how the combination of new ideas and everyday use can turn a complex PDF into a simple
translation.
A comparison study shows how the application performs against the tools that are available
today. The aim is a world where language is no longer an obstacle but a connection between
people, and the design targets an English-to-Tamil translation and summarization experience
that existing tools do not yet offer. The report invites the reader to explore these ideas and
how they can change the way written information is understood across languages.
Acknowledgement
First off we would like to thank our parents and family for their support, continuously inspiring us
to be the best versions of ourselves and motivating us through all hardships. We would also like
to express our sincere gratitude to Mr. Banuka Athuraliya and the entire Software Development
Group Project module team for always making time to attend to each and every problem we have
encountered throughout this module, the support they have given and for their teachings. We
would also like to extend our gratitude to Miss Sapna Kumarapathirage, our mentor for always
providing us with valuable insight and always directing us the right way when we were facing
challenges. Special thanks also go to all those who took the time to fill out the
questionnaire and to everyone who listened to and answered our countless queries regarding
different aspects of the project.
Table of Contents
6.3.3. UI Design and mockups – des/high fidelity prototype
6.3.4. Activity Diagram
6.4. Chapter Summary
Reference
Appendix
Table of Figures
List of Table
Abbreviations Table
Abbreviations Explanation
AI Artificial Intelligence
FR Functional Requirement
NR Non-Functional Requirement
ML Machine Learning
SDGP Software Development Group Project
UI User Interface
UX User Experience
Chapter 4: System Requirements Specification
This chapter focuses on how the web application is expected to perform and on the
functionalities of the software. It includes a stakeholder analysis presented as an onion
model, the techniques used for requirements elicitation, use case diagrams that indicate the
high-level functions and the scope of the system, and the functional and non-functional
requirements that describe what the web application does and the quality attributes of the
software.
Stakeholder Viewpoint
Functional beneficiary
User Those who use the developed system for their own needs,
e.g. education or personal use.
Financial beneficiary
Social beneficiary
Operational beneficiary
System Admin Wants to monitor the web application and the system to
maintain the standards and objectives of the project
Negative Stakeholders
Regulatory
Neighboring systems
Brainstorming Method 01
Brainstorming sessions can be conducted with our own group members, colleagues and experts to
collect ideas for expanding the system and adding extra features that suit the customer and
support a user-friendly design. Although brainstorming is a time-consuming process, it can be
very effective, because many big ideas emerge when people pull together.
Through the literature review, attention will be focused on comparing similar systems that
fall within the scope of PDF translation. Finding valid library resources is a concern that
IEEE and other online resources can help to address. The review examines which features and
functionalities similar systems offer, along with their strengths and weaknesses. This will
provide a guide to what the Thruvox prototype should have and what is not suitable.
An online questionnaire will be sent out to the general public, mainly aimed at the users who are
expected to use the translation website after implementation. Questionnaires are an efficient
way to gather requirements from a large number of stakeholders within a short time.
This questionnaire will help us to understand user expectations as well as the behavioral
patterns of users, which will be taken into consideration when implementing THRUVOX. It will
also show whether there are any other features that users expect from this kind of web
application.
Interviews with Tamil professors and domain translation experts can help to clarify
uncertainties before the prototype application is implemented. Unlike other requirement
elicitation techniques, an interview lets us query participants directly and get points
clarified without much hassle. The interviewees will have the opportunity to express and
explain in detail the answers they provided in the questionnaire. This will help to identify
the views of students and how likely certain functions are to be required in the system. A
conversation with experts will give better direction and technical knowledge about the
project. Depending on the situation, these interviews will be held face to face.
Observations Method 05
The system can be enriched further by observing user and domain behavior, and by checking that
the accuracy of its operation reaches a standard at which it can genuinely be said to work well.
● The questionnaire was distributed on November 25th, 2023 and 85 responses were
collected over a period of two weeks. The following information presents a detailed
analysis of the results of the questionnaire.
Description Allows the user to upload a PDF file to the system for
translation and summarization
Priority High
Supporting Actors -
Priority High
Supporting Actors -
Description Allows the user to translate a previously uploaded PDF file into
the selected language.
Priority High
Supporting Actors -
Post Conditions Translated PDF file is generated and available for viewing or
download.
Supporting Actors -
fails.
Description Allows the user to view a preview of a PDF file before download.
Priority Medium
Supporting Actors -
Trigger -
Post Conditions User has a clear visual representation of the translated PDF
content.
Description Allows the user to modify the PDF file before download.
Priority Medium
Supporting Actors -
Trigger User clicks the "Edit PDF" button or link associated with a PDF
file
Main flow
Actors: User makes desired changes to the PDF content
System: System saves the edited PDF file
Post Conditions Edited PDF file is saved and available for further actions.
Description Allows the user to combine multiple PDF files into a single file
for streamlined translation
Priority Medium
Supporting Actors -
Pre-Conditions User must have uploaded multiple PDF files to the system
Trigger User selects the "Merge PDFs" option before the translation
Actors: User selects the PDF files to be merged
System: System merges the PDF files into a single, combined PDF
Post Conditions Merged PDF file is created and ready for translation.
Priority High
Supporting Actors -
Trigger User clicks the "Download PDF" button or link associated with
the desired file.
Main flow Actors System
Post Conditions PDF file is downloaded and saved on the user's device.
· Critical – The requirements that are critically needed in the successful completion
· Desirable – The requirements that can add value, but are not required immediately
· Luxury – The requirements that would add luxury to the system
FR1 | Upload PDF | Critical | The user must be able to upload a PDF document to the system for translation and summarization.
FR2 | Select Language | Critical | The user must be able to select the target language for translation (Tamil).
FR3 | Translate PDF from English to Tamil | Critical | The app must accurately translate the text content of a PDF document from English to Tamil.
FR4 | Summarize translated language PDF | Critical | The system must generate a concise and accurate summary of the translated Tamil text.
FR5 | Preview translated language PDF before download | Desirable | The user should be able to preview the translated PDF and summary before downloading.
FR6 | Edit translated language PDF before download | Desirable | The system could allow the user to edit the translated PDF document before downloading.
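As a rough illustration of FR1 and FR2, the sketch below checks an upload before it is passed on for translation. It is not the THRUVOX implementation: the UploadRequest type, the validate_upload function and the "ta" language code are assumptions introduced only for this example. The one PDF-specific fact it relies on is that PDF files begin with the %PDF- header.

```python
from dataclasses import dataclass

PDF_MAGIC = b"%PDF-"                 # PDF files begin with this header
SUPPORTED_TARGETS = {"ta": "Tamil"}  # assumption: Tamil is the only target language


@dataclass
class UploadRequest:
    """Hypothetical container for one upload (not taken from the report)."""
    filename: str
    content: bytes
    target_language: str


def validate_upload(request: UploadRequest) -> list[str]:
    """Return a list of problems; an empty list means the upload is acceptable."""
    problems = []
    if not request.filename.lower().endswith(".pdf"):
        problems.append("file name does not end in .pdf")
    if not request.content.startswith(PDF_MAGIC):
        problems.append("file content does not start with the PDF header")
    if request.target_language not in SUPPORTED_TARGETS:
        problems.append(f"unsupported target language: {request.target_language}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the interface show the user everything that needs fixing in a single message.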
NF3 Usability High The app should be easy to use and navigate
for users with varying technical expertise
This chapter focused on the system requirements. It covered the relevant stakeholders, the
elicitation techniques and how they were carried out, a use case diagram, the use case
descriptions, and the functional and non-functional requirements along with their priority
levels and descriptions.
Chapter 5: Social, Legal, Ethical and Professional Issues
In the previous chapter the System Requirement Specifications were explained and depicted
in detail. This chapter will elaborate how the main Social, Legal, Ethical and Professional
Issues are addressed and mitigated by the group in order to complete this project for the
Software Development Group Project.
The English to Tamil PDF translation and summarization project obtained datasets from several
platforms to ensure diverse and comprehensive data.
The translation dataset used in this work is obtained from the GitHub repository published by
the authors of the paper "Neural machine translation for Tamil to English" (Jain, Punia, Hooda),
which appeared in the Journal of Statistics and Management Systems in 2020. This repository,
last accessed June 12, 2021, provides an updated, high-quality corpus of 236,427 English-Tamil
sentence pairs. Notably, the authors continued to improve the dataset by adding more sentences
to keep it relevant and rich for training and analysis.
The data is split into six files, which makes it simple to share. In their experiments, the
authors designed and tested two different Encoder-Decoder architectures for translating Tamil
into English. Notably, the authors aimed to address challenges such as polysemous words, which
is a valuable contribution to the field. In addition, they performed several experiments,
including the use of pre-trained word embeddings and hyperparameter tuning, to improve
translation quality.
The research paper presents and discusses the trained model, which outperforms Google
Translator by a margin of 7.5 BLEU points. A qualitative evaluation by three Tamil scholars
provides nuanced insights into the accuracy of the translations, and the paper gives a
comparative analysis of the translations produced by Google Translator and by the proposed
Tamil translator. The citation for this dataset is given below:
@article{jain2020neural,
title={Neural machine translation for Tamil to English},
author={Jain, Minni and Punia, Ravneet and Hooda, Ishika},
journal={Journal of Statistics and Management Systems},
volume={23},
number={7},
pages={1251--1264},
year={2020},
publisher={Taylor \& Francis}
}
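The BLEU margin quoted for this paper can be made more concrete with a small worked example. The sketch below is a simplified, sentence-level BLEU against a single reference, written for illustration only: it applies no smoothing and is not the evaluation script used by the dataset authors, and the function names are assumptions. It returns a value between 0 and 1, whereas published BLEU scores are usually reported on a 0 to 100 scale, so a 7.5-point difference corresponds to 0.075 here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, in the range 0..1."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

A translation identical to its reference scores 1.0 and one with no shared words scores 0.0; real evaluations average over a whole test corpus rather than scoring single sentences.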
The summarization dataset used in this work is obtained from the Papers with Code repository,
specifically the XL-Sum dataset. This dataset is an important contribution to the abstractive
summarization field because it provides a multilingual collection of 1.35 million
article-summary pairs. The authors have actively updated the dataset, adding new material and
new languages such as Traditional Chinese and enhancing various other aspects.
The repository provides two versions of the dataset: the old version reported in the associated
paper and the new version recommended for use. The new version offers better structure,
larger evaluation splits, deduplication, and an increased size of 1.35 million pairs, making
XL-Sum the most extensive summarization dataset available to the public.
Importantly, the dataset is only available for non-commercial research purposes under the
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA
4.0). Users must comply with the license terms, and copyright in the dataset content belongs
to the original copyright holders.
The dataset authors ask users to cite the associated paper if they use any part of the
dataset, models, or code. The citation is as follows:
@inproceedings{hasan-etal-2021-xl,
title = "{XL}-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages",
author = "Hasan, Tahmid and
Bhattacharjee, Abhik and
Islam, Md. Saiful and
Mubasshir, Kazi and
Li, Yuan-Fang and
Kang, Yong-Bin and
Rahman, M. Sohel and
Shahriyar, Rifat",
booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "[Link]",
pages = "4693--4703",
}
The contents of this repository are likewise restricted to non-commercial research purposes
under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC
BY-NC-SA 4.0).
During the second-year project, it is important to ensure that all social, legal, ethical and
professional issues are addressed. The social, legal, ethical and professional issues related to
“Thruvox” are detailed below.
5.3.1 Social
Several important social factors need careful consideration when developing an app for
summarizing and translating documents from English to Tamil. The translation must represent
culture correctly and must not miss crucial details, which is why cultural sensitivity is
highlighted here. To appeal to all ages within the Tamil community, the app needs an underlying
structure that covers a diverse set of dialects and regional tones. A good grasp of the social,
economic and cultural background makes an accurate, relevant and complete summarization
possible. To maintain gender equality while translating documents, gender-specific terms
should be avoided so that stereotypes are not reinforced. Users might upload classified
information, which could then be exposed; therefore, precautions need to be taken. For the app
to be deemed user-friendly, careful consideration must be given to how accessible it is to
users with varying digital needs, abilities and skill levels. To maintain precision and avoid
misinformation, there should be quality control measures so that complex content can also be
assessed by humans. For
a user to better understand the interface and how it works, developers must provide
guidelines. By addressing these social issues, the English-to-Tamil summarization and
translation application can include everyone.
5.3.2 Legal
Building an app that translates English PDF documents into Tamil raises many legal concerns,
so legal professionals should be involved. Potential copyright issues must be addressed by
obtaining proper licenses for modifying content. Personal data can only be handled with the
user's permission, and essential preventative measures must be followed to ensure data safety.
The application should be used as a tool by professionals, and users need to know that
mistranslations or inaccurate summaries can give rise to liabilities. Terms of use should
clearly outline the rights and duties of users and providers and include accuracy disclaimers.
It is important to provide equal opportunities to everyone and to comply with policies that
support individuals with special needs, which also helps to prevent legal action. Government
compliance is crucial, particularly in regions that have specific language requirements or
guidelines for online applications. Unintended biases might lead to legal problems under
anti-discrimination laws, so developers must keep this in mind. To avoid potential risks and
to provide a clear scheme for access, an end-user license agreement should be formulated. To
make sure that the responsible organization complies with its legal obligations and reduces
risk, attorneys who specialize in these areas may be consulted.
5.3.3 Ethical
Creating software that translates English PDFs to Tamil and summarizes them requires
significant ethical decisions. The developers are responsible for making sure that there are
no biases in the algorithms. This is especially important for content on sensitive topics and
for a diverse audience. As the application will handle personal data, robust policies and
measures to protect this data need to be put in place. Users' consent must be gathered
beforehand, and there should be a consistent, transparent
mode of operation. To meet users' expectations and deal appropriately with the system's
limitations, there needs to be clear communication, through informed consent, about what the
system can and cannot do. Cultural differences must be respected to prevent bias and offence.
Legal issues such as copyright and intellectual property mean that developers need to obtain
the necessary permissions. To use data properly, a business must follow the General Data
Protection Regulation (GDPR) and store user data safely. Incorrect summaries can lead to
serious problems, so a service disclaimer should be included to limit these issues. The
application must be accessible to all individuals, so that people with disabilities are not
excluded and legal challenges are avoided. Developers should follow the established standards
for language services, translation and online applications. An end-user license agreement
(EULA) benefits both parties, as it sets out clear terms of use, limitations and disclaimers.
These ethical and professional demands must be met to ensure quality results when using this
application.
This chapter discussed the challenges to consider when deploying technology meant to translate
and summarize documents in PDF format. The professional aspect centres on abiding by ethical
standards and communicating responsibly about both the capabilities and the limitations of the
system. The chapter also gave details about how PDF files are adapted, so that readers
understand how the data in a document is transformed and summarized.
Chapter 6: System Architecture & Design
The architecture is a traditional three-tier web application for translating and summarizing
PDFs. The user interface sits on top, as a web page or mobile app, and interacts with the
middle tier, a business logic layer responsible for handling user requests and orchestrating
the translation and summarization tasks. The bottom tier consists of data storage and external
services, including the translation and summarization engines.
Class Diagram
A class diagram is a visual representation of the classes, attributes, and relationships in a
system. It is an important tool in software development as it helps to understand the structure
of a system before it is built. This class diagram portrays a software system designed for PDF
translation and summarization, built around four core classes: User Interface, Translate PDF,
NLP Processor and Download. The User Interface coordinates user actions and controls the
presentation of upload options, language options and the download sequence. Translate PDF
performs critical tasks such as validating the PDF being processed, presenting content to
readers in a readable manner, and initiating editing as well as text-to-HTML conversion. NLP
Processor leverages language models to perform translation and summarization. The Download
class makes it easy to retrieve the final output. Together, these classes form a PDF
translation and summarization design that is both user-oriented and effective.
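One possible reading of these four classes is sketched below. The class names follow the description above, but every method name, signature and return value is an assumption made for illustration, since the diagram itself is not reproduced here; the NLP calls are placeholders rather than real models.

```python
class NLPProcessor:
    """Stands in for the language models; both methods are placeholders."""

    def translate(self, text: str) -> str:
        return f"<Tamil translation of: {text}>"

    def summarize(self, text: str) -> str:
        return text[:40]


class TranslatePDF:
    """Validates an uploaded PDF and hands its text to the NLP processor."""

    def __init__(self, processor: NLPProcessor) -> None:
        self.processor = processor

    def validate(self, data: bytes) -> bool:
        return data.startswith(b"%PDF-")  # PDF files begin with this header

    def run(self, data: bytes, extracted_text: str) -> str:
        if not self.validate(data):
            raise ValueError("not a PDF file")
        return self.processor.translate(extracted_text)


class Download:
    """Prepares the final output for the user."""

    def filename_for(self, original_name: str) -> str:
        stem = original_name.rsplit(".", 1)[0]
        return f"{stem}_ta.pdf"  # hypothetical naming scheme


class UserInterface:
    """Coordinates the user's actions across the other classes."""

    def __init__(self, translator: TranslatePDF, download: Download) -> None:
        self.translator = translator
        self.download = download

    def submit(self, name: str, data: bytes, extracted_text: str) -> dict:
        translated = self.translator.run(data, extracted_text)
        return {"filename": self.download.filename_for(name), "content": translated}
```

In this reading the User Interface depends on Translate PDF and Download, and Translate PDF depends on NLP Processor, which mirrors the coordination role the description gives to the interface.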
The UI design, mockups and high-fidelity prototype have been moved to the Appendix for
better clarity.
Reference
Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-
occurrence statistics. Proceedings of the second international conference on Human Language
Technology Research, 138-145.
Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text
Summarization Branches Out, 74-81.
Nallapati, R., Zhai, F., & Zhou, B. (2017). SummaRuNNer: A recurrent neural network based
sequence model for extractive summarization of documents. Proceedings of the Thirty-First
AAAI Conference on Artificial Intelligence, 3075-3081.
W3C. (2019). Web Content Accessibility Guidelines (WCAG) 2.1. Retrieved from
[Link]
Rao, P., & Sankar, A. (2014). Cultural diversity and translation: A critical analysis. Language
and Intercultural Communication, 14(2), 159-175.
Subramonian, V. (2013). The Dravidian languages and the Indian sociolinguistic scene. The
Routledge handbook of sociolinguistics in India, 43-63.
Brundage, M., et al. (2018). The malicious use of artificial intelligence: Forecasting, prevention,
and mitigation. arXiv preprint.
Koehn, P., & Knowles, R. (2017). Six challenges for neural machine translation. Proceedings of
the First Workshop on Neural Machine Translation, 28-39.
van der Voort, H., & Dankbaar, B. (2014). Stakeholders in online information services: A
literature review. Online Information Review, 38(2), 309-328.
Bhattacharjee, A., & Newman, M. (2007). Stakeholder analysis for information technology
projects in developing countries. The Journal of Development Studies, 43(8), 1249-1275.
Appendix