Project Update

The document outlines a mid-semester presentation on a rule-based Hindi to Indian Sign Language (ISL) translation system, highlighting the project's motivation to improve communication for India's deaf community. It discusses methodologies, including linguistic pre-processing, dependency parsing, and translation improvements, while also addressing limitations and future steps. The presentation emphasizes the need for context-aware translations and the integration of machine learning for enhanced translation accuracy.

Uploaded by

rishikakumari517

Mid-Semester Presentation on
Rule-Based Hindi to Indian Sign Language Translation System

Department of Computer Science & Engineering
Institute of Engineering & Technology, Lucknow

Submitted By:
• Abhishek Srivastava (Roll No. 2100520100003)
• Govind Singh Tomar (Roll No. 2100520100029)
• Prashant Kumar Singh (Roll No. 2100520100049)

Guided by:
• Dr. Girish Chandra
• Dr. Pawan Kumar Tiwari
AGENDA
• Recap: Project Overview
• Literature Survey (Extended)
• Translation Improvement: Context-based Approach
• Linguistic Pre-processing Improvements
• Dependency Parsing
• Grammar Rules Based Reordering
• Stop-word Removal List
• Work Under Progress - Synonym Substitution
• Next steps
RECAP: PROJECT OVERVIEW

🔍 Motivation:

Millions in India’s deaf community lack access to effective communication tools.

Most existing systems cater to English; Hindi-to-ISL tools are rare.

Sign language is a visual language with unique grammar, so direct translation isn’t enough.

Bridging this gap promotes inclusivity and digital accessibility.


🎯 OBJECTIVES
• Develop a rule-based translation system using NLP techniques.
• Accurately convert Hindi text into grammatically correct ISL structure.
• Handle simple to moderately complex Hindi sentences.

METHODOLOGY HIGHLIGHTS
• NLP Pre-processing: Tokenization, POS tagging, dependency parsing.
• Grammar Transformation: Convert to ISL-compatible structure.
• ISL Mapping: Apply word-to-sign dictionary with fallback finger-spelling.

📌 EXPECTED OUTCOMES
• High accuracy in translation of Hindi to ISL.
• Natural sign video outputs reflecting ISL grammar.

🔭 FUTURE SCOPE
• Integrate real-time speech-to-sign capabilities.
• Incorporate machine learning for adaptive translation.
• Expand to more complex sentence structures.
LITERATURE SURVEY (EXTENDED)
📌 Previous Feedback:
• Included mostly local papers
• Suggested to reference international research from prestigious
publishers
📌 Action Taken:
Added 2 International Papers:
1. Survey Review Paper
Published in Springer
2. Experimental Research Paper
Published in ScienceDirect (Elsevier)
Kahlon, N. K., & Singh, W. (2023). Machine
translation from text to sign language: a
systematic review. Universal Access in the
Information Society, 22(1), 1-35.

~ Springer
• Classifies existing literature focusing on Sign Language Machine
Translation (SLMT) systems, sign synthesis, and evaluation
methods used to assess translation techniques.
• Systematically reviews conventional and state-of-the-art sign
language machine translation and sign language generation
projects.
METHOD USED
• A systematic literature review was conducted, following established guidelines.
• Electronic databases such as Springer, ScienceDirect, IEEE Xplore, Taylor and
Francis, and ACM Digital Library were searched.
• The search included keywords like "sign language," "machine translation," and "sign
generation/synthesis" in the titles or abstracts.
• The review focused on text-to-sign language translation, excluding studies based on
sign language recognition or sign-to-text translation.
• Both qualitative and quantitative research studies published up to 2020 were
included, with papers in languages other than English being excluded.
• Quality assessment was performed based on bias and internal and external validity
of results.
FINDINGS
• The review classifies SLMT approaches into rule-based, corpus-based (example-based, statistical,
hybrid), and neural machine translation.
• Rule-based machine translation (RBMT) is effective for small datasets but requires substantial
linguistic resources.
• Corpus-based machine translation (CBMT) relies on bilingual data and includes example-based
(EBMT), statistical (SMT), and hybrid (HBMT) approaches.
o Example-based machine translation (EBMT) is suitable for projects with limited datasets.
o Statistical machine translation (SMT) requires large bilingual corpora.
o Hybrid machine translation (HBMT) combines rule-based and corpus-based strategies.
• Neural machine translation (NMT) uses artificial neural networks and deep learning.
• Sign synthesis involves converting sign sequences into sign glosses, videos, or animated avatars.
• Evaluation methods include both automatic (e.g., WER, PER, BLEU, TER) and manual evaluations.
• The number of publications in the field has been consistent, with a shift from conventional to
contemporary techniques.
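Of the automatic metrics listed above, WER is conventionally defined as the word-level edit distance between a system output and a reference, divided by the reference length. The sketch below shows that calculation; the gloss sequences are invented for illustration and are not taken from the reviewed paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sign-gloss sequences, for illustration only.
print(word_error_rate("YESTERDAY TUESDAY", "TOMORROW TUESDAY"))
```

The same routine gives a Sign Error Rate style figure when the tokens are sign glosses rather than words.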
LIMITATIONS
• The review is limited to text-to-sign language translation, excluding the
reverse direction.
• Relevant articles published in languages other than English were not
included.
• There may have been disagreements during the inclusion and exclusion of
studies, but they were resolved through discussion.
San-Segundo, R., Barra, R., Córdoba, R., D’Haro, L.
F., Fernández, F., Ferreiros, J., ... & Pardo, J. M.
(2008). Speech to sign language translation
system for Spanish. Speech
Communication, 50(11-12), 1009-1020.

~ ScienceDirect (by Elsevier)

Explores the development of a Spanish to sign language translation system


METHOD USED
• The system translates Spanish sentences into Spanish Sign Language (LSE).
• It comprises a speech recognizer, a natural language translator, and a 3D avatar animation module.
• The speech recognizer decodes spoken utterances into word sequences.
• The natural language translator converts word sequences into sign sequences.
Two approaches were used for natural language translation:
o Rule-based translation: Uses expert-defined rules and computes sign confidence
measures from word confidence measures. The rule-based translation module contains 153
translation rules.
o Statistical translation: Uses parallel corpora to train a statistical model. The Phrase-based
translation system is based on the software released to support the shared task at the 2006
NAACL Workshop on Statistical Machine Translation.
• A 3D avatar animation module plays back the hand movements.
The avatar used is VGuido, developed in the eSIGN project.
FINDINGS
• The best configuration reported a 31.6% Sign Error
Rate (SER) and a 0.5780 BiLingual Evaluation
Understudy (BLEU) score.
• The rule-based translation strategy provided better
results than the statistical translation approach.
• Confidence measures from the speech recognizer were used
to compute a confidence measure for each sign.
• A 40% delay reduction was achieved by modifying
the speech recognition system to report partial
recognition results every 100 ms.
LIMITATIONS
• Data sparseness due to the small amount of data significantly influenced the
decoding process.
• The high number of out-of-vocabulary (OOV) words in the test set.
There were 93 OOV words out of 532 (source language) and 30 OOV signs (target
language).
• The small size of available corpora.
• Errors in subject-verb agreement, complement detection, and the translation of
unknown words.
• Delay between the spoken utterance and the sign animation.
TRANSLATION IMPROVEMENT: CONTEXT-BASED APPROACH
📌 Previous Feedback:
• Issue: Initial approach translated words individually
• Problem: Lacked context awareness, leading to incorrect meanings

📌 Solution Implemented:
✅ Switched to sentence-level translation using googletrans
✅ Considers context to determine correct word meanings
EXAMPLE
Sentence: कल मंगलवार था

Previous approach (word by word):
• कल : Tomorrow
• मंगलवार : Tuesday
• था : was
• Combining: "Tomorrow was Tuesday", which is contextually wrong

Modified approach (translation as a whole):
• कल मंगलवार था : "Yesterday was Tuesday", which is contextually correct

MULTI-MEANING WORDS IN SINGLE SENTENCE

Sentence: अर्थ शास्त्र के सिद्धांतो के अर्थ समझना कठिन है

Here, the first अर्थ means “economics”, and the second अर्थ means “meaning”.

Translating them separately will cause wrong results.

But translating the whole sentence together using googletrans will result in:
“It is difficult to understand the meaning of the principle of economics”,
which is the correct interpretation.
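The failure mode in the कल example can be reproduced with a toy lexicon. This is not the googletrans call the project uses (that needs network access); it is a minimal sketch in which the lexicon, the tense-cue table, and both function names are hypothetical, showing only why a per-word lookup cannot pick the right sense while a sentence-level pass can.

```python
# Toy lexicon: each Hindi word maps to its possible English senses.
# All entries below are illustrative assumptions, not project data.
LEXICON = {
    "कल": ["tomorrow", "yesterday"],
    "मंगलवार": ["Tuesday"],
    "था": ["was"],
    "होगा": ["will be"],
}
TENSE_CUES = {"था": "past", "होगा": "future"}
KAL_BY_TENSE = {"past": "yesterday", "future": "tomorrow"}
AMBIGUOUS = "कल"


def translate_word_by_word(sentence):
    """Pick the first listed sense of every word, ignoring context."""
    return [LEXICON[w][0] for w in sentence.split()]


def translate_with_context(sentence):
    """Scan the whole sentence for a tense cue before resolving the ambiguous word."""
    words = sentence.split()
    tense = next((TENSE_CUES[w] for w in words if w in TENSE_CUES), None)
    out = []
    for w in words:
        if w == AMBIGUOUS and tense is not None:
            out.append(KAL_BY_TENSE[tense])
        else:
            out.append(LEXICON[w][0])
    return out


SENTENCE = "कल मंगलवार था"
print(translate_word_by_word(SENTENCE))
print(translate_with_context(SENTENCE))
```

A real translation service resolves far more than tense, but the contrast between the two functions is the same one the slide describes.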
LINGUISTIC PRE-PROCESSING IMPROVEMENTS
Input sentence: क्या बिना मेहनत के छात्र सफलता पा सकते हैं?

Previously
• Non-meaningful field names
• No explicit information about ranking of words.
• The tree structure had to be created manually

Now
• Better field names
• Highest ranked word according to the context of sentence listed explicitly
• Tree creation through code
DEPENDENCY PARSING: UNDERSTANDING SENTENCE STRUCTURE
Kiperwasser, E., & Goldberg, Y. (2016). Simple and accurate dependency parsing using bidirectional LSTM
feature representations. Transactions of the Association for Computational Linguistics, 4, 313-327.

• Dependency parsing analyzes the grammatical structure of a sentence by identifying relationships between words.

• These relationships are represented as dependency arcs:
Head Word → (Dependency Type) → Dependent

Approaches to Dependency Parsing
• Transition-based Parsing
• Graph-based Parsing
TRANSITION-BASED V/S GRAPH-BASED PARSING

Transition-based Parsing (e.g. spaCy)
Parsing as a sequence of transitions:
• Uses a stack and buffer to process words.
• Applies SHIFT, LEFT-ARC, and RIGHT-ARC operations to build dependencies.
Decision Making:
• A classifier (often an MLP with BiLSTM features) predicts the best transition at each step.
Tree Formation:
• Dependency arcs accumulate to form a dependency tree.
Fast and efficient but may suffer from error propagation.

Graph-based Parsing (e.g. Stanza)
Parsing as tree scoring and selection:
• Assigns a score to all possible dependency arcs.
• Uses machine learning models (e.g., MLP, biaffine classifier) for arc scoring.
Tree Construction:
• Selects the highest-scoring dependency tree using MST or Eisner's algorithm.
More globally optimized but computationally expensive.
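The stack-and-buffer mechanics of transition-based parsing can be sketched in a few lines. This follows the common arc-standard convention; in a real parser a classifier would choose each transition, whereas here the sequence is written by hand for one sentence, and the function name and the analysis of that sentence are assumptions made for illustration.

```python
def run_transitions(words, transitions):
    """Apply arc-standard transitions and return the (head, dependent) arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for t in transitions:
        if t == "SHIFT":        # move the next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":   # second-from-top becomes a dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif t == "RIGHT-ARC":  # top becomes a dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
        else:
            raise ValueError(f"unknown transition: {t}")
    return arcs


WORDS = "राम ने किताब पढ़ी".split()
# Hand-written transition sequence (an assumption, not parser output).
SEQ = ["SHIFT", "SHIFT", "RIGHT-ARC", "SHIFT", "SHIFT",
       "LEFT-ARC", "LEFT-ARC", "RIGHT-ARC"]
ARCS = run_transitions(WORDS, SEQ)
for head, dep in ARCS:
    print(head, "->", dep)
```

Because each step commits to an arc immediately, one wrong transition early in the sequence affects every later stack state, which is the error propagation noted above.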
DEPENDENCY PARSING IN THE STANZA PIPELINE
Dozat, T., & Manning, C. D. (2016). Deep biaffine attention for neural dependency parsing. arXiv preprint arXiv:1611.01734.

Bi-LSTM-based deep biaffine neural dependency parser
• Assigns a syntactic head to each word.

Feature Representation
• Bi-LSTMs create contextualized representations of words, incorporating both forward and backward sentence contexts.
o Bi-LSTM outputs pass through dimension-reducing MLPs to create representations for dependents and heads.

Deep Biaffine Attention for Arc Prediction
• A biaffine classifier calculates arc scores, determining the likelihood of one word being another’s head.
• The parser is trained to assign high scores to correct head-dependent relationships.

Root Word Decision
• The root of the tree is linked to an artificial root symbol.
• The highest-scoring word for the artificial root is chosen as the sentence root.
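The arc-scoring step can be illustrated with plain Python lists. This is a minimal sketch of a biaffine score (a bilinear term plus a head-only bias term), not Stanza's implementation: the vectors, the matrix U, and the bias b are made-up numbers standing in for trained MLP outputs and parameters, and the greedy head choice shown here does not guarantee a well-formed tree the way MST decoding does.

```python
def biaffine_scores(dep_vecs, head_vecs, U, b):
    """scores[i][j] = dep_i^T U head_j + b . head_j, with j a candidate head of word i."""
    scores = []
    for d in dep_vecs:
        # d^T U: one entry per head dimension
        dU = [sum(d[k] * U[k][m] for k in range(len(d))) for m in range(len(b))]
        scores.append([sum((dU[m] + b[m]) * h[m] for m in range(len(b)))
                       for h in head_vecs])
    return scores


def pick_heads(scores):
    """Greedy choice: each word takes its highest-scoring candidate head."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Candidate heads: index 0 is the artificial ROOT, 1..3 are the three words.
HEAD_VECS = [[0.0, 2.0], [0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
DEP_VECS = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # one vector per word
U = [[1.0, 0.0], [0.0, 1.0]]                       # toy bilinear weights
B = [0.0, 0.5]                                     # toy head bias

SCORES = biaffine_scores(DEP_VECS, HEAD_VECS, U, B)
HEADS = pick_heads(SCORES)
# Sentence root: the word with the highest score against the artificial ROOT.
ROOT_WORD = max(range(len(SCORES)), key=lambda i: SCORES[i][0])
print(HEADS, ROOT_WORD)
```

With these numbers the first two words attach to the third word, and the third word attaches to ROOT, so it is also the word selected as the sentence root.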
DEPENDENCY RELATIONS & LINGUISTIC FEATURES

Determining Children (Dependents):
• Each word’s head is identified based on highest arc score.
• A word’s dependents are all words for which it is predicted as the head.

Labelling Dependency Relations:
• A biaffine classifier predicts labels (e.g., nsubj, dobj) for each head-dependent pair.
• Labels enhance syntactic structure understanding.

Linguistically Motivated Features:
• Predicts linearization order and typical linear distance between words.
• Improves parsing accuracy through linguistic insights.

Summary
• Bi-LSTMs generate contextual embeddings.
• Biaffine attention mechanism scores head-dependent relationships.
• Biaffine classifier labels dependency relations.
Automatic Dependency Parse tree creation
(Using networkx and matplotlib libraries)
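The slide's tree is drawn with networkx and matplotlib; the sketch below is a dependency-free stand-in that shows the underlying step, namely inverting per-word head indices into a head-to-dependents map and walking it from the artificial root. The head indices and relation labels are hand-written assumptions for one sentence, not parser output, and both function names are hypothetical.

```python
from collections import defaultdict


def children_by_head(heads):
    """Invert a head list (0 = artificial ROOT, words are 1-based) into head -> dependents."""
    children = defaultdict(list)
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)
    return children


def render_tree(words, heads, relations):
    """Return an indented text tree, one line per word, starting from ROOT."""
    children = children_by_head(heads)
    lines = []

    def visit(node, depth):
        for child in children[node]:
            lines.append("  " * depth + f"{words[child - 1]} ({relations[child - 1]})")
            visit(child, depth + 1)

    visit(0, 0)
    return lines


WORDS = "राम ने किताब पढ़ी".split()
HEADS = [4, 1, 4, 0]                          # assumed head index for each word
RELATIONS = ["nsubj", "case", "obj", "root"]  # assumed relation labels
TREE = render_tree(WORDS, HEADS, RELATIONS)
print("\n".join(TREE))
```

The same head-to-dependents map is what one would feed, edge by edge, into a graph library to produce a drawn tree.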
GRAMMAR RULES BASED REORDERING
Source: Indian Sign Language Secondary Course (by National Institute of Open Schooling)
https://www.nios.ac.in/media/documents/230-ISL/Indian-Sign-Language-230.pdf

Subject – Object – Verb
• Object before subject
• किताब पढ़ी राम ने। transformed to राम ने किताब पढ़ी।

Adjective – Adverb
• Adjective/Adverb after Noun/Verb
• वह बुद्धिमान लड़का है। transformed to वह लड़का बुद्धिमान है।
• वह धीरे-धीरे चल रहा है। transformed to वह चल रहा है धीरे-धीरे।

Negative sentences
• मैं स्कूल नहीं जाऊँगा। transformed to मैं स्कूल जाऊँगा नहीं।

Interrogative sentences
• तुम कहाँ जा रहे हो? transformed to तुम जा रहे हो कहाँ?
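Three of the reordering rules above (adjective after noun, negation at the end, question word at the end) can be sketched as operations on tagged tokens. This is an illustrative sketch rather than the project's rule engine: the function name is hypothetical, the tags "NEG" and "WH" are simplified labels assumed for this example, and the tagged inputs are written by hand instead of coming from a parser.

```python
def reorder_for_isl(tagged):
    """tagged: list of (word, tag) pairs. Returns the reordered list of words."""
    tokens = list(tagged)
    # Rule 1: an adjective directly before a noun moves after that noun.
    i = 0
    while i < len(tokens) - 1:
        if tokens[i][1] == "ADJ" and tokens[i + 1][1] == "NOUN":
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    # Rule 2: negation and question words move to the end of the sentence.
    moved = [t for t in tokens if t[1] in ("NEG", "WH")]
    kept = [t for t in tokens if t[1] not in ("NEG", "WH")]
    return [word for word, _ in kept + moved]


ADJ_SENT = [("वह", "PRON"), ("बुद्धिमान", "ADJ"), ("लड़का", "NOUN"), ("है", "AUX")]
NEG_SENT = [("मैं", "PRON"), ("स्कूल", "NOUN"), ("नहीं", "NEG"), ("जाऊँगा", "VERB")]
WH_SENT = [("तुम", "PRON"), ("कहाँ", "WH"), ("जा", "VERB"), ("रहे", "AUX"), ("हो", "AUX")]

for sent in (ADJ_SENT, NEG_SENT, WH_SENT):
    print(" ".join(reorder_for_isl(sent)))
```

Applying the adjective rule before the end-of-sentence rule keeps the two from interfering; the subject-object-verb and adverb rules would need dependency information and are left out of this sketch.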
STOP-WORD REMOVAL LIST (STILL BEING UPDATED)
Certain words, such as auxiliary verbs, conjunctions, and postpositions,
have to be removed since they do not contribute to ISL grammar.
• हो • रही • मे • जायेंगे • हुई • इनकी • जिन
• ये • एक • को • जरा • आदि • तिन्हें • तिस
• हूं • मगर • तहत • अभी • पे • तिन्हों • तिन
• होता • यदि • दुबारा • एवं • भी • इत्यादि • एस
• है • अथवा • तब • कर • कम • जितना • सो
• हैं • तक • दोनों • दिया • हुये • वग़ैरह • अत
• था • जबकि • प्रत्येक • बनी • प्रति • तिसे
• थे • के • ऐसा • ही • संग • काफ़ी
• गए • द्वारा • मात्र • वाले • बड़े • बाला
• रहा • लिये • समान • दो • जिससे • मानो
• पडा • बारे • तो • होती • रहती • जीधर
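Filtering against the list above amounts to a set-membership check per token. A minimal sketch follows, using only a handful of entries copied from the list; the function names are hypothetical, and the Unicode normalization step is an added precaution (an assumption, not something the slides mention) so that visually identical Devanagari strings compare equal.

```python
import unicodedata


def _norm(word):
    """Normalize to NFC so equivalent Devanagari encodings compare equal."""
    return unicodedata.normalize("NFC", word)


# A few entries from the stop-word list above; the full list is still being updated.
STOP_WORDS = {_norm(w) for w in
              ("है", "हैं", "था", "थे", "रहा", "रही", "को", "के", "तो", "भी", "ही")}


def remove_stop_words(tokens):
    """Drop tokens found in STOP_WORDS, keeping the order of the rest."""
    return [t for t in tokens if _norm(t) not in STOP_WORDS]


TOKENS = "वह धीरे-धीरे चल रहा है".split()
FILTERED = remove_stop_words(TOKENS)
print(FILTERED)
```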
SYNONYM SUBSTITUTION & WORDNET'S DESIGN
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to
WordNet: An on-line lexical database. International journal of lexicography, 3(4), 235-244.

🔍 WordNet's Core Principle
• Focuses on word meanings (not just alphabetical lists).

📚 Synsets – Core Building Blocks
• Groups synonymous words into synonym sets (synsets).
o E.g., {board, plank} (wood) vs {board, committee} (group)

🔁 Synonymy & Substitution
• Defined by interchangeability: if replacing word A with B doesn't change a sentence's truth, they are synonyms (context-dependent).

🧩 Syntactic Category Constraint
• Synonyms must be from the same part of speech (e.g., noun ↔ noun).

🎯 Purpose of Synsets
• Disambiguate meanings (polysemy)
• Identify similar meanings (synonymy)

🧮 Lexical Matrix
• Word forms = columns | Meanings = rows
• Synonyms = different words in the same row
HOW SYNONYM SUBSTITUTION WORKS

1. Identify target word
2. Look it up in WordNet (WordNet provides structured lexical data)
3. Retrieve associated synsets
4. Select suitable synonym, based on:
o 🔢 Frequency of use
o 🧠 Contextual fit
o 🔁 Avoiding repetition
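The lookup-and-select steps can be sketched with a hand-built stand-in for WordNet. Everything here is hypothetical: the two synsets reuse the board example from the previous slide, the synset identifiers and frequency counts are invented, and the sense (contextual fit) is assumed to have been chosen already and is passed in explicitly.

```python
# Hypothetical mini-WordNet: synset id -> (part of speech, member words).
SYNSETS = {
    "board.n.01": ("noun", ["board", "plank"]),
    "board.n.02": ("noun", ["board", "committee"]),
}
# Hypothetical usage counts standing in for "frequency of use".
FREQUENCY = {"board": 50, "plank": 5, "committee": 30}


def synsets_for(word, pos):
    """All synsets of the given part of speech that contain the word."""
    return [sid for sid, (p, members) in SYNSETS.items()
            if p == pos and word in members]


def pick_synonym(word, pos, synset_id, available):
    """Most frequent other member of the chosen synset that is in `available`, else None."""
    synset_pos, members = SYNSETS[synset_id]
    if synset_pos != pos:   # synonyms must share the part of speech
        return None
    candidates = [m for m in members if m != word and m in available]
    return max(candidates, key=lambda m: FREQUENCY.get(m, 0), default=None)


# Words assumed to have a sign available (hypothetical).
AVAILABLE = {"committee", "plank"}
print(synsets_for("board", "noun"))
print(pick_synonym("board", "noun", "board.n.02", AVAILABLE))
```

Restricting candidates to words that already have a sign is one plausible use of substitution in this pipeline; avoiding repetition would need the surrounding sentence and is not modelled here.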
NEXT STEPS

1. Reading ISL Dictionary
2. Finding ISL tags for the words in the given sentence
3. Video translation, with fallback mechanism
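The planned lookup with a fallback can be sketched as a dictionary check followed by letter-by-letter finger-spelling. This is only a sketch under stated assumptions: ISL tags are taken to be uppercase English glosses, and the dictionary contents, file paths, and function name are all hypothetical.

```python
# Hypothetical sign dictionary: ISL tag -> video clip path.
SIGN_VIDEOS = {
    "YESTERDAY": "signs/yesterday.mp4",
    "TUESDAY": "signs/tuesday.mp4",
}


def clips_for(tags, sign_videos=SIGN_VIDEOS):
    """Return the clip sequence, finger-spelling any tag missing from the dictionary."""
    clips = []
    for tag in tags:
        if tag in sign_videos:
            clips.append(sign_videos[tag])
        else:
            # Fallback: one (hypothetical) letter clip per alphabetic character.
            clips.extend(f"letters/{ch}.mp4" for ch in tag if ch.isalpha())
    return clips


print(clips_for(["YESTERDAY", "RAM"]))
```

Proper nouns are the typical case for the fallback, since names rarely have a dictionary entry.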
