INTERNSHIP REPORT
On
AI-Powered Interactive Learning Assistant for Classrooms
By:
NADEEM KHAN BU22CSEN0300262
P. SUPARNA CHANDRA BU22CSEN0300261
KAMATHAM KUSHAL BU22CSEN0300195
INTEL (Intel Unnati Industrial Training 2025 - Slot 2)
(Duration: 20/05/2025 to 10/07/2025)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Gandhi Institute of Technology and Management
(DEEMED TO BE A UNIVERSITY)
BENGALURU, KARNATAKA, INDIA
SESSION: 2022-2026
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without the mention of the people who made it possible, whose
consistent guidance and encouragement crowned our efforts with success.
We consider it our privilege to express our gratitude to all those who guided us in
the completion of the project.
We express our gratitude to Director Prof. Basavaraj Gundappa Katageri for
having provided us with the golden opportunity to undertake this project work in
their esteemed organization.
We sincerely thank Dr. A. Vadivel, HOD, Department of Artificial Intelligence and
Data Science, Gandhi Institute of Technology and Management, Bengaluru, for the
immense support given to us.
We express our gratitude to our project guide Rashmi K, Associate Professor,
Department of Computer Science and Engineering, Gandhi Institute of Technology
and Management, Bengaluru, for their support, guidance, and suggestions
throughout the project work.
Name: PARUCHURI SUPARNA CHANDRA
Registration No. BU22CSEN0300261
Problem statement 4:
AI-Powered Interactive Learning Assistant for Classrooms
Objective: Build a multimodal AI assistant for classrooms to dynamically answer queries using
text, voice, and visuals while improving student engagement with personalized responses.
Prerequisites:
• Familiarity with natural language processing (NLP) and multimodal AI concepts.
• Knowledge of speech-to-text frameworks and computer vision techniques.
• Programming skills in Python, with experience in libraries such as Hugging Face Transformers and OpenCV.
Problem Description:
Modern classrooms lack real-time, interactive tools to address diverse student needs and keep them
engaged. The objective is to create a multimodal AI assistant that:
• Accepts and processes text, voice, and visual queries from students in real time.
• Provides contextual responses, including textual explanations, charts, and visual aids.
• Detects disengagement or confusion using facial expression analysis and suggests interventions.
Expected Outcomes:
• A multimodal AI assistant capable of answering real-time queries across various input formats.
• Integration of visual aids (e.g., diagrams, charts) for better understanding.
• A feature to monitor student engagement and adapt teaching methods dynamically.
Challenges Involved:
• Combining multimodal inputs (text, voice, visuals) for consistent, context-aware responses.
• Ensuring low-latency processing to maintain real-time interactions.
• Handling diverse accents, noisy environments, and variations in facial expressions.
Contents
Internship
Certificate
Acknowledgement
Problem Statement
Table of Contents
Abstract
1. INTRODUCTION
2. OVERVIEW
3. OBJECTIVES
4. INSTALLATION STEPS
5. RAW MODEL TESTING
6. TINYLLAMA-1.1B-CHAT-V1.0 TO ONNX
7. OPENVINO MODEL OPTIMIZATION
8. STUDY BUDDY-AI ASSISTANT
9. RESULTS
10. CONCLUSION
11. REFERENCES
Abstract
This project presents the design and implementation of an AI-powered interactive assistant that
combines the efficiency of the TinyLLaMA language model with Intel’s OpenVINO toolkit and a
Gradio-based user interface. The goal is to create a lightweight, responsive, and accessible chatbot
system that supports both text and voice input while maintaining fast and accurate inference
capabilities on edge devices.
TinyLLaMA, a compact open-source model built on the LLaMA architecture, is chosen for its minimal computational
footprint and ability to perform various natural language understanding tasks. To improve
performance, the model is converted into OpenVINO’s Intermediate Representation (IR) format
(.xml and .bin), allowing for optimized execution on CPUs and integrated GPUs. This setup
significantly reduces latency and enables smooth deployment even in resource-constrained
environments.
For user interaction, the project employs Gradio, a web UI library that simplifies the creation of
intuitive and interactive interfaces. The chatbot interface features a clean dark theme, supports
real-time input via text and speech, and includes buttons for improved usability. Speech input is
handled using the SpeechRecognition library, which transcribes microphone audio using Google’s
speech-to-text engine. Optional components like PyAudio and pipwin are integrated to ensure
compatibility across operating systems, particularly Windows.
The result is a modular, extensible system that demonstrates the potential of deploying optimized
LLMs locally. This assistant can be used in various domains such as education, virtual support,
and intelligent tutoring. It showcases how lightweight model architectures, when combined with
inference optimization and user-centric design, can make conversational AI more practical,
scalable, and widely accessible.
INTRODUCTION
In the evolving landscape of Artificial Intelligence (AI), natural language processing (NLP) has
become a cornerstone technology powering virtual assistants, chatbots, and intelligent tutoring
systems. Large Language Models (LLMs) such as GPT, BERT, and LLaMA have demonstrated
remarkable capabilities in generating coherent, context-aware responses across a wide range of
domains. However, deploying these models on resource-constrained devices poses significant
challenges due to their size, latency, and memory requirements.
To address this gap, the present project explores the deployment of TinyLLaMA—a lightweight
and efficient open-source model built on the LLaMA architecture—optimized using Intel’s OpenVINO toolkit. The
project focuses on building a complete AI assistant that not only performs real-time inference on
standard CPUs but also supports multimodal user interaction through a sleek and accessible
interface. The assistant allows users to communicate via both text and voice input, with the latter
transcribed using the SpeechRecognition library, which relies on Google’s speech-to-text service.
The primary motivation for this project lies in the need to democratize access to AI by making
intelligent assistants functional on low-resource platforms, including local desktops, embedded
systems, and offline educational tools. By converting the TinyLLaMA model into OpenVINO's
Intermediate Representation (IR) format (.xml and .bin), the model becomes highly optimized for
CPU and GPU inference, drastically improving its responsiveness and efficiency.
This project, undertaken as part of a formal internship program, bridges the theoretical knowledge
of AI systems with their practical deployment in real-world scenarios. It emphasizes system
integration, performance optimization, and user-centric design principles. By the end of this
project, a fully functional, scalable, and responsive AI assistant was developed, demonstrating the
real-world potential of deploying optimized language models for everyday applications.
Overview
This project demonstrates the development of an AI-powered interactive assistant that integrates
lightweight natural language processing capabilities with real-time performance and user-friendly
interaction. The system leverages the TinyLLaMA language model—an efficient and compact
alternative to conventional large language models—and deploys it using Intel’s OpenVINO toolkit
to ensure high-speed inference on standard computing hardware, particularly CPUs. The goal was
to build a responsive, voice-enabled conversational system that maintains low latency and reduced
computational overhead while delivering accurate and coherent language responses.
The assistant supports two modes of user input: text-based and speech-based. The text input is
tokenized and processed directly using the Hugging Face Transformers library, while the speech
input is captured via a microphone and transcribed into text using the SpeechRecognition library
(backed by Google’s speech-to-text service). Once the input is processed, it is passed to the OpenVINO-optimized TinyLLaMA model for
inference. The model returns a response, which is then decoded and displayed on a web-based user
interface developed using Gradio.
The Gradio interface provides an intuitive and modern user experience, featuring a dark-themed
layout, microphone support, dynamic output rendering, and interactive buttons. This interface
allows for quick prototyping and deployment of AI applications with minimum setup and
maximum accessibility. All components, including model inference and audio processing, are
executed locally (except voice transcription), ensuring enhanced data privacy and low reliance on
external resources.
Overall, the system serves as a blueprint for building real-time AI assistants that are optimized for
deployment in resource-constrained environments such as educational institutions, offline
customer support systems, and embedded AI applications. The integration of model optimization,
voice input, and user-centered design reflects a practical application of current AI technologies
and demonstrates the feasibility of deploying conversational models on everyday, resource-constrained hardware.
Objectives
The primary objective of this project is to design and implement a compact, efficient, and
interactive AI assistant that leverages modern language modeling techniques while being
optimized for real-time performance on low-resource systems. The project focuses on deploying
the TinyLLaMA language model using Intel’s OpenVINO toolkit and integrating it into a user-
friendly web interface with multimodal input support. The detailed objectives of the project are as
follows:
1. To deploy a lightweight large language model (TinyLLaMA) for efficient and coherent
natural language understanding and generation.
2. To optimize the TinyLLaMA model using the OpenVINO toolkit, converting it into the
Intermediate Representation (IR) format (.xml and .bin) for enhanced performance and
reduced latency on CPU and integrated GPU hardware.
3. To build a sleek and interactive user interface using the Gradio framework that supports
both text and speech-based input for seamless human–AI communication.
4. To integrate voice input capability using Google’s SpeechRecognition library, enabling
real-time transcription of user speech into text that can be processed by the language model.
5. To maintain low system resource usage, making the application suitable for deployment in
environments with limited computational capabilities, including educational devices,
personal laptops, and local intranet servers.
6. To provide a hands-on implementation experience in model deployment, inference
optimization, natural language processing, and front-end development through open-
source technologies.
7. To simulate a real-world application scenario such as a digital tutor, personal assistant, or
customer service bot, thereby showcasing the practical viability of deploying
conversational AI on edge devices.
These objectives were set to ensure that the assistant not only meets functional expectations but
also adheres to the performance, usability, and scalability standards required for practical AI deployment.
Installation Steps
To ensure a smooth and successful setup of the AI-powered interactive assistant, the following
installation steps must be followed. These steps cover the environment setup, package installation,
and model download process.
1. Prerequisites:
• Python version: 3.8 to 3.11
• pip package manager
• Internet connection (for first-time model download)
2. Create a Virtual Environment (Optional but recommended):
On Windows:
python -m venv llama_env
llama_env\Scripts\activate
On Linux/macOS:
python3 -m venv llama_env
source llama_env/bin/activate
3. Install Required Python Packages:
pip install transformers openvino gradio SpeechRecognition numpy
(PyTorch is also required for the model download script below and the raw-model testing in Section 5, and the ONNX export in Section 6 uses the optimum package: pip install torch optimum.)
4. (Windows only) Install pipwin and PyAudio for voice input support:
pip install pipwin
pipwin install pyaudio
5. Download the TinyLLaMA Model and Tokenizer:
Run the following Python script to download and cache the model and tokenizer locally:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
cache_dir = "C:/Users/KHADEER KHAN/OneDrive/Documents/lama"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir=cache_dir)
print(" TinyLLaMA model and tokenizer downloaded successfully.")
6. Launch the Application:
Create and run your main script (e.g., app.py) that loads the model, sets up the Gradio interface,
and starts the assistant.
Raw model testing
5.1 System Requirements
* Python 3.8+
* PyTorch (CPU version)
* Transformers (HuggingFace)
* Tkinter (comes pre-installed with Python)
* TinyLLaMA model downloaded via HuggingFace or offline cache
5.2 Model Location:
The model is loaded from:
C:/Users/KHADEER KHAN/OneDrive/Documents/lama
5.3 Model Loading
The TinyLLaMA 1.1B Chat model and tokenizer are loaded from HuggingFace's model hub
with cache stored locally.
The model is switched to evaluation mode using model.eval() to disable dropout and other
training behaviors for inference.
5.4 Code:
from transformers import AutoTokenizer, AutoModelForCausalLM

# model_id and cache_dir are the same values used in the installation step
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)
model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir=cache_dir)
model.eval()  # evaluation mode: disables dropout and other training-only behaviour
5.5 GUI Construction using Tkinter
A window titled “TinyLLaMA Chat (PyTorch CPU)” is initialized using Tkinter.
It contains (a minimal construction sketch follows the widget list below):
• An Entry box for user input.
• An “Ask” button to trigger inference.
• A ScrolledText widget to display both user queries and AI responses.
GUI Widgets Used:
tk.Tk()
tk.Entry()
tk.Button()
scrolledtext.ScrolledText()
messagebox for warnings
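The window described above can be reproduced with a few lines of Tkinter. Below is a minimal sketch; widget options and the inference hook are illustrative, not the project's exact script:

import tkinter as tk
from tkinter import scrolledtext, messagebox

root = tk.Tk()
root.title("TinyLLaMA Chat (PyTorch CPU)")

entry = tk.Entry(root, width=60)                                   # user input box
entry.pack(padx=10, pady=5)

chat_area = scrolledtext.ScrolledText(root, width=80, height=20)   # conversation display
chat_area.pack(padx=10, pady=5)

def on_ask():
    question = entry.get().strip()
    if not question:
        messagebox.showwarning("Empty input", "Please type a question first.")
        return
    chat_area.insert(tk.END, f"You: {question}\n")
    # ask_question(question) would run TinyLLaMA inference here and append the reply

tk.Button(root, text="Ask", command=on_ask).pack(pady=5)

root.mainloop()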
5.6 Interaction Logic (ask_question function)
This function handles the main logic of:
Accepting user input
Formatting the input prompt
Performing inference with TinyLLaMA
Decoding and extracting the AI response
Displaying the output with response time and token generation speed
Key Actions:
Input Prompt Template:
prompt = f"<|user|>\n{question}\n<|assistant|>\n"
Inference:
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=40,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
Post-processing:
* Token decoding using tokenizer.decode
* Token counting using tokenizer.encode
* Splitting the response if it contains the original question
* Display of elapsed time and token speed
5.7 Performance Metrics Displayed
For every response, the app shows:
Time taken for inference (in seconds)
Generation speed (tokens/sec)
Total tokens generated
This is helpful to understand efficiency on CPU.
Example output:
Time: 2.34s
Speed: 32.1 tokens/s
Tokens: 75
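One way to compute these figures is to time the generate call and count the newly produced tokens, as in the minimal sketch below (variable names follow the earlier snippets; the project's own script counts tokens via tokenizer.encode instead):

import time

start = time.perf_counter()
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=40,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]   # tokens generated beyond the prompt
print(f"Time: {elapsed:.2f}s")
print(f"Speed: {new_tokens / elapsed:.1f} tokens/s")
print(f"Tokens: {new_tokens}")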
5.8 Behavior Characteristics
Uses top-k and top-p sampling for more diverse outputs.
Caps generation at 150 tokens.
Responds to any input prompt (including statements, not just questions).
Minimal error handling (only for empty input).
5.9 Improvements Possible
Add streaming output (like character-by-character typing effect)
Add multi-turn chat history (currently it's stateless)
Add support for audio input via speech recognition
Add light/dark mode toggle
Run on GPU if available (currently CPU-only)
Add save/export chat history
6. Conversion of TinyLLaMA-1.1B-Chat-v1.0 to ONNX
Format Using HuggingFace Optimum
6.1 Goal
The objective of this procedure was to convert the PyTorch-based TinyLLaMA-1.1B-Chat-
v1.0 model into ONNX (Open Neural Network Exchange) format using the
optimum.exporters.onnx utility provided by HuggingFace Optimum. This conversion
includes support for past key values (KV caching), which enables faster autoregressive
inference, especially when generating long sequences. The final ONNX model is intended
for deployment with inference engines such as OpenVINO or ONNX Runtime, focusing on
efficient CPU usage.
6.2 Command Used
The conversion was performed using the following command executed in a Windows
command line environment:
python -m optimum.exporters.onnx ^
--model TinyLlama/TinyLlama-1.1B-Chat-v1.0 ^
--task text-generation-with-past ^
--device cpu ^
--cache_dir "C:\Users\KHADEER KHAN\OneDrive\Documents\lama" ^
"C:\Users\KHADEER KHAN\OneDrive\Documents\lama\tinyllama_onnx_past"
6.3 Argument Explanation
• -m optimum.exporters.onnx: launches the ONNX export utility from HuggingFace Optimum
• --model: specifies the pretrained model ID from the HuggingFace Hub
• --task text-generation-with-past: enables KV caching for fast autoregressive decoding
• --device cpu: defines the target device for export (CPU in this case)
• --cache_dir: local directory in which downloaded model/tokenizer files are stored
• final output path: directory where the ONNX model files will be saved
6.4 Files Cached During Export
During the conversion process, the following files were downloaded and cached in the
specified directory. These are essential for tokenization and consistent behavior across
inference sessions:
• tokenizer_config.json
• tokenizer.model
• tokenizer.json
• special_tokens_map.json
• config.json
Caching ensures offline functionality and reproducibility of model behavior.
6.5 Warnings Observed
a) Symlink Warning on Windows
UserWarning: huggingface_hub cache-system uses symlinks by default...
Explanation: Windows systems without Developer Mode or administrator privileges do not
support symbolic links. Therefore, the system defaults to copying files instead, which may
lead to increased disk usage.
Resolution: Enable Developer Mode in Windows or run the Python interpreter with
administrative privileges.
b) TracerWarnings During ONNX Tracing
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be
incorrect.
Explanation: These warnings occur commonly in transformer-based models that include
dynamic conditionals. They indicate potential limitations in generalizing the exported
computation graph.
Resolution: These warnings can typically be ignored unless the application requires handling
dynamic input shapes.
c) Missing Accelerate for Weight Deduplication
Warning: Weight deduplication check requires accelerate.
Explanation: The exporter could not check and remove duplicate model weights, leading to
a potentially larger ONNX model file.
Resolution: Install the accelerate library using pip install accelerate before running the
export process.
6.6 Output Directory
The exported ONNX model and associated files were saved in the following directory:
C:\Users\KHADEER KHAN\OneDrive\Documents\lama\tinyllama_onnx_past
The expected contents of this directory include:
• model.onnx (the exported model graph; this is the file converted in the next section)
• config.json
• Tokenizer and model metadata files (e.g., tokenizer.json, tokenizer_config.json)
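As a quick sanity check, the exported graph can be loaded with ONNX Runtime and its inputs listed (a minimal sketch; assumes onnxruntime has been installed with pip install onnxruntime):

import onnxruntime as ort

onnx_path = r"C:\Users\KHADEER KHAN\OneDrive\Documents\lama\tinyllama_onnx_past\model.onnx"
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# An export with text-generation-with-past exposes past_key_values.* inputs alongside input_ids
for model_input in session.get_inputs():
    print(model_input.name, model_input.shape)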
6.7 Purpose of text-generation-with-past
The use of the task flag text-generation-with-past is critical during model export. This option
enables the model to use cached attention key and value tensors (past_key_values), which
results in significant inference speed improvements during multi-token generation. Benefits
of this configuration include:
• Reduced latency per generated token
• Improved throughput during decoding loops
• Compatibility with optimized engines such as OpenVINO
Summary Table
• Model exported: TinyLLaMA-1.1B-Chat-v1.0
• Format: ONNX (task: text-generation-with-past)
• Target device: CPU
• Cache used: yes (local directory specified)
• Warnings: symlink fallback, TracerWarnings, accelerate not installed
• Output location: tinyllama_onnx_past
• Result: model successfully exported to ONNX
6.8 Next Steps
• Evaluate the ONNX model’s inference quality using OpenVINO or ONNX Runtime.
• Optionally optimize the exported model further using utilities like onnx-simplifier
or openvino-optimize.
• Enable Developer Mode in Windows for symlink support, reducing disk usage.
• Install accelerate to ensure weight deduplication during future exports.
• Perform performance benchmarking (e.g., latency, memory use) to compare
PyTorch, ONNX, and IR models.
7. OpenVINO Model Optimization Using the OpenVINO Model Converter (OVC)
7.1 Introduction
This section explains the process of converting the TinyLLaMA-1.1B-Chat-v1.0 ONNX
model into OpenVINO’s Intermediate Representation (IR) format using the OpenVINO model
converter (OVC). The conversion is aimed at enabling fast, efficient, and hardware-
accelerated inference using Intel CPUs, integrated GPUs, and edge devices. This process
includes compressing the model to FP16 precision to reduce size and improve inference
performance.
7.2 Goal
The primary goal is to convert the exported ONNX model of TinyLLaMA into OpenVINO's
IR format using FP16 compression. This allows optimized inference on Intel hardware and
ensures compatibility with the OpenVINO runtime environment.
7.3 Command Used
The following command was executed to perform the conversion:
python -m openvino.tools.ovc ^
"C:\Users\KHADEER
KHAN\OneDrive\Documents\lama\tinyllama_onnx_past\model.onnx" ^
--compress_to_fp16 ^
--output_model "C:\Users\KHADEER
KHAN\OneDrive\Documents\lama\tinyllama_ir_fp16\tinyllama_fp16.xml"
7.4 Explanation of Arguments
• python -m openvino.tools.ovc: executes the model converter from the OpenVINO Toolkit
• "model.onnx": path to the ONNX model exported via optimum
• --compress_to_fp16: compresses weights from FP32 to FP16 for faster and smaller inference
• --output_model "tinyllama_fp16.xml": specifies the output IR filename (.xml); the matching .bin file is generated alongside it
7.5 What Is OVC?
OVC, the OpenVINO model converter, is the conversion tool within the OpenVINO Toolkit
(the successor to the legacy Model Optimizer). It transforms deep learning models from
frameworks such as TensorFlow and PyTorch (via ONNX) into the IR format used by the
OpenVINO Runtime. IR models are lightweight, hardware-agnostic, and well suited to edge deployment.
7.6 Benefits of FP16 Compression
Compressing the model from FP32 to FP16 brings the following benefits:
• Reduces the model’s storage size by approximately 50%
• Accelerates inference performance on Intel hardware with FP16 support
• Maintains nearly the same output accuracy for text generation tasks
7.7 Source and Target Format Summary
• Source format: ONNX (model.onnx)
• Target format: IR (.xml + .bin)
• Compression: FP32 → FP16
• Runtime: OpenVINO Runtime
7.8 Sample Python Code to Load IR Model
You can run the converted IR model using the following OpenVINO Python API:
from openvino.runtime import Core

core = Core()
model = core.compile_model("tinyllama_fp16.xml", "CPU")

# Display input names and shapes (partial_shape also covers dynamic dimensions)
for model_input in model.inputs:
    print(f"Input: {model_input.get_any_name()} | Shape: {model_input.get_partial_shape()}")
7.9 Troubleshooting and Recommendations
• OVC not found: ensure openvino-dev is installed (pip install openvino-dev)
• Input shape mismatch: add the --input_shape argument to define specific input tensor shapes
• Conversion takes time: add --silent to suppress logs and speed up the process
7.10 Use Case
Once optimized, the IR model can be used in:
• Real-time AI assistants on desktops or edge devices
• Embedded systems requiring fast NLP inference
• Cloud or local applications with OpenVINO integration
8. Study Buddy – AI Assistant Using TinyLLaMA, OpenVINO,
Gradio, and Speech Recognition
8.1 Introduction
The “Study Buddy” is a lightweight, locally-hosted AI assistant designed to serve as an
interactive learning companion. This chatbot leverages the TinyLLaMA model optimized
with Intel OpenVINO for efficient inference, supports both text and speech input, and
delivers responses through a sleek web interface built using Gradio. After setup it runs largely
offline (only the Google-based voice transcription requires an internet connection), enabling
secure and accessible AI-assisted learning experiences on low-resource systems.
8.2 Code Architecture and Functional Breakdown
8.2.1. Importing Required Libraries
The project imports various essential Python libraries:
• numpy: For tensor and array manipulations.
• openvino.runtime.Core: For compiling and executing the optimized OpenVINO IR
model.
• transformers.AutoTokenizer: For tokenizing user input and decoding model output.
• gradio: For building the graphical user interface.
• speech_recognition: For converting recorded audio into text using Google Speech
Recognition.
• threading: For interrupting model generation mid-process via a stop button
mechanism.
8.2.2. Loading the OpenVINO Model
An OpenVINO Core instance loads the TinyLLaMA-1.1B IR model. The compiled model
consists of an XML file (architecture) and a BIN file (weights), both generated from ONNX.
It is loaded specifically to run on the CPU backend.
8.2.3. Tokenizer Setup
The tokenizer, crucial for encoding user inputs and decoding model outputs, is initialized
from a local directory containing the TinyLLaMA tokenizer artifacts (e.g., tokenizer.json,
tokenizer_config.json).
8.2.4. Transformer Layer Detection
A utility function dynamically detects the number of transformer layers by probing for key-
value cache tensors until an exception is raised. This ensures that the inference logic
correctly handles all layers during generation.
8.2.5. Initializing Key Value Caches
Zero-filled numpy arrays are created for each layer’s past_key_values to avoid recomputing
attention scores and enable faster token generation.
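A minimal sketch of these two steps is shown below. It assumes the cache inputs follow the past_key_values.N.key / past_key_values.N.value naming produced by the optimum export, and the head count and head dimension are illustrative values for TinyLLaMA-1.1B; the project's script detects the layer count by probing until an exception is raised, whereas this sketch checks the set of input names directly:

import numpy as np

# All tensor names exposed as inputs by the compiled OpenVINO model (see 8.2.2)
input_names = {name for inp in compiled_model.inputs for name in inp.get_names()}

num_layers = 0
while f"past_key_values.{num_layers}.key" in input_names:   # probe layer by layer
    num_layers += 1

# Zero-length caches for the first forward pass
NUM_KV_HEADS, HEAD_DIM = 4, 64        # assumed values for TinyLLaMA-1.1B
past = {}
for i in range(num_layers):
    past[f"past_key_values.{i}.key"] = np.zeros((1, NUM_KV_HEADS, 0, HEAD_DIM), dtype=np.float32)
    past[f"past_key_values.{i}.value"] = np.zeros((1, NUM_KV_HEADS, 0, HEAD_DIM), dtype=np.float32)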
8.2.6. Clean Output Decoding
A decode function is used to clean up the output string by removing any non-natural
language characters such as asterisks or placeholder symbols.
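A minimal version of such a clean-up step (the exact characters the project's script strips may differ) could be:

import re

def clean_response(text: str) -> str:
    # Remove markdown-style markers and placeholder symbols, then collapse extra whitespace
    text = re.sub(r"[*_#`|]+", "", text)
    return re.sub(r"\s{2,}", " ", text).strip()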
8.2.7. Prompt Formatting and Punctuation Handling
User messages are formatted with labels like “### Human:” and “### Assistant:”, followed
by automatic punctuation correction (appending a period if none is present), which improves
the model’s contextual understanding.
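A sketch of this formatting step, assuming the labels described above:

def build_prompt(user_message: str) -> str:
    message = user_message.strip()
    if message and message[-1] not in ".!?":
        message += "."                        # append a period if punctuation is missing
    return f"### Human: {message}\n### Assistant:"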
8.2.8. Stop Flag Mechanism
A threading.Event object is initialized to serve as a signal that can interrupt response
generation if the user presses a “Stop” button.
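The mechanism reduces to a shared threading.Event that the generation loop polls (a minimal sketch):

import threading

stop_flag = threading.Event()

def stop_response():
    stop_flag.set()            # called by the "Stop" button

# Inside the generation loop:
#     if stop_flag.is_set():
#         break
# and before starting a new response:
#     stop_flag.clear()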
8.2.9. Autoregressive Text Generation (Streaming)
This generator function handles one-token-at-a-time output (a simplified sketch follows this list):
• Prompts are tokenized and sent to the model.
• The model generates logits for the next token.
• The next token is selected using argmax decoding.
• Decoded text is progressively returned via yield, enabling real-time streaming.
• Past key values are reused and updated to improve inference speed.
• Generation stops upon hitting an end-of-sentence token or stop signal.
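The sketch below is a simplified version of this loop. For readability it re-feeds the full token sequence every step and assumes an IR that accepts only input_ids and attention_mask; the project's with-past export additionally requires the per-layer cache tensors from 8.2.5 to be supplied and updated each iteration. compiled_model, tokenizer, stop_flag, build_prompt, and clean_response are as introduced earlier:

import numpy as np

def generate_stream(user_message, max_new_tokens=150):
    stop_flag.clear()
    input_ids = tokenizer(build_prompt(user_message), return_tensors="np").input_ids
    for _ in range(max_new_tokens):
        if stop_flag.is_set():                               # "Stop" button pressed
            break
        results = compiled_model({
            "input_ids": input_ids,
            "attention_mask": np.ones_like(input_ids),
        })
        logits = results[compiled_model.output("logits")]
        next_id = int(np.argmax(logits[0, -1]))              # greedy (argmax) decoding
        if next_id == tokenizer.eos_token_id:                # end-of-sequence token
            break
        input_ids = np.concatenate(
            [input_ids, np.array([[next_id]], dtype=input_ids.dtype)], axis=1
        )
        # Stream the partially decoded answer back to the interface
        yield clean_response(tokenizer.decode(input_ids[0], skip_special_tokens=True))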
8.2.10. Audio Transcription Support
Users can upload audio input in .wav format. The audio is processed through the
SpeechRecognition module, and the transcribed text is inserted into the input field
automatically.
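The transcription helper reduces to a few SpeechRecognition calls (a minimal sketch; the Google recognizer needs an internet connection):

import speech_recognition as sr

def transcribe_audio(wav_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)      # read the whole file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""                              # speech could not be understood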
8.2.11. Custom CSS Styling
A modern, dark-themed UI is implemented via inline CSS. It modifies the font, background
color, and UI components for an aesthetic and functional interface.
8.2.12. Gradio-Based Interface Design
The interface is built using Gradio Blocks and includes:
• Header with title and subheading
• Textbox for user input
• File upload for audio input
• Four buttons: Send, Stop, Clear, and Transcribe Audio
• Live chat display area for user and AI messages
8.2.13. Interactive Handlers
Gradio buttons are connected to corresponding Python functions, wired together in the sketch after this list:
• handle_chat(): Starts token generation and updates the chat log.
• stop_response(): Triggers the stop flag.
• clear_history(): Resets the chat log.
• transcribe_audio(): Converts audio to text and populates the input field.
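A condensed sketch of the layout and wiring (the CSS string and component labels are illustrative; handle_chat, stop_response, clear_history, and transcribe_audio are the handlers listed above):

import gradio as gr

CUSTOM_CSS = "body { background-color: #111; color: #eee; }"    # illustrative dark theme

with gr.Blocks(css=CUSTOM_CSS) as demo:
    gr.Markdown("## Study Buddy\nYour AI learning companion")
    chatbot = gr.Chatbot(label="Conversation")
    user_box = gr.Textbox(label="Type your question")
    audio_in = gr.File(label="Upload audio (.wav)", type="filepath")
    with gr.Row():
        send_btn = gr.Button("Send")
        stop_btn = gr.Button("Stop")
        clear_btn = gr.Button("Clear")
        transcribe_btn = gr.Button("Transcribe Audio")

    send_btn.click(handle_chat, inputs=[user_box, chatbot], outputs=chatbot)
    stop_btn.click(stop_response, inputs=None, outputs=None)
    clear_btn.click(clear_history, inputs=None, outputs=chatbot)
    transcribe_btn.click(transcribe_audio, inputs=audio_in, outputs=user_box)

demo.launch()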
8.3 Functional Overview
The chatbot operates in real time and can switch between text and voice inputs. Responses
are streamed dynamically and can be interrupted at any time. The UI is minimalistic yet
efficient, allowing intuitive use for educational purposes.
8.4 Input/Output Behavior
Input:
• Text entered manually by the user
• Voice input via uploaded audio files (.wav format)
Output:
• Assistant-generated text responses
• Responses are streamed word-by-word to simulate natural dialogue
Intermediate behavior:
• Streaming updates on the interface
• Optional user interruption of generation
8.5 Key Features
• TinyLLaMA model optimized with OpenVINO for local CPU inference
• Real-time token-by-token generation
• Audio-to-text input support via Google Speech API
• Sleek dark UI styled using custom CSS
• Stop button to halt long responses
• Compatible with low-resource systems
8.6 Technical Stack Summary
• Model: TinyLLaMA-1.1B-Chat-v1.0 (converted to IR)
• Inference engine: Intel OpenVINO Runtime
• Tokenizer: HuggingFace Transformers (AutoTokenizer)
• UI framework: Gradio (Blocks layout)
• Audio transcription: SpeechRecognition (Google API)
• Deployment target: CPU (desktop or low-power local devices)
8.7 Recommendations for Future Work
• Add support for persistent chat history (e.g., saving to JSON or SQLite)
• Integrate text-to-speech (TTS) for vocal responses
• Enable multilingual capabilities with Whisper or Vosk
• Wrap the app as a standalone desktop executable using PyInstaller
• Extend memory via sliding-window context or stateful conversation
Results:
Conclusion:
The development and deployment of the TinyLLaMA-powered AI Assistant using
OpenVINO and Gradio has demonstrated the feasibility and efficiency of running large
language models (LLMs) on edge devices and CPU-only environments. By converting the
original PyTorch-based TinyLLaMA-1.1B-Chat-v1.0 model into the ONNX format and
further optimizing it into the OpenVINO Intermediate Representation (IR), the project
achieved significant improvements in inference speed and resource efficiency.
Through the integration of modern tools such as HuggingFace Transformers for
tokenization, OpenVINO for hardware-accelerated inference, Gradio for an intuitive user
interface, and SpeechRecognition for audio input handling, the system provides a robust and
user-friendly chatbot experience. It supports both text and speech-based queries, processes
them with low latency, and streams real-time responses, all while maintaining a lightweight
deployment footprint.
The successful conversion of the model, along with the seamless integration of modular
components, validates the project's objective of building a responsive, offline-capable AI
assistant. Moreover, the platform's architecture is designed with extensibility in mind,
enabling future upgrades such as multilingual support, TTS (Text-to-Speech) integration,
and further quantization or model distillation.
In conclusion, this project represents a practical and scalable approach to deploying efficient
AI systems in constrained environments. It provides a valuable learning experience in model
conversion, performance benchmarking, UI development, and real-world deployment
challenges. The project not only meets its technical goals but also lays a solid foundation for
future enhancements in the field of lightweight AI assistants.
References
1. TinyLlama Project. (2023). TinyLlama-1.1B-Chat-v1.0.
https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
2. Intel OpenVINO Toolkit. https://docs.openvino.ai/
3. HuggingFace Transformers. https://huggingface.co/docs/transformers/index
4. Gradio UI Library. https://www.gradio.app
5. Python SpeechRecognition Library. https://pypi.org/project/SpeechRecognition/
6. ONNX (Open Neural Network Exchange). https://onnx.ai/
7. PyTorch Documentation. https://pytorch.org/docs/stable/index.html