Interact with document using Gen AI

Are you tired of combing through long documents looking for answers to your questions? Ever wondered if you could get a summary of a lengthy document? Do you still generate FAQ(s) manually, and wouldn't it be more productive if you could click a button and generate FAQ(s) from a document? If you are looking for answers, this publication will help you get started. It will help you build a quick application that can interact with documents in a secure manner.

Published by: Kamal Sharma
Artificial intelligence has been touching our lives in one way or another, and more profoundly since the advent of cloud computing. The big tech companies have constantly leveraged the power of AI to deliver lower costs to customers, for example by suggesting changes to their cloud resources. The mini movie that gets generated on our iPhones or Android phones is also the result of an AI algorithm running in the background. However, the launch of ChatGPT, powered by GPT and owned by OpenAI, has made the underlying technology - Gen AI - much more accessible to everybody. Businesses around the world are brainstorming about which use cases they can apply the technology to.

One thing is certain: we have only scratched the surface, and the sky is the limit as we progress through the decade. Companies and businesses around the world, even if they do not want to, will be compelled to provide better customer experience and services. Imagine that a travel site today requires 50+ clicks to plan an itinerary, but with the power of Gen AI, a bot can plan everything for you with a few questions. This is just one example, and the potential is immense.

In this article, I would like to talk about a use case we solved: rather than having team members spend time reading through a document, we gave them a tool that can answer the specific questions they have about it. Instead of going through the document and using the compute of our own brains, we handed the work off to LLMs on AWS Bedrock and achieved a productivity boost.

Let's dive into the architecture.

The architecture uses a series of services across AWS:

API Gateway
ECS - for compute and the cache layer
Vector database - pickle files on S3
Foundation models such as Claude and Titan on AWS Bedrock
Amongst the libraries and components used, we used React for the frontend and FastAPI for the backend API, along with LangChain to interact with AWS Bedrock.
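
To make the later steps concrete, here is a minimal sketch of the shared plumbing, assuming boto3 and the langchain-community Bedrock wrappers; the region and model IDs are illustrative placeholders, not necessarily what ran in production:

    import boto3
    from langchain_community.embeddings import BedrockEmbeddings
    from langchain_community.llms import Bedrock

    # Single Bedrock runtime client shared by the embeddings model and the LLM.
    bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Titan vectorizes text; Claude generates the answers.
    embeddings = BedrockEmbeddings(client=bedrock_runtime,
                                   model_id="amazon.titan-embed-text-v1")
    llm = Bedrock(client=bedrock_runtime,
                  model_id="anthropic.claude-v2",
                  model_kwargs={"max_tokens_to_sample": 1024})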

The architecture is pretty straightforward, as is the customer journey. There are two user stories associated with it: 1. uploading and preparing the document for interaction, and 2. interacting with the document.

Find below the steps of the customer journey for uploading a file and making it ready for interaction (a code sketch follows the list):

A user uploads a file.
The file gets stored in an S3 bucket.
A backend process is triggered to chunk the file using LangChain.
A conversational buffer is also used to keep the customer's prompts and responses.
The chunks are individually vectorized by invoking the Titan embeddings model.
FAISS is used as an in-memory vector database.
The vector data is serialized as a pickle file and stored in S3.
A notification is sent to the frontend that the file is ready for interaction.
The user starts asking questions in "natural language". Example: What is this document about?
In the background, another process is triggered to generate a summary of the document and generate FAQ(s), so that they are ready for the user.
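
A hedged sketch of the chunk-embed-pickle steps above, reusing the embeddings object from the earlier snippet; the bucket and key names are hypothetical, and serialize_to_bytes() is the langchain-community FAISS helper that pickles the index:

    import boto3
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS

    s3 = boto3.client("s3")

    def prepare_document(bucket: str, key: str) -> None:
        # Pull the uploaded file's text from S3.
        text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Chunk it so each piece fits the embedding model's input size.
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        chunks = splitter.split_text(text)

        # Embed every chunk with Titan and build the in-memory FAISS index.
        index = FAISS.from_texts(chunks, embeddings)

        # Serialize (pickle) the index and store it next to the document.
        s3.put_object(Bucket=bucket, Key=f"{key}.pkl",
                      Body=index.serialize_to_bytes())
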
The customer journey for interacting with the document is stated below (again with a code sketch after the list):

The customer asks questions in natural language.
A backend API vectorizes the question asked by the user.
FAISS, the in-memory vector database, loads the respective pickle file from S3 and performs a similarity search.
The similar texts, along with the prompt, are sent to the LLM - Claude on AWS Bedrock - to provide a response.
Since we use LangChain, a conversational buffer is maintained, keeping the context of all the conversations for the user.
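
A sketch of this path, again with hypothetical names and reusing the s3 client, embeddings, and llm objects from the earlier snippets; note that recent langchain-community versions require an explicit allow_dangerous_deserialization flag when loading pickled indexes, and the exact signature may vary by version:

    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory
    from langchain_community.vectorstores import FAISS

    # One buffer per user keeps the context of the whole conversation.
    memory = ConversationBufferMemory(memory_key="chat_history",
                                      return_messages=True)

    def answer_question(bucket: str, key: str, question: str) -> str:
        # Load the pickled index that prepare_document() stored earlier.
        blob = s3.get_object(Bucket=bucket, Key=f"{key}.pkl")["Body"].read()
        index = FAISS.deserialize_from_bytes(
            serialized=blob,
            embeddings=embeddings,
            allow_dangerous_deserialization=True,
        )

        # The chain vectorizes the question, runs the similarity search, and
        # sends the similar texts plus chat history to Claude on Bedrock.
        chain = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=index.as_retriever(search_kwargs={"k": 4}),
            memory=memory,
        )
        return chain.invoke({"question": question})["answer"]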

It's amazing how well the similarity search performs even when out-of-context questions are asked; it does its best to find a match. The system can also be made smarter by using Agents (Bedrock Agents) to communicate with other systems as well.

Why choose pickle files instead of using a vector database such as Weaviate or OpenSearch?

It's a valid question, and it's a matter of tradeoffs. When a user is interacting with a document, the context needs to be that document itself. Having a shared vector database such as OpenSearch would be much more nuanced in this case: a lot more work would be needed for authorization controls and for making sure the right context is picked. However, if you are building a knowledge base of documents that does not need AuthZ, or the data in the documents is not related at all, then using systems such as Amazon Kendra or vector DBs such as OpenSearch should not be an issue. In fact, moving to pickle files in such a case would not scale and would not be an ideal solution.

Hence, no one strategy fits everything.

How do we summarize or generate FAQ(s) for larger documents - 100 MB+?

It's a great question to ask as well. Unfortunately, all LLMs are restricted by their context window size. In order to generate a summary or FAQ(s), the entire document must be presented to the LLM so that an appropriate response can be generated, and here the context window size is a limiting factor. Claude fares a little better with its 100,000-token limit. Refer here.

There are certain model-specific strategies that can be employed to increase the effective context window. Another strategy could be an architecture like the one sketched below:
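
One common pattern along these lines is map-reduce summarization: summarize each chunk independently, then summarize the partial summaries. A minimal sketch using LangChain's built-in summarize chain, reusing the llm and splitter from the earlier snippets (chunk sizes are illustrative assumptions):

    from langchain.chains.summarize import load_summarize_chain
    from langchain_core.documents import Document

    def summarize_large(text: str) -> str:
        # Split into chunks that individually fit the context window.
        splitter = RecursiveCharacterTextSplitter(chunk_size=8000,
                                                  chunk_overlap=200)
        docs = [Document(page_content=c) for c in splitter.split_text(text)]

        # "map_reduce" summarizes each chunk, then combines the summaries.
        chain = load_summarize_chain(llm, chain_type="map_reduce")
        return chain.invoke({"input_documents": docs})["output_text"]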

However, this should not be done as a synchronous call, since the user might end up waiting forever. Above all, API Gateway has a timeout of 30 seconds at most, so the connection might time out.

To conclude, we looked at how an application can be architected to interact with documents, vectorizing them using the Titan model and keeping the context in a secure manner. There are also strategies for improving customer experience by employing caching, LangChain conversational buffers, and async operations such as summarizing and generating FAQ(s).
