Python Lists:
Built-in data type
mylist = ["apples", 1, True]
Can store values of different data types, e.g. str, int, bool
Are changeable (mutable): able to add, update, and delete values
Ordered means the items have a defined order that is preserved
Unordered means there is no specific order or sequence of items
Allows duplicates (mylist = ["apples", "cherry", "apples"])
Can also create a list using the list() constructor
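For example (list() takes a single iterable, hence the double parentheses):
mylist = list(("apples", 1, True))
print(mylist)  # ['apples', 1, True]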
To change item value:
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
Adding items to a list:
o thislist.append("orange") adds to the end
o thislist.insert(1, "orange") inserts at index 1
Append two lists:
o thislist.extend(tropical)
o list1 + list2
Extend a list with a different iterable type: thislist.extend(thistuple), where thistuple is a tuple
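Putting these together:
thislist = ["apple", "banana", "cherry"]
thislist.append("orange")       # add to the end
thislist.insert(1, "orange")    # add at index 1
tropical = ["mango", "pineapple"]
thislist.extend(tropical)       # append every item from another list
thistuple = ("kiwi",)
thislist.extend(thistuple)      # extend() accepts any iterable, not only lists
combined = thislist + tropical  # + concatenates two lists into a new list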
Removing items:
o thislist.remove("banana") removes by value
o pop() removes the last item (pop(i) removes the item at index i)
o clear() empties the list's contents, but the list itself still exists
o del mylist deletes the entire list
o del mylist[1] deletes the item at a specific index
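Each removal option in action:
mylist = ["apple", "banana", "cherry", "kiwi"]
mylist.remove("banana")  # remove by value -> ['apple', 'cherry', 'kiwi']
mylist.pop()             # remove the last item -> ['apple', 'cherry']
del mylist[1]            # remove the item at index 1 -> ['apple']
mylist.clear()           # empty the contents -> []
del mylist               # delete the list object itself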
Make a copy of a list:
o mylist.copy()
o mylist[:]
o list(mylist)
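All three produce an independent (shallow) copy; plain assignment only copies the reference:
a = ["apple", "banana"]
b = a            # NOT a copy: b and a are the same list object
c = a.copy()     # independent copy
a.append("kiwi")
print(b)         # ['apple', 'banana', 'kiwi'] -- changed along with a
print(c)         # ['apple', 'banana']         -- unaffected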
list comprehension: offers a shorter syntax to create a new list based on the values of an existing list.
It can replace the below code:
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    if "a" in x:
        newlist.append(x)
print(newlist)
with:
newlist = [x for x in fruits if "a" in x]
print(newlist)
Python Tuple:
ordered
unchangeable/immutable, i.e. items cannot be added or deleted after a tuple is created. However, we can convert a tuple to a list using the list() function, perform add/update/remove operations, and convert the list back to a tuple using the tuple() function
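The round-trip workaround in full:
thistuple = ("apple", "banana", "cherry")
templist = list(thistuple)   # tuple -> list
templist.append("orange")    # add/update/remove is possible now
thistuple = tuple(templist)  # list -> tuple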
allows duplicates
thistuple = ("apple",)
print(type(thistuple))  # <class 'tuple'>
# NOT a tuple (no trailing comma, so this is just a str):
thistuple = ("apple")
print(type(thistuple))  # <class 'str'>
can contain different data types in one tuple
can create a tuple directly:
o mytuple = ("apple", "banana")
o thistuple = tuple(("apple", "banana", "cherry")) # note the double round-brackets
unpacking a tuple:
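Assign the tuple's values back into variables (the * syntax collects any leftover values into a list):
fruits = ("apple", "banana", "cherry")
(green, yellow, red) = fruits  # one variable per item
(green, *rest) = fruits        # rest becomes ['banana', 'cherry']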
Python Set:
unordered: myset = {"apple", "banana", "cherry"}
Once a set is created, you cannot change its items, but you can remove items and add new
items.
Duplicates are not allowed. The values True and 1 are considered the same value in sets and are treated as duplicates.
thisset = set(("apple", "banana", "cherry")) # note the double round-brackets
add items to set: thisset.add("orange")
add two sets: thisset.update(tropical)
add an iterable to set: thisset.update(mylist)
to remove an item from set: thisset.remove("banana")
The union() and update() methods join all items from both sets (union() returns a new set; update() changes the original in place).
The intersection() method keeps ONLY the items present in both sets.
The difference() method keeps the items from the first set that are not in the other set(s).
The symmetric_difference() method keeps all items EXCEPT those present in both sets.
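For example (sets are unordered, so printed order may vary):
a = {"apple", "banana", "cherry"}
b = {"cherry", "kiwi"}
print(a.union(b))                 # {'apple', 'banana', 'cherry', 'kiwi'}
print(a.intersection(b))          # {'cherry'}
print(a.difference(b))            # {'apple', 'banana'}
print(a.symmetric_difference(b))  # {'apple', 'banana', 'kiwi'}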
Python dictionary:
Dictionary items are ordered (as of Python 3.7), changeable, and do not allow duplicate keys.
Change/add values:
o thisdict.update({"key": value})
o thisdict["key"] = value
to remove item:
o thisdict.pop("model")
o del thisdict["model"]
o thisdict.clear()
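All of these in one example:
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}
thisdict["color"] = "red"        # add a new key
thisdict.update({"year": 2020})  # change an existing value
thisdict.pop("model")            # remove by key (returns the value)
del thisdict["color"]            # also removes by key
thisdict.clear()                 # empty the dictionary -> {}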
Internship details (manual notes)
Task 1: comes under the oil and gas sector
- Created a chatbot using AI that answers user queries based on information available in a DB
- DB consists of PDF files, Word docs, and Excel files
- All file types are converted to PDF: Word to PDF using LibreOffice, Excel using LangChain unstructured document loaders. PDFs loaded using LangChain document loaders.
- Now all files are available in PDF format.
- GPT-4o used to extract PDF content and store it into LangChain docs
- Embeddings created for the entire extracted content using the text-embedding-ada-002 model and stored in a vector store
- Use RecursiveCharacterTextSplitter to split large docs while creating embeddings
- RAG pipeline created to respond to user queries
- Use conversational chain buffer memory to maintain chat history per user
- Created an API to handle the whole application
- Used Postman to test results
- Solving Trivy scanner and SonarQube errors
- Deployed the API using a Docker image, with Azure Git for version control
Task 2: supply chain management sector
- Explored azure document intelligence for extracting tables from complex excel sheets.
- Extracting information from vendor bids to understand availability of piping system material
or plant material according to our specifications.
- Analyzing vendor quotations and specifications to choose the best fit
- Recommending the best fit to decision making authority
Task 3: enterprise automation project
- Part of a project that generated responses for EOI documents.
- Embeddings created for multiple sector databases like oil and gas, chemical, construction,
etc.
- Workflow similar to task 1
- Application with multiple APIs
- Worked on creating a custom retriever to cache results for frequent queries, using a hybrid scoring method of weighted cosine similarity + keyword match
__________________________________________________________________________________
Task 1: AI-Powered Chatbot for Oil & Gas Sector
Designed and implemented an AI-driven chatbot capable of responding to user queries by
leveraging document-based knowledge stored in a centralized database.
Consolidated heterogeneous data sources including PDF files, Word documents, and Excel
sheets. Utilized LibreOffice for .doc to .pdf conversions and LangChain's unstructured
document loaders for Excel files to ensure standardized document formatting.
Processed all documents in PDF format using LangChain PDF loaders, enabling consistent
parsing and preprocessing.
Leveraged GPT-4o for extracting semantically rich content from PDFs and converting them
into LangChain document objects.
Created embeddings for all processed content using OpenAI’s text-embedding-ada-002
model and stored them in a high-performance vector database, enabling rapid semantic
search and retrieval.
Employed RecursiveCharacterTextSplitter to manage large document chunks effectively during embedding generation, optimizing retrieval accuracy (see the sketch after this task's bullet list).
Built a Retrieval-Augmented Generation (RAG) pipeline to provide contextual, accurate
answers based on stored document knowledge.
Integrated LangChain’s ConversationalRetrievalChain with buffer memory to maintain
contextual continuity in multi-turn conversations.
Developed a FastAPI-based backend API to orchestrate the end-to-end chatbot operations,
from query intake to response generation.
Validated API functionality and performance using Postman, ensuring reliability across
various input cases.
Addressed security and quality issues by resolving vulnerabilities flagged by Trivy (container
security) and SonarQube (static code analysis).
Containerized the application and deployed it using Docker, integrating with Azure Git for
version control and CI/CD readiness.
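A minimal sketch of the chunk-and-embed step. Import paths follow classic LangChain and may differ across versions; FAISS as the vector store and the chunk sizes are assumptions, since the writeup does not name them:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Chunk sizes are illustrative; tune for the documents and embedding model
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)  # docs: Document objects from the PDF loaders
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = FAISS.from_documents(chunks, embeddings)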
Business Impact: Enhanced operational efficiency by providing engineers and field experts with real-
time access to domain-specific documentation and insights, significantly reducing time spent on
manual document search.
Task 2: Intelligent Document Analysis for Supply Chain Management
Explored and applied Azure Document Intelligence (formerly Form Recognizer) to extract
structured data, particularly complex tabular information from Excel-based vendor bids.
Automated the identification and extraction of key metrics such as material availability,
specifications, and compliance for piping systems and plant materials.
Conducted comparative analysis of vendor quotations using extracted insights to assist in
optimal supplier selection.
Delivered data-backed recommendations to procurement and decision-making teams,
enabling informed and faster purchasing decisions.
Business Impact: Streamlined vendor evaluation processes by automating bid analysis, leading to
reduced procurement cycle time and improved supply chain transparency.
Task 3: Enterprise Automation – Expression of Interest (EOI) Response Generation
Contributed to an enterprise-grade automation project aimed at generating responses for
EOI documents across various sectors, including Oil & Gas, Construction, and Chemicals.
Built sector-specific document embeddings using a pipeline similar to Task 1, enabling
intelligent matching and content generation for EOI requirements.
Developed and managed multiple APIs to handle modular tasks within the application such
as retrieval, ranking, and response generation.
Engineered a custom retriever capable of caching frequently asked queries, implementing a
hybrid retrieval approach that combined weighted cosine similarity with keyword matching
for improved ranking accuracy and reduced latency.
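A simplified sketch of the hybrid scoring plus caching idea. Function names, the 0.7/0.3 weight split, and the cache policy are illustrative assumptions, not the production implementation; embed() and documents are assumed to be defined elsewhere:
import numpy as np
from functools import lru_cache

def hybrid_score(q_vec, d_vec, q_terms, d_text, w_cos=0.7, w_kw=0.3):
    # Weighted cosine similarity + fraction of query keywords found in the doc
    cos = float(np.dot(q_vec, d_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
    kw = sum(t in d_text.lower() for t in q_terms) / max(len(q_terms), 1)
    return w_cos * cos + w_kw * kw

@lru_cache(maxsize=256)  # serves repeated (frequent) queries from cache
def retrieve(query: str, k: int = 5):
    q_vec = embed(query)  # assumed wrapper around the embedding model
    q_terms = tuple(query.lower().split())
    scored = [(hybrid_score(q_vec, d.vec, q_terms, d.text), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]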
Business Impact: Automated a previously manual and time-intensive process, increasing proposal
generation speed and enabling scalable client engagements across sectors.
RAG architecture:
- RAG has been described as "a general-purpose fine-tuning recipe" designed to integrate any large language model (LLM) with various internal or external knowledge sources.
- RAG gives an LLM a superpower: the ability to consult an external knowledge base before crafting its responses.
1. Question Input: The client inputs a question into the system. This initiates the process by
feeding the query into the framework.
2. Semantic Search: The framework employs semantic search techniques to query the vector
database. This search retrieves relevant contextual data based on the input question.
3. Contextual Data Utilization: The retrieved data is then used to create a prompt. This prompt
is specifically tailored to guide the LLM in generating a response that is both relevant and
informative.
4. Response Generation by LLM: The LLM processes the prompt and generates a response. The
LLM’s extensive training on vast datasets enables it to produce high-quality answers.
5. Post-Processing: The generated response undergoes post-processing to ensure clarity,
coherence, and appropriateness. This step may involve refining the language, correcting
errors, and enhancing the overall quality of the response.
6. Response Delivery: The final, polished response is delivered back to the client, providing
them with the information they sought in a clear and concise manner.
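A minimal sketch of this loop in Python. embed(), vector_db, and llm() are placeholders for the embedding model, vector store, and LLM client, not a specific library's API:
def answer(question: str) -> str:
    q_vec = embed(question)                      # 1. take the client's question, embed it
    hits = vector_db.search(q_vec, k=4)          # 2. semantic search over the vector DB
    context = "\n\n".join(h.text for h in hits)  # 3. build a context-grounded prompt
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    draft = llm(prompt)                          # 4. LLM generates a response
    return draft.strip()                         # 5. light post-processing, 6. deliver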
Use delta query to track changes
(Use delta query to track changes in Microsoft Graph data - Microsoft Graph | Microsoft Learn)
Hyperparameters in LLMs