Python Lists:

 Built-in data type
 mylist = ["apples", 1, True]
 Can store values of different data types, e.g. str, int, bool
 Changeable (mutable): items can be added, updated, and removed
 Ordered means the items have a defined order that is preserved
 Unordered means there is no fixed order or sequence of items
 Allows duplicates (mylist = ["apples", "cherry", "apples"])
 Can also create a list using the list() constructor: mylist = list(("apples", "cherry")) # note the double round-brackets
 To change item value:
thislist = ["apple", "banana", "cherry"]
thislist[1] = "blackcurrant"
 Adding items to a list:
o thislist.append("orange")
o thislist.insert(1, "orange")
 Joining two lists:
o thislist.extend(tropical)
o list1 + list2
 Extending a list with a different data type: thislist.extend(thistuple), where thistuple is a tuple
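
For example (standard list operations; the values here are illustrative):

thislist = ["apple", "banana", "cherry"]
thislist.append("orange")        # add to the end
thislist.insert(1, "mango")      # add at index 1
tropical = ["pineapple", "papaya"]
thislist.extend(tropical)        # append all items from another list
combined = thislist + tropical   # + also joins two lists
thistuple = ("kiwi",)
thislist.extend(thistuple)       # extend() accepts any iterable, e.g. a tuple
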
 Removing items
o thislist.remove("banana")
o pop() to remove the last item
o clear() to empty the list contents; the list itself still exists
o del mylist to delete the entire list
o del mylist[1] to delete the item at a specific index
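
For example:

thislist = ["apple", "banana", "cherry"]
thislist.remove("banana")   # remove by value
thislist.pop()              # remove the last item (or pass an index)
del thislist[0]             # delete the item at a specific index
thislist.clear()            # empty the list; the list object still exists
del thislist                # delete the list object entirely
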
 Make a copy of a list:
o mylist.copy()
o mylist[:]
o list(mylist)
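
For example, each of these produces an independent (shallow) copy:

mylist = ["apple", "banana"]
a = mylist.copy()
b = mylist[:]
c = list(mylist)
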
 List comprehension: offers a shorter syntax to create a new list based on the values of an
existing list.
It can replace the code below:

fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = []
for x in fruits:
    if "a" in x:
        newlist.append(x)

print(newlist)

with:

newlist = [x for x in fruits if "a" in x]
print(newlist)
Python Tuple

 ordered
 unchangeable/immutable, i.e. items cannot be added, updated, or removed after a tuple is
created. However, we can convert a tuple to a list using the list() function, perform the
add/update/remove operations, and then convert the list back to a tuple using the tuple() function
 allows duplicates
thistuple = ("apple",)   # a one-item tuple needs a trailing comma
print(type(thistuple))   # <class 'tuple'>

# NOT a tuple - just a string in parentheses
thistuple = ("apple")
print(type(thistuple))   # <class 'str'>
 can contain different data types in one tuple
 can create a tuple directly :
o mytuple = ("apple", "banana")
o thistuple = tuple(("apple", "banana", "cherry")) # note the double
round-brackets
 Unpacking a tuple: assigning the values back into individual variables
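
For example (standard tuple unpacking; a starred name collects the remaining values into a list):

fruits = ("apple", "banana", "cherry")
(green, yellow, red) = fruits   # one variable per value
print(green)    # apple

(first, *rest) = fruits         # * collects the rest into a list
print(rest)     # ['banana', 'cherry']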

Python Set:

 unordered, e.g. myset = {"apple", "banana", "cherry"}


 Once a set is created, you cannot change its items, but you can remove items and add new
items.
 Duplicates are not allowed. The values True and 1 are considered the same value in sets and are
treated as duplicates.
 thisset = set(("apple", "banana", "cherry")) # note the double round-brackets
 Add an item to a set: thisset.add("orange")
 Join two sets: thisset.update(tropical)
 Add any iterable to a set: thisset.update(mylist)
 Remove an item from a set: thisset.remove("banana")
 The union() and update() methods join all items from both sets.
 The intersection() method keeps ONLY the duplicates.
 The difference() method keeps the items from the first set that are not in the other set(s).
 The symmetric_difference() method keeps all items EXCEPT the duplicates.
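
For example:

a = {"apple", "banana", "cherry"}
b = {"google", "microsoft", "apple"}
print(a.union(b))                 # all items from both sets
print(a.intersection(b))          # {'apple'} - only the common items
print(a.difference(b))            # items in a that are not in b
print(a.symmetric_difference(b))  # everything except the common items
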
Python dictionary:

 Dictionary items are ordered (as of Python 3.7), changeable, and do not allow duplicate keys.


 Change/add values:
o thisdict.update({"key": value})
o thisdict["key"] = value
 To remove items:
o thisdict.pop("model")
o del thisdict["model"]
o thisdict.clear()
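
For example:

thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}
thisdict.update({"year": 2020})   # change (or add) a value via update()
thisdict["color"] = "red"         # change (or add) a value via key assignment
thisdict.pop("model")             # remove by key
del thisdict["color"]             # also removes by key
thisdict.clear()                  # empties the dictionary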

Internship details (manual notes)

task 1: oil and gas sector

- Created an AI chatbot that answers user queries based on information available in a DB
- The DB consists of PDF files, Word docs, and Excel files
- All file types are converted to PDF: Word to PDF using LibreOffice, Excel using LangChain
unstructured document loaders. PDFs are loaded using LangChain document loaders.
- Now all files are available in PDF format.
- GPT-4o is used to extract PDF content and store it in LangChain docs
- Embeddings are created for the entire extracted content using the text-embedding-ada-002
model and stored in a vector store
- Use RecursiveCharacterTextSplitter to split large docs while creating embeddings
- A RAG pipeline responds to user queries (a minimal sketch follows this list)
- Conversational chain buffer memory maintains chat history per user
- Created an API to handle the whole application
- Used Postman to test results
- Solving Trivy scanner and SonarQube errors
- Deployed the API as a Docker image via Azure Git
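
A minimal sketch of the ingestion-plus-retrieval flow above, assuming LangChain 0.1-era imports, FAISS as the vector store, and OpenAI model names; the file name, chunk sizes, and question are illustrative, not the project's actual code:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

docs = PyPDFLoader("manual.pdf").load()                      # placeholder file
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)                      # split large docs

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
store = FAISS.from_documents(chunks, embeddings)             # vector store

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=store.as_retriever(),
    memory=memory,                                           # keeps chat history
)
print(chain.invoke({"question": "What does the manual say about valves?"})["answer"])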

task 2: supply chain management sector

- Explored Azure Document Intelligence for extracting tables from complex Excel sheets (a
minimal sketch follows this list).
- Extracting information from vendor bids to understand the availability of piping system or plant
material according to our specifications.
- Analyzing vendor quotations and specifications to choose the best fit
- Recommending the best fit to the decision-making authority
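
A minimal sketch of table extraction using the azure-ai-formrecognizer SDK (the package behind Azure Document Intelligence); the endpoint, key, and file name are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<key>"),
)
with open("vendor_bid.pdf", "rb") as f:   # placeholder document
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

for table in result.tables:               # each extracted table
    for cell in table.cells:
        print(cell.row_index, cell.column_index, cell.content)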

Task 3: enterprise automation project

- Part of a project that generated responses for EOI (Expression of Interest) documents.
- Embeddings created for multiple sector databases like oil and gas, chemical, construction, etc.
- Workflow like task 1
- Multiple-API application
- Worked on creating a custom retriever to cache results for frequent queries, using a hybrid
scoring method of weighted cosine similarity + keyword match (a sketch follows this list)
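
A hypothetical sketch of the hybrid scoring plus caching idea; the weight ALPHA, the toy embed(), and the corpus are illustrative stand-ins, not the project's code:

from functools import lru_cache
import numpy as np

ALPHA = 0.7   # assumed weight on cosine similarity vs. keyword match

VOCAB = ["pipe", "valve", "steel", "bid", "price", "spec"]

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float) + 1e-9

def hybrid_score(query, doc):
    qv, dv = embed(query), embed(doc)
    cosine = float(qv @ dv / (np.linalg.norm(qv) * np.linalg.norm(dv)))
    q, d = set(query.lower().split()), set(doc.lower().split())
    keyword = len(q & d) / max(len(q), 1)
    return ALPHA * cosine + (1 - ALPHA) * keyword

DOCS = ["steel pipe price bid", "valve spec sheet", "plant material bid"]

@lru_cache(maxsize=256)        # cache results for frequent queries
def retrieve(query, k=2):
    ranked = sorted(DOCS, key=lambda d: hybrid_score(query, d), reverse=True)
    return tuple(ranked[:k])

print(retrieve("steel pipe bid"))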

__________________________________________________________________________________

Task 1: AI-Powered Chatbot for Oil & Gas Sector

 Designed and implemented an AI-driven chatbot capable of responding to user queries by
leveraging document-based knowledge stored in a centralized database.

 Consolidated heterogeneous data sources including PDF files, Word documents, and Excel
sheets. Utilized LibreOffice for .doc to .pdf conversions and LangChain's unstructured
document loaders for Excel files to ensure standardized document formatting.

 Processed all documents in PDF format using LangChain PDF loaders, enabling consistent
parsing and preprocessing.

 Leveraged GPT-4o for extracting semantically rich content from PDFs and converting them
into LangChain document objects.

 Created embeddings for all processed content using OpenAI’s text-embedding-ada-002
model and stored them in a high-performance vector database, enabling rapid semantic
search and retrieval.

 Employed RecursiveCharacterTextSplitter to manage large document chunks effectively
during embedding generation, optimizing retrieval accuracy.

 Built a Retrieval-Augmented Generation (RAG) pipeline to provide contextual, accurate
answers based on stored document knowledge.

 Integrated LangChain’s ConversationalRetrievalChain with buffer memory to maintain
contextual continuity in multi-turn conversations.

 Developed a FastAPI-based backend API to orchestrate the end-to-end chatbot operations,
from query intake to response generation (a minimal sketch follows this list).

 Validated API functionality and performance using Postman, ensuring reliability across
various input cases.

 Addressed security and quality issues by resolving vulnerabilities flagged by Trivy (container
security) and SonarQube (static code analysis).

 Containerized the application and deployed it using Docker, integrating with Azure Git for
version control and CI/CD readiness.
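
A minimal sketch of what such a FastAPI layer can look like; the route name, request model, and the run_rag() stand-in are assumptions, not the project's actual code:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

def run_rag(question: str) -> str:
    # Stand-in for the ConversationalRetrievalChain call described above.
    return f"(answer for: {question})"

@app.post("/chat")
def chat(query: Query):
    return {"answer": run_rag(query.question)}

# Run with e.g.: uvicorn main:app --reload, then POST {"question": "..."} to /chat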

Business Impact: Enhanced operational efficiency by providing engineers and field experts with real-
time access to domain-specific documentation and insights, significantly reducing time spent on
manual document search.

Task 2: Intelligent Document Analysis for Supply Chain Management

 Explored and applied Azure Document Intelligence (formerly Form Recognizer) to extract
structured data, particularly complex tabular information from Excel-based vendor bids.
 Automated the identification and extraction of key metrics such as material availability,
specifications, and compliance for piping systems and plant materials.

 Conducted comparative analysis of vendor quotations using extracted insights to assist in
optimal supplier selection.

 Delivered data-backed recommendations to procurement and decision-making teams,
enabling informed and faster purchasing decisions.

Business Impact: Streamlined vendor evaluation processes by automating bid analysis, leading to
reduced procurement cycle time and improved supply chain transparency.

Task 3: Enterprise Automation – Expression of Interest (EOI) Response Generation

 Contributed to an enterprise-grade automation project aimed at generating responses for
EOI documents across various sectors, including Oil & Gas, Construction, and Chemicals.

 Built sector-specific document embeddings using a pipeline similar to Task 1, enabling
intelligent matching and content generation for EOI requirements.

 Developed and managed multiple APIs to handle modular tasks within the application such
as retrieval, ranking, and response generation.

 Engineered a custom retriever capable of caching frequently asked queries, implementing a
hybrid retrieval approach that combined weighted cosine similarity with keyword matching
for improved ranking accuracy and reduced latency.

Business Impact: Automated a previously manual and time-intensive process, increasing proposal
generation speed and enabling scalable client engagements across sectors.

RAG architecture:

- RAG has been described as "a general-purpose fine-tuning recipe" designed to integrate any
large language model (LLM) with various internal or external knowledge sources.

- RAG gives an LLM a superpower: the ability to consult an external knowledge base before
crafting its responses.

1. Question Input: The client inputs a question into the system. This initiates the process by
feeding the query into the framework.

2. Semantic Search: The framework employs semantic search techniques to query the vector
database. This search retrieves relevant contextual data based on the input question.

3. Contextual Data Utilization: The retrieved data is then used to create a prompt. This prompt
is specifically tailored to guide the LLM in generating a response that is both relevant and
informative.

4. Response Generation by LLM: The LLM processes the prompt and generates a response. The
LLM’s extensive training on vast datasets enables it to produce high-quality answers.
5. Post-Processing: The generated response undergoes post-processing to ensure clarity,
coherence, and appropriateness. This step may involve refining the language, correcting
errors, and enhancing the overall quality of the response.

6. Response Delivery: The final, polished response is delivered back to the client, providing
them with the information they sought in a clear and concise manner.
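
A toy, model-free sketch of steps 1-3; the word-overlap retrieval here is a stand-in for real embedding-based semantic search, and steps 4-6 happen at the LLM:

def retrieve(question, kb, k=1):
    # Step 2: toy "semantic search" - rank passages by word overlap with the query.
    q = set(question.lower().split())
    return sorted(kb, key=lambda p: -len(q & set(p.lower().split())))[:k]

def build_prompt(question, passages):
    # Step 3: fold the retrieved context into a prompt for the LLM.
    return "Use only this context:\n" + "\n".join(passages) + "\n\nQuestion: " + question

kb = ["RAG systems retrieve relevant documents before generating an answer.",
      "Buffer memory preserves chat history across turns."]
question = "How does RAG generate answers?"      # Step 1: question input
prompt = build_prompt(question, retrieve(question, kb))
print(prompt)   # Steps 4-6: send this to the LLM, post-process, deliver the reply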

Use delta query to track changes

(Use delta query to track changes in Microsoft Graph data - Microsoft Graph | Microsoft Learn)

Hyperparameters in LLMs
