Google AI-ML Virtual Internship Report
Internship Summary Report
on
"AI-ML Virtual Internship"
submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
Y. Prasanth (21501A05J6)
2024 - 2025
CERTIFICATE
This is to certify that the Internship Summary Report on “AI-ML Virtual Internship” is
the bonafide work of Y. Prasanth (Regd. No. 21501A05J6), in partial fulfillment of the
requirements for the award of the degree of BACHELOR OF TECHNOLOGY in COMPUTER
SCIENCE AND ENGINEERING submitted during the academic year 2024 - 2025.
VIRTUAL INTERNSHIP COMPLETION CERTIFICATE
ACKNOWLEDGEMENT
First, I would like to thank the AICTE EDUSKILLS FOUNDATION for giving me the
opportunity to do an internship virtually.
I would also like to thank all the people who worked along with me; with their patience and
openness they created an enjoyable working environment.
It is indeed with a great sense of pleasure and immense gratitude that I acknowledge
the help of these individuals.
I am highly indebted to the Principal, Dr. K. SIVAJI BABU, for the facilities provided to
accomplish this internship.
I would like to thank my Head of the Department, Dr. A. Jayalakshmi, for her constructive
criticism throughout my internship.
I am extremely grateful to my department staff members and friends who helped me in the
successful completion of this internship.
Y. Prasanth
(21501A05J6)
ABSTRACT
INTRODUCTION
INDEX
COURSE: Google AI-ML VIRTUAL INTERNSHIP

Module 1: Program neural networks with TensorFlow
- What is machine learning?
- Business problems solved with machine learning
- Machine learning process
- Machine learning tools overview
- Machine learning challenges

Module 2: Get started with object detection
- Formulating machine learning problems
- Collecting and securing data
- Evaluating your data
- Feature engineering
- Training

Module 3: Go further with object detection
- Forecasting overview
- Processing time series data
- Performing AI and ML coding in Google Colab (Developer Profile)

Module 4: Get started with product image search
- Introduction to computer vision
- Image and video analysis
- Preparing custom datasets for computer vision

Module 5: Go further with product image search
- Overview of natural language processing
- Natural language processing managed services
- Build a visual product search backend using Vision API Product Search

Module 6: Go further with image classification
- Build a flower recognizer
- Create a custom model for your image classifier
- Integrate a custom model into your app

CONCLUSION & REFERENCES
COURSE: Google AI-ML VIRTUAL INTERNSHIP
MODULE 1: PROGRAM NEURAL NETWORKS WITH
TENSORFLOW
We have claimed that AI is exciting, but we have not said what it is. In the figure below we see
eight definitions of AI, laid out along two dimensions. The definitions on top are concerned with
thought processes and reasoning, whereas the ones on the bottom address behavior. The
definitions on the left measure success in terms of fidelity to human performance, whereas the
ones on the right measure against an ideal performance measure, called rationality. A system is
rational if it does the "right thing", given what it knows.

Some definitions of artificial intelligence, organized into four categories:

Thinking Humanly: "The exciting new effort to make computers think... machines with minds,
in the full and literal sense." (Haugeland, 1985); "[The automation of] activities that we
associate with human thinking, activities such as decision-making, problem solving, learning."
(Bellman, 1978)

Thinking Rationally: "The study of mental faculties through the use of computational models."
(Charniak and McDermott, 1985); "The study of the computations that make it possible to
perceive, reason, and act." (Winston, 1992)

Acting Humanly: "The art of creating machines that perform functions that require intelligence
when performed by people." (Kurzweil, 1990); "The study of how to make computers do things
at which, at the moment, people are better." (Rich and Knight, 1991)

Acting Rationally: "Computational Intelligence is the study of the design of intelligent agents."
(Poole et al., 1998); "AI is concerned with intelligent behavior in artifacts." (Nilsson, 1998)

Historically, all four approaches to AI have been followed, each by different people with
different methods. A human-centered approach must be in part an empirical science, involving
observations and hypotheses about human behavior. A rationalist approach involves a
combination of mathematics and engineering. The various groups have both disparaged and
helped each other. Let us look at the four approaches in more detail.
1. What is Machine Learning?
Machine learning is the scientific study of algorithms and statistical models used to
perform a task by relying on inference instead of explicit instructions. Figure 2.1 represents
the machine learning flow.
ML PIPELINE: A machine learning pipeline is the end-to-end construct that orchestrates
the flow of data into, and output from, a machine learning model (or set of multiple models),
as in figure 2.2. It includes raw data input, features, outputs, the machine learning model and
model parameters, and prediction outputs.
Figure 2.2 – ML Pipeline
4. Machine Learning Tools Overview
Jupyter Notebook is an open-source web application that enables you to create
and share documents that contain live code, equations, visualizations, and
narrative text.
JupyterLab is a web-based interactive development environment for Jupyter
notebooks, code, and data, and it is highly flexible.
pandas is an open-source Python library used for data handling and analysis.
It represents data in a table that is similar to a spreadsheet. This table is known
as a pandas DataFrame.
Matplotlib is a library for creating static, animated, and interactive
visualizations in Python. You use it to generate plots of your data later in this
course.
5. Machine Learning Challenges
NumPy is one of the fundamental scientific computing packages in Python. It
contains functions for N-dimensional array objects and useful math functions such as
linear algebra, Fourier transforms, and random number generation. Scikit-learn is an open-
source machine learning library that supports supervised and unsupervised learning. It
also provides various tools for model fitting, data pre-processing, model selection and
evaluation, and many other utilities.
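As a brief illustration of how these tools fit together, here is a minimal sketch (assuming a Python 3 environment with NumPy, pandas, Matplotlib, and scikit-learn installed; the small dataset is made up for demonstration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Build a small illustrative dataset in a pandas DataFrame (made-up values).
rng = np.random.default_rng(42)
hours = rng.uniform(0, 10, size=50)
scores = 5.0 * hours + rng.normal(0, 3, size=50)
df = pd.DataFrame({"hours_studied": hours, "exam_score": scores})

# Quick visualization with Matplotlib.
df.plot.scatter(x="hours_studied", y="exam_score", title="Hours vs. score")
plt.savefig("scatter.png")  # or plt.show() in an interactive session

# Fit a simple scikit-learn model on the DataFrame columns.
model = LinearRegression()
model.fit(df[["hours_studied"]], df["exam_score"])
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```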
PROGRAM NEURAL NETWORKS WITH TENSORFLOW
Computer Vision (CV) is a field within artificial intelligence (AI) that focuses on
enabling machines to interpret and understand visual information, much like the
human visual system. This technology has become increasingly important across
various industries, including healthcare, automotive, agriculture, and security, among
others.
1. Introduction to Computer Vision:
Definition: Computer Vision aims to replicate human vision abilities by enabling
machines to perceive, understand, and interpret visual data from the real world.
Tasks: It encompasses a wide range of tasks, including image classification, object
detection, facial recognition, image segmentation, and motion analysis.
Importance: By enabling machines to “see,” CV enables automation of tasks that were
previously exclusive to humans, leading to advancements in fields such as
autonomous vehicles, medical imaging, and surveillance systems.
2. Introduction to Convolutions:
Definition: Convolutions are mathematical operations applied to images to extract
features by sliding a small matrix (kernel) over the input image.
Feature Extraction: Convolutions are fundamental in image processing for detecting
edges, textures, shapes, and other patterns.
Applications: They are widely used in tasks such as image filtering, edge detection, and
feature extraction in computer vision algorithms.
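To make the idea of sliding a kernel over an image concrete, the following NumPy sketch applies a vertical-edge kernel by hand (the tiny image and the Sobel-like kernel are illustrative assumptions; deep learning libraries actually compute cross-correlation, i.e. the kernel is not flipped, which is what this sketch mimics):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    # At every valid position, multiply the patch by the kernel and sum.
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny synthetic image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
], dtype=float)

# Classic vertical-edge (Sobel-like) kernel.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

print(convolve2d(image, kernel))  # large values mark the vertical edge
```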
3. Convolutional Neural Networks (CNNs):
Definition: CNNs are a class of deep learning models specifically designed to
handle visual data.
Architecture: They consist of multiple layers, including convolutional layers for
feature extraction, pooling layers for downsampling and reducing spatial dimensions,
and fully connected layers for classification.
Advantages: CNNs excel in learning hierarchical representations of features, making
them highly effective for tasks like image classification, object detection, and
semantic segmentation.
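A minimal tf.keras sketch of such an architecture (assuming TensorFlow 2.x; the layer sizes and the 28x28 grayscale input are illustrative choices, not values prescribed by the course):

```python
import tensorflow as tf

# Convolutional layers extract features, pooling layers downsample,
# and fully connected (Dense) layers perform the final classification.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 output classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training is then a single call, for example:
# model.fit(train_images, train_labels, epochs=5, validation_split=0.1)
```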
4. Complex Images:
Variability: Images captured in real-world scenarios often exhibit variations in
lighting conditions, backgrounds, scales and object orientations.
Robustness: Models need to be robust to these variations to perform reliably across
diverse datasets and environments.
Mitigation Techniques: Strategies like data augmentation, which involves generating
new training samples by applying transformations to existing data, and transfer
learning, which leverages pre-trained models on similar tasks, help improve
robustness and generalization.
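A minimal sketch of on-the-fly data augmentation with Keras preprocessing layers (assuming TensorFlow 2.6 or newer, where RandomFlip, RandomRotation, and RandomZoom live directly under tf.keras.layers; the parameter values and input size are illustrative):

```python
import tensorflow as tf

# Augmentation layers generate new training samples on the fly by randomly
# transforming each input image (flips, small rotations, zooms).
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # up to +/- 10% of a full turn
    tf.keras.layers.RandomZoom(0.2),
])

# Typically placed in front of a CNN; the random transforms are only active
# during training, so evaluation sees the original images.
inputs = tf.keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```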
5. Use CNNs with Larger Datasets:
Generalization: Training CNNs with larger datasets helps improve their ability to
generalize, meaning they can perform well on unseen data.
Diverse Patterns: Larger datasets expose models to a wider variety of patterns and
variations, enabling them to learn more robust and representative features.
Real-world Performance: By training on diverse data, CNNs can better handle real-
world tasks and scenarios, leading to enhanced performance in applications such as
autonomous driving, medical diagnosis, and surveillance.
MODULE 2: GET STARTED WITH OBJECT DETECTION
Deleting the outlier: This approach might be a good choice if your outlier is
based on an artificial error. Artificial error means that the outlier isn't natural
and was introduced because of some failure, perhaps incorrectly entered data.
Imputing a new value for the outlier: You can use the mean of the feature,
for instance, and impute that value to replace the outlier value. Again, this
would be a good approach if an artificial error caused the outlier.
Feature Selection: Filter Methods
Filter methods (figure-2.4) use a proxy measure instead of the actual model's
performance. Filter methods are fast to compute, and they still capture the usefulness of
the feature set. Common measures include:
Pearson’s correlation coefficient -Measures the statistical relationship or
association between two continuous variables.
Linear discriminant analysis (LDA) -Is used to find a linear combination of
features that separates two or more classes.
Feature Selection: Wrapper Methods
Forward selection starts with no features and adds them until the best model is
found. (figure-2.5)
Backward selection starts with all features, drops them one at a time, and selects
the best model.
Feature Selection: Embedded Methods
Embedded methods (figure-2.6) combine the qualities of filter and wrapper methods.
They are implemented in algorithms that have their own built-in feature selection
methods.
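For a rough illustration of the three families in scikit-learn, here is a sketch on a bundled dataset (the particular estimators and the choice of keeping 10 features are arbitrary examples, not the exact methods referenced by figures 2.4-2.6):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter method: score each feature independently of any model (ANOVA F-test here).
filter_sel = SelectKBest(score_func=f_classif, k=10).fit(X, y)

# Wrapper method: recursive feature elimination repeatedly fits a model and
# drops the weakest features (a backward-selection style wrapper).
wrapper_sel = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

# Embedded method: L1-regularized logistic regression selects features while fitting.
embedded_sel = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
).fit(X, y)

print("filter kept:  ", filter_sel.get_support().sum(), "features")
print("wrapper kept: ", wrapper_sel.get_support().sum(), "features")
print("embedded kept:", embedded_sel.get_support().sum(), "features")
```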
4. Training
The holdout technique (figure-2.7) and k-fold cross validation (figure-2.8) are the most
commonly used methods for splitting data into training and test sets.
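A minimal scikit-learn sketch of both splitting strategies (the dataset and classifier are placeholder choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Holdout: keep a fixed fraction of the data aside as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf.fit(X_train, y_train)
print("holdout accuracy:", clf.score(X_test, y_test))

# k-fold cross validation: rotate the held-out fold k times and average the scores.
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold mean accuracy:", scores.mean())
```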
Linear learner: The Amazon SageMaker linear learner algorithm provides a solution for
both classification and regression problems. The Amazon SageMaker linear learner
algorithm compares favorably with methods that provide a solution for only continuous
objectives. It provides a significant increase in speed over naïve hyperparameter
optimization techniques.
5. Hosting and Using the Model
You can deploy your trained model by using Amazon SageMaker to handle
API calls from applications, or to perform predictions by using a batch transform.
Use single-model endpoints for simple use cases, and use multi-model endpoint
support to save resources when you have multiple models to deploy.
6. Evaluating the Accuracy of the Model
Confusion Matrix Terminology: A confusion matrix is a performance
measurement for machine learning classification. An example for classification
is shown in figure-2.9.
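A short scikit-learn sketch of computing a confusion matrix and the usual derived metrics (the label vectors are made-up values):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score

# Made-up ground-truth labels and model predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows are the true classes, columns are the predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```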
7. Hyperparameter and Model Tuning
HYPERPARAMETER TUNING:
Tuning hyperparameters can be labor-intensive. Traditionally, this kind of tuning
was done manually: practitioners would choose a set of hyperparameter values, then
train the model and score it on the validation data. This process would be repeated
until satisfactory results were achieved.
Manual search is not always the most thorough and efficient way of tuning your
hyperparameters. Automated hyperparameter tuning helps the model find patterns and
define the attributes of the data by itself, by searching over settings such as the filters
and the optimizer (figure-2.11).
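Although the course discusses automatic tuning in the context of Amazon SageMaker, the same idea can be sketched locally with scikit-learn's grid search (the model and parameter grid below are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try; every combination is cross-validated.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best CV accuracy    :", search.best_score_)
```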
Google Developers offers a wealth of resources and tools for developers interested in
artificial intelligence (AI) and machine learning (ML). Here are some key offerings in the
AI and ML domain provided by Google Developers:
1. TensorFlow: TensorFlow is an open-source machine learning framework
developed by Google. Google Developers provides extensive documentation,
tutorials, and guides for TensorFlow, covering topics such as building neural
networks, training models, deploying models in production, and using TensorFlow
for tasks like image classification, natural language processing, and more.
2. TensorFlow Lite: TensorFlow Lite is a lightweight version of TensorFlow
designed for mobile and embedded devices. Google Developers offers resources
and tools for developers to leverage TensorFlow Lite for deploying machine
learning models on mobile platforms, optimizing model size and performance, and
integrating AI capabilities into mobile applications.
3. TensorFlow.js: TensorFlow.js is a JavaScript library that allows developers to run
machine learning models directly in the browser or Node.js environment. Google
Developers provides documentation, tutorials and code samples for TensorFlow.js,
enabling developers to build interactive web applications with AI capabilities, such
as image recognition, sentiment analysis, and more.
4. AI Platform: Google Cloud AI Platform is a suite of managed services and
tools for building, training, and deploying machine learning models at scale.
Google Developers offers documentation, tutorials, and guides for AI Platform,
covering topics such as data preparation, model training, hyperparameter tuning,
model deployment, and monitoring.
5. AutoML: Google’s Cloud AIML suite provides tools and services that automate the
process of building custom machine learning models without requiring deep
expertise in ML. Google Developers offers resources for AutoML, including
documentation, tutorials, and examples for using AutoML Vision, AutoML Natural
Language, AutoML Tables, and other AutoML products.
6. Machine Learning APIs: Google Developers provides access to a range of pre-
trained machine learning models and APIs through Google Cloud, including Vision
API, Natural Language API, Translation API, Video Intelligence API, and more.
These APIs enable developers to add AI capabilities to their applications with
minimal effort, allowing tasks like image recognition, text analysis, and language
translation.
Overall, Google Developers offers a comprehensive set of resources and tools for
developers interested in AI and ML, spanning from frameworks and libraries to managed
services and APIs. Whether developers are building models from scratch, deploying
models in production, or integrating AI capabilities into their applications, Google
Developers provides the support and guidance needed to succeed in the field of artificial
intelligence and machine learning.

Acting humanly: The Turing Test approach
The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory
operational definition of intelligence. A computer passes the test if a human interrogator,
after posing some written questions, cannot tell whether the written responses come from
a person or from a computer. Later discussion covers the details of the test and whether a
computer would really be intelligent if it passed. For now, we note that programming a
computer to pass a rigorously applied test provides plenty to work on. The computer would
need to possess the following capabilities: natural language processing to enable it to
communicate successfully in English; knowledge representation to store what it knows or
hears; automated reasoning to use the stored information to answer questions and to draw
new conclusions; and machine learning to adapt to new circumstances and to detect and
extrapolate patterns.
Turing’ s test deliberately avoided direct physical interaction between the interrogator and
the computer, because physical simulation of a person in unnecessary for intelligence.
However, the so-called total Turing Test includes a video signal so that the interrogator
can test the subject’s perceptual abilities, as well as the opportunity for the interrogator to
pass
Page | 19
physical objects “through the hatch.” To pass the total Turing Test, the computer will
need computer vision to perceive objects, and robotics to manipulate objects and move
about. These six disciplines compose most of AI, and Turing deserves credit for
designing a test that remains relevant 60 years later.
Yet AI researchers have devoted little effort to passing the Turing Test, believing that it is
more important to study the underlying principles of intelligence than to duplicate an
exemplar. The quest for "artificial flight" succeeded when the Wright brothers and others
stopped imitating birds and started using wind tunnels and learning about aerodynamics.
Aeronautical engineering texts do not define the goal of their field as making “machines
that fly so exactly like pigeons that they can fool even other pigeons.”
Thinking humanly: The cognitive modeling approach
If we are going to say that a given program thinks like a human, we must have some way of
determining how humans think. We need to get inside the actual workings of human minds.
There are three ways to do this: through introspection (trying to catch our own thoughts as
they go by), through psychological experiments (observing a person in action), and through
brain imaging (observing the brain in action). Once we have a sufficiently precise theory of
the mind, it becomes possible to express the theory as a computer program.
Convenient Solution: The ML Kit Object Detection API provides a user-friendly
solution for seamlessly integrating object detection capabilities into mobile apps.
Accessibility: Accessible across various mobile platforms, it simplifies the integration
process, eliminating the need for extensive machine learning expertise.
Flexibility: ML Kit supports both on-device and cloud-based object detection, offering
flexibility to developers based on their app’s requirements and constraints.
MODULE: 3 GO FURTHER WITH OBJECT DETECTION
1. OVERVIEW OF FORECASTING
Forecasting is an important area of machine learning. It is important because so many
opportunities for predicting future outcomes are based on historical data. It’s based on
time series data.
Time series data fall into two broad categories.
The first type is univariate, which means it has only one variable. The second type is
multivariate, which means it has more than one variable. Time series data can exhibit the
following patterns:
Trend – A pattern that shows the values as they increase, decrease, or stay the
same over time.
Seasonal – A repeating pattern that is based on the seasons in a year.
Cyclical – Some other form of a repeating pattern.
Irregular – Changes in the data over time that appear to be random or that have
no discernible pattern.
2. PROCESSING TIME SERIES DATA
When processing time series data, we need to handle missing or irregular values using
techniques such as forward fill, backward fill, moving average, or interpolation.
Time Series Data Handling: Smoothing of Data: Smoothing your data can help you
deal with outliers and other anomalies. You might consider smoothing for the
following reasons.
Data preparation – Removing error values and outliers.
Visualization – Reducing noise in a plot.
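A brief pandas sketch of these handling and smoothing steps (the dates and values are made up):

```python
import numpy as np
import pandas as pd

# A small daily series with two missing values.
idx = pd.date_range("2024-01-01", periods=8, freq="D")
sales = pd.Series([10, 12, np.nan, 11, 13, np.nan, 15, 14], index=idx)

filled_forward  = sales.ffill()          # forward fill: carry the last value forward
filled_backward = sales.bfill()          # backward fill: copy the next value back
interpolated    = sales.interpolate()    # linear interpolation between neighbors

# Smoothing with a 3-day moving average reduces noise for plotting/preparation.
smoothed = interpolated.rolling(window=3, center=True).mean()

print(pd.DataFrame({"raw": sales, "ffill": filled_forward,
                    "interp": interpolated, "smooth": smoothed}))
```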
Time Series Data Algorithms: There are five types of time series algorithms to
consider (ARMA, DeepAR+, ETS, NPTS, and Prophet), as shown in figure 2.13.
Object detection is a fundamental task in computer vision that involves identifying and
locating objects within images or videos. Unlike image classification, which only predicts
the presence of an object in an image, object detection provides both spatial localization
and class labels for each detected object. This capability enables a wide range of
applications, including autonomous driving, surveillance, medical imaging, and industrial
automation.
1.1 Object Localization
Object detection begins with the task of localization, where the goal is to determine the
precise location of objects within an image. This typically involves drawing bounding
boxes around objects to indicate their spatial extent. These bounding boxes are
represented by the coordinates (x, y) of the object's top-left corner and its width and
height (w, h). The accuracy of object localization is crucial for tasks such as robotics and
augmented reality.
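Localization quality is commonly scored with intersection over union (IoU) between a predicted box and the ground-truth box. A minimal sketch, assuming boxes are given as (x, y, w, h) with (x, y) at the top-left corner:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Coordinates of the overlapping rectangle (if any).
    x1 = max(ax, bx)
    y1 = max(ay, by)
    x2 = min(ax + aw, bx + bw)
    y2 = min(ay + ah, by + bh)

    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction that overlaps the ground truth fairly well.
print(iou((50, 50, 100, 80), (60, 55, 100, 80)))  # ~0.73
```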
1.2. Object Classification
In addition to localization, object detection involves classifying each detected object into
predefined categories or classes. This step assigns a label to each bounding box,
indicating the type of object it represents. For example, in an image containing cars,
pedestrians, and traffic signs, object detection algorithms would classify each detected
object accordingly. This classification enables intelligent systems to understand and
interact with their environment effectively.
Techniques and Applications of Object Detection
2.1 Techniques for Object Detection
Object detection algorithms employ various techniques to accurately detect and
classify objects within images or videos. One of the most widely used approaches is
the region-based convolutional neural network (R-CNN) family of methods. R-CNN
based models first propose a set of candidate regions within an image using selective
search or similar methods. Then, these regions are processed individually by a CNN
to extract features and classify objects.
2.2 Advances in Object Detection
Recent advances in object detection have led to the development of faster and more
accurate algorithms. One notable breakthrough is the introduction of single-stage
detectors, such as You Only Look Once (YOLO) and the Single Shot MultiBox Detector
(SSD). These models eliminate the need for region proposal methods by predicting
bounding boxes and class scores directly in a single pass, achieving real-time performance
without sacrificing much accuracy and making them ideal for applications requiring low
latency, such as video surveillance and autonomous vehicles.
2.3 Application of Object Detection
Object detection has numerous applications across various domains:
Autonomous driving: Detecting vehicles, pedestrians, and traffic signs for
navigation and collision avoidance.
Surveillance: Identifying suspicious activities and monitoring crowds in public
spaces.
Medical imaging: Locating and analyzing anatomical structures in medical images
for diagnosis and treatment planning.
Retail: Tracking inventory, detecting product defects, and analyzing customer
behavior in stores.
3.1. Challenges in Object Detection
Despite significant progress, object detection still faces several challenges that
researchers continue to address:
Scale and Aspect Ratio Variations: Objects in images can vary greatly in scale and
aspect ratio, making it challenging to detect objects of different sizes and shapes
accurately.
Occlusions and Clutter: Objects may be partially occluded or surrounded by clutter in
real- world scenes, affecting the performance of object detection algorithms.
Speed and Efficiency: Real-time object detection requires algorithms to be fast and
efficient, especially for applications such as autonomous driving and surveillance.
Generalization to New Environments: Object detection models trained on one dataset
may not generalize well to new environments with different lighting conditions,
backgrounds, or object classes.
3.2 Future Directions in Object Detection
Researchers are actively exploring new approaches and techniques to address these
challenges and push the boundaries of object detection technology:
Multi-Scale and Context-Aware Models: Next- generation object detection models
aim to incorporate multi-scale features and contextual information to improve
detection accuracy in challenging scenarios.
Few-Shot and Zero-Shot Learning: Techniques such as few-shot and zero-shot
learning enable object detection models to generalize to new object classes with
limited or no training data, expanding the scope of applications.
Attention Mechanisms: Integrating attention mechanisms into object detection
architectures can improve the model's ability to focus on relevant regions of interest
and filter out distractions, enhancing both accuracy and efficiency.
Domain Adaptation and Transfer Learning: Techniques for domain adaptation and
transfer learning help object detection models adapt to new environments by
leveraging knowledge from pre-trained models or auxiliary datasets.
Ethical and Societal Considerations: As object detection technology becomes more
widespread, addressing ethical and societal considerations, such as privacy, fairness,
and bias, is essential to ensure responsible deployment and mitigate potential harm.
Practical Considerations and Implementations of Object Detection
4.1 Data Preparation
One of the crucial steps in implementing object detection is data preparation. This
involves collecting, annotating, and preprocessing a diverse dataset containing images
with annotated bounding boxes for training the object detection model. The quality
and diversity of the training data significantly impact the performance of the model,
so careful attention must be paid to data selection and annotation.
4.2. Model Selection and Training
Choosing the appropriate object detection model architecture is another critical
consideration. Depending on the application requirements, factors such as accuracy,
speed, and resource constraints need to be weighed when selecting a model. Popular
choices include Faster R-CNN, SSD, YOLO, and their variants. Once the model is
selected, it needs to be trained using the annotated dataset to learn to detect objects
accurately.
4.3. Optimization and Fine-Tuning
After training the model, optimization techniques such as pruning, quantization, and
model compression can be applied to reduce the model’s size and computational
complexity, making it suitable for deployment on resource-constrained devices.
Additionally, fine-tuning the model on domain-specific data or performing transfer
learning from pre-trained models can further improve its performance on target tasks.
4.4 Deployment and Integration
Once the object detection model is trained and optimized, it can be deployed and
integrated into the target application or system. This may involve developing software
interfaces or APIs for communication with other components, optimizing inference
pipelines for real-time performance, and ensuring compatibility with the target
hardware platform. Continuous monitoring and maintenance are essential to ensure
the deployed model’s performance remains robust over time.
Train a custom object-detection model and deploy it with TensorFlow Lite for
efficient real-time recognition on mobile or edge devices.
1. Train Your Own Object-Detection Model:
Customization: Train a custom object-detection model using a dataset of labeled
images, tailored to specific recognition tasks.
Machine Learning Frameworks: Utilize machine learning frameworks like
TensorFlow to develop and train the model, allowing for flexibility and
customization.
2. Build and Deploy a Custom Object-Detection Model with TensorFlow Lite:
TensorFlow Lite: Utilize TensorFlow Lite, a lightweight version of TensorFlow, to
build a model suitable for deployment on mobile or edge devices.
Efficient Inference: TensorFlow Lite enables efficient real-time inference on
resource-constrained platforms, ensuring optimal performance even with limited
computational resources.
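As a rough sketch of the conversion step (assuming TensorFlow 2.x; the placeholder model and file name are illustrative, and the internship labs use higher-level tooling such as TensorFlow Lite Model Maker for the object-detection case):

```python
import tensorflow as tf

# Placeholder trained model so the sketch is self-contained; in practice this
# would be the custom detector/classifier trained above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# The resulting .tflite file can be bundled into an Android/iOS app and run with
# the TensorFlow Lite interpreter (or ML Kit's custom-model APIs).
```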
MODULE 4: GET STARTED WITH PRODUCT IMAGE
SEARCH
1. Computer Vision enables machines to identify people, places and things in images
with accuracy at or above human levels, with greater speed and efficiency. Often
built with deep learning models, computer vision automates the extraction,
analysis, classification and understanding of useful information from a single
image or a sequence of images. The image data can take many forms, such as
single images, video sequences, views from multiple cameras, or three-
dimensional data.
Applications of Computer vision: (figure 2.15)
Agriculture: It helps in monitoring crop health, detecting pests, and assessing soil
conditions to improve yield and manage resources efficiently.
Insurance: Computer vision automates claim processing by analyzing images of damages,
thus speeding up the assessment and approval process.
Manufacturing: It is employed for quality control, inspecting products for defects, and
ensuring they meet required standards during the production process.
Banking: In banking, computer vision enhances security through facial recognition for
authentication and fraud detection.
Sports: Computer vision analyzes player movements, tracks ball trajectory, and provides
performance insights to enhance training and game strategy.
Problem 02: Video Analysis
Amazon Rekognition is a computer vision service based on deep learning. You can use it
to add image and video analysis to your applications. Amazon Rekognition enables you
to perform the following types of analysis:
Searchable image and video libraries – Amazon Rekognition makes images and
stored videos searchable so that you can discover the objects and scenes that
appear in them.
Pathing – You can capture the path of people in the scene. For example, you can
use the movement of athletes during a game to identify plays for post-game
analysis.
Two use cases can be performed: "Searchable Image Library" and "Sentiment
Analysis".
Figure 2.17 – Video Analysis
Preparing Custom Datasets for Computer Vision
There are six steps involved in preparing custom data: collect images, create a training
dataset, create a test dataset, train the model, evaluate the model, and then use the model.
1. Collect Images
Typically use a few hundred images.
Build domain-specific models.
Use 10 PNG or JPEG images per label.
Use images similar to the images that you want to detect.
4. Train the Model
5. Evaluate
Evaluate model performance.
Metrics.
o Precision.
o Recall.
o Overall model performance.
6. Use Model
Returns array of custom labels:
o Label.
o Bounding box for objects.
o Confidence.
Now, let's see how product image search works.
1: Introduction to Product Image Search
Product image search is a technology that enables users to search for products using
images instead of text queries. This innovative approach leverages computer vision and
machine learning techniques to analyze and understand the visual content of images,
allowing users to find similar or visually related products across vast product catalogs.
Product image search offers a more intuitive and convenient shopping experience,
enabling users to discover products based on visual preferences and inspiration.
1.1 Evolution of E-Commerce Search
Traditional text-based search systems in e-commerce platforms have limitations, such
as the need for precise product descriptions and the inability to convey visual
preferences effectively. Product image search addresses these limitations by allowing
users to initiate searches directly from images, bypassing the need for text queries and
providing more accurate and relevant results.
1.2 Key Components of Product Image Search
Product image search systems typically consist of several key components:
Image Feature Extraction: Extracting high-dimensional features from product
images using CNNs or other deep learning models.
Indexing and Retrieval: Building an index of image features to efficiently search
and retrieve similar images from large product databases.
Ranking and Relevance Scoring: Ranking retrieved images based on their
similarity to the query image and relevance to the user’s preferences.
User Interface and Integration: Designing user-friendly interfaces that allow users
to interact with the product image search system seamlessly, integrating it into
e-commerce websites or mobile applications.
3.1. E-Commerce and Retail
By enabling users to search for products using images, e-commerce platforms increase
engagement, reduce friction in the search process, and facilitate product discovery. Product
image search can also enable visual recommendation systems, personalized shopping
experiences, and virtual try-on features.
3.2. Visual Search Engines
Beyond e-commerce, product image search has applications in visual search engines
and image-based information retrieval systems. Users can search for visually similar
images across diverse domains, such as art, fashion, home décor, and more. Visual
search engines empower users to explore visual content intuitively, discover new
inspirations, and find relevant information based on visual cues.
4. Challenges and Future Directions
4.1. Challenges in Product Image Search
Despite its promise, product image search faces several challenges, including:
Data Quality and Annotation: Ensuring the availability of high-quality image data with
accurate annotations for training and evaluation.
Semantic Gap: Bridging the semantic gap between low-level image features and high-
level semantic concepts, such as product attributes and user preferences.
Scalability and Efficiency: Developing scalable and efficient image search algorithms
capable of handling large-scale product databases and real-time search queries.
Cross-Domain Generalization: Enabling product image search systems to generalize
across different product categories, styles and domains.
4.2. Future Directions
Future research directions in product image search include:
Semantic Understanding: Advancing techniques for semantic understanding of product
images, including attribute detection, style analysis, and context-aware retrieval.
Multimodal Integration: Integrating textual and visual information to enhance product
understanding and recommendation capabilities.
Interactive Search Interfaces: Designing interactive search interfaces that allow users
to refine search results, provide feedback, and engage in active exploration.
Privacy and Ethical Considerations: Addressing privacy and ethical considerations
related to user data and image usage in product image search systems.
Integrating object detection into mobile apps to enable visual product search is a
significant advancement that greatly enhances user experiences, particularly in e-
commerce applications. Let’s delve deeper into the process and its benefits.
Introduction to Product Image Search on Mobile:
Enhanced User Experience: Product image search on mobile devices offers users a
more intuitive and engaging way to find items they’re interested in purchasing.
Quicker Searches: By allowing users to search for products using images, the
process becomes faster and more convenient, eliminating the need for manual
text-based searches.
Build an Object Detector into Your Mobile App:
Enhanced Capabilities: Integrating an object detector into a mobile app expands its
functionalities, enabling users to identify and locate objects within images.
Interactive Experiences: Object detection adds interactivity to the app, allowing
users to engage with visual content and explore products in a dynamic manner.
MODULE 5: GO FURTHER WITH PRODUCT IMAGE SEARCH
Enable visual product searches by establishing communication between the mobile app and
the backend, integrating this functionality into an Android app, and utilizing Vision API
Product Search for machine learning-based product matching.
Call the Product Search Backend from the Mobile App:
Connection Establishment: Establish a secure and reliable connection between the mobile
app and the product search backend. This typically involves implementing API endpoints
or utilizing SDKs provided by the backend service.
Communication Protocol: Define a communication protocol, such as RESTful APIs or
GraphQL, to facilitate data exchange between the mobile app and the backend. This
ensures seamless retrieval of product search results based on user queries.
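As a rough illustration of such an endpoint, here is a minimal Flask sketch; Flask, the route path, and the search_products helper are assumptions for illustration only, and a real backend would call Vision API Product Search inside that helper:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def search_products(image_bytes):
    """Hypothetical helper: forward the image to the product-search service
    (e.g. Vision API Product Search) and return matched products."""
    # Placeholder result so the sketch runs end to end.
    return [{"product_id": "demo-123", "score": 0.92}]

@app.route("/v1/search", methods=["POST"])
def search():
    # The mobile app POSTs the query image as multipart form data.
    image = request.files.get("image")
    if image is None:
        return jsonify({"error": "missing 'image' file"}), 400
    results = search_products(image.read())
    return jsonify({"results": results})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```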
Call the Product Search Backend from the Android App:
Android Integration: Integrate the functionality of the product search backend into the
Android application. This may involve incorporating relevant libraries, SDKs, or APIs
provided by the backend service provider into the Android project.
Optimized Performance: Ensure that the integration is optimized for Android platforms,
considering factors such as network connectivity, device compatibility, and resource
utilization. This enhances the app's capability to effectively communicate with and
retrieve information from the backend.
Build a Visual Product Search Backend using Vision API Product Search:
Utilization of Vision API: Leverage Google’s Vision API Product Search to develop a
backend infrastructure capable of supporting visual product searches.
Machine Learning Integration: Harness the machine learning capabilities of Vision API
Product Search to match and retrieve products based on visual features extracted from
images uploaded by users.
Scalability and Reliability: Design the backend system to be scalable and reliable,
capable of handling large volumes of requests and providing accurate search results in
real time.
Authentication and Authorization: Implement secure authentication mechanisms, such
as OAuth 2.0 or API keys, to ensure that only authorized users can access the product
search backend. This helps protect sensitive user data and prevents unauthorized access
to backend resources.
Error Handling and Resilience: Implement robust error handling, retry policies, and
fallback strategies so that the app and backend remain resilient in adverse network
conditions.
Caching and Offline Support: Incorporate caching mechanisms within the Android app
to store frequently accessed product data locally, enabling offline product search
capabilities. This enhances user experience by providing uninterrupted access to product
information even without an active internet connection.
Analytics and Monitoring: Integrate analytics tools and monitoring solutions into both
the mobile app and the backend to track user interactions, monitor performance metrics,
and identify areas for optimization. Insights gathered from analytics data can help
improve the overall effectiveness and user engagement of the product search feature.
Localization and Internationalization: Consider localization and internationalization
requirements when designing the product search backend and Android app. Support for
multiple languages, currencies, and regional preferences enhances accessibility and
usability for a global audience, improving user satisfaction and adoption.
Scalability and Load Balancing: Design the product search backend to be horizontally
scalable, capable of handling increasing user traffic and demand over time. Implement
load balancing techniques to distribute incoming requests evenly across multiple backend
servers, ensuring optimal performance and responsiveness.
Security Best Practices: Adhere to industry-standard security best practices when
designing, implementing, and deploying the product search backend. This includes
measures such as data encryption, input validation, and protection against common
security vulnerabilities like SQL injection and cross-site scripting (XSS) attacks.
Continuous Integration and Deployment (CI/CD): Implement CI/CD pipelines to
automate the build, testing, and deployment processes for both the Android app and the
backend. This streamlines the development workflow, ensures code quality, and facilitates
rapid iteration and deployment of new features and updates.
By addressing these additional points, developers can create a comprehensive and robust
integration of the product search backend into Android apps, delivering a seamless and
feature -rich user experience for visual product search functionality.
Product image search revolutionizes the way users interact with e-commerce platforms by
allowing them to search for products using images instead of text queries. This
technology leverages advancements in computer vision and machine learning to
understand and analyze the visual content of images, enabling users to find visually
similar or related products effortlessly.
1.1. Evolution of E-Commerce Search
Traditional text-based search systems in e-commerce platforms have limitations,
such as the need for precise product descriptions and the inability to convey visual
preferences effectively. Product image search addresses these limitations by enabling
users to initiate searches directly from images, providing more accurate and relevant
results.
1.2. Key Components of Product Image Search
Product image search systems comprise several key components, including image
feature extraction, indexing, retrieval, ranking, relevance scoring, user interface
design, and integration into e-commerce platforms. Each component plays a crucial
role in enabling seamless and intuitive product discovery through image-based search.
2: Techniques for Product Image Search
2.1. Image Feature Extraction
Image feature extraction is a fundamental component of product image search, involving
the transformation of raw image data into meaningful representations that capture visual
similarities between products. CNNs are commonly used for image feature extraction,
with pre-trained models such as VGG, ResNet and EfficientNet yielding powerful feature
representations.
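A minimal Keras sketch of using a pre-trained CNN as a feature extractor (assuming a recent TensorFlow 2.x release with bundled ImageNet weights; the image file name is a placeholder):

```python
import numpy as np
import tensorflow as tf

# ResNet50 without its classification head; global average pooling yields a
# 2048-dimensional feature vector per image.
extractor = tf.keras.applications.ResNet50(include_top=False,
                                            weights="imagenet",
                                            pooling="avg")

def embed(image_path):
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.resnet50.preprocess_input(x)
    return extractor.predict(x, verbose=0)[0]   # shape (2048,)

query_vector = embed("query_product.jpg")        # placeholder file name
print(query_vector.shape)
```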
2.2. Similarity Search Algorithms
Once image features are extracted, similarity search algorithms are employed to retrieve
visually similar products from the database efficiently. Techniques such as nearest neighbor
search, approximate nearest neighbor search, and locality-sensitive hashing (LSH)
enable fast and accurate retrieval of similar images based on their feature representations.
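Continuing the sketch above, exact nearest-neighbor retrieval over a catalog of embeddings can be prototyped with scikit-learn; at larger scale, approximate methods such as LSH or libraries like FAISS would replace it (the random catalog below is placeholder data):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder catalog: 1,000 products, each represented by a 2048-d CNN embedding.
rng = np.random.default_rng(0)
catalog_embeddings = rng.normal(size=(1000, 2048)).astype("float32")

# Cosine distance is a common choice for comparing CNN feature vectors.
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(catalog_embeddings)

query = rng.normal(size=(1, 2048)).astype("float32")   # e.g. embed("query.jpg")
distances, product_rows = index.kneighbors(query)
print("top-5 catalog rows:", product_rows[0], "distances:", distances[0])
```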
2.4 Fine – Grained Visual Recognition
Fine-grained visual recognition techniques are utilized in product image search to
distinguish between visually similar products with subtle differences, such as different
styles, colors, or patterns. Fine-grained classification models, attention mechanisms,
and metric learning techniques enhance the discriminative power of the image search
system, improving accuracy in identifying visually similar products.
3: Applications of Product Image Search
3.1. E-Commerce and Retail
Product image search has widespread applications in e-commerce and retail, enhancing
the shopping experience and driving sales. By enabling users to search for products
using images, e-commerce platforms increase engagement, reduce friction in the search
process, and facilitate product discovery. Visual recommendation systems, personalized
shopping experiences, and virtual try-on features are enabled by product image search
technology.
3.2. Visual Search Engines
Beyond e-commerce, product image search powers visual search engines and image-based
information retrieval systems. Users can search for visually similar images across diverse
domains such as art, fashion, home décor, and more. Visual search engines empower
users to explore visual content intuitively, discover new inspirations, and find relevant
information based on visual cues.
4: Challenges in Product Image Search
4.1. Data Quality and Annotation
Ensuring the availability of high-quality image data with accurate annotations for training
and evaluation is a significant challenge in product image search. Annotated image datasets
that adequately represent the diversity of products and user preferences are essential for
training robust and effective image search models.
4.2 Semantic Gap
Bridging the semantic gap between low-level image features and high-level semantic
concepts, such as product attributes and user preferences, presents a formidable challenge
in product image search. Advancements in semantic understanding techniques, including
attribute detection, style analysis, and context-aware retrieval are essential for addressing
this challenge.
5: Scalability and Efficiency
5.1. Scalability
Scalability is a critical consideration in product image search, particularly concerning
large-scale product databases and real-time search queries. Developing scalable image
search algorithms and infrastructure capable of handling millions of products and
serving thousands of concurrent users efficiently is essential for delivering a seamless
and responsive user experience.
5.2. Efficiency
Efficiency in image search algorithms and systems is essential for delivering fast and
responsive search results. Techniques such as distributed computing, parallel
processing, and algorithmic optimizations are employed to optimize the efficiency of
image feature extraction, indexing, retrieval, and ranking processes, enabling real-
time product image search capabilities.
6: Future Directions and Conclusion
6.1. Future Directions
Future research directions in product image search encompass advancements in
semantic understanding, multimodal integration, interactive search interfaces, and
addressing privacy and ethical considerations. Semantic understanding techniques
enable deeper insights into product attributes and user preferences, while multimodal
integration enhances product understanding and recommendation capabilities.
Interactive search interfaces empower users to refine search results and engage in
active exploration, while addressing privacy concerns and ethical considerations
ensures responsible deployment and usage of product image search systems.
MODULE 6: GO FURTHER WITH IMAGE CLASSIFICATION
Develop and integrate a flower recognition system by creating a custom image classifier
and seamlessly embedding it into a user-friendly application for real-time flower
identification.
Build a Flower Recognizer:
Machine Learning Approach: Develop a flower recognition system using machine
learning techniques. This involves training a model on a dataset of labeled flower
images, where each image is associated with the type of flower it represents.
Visual Feature Identification: The goal is to create a system capable of identifying
different types of flowers based on visual features such as color, shape, texture and
petal arrangement.
Dataset Preparation: Collect and curate a dataset of flower images, ensuring it covers
a wide variety of flower types and includes sufficient examples of each type for
effective training.
Create a Custom Model for Your Image Classifier:
Custom Model Design: Design a custom image classification model tailored to the
specific requirements of the flower recognition task. This may involve selecting an
appropriate architecture, such as CNNs, and defining the model’s layers and
parameters.
Training Process: Utilize machine learning frameworks like TensorFlow or PyTorch
to define and train the custom model with the labeled flower dataset. Train the model
to learn the visual features characteristic of different types of flowers, optimizing its
performance through techniques like transfer learning or fine-tuning.
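A minimal transfer-learning sketch in tf.keras (assuming TensorFlow 2.x; the five flower classes, image size, and dataset directory are illustrative assumptions):

```python
import tensorflow as tf

NUM_CLASSES = 5          # e.g. daisy, dandelion, rose, sunflower, tulip
IMG_SIZE = (224, 224)

# Pre-trained MobileNetV2 backbone with its ImageNet classification head removed.
base = tf.keras.applications.MobileNetV2(input_shape=IMG_SIZE + (3,),
                                         include_top=False,
                                         weights="imagenet")
base.trainable = False   # freeze the backbone; only the new head is trained first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training data would come from a labeled folder of flower photos, e.g.:
# train_ds = tf.keras.utils.image_dataset_from_directory("flowers/", image_size=IMG_SIZE)
# model.fit(train_ds, epochs=5)
# Optionally unfreeze the top of `base` afterwards and fine-tune with a small learning rate.
```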
Integrate a Custom Model into Your App:
Model Deployment: Once the custom model is trained, embed it into a mobile or web
application to enable real-time flower recognition capabilities for users.
Integration Seamlessness: Integrate the model seamlessly into the application’s user
interface, allowing users to easily access the flower recognition feature. This may
involve designing an intuitive interface for capturing images and displaying
recognition results.
Real-time Recognition: Enable users to take pictures of flowers using their device’s
camera and receive instant recognition results within the app, providing a convenient
and interactive experience.
Data Augmentation Techniques for Improving Model Generalization:
Data augmentation involves applying transformations such as rotation, flipping, and
scaling to the training images to increase the diversity of the dataset. This helps
improve the model's ability to generalize to unseen data by exposing it to variations in
the input images.
Fine-Tuning Pre-trained Models for Flower Recognition:
Fine-tuning involves taking a pre-trained neural network model (e.g., VGG or ResNet)
and adapting it to the specific task of flower recognition by further training it on a
dataset of flower images. This technique leverages the learned features of the
pre-trained model, accelerating training and potentially improving performance.
Implementing Transfer Learning in Flower Recognition Models:
Transfer learning involves transferring knowledge gained from training on one task
(e.g. image classification on a large dataset like ImageNet) to a related task (e.g.,
flower recognition). By initializing the model with pre-trained weights and fine-tuning
on the target dataset, transfer learning can lead to faster convergence and better
performance, especially when the target dataset is small.
Exploring Different CNN Architectures for Image Classification:
CNNs are widely used for image classification tasks like flower recognition.
Exploring different CNN architectures (e.g. AlexNet, Inception, MobileNet) allows
researchers and developers to find the most suitable model architecture for their
specific requirements in terms of accuracy, speed and resource efficiency.
Evaluating Model Performance Metrics such as Accuracy, Precision and Recall:
Model performance metrics such as accuracy, precision, recall and F1 score provide
insights into how well the flower recognition model is performing. Evaluating these
metrics helps identify areas for improvement and guides model refinement efforts.
Handling Class Imbalance in Flower Datasets:
Class imbalance occurs when certain classes of flowers are represented more
frequently than others in the dataset. Techniques such as class weighting, data
resampling (oversampling or undersampling), or generating synthetic samples can
help address class imbalance issues and improve model performance.
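A short sketch of the class-weighting option using scikit-learn together with Keras (the label counts are made up):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced made-up labels: class 0 is far more common than classes 1 and 2.
y_train = np.array([0] * 800 + [1] * 150 + [2] * 50)

classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))
print(class_weight)   # rarer classes receive proportionally larger weights

# In Keras the dictionary is passed straight to fit(), e.g.:
# model.fit(X_train, y_train, epochs=10, class_weight=class_weight)
```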
Optimizing Hyperparameters for Model Training:
Hyperparameters such as learning rate, batch size, and optimization algorithm settings
significantly impact the training process and final model performance.
Hyperparameter optimization techniques such as grid search or Bayesian
optimization help find the optimal combination of hyperparameters for the flower
recognition model.
Understanding Activation Functions in Neural Networks:
Activation functions introduce non-linearities into neural networks, enabling them to
learn complex patterns in data. Common activation functions include ReLU (Rectified
Linear Unit), sigmoid, and tanh. Understanding how different activation functions
work and their effects on model training helps in designing more effective neural
network architectures.
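A tiny NumPy sketch comparing the three activation functions on the same inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu   :", relu(x))       # clips negatives to 0, keeps positives linear
print("sigmoid:", sigmoid(x))    # squashes everything into (0, 1)
print("tanh   :", np.tanh(x))    # squashes into (-1, 1), centered at 0
```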
Exploring Different Loss Functions for Image Classification Tasks:
Loss functions quantify the difference between the predicted outputs of the model and
the ground truth labels during training. Common loss functions for image
classification tasks include categorical cross-entropy, binary cross-entropy, and mean
squared error. Choosing an appropriate loss function based on the nature of the task
and the output distribution is crucial for effective model training.
Interactive User Interface for Flower Identification:
Creating a user-friendly and intuitive interface that guides users through the flower
recognition process, providing feedback and suggestions for improved user
engagement.
Implementing Continuous Integration and Continuous Deployment (CI/CD)
Pipelines:
Setting up automated CI/CD pipelines to streamline the development, testing, and
deployment of the flower recognition system, ensuring rapid iteration and delivery
of updates.
Exploring Federated Learning for Collaborative Model Training:
Investigating federated learning techniques to train the flower recognition model
collaboratively across multiple devices or servers while preserving data privacy and
security.
Deploying Flower Recognition Models on Edge Devices:
Optimizing and deploying flower recognition models directly on edge devices such as
smartphones, cameras, or IoT devices to enable real-time inference and offline operation.
Implementing Multi-Modal Fusion for Improved Recognition Accuracy:
Integrating multiple modalities such as images, text descriptions and metadata to
enhance the accuracy and robustness of the flower recognition system, especially in
challenging scenarios.
Addressing Ethical and Bias Considerations in Flower Recognition:
Considering ethical implications and potential biases in the training data, algorithms
and deployment of the flower recognition system to ensure fairness and inclusivity.
Exploring Reinforcement Learning for Adaptive Model Behavior:
Investigating reinforcement learning techniques to enable the flower recognition
system to adapt and improve its performance over time based on user interactions and
feedback.
Implementing Real-Time Object Tracking for Dynamic Environments:
Enhancing the flower recognition system with real-time object tracking capabilities to
detect and track moving objects in dynamic environments such as gardens or parks.
Exploring Domain Adaptation Techniques for Cross-Domain Recognition:
Investigating domain adaptation techniques to transfer knowledge from one domain
(e.g., laboratory conditions) to another (e.g., outdoor environments) for improved
cross-domain recognition performance.
Integrating Voice-Assisted Flower Recognition for Accessibility:
Implementing voice-assisted flower recognition features that enable users to verbally
describe flowers or ask questions, improving accessibility for users with disabilities or
those who prefer voice interactions.
Participation in Google AI ML Internship Program
The statement implies that the speaker or their team participated in an internship
program offered by Google focused on AI and ML.
Badges as Milestones or Achievements:
Within the internship program, badges likely represent milestones or achievements
attained by the participants. These badges serve as recognition of completing specific
units or demonstrating proficiency in certain skills or concepts.
Six Badges Earned:
The statement indicates that the speaker or their team earned a total of six badges
throughout the duration of the internship program. This suggests active participation
and accomplishment within the various components of the program.
Badge Achievement for Each Unit:
For each unit or module completed within the program, the speaker or their team
earned a badge. This implies that there were multiple units or topics covered in the
program, and participants were recognized for their proficiency in each area.
Recognition of Knowledge and Skills:
Earning badges in the Google AI ML internship program signifies the speaker or their
team's acquisition of knowledge and skills in AI and ML. It demonstrates their
commitment to learning and proficiency in the subject matter.
PRACTICAL APPLICATIONS AND CASE STUDIES
Healthcare
Examples of AI and ML applications in healthcare, including disease diagnosis,
personalized treatment plans, medical imaging analysis, and drug discovery.
Finance
Case studies demonstrating the use of AI and ML in finance, such as fraud detection, risk
assessment, algorithmic trading, and customer service chatbots.
Autonomous Vehicles
Overview of how AI and ML technologies enable autonomous vehicles to perceive their
environment, make decisions, and navigate safely.
Ethical Considerations and Future Trends
Ethical Implications of AI and ML
Discussion of ethical challenges such as bias in algorithms, privacy concerns, job
displacement, and the need for transparency and accountability in AI systems.
Future Trends in AI and ML
Exploration of emerging trends and advancements in AI and ML, including explainable AI
(XAI), federated learning, AI-driven creativity, and AI ethics frameworks.
Conclusion
Recap of the importance of AI and ML in various domains, along with a call to address
ethical concerns and embrace responsible AI development.
CONCLUSION
REFERENCES