Unit 5 Computer Vision

Unit 5 focuses on Computer Vision, a branch of artificial intelligence that allows machines to interpret visual information. It covers methodologies, applications, and essential techniques in image processing, enabling students to apply these skills to real-world problems. Key concepts include object detection, feature extraction, and the differences between computer vision and image processing.

Unit 5: Computer Vision

Title: Computer Vision Approach: Practical Implementation


Summary: Computer Vision is a branch of artificial intelligence that enables machines to
interpret and understand visual information from the real world. This unit provides an in-depth
exploration of various methodologies and applications in computer vision, equipping students
with the skills necessary to analyse and process visual data.
Objectives:
1. To introduce students to the basic principles and techniques of computer vision.
2. To familiarize students with common algorithms and tools used in image
processing and analysis.
3. To enable students to apply computer vision techniques to solve real-world
problems.
4. To foster critical thinking and problem-solving skills in the domain of computer
vision.
Learning Outcomes:
1. Understand the fundamental concepts and theories underlying computer vision.
2. Implement basic and advanced image processing techniques using
programming languages such as Python.
3. Apply computer vision techniques to tasks such as object detection, image
segmentation, and feature extraction.
4. Develop ideas to solve real-world problems leveraging computer vision technologies.
Pre-requisites: Essential understanding of Artificial Intelligence
Key-concepts: Image processing, feature extraction, object detection & recognition.

5.1: Introduction
In the previous chapter, you studied the concepts of Artificial Intelligence for Data Sciences. It
is a concept to unify statistics, data analysis, machine learning and their related methods to
understand and analyse actual phenomena with data.

As we all know, artificial intelligence is a technique that enables computers to mimic human
intelligence. As humans, we can see things, analyse them and then do the required action
based on what we see.

But can machines do the same? Can machines have the eyes that humans have? If you
answered yes, then you are right. The Computer Vision domain of Artificial Intelligence enables
machines to use algorithms and methods to analyse actual phenomena with images.

Now before we get into the concepts of Computer Vision, let us experience this domain with
the help of the following game:
* Emoji Scavenger Hunt: https://emojiscavengerhunt.withgoogle.com/

Go to the link and try to play the game Emoji Scavenger Hunt. The challenge here is to find 8 items
within the time limit to pass. Did you manage to win?

What was the strategy that you applied to win this game?

Was the computer able to identify all the items you brought in front of it?

Did the lighting of the room affect the identifying of items by the machine?

A Quick Overview of Computer Vision!


Computer vision is the process of extracting information from images, text, videos, etc.
A computer vision system can process, analyse and make sense of visual data in the same way as
humans do.
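[Figure: Human Vision System. An object (an elephant) is seen by the eye, which acts as the
sensing device, and understood by the brain, which acts as the interpreting device.]
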
Computer Vision and Artificial Intelligence
Computer vision is a field of artificial intelligence (AI).
AI enables computers to think, and computer vision enables AI to see, observe and make sense
of visual data (like images & videos).

[Figure: Artificial Intelligence, shown together with Computer Vision, Deep Learning and
Machine Learning.]

Computer Vision Vs. Image Processing

Computer Vision:
* Computer vision deals with extracting information from the input images or videos to infer
  meaningful information and understand them in order to predict the visual input.
* Computer Vision is a superset of Image Processing.
* Examples - Object detection, Handwriting recognition, etc.

Image Processing:
* Image processing is mainly focused on processing the raw input images to enhance them or
  to prepare them for other tasks.
* Image Processing is a subset of Computer Vision.
* Examples - Rescaling image, Correcting brightness, Changing tones, etc.
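
The difference can be sketched in a few lines of Python. This is only an illustration: the
tiny image, the function names brighten and is_mostly_dark, and the threshold of 128 are
assumptions made for this example and do not come from any library. The image processing
function returns another image, while the (very simplified) computer vision function returns
information about the image.

```python
# A tiny grayscale image: each number is a pixel value from 0 (black) to 255 (white).
image = [
    [10, 40, 200],
    [30, 20, 90],
]

def brighten(img, amount):
    """Image processing: return a new, brighter image (values capped at 255)."""
    return [[min(255, pixel + amount) for pixel in row] for row in img]

def is_mostly_dark(img, threshold=128):
    """Computer vision (very simplified): return information about the image."""
    pixels = [pixel for row in img for pixel in row]
    dark = sum(1 for pixel in pixels if pixel < threshold)
    return dark > len(pixels) / 2

brighter = brighten(image, 100)
print(brighter)                  # the output is still an image
print(is_mostly_dark(image))     # the output is a statement about the image
print(is_mostly_dark(brighter))
```

Real computer vision systems infer far richer information than a dark/bright decision, but
the input/output contrast is the same.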

5.1 Applications of Computer Vision


The concept of computer vision was first introduced in the 1970s, and its new applications
excited everyone. Since then, computer vision technology has advanced enough to make these
applications easily available to everyone today. In recent years in particular, the world has
witnessed a significant leap in technology that has put computer vision on the priority list
of many industries. Let us look at some of them:

Facial Recognition: With the advent of smart cities and smart homes, Computer Vision plays a
vital role in making the home smarter. Security, the most important application, involves the
use of Computer Vision for facial recognition, either to recognise guests or to maintain a
log of visitors. It is also used in schools for attendance systems based on facial
recognition of students.

Face Filters: Modern-day apps like Instagram and Snapchat have many features based on
computer vision, and face filters are one of them. Through the camera, the algorithm
identifies the facial dynamics of the person and applies the selected face filter.

Google's Search by Image: Most searches on Google's search engine use textual data, but it
also has an interesting feature for getting search results through an image. This uses
Computer Vision: it analyses various features of the input image, compares them to the
database of images and gives us the search result.

NOTE DOWN THE PICTURES GIVEN ALONG WITH THE APPLICATION.

Computer Vision in Retail: The retail field has been one of the fastest-growing fields and is
using Computer Vision to make the user experience more fruitful. Retailers can use Computer
Vision techniques to track customers' movements through stores, analyse navigational routes
and detect walking patterns.

Inventory Management is another such application. Through security camera image analysis, a
Computer Vision algorithm can generate a very accurate estimate of the items available in the
store. It can also analyse the use of shelf space to identify suboptimal configurations and
suggest better item placement.

Self-Driving Cars: Computer Vision is the fundamental technology behind the development of
autonomous vehicles. Most leading car manufacturers in the world are reaping the benefits of
investing in artificial intelligence to develop on-road versions of hands-free technology.
This involves identifying objects, working out navigational routes and, at the same time,
monitoring the environment.

Medical Imaging: For the last few decades, computer-supported medical imaging applications
have been a trustworthy help for physicians. They not only create and analyse images, but
also act as an assistant and help doctors with their interpretation. Such applications are
used to read and convert 2D scan images into interactive 3D models that enable medical
professionals to gain a detailed understanding of a patient's health condition.

Google Translate App: All you need to do to read signs in a foreign language is to point your
phone's camera at the words and let the Google Translate app tell you what they mean in your
preferred language almost instantly. By using optical character recognition to see the image
and augmented reality to overlay an accurate translation, this is a convenient tool that uses
Computer Vision.

5.2 Computer Vision Tasks

The various applications of Computer Vision are based on a certain number of tasks that are
performed to get certain information from the input image. This information can be used
directly for prediction or can form the base for further analysis. The tasks used in a
computer vision application are:

[Figure: the Computer Vision tasks, grouped by the objects they handle.]

STUDY THESE DEFINITIONS

Classification

The image classification problem is the task of assigning an input image one label from a
fixed set of categories. This is one of the core problems in CV that, despite its simplicity,
has a large variety of practical applications.

Classification + Localisation

This is the task that involves both processes of identifying what object is present in the image
and at the same time identifying at what location that object is present in that image. It is
used only for single objects.
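
The "where" part of this task (localisation) can be sketched as finding a bounding box around
a single object. The example below is a simplified stand-in, not how real systems work: it
assumes one bright object on a dark background and uses a fixed brightness threshold, whereas
practical systems use trained models. The image, the function name locate_object and the
threshold of 128 are assumptions made for illustration.

```python
# Hypothetical 5x6 grayscale image with one bright object on a dark background.
image = [
    [0, 0,   0,   0,   0, 0],
    [0, 0, 220, 230,   0, 0],
    [0, 0, 210, 240, 250, 0],
    [0, 0,   0, 235,   0, 0],
    [0, 0,   0,   0,   0, 0],
]

def locate_object(img, threshold=128):
    """Return the bounding box (top, left, bottom, right) of all pixels brighter
    than the threshold, or None when no pixel is bright enough."""
    coords = [(r, c)
              for r, row in enumerate(img)
              for c, value in enumerate(row)
              if value > threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

box = locate_object(image)
print(box)  # (1, 2, 3, 4): rows 1 to 3, columns 2 to 4
```

A single box only makes sense when there is one object in the image, which matches the
single-object restriction of this task.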

Object Detection

Object detection is the process of finding instances of real-world objects such as faces,
bicycles, and buildings in images or videos. Object detection algorithms typically use
extracted features and learning algorithms to recognize instances of an object category. It is
commonly used in applications such as image retrieval and automated vehicle parking
systems.

Instance Segmentation

Instance Segmentation is the process of detecting instances of the objects, giving them a
category, and then giving each pixel a label based on that. A segmentation algorithm takes an
image as input and outputs a collection of regions (or segments).
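
The idea of giving each pixel a label and producing a collection of regions can be sketched
with connected-component labelling. This is a much simpler technique than the learned models
used for instance segmentation, and it does not assign categories; it only groups touching
object pixels into numbered regions. The mask and the function name label_regions are
assumptions made for illustration.

```python
from collections import deque

# Hypothetical binary mask: 1 marks object pixels, 0 marks background.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]

def label_regions(mask):
    """Give every object pixel a region number (1, 2, ...); background stays 0.
    Pixels belong to the same region if they touch horizontally or vertically."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 1 and labels[r][c] == 0:
                # Found an unlabelled object pixel: start a new region and
                # spread its label to every connected object pixel.
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

labels, count = label_regions(mask)
for row in labels:
    print(row)
print(count)  # 3 separate regions
```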

IDENTIFY THESE PICTURES FOR EACH TASK


Basics of Images

We all see a lot of images around us and use them daily, either through our mobile phones or
computer systems. But do we ask ourselves some basic questions while we use them on a
regular basis?

Don't know the answer yet? Don't worry, in this section, we will study the basics of an image:

Basics of Pixels

The word "pixel" means a picture element. Every photograph, in digital form, is made
up of pixels. They are the smallest unit of information that make up a picture. Usually
round or square, they are typically arranged in a 2-dimensional grid.

In the image below, one portion has been magnified many times over so that you can see
its composition in pixels. As you can see, the pixels approximate the actual image. The
more pixels you have, the more closely the image resembles the original.
Resolution
The number of pixels in an image is sometimes called the resolution. When the term is used to
describe pixel count, one convention is to express resolution as the width by the height, for
example, a monitor resolution of 1280x1024. This means there are 1280 pixels from one side
to the other, and 1024 from top to bottom.

Another convention is to express the number of pixels as a single number, like a 5 megapixel
camera (a megapixel is a million pixels). This means the pixels along the width multiplied by
the pixels along the height of the image taken by the camera equals 5 million pixels. In the
case of our 1280x1024 monitor, it could also be expressed as 1280 x 1024 = 1,310,720, or 1.31
megapixels.
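
The same arithmetic can be checked in Python. The helper name megapixels is an assumption
made for this sketch.

```python
def megapixels(width, height):
    """Return the pixel count of a width x height image in megapixels."""
    return width * height / 1_000_000

total = 1280 * 1024
print(total)                             # 1310720 pixels
print(round(megapixels(1280, 1024), 2))  # 1.31 megapixels
```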

Pixel value

Each of the pixels that represent an image stored inside a computer has a pixel value that
describes how bright that pixel is, and/or what colour it should be. The most common pixel
format is the byte image, where this number is stored as an 8-bit integer giving a range of
possible values from 0 to 255. Typically, zero is taken to be no colour or black and 255 is
taken to be full colour or white.

Why do we have a value of 255?
In computer systems, data is in the form of ones and zeros, which we call the binary system.
Each bit in a computer system can be either a zero or a one. Each pixel uses 1 byte of an
image, which is equivalent to 8 bits of data. Since each bit can have two possible values,
the 8 bits can have 2^8 = 256 possible values, which start at 0 and end at 255.
Here ^ represents the exponent (2 raised to the power 8).
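
A short Python check of this count (the variable names are chosen for this sketch): eight
bits give 256 different combinations, and because counting starts at 0, the largest value
is 255.

```python
bits_per_pixel = 8

# Each extra bit doubles the number of combinations: 2 * 2 * ... * 2 (8 times).
possible_values = 2 ** bits_per_pixel
print(possible_values)      # 256 different values
print(possible_values - 1)  # 255, the largest value, since counting starts at 0

# The same limits written as 8-bit binary numbers.
print(format(0, "08b"))     # 00000000
print(format(255, "08b"))   # 11111111
```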

Grayscale Images
Grayscale images are images that have a range of shades of gray without apparent colour. The
darkest possible shade is black, which is the total absence of colour or zero value of pixel. The
lightest possible shade is white, which is the total presence of colour or 255 value of a
pixel. Intermediate shades of gray are represented by equal brightness levels of the three
primary colours.

A grayscale image stores each pixel in 1 byte and has a single plane, which is a 2D array of
pixels. The size of a grayscale image is defined as the Height x Width of that image.

Let us look at an image to understand grayscale images.


Here is an example of a grayscale image. As you can check, the value of each pixel is within
the range of 0-255. Computers store the images we see in the form of these numbers.
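
As a minimal sketch of how such an image looks as numbers, a grayscale image can be written
in Python as a 2D list with one value per pixel. The pixel values below are made up for
illustration.

```python
# Hypothetical grayscale image: 3 rows (height) and 4 columns (width).
gray = [
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [255, 128,  64,   0],
]

height = len(gray)
width = len(gray[0])
size_in_bytes = height * width  # one byte per pixel, single plane

# Every pixel value must fit in the 0-255 range of an 8-bit integer.
all_valid = all(0 <= pixel <= 255 for row in gray for pixel in row)

print(height, width, size_in_bytes)
print(gray[0][3])  # pixel at row 0, column 3: 255, i.e. white
print(all_valid)
```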

RGB Images

All the images that we see around us are coloured images. These images are
made up of three primary colours Red, Green, and Blue.
All the colours that are present can be made by combining different
intensities of red, green, and blue.

Let us experience!

Go to this online link: https://www.w3schools.com/colors/colors_rgb.asp. On the basis of
this online tool, try and answer all the questions below.

1) What is the output colour when you put R=G=B=255?

2) What is the output colour when you put R=G=B=0?

3) How does the colour vary when you put either of the three as 0 and then keep on varying the
other two?
4) How does the output colour change when all the three colours are varied in the same proportion?

5) What is the RGB value of your favourite colour from the colour palette?

Were you able to answer all the questions? If yes, then you would have understood how
every colour we see around is made.

Now the question arises, how do computers store RGB images? Every RGB image is stored
in the form of three different channels called the R channel, G channel, and the B channel.

Each plane separately has many pixels with each pixel value varying from 0 to 255. All the
three planes when combined form a colour image. This means that in an RGB image, each
pixel has a set of three different values which together give colour to that particular pixel.

For Example,

As you can see, each colour image is stored in the form of three different channels, each
having different intensity. All three channels combine to form a colour we see.

In the above given image, if we split the image into three different channels, namely Red (R),
Green (G) and Blue (B), the individual layers will have the following intensity of colours of
the individual pixels. These individual layers, when stored in the memory, look like the image
on the extreme right. The layers look like grayscale images because each pixel has an
intensity value from 0 to 255 and, as studied earlier, 0 is considered as black or no
presence of colour and 255 means white or full presence of colour. These three individual RGB
values when combined form the colour of each pixel.

Therefore, each pixel in the RGB image has three values to form the complete colour.
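
The three-channel idea can be sketched in Python with a tiny made-up image in which every
pixel is a tuple of (R, G, B) values. The function names split_channels and merge_channels
are assumptions made for this example, not part of any library.

```python
# Hypothetical 2x2 RGB image: each pixel holds three values from 0 to 255.
rgb = [
    [(255, 0, 0), (0, 255, 0)],      # a red pixel and a green pixel
    [(0, 0, 255), (128, 128, 128)],  # a blue pixel and a gray pixel
]

def split_channels(img):
    """Return three single-plane images: the R, G and B channels."""
    red = [[pixel[0] for pixel in row] for row in img]
    green = [[pixel[1] for pixel in row] for row in img]
    blue = [[pixel[2] for pixel in row] for row in img]
    return red, green, blue

def merge_channels(red, green, blue):
    """Combine three single-plane channels back into one RGB image."""
    return [
        [(r, g, b) for r, g, b in zip(red_row, green_row, blue_row)]
        for red_row, green_row, blue_row in zip(red, green, blue)
    ]

red, green, blue = split_channels(rgb)
print(red)    # each channel alone is just a grid of 0-255 values
print(green)
print(blue)
print(merge_channels(red, green, blue) == rgb)
```

Each channel on its own is a grid of single values, which is why it can be displayed as a
grayscale image. The last pixel has equal R, G and B values, so it is a shade of gray.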
Task:
Go to the following link www.piskelapp.com and create your pixel art. Try and make a GIF
using the online app for your pixel art.

5.3 No-Code AI Tools:


Introduction to Lobe
Lobe.ai is an Auto-ML tool, which means that it is a no-code AI tool.
It works with image classification: you give it a set of images with labels and it will
automatically find the most suitable model to classify the images.

Introduction to Teachable Machine


Teachable Machine is an AI, Machine Learning, and Deep Learning tool that was
developed by Google in 2017.
It runs on top of TensorFlow.js, which was also developed by the same company.
It is a web-based tool that allows training of a model based on different images, audio,
or poses given as input through webcam or pictures.

Activity Time: Build a Smart Sorter


Purpose: To use computer vision (CV) technology to automate and enhance sorting processes.
