Ai Part B - Unit 5 (CV) Class 10

Computer vision is a field of artificial intelligence that enables systems to process and analyze visual data similarly to humans. Key applications include facial recognition, self-driving cars, and medical imaging, with various tasks such as classification, object detection, and segmentation. The document also explains fundamental concepts like pixels, resolution, grayscale and RGB images, convolution, and image features.

Artificial Intelligence - Notes

PART B
Unit-5: Computer Vision

Computer vision is the process of extraction of information from images, text, videos, etc.
It is a system that can process, analyze and make sense of visual data in the same way as
humans do.

Computer Vision and Artificial Intelligence


Computer vision is a field of artificial intelligence (AI).
AI enables computers to think, and computer vision enables AI to see, observe and make
sense of visual data (like images & videos).

Applications of Computer Vision


The concept of computer vision was first introduced in the 1970s. Some of its common
applications today are:
1. Facial Recognition
2. Face Filters
3. Google's Search by Image
4. Computer Vision in Retail
5. Self-Driving Cars
6. Medical Imaging
7. Google Translate App
Computer Vision Tasks
The various applications of Computer Vision are based on a certain number of tasks that are
performed to get certain information from the input image, which can be used directly for
prediction or can form the base for further analysis. The tasks used in a computer vision
application are:

Classification
The image classification problem is the task of assigning an input image one label from a
fixed set of categories.
Classification + Localisation
This is the task that involves both processes of identifying what object is present in the
image and at the same time identifying at what location that object is present in that image.
It is used only for single objects.
Object Detection
Object detection is the process of finding instances of real-world objects such as faces,
bicycles, and buildings in images or videos. Object detection algorithms typically use
extracted features and learning algorithms to recognize instances of an object category. It is
commonly used in applications such as image retrieval and automated vehicle parking
systems.
Instance Segmentation
Instance segmentation is the process of detecting instances of the objects, giving them a
category, and then giving each pixel a label based on that. A segmentation algorithm takes
an image as input and outputs a collection of regions (or segments).
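The four tasks differ mainly in the shape of their output. The sketch below illustrates this with plain Python data; every label, coordinate and grid value is made up for illustration, and the helper name pixels_per_instance is hypothetical.

```python
# All labels, boxes and grids below are invented examples, not real model output.

# Classification: one label for the whole image.
classification = "cat"

# Classification + localisation: one label plus one box (x, y, width, height).
localisation = {"label": "cat", "box": (40, 30, 120, 90)}

# Object detection: one (label, box) entry per object found in the image.
detection = [
    {"label": "cat", "box": (40, 30, 120, 90)},
    {"label": "dog", "box": (200, 50, 100, 110)},
]

# Instance segmentation: a label for every pixel.
# 0 means background; each positive number marks one object instance.
segmentation = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]


def pixels_per_instance(mask):
    """Count how many pixels belong to each object instance in the mask."""
    counts = {}
    for row in mask:
        for instance_id in row:
            if instance_id != 0:
                counts[instance_id] = counts.get(instance_id, 0) + 1
    return counts


print(pixels_per_instance(segmentation))  # {1: 4, 2: 3}
```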
Pixels
The word "pixel" means a picture element. Every photograph, in digital form, is made up of
pixels. They are the smallest units of information that make up a picture. Usually round or
square, they are typically arranged in a 2-dimensional grid.
In the image below, one portion has been magnified many times over so that you can see its
composition in pixels. As you can see, the pixels approximate the actual image. The more
pixels you have, the more closely the image resembles the original.

Resolution
The number of pixels in an image is sometimes called the resolution. One convention is to
express it as the width by the height, for example a monitor resolution of 1280x1024, which
means 1280 pixels from side to side and 1024 pixels from top to bottom.
Another convention is to express the number of pixels as a single number, like a 5 megapixel
camera (a megapixel is a million pixels). This means the pixels along the width multiplied by
the pixels along the height of the image taken by the camera equals 5 million pixels. In the
case of our 1280x1024 monitors, it could also be expressed as 1280 x 1024 = 1,310,720, or
1.31 megapixels.
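The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name megapixels is chosen here for illustration.

```python
def megapixels(width, height):
    """Return the total pixel count and the same figure in megapixels."""
    total = width * height
    return total, total / 1_000_000


total, mp = megapixels(1280, 1024)
print(total)         # 1310720
print(round(mp, 2))  # 1.31
```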

Pixel value
Each pixel has a pixel value that describes how bright that pixel is, and/or what colour it
should be. The most common pixel format is the byte image, where this number is stored as an
8-bit integer giving a range of possible values from 0 to 255. Typically, zero is taken to be
no colour or black and 255 is taken to be full colour or white.
Each pixel uses 1 byte of the image, which is equivalent to 8 bits of data. Since each bit
can have two possible values, the 8 bits together can represent 2^8 = 256 possible values,
starting at 0 and ending at 255.
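The range of a byte image follows directly from the number of bits. The sketch below works it out; clamp_to_byte is a hypothetical helper added here to show how a value outside the range could be forced back into it.

```python
BITS_PER_PIXEL = 8

# Each bit has 2 possible states, so 8 bits give 2 ** 8 combinations.
num_values = 2 ** BITS_PER_PIXEL  # 256 distinct values
min_value = 0
max_value = num_values - 1        # 255, because counting starts at 0


def clamp_to_byte(value):
    """Force a number into the valid pixel range 0..255."""
    return max(min_value, min(max_value, int(value)))


print(num_values, min_value, max_value)      # 256 0 255
print(clamp_to_byte(300), clamp_to_byte(-5))  # 255 0
```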
Grayscale Images
Grayscale images are images that have a range of shades of gray without apparent colour.
The darkest possible shade is black, which is the total absence of colour or zero value of
pixel. The lightest possible shade is white, which is the total presence of colour or 255 value
of a pixel. Intermediate shades of gray are represented by equal brightness levels of the
three primary colours.
A grayscale image has each pixel of size 1 byte and consists of a single plane, that is, a 2D
array of pixels. The size of a grayscale image is defined as the Height x Width of that image.
Let us look at an image to understand grayscale images.
Here is an example of a grayscale image. As you can see, the value of each pixel is within the
range of 0-255. Computers store the images we see in the form of these numbers.
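A grayscale image can be pictured as a single 2D grid of numbers. The sketch below uses a tiny made-up 3 x 4 image (the values are invented for illustration) stored as a list of rows.

```python
# A hypothetical grayscale image: 3 rows (height) by 4 columns (width).
# 0 is black, 255 is white, values in between are shades of gray.
gray = [
    [0,   64,  128, 255],
    [32,  96,  160, 224],
    [255, 128, 64,  0],
]

height = len(gray)
width = len(gray[0])

# One byte per pixel, so the size in bytes is Height x Width.
size_in_bytes = height * width

print(height, width, size_in_bytes)  # 3 4 12
print(gray[0][3])                    # 255, the white pixel at row 0, column 3
```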
RGB Images
All the images that we see around us are coloured images. These images are made up of
three primary colours: Red, Green, and Blue.
All the colours that are present can be made by combining different intensities of red, green,
and blue.

Every RGB image is stored in the form of three different channels called the R channel, G
channel, and the B channel.
Each plane separately has many pixels, with each pixel value varying from 0 to 255. All the
three planes when combined form a colour image. This means that in an RGB image, each
pixel has a set of three different values which together give colour to that particular pixel.
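The idea of three planes combining into one colour image can be sketched as follows. The 2 x 2 channel values are made up for illustration, and combine_channels is a hypothetical helper name.

```python
# Three hypothetical 2 x 2 planes, one per channel, each value in 0..255.
r_channel = [[255, 0], [0, 255]]
g_channel = [[0, 255], [0, 255]]
b_channel = [[0, 0], [255, 255]]


def combine_channels(r, g, b):
    """Stack three planes so that each pixel becomes an (R, G, B) triple."""
    height, width = len(r), len(r[0])
    return [
        [(r[y][x], g[y][x], b[y][x]) for x in range(width)]
        for y in range(height)
    ]


rgb = combine_channels(r_channel, g_channel, b_channel)

print(rgb[0][0])  # (255, 0, 0)      pure red
print(rgb[0][1])  # (0, 255, 0)      pure green
print(rgb[1][0])  # (0, 0, 255)      pure blue
print(rgb[1][1])  # (255, 255, 255)  white, all three at full intensity
```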
Convolution
The process of changing pixel values is the base of image editing.
Convolution is a simple mathematical operation that is fundamental to many common image
processing operators. Convolution provides a way of multiplying together two arrays of
numbers, generally of different sizes, but of the same dimensionality, to produce a third
array of numbers of the same dimensionality.
An (image) convolution is simply an element-wise multiplication of the image array with
another array called the kernel, followed by a sum.

I = Image Array
K = Kernel Array
I * K = Resulting array after performing the convolution operator
Note: The Kernel is passed over the whole image to get the resulting array after convolution.
What is a Kernel?
A Kernel is a matrix, which is slid across the image and multiplied with the input such that
the output is enhanced in a certain desirable manner. Different kernels hold different values
for the different kinds of effects that we want to apply to an image.
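The sliding, multiplying and summing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: no padding at the borders (so the result is smaller than the image), a step of one pixel, and no flipping of the kernel. Strictly, mathematical convolution flips the kernel first; for symmetric kernels such as the two used here the result is the same. The function name convolve2d and the sample values are chosen for this example.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; at each position multiply the
    overlapping values element-wise and add them up."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    result = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[y + i][x + j] * kernel[i][j]
            row.append(total)
        result.append(row)
    return result


# A made-up 4 x 4 image array I.
image = [
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9,  10, 11, 12],
    [13, 14, 15, 16],
]

# Kernel that keeps only the centre pixel, so the image is unchanged.
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

# Kernel of ones, which adds up each 3 x 3 neighbourhood.
sum_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(convolve2d(image, identity_kernel))  # [[6, 7], [10, 11]]
print(convolve2d(image, sum_kernel))       # [[54, 63], [90, 99]]
```

Changing only the kernel values changes the effect, which is why different kernels are used for different kinds of edits.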
Image Features:
In computer vision and image processing, a feature is a piece of information that is relevant
for solving the computational task related to a certain application. Features may be specific
structures in the image such as points, edges, or objects.
