
Computer Vision Notes

Computer Vision is a domain of Artificial Intelligence that enables machines to interpret and analyze visual data through various applications such as facial recognition, self-driving cars, and medical imaging. It involves tasks like image classification, object detection, and instance segmentation, which help in extracting meaningful information from images. The technology is increasingly utilized in industries like retail and agriculture to enhance efficiency and decision-making.


UNIT 6

COMPUTER VISION

Q1. Define Computer Vision


Computer Vision is the domain of Artificial Intelligence that enables machines to "see" through images or other visual data, and to process and analyse that data using algorithms and methods so that real-world phenomena can be understood from images.

Q2. What is the difference between computer vision and Image processing?


Image processing transforms one image into another: the input is an image and the output is an enhanced or modified image (for example rescaling, cropping, or correcting brightness). Computer vision goes further: it takes an image or video as input and extracts meaning from it, such as recognising objects, faces, or handwriting. Image processing is often a preprocessing step within a computer vision system.

Q3. Explain in detail the application of Computer Vision.


Facial Recognition
With the advent of smart cities and smart homes, Computer Vision plays a vital role in making the home smarter. Security, the most important such application, involves the use of Computer Vision for facial recognition. This can be guest recognition or maintaining a log of visitors. It also finds application in schools for attendance systems based on facial recognition of students.
Face Filters
Modern-day apps like Instagram and Snapchat have many features based on computer vision, and face filters are one of them. Through the camera, the algorithm identifies the facial dynamics of the person and applies the selected facial filter.
Google’s Search by Image
Most searches on Google's search engine are made with textual data, but it also has an interesting feature of getting search results through an image. This uses Computer Vision: it compares different features of the input image to a database of images and gives us the search result while analysing various features of the image.
Computer Vision in Retail
One of the industries with the quickest growth is retail, which is also utilising computer vision to
improve the user experience. Retailers can analyse navigational routes, find walking patterns, and
track customer movements through stores using computer vision techniques.
Self-Driving Cars
Computer Vision is the fundamental technology behind developing autonomous vehicles. Most leading car manufacturers in the world are reaping the benefits of investing in artificial intelligence for developing on-road versions of hands-free technology. This involves identifying objects, plotting navigational routes, and monitoring the environment, all at the same time.
Medical Imaging
Computer-supported medical imaging software has been a reliable resource for doctors over the past few decades. It doesn't just produce and analyse images; it also acts as a doctor's assistant, aiding interpretation. The software is used to interpret 2D scans and transform them into interactive 3D models that give medical professionals a thorough insight into a patient's health.
Google Translate App
To read signs written in a foreign language, all you have to do is point your phone's camera at the text, and the Google Translate app will almost immediately translate it into the language of your choice. Using optical character recognition to read the image and augmented reality to overlay an accurate translation, this is a convenient tool built on Computer Vision.

Q4. How does Computer Vision help in Inventory Management?


In Inventory Management, a Computer Vision algorithm analysing security camera images can generate a very accurate estimate of the items available in the store. It can also analyse the use of shelf space to identify suboptimal configurations and suggest better item placement.

Q5. Explain the different tasks used in a computer vision application.


The various applications of Computer Vision are based on a number of tasks which are performed to extract certain information from the input image, which can be used directly for prediction or form the base for further analysis. The tasks used in a computer vision application are:

Image Classification: This is the task of assigning an input image one label from a fixed set of categories. It is one of the core problems in Computer Vision that, despite its simplicity, has a large variety of practical applications.
Classification + Localisation: This task involves both identifying what object is present in the image and, at the same time, identifying where in the image that object is located. It is used only for single objects.
Object Detection: Object detection is the process of finding instances of real-world objects such as
faces, bicycles, and buildings in images or videos. Object detection algorithms typically use extracted
features and learning algorithms to recognize instances of an object category. It is commonly used
in applications such as image retrieval and automated vehicle parking systems.
Instance Segmentation: Instance Segmentation is the process of detecting instances of the objects,
giving them a category and then giving each pixel a label based on that. A segmentation algorithm
takes an image as input and outputs a collection of regions (or segments).
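The four tasks above differ mainly in what they output for a given image. A minimal Python sketch of those output structures (all labels, box coordinates, and mask values here are made-up example values, not from any real model):

```python
# Illustrative only: the kind of output each computer vision task produces
# for one input image. Values are invented for the example.

# 1. Image classification: one label for the whole image.
classification = "dog"

# 2. Classification + localisation: one label plus one bounding box
#    (x, y, width, height) for a single object.
localisation = {"label": "dog", "box": (40, 30, 120, 90)}

# 3. Object detection: a label and a box for EVERY object found.
detection = [
    {"label": "dog", "box": (40, 30, 120, 90)},
    {"label": "bicycle", "box": (200, 50, 80, 60)},
]

# 4. Instance segmentation: a label plus a per-pixel mask per object
#    (here a tiny 3x3 mask; 1 = pixel belongs to the object).
segmentation = [
    {"label": "dog", "mask": [[0, 1, 1], [1, 1, 1], [0, 1, 0]]},
]

print(len(detection))  # number of objects detected -> 2
```

Reading the structures top to bottom shows the increasing complexity: one label, then a label with a location, then many labelled locations, then many labelled pixel masks.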

Q6. Explain the term


a. Pixels
The word "pixel" means a picture element. Every photograph, in digital form, is made up of pixels. They are the smallest unit of information that makes up a picture. Usually round or square, they are typically arranged in a 2-dimensional grid. The more pixels an image has, the more closely it resembles the original.
b. Pixel value
Each of the pixels that make up an image stored on a computer has a pixel value that specifies its brightness and/or intended colour. The most common is the byte image, which stores this number as an 8-bit integer with a possible range of values from 0 to 255. 0 typically represents no colour (black), and 255 represents full colour (white).
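A small sketch of this idea in Python: an image can be held as a 2D list of pixel values in the 0-255 range (the values below are arbitrary, chosen only for illustration):

```python
# A tiny 4x4 grayscale image as a 2D list of pixel values (0-255).
# 0 = black, 255 = white, values in between are shades of grey.
image = [
    [0,   64, 128, 255],
    [32,  96, 160, 224],
    [0,    0, 255, 255],
    [128, 128, 128, 128],
]

brightest = max(max(row) for row in image)
darkest = min(min(row) for row in image)
print(brightest, darkest)  # 255 0
```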

c. Resolution
The resolution of an image is often expressed as the number of pixels. One convention states the width by the height, for example a monitor resolution of 1280×1024. Accordingly, there are 1280 pixels from side to side and 1024 pixels from top to bottom.
Another convention is to express the number of pixels as a single number, such as a 5 megapixel camera (a megapixel is a million pixels). This means the pixels along the width multiplied by the pixels along the height of the image taken by the camera equals 5 million. In the case of the 1280×1024 monitor, this works out to 1280 × 1024 = 1,310,720 pixels, or about 1.31 megapixels.
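The megapixel arithmetic above can be checked in one short Python snippet:

```python
# Resolution as a single number: total pixels = width x height.
width, height = 1280, 1024
total_pixels = width * height
megapixels = total_pixels / 1_000_000  # a megapixel is a million pixels
print(total_pixels, megapixels)  # 1310720 1.31072
```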
d. Grayscale Images
Grayscale images are images which have a range of shades of grey without apparent colour. The darkest possible shade is black, the total absence of colour, with a pixel value of 0. The lightest possible shade is white, the total presence of colour, with a pixel value of 255. Intermediate shades of grey are represented by equal brightness levels of the three primary colours. A grayscale image has a single plane, a 2D array of pixels, with each pixel taking 1 byte, so the size of a grayscale image is defined as the Height x Width of that image (in bytes).
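Since each grayscale pixel occupies one byte, the storage size follows directly from the dimensions; a quick worked example (the 800×600 dimensions are arbitrary):

```python
# Grayscale image size: one byte per pixel, so size = height x width bytes.
height, width = 600, 800
size_bytes = height * width     # 1 byte per pixel
size_kb = size_bytes / 1024     # 1 KB = 1024 bytes
print(size_bytes, size_kb)  # 480000 468.75
```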
e. RGB Images
Most images we encounter are coloured images. Three main colours, Red, Green, and Blue, make up these images. Red, green, and blue can be combined in various intensities to create all the colours that are visible.
f. OpenCV
• OpenCV, or the Open-Source Computer Vision Library, is a tool that helps a computer extract features from images. It is used for all kinds of image and video processing and analysis.
• It is capable of processing images and videos to identify objects, faces, or even handwriting.
• OpenCV is also used for basic image processing operations on images such as resizing, cropping and many more.
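As a rough illustration of what operations like cropping and resizing do, cropping is just taking a sub-rectangle of the pixel grid, and a naive resize can be sketched with nearest-neighbour sampling. This is a pure-Python sketch of the idea, not OpenCV itself; in real code you would call OpenCV's cv2.resize on a NumPy array and slice the array to crop:

```python
def crop(image, top, left, height, width):
    """Cropping: take a sub-rectangle of the pixel grid."""
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize: map each output pixel to the
    closest source pixel."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

# A 4x4 test image whose pixel values encode their (row, column).
img = [[r * 10 + c for c in range(4)] for r in range(4)]
small = resize_nearest(img, 2, 2)   # shrink 4x4 -> 2x2
patch = crop(img, 1, 1, 2, 2)       # 2x2 window starting at row 1, col 1
print(small)  # [[0, 2], [20, 22]]
print(patch)  # [[11, 12], [21, 22]]
```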

Q7. How do computers store RGB images?


• Every RGB image is stored in the form of three different channels called the R channel, G
channel and the B channel.
• Each plane separately has several pixels with each pixel value varying from 0 to 255.
• All the three planes when combined form a colour image.
• This means that in an RGB image, each pixel has a set of three different values which together give colour to that pixel.
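A minimal sketch of this storage scheme, using a 2×2 image held as three separate channel planes (the pixel values are arbitrary example data):

```python
# An RGB image stored as three separate channel planes (R, G, B),
# each a 2D grid of 0-255 values.
r_channel = [[255, 0], [0, 128]]
g_channel = [[0, 255], [0, 128]]
b_channel = [[0, 0], [255, 128]]

# Combining the three planes gives each pixel an (R, G, B) triple.
h, w = 2, 2
image = [
    [(r_channel[y][x], g_channel[y][x], b_channel[y][x]) for x in range(w)]
    for y in range(h)
]
print(image[0][0])  # (255, 0, 0) -> a pure red pixel
print(image[1][1])  # (128, 128, 128) -> a mid-grey pixel
```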

Q8. Why do computer systems have a 255-pixel value?


Or
Why do pixel values have numbers?
In computer systems, data is in the form of ones and zeros, which we call the binary system. Each bit in a computer system can hold either a zero or a one. Each pixel of an image uses 1 byte, which is equivalent to 8 bits of data. Since each bit has two possible values, 8 bits can represent 2^8 = 256 possible values, ranging from 0 to 255.
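A one-line check of the arithmetic: 8 bits, each with 2 states, give 256 distinct values, so counting from 0 the range ends at 255:

```python
# One byte = 8 bits; each bit has 2 states, so a byte can hold
# 2**8 = 256 distinct values, i.e. 0 through 255.
bits = 8
values = 2 ** bits
print(values, values - 1)  # 256 255
```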
Q9. Multiple choice Questions

1. Which of the following tasks is an example of computer vision?


a) Rescaling an image
b) Correcting brightness levels in an image
c) Object detection in images or videos
d) Changing tones of an image

2. How is resolution typically expressed?


a) By the number of pixels along the width and height, such as 1280x1024
b) By the brightness level of each pixel, ranging from 0 to 255
c) By the total number of pixels, such as 5 megapixels
d) By the arrangement of pixels in a 2-dimensional grid

3. What is the core task of image classification?


a) Identifying objects and their locations in images
b) Segmenting objects into individual pixels
c) Assigning an input image one label from a fixed set of categories
d) Detecting instances of real-world objects in images

4. Object detection and handwriting recognition are examples of tasks commonly associated with:
a) Computer vision
b) Image processing
c) Both computer vision and image processing
d) Neither computer vision nor image processing

5. What does the pixel value represent in an image?


a) Width of the pixel
b) Brightness or color of the pixel
c) Height of the pixel
d) Resolution of the pixel

6. In the byte image format, what is the range of possible pixel values?
a) 0 to 10
b) 0 to 100
c) 0 to 1000
d) 0 to 255

7. In a grayscale image, what does the darkest shade represent?


a) Total presence of color
b) Zero value of pixel
c) Lightest shade of gray
d) Maximum pixel value

8. In an RGB image, what does a pixel with an intensity value of 0 represent?


a) Full presence of color
b) No presence of color
c) Maximum level of brightness
d) Minimum level of brightness
9. Assertion: Object detection is a more complex task than image classification because it involves identifying both the presence and location of objects in an image.
Reasoning: Object detection algorithms need to not only classify the objects present in an image but also accurately localize them by determining their spatial extent.
Select the appropriate option for the statements given above:
a) Both A and R are true, and R is the correct explanation of A
b) Both A and R are true, and R is not the correct explanation of A
c) A is true but R is false
d) A is False but R is true

10. Assertion: Grayscale images consist of shades of gray ranging from black to white, where each pixel is represented by a single byte, and the size of the image is determined by its height multiplied by its width.
Reasoning: Grayscale images are represented using three intensities per pixel, typically
ranging from 0 to 255.
Select the appropriate option for the statements given above:
a) Both A and R are true, and R is the correct explanation of A
b) Both A and R are true, and R is not the correct explanation of A
c) A is true but R is false
d) A is False but R is true

Q10. Imagine you have a smartphone camera app that can recognize objects. When you point
your camera at a dog, the app identifies it as a dog, analyzing patterns and features in the image.
Behind the scenes, the app's software processes the image, detecting edges, shapes, and colors,
then compares these features to a vast database to make accurate identifications.” Identify the
technology used in the above scenario and explain the way it works.

Answer
The technology used in the scenario is Computer Vision, specifically leveraging Image Classification and Object Recognition techniques. The app's software processes the captured image to detect features such as edges, shapes, and colours, then compares these extracted features against a large database of labelled images to classify the object, in this case a dog.

Q11. Enlist two smartphone apps that utilize computer vision technology? How have these apps
improved your efficiency or convenience in daily tasks?

Refer to the answer of Q3.

Q12. How an RGB image is different from a grayscale image?

Feature        | Grayscale Image              | RGB Image
Colour         | Shades of grey (no colour)   | Full colour (Red, Green, Blue)
Channels       | 1 (brightness)               | 3 (R, G, B)
Pixel size     | 1 byte (8 bits)              | 3 bytes (24 bits)
Memory usage   | Lower (Height x Width)       | Higher (Height x Width x 3)
Applications   | Medical imaging, OCR, etc.   | Photography, video, etc.

• Grayscale images have 1 channel and represent brightness only.


• RGB images have 3 channels and represent full colour.
Q13. Determine the color of a pixel based on its RGB values mentioned below:
(i) R=0, B=0, G=0
(ii) R=255, B=255, G=255
(iii) R=0, B=0, G=255
(iv) R=0, B=255, G=0

(i) R=0, B=0, G=0


• Colour: Black
• Explanation: When all RGB values are 0, it means no colour is present, resulting in black.
(ii) R=255, B=255, G=255
• Colour: White
• Explanation: When all RGB values are at their maximum (255), it represents the full presence
of all colours, resulting in white.
(iii) R=0, B=0, G=255
• Colour: Pure Green
• Explanation: Only the Green channel is at its maximum (255), while Red and Blue are absent
(0), resulting in pure green.
(iv) R=0, B=255, G=0
• Colour: Pure Blue
• Explanation: Only the Blue channel is at its maximum (255), while Red and Green are absent
(0), resulting in pure blue.
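The four answers above can be captured in a small helper function (hypothetical, written only to illustrate the mapping from pixel values to colour names):

```python
def name_colour(r, g, b):
    """Name the four colours worked out in Q13 from (R, G, B) values."""
    if (r, g, b) == (0, 0, 0):
        return "black"          # all channels 0: no colour present
    if (r, g, b) == (255, 255, 255):
        return "white"          # all channels at maximum
    if (r, g, b) == (0, 255, 0):
        return "pure green"     # only the Green channel at maximum
    if (r, g, b) == (0, 0, 255):
        return "pure blue"      # only the Blue channel at maximum
    return "other"

print(name_colour(0, 0, 0))        # black
print(name_colour(255, 255, 255))  # white
print(name_colour(0, 255, 0))      # pure green
print(name_colour(0, 0, 255))      # pure blue
```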

Q14. “Imagine you're a researcher tasked with improving workplace safety in a manufacturing
environment. You decide to employ computer vision technology to enhance safety measures.”
A computer vision system monitoring the work floor can enhance safety measures in several ways:
a) Real-Time Alerts:
• Integrate the system with alarms, lights, or notifications to alert workers and supervisors of
potential hazards immediately.
b) Data Analytics:
• Collect and analyse data from the computer vision system to identify patterns, recurring
issues, or high-risk areas.
• Use insights to improve safety protocols and training programs.
c) Worker Training:
• Use recorded footage (with privacy considerations) to train workers on safe practices and
highlight common mistakes.
d) Integration with IoT:
• Combine computer vision with IoT devices (e.g., wearable sensors) to enhance safety
monitoring and response.
Q15. Explain the distinctions between image classification, classification with localization, object
detection, and instance segmentation in computer vision tasks. Provide examples for each
to support your answer

Refer to the answer of Q5.

Q16. “Agriculture is an industry where precision and efficiency are crucial for sustainable
production. Traditional farming methods often rely on manual labor and visual inspection, which
can be time-consuming and error-prone. However, advancements in computer vision technology
offer promising solutions to optimize various agricultural processes. Agricultural drones equipped
with high-resolution cameras and computer vision algorithms are increasingly being used to
monitor crop health, detect diseases, and assess crop yields.” Answer the following questions
based on the case study mentioned above:
How does the integration of computer vision technology with drones improve efficiency in
agricultural practices compared to traditional methods?
1. Faster and Bigger Coverage: Drones can fly over huge fields in a short time, taking pictures
and collecting data. This is much quicker than walking through the fields and checking crops
by hand.
2. More Accurate: Computers can spot tiny problems (like sick plants or pests) that humans
might miss. This helps farmers fix issues before they get worse.
3. Saves Time and Money: Instead of hiring many workers to check crops, drones can do the
job faster and cheaper.
4. Real-Time Help: Drones send data instantly, so farmers can act quickly if there’s a problem,
like watering dry areas or spraying pesticides only where needed.
5. Better Decisions: Drones collect lots of data, helping farmers understand what’s happening
in their fields and make smarter choices to grow healthier crops.
Q17. What are some key indicators or parameters that computer vision algorithms can analyze to
assess crop health and detect diseases?
1. Colour of Leaves: If leaves turn yellow or brown, it might mean the plant is sick or missing
nutrients.
2. Shape and Size: Computers can check if plants are growing properly or if they look smaller
or weird, which could mean a problem.
3. Spots or Damage: If there are spots, holes, or strange marks on leaves, it could be a sign of
disease or pests.
4. Water Levels: Using special cameras, drones can see if plants are too dry or need more water.
5. Weeds: Computers can tell the difference between crops and weeds, so farmers can remove
weeds without harming the crops.
6. Fruit or Grain Count: Drones can estimate how much food (like fruits or grains) the plants
will produce, helping farmers plan better.
Q18. You are tasked with developing a computer vision system for a self-driving car company. The
system needs to accurately detect and classify various objects on the road to ensure safe
navigation. Imagine you're working on improving the object detection algorithm for the self-
driving car's computer vision system. During testing, you notice that the system occasionally
misclassifies pedestrians as cyclists, especially in low-light conditions.
How would you approach addressing this issue? What steps would you take to enhance the
accuracy of pedestrian detection while ensuring the system's overall performance and reliability
on the road?
The Problem
The self-driving car’s computer vision system sometimes confuses pedestrians (people walking) with
cyclists (people riding bikes), especially when it’s dark or the lighting is bad. This is a big problem
because it could make the car less safe.
How to Fix It
1. Understand Why It Happens
• The system might not have enough examples of pedestrians and cyclists in dark conditions
to learn from.
• The camera might struggle to see clearly in low light.
2. Improve the Data
• Collect More Examples: Take more pictures and videos of pedestrians and cyclists in the dark,
at dusk, or in shadows.
• Make the Data Better: Use software to adjust the brightness of images or simulate dark
conditions so the system can learn to handle them.
3. Make the System Smarter
• Use a Better Model: Upgrade the system to a more advanced version that’s better at
detecting objects in the dark.
• Preprocess Images: Brighten dark images or reduce noise (like graininess) so the system can
see better.
4. Test the System
• Simulate Dark Conditions: Test the system in a virtual environment that mimics nighttime or
low-light situations.
• Real-World Testing: Drive the car in real dark conditions to see if it can now correctly identify
pedestrians and cyclists.
5. Add Backup Systems
• Use Other Sensors: Combine data from cameras with other sensors like LiDAR (which uses
lasers) or radar (which uses radio waves) to double-check what the camera sees.
• Set Rules: If the system isn’t sure whether something is a pedestrian or a cyclist, make the
car slow down or stop to be safe.
6. Keep Improving
• Monitor Performance: Keep testing the system in different conditions and collect more data
to make it even better.
• Update Regularly: Keep improving the system as new technology and data become available.
