Chapter 1: Introduction to Computer Vision and Image Processing
Overview: Computer Imaging
Definition of computer imaging: the acquisition and processing of visual information by computer.
Why is it important? Vision is the primary human sense, and information can be conveyed well through images (one picture is worth a thousand words). A computer is required because the amount of data to be processed is huge.
Overview: Computer Imaging
Computer imaging can be divided into two main categories:
Computer Vision: applications where the output is for use by a computer. Image Processing: applications where the output is for use by humans.
These two categories are not totally separate and distinct.
Overview: Computer Imaging
They overlap each other in certain areas.
[Diagram: Computer Imaging shown as two overlapping areas, Computer Vision and Image Processing]
Computer Vision
Does not involve a human in the visual loop. One of the major topics within this field is image analysis (Chapter 2). Image analysis involves the examination of image data to facilitate solving a vision problem.
Computer Vision
The image analysis process involves two other topics:
Feature extraction: acquiring higher-level image information (shape and color). Pattern classification: using higher-level image information to identify objects within an image.
Computer Vision
Most computer vision applications involve tasks that:
Are tedious for people to perform. Require work in a hostile environment. Require a high processing rate. Require access and use of a large database of information.
Computer Vision
Examples of computer vision applications:
Quality control (e.g., inspecting circuit boards). Hand-written character recognition. Biometric verification (fingerprint, retina, DNA, signature, etc.). Satellite image processing. Skin tumor diagnosis. And many, many others.
Image Processing
Processed images are to be used by humans.
Therefore, it requires some understanding of how the human visual system operates.
Among the major topics are:
Image restoration (Chapter 3). Image enhancement (Chapter 4). Image compression (Chapter 5).
Image Processing
Image restoration:
The process of taking an image with some known, or estimated, degradation and restoring it to its original appearance. Done by performing the reverse of the degradation process on the image. Example: correcting distortion in the optical system of a telescope.
Image Processing
An Example of Image Restoration
Image Processing
Image enhancement:
Improve an image visually by taking advantage of the human visual system's response. Examples: contrast improvement, image sharpening, and image smoothing.
Image Processing
An Example of Image Enhancement
Image Processing
Image compression:
Reduce the amount of data required to represent an image by:
Removing data that are visually unnecessary. Taking advantage of the redundancy that is inherent in most images.
Examples: JPEG, MPEG, etc.
Computer Imaging Systems
Computer imaging systems comprise both hardware and software. The hardware components can be divided into three subsystems:
The computer. Image acquisition: camera, scanner, video recorder. Image display: monitor, printer, film, video player.
Computer Imaging Systems
The software is used for the following tasks:
Manipulating the image and performing any desired processing on the image data. Controlling the image acquisition and storage process.
The computer system may be a general-purpose computer with a frame grabber or image digitizer board in it.
Computer Imaging Systems
A frame grabber is a special-purpose piece of hardware that digitizes a standard analog video signal. Digitization of the analog video signal is important because computers can only process digital data.
Computer Imaging Systems
Digitization is done by sampling the analog signal, that is, instantaneously measuring the voltage of the signal at fixed intervals in time. The value of the voltage at each instant is converted into a number and stored. The number represents the brightness of the image at that point.
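The sampling step above can be sketched in a few lines of Python. This is a hypothetical illustration: the signal, sample count, and 8-bit output range are assumptions, not part of any real frame grabber API.

```python
import math

def sample_and_quantize(signal, t_end, n_samples, levels=256, v_max=1.0):
    """Sample an analog signal (given as a function of time) at fixed
    intervals, then convert each voltage reading into a stored number."""
    dt = t_end / n_samples
    samples = []
    for i in range(n_samples):
        v = signal(i * dt)                                # instantaneous voltage
        v = min(max(v, 0.0), v_max)                       # clip to the valid range
        samples.append(round(v / v_max * (levels - 1)))   # store as an integer level
    return samples

# A slowly varying "brightness" signal sampled into 8-bit values.
wave = lambda t: 0.5 + 0.5 * math.sin(2 * math.pi * t)
pixels = sample_and_quantize(wave, t_end=1.0, n_samples=8)
```

Each stored number plays the role of the brightness at one sampling instant.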
Computer Imaging Systems
The grabbed image is now a digital image and can be accessed as a two-dimensional array of data.
Each data point is called a pixel (picture element).
The following notation is used to express a digital image:
I(r,c) = the brightness of the image at point (r,c), where r = row and c = column.
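A minimal sketch of this notation in Python; the tiny 2 x 3 image here is made-up data for illustration.

```python
# A grabbed image stored as a two-dimensional array (a list of rows),
# indexed by row r and column c.
image = [
    [10,  50,  90],   # row 0
    [30, 200, 120],   # row 1
]

def I(r, c):
    """Brightness of the image at point (r, c)."""
    return image[r][c]

rows, cols = len(image), len(image[0])
```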
The CVIPtools Software
The CVIPtools software contains C functions to perform all the operations that are discussed in the textbook. It also comes with a GUI application that allows you to perform various operations on an image.
No coding is needed. Users may vary all the parameters. Results can be observed in real time.
The CVIPtools Software
It is available from:
The CD-ROM that comes with the book. [Link]
Human Visual Perception
Human perception encompasses both physiological and psychological aspects. We will focus more on the physiological aspects, which are more easily quantified and hence analyzed.
Human Visual Perception
Why study visual perception?
Image processing algorithms are designed based on how our visual system works. In image compression, we need to know what information is not perceptually important and can be ignored. In image enhancement, we need to know what types of operations are likely to improve an image visually.
The Human Visual System
The human visual system consists of two primary components, the eye and the brain, which are connected by the optic nerve.
Eye: receiving sensor (camera, scanner). Brain: information processing unit (computer system). Optic nerve: connection cable (physical wire).
The Human Visual System
This is how the human visual system works:
Light energy is focused by the lens of the eye onto the sensors in the retina. The sensors respond to the light through an electrochemical reaction that sends an electrical signal to the brain (through the optic nerve). The brain uses the signals to create neurological patterns that we perceive as images.
The Human Visual System
Visible light is an electromagnetic wave with wavelengths ranging from about 380 to 825 nanometers.
However, the response above 700 nanometers is minimal.
We cannot see many parts of the electromagnetic spectrum.
The Human Visual System
The visible spectrum can be divided into three bands:
Blue (400 to 500 nm). Green (500 to 600 nm). Red (600 to 700 nm).
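The three bands above can be sketched as a small lookup function. The band boundaries follow the ranges on this slide; exactly which band owns a boundary value is an assumption made here for illustration.

```python
def visible_band(wavelength_nm):
    """Classify a wavelength (in nanometers) into one of the three bands
    of the visible spectrum; return None outside the 400-700 nm range."""
    if 400 <= wavelength_nm < 500:
        return "blue"
    if 500 <= wavelength_nm < 600:
        return "green"
    if 600 <= wavelength_nm <= 700:
        return "red"
    return None  # parts of the spectrum we cannot see
```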
The sensors are distributed across the retina.
The Human Visual System
There are two types of sensors: rods and cones. Rods:
For night vision. See only brightness (gray level), not color. Distributed across the retina. Medium- and low-level resolution.
The Human Visual System
Cones:
For daylight vision. Sensitive to color. Concentrated in the central region of the eye. High resolution capability (can differentiate small changes).
The Human Visual System
Blind spot:
No sensors. The place where the optic nerve exits the eye. We do not perceive it as a blind spot because the brain fills in the missing visual information.
Why must an object be in the center of the field of vision in order to be perceived in fine detail?
This is where the cones are concentrated.
The Human Visual System
Cones have higher resolution than rods because each cone has an individual nerve tied to it, whereas rods have multiple sensors tied to each nerve. Rods react even in low light, but they see only a single spectral band and cannot distinguish color.
The Human Visual System
There are three types of cones, each responding to different wavelengths of light energy. The colors that we perceive are the combined result of the responses of the three cone types.
Spatial Frequency Resolution
To understand the concept of spatial frequency, we must first understand the concept of resolution. Resolution: the ability to separate two adjacent pixels.
If we can see two adjacent pixels as being separate, then we say that we can resolve the two.
Spatial Frequency Resolution
Spatial frequency: how rapidly the signal changes in space.
Spatial Frequency Resolution
If we increase the frequency, the stripes get closer together until they finally blend.
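A stripe pattern of adjustable spatial frequency can be generated with a short sketch; the image width and cycle counts are illustrative choices, not values from the text.

```python
import math

def stripe_pattern(width, cycles):
    """One row of a vertical-stripe test pattern: brightness varies
    sinusoidally across the row, completing `cycles` full light/dark
    cycles (the spatial frequency). Values are 8-bit gray levels."""
    return [round(127.5 + 127.5 * math.cos(2 * math.pi * cycles * c / width))
            for c in range(width)]

low = stripe_pattern(256, cycles=4)    # wide stripes: easy to resolve
high = stripe_pattern(256, cycles=64)  # narrow stripes: tend to blend together
```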
Spatial Frequency Resolution
The distance between the eye and the image also affects the resolution.
The farther away the image, the worse the resolution. Why is this important?
The number of pixels per square inch on a display device must be large enough for us to see an image as realistic; otherwise we end up seeing blocks of color. There is therefore an optimum distance between the viewer and the display device.
Spatial Frequency Resolution
The visual system's limitations in resolution are due to both optical and neural factors.
We cannot resolve things smaller than an individual sensor. The lens has a finite size, which limits the amount of light it can gather. The lens is slightly yellow (and yellows further with age), which limits the eye's response to certain wavelengths of light.
Spatial Frequency Resolution
Spatial resolution is also affected by the average background brightness of the display. In general, we have higher spatial resolution at brighter levels. The visual system has less spatial resolution for color information that has been decoupled from the brightness information.
Brightness Adaptation
The vision system responds to a wide range of brightness levels. The perceived brightness (subjective brightness) is a logarithmic function of the actual brightness.
However, it is limited by the dark threshold (too dark) and the glare limit (too bright).
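A rough sketch of this logarithmic response follows. The dark-threshold and glare-limit constants here are made-up placeholders for illustration, not measured values.

```python
import math

# Illustrative sketch only: perceived (subjective) brightness grows roughly
# logarithmically with actual brightness, clipped between a dark threshold
# and a glare limit. Both constants are hypothetical.
DARK_THRESHOLD = 1.0   # below this intensity, nothing is perceived (assumed)
GLARE_LIMIT = 1e6      # above this, everything looks equally bright (assumed)

def subjective_brightness(intensity):
    i = min(max(intensity, DARK_THRESHOLD), GLARE_LIMIT)
    return math.log10(i)

# Tenfold jumps in actual brightness give roughly equal perceived steps.
steps = [subjective_brightness(10 ** k) for k in range(7)]
```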
Brightness Adaptation
We cannot see across the entire range at any one time, but our visual system adapts to the existing light conditions. The pupil varies its size to control the amount of light entering the eye.
Brightness Adaptation
It has been experimentally determined that we can detect only about 20 changes in brightness in a small area within a complex image. However, for an entire image, about 100 gray levels are necessary to create a realistic image.
This is due to the brightness adaptation of our visual system.
Brightness Adaptation
If fewer gray levels are used, we will observe false contours (bogus lines). These result from gradually changing light intensity not being accurately represented.
Brightness Adaptation
Image with 8 bits/pixel (256 gray levels, no false contours)
Image with 3 bits/pixel (8 gray levels, contains false contours)
Brightness Adaptation
An interesting phenomenon that our visual system exhibits related to brightness is called the Mach band effect. It creates an optical illusion: when there is a sudden change in intensity, our visual system's response overshoots at the edge.
Brightness Adaptation
This accentuates edges and helps us to distinguish and separate objects within an image. Combined with our brightness adaptation response, it allows us to see outlines even in dimly lit areas.
Brightness Adaptation
An illustration of the Mach band effect: observe the edges between the different brightness levels. The edges seem to stand out compared to the rest of the image.
Temporal Resolution
Temporal resolution relates to how we respond to visual information as a function of time.
It is useful when considering video and motion in images, and can be measured using flicker sensitivity.
Flicker sensitivity refers to our ability to observe a flicker in a video signal displayed on a monitor.
Temporal Resolution
The cutoff frequency is about 50 hertz (cycles per second).
We will not perceive any flicker in a video signal above 50 Hz. TV uses a frequency of around 60 Hz.
The brighter the lighting, the more sensitive we are to changes.
Image Representation
A digital image I(r, c) is represented as a two-dimensional array of data, where each pixel value corresponds to the brightness of the image at point (r, c). This image model is for monochrome (one-color, or black and white) image data.
Image Representation
Multiband images (color, multispectral) can be modeled by a different I(r, c) function for each separate band of brightness information. The types of images that we will discuss are:
Binary. Gray-scale. Color. Multispectral.
Binary Images
Takes on only two values:
Black and white (0 and 1). Requires 1 bit/pixel.
Used when the only information required is shape or outline information. For example:
To position a robotic gripper to grasp an object. To check a manufactured object for deformations. For facsimile (FAX) images.
Binary Images
Binary images are often created from gray-scale images via a threshold operation:
White (1) if the pixel value is above the threshold; black (0) otherwise.
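The threshold operation can be written directly as a sketch; the sample pixel values and the threshold of 128 are illustrative.

```python
def threshold(image, t):
    """Create a binary image from a gray-scale image: white (1) where the
    pixel value exceeds the threshold t, black (0) otherwise."""
    return [[1 if p > t else 0 for p in row] for row in image]

gray = [[12, 200],
        [130, 40]]
binary = threshold(gray, t=128)
```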
Gray-Scale Images
Also referred to as monochrome or one-color images. They contain only brightness information and no color information. They typically contain 8 bits/pixel of data, which corresponds to 256 (0 to 255) different brightness (gray) levels.
Gray-Scale Images
Why 8 bits/pixel?
It provides more than adequate brightness resolution. It provides a noise margin by allowing approximately twice as many gray levels as required. The byte (8 bits) is the standard small unit in computers.
Gray-Scale Images
However, there are applications, such as medical imaging or astronomy, that require 12 or 16 bits/pixel.
This is useful when a small section of the image is enlarged, as it allows the user to repeatedly zoom into a specific area of the image.
Color Images
Modeled as three-band monochrome image data, where the values correspond to the brightness in each spectral band. Typical color images are represented as red, green, and blue (RGB) images.
Color Images
Using the 8-bit standard model, a color image has 24 bits/pixel:
8 bits for each of the three color bands (red, green, and blue).
Color Images
For many applications, RGB is transformed into a mathematical space that decouples (separates) the brightness information from the color information. The transformed image has:
A 1-D brightness, or luminance, component. A 2-D color, or chrominance, component.
This creates a more people-oriented way of describing colors.
Color Images
One example is the hue/saturation/lightness (HSL) color transform:
Hue: the color (green, blue, orange, etc.). Saturation: how much white is in the color (pink is red with more white, so it is less saturated than pure red). Lightness: the brightness of the color.
Color Images
Most people can relate to this method of describing color.
A deep, bright orange has a large lightness (bright), a hue of orange, and a high saturation (deep). It is easy to picture this color in the mind. If we instead define this color in terms of its RGB components, R = 245, G = 110, B = 20, we have no idea what this color looks like.
Color Images
In addition to HSL, there are various other formats used for representing color images:
YCrCb. SCT (Spherical Coordinate Transform). PCT (Principal Component Transform). CIE XYZ. L*u*v*. L*a*b*.
Color Images
One color space can be converted to another using transform equations. Example: converting the RGB color space to the YCrCb color space.
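As a sketch of such a conversion, here is an RGB-to-YCbCr transform with one common set of coefficients (the full-range ITU-R BT.601 variant used by JPEG); the textbook's exact equations may differ.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to YCbCr using full-range ITU-R BT.601
    coefficients. Y carries the brightness (luminance); Cb and Cr carry
    the color information (chrominance), offset around 128."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# A pure gray pixel has no chrominance: Cb and Cr sit at the midpoint 128.
y, cb, cr = rgb_to_ycbcr(100, 100, 100)
```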
Multispectral Images
Multispectral images typically contain information outside the normal human perceptual range:
Infrared, ultraviolet, X-ray, acoustic, or radar data.
They are not really images in the usual sense (they do not represent a scene of the physical world, but rather information such as depth). The values are represented in visual form by mapping the different spectral bands to RGB.
Multispectral Images
Sources include satellite systems, underwater sonar systems, airborne radar, infrared imaging systems, and medical diagnostic imaging systems. The number of bands into which the data are divided depends on the sensitivity of the imaging sensors.
Multispectral Images
Most satellite images contain two to seven spectral bands:
One to three in the visible spectrum. One or more in the infrared region.
The newest satellites have sensors that collect image information in 30 or more bands. Due to the large amount of data involved, compression is essential.
Digital Image File Formats
There are many different types of image file formats. This is because:
There are many different types of images and applications with varying requirements. There is a lack of coordination within the imaging industry.
Images can be converted from one format to another using image conversion software.
Digital Image File Formats
Types of image data are divided into two categories:
Bitmap (raster) images: the pixel data and the corresponding brightness values are stored in some file format. Vector images: lines, curves, and shapes are represented by storing only their key points. The process of turning the key points into an image is called rendering.
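Rendering can be sketched as turning stored key points into pixel data. The point-sampling approach below is a naive illustration, not a production line-drawing algorithm.

```python
def render_line(p0, p1, width, height):
    """Naive rendering sketch: turn a vector primitive (a line stored as
    two key points) into bitmap pixel data by sampling points along it."""
    raster = [[0] * width for _ in range(height)]
    (r0, c0), (r1, c1) = p0, p1
    steps = max(abs(r1 - r0), abs(c1 - c0), 1)
    for i in range(steps + 1):
        r = round(r0 + (r1 - r0) * i / steps)
        c = round(c0 + (c1 - c0) * i / steps)
        raster[r][c] = 1   # set the pixel the sampled point falls on
    return raster

# Two key points (a diagonal line) rendered into a 4x4 bitmap.
grid = render_line((0, 0), (3, 3), width=4, height=4)
```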
Digital Image File Formats
Most of the file formats to be discussed fall under the category of bitmap images. Some of the formats are compressed:
The I(r, c) values are not available until the file is decompressed.
Bitmap image files must contain both header information and the raw pixel data.
Digital Image File Formats
The header contains information regarding:
The number of rows (height). The number of columns (width). The number of bands. The number of bits per pixel. The file type. The type of compression used (if applicable).
Digital Image File Formats
BIN format:
Contains only the raw data I(r, c) and no header. Users must know the necessary parameters beforehand.
PPM format:
Contains raw image data with a simple header. Variants: PBM (binary), PGM (gray-scale), PPM (color), and PNM (handles any of the other types).
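Parsing the simple PPM-family header can be sketched for the plain (ASCII, magic number "P2") PGM variant; the sample image data here is made up.

```python
def parse_pgm(text):
    """Parse a plain (ASCII, 'P2') PGM file: a simple header (magic number,
    width, height, maximum gray value) followed by the raw pixel data."""
    tokens = [t for line in text.splitlines()
              for t in line.split('#', 1)[0].split()]  # strip '#' comments
    if tokens[0] != "P2":
        raise ValueError("not a plain PGM file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    pixels = [int(t) for t in tokens[4:]]
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return {"width": width, "height": height, "maxval": maxval, "I": rows}

sample = """P2
# a tiny 3x2 gray-scale image
3 2
255
0 128 255
64 32 16
"""
img = parse_pgm(sample)
```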
Digital Image File Formats
GIF (Graphics Interchange Format):
Commonly used on the WWW. Limited to a maximum of 8 bits/pixel (256 colors); the bits are used as an index into a lookup table. Allows for a type of compression called LZW. The image header is 13 bytes long.
Digital Image File Formats
TIFF (Tagged Image File Format):
Allows a maximum of 24 bits/pixel. Supports several types of compression: RLE, LZW, and JPEG. The header is of variable size and is arranged in a hierarchical manner. Designed to allow the user to customize it for specific applications.
Digital Image File Formats
JFIF (JPEG File Interchange Format):
Allows images compressed with the JPEG algorithm to be used on many different computer platforms. Contains a Start of Image (SOI) marker and an application (APP0) marker that serve as a file header. Used extensively on the WWW.
Digital Image File Formats
Sun Raster file format:
Defined to allow for any number of bits per pixel. Supports RLE compression and color lookup tables. Contains a 32-byte header, followed by the image data.
Digital Image File Formats
SGI file format:
Handles up to 16 million colors. Supports RLE compression. Contains a 512-byte header, followed by the image data. The majority of the bytes in the header are not used, presumably reserved for future extension.
Digital Image File Formats
EPS (Encapsulated PostScript):
Not a bitmap image; the file contains text. PostScript is a language that supports more than just images. Commonly used in desktop publishing. Directly supported by many printers (in the hardware itself). Commonly used for data interchange across hardware and software platforms. The files tend to be very large.
Not a bitmap image. The file contains text. It is a language that supports more than just images. Commonly used in desktop publishing. Directly supported by many printers (in the hardware itself). Commonly used for data interchange across hardware and software platforms. The files are very big.