Computer Vision - Lecture Notes

Course: Computer Vision
Institution: Amrita Vishwa Vidyapeetham

COMPUTER VISION

Computer Vision is a field of Artificial Intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand. E.g., computer vision is necessary to enable self-driving cars: manufacturers such as Tesla, BMW, Volvo and Audi use multiple cameras, lidar, radar and ultrasonic sensors to acquire images from the environment so that their self-driving cars can detect objects, lane markings, signs and traffic signals and drive safely.

APPLICATIONS OF COMPUTER VISION

Optical Character Recognition, Machine Inspection, Retail, 3D model building (photogrammetry), Automotive safety, Match move, Motion capture, Surveillance, Fingerprint recognition and biometrics.

HISTORY OF COMPUTER VISION

● Larry Roberts is commonly accepted as the father of computer vision.
● Computer vision came into existence during the 1960s.

LEVELS OF HUMAN AND COMPUTER VISION SYSTEMS

Low Level Vision: Edge, Corner, Stereo reconstruction
Mid Level Vision: Texture, Segmentation and Grouping, Illumination
High Level Vision: Tracking, Specific object recognition, Category-level object recognition

CAMERA PROJECTION

Projection: a technique or process used to transform a 3D object onto a 2D plane. How do we transform a 3D world object into a 2D picture? Form a ray between our eye and the 3D object through a canvas. The ray hits the 3D object and bounces back to the canvas, painting the colours of the 3D object onto the 2D surface.

Parameters that help in transforming a picture from the 3D world to 2D:
Extrinsic Parameters: the camera's orientation, i.e., Rotation (R) and Translation (T). The extrinsic parameters describe the camera body configuration.
Intrinsic Parameters: the spatial relation between sensor and pinhole (K), and the focal length. These give the transformation of the optical parameters.

Process of forming a 2D picture from a 3D world object: imagine that we are an artist who is going to draw a picture of the world. We stand in front of the canvas and look into the world. We form a ray between the eye and the point in the world that we are seeing, and pick the points that we want to draw on our canvas. The ray intersects points in the 3D world and bounces the object back onto the canvas, painting the colour of the 3D object into the 2D picture. That is the process of forming a 2D picture from a 3D world object.

Projection Equation (3D to 2D image): a 3D point (X, Y, Z) projects to the image point

x = f * X / c,  y = f * Y / c

where f = focal length and c = focal distance {distance between the observer and the object}.

Finding the center of projection (use a calibration toolbox, or this simple method): take a phone and a paper and draw a set of radiating lines, fixing a point at the center. Hold the phone vertical to the paper and move the camera back and forth. At some point the lines will look parallel.
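In matrix form this pipeline is often written x = K [R | T] X. Below is a minimal sketch (not from the notes) of projecting a 3D point to 2D; the focal length, principal point and identity extrinsics are assumed example values.

```python
# A minimal sketch (illustrative, not from the notes) of projecting a 3D
# point to 2D using assumed intrinsic (K) and extrinsic (R, T) parameters.
import numpy as np

f = 800.0                       # focal length in pixels (assumed)
cx, cy = 320.0, 240.0           # principal point (assumed image center)
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0,  1]])      # intrinsic matrix

R = np.eye(3)                   # rotation: camera aligned with world (assumed)
T = np.zeros(3)                 # translation: camera at world origin (assumed)

X_world = np.array([0.5, 0.2, 4.0])   # a 3D point, 4 units in front of camera
X_cam = R @ X_world + T               # world -> camera coordinates
x_hom = K @ X_cam                     # camera -> homogeneous image coordinates
u, v = x_hom[0] / x_hom[2], x_hom[1] / x_hom[2]   # perspective divide
print(u, v)                           # resulting pixel coordinates
```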

PERSPECTIVE PROJECTION

Objects closest to us appear larger than the ones that are farther away. {E.g., a railway track.}

Components of perspective:
● Spectator (observer)
● Picture plane (the plane on which we draw the perspective image of the object)
● Horizontal line (the ray of light finishes at the horizontal line)
● Groundline

PINHOLE CAMERA

A pinhole camera is a simple camera without a lens but with a tiny aperture. The camera we use in daily life is not a canvas; it is a reverse canvas, where the image plane is behind us. This is the pinhole model.

How to create the image: by using the vanishing point. The vanishing point is obtained by the spectator looking parallel to whatever line he is looking at.

PROPERTIES OF LIGHT

Illumination: a property and effect of light; the amount of light incident on a surface.
Luminance: the amount of visible light that comes to the eye from a surface.
Refraction: the bending of light rays when passing through the surface between one transparent material and another.
Reflectance: the portion of incident light that is reflected from a surface.
Blind spot: the spot where your optic nerve connects to your retina.

DIGITIZATION

Digitization is the process of converting information into a digital format. There is a processing pipeline to convert an analog image to a digital image:
❖ SAMPLING: digitization with respect to the coordinate values. The sampling rate determines the spatial resolution of the digitized image.
❖ QUANTIZATION: digitization with respect to the amplitude. The quantization level determines the number of grey levels in the digitized image. If quantization increases, image detail improves (see the sketch below).
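A minimal sketch (not from the notes) of quantization: an assumed toy image is re-quantized to fewer gray levels, which coarsens the detail.

```python
# A minimal sketch (illustrative, not from the notes): re-quantizing an
# 8-bit grayscale image to a smaller number of gray levels.
import numpy as np

def quantize(img: np.ndarray, levels: int) -> np.ndarray:
    """Map 8-bit values (0-255) onto `levels` evenly spaced gray levels."""
    step = 256 / levels
    return (np.floor(img / step) * step).astype(np.uint8)

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)  # assumed toy image
print(quantize(img, 4))   # only 4 gray levels remain: 0, 64, 128, 192
```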

REPRESENTING A DIGITAL IMAGE

b = M * N * k, where b is the number of bits required to store a digitized image of size M x N with k bits per pixel. For example, a 256 x 256 image at 8 bits per pixel needs 256 * 256 * 8 = 524,288 bits (64 KB).

IMAGE TYPES
❖ Binary image (b&w image): each pixel contains 1 bit (1: black, 0: white).
❖ Monochromatic / grayscale / intensity image: each pixel value is in the range 0-255; each pixel corresponds to a light intensity, normally represented in gray scale.
❖ Colour image: each pixel contains a vector representing the Red, Green and Blue components.
❖ Index image: construct a look-up table; each pixel is denoted by an index number, and each index number has its own RGB value.

IMAGE DATA TYPES
1-bit image (binary image): each pixel is stored as a single bit (0 or 1).
8-bit gray level image: each pixel has a gray value between 0-255; each pixel is represented by 1 byte.
24-bit colour image: each pixel is represented by 3 bytes representing RGB; 256 x 256 x 256 colours (16,777,216 colours).

IMAGE INTERPOLATION

Interpolation: constructing new data points within the range of a discrete set of known data points.

Image interpolation is a tool used for zooming, shrinking and geometric correction of an image (re-sampling). Image interpolation refers to the "guess" of intensity values at missing locations.

Why image interpolation?
- If we want to see an image bigger: when we watch a video clip on a PC, we like to see it in full screen mode.
- If we want a good image: if some block of an image gets damaged during transmission, we want to repair it.
- If we want a cool image: manipulating images digitally can render fancy artistic effects, as we often see in movies.

ZOOMING

Zooming expands the size of the image. It is a two-step procedure:
● Creation of new pixel locations
● Assignment of gray levels to those new locations

Methods:
● Nearest Neighbour Interpolation
● Pixel Replication
● Bilinear Interpolation

NEAREST NEIGHBOURHOOD INTERPOLATION

• Suppose an image of size 2x2 pixels is to be enlarged 2 times.
• Lay an imaginary 4x4 grid over the original image.
• For any point in the overlay, look for the closest pixel in the original image, and assign its gray level to the new pixel in the grid.
• When all the new pixels have been assigned values, expand the overlay grid to the specified size to obtain the zoomed image.

Limitations of nearest neighbour interpolation: it creates a checkerboard effect, and because neighbouring pixel values are simply replicated, the sharpness of the image decreases. (Sketches of nearest neighbour and bilinear interpolation follow the bilinear description below.)

PIXEL REPLICATION (Resampling)

• Pixel replication (re-sampling) is a special case applicable when the size of the image needs to be increased an integer number of times (e.g., 5 times).

• To double the size of the image, duplicate each column (and each row).

BILINEAR INTERPOLATION

A resampling method that uses the distance-weighted average of the four nearest pixel values to estimate a new pixel value.
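Minimal sketches (not from the notes) of both zooming methods; the 2x2 toy image and the zoom factor are assumed. Note that for an integer factor, nearest neighbour with floor mapping reduces to pixel replication.

```python
# Minimal sketches (illustrative, not from the notes) of nearest-neighbour
# and bilinear interpolation for zooming a grayscale image by a factor s.
import numpy as np

def zoom_nearest(img, s):
    h, w = img.shape
    out = np.empty((h * s, w * s), dtype=img.dtype)
    for i in range(h * s):
        for j in range(w * s):
            out[i, j] = img[i // s, j // s]   # copy the closest source pixel
    return out

def zoom_bilinear(img, s):
    h, w = img.shape
    out = np.empty((h * s, w * s))
    for i in range(h * s):
        for j in range(w * s):
            y, x = i / s, j / s               # map back into source coords
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # distance-weighted average of the four nearest pixels
            out[i, j] = (img[y0, x0] * (1 - dy) * (1 - dx) +
                         img[y0, x1] * (1 - dy) * dx +
                         img[y1, x0] * dy * (1 - dx) +
                         img[y1, x1] * dy * dx)
    return out.astype(img.dtype)

img = np.array([[10, 20], [30, 40]], dtype=np.uint8)   # assumed toy image
print(zoom_nearest(img, 2))
print(zoom_bilinear(img, 2))
```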

IMAGE SHRINKING

To shrink an image by half, we delete every other row and column.

RELATIONSHIP BETWEEN PIXELS

Neighbours of a pixel p at (x,y):
N4(p) - 4 neighbours: (x+1,y), (x-1,y), (x,y+1), (x,y-1)
ND(p) - 4 diagonal neighbours: (x+1,y+1), (x+1,y-1), (x-1,y+1), (x-1,y-1)
N8(p) - 8 neighbours: N4(p) + ND(p)

DISTANCE MEASURE

Given 3 pixels p, q, z, with p at (x,y), q at (s,t) and z at (v,w), D is a distance metric iff:
• D(p,q) ≥ 0, with D(p,q) = 0 iff p = q
• D(p,q) = D(q,p) (symmetry)
• D(p,z) ≤ D(p,q) + D(q,z) (triangle inequality)

Euclidean distance: De(p,q) = sqrt((x-s)^2 + (y-t)^2)
City block distance: D4(p,q) = |x-s| + |y-t|
Chessboard distance: D8(p,q) = max(|x-s|, |y-t|)

A sketch of the three measures follows.
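A minimal sketch (not from the notes) computing the three distance measures for an assumed pixel pair.

```python
# A minimal sketch (illustrative, not from the notes) of the three pixel
# distance measures between p = (x, y) and q = (s, t).
def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def city_block(p, q):      # D4 distance
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):      # D8 distance
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)      # assumed example pixels
print(euclidean(p, q))     # 5.0
print(city_block(p, q))    # 7
print(chessboard(p, q))    # 4
```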

DISTANCE MEASURE OF A PATH

If the distance depends on the path between two pixels, as with m-adjacency, then the Dm distance between two pixels is defined as the shortest m-path between the pixels.

WEEK 3 - PART 1

COLOUR MODEL

A specification of a coordinate system and a subspace within that system where each colour is represented by a single point. E.g., RGB, CMY, HSI.

RGB: primary colours (Red, Green, Blue); secondary colours (Cyan, Magenta, Yellow).

CMY: Cyan, Magenta, Yellow. [C M Y] = [1 1 1] - [R G B]. Equal amounts of C, M and Y produce black. (A conversion sketch follows this section.)

HSI: uses 3 measures to describe colour.
Hue: indicates the dominant wavelength in a mixture of light waves.
Saturation: gives a measure of the degree to which a pure colour is diluted with white light.
Intensity: brightness, which is subjective and nearly impossible to measure, used to describe colour sensation.

Pseudo-Colour Image Processing: assigning colour to gray level values based on intensity slicing.
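A minimal sketch (not from the notes) of the CMY relation above, applied to an assumed normalized RGB pixel.

```python
# A minimal sketch (illustrative, not from the notes): converting a
# normalized RGB pixel to CMY with [C M Y] = [1 1 1] - [R G B].
import numpy as np

rgb = np.array([0.9, 0.2, 0.1])   # a reddish pixel, channels in [0, 1] (assumed)
cmy = 1.0 - rgb                   # element-wise: C = 1-R, M = 1-G, Y = 1-B
print(cmy)                        # [0.1 0.8 0.9]
```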

--------------------------------------------------------------------

PROCESSING PIPELINE

IMAGE ENHANCEMENT IN THE SPATIAL DOMAIN

Take an input image of dimension M x N -> process it -> the output is a processed image of dimension M x N.

Spatial Domain Techniques: the spatial domain refers to the image plane itself; these techniques are based on direct manipulation of the pixels in an image, g(x,y) = T(f(x,y)).
● Point operations
● Histogram-based processing
● Mask processing

POINT OPERATIONS

Enhancement at any point in an image depends only on the gray level at that point. Goal: one of the basic problems in image processing is enhancement - process an image so that the result is more suitable than the original. Two basic point operations:
● Contrast Stretching Function
● Linear Negative Transformation

Contrast Stretching Function: an image enhancement technique that attempts to improve contrast by stretching a range of intensity values. [Produces an image with high contrast.]

Linear Negative Transformation: the negative of an image with gray levels in the range [0, L-1] is given by the transformation s = (L-1) - r. A sketch of both point operations follows.
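A minimal sketch (not from the notes) of both point operations on an assumed 8-bit toy image.

```python
# A minimal sketch (illustrative, not from the notes) of two point
# operations on an 8-bit image (L = 256).
import numpy as np

L = 256

def negative(img):
    # s = (L-1) - r : dark becomes bright and vice versa
    return (L - 1 - img.astype(np.int32)).astype(np.uint8)

def contrast_stretch(img):
    # linearly stretch the occupied range [min, max] out to [0, L-1]
    r_min, r_max = img.min(), img.max()
    out = (img.astype(np.float64) - r_min) * (L - 1) / (r_max - r_min)
    return np.clip(out, 0, L - 1).astype(np.uint8)

img = np.array([[100, 120], [140, 160]], dtype=np.uint8)  # assumed toy image
print(negative(img))          # [[155 135] [115  95]]
print(contrast_stretch(img))  # stretched to span 0..255
```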

The negative transformation enhances white or gray detail embedded in dark regions, especially when the black areas are dominant in size.

Limitations of Point Processing
• Point operations don't know anything about their neighbours.
• Most image features (edges, textures) involve a spatial neighbourhood of pixels.
• If we want to enhance or manipulate these features, we need to go beyond point operations.

HISTOGRAM BASED PROCESSING

Image enhancement by considering the image as a whole (a global operation).

Histogram: the histogram of a digital image with gray levels in the range [0, L-1] is the discrete function h(rk) = nk, where rk is the kth gray level and nk is the number of pixels in the image having gray level rk.

Histogram Equalization: an approach to enhance a given image. After equalization, the output has all gray level values in roughly equal proportion. With n = total number of pixels (if the image is 64 x 64, n = 4096):
rk - the kth pixel intensity
p(rk) = nk / n - its probability
CDF - the cumulative probability

sk = round(CDF(rk) * (L-1)) - the new gray level; the number of pixels at each new level is then checked against the old nk after rounding.

Demerits of Histogram Equalization:
- If the given histogram is very narrow, histogram equalization may not always produce a desirable result.
- It can produce false edges and regions.
- It can also increase image graininess and patchiness.

Histogram Matching (Specification): specifies particular histogram shapes capable of highlighting certain gray levels. [Modify one image based on the contrast of another.] Steps:
➢ Equalize A and B
➢ Map each pixel of A and B using the equalized histograms
➢ Modify each pixel of A based on B

Local Enhancement: enhance details over small areas in an image.

A sketch of histogram equalization follows.
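A minimal sketch (not from the notes) of histogram equalization following the steps above; the narrow-range toy image is assumed.

```python
# A minimal sketch (illustrative, not from the notes) of histogram
# equalization on an 8-bit grayscale image (L = 256).
import numpy as np

def equalize(img, L=256):
    nk = np.bincount(img.ravel(), minlength=L)     # histogram h(rk) = nk
    p = nk / img.size                              # p(rk) = nk / n
    cdf = np.cumsum(p)                             # cumulative probability
    sk = np.round(cdf * (L - 1)).astype(np.uint8)  # sk = round(CDF * (L-1))
    return sk[img]                                 # map each pixel rk -> sk

img = np.random.randint(100, 140, (64, 64), dtype=np.uint8)  # narrow range (assumed)
eq = equalize(img)
print(img.min(), img.max())   # e.g. 100 139
print(eq.min(), eq.max())     # stretched to cover nearly the full 0..255 range
```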

MASK / KERNEL PROCESSING

Instead of taking 1 pixel, we take the neighbourhood values:
N4(p) - 4 neighbours: (x+1,y), (x-1,y), (x,y+1), (x,y-1)
ND(p) - 4 diagonal neighbours: (x+1,y+1), (x+1,y-1), (x-1,y+1), (x-1,y-1)
N8(p) - 8 neighbours: N4(p) + ND(p)

SPATIAL DOMAIN FILTERS

The spatial filtering technique is applied directly to the pixels of an image. The mask is usually odd in size so that it has a specific center pixel. Two types:
Linear Spatial Filters: Mean, Wiener, Gaussian
Non-Linear Spatial Filters: Min, Max, Median

Non-Linear Spatial Filters:
● Operate on neighbourhoods.
● The filtering operation is based conditionally on the values of the pixels in the neighbourhood under consideration.
● They do not explicitly use coefficients in the sum-of-products manner described for linear filters.

Application:
● Noise reduction, e.g., compute the median gray value of the pixel neighbourhood.

Sharpening Spatial Filter: a technique for enhancing intensity transitions or highlighting fine details. {Enhances details that are blurred.}

Smoothing Spatial Filter:
● Used for image blurring and noise reduction.
● Blurring helps to remove small details from an image; used in the preprocessing stage.
● Helps to fill small gaps in lines or curves.
● Noise reduction is accomplished by blurring.
● Applied as a linear filter or a non-linear filter.

Smoothing Linear Filter: averaging filter or low pass filter.

Basic workflow:
● Replace the value of every pixel in the image by the average of the gray levels in the neighbourhood defined by the filter mask.
● Produces an image with reduced sharp transitions in gray level.
● Helps to remove irrelevant details from an image.

Limitations:
● The edges in the given image have sharp transitions in gray level.
● Performing image averaging can blur edges.
● A bigger mask size means more blurring.

A minimal averaging-filter sketch is shown below.
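A minimal sketch (not from the notes) of a 3x3 averaging (smoothing) filter; the toy image is assumed.

```python
# A minimal sketch (illustrative, not from the notes) of a 3x3 averaging
# (low pass / smoothing) filter applied in the spatial domain.
import numpy as np

def mean_filter(img, k=3):
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode='edge')
    out = np.empty_like(img, dtype=np.float64)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # the average of the k x k neighbourhood replaces the center pixel
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out.astype(img.dtype)

img = np.array([[10, 10, 10, 200],
                [10, 10, 10, 200],
                [10, 10, 10, 200]], dtype=np.uint8)   # assumed toy image
print(mean_filter(img))   # the sharp 10 -> 200 transition is smoothed out
```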

Ordered statistics filters are nonlinear spatial filters whose response is based on ordering (ranking) the pixels contained in the image area encompassed by the filter, and then replacing the value of the center pixel with the value determined by the ranking result. Examples (a median-filter sketch follows):

Median filter: eliminates isolated clusters of pixels that are light or dark with respect to their neighbours and whose area is less than n^2/2 (for an n x n mask). {Used to remove salt-and-pepper noise.}
    [10 125 135 141 141 144 230] -> median = 141
Max filter: {used to find the brightest points in an image; removes pepper (dark) noise}
    [10 125 125 135 141 141 144 230 240] -> max = 240
Min filter: {used to find the darkest point in an image; removes salt (bright) noise}
    [10 125 125 135 141 141 144 230 240] -> min = 10
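A minimal sketch (not from the notes) of a 3x3 median filter; the toy image with one salt pixel is assumed.

```python
# A minimal sketch (illustrative, not from the notes) of a 3x3 median
# filter, the classic remedy for salt-and-pepper noise.
import numpy as np

def median_filter(img, k=3):
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # rank the k x k neighbourhood and keep its median
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

img = np.array([[10, 10, 10],
                [10, 255, 10],    # an isolated "salt" pixel (assumed)
                [10, 10, 10]], dtype=np.uint8)
print(median_filter(img))         # the 255 outlier is replaced by 10
```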

Linear vs Non-Linear Operations: H is a linear operator if H(af + bg) = aH(f) + bH(g), where f and g are two images and a and b are two scalar values. For a linear operator, applying H to a weighted sum of two images gives a result identical to the weighted sum of applying H to each image separately.

CONVOLUTION AND CORRELATION

The process of moving a filter mask over the image and computing the sum of products at each location. The difference between convolution and correlation: convolution rotates the mask by 180 degrees before sliding it over the image.
● Linear spatial filters are examples of convolution.
● It is a neighbourhood operation in which the output pixel is the weighted sum of the neighbouring pixels.
● Correlation is used to measure the similarity between two signals.
● If the mask or kernel is symmetric, the results are identical.

A sketch comparing the two follows.
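A minimal sketch (not from the notes) showing that convolution equals correlation with the kernel rotated 180 degrees; the toy image and asymmetric kernel are assumed.

```python
# A minimal sketch (illustrative, not from the notes) of 2D correlation
# vs convolution (= correlation with the kernel rotated 180 degrees).
import numpy as np

def correlate2d(img, kernel):
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(np.float64), ((ph, ph), (pw, pw)))
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # sum of products between the kernel and the neighbourhood
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(img, kernel):
    # convolution = correlation with the mask rotated by 180 degrees
    return correlate2d(img, np.rot90(kernel, 2))

img = np.arange(9, dtype=np.float64).reshape(3, 3)           # assumed toy image
asym = np.array([[0., 1., 0.],
                 [0., 0., 0.],
                 [0., 0., 0.]])                              # asymmetric kernel
print(correlate2d(img, asym))   # differs from convolution...
print(convolve2d(img, asym))    # ...unless the kernel is symmetric
```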

IMAGE ENHANCEMENT IN THE FREQUENCY DOMAIN

Take the input image f(x,y) -> apply the Fourier transform -> modify the transform -> apply the inverse transform -> get the processed image g(x,y).

Frequency Domain Techniques: frequency domain processing techniques are based on modifying the Fourier transform of the image.
● Smoothing
● Sharpening

SMOOTHING FREQUENCY DOMAIN FILTERS

Ideal Low Pass Filter: removes high-frequency noise from a digital image and preserves the low-frequency components. It cuts off all high-frequency components of the Fourier transform that are at a distance greater than a specified distance D0 from the origin of the transform:

H(u,v) = 1 if D(u,v) ≤ D0
H(u,v) = 0 if D(u,v) > D0

where D(u,v) is the distance from the point (u,v) to the origin. If the image is of size M x N, the center of the frequency rectangle is at (u,v) = (M/2, N/2). A sketch follows.
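A minimal sketch (not from the notes) of the ideal low pass filter above, applied via NumPy's FFT; the input image and cutoff D0 are assumed.

```python
# A minimal sketch (illustrative, not from the notes) of an ideal low pass
# filter in the frequency domain: G(u,v) = H(u,v) * F(u,v).
import numpy as np

def ideal_lowpass(img, d0):
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))            # center the spectrum
    u = np.arange(h) - h // 2
    v = np.arange(w) - w // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)   # distance from center
    H = (D <= d0).astype(np.float64)                 # 1 inside D0, 0 outside
    G = H * F                                        # apply the filter
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

img = np.random.rand(64, 64)          # assumed toy image
smooth = ideal_lowpass(img, d0=10)    # keeps only the low frequencies
print(smooth.shape)
```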

Butterworth Low Pass Filter: does not have a sharp discontinuity; the filter has a smooth transition between low and high frequencies. A Butterworth LPF of order n with cutoff frequency at distance D0 is

H(u,v) = 1 / (1 + [D(u,v)/D0]^(2n))

Demerit: it can have a ringing effect (increasingly so for higher orders).

Gaussian Low Pass Filter:

H(u,v) = e^(-D^2(u,v) / (2 D0^2))

(replace sigma by D0, the cutoff frequency). Advantages: no ringing effect; fills gaps in letters. A low pass filter removes the high-frequency components from the image.

SHARPENING FREQUENCY DOMAIN FILTERS

Derivative and Sharpening Filters: image differentiation.
First order derivative: df/dx = f(x+1) - f(x)
Second order derivative: d^2f/dx^2 = f(x+1) + f(x-1) - 2f(x)

Properties of the derivatives:
First derivative: must be 0 in flat segments; must be non-zero along ramps; must be non-zero at the onset of a gray level step or ramp.
Second derivative: must be 0 in flat segments; must be 0 along ramps; must be non-zero at the onset and end of a gray level step or ramp.

First order vs second order: the first order derivative produces thicker edges; the second order derivative produces a double response at step changes in gray level. {The second derivative is better than the first for image enhancement; the principal use of the first derivative is edge detection.} A sketch of both derivatives follows.
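A minimal sketch (not from the notes) of the two derivatives on an assumed 1D gray level profile containing a ramp and a step.

```python
# A minimal sketch (illustrative, not from the notes) of first and second
# order derivatives of a 1D gray level profile.
import numpy as np

# assumed profile: flat, then a ramp (indices 2..5), then a step (index 8)
f = np.array([5, 5, 5, 10, 15, 20, 20, 20, 100, 100], dtype=np.float64)

first = f[1:] - f[:-1]                  # df/dx    = f(x+1) - f(x)
second = f[2:] + f[:-2] - 2 * f[1:-1]   # d2f/dx2  = f(x+1) + f(x-1) - 2f(x)

print(first)   # non-zero along the ramp and at the step
print(second)  # zero along the ramp, double (+/-) response at the step
```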

FREQUENCY DOMAIN FILTERS

Frequency domain filters are used for smoothing and sharpening an image by removing its high or low frequency components.
● Frequency domain filters differ from spatial domain filters in that they operate on the frequency content of the image.
● They are used for two basic operations: smoothing and sharpening.

Smoothing Linear Filter (Low Pass Filter / Averaging): a low pass filter removes the high frequency components, i.e., it keeps the low frequency components. It smooths the image by attenuating high frequency components and preserving low frequency components. The mechanism of low pass filtering in the frequency domain is given by

G(u,v) = H(u,v) * F(u,v)

where F(u,v) is the Fourier transform of the original image and H(u,v) is the Fourier transform of the filter mask. The basic workflow and limitations are the same as for the spatial averaging filter described earlier: neighbourhood averaging reduces sharp gray level transitions and removes irrelevant detail, but it blurs edges, and a bigger mask means more blurring.

High Pass Filter: a high pass filter removes the low frequency components, i.e., it keeps the high frequency components. It is used to sharpen the image by attenuating low frequency components and preserving high frequency components. The mechanism of high pass filtering in the frequency domain is given by

H(u,v) = 1 - H'(u,v)

where H(u,v) is the high pass filter transfer function and H'(u,v) is the corresponding low pass filter transfer function. A sketch follows.
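A minimal sketch (not from the notes) of the complement relation above; a Gaussian low pass is assumed for H'(u,v).

```python
# A minimal sketch (illustrative, not from the notes) of a high pass filter
# built from a low pass one: H_hp(u,v) = 1 - H_lp(u,v).
import numpy as np

def highpass(img, d0):
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    u = np.arange(h) - h // 2
    v = np.arange(w) - w // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    H_lp = np.exp(-(D ** 2) / (2 * d0 ** 2))   # Gaussian low pass (assumed)
    H_hp = 1.0 - H_lp                          # complementary high pass
    G = H_hp * F
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

edges = highpass(np.random.rand(64, 64), d0=10)   # assumed toy image
print(edges.shape)   # only high-frequency detail (edges, noise) remains
```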

Band Pass Filter: a band pass filter removes the very low and very high frequency components, i.e., it keeps a moderate band of frequencies. Band pass filtering is used to enhance edges while reducing noise at the same time.

--------------------------------------------------------------------

SIFT

Scale Invariant Feature Transform (SIFT) is a feature detection algorithm in computer vision used to detect and describe local features in images. Applications: ob...

