Image Representation

ME5411 Robotics Vision and AI

Last updated on Oct 17, 2023. Course Notes.

This note is for revision, so I skip the narrative and just write down the key points.

1. Image Representation

Continuous Image Function

f(x,y) or f(x,y,t) for video

The value is an intensity, determined by illumination and reflectance. In other sensing setups it can instead represent temperature, pressure, distance, etc.

Image Digitization

Sampling: discretize the continuous coordinates x, y
Quantization: discretize the intensity values
the result can be represented as a matrix
each element of this matrix is called a pixel
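
The two steps above can be sketched in a few lines of Python. This is only an illustration: the continuous image f, the grid size, and the helper name digitize are all made up for the example and are not from the course.

```python
import math

def f(x, y):
    """Hypothetical continuous image with values in [0, 1]."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)

def digitize(func, width, height, bits):
    """Sample func on a width x height grid over [0, 1) x [0, 1),
    then quantize each sample to one of 2**bits integer levels."""
    levels = 2 ** bits
    image = []
    for row in range(height):            # row index follows y
        y = row / height
        line = []
        for col in range(width):         # column index follows x
            x = col / width
            value = func(x, y)                          # sampling
            level = min(int(value * levels), levels - 1)  # quantization
            line.append(level)
        image.append(line)
    return image                          # a matrix: list of rows of pixels

image = digitize(f, width=8, height=4, bits=8)
print(len(image), "rows x", len(image[0]), "columns")
```

Each entry of image is one pixel: an integer level at a fixed grid position.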

Pixel

smallest unit
point sample
has a position
color capability: number of bits per pixel, or bit depth (e.g. 1-bit for black and white, 8-bit for 256 colors, 24-bit for about 16.7M colors)

2. Image Sampling and Quantization

Sampling

sampling onto a discrete grid distorts shapes (for example, smooth edges become jagged at low resolution)
spatial resolution (image resolution): pixels per unit length (pixels per inch, PPI)
raster dimension (image dimension): the number of pixels in width and height (width x height)
p(i,j): i is the column, j is the row in the matrix. In code the row comes first: image[y, x] in Python (NumPy, 0-based) and image(y, x) in MATLAB (1-based).
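
A small sketch of the row-first convention, using a plain list of rows so it runs without NumPy (with NumPy the same access is written image[y, x]). The helper name pixel is hypothetical.

```python
# A 2-row by 3-column image stored as a list of rows.
image = [
    [10, 20, 30],
    [40, 50, 60],
]
height = len(image)       # number of rows    (y direction)
width = len(image[0])     # number of columns (x direction)

def pixel(img, x, y):
    """Return the pixel at column x, row y (0-based)."""
    return img[y][x]      # row index first, then column index

print(pixel(image, x=2, y=0))
```

Note that the raster dimension is quoted as width x height, while the matrix shape is rows x columns, i.e. height x width.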

Quantization

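bit-depth: n bits give K = 2^n levels
index color: 8 bits per pixel store an index into a table of 256 colors; convenient for hardware, which can implement the table as a lookup table
the levels should be fine enough that the human eye cannot see the steps between them (humans distinguish about 60 gray levels, so 8 bits, i.e. 256 levels, is enough for grayscale)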
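
To make K = 2^n concrete, here is a minimal sketch that reduces an 8-bit intensity to n bits. The function names are hypothetical, and the code assumes 1 <= n <= 8.

```python
def num_levels(n_bits):
    """Number of distinct levels K for a given bit depth."""
    return 2 ** n_bits

def requantize(value, n_bits):
    """Reduce an 8-bit intensity (0..255) to n_bits of precision.
    The result stays on the 0..255 scale so it can still be displayed."""
    step = 256 // num_levels(n_bits)   # width of one quantization bin
    return (value // step) * step      # lower edge of the bin

print(num_levels(8), requantize(200, 1), requantize(77, 2))
```

With very few bits the steps between levels become visible as false contours, which is why the number of levels has to stay above what the eye can distinguish.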

3. Digital Image Properties

Metric and Topology

Set
Metric Space: a set with a distance function
Topological Space: a mathematical structure that allows formal definition of concepts such as convergence, connectedness, continuity, etc.
connectivity of a set: a set is connected if it cannot be divided into two disjoint nonempty open sets

distances:

Euclidean distance
City Block distance (Manhattan distance)
Chessboard distance
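
For two pixels p = (x1, y1) and q = (x2, y2), the three distances are sqrt((x1 - x2)^2 + (y1 - y2)^2), |x1 - x2| + |y1 - y2|, and max(|x1 - x2|, |y1 - y2|) respectively. A minimal sketch, with function names chosen for this example only:

```python
import math

def euclidean(p, q):
    """Straight-line distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def city_block(p, q):
    """Manhattan distance: only horizontal and vertical steps allowed."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):
    """Chessboard distance: diagonal steps cost the same as straight ones."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), city_block(p, q), chessboard(p, q))
```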

Typical Connectivity:

4-connectivity
8-connectivity
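
The two neighborhoods can be written as offset lists: 4-connectivity uses the pixels at city block distance 1, and 8-connectivity uses the pixels at chessboard distance 1. A small sketch (the helper neighbors is hypothetical):

```python
# Offsets (dx, dy) of the neighbors of a pixel.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(x, y, width, height, connectivity=4):
    """Return the in-bounds neighbors of pixel (x, y)."""
    offsets = N4 if connectivity == 4 else N8
    result = []
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:   # drop neighbors outside the image
            result.append((nx, ny))
    return result

print(neighbors(1, 1, 3, 3, connectivity=4))
print(neighbors(0, 0, 3, 3, connectivity=8))
```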

Histogram

a graph showing count of intensity values
can detect
- low contrast / high contrast
- brightness / overexposure / underexposure
more than one image can have the same histogram
can remove background by thresholding
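invariant to translation and rotation of the image content; scaling changes the raw counts, so only the normalized histogram stays approximately the same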
does not contain spatial information (shape)
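
A minimal sketch of these points for an 8-bit grayscale image stored as a list of rows. The two tiny test images and the threshold value are made up for the example.

```python
def histogram(img, levels=256):
    """Count how many pixels take each intensity value."""
    counts = [0] * levels
    for row in img:
        for value in row:
            counts[value] += 1
    return counts

def threshold(img, t):
    """Binary mask: 1 where the pixel is brighter than t, else 0."""
    return [[1 if value > t else 0 for value in row] for row in img]

# Two different images with exactly the same histogram.
a = [[0, 0], [255, 255]]      # dark top row, bright bottom row
b = [[0, 255], [255, 0]]      # checkerboard

print(histogram(a) == histogram(b))   # same counts, different shapes
print(threshold(a, 127))
```

The histograms of a and b match even though the shapes differ, which is what losing spatial information means in practice.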

Visual Perception

Contrast
- local change in brightness
- for humans, it is what separates an object from the background
Acuity
- the ability to detect detail
- the resolution of eye
Perceptual grouping

Example code in MATLAB:

% read image
lenna = imread('lenna_(test_image).png');

% split image into 3 channels and show them in a 2x2 grid
lenna_r = lenna(:,:,1);
lenna_g = lenna(:,:,2);
lenna_b = lenna(:,:,3);

subplot(2,2,1);
imshow(lenna_r);
title('Red Channel');

subplot(2,2,2);
imshow(lenna_g);
title('Green Channel');

subplot(2,2,3);
imshow(lenna_b);
title('Blue Channel');

subplot(2,2,4);
imshow(lenna);
title('Original Image');

% smooth with a kxk Gaussian kernel
% (imfilter correlates by default; for a symmetric kernel this equals convolution)
k = 5;
sigma = 1;
kernel = fspecial('gaussian', k, sigma);
kernel % print kernel
lenna_conved = imfilter(lenna, kernel);
figure;
imshow(lenna_conved);
title('Convolved Image');

% visualize the kernel in 3D plot
figure;
surf(kernel); % surf is a 3D plot function, it needs a 2D matrix as input
title('Gaussian Kernel');

% Simple CNN
% show GPU info only if a GPU is present (gpuDevice errors otherwise)
if gpuDeviceCount > 0
    disp(gpuDevice());
end

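% spatial size x channels: 64x3 -> 32x16 -> 16x32 -> 8x64 -> 4x128 -> 2x256 -> 1 x numClasses
% fully convolutional network (no fully connected layers)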
net = [ ...
    imageInputLayer([64 64 3])
    convolution2dLayer(3, 16, 'Padding', 1, 'Stride', 2)  % 32x32x16
    batchNormalizationLayer
    leakyReluLayer(0.2)
    convolution2dLayer(3, 32, 'Padding', 1, 'Stride', 2)  % 16x16x32
    batchNormalizationLayer
    leakyReluLayer(0.2)
    convolution2dLayer(3, 64, 'Padding', 1, 'Stride', 2)  % 8x8x64
    batchNormalizationLayer
    leakyReluLayer(0.2)
    convolution2dLayer(3, 128, 'Padding', 1, 'Stride', 2) % 4x4x128
    batchNormalizationLayer
    leakyReluLayer(0.2)
    convolution2dLayer(3, 256, 'Padding', 1, 'Stride', 2) % 2x2x256
    batchNormalizationLayer
    leakyReluLayer(0.2)
    convolution2dLayer(2, 3, 'Padding', 0, 'Stride', 2)   % 1x1x3; the filter count (3) must equal the number of classes in the dataset
    softmaxLayer
    classificationLayer
];

% read in data
imds = imageDatastore('ssb', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% split into training and validation
[trainingImds, validationImds] = splitEachLabel(imds, 0.8, 'randomized');

% preprocess into 64x64
inputSize = [64 64 3];
augmentedTrainingImds = augmentedImageDatastore(inputSize, trainingImds);
augmentedValidationImds = augmentedImageDatastore(inputSize, validationImds);

% define training options
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.001, ...
    'MaxEpochs', 20, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', augmentedValidationImds, ...
    'ValidationFrequency', 30, ...
    'Verbose', false, ...
    'Plots', 'training-progress', ...
    'ExecutionEnvironment', 'auto');

% train network
trainedNet = trainNetwork(augmentedTrainingImds, net, options);

Index color

Each color is mapped to an index, and the index is stored in place of the color, e.g. 0 -> red, 1 -> green, 2 -> blue, 3 -> (0, 3, 6) … This suits hardware: a lookup table saves memory and speeds up processing.
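
A small sketch of the lookup, using the mapping from the example above as the palette. The names palette, indexed and to_rgb are hypothetical.

```python
# Lookup table (palette): index -> (R, G, B)
palette = [
    (255, 0, 0),   # 0 -> red
    (0, 255, 0),   # 1 -> green
    (0, 0, 255),   # 2 -> blue
    (0, 3, 6),     # 3 -> an arbitrary color
]

# The image stores one small index per pixel instead of three color values.
indexed = [
    [0, 1],
    [2, 3],
]

def to_rgb(indexed_img, pal):
    """Expand an indexed image to RGB by looking each index up in the palette."""
    return [[pal[i] for i in row] for row in indexed_img]

print(to_rgb(indexed, palette))
```

With a 256-entry palette each pixel needs 1 byte instead of the 3 bytes of 24-bit color, at the cost of storing the palette once.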