Object Detection with Python
Object detection is the process of finding instances of real-world objects in still images or videos. When we look at an image, our brain instantly finds the objects it contains. A machine, by contrast, needs a lot of training data and time to learn to find object instances. Recent advances in deep learning have made this area of computer vision much more approachable, as the R-CNN example below shows.
clear

% Load the ground truth data
load('DaataAndCars.mat')
load('rcnnStopSigns.mat')

% Update the path to the image files to match the local file system
DaataAndCars.imageFilename = fullfile('carData/', DaataAndCars.imageFilename);
% Display a summary of the ground truth data
summary(DaataAndCars)

Variables:
    imageFilename: 41×1 cell array of character vectors
    stopSign: 41×1 cell
    carRear: 41×1 cell
    carFront: 41×1 cell

% Only keep the image file names and the car rear ROI labels
% (the table is named carFront, but it holds the carRear column)
carFront = DaataAndCars(:, {'imageFilename', 'carRear'});

% Display one training image and the ground truth bounding boxes
I = imread(carFront.imageFilename{1});
I = insertObjectAnnotation(I, 'Rectangle', carFront.carRear{1}, ...
    ' Car Rear', ...
    'LineWidth', 24);
figure
imshow(I)

% A trained detector is loaded from disk to save time when running the
% example. Set this flag to true to train the detector.

% Set training options
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 128, ...
    'InitialLearnRate', 1e-3, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropFactor', 0.1, ...
    'LearnRateDropPeriod', 100, ...
    'MaxEpochs', 100, ...
    'Verbose', true);
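To make the 'piecewise' schedule settings concrete, here is a minimal Python sketch of a step schedule. It is an illustration, not the toolbox implementation: the function name `piecewise_learn_rate` is hypothetical, and it assumes the rate is multiplied by the drop factor once for every drop period of epochs that has fully completed.

```python
def piecewise_learn_rate(epoch, initial_rate=1e-3, drop_factor=0.1, drop_period=100):
    """Learning rate for a 1-based epoch under a piecewise (step) schedule.

    Assumption (not taken from the toolbox source): the rate is multiplied
    by drop_factor once for every drop_period epochs that have completed.
    """
    if epoch < 1:
        raise ValueError("epoch is 1-based")
    completed_periods = (epoch - 1) // drop_period
    return initial_rate * drop_factor ** completed_periods


# Defaults mirror the training options in the listing:
# InitialLearnRate 1e-3, LearnRateDropFactor 0.1, LearnRateDropPeriod 100.
for epoch in (1, 50, 100, 101):
    print(epoch, piecewise_learn_rate(epoch))
```

Under this assumption, a drop period of 100 combined with a maximum of 100 epochs means the rate never drops during the run, which would agree with the constant 0.0010 in the Base Learning Rate column of the training log.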
% Train an R-CNN object detector. This will take several minutes.
rcnn = trainRCNNObjectDetector(carFront, cifar10Net, options, ...
    'NegativeOverlapRange', [0 0.3], 'PositiveOverlapRange', [0.5 1])
*******************************************************************
Training an R-CNN Object Detector for the following object classes:

* carRear

--> Extracting region proposals from 41 training images...done.

--> Training a neural network to classify objects in training data...

Training on single GPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |       62.50% |       0.6585 |          0.0010 |
|      25 |          50 |       00:00:15 |      100.00% |       0.0018 |          0.0010 |
|      50 |         100 |       00:00:31 |      100.00% |       0.0008 |          0.0010 |
|      75 |         150 |       00:00:46 |      100.00% |       0.0014 |          0.0010 |
|     100 |         200 |       00:01:02 |      100.00% |       0.0001 |          0.0010 |
|========================================================================================|
Network training complete.

--> Training bounding box regression models for each object class...100.00%...done.

Detector training complete (with warnings):
*******************************************************************

rcnn =
  rcnnObjectDetector with properties:

               Network: [1×1 SeriesNetwork]
     RegionProposalFcn: @rcnnObjectDetector.proposeRegions
            ClassNames: {'carRear'  'Background'}
    BoxRegressionLayer: 'conv_2'
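The 'NegativeOverlapRange' and 'PositiveOverlapRange' options passed to trainRCNNObjectDetector decide which region proposals become training samples. The Python sketch below shows one plausible reading of that rule. It assumes overlap is measured as intersection over union (IoU) on [x y width height] boxes; the helper names `iou_xywh` and `label_proposal` are hypothetical, and the toolbox may differ in details such as pixel conventions.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, ground_truth_boxes,
                   negative_range=(0.0, 0.3), positive_range=(0.5, 1.0)):
    """Label one region proposal by its best IoU with any ground truth box.

    Assumption: proposals whose best IoU falls between the two ranges
    are not used for training at all ('ignored').
    """
    best = max((iou_xywh(proposal, gt) for gt in ground_truth_boxes), default=0.0)
    if positive_range[0] <= best <= positive_range[1]:
        return "positive"
    if negative_range[0] <= best <= negative_range[1]:
        return "negative"
    return "ignored"


# Made-up boxes for illustration only (not taken from the data set).
ground_truth = [[100, 100, 200, 100]]
for proposal in ([120, 110, 200, 100], [200, 100, 200, 100], [250, 100, 200, 100]):
    print(proposal, label_proposal(proposal, ground_truth))
```

Leaving a gap between the two ranges (0.3 to 0.5 here) keeps ambiguous, partially overlapping proposals out of both the positive and the negative sets.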
% Read test image
testImage = imread('image003.jpg');

% Detect car rears
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

bboxes = 2×4
        1016         427         609         378
         782         363         406         407

score = 2×1 single column vector
    0.9983
    0.5087

label = 2×1 categorical array
    carRear
% Display the detection results
[score, idx] = max(score);
bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);
outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);
figure
imshow(outputImage)
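The last step keeps only the highest-scoring detection and builds its label text. The same selection can be sketched in plain Python using the box and score values printed above. The function name `best_detection` is hypothetical, and the second label is assumed to be carRear because the printed output shows only one entry.

```python
def best_detection(bboxes, scores, labels):
    """Return (bbox, score, label) of the highest-scoring detection."""
    if not scores:
        raise ValueError("no detections")
    idx = max(range(len(scores)), key=scores.__getitem__)
    return bboxes[idx], scores[idx], labels[idx]


# Values copied from the detect() output; boxes are [x, y, width, height].
bboxes = [[1016, 427, 609, 378], [782, 363, 406, 407]]
scores = [0.9983, 0.5087]
labels = ["carRear", "carRear"]  # second label is an assumption

bbox, score, label = best_detection(bboxes, scores, labels)
annotation = "%s: (Confidence = %f)" % (label, score)
print(bbox, annotation)
```

Keeping a single box is reasonable when the image is expected to contain one object of interest. For images with several cars, filtering by a score threshold would be a more suitable choice than taking the maximum.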