Social Distancing Monitoring Using Deep Learning
Modified Problem Statement
In light of the ongoing Covid-19 pandemic:
To identify the distance between persons in pre-recorded video and warn when the distance is less than the minimum safe distance set by social distancing norms.
To identify the distance between persons in real-time video and warn when the distance is less than the minimum safe distance set by social distancing norms.
The ongoing Covid-19 pandemic has made it a major challenge for large businesses to keep employees at a safe distance from one another, so that everyone stays healthy and work continues uninterrupted. This project and its modification are an attempt to help companies operate safely by monitoring the workplace through video and checking that employees follow social distancing, so that none of them is affected by Covid-19 while working. Video analysis is the domain of processing video to detect and use its spatial and temporal events and features. Python, with its image/video processing and artificial intelligence libraries, is used to perform the tasks.
Possible Approach
(general solution to problem statement)
Here I briefly explain the general approach that I consider best and how to achieve it.
It can be taken as a hypothesis, or as the best possible method.
Camera Calibration
Today's cheap pinhole cameras introduce a lot of distortion into images. The two major distortions are radial distortion and tangential distortion. Due to radial distortion, straight lines appear curved, and the effect grows with distance from the centre of the image.
For instance, in a picture of a chessboard where two edges are marked with red lines, the border of the board is not a straight line and does not match the red line. All the expected straight lines bulge out.
Radial distortion is corrected with a polynomial model in the distance of each point from the image centre.
The other distortion, tangential distortion, occurs because the lens is not aligned perfectly parallel to the imaging plane, so some areas of the image may look nearer than expected. It is corrected with two further coefficients.
In short, five parameters need to be found, known as the distortion coefficients.
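For reference, the standard distortion model documented for OpenCV can be written as below. This is the textbook form, given here on the assumption that it matches the formulas this section refers to. The point (x, y) is an undistorted normalised image point and r^2 = x^2 + y^2.

```latex
\begin{aligned}
\text{radial:}\quad x_{\text{distorted}} &= x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{\text{distorted}} &= y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
\text{tangential:}\quad x_{\text{distorted}} &= x + \left[\,2 p_1 x y + p_2 (r^2 + 2x^2)\,\right] \\
y_{\text{distorted}} &= y + \left[\,p_1 (r^2 + 2y^2) + 2 p_2 x y\,\right] \\
\text{distortion coefficients} &= (k_1,\ k_2,\ p_1,\ p_2,\ k_3)
\end{aligned}
```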
In addition, some more information is needed: the intrinsic and extrinsic parameters of the camera. Intrinsic parameters are specific to a camera and include information such as the focal length.
Extrinsic parameters correspond to the rotation and translation vectors that translate the coordinates of a 3D point into a coordinate system.
With OpenCV's calibration.py module the camera can be calibrated to correct all the distortion and to identify all the parameters (intrinsic as well as extrinsic).
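To make the role of the five distortion coefficients concrete, the sketch below applies the standard combined radial and tangential model to a single normalised image point. The function name `distort_point` and the sample coefficient values are illustrative assumptions, not values from the project.

```python
def distort_point(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial and tangential distortion to a normalised image point."""
    r2 = x * x + y * y  # squared distance from the image centre
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


# With all coefficients at zero the point is unchanged; a positive k1
# pushes points outward, more strongly the further they are from the centre.
print(distort_point(0.5, 0.25))
print(distort_point(1.0, 0.0, k1=0.1))
```

In practice the coefficients are not chosen by hand: `cv2.calibrateCamera` estimates them from several chessboard images, and `cv2.undistort` then removes the distortion from a frame.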
Streaming the real time/pre recorded video
A video is a sequence of fast-moving pictures. The obvious question that follows is how fast the pictures are moving.
The measure of how fast the pictures advance is called frames per second (FPS). When someone says that a video has an FPS of 40, it means that 40 pictures are displayed every second.
In OpenCV, a video is read either from the feed of a camera connected to the computer or from a video file, using a VideoCapture object.
Converting video into image frames
Each video is converted into image frames. Total number of frames = FPS × duration of the video in seconds.
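The frame arithmetic and the reading loop can be sketched as follows. The helper names `total_frames` and `read_frames` are hypothetical. OpenCV is imported inside `read_frames` only so that the arithmetic helper can be tried without it.

```python
def total_frames(fps, duration_seconds):
    """Number of frames in a clip: frames per second times duration."""
    return int(round(fps * duration_seconds))


def read_frames(source):
    """Yield frames from a video file path or a camera index (e.g. 0)."""
    import cv2  # imported here so total_frames works without OpenCV

    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file or camera failure
                break
            yield frame
    finally:
        capture.release()


print(total_frames(40, 10))  # 40 FPS for 10 seconds -> 400
```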
Person Identification and Segmentation on the frames
Person identification is a sub-problem of the object identification and segmentation problem.
According to the TensorFlow tutorial:
The task of image segmentation is to train a neural network to output a pixel-wise mask of the image. This helps in understanding the image at a much lower level, i.e., the pixel level. Image segmentation has many applications in medical imaging, self-driving cars and satellite imaging to name a few.
The most popular and accurate image segmentation methods are:
Image Source : Analytics Vidhya
I used neural-network-based object detection and image segmentation techniques. The candidate techniques were:
- SSD
- R-CNN (R-FCN comes under this)
- YOLO (versions 1, 2, 3 and tiny YOLO)
- Mask R-CNN (built on Faster R-CNN)
Image Source : My Project
Figure: object detection and image segmentation.
Calculating distance between persons
Given a formula for converting pixels to real-world units, the Euclidean distance between the persons in an image frame can be calculated between the centres of each person. Lines can be drawn with OpenCV to indicate the distance between persons.
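A minimal sketch of this calculation is shown below. It assumes bounding boxes given as (x1, y1, x2, y2) in pixels and a single constant scale `metres_per_pixel`, which is only a reasonable approximation for a roughly overhead camera. All names are illustrative.

```python
import math


def box_center(box):
    """Centre of a bounding box given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def pixel_distance(box_a, box_b):
    """Euclidean distance in pixels between the centres of two boxes."""
    ax, ay = box_center(box_a)
    bx, by = box_center(box_b)
    return math.hypot(ax - bx, ay - by)


def real_distance(box_a, box_b, metres_per_pixel):
    """Distance in metres, assuming one constant pixel-to-metre scale."""
    return pixel_distance(box_a, box_b) * metres_per_pixel
```

For example, two boxes whose centres are 50 pixels apart at a scale of 0.02 metres per pixel are 1 metre apart, which would fall below a 2 metre threshold.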
Image Source : My Project
Warning if the distance is less than a particular distance, by plotting lines or showing bounding boxes
Image Source : My Project
Showing the frames in real time on screen
This can be done with OpenCV's imshow function, by looping over all the processed frames and showing them at the best possible FPS, which gives a real-time social distance detector.
The video can also be recorded and stored on disk for future reference using OpenCV's VideoWriter object.
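The display-and-record loop might look like the sketch below. The function names, the window title and the MJPG codec are assumptions for illustration. OpenCV is imported inside `show_and_save` so that the timing helper can be used on its own.

```python
def frame_delay_ms(fps):
    """Delay to pass to cv2.waitKey so frames appear at roughly `fps`."""
    return max(1, int(round(1000.0 / fps)))


def show_and_save(frames, out_path, fps, frame_size):
    """Display processed frames and write them to `out_path`.

    `frame_size` is (width, height) and must match the frames.
    """
    import cv2  # imported lazily; only needed when frames are shown

    fourcc = cv2.VideoWriter_fourcc(*"MJPG")
    writer = cv2.VideoWriter(out_path, fourcc, fps, frame_size)
    try:
        for frame in frames:
            cv2.imshow("Social distancing monitor", frame)
            writer.write(frame)
            # Press "q" to stop early.
            if cv2.waitKey(frame_delay_ms(fps)) & 0xFF == ord("q"):
                break
    finally:
        writer.release()
        cv2.destroyAllWindows()
```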
Detailed Explanation of Approach Taken
Using Mask RCNN object segmentation for analysis on pre-recorded video
Hardware specifications :
The project was done in a Colab notebook.
For seamless operation, an Intel i7 processor with a GPU above the 1050 Ti is recommended. An AMD CPU and GPU of the same capability can also be used.
With these specifications the model can work seamlessly not only on pre-recorded video but also in real time.
Software necessary :
OpenCV (3.2.0),
Python (3.7.3),
TensorFlow (tensorflow 2.1.0, tensorflow-estimator 2.1.0),
keras, numpy, skimage, matplotlib.pyplot
git clone the Mask RCNN repository to get all the necessary packages.
Method :
Import all the necessary modules and packages from OpenCV and Python, as well as from the cloned Mask RCNN repository.
The Mask RCNN model pre-trained on the COCO dataset is used. This pre-trained model covers the COCO object categories (80 classes, with label IDs running up to 90), for which it provides masks and bounding boxes.
Some examples are person, bicycle, car, motorcycle, bus, train, umbrella, bag, etc.
The pre-trained weights are loaded into the model.
Results are visualised on a particular frame.
Image Source : My Project
Image Source : My Project
This is how different objects in a video frame are identified.
- The video is opened with OpenCV's VideoCapture.
- Each frame of the video is read in turn, using the read function of cv2.
- The pre-trained Mask RCNN model provides masks and bounding boxes for each object.
- A distance function calculates the Euclidean distance between the centres of the bounding boxes, which gives the distance between every pair of masked objects.
- Using the COCO labels and IDs, I identify the persons and calculate the distance between every pair of persons in the video frame.
- If the distance is less than the pixel distance required by social distancing norms, lines are drawn between the persons as a warning.
Image Source : My Project
Image Source : My Project
This is done on pre-recorded video, and the new annotated video is shown on screen and/or saved as an edited video at the specified path.
All the capture and writer objects are then released and the cv2 windows destroyed.
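The step that picks out persons from the model output can be sketched as follows. The result layout is an assumption modelled on a common Mask RCNN implementation: a dictionary with "rois" rows as (y1, x1, y2, x2) and a parallel "class_ids" list in which "person" is class 1. Adjust both to whatever the cloned repository actually returns.

```python
PERSON_CLASS_ID = 1  # assumption: "person" is class 1 in the label list


def person_centers(result):
    """Centres (x, y) of every detection labelled as a person.

    `result` is assumed to hold "rois" as rows of (y1, x1, y2, x2)
    and "class_ids" as a parallel list of integer labels.
    """
    centers = []
    for (y1, x1, y2, x2), class_id in zip(result["rois"], result["class_ids"]):
        if class_id == PERSON_CLASS_ID:
            centers.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
    return centers


# Hand-made example: two persons and one non-person object.
example = {
    "rois": [(0, 0, 10, 10), (0, 20, 10, 30), (0, 40, 10, 50)],
    "class_ids": [1, 3, 1],
}
print(person_centers(example))
```

The centres of any two persons that are too close can then be joined with `cv2.line`, after converting the coordinates to integers.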
Strength of the approach :
- I can see who violated the specified distance and record the video.
- On hardware with good specifications this can also be done in real time.
- Most importantly, it can identify any number of persons and has very high accuracy, around 99 percent.
Uses :
Can be used in very crowded rooms, airports and stations to identify social distancing violations. Can also be used in power plants, companies and other industrial workplaces, because the accuracy of detecting a person is very high.
Problems of the model :
Needs a high-end CPU and GPU for processing.
Speed and accuracy are somewhat inversely related, so to get such good accuracy I had to sacrifice speed.
The camera needs to be calibrated for a bird's-eye view and the distance adjusted accordingly.
Solution :
Use a model that is somewhat less accurate, just able to identify persons, but usable in real time even on lower-specification hardware.
Use CUDA, multithreading and multiprocessing to make use of the available resources and TensorFlow's built-in capabilities to get speed and accuracy together.
Detailed Explanation of Approach
In this approach I have lowered the accuracy to gain speed.
Using YOLOv3-tiny object detection for analysis on real-time video:
Hardware specifications :
I was able to perform real-time social distance detection on an i5 CPU with no dedicated graphics.
An Android phone camera, linked through a server, was used for video capture.
Software specifications:
The software uses OpenCV's Darknet support, which I used on a Linux-based operating system, so any Linux-based OS will work.
The Android phone needs a camera server app; any other server-based webcam can also be used.
The weights of the pre-trained YOLOv3-tiny model were downloaded and used. The COCO names label file and the YOLOv3-tiny configuration file also need to be downloaded.
OpenCV (3.2.0),
Python (3.7.3), imutils
Method :
- Import all the libraries and load the configuration files and weights.
- The video captured by the Android phone's camera is streamed through the server and processed in real time, using cv2's VideoCapture and its read function to read each video frame.
- The dimensions of the input video are quite large, so I resize each frame while maintaining the aspect ratio.
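The aspect-ratio-preserving resize comes down to one scale factor, as in this sketch. The helper name `resized_dims` and the target width of 700 pixels are assumptions for illustration.

```python
def resized_dims(width, height, target_width):
    """New (width, height) after scaling to `target_width`, keeping aspect ratio."""
    scale = target_width / float(width)
    return target_width, int(round(height * scale))


print(resized_dims(1280, 720, 640))   # -> (640, 360)
print(resized_dims(1920, 1080, 700))  # -> (700, 394)
```

The result can be passed to `cv2.resize(frame, (new_width, new_height))`. `imutils.resize(frame, width=700)` performs the same kind of resize in one call.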
The person detection step takes the following inputs:
frame: The frame from the video file or directly from the webcam / Android camera
net: The pre-initialized and pre-trained YOLO object detection model
ln: The YOLOv3-tiny CNN output layer names
personIdx: The YOLOv3-tiny model can detect many types of objects; this index is specifically for the person class, as I won't be considering other objects
The corresponding result on the frame, after object detection and classification, is then stored.
The results consist of
- the person prediction probability,
- bounding box coordinates for the detection, and
- the centroid of the object.
Given the frame, it is now time to perform inference with YOLOv3-tiny.
Pre-processing the frame requires constructing a blob. From there, I am able to perform object detection with YOLOv3-tiny and OpenCV.
I then compute the bounding box coordinates and the centre (i.e., centroid) of each bounding box, and use them to derive the top-left coordinates of the box.
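A hedged sketch of these two steps follows. It assumes the usual YOLO output row layout, [cx, cy, w, h, objectness, class scores...], with box values normalised to the 0..1 range, a 416x416 network input, and person as class index 0. The function names are hypothetical.

```python
def run_yolo(frame, net, ln):
    """Build a blob from `frame` and run a forward pass through `net`."""
    import cv2  # imported lazily so decode_detection works without OpenCV

    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(ln)


def decode_detection(detection, frame_width, frame_height, person_idx=0):
    """Turn one raw output row into (confidence, box, centroid), or None.

    The box is (x, y, w, h) with (x, y) the top-left corner in pixels.
    Rows whose best-scoring class is not a person are rejected.
    """
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    if class_id != person_idx:
        return None
    cx = detection[0] * frame_width
    cy = detection[1] * frame_height
    w = detection[2] * frame_width
    h = detection[3] * frame_height
    # Top-left corner derived from the centre and the box size.
    x = int(cx - w / 2)
    y = int(cy - h / 2)
    return float(scores[class_id]), (x, y, int(w), int(h)), (int(cx), int(cy))
```

A confidence threshold would normally be applied to the returned score before the box is kept.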
Next, I apply non-maxima suppression. The purpose of non-maxima suppression is to suppress overlapping bounding boxes.
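To show what the suppression does, here is a small pure-Python greedy version. It is an illustrative sketch, not the routine used in the project; OpenCV provides `cv2.dnn.NMSBoxes` for the same purpose. Boxes are assumed to be (x, y, w, h), and the 0.3 overlap threshold is an assumed value.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, confidences, iou_threshold=0.3):
    """Return indices of the boxes kept, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a stronger kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```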
Assuming NMS yields at least one detection, I loop over the detections, extract the bounding box coordinates, and update the results list, which consists of the:
- Confidence of each person detection
- Bounding box of each person
- Centroid of each person
Assuming that at least two people were detected in the frame, I proceed to:
- Compute the Euclidean distance between all pairs of centroids
- Loop over the upper triangle of the distance matrix (since the matrix is symmetric)
- Check whether the distance violates the minimum social distance set by public health professionals. If two people are too close, I add them to the violate set.
So in each frame, every person who violated the social distance is added to the violate set.
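The pairwise check can be sketched without a full distance matrix by visiting only the pairs in the upper triangle. The function name `find_violations` and the pixel threshold used in the example are assumptions.

```python
import math


def find_violations(centroids, min_distance):
    """Indices of people closer than `min_distance` pixels to someone else."""
    violate = set()
    n = len(centroids)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: each pair once
            dx = centroids[i][0] - centroids[j][0]
            dy = centroids[i][1] - centroids[j][1]
            if math.hypot(dx, dy) < min_distance:
                violate.add(i)
                violate.add(j)
    return violate


# The first two people are 50 pixels apart; the third is far away.
print(find_violations([(0, 0), (30, 40), (500, 500)], 60))  # -> {0, 1}
```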
Looping over the results, I proceed to:
- Extract the bounding box and centroid coordinates
- Initialize the colour of the bounding box to green
- Check whether the current index exists in the violate set, and if so, update the colour to red
- Draw both the bounding box of the person and their centroid. Each is colour-coded, so I can see which people are too close.
- Display information on the total number of social distancing violations (the length of the violate set)
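The drawing steps above might be written as below. The layout of each result, (confidence, (x1, y1, x2, y2), (cx, cy)) with integer pixel values, the label text and the drawing sizes are assumptions. OpenCV is imported inside `annotate` so that the colour helper can be tried without it.

```python
GREEN = (0, 255, 0)  # OpenCV colours are in BGR order
RED = (0, 0, 255)


def box_color(index, violate):
    """Red for people in the violate set, green for everyone else."""
    return RED if index in violate else GREEN


def annotate(frame, results, violate):
    """Draw boxes, centroids and the violation count on `frame`.

    `results` is a list of (confidence, (x1, y1, x2, y2), (cx, cy)).
    """
    import cv2  # imported lazily so box_color works without OpenCV

    for i, (_, (x1, y1, x2, y2), (cx, cy)) in enumerate(results):
        color = box_color(i, violate)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.circle(frame, (cx, cy), 5, color, 1)
    text = "Social Distancing Violations: {}".format(len(violate))
    cv2.putText(frame, text, (10, frame.shape[0] - 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.85, RED, 3)
    return frame
```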
Image Source : My Project
Image Source : My Project
Snapshots of the real-time detection done on people.
The model is able to detect people almost instantly and gives the total count of violations in each frame fairly accurately.
Strength of the approach :
- I can see who violated the specified distance and record the video.
- Can be done in real time, even on low-specification hardware.
- Detection is very fast and of decent accuracy.
Uses :
Can be used on roads and in less crowded places for real-time detection of social distancing violations.
Problems :
- Not very accurate, especially at detecting and classifying a large number of people.
- Best used with 5-10 persons in a frame, as accuracy is traded for speed.
- The camera needs to be calibrated for a bird's-eye view and the distance adjusted accordingly.
Solution
Use the previous, more accurate model on high-end hardware, with better use of software techniques such as CUDA, multithreading and multiprocessing, to make use of the available resources and TensorFlow's built-in capabilities to get speed and accuracy together.
Use bird's-eye view calculation and calibration to make sure that the exact 3D scene is taken into account.
This means capturing the video from the perspective of a bird looking down, for better handling of 3D depth.
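One common way to obtain a bird's-eye view, offered here as an assumption rather than something the project implemented, is to map image points to the ground plane with a 3x3 homography and measure distances there. The sketch below applies such a matrix to a single point. The matrix values are placeholders.

```python
def to_birds_eye(point, homography):
    """Map an image point (x, y) to the ground plane with a 3x3 homography."""
    x, y = point
    h = homography
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)  # divide by w to leave homogeneous coordinates


# Placeholder matrix: a pure scaling by 2, for illustration only.
SCALE_BY_TWO = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]
print(to_birds_eye((3, 4), SCALE_BY_TWO))  # -> (6.0, 8.0)
```

In practice the matrix can be computed with `cv2.getPerspectiveTransform` from four image points and the four ground-plane points they correspond to.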
Result
The method for social distancing detection can be used anywhere with suitable hardware, and the most suitable model can be selected according to the real-world scenario it is used in.
Experience and Future improvements
The project helped me learn a great deal about video analytics and the use of artificial intelligence and IoT to tackle real-life problems.
Projects in which artificial intelligence is applied to real-life scenarios are usually not easy to find.
After minor tweaks and graphics adjustments, the models can be turned into REST APIs and hosted on a server, to be used anywhere, at any time, on any device that can access the software.
Conclusions
In this report I have tried to explain and summarize the solution to the given problem statement.
I listed what I consider the ideal general approach to the solution.
I applied the first approach, social distancing monitoring with the best accuracy, using Mask RCNN image segmentation and OpenCV for video and image processing and pre-processing.
I also presented a solution for real-time detection on lower-end machines, using YOLOv3-tiny for object detection and OpenCV for video and image processing and pre-processing.