FALL DETECTION
If you’re an older adult — or care for someone who is — falling is likely high on your list of worries. And for good reason. According to the National Council on Aging:
- Every 11 seconds, an older adult is treated in an emergency room for a fall-related injury.
- Falls are the leading cause of fatal and non-fatal injuries for elders.
There’s plenty of stuff to worry about already this year. Why not take at least one thing off the list?
Accidental falls are a major cause of injuries, loss of autonomy, and death, and they place a significant burden on national health systems. Extensive research and development of fall detection and rescue systems is therefore a necessity, and these technologies must be reliable and effective to ensure a proper response.
Fall detection systems can generally be categorized into the following types: context-aware systems, wearable devices, and cell phone-based systems. In context-aware systems, sensors are deployed in the environment to detect falls. Fusing cameras with other context-aware sensors is a common approach to reduce computation: the non-camera sensor triggers the processing of video frames instead of the system continuously analyzing the surveillance video. A pressure sensor is one example, where the pressure the person exerts on the floor is measured and compared against a threshold. If the pressure exceeds the threshold, the video frames are processed to verify a fall event.
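The trigger-then-verify pipeline described above can be sketched in a few lines. The threshold value and the `verify_fall()` video step below are illustrative assumptions, not taken from any specific system:

```python
# Sketch of a pressure-triggered fall-detection pipeline: the cheap pressure
# sensor gates the expensive video-analysis stage.

PRESSURE_THRESHOLD = 30.0  # hypothetical floor-pressure threshold


def verify_fall(frames):
    """Placeholder for the vision-based verification stage.

    A real system would run a classifier over the buffered frames; here we
    simply report True if any frames are available to analyze.
    """
    return len(frames) > 0


def monitor(pressure_readings, frame_buffer):
    """Run video analysis only when the pressure sensor exceeds the threshold."""
    events = []
    for t, pressure in enumerate(pressure_readings):
        if pressure > PRESSURE_THRESHOLD:
            # Pressure spike: trigger the video verification step.
            if verify_fall(frame_buffer):
                events.append(t)
    return events


readings = [5.0, 6.2, 41.7, 7.1]  # simulated pressure trace, spike at t=2
print(monitor(readings, frame_buffer=["frame0", "frame1"]))  # -> [2]
```

The key design point is that the camera pipeline runs only on pressure spikes, so the system idles cheaply the rest of the time.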
Most commercial vision-based fall-detection systems on the market are based on portable devices. Commercial devices that use computer vision are still not easy to find, but the associated technical advances and related literature remain promising. Some systems also fuse a camera (e.g., Microsoft’s Kinect) with accelerometers, using a fuzzy system to merge the sensor data and determine whether a fall has occurred.
Wearable technologies and their drawbacks:
The most common technologies found in these sensors are accelerometers and gyroscopes. Such devices are easy to wear but have drawbacks, such as power consumption (which limits usability) and sensitivity to body movement (which may cause false alarms). Although wearable techniques can be accurate (especially marker-based ones) and are suitable even outdoors (e.g., kinematic sensors), their effectiveness drops — and false alarms increase — if the device is forgotten or placed incorrectly on the body; they also restrict body movement.
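A common wearable heuristic is to flag spikes in acceleration magnitude. This is a minimal sketch of that idea; the 2.5 g threshold is an illustrative assumption (real devices tune it to balance sensitivity against false alarms):

```python
# Threshold-based fall detection on accelerometer samples: a fall shows up
# as a sudden spike in the magnitude of the acceleration vector.
import math

G = 9.81  # standard gravity, m/s^2


def accel_magnitude(ax, ay, az):
    """Euclidean magnitude of a 3-axis accelerometer sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def detect_fall(samples, threshold_g=2.5):
    """Return indices of samples whose magnitude exceeds threshold_g * G."""
    return [
        i
        for i, (ax, ay, az) in enumerate(samples)
        if accel_magnitude(ax, ay, az) > threshold_g * G
    ]


# A quiet period (magnitude ~1 g) followed by a simulated impact spike:
trace = [(0.0, 0.0, 9.8), (0.1, 0.2, 9.7), (15.0, 20.0, 8.0)]
print(detect_fall(trace))  # -> [2]
```

This also makes the false-alarm problem concrete: any vigorous movement (sitting down hard, jumping) can cross a pure magnitude threshold, which is why practical systems add posture or inactivity checks after the spike.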
A skeleton-based description of the human body has gained increasing popularity in action recognition due to its compactness and availability. Modern RGB-D cameras like the Microsoft Kinect let us obtain the skeleton points in real time. Skeleton data is also easier to transmit while preserving a high level of privacy in the surveillance system.
One approach builds on the depth information and skeleton-tracking technology of the Microsoft Kinect v2 sensor: first, the sensor’s depth data is used to extract the human joints produced by the skeleton tracker; then an optimized back-propagation (BP) neural network performs posture recognition, and the fall is detected on that basis.
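To make the skeleton idea concrete, here is a hedged sketch: derive a simple posture feature from tracked joint positions and classify it with a rule. The joint names, the bounding-box ratio, and the threshold below are illustrative stand-ins for the optimized BP neural network described above:

```python
# Posture classification from skeleton joints: an upright body is much
# taller than it is wide, while a fallen body is the opposite.


def bounding_ratio(joints):
    """Height/width ratio of the skeleton's 2-D bounding box."""
    xs = [p[0] for p in joints.values()]
    ys = [p[1] for p in joints.values()]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return height / max(width, 1e-6)  # avoid division by zero


def classify_posture(joints, threshold=1.0):
    """Label a skeleton as 'upright' or 'fallen' from its bounding ratio."""
    return "upright" if bounding_ratio(joints) > threshold else "fallen"


# Hypothetical joint coordinates (x, y) in meters:
standing = {"head": (0.0, 1.7), "hip": (0.0, 0.9), "foot": (0.1, 0.0)}
lying = {"head": (1.6, 0.2), "hip": (0.8, 0.15), "foot": (0.0, 0.1)}
print(classify_posture(standing), classify_posture(lying))  # upright fallen
```

A learned classifier such as a BP network replaces the hand-set threshold with weights fitted to labeled postures, but it consumes features of exactly this kind.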
This blog provides a comprehensive review of state-of-the-art fall detection technologies, focusing on the most powerful deep learning methodologies. We review the most recent and effective deep learning methods for fall detection and group them into three categories: Convolutional Neural Network (CNN) based systems, Long Short-Term Memory (LSTM) based systems, and autoencoder-based systems. Along the way, we will touch upon a few insightful methods in current use.
Convolutional neural networks are used extensively for image classification, object detection, and scene recognition tasks. Recurrent neural networks are also applied to image classification, but they are more commonly associated with sequence processing. In one method, video frames are stacked and fed to a multi-resolution CNN. The multi-resolution network has two processing streams: a low-resolution stream and a high-resolution stream.
In a two-stream convolutional neural network, one stream captures spatial information by processing single frames, while the other captures temporal information by processing multi-frame optical-flow representations.
A 3D convolutional neural network can be used to extract the spatial and temporal information encoded in successive video frames. In one such system, video frames are fed into a network consisting of two 3D convolutional layers, two pooling layers, one 2D convolutional layer, and one fully connected layer. In most cases, classification based on the CNN approach achieves good results. For RGB video frames, these results can be due to the association of human actions with the presence of certain objects in the scene: for example, for a human action to be classified as swimming, a water surface needs to be present in the scene.
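To see how a 3D kernel mixes temporal and spatial neighbourhoods, here is a toy valid-mode 3D convolution over a (frames, height, width) clip. The clip size, kernel, and pure-Python loops are illustrative; real systems use a deep-learning framework, and this sketch omits channels, padding, and stride:

```python
# Toy 3D convolution: the kernel slides over time as well as space, so the
# output at each position summarizes a small spatiotemporal neighbourhood.


def conv3d(volume, kernel):
    """Valid-mode 3D convolution of nested lists (frames x rows x cols)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += volume[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


clip = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]  # 3 frames of 4x4
temporal_diff = [[[1.0]], [[-1.0]]]  # 2x1x1 kernel: frame-to-frame difference
result = conv3d(clip, temporal_diff)
print(len(result), len(result[0]), len(result[0][0]))  # -> 2 4 4
```

Note that the output has one fewer frame than the input (3 − 2 + 1 = 2), and because the clip is static, the frame-difference kernel produces all zeros — motion is exactly what a temporal kernel responds to.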
C. AUTO-ENCODER (AE) BASED FALL DETECTION SYSTEMS
While CNN and LSTM architectures are mostly used for supervised learning, an autoencoder is an unsupervised artificial neural network that learns to compress and encode data efficiently, and then to reconstruct the data from the reduced encoded representation so that it is as close to the original input as possible. By design, an autoencoder reduces data dimensionality by learning to ignore noise in the data.
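For fall detection, the usual trick is to train the autoencoder only on normal activity and flag samples it reconstructs poorly. The sketch below is a hedged, hand-set stand-in for a trained model: a 1-D latent code, illustrative weights, and a reconstruction-error test:

```python
# Reconstruction-error anomaly detection with a tiny linear "autoencoder":
# inputs close to the learned normal direction reconstruct well; inputs far
# from it (e.g. a fall) produce a large error.


def encode(x, w):
    """Project a feature vector onto a single latent direction w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def decode(z, w):
    """Map the 1-D latent code back to input space along w."""
    return [z * wi for wi in w]


def reconstruction_error(x, w):
    """Squared error between x and its encode/decode round trip."""
    x_hat = decode(encode(x, w), w)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Unit direction representing "normal" activity features (assumed, not trained):
w = [0.6, 0.8]
normal = [0.6, 0.8]    # lies on the normal direction -> near-zero error
anomaly = [0.8, -0.6]  # orthogonal to it -> large error (possible fall)
print(reconstruction_error(normal, w) < 1e-9)  # True
print(reconstruction_error(anomaly, w) > 0.9)  # True
```

A real autoencoder learns `w` (and nonlinear layers) from unlabeled normal data, which is what makes the approach attractive when labeled fall examples are scarce.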
Some of these techniques are compared in the research paper Deep Learning-Based Systems Developed for Fall Detection (https://ieeexplore.ieee.org/document/9186685).
Most researchers have used CNNs to develop fall detection systems (1D-CNN, 2D-CNN, 3D-CNN), with 3D-CNNs giving the best performance. Others have used LSTMs, RNNs, and RCNs, as well as three-layer autoencoders with regression and logistic classifiers. Most of the autoencoder architectures appear in sensor-based systems using radar data.
Thank you for reading the article! The contributors to this article are: Vaishnavi Mal, Sandesh Mankar, Pranav Paigude, Arya Patil, Indrayani Pawar.