---
title: Video Human Fall Detector
emoji: 🐠
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Fall Detection Demo using LightCLIP
---
# Fall Detection Demo using LightCLIP on Hugging Face Spaces
This project demonstrates a lightweight, transformer-based approach to detecting human falls in video clips using a vision–language model (VLM). The demo is designed for complex scenes with multiple people, obstacles, and varying lighting conditions. It uses a sliding-window technique to score multiple frames for robust detection and aggregates predictions over time to reduce false alarms.
## Overview
The demo uses a pre-trained LightCLIP (or CLIP) model to compute image–text similarity scores between video frames and natural language prompts. Two prompts are used:
- Fall Prompt: "A person falling on the ground."
- Non-Fall Prompt: "A person standing or walking."
For each window of frames extracted from the video, the model computes similarity scores for each frame. The scores are aggregated over a sliding window, and if the average score for the "fall" prompt exceeds a defined threshold, a fall event is registered along with an approximate timestamp.
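The snippet below is a minimal sketch of this aggregation step, assuming the per-frame scores for the fall prompt have already been computed. The helper name `detect_fall_windows` is illustrative; its defaults (16-frame window, stride 8, threshold 0.8) simply mirror the values quoted later in this README.

```python
import numpy as np

def detect_fall_windows(fall_scores, fps, window_size=16, stride=8, threshold=0.8):
    """Aggregate per-frame 'fall' scores over sliding windows.

    fall_scores: one score per frame for the fall prompt.
    Returns approximate timestamps (seconds) of windows whose average
    score exceeds the threshold.
    """
    scores = np.asarray(fall_scores, dtype=float)
    timestamps = []
    for start in range(0, max(len(scores) - window_size + 1, 1), stride):
        window = scores[start:start + window_size]
        if window.size and window.mean() > threshold:
            # Report the window midpoint as the approximate fall time.
            timestamps.append(round((start + window_size / 2) / fps, 1))
    return timestamps
```

On a 30 fps clip, a 16-frame window covers roughly half a second, so increasing `window_size` trades responsiveness for fewer spurious detections.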
## Project Files
- `app.py`: The main application file containing the Gradio demo.
- `requirements.txt`: Lists all the required Python libraries.
- `README.md`: This file.
## How to Run
- Clone or download the repository into your Hugging Face Space.
- Ensure the Space is set to use a GPU hardware plan.
- Spaces will automatically install the required libraries from `requirements.txt`.
- Launch the demo by running `app.py` (Gradio will start the web interface).
## Code Overview
- Frame Extraction: The video is processed using OpenCV to extract frames (resized to 224×224).
- LightCLIP Inference: The demo uses the Hugging Face Transformers library to load a CLIP model (acting as LightCLIP). It computes image embeddings for each frame and compares them to text embeddings of the fall and non-fall descriptions.
- Temporal Aggregation: A sliding window (e.g. 16 frames with a stride of 8) is used to calculate average "fall" scores. Windows exceeding a threshold (e.g. 0.8) are flagged as fall events.
- User Interface: A simple Gradio UI allows users to upload a video clip and displays the detection result along with a representative frame and list of detected fall times.
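The following is a minimal sketch of how these pieces could fit together; it is not the exact contents of `app.py`. It uses the stock `openai/clip-vit-base-patch32` checkpoint as a stand-in for LightCLIP, reuses the hypothetical `detect_fall_windows` helper sketched in the Overview, and the function names (`extract_frames`, `score_frames`, `analyze`) are illustrative. For brevity it returns only a text summary, whereas the real demo also shows a representative frame.

```python
import cv2
import torch
import gradio as gr
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # stand-in for a LightCLIP checkpoint
PROMPTS = ["A person falling on the ground.", "A person standing or walking."]

model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def extract_frames(video_path, size=(224, 224)):
    """Read frames with OpenCV, convert BGR -> RGB, and resize to 224x224."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), size))
    cap.release()
    return frames, fps

@torch.no_grad()
def score_frames(frames, batch_size=32):
    """Return the softmax probability of the fall prompt for each frame."""
    scores = []
    for i in range(0, len(frames), batch_size):
        inputs = processor(text=PROMPTS, images=frames[i:i + batch_size],
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image             # shape (batch, 2)
        scores.extend(logits.softmax(dim=-1)[:, 0].tolist())  # index 0 = fall prompt
    return scores

def analyze(video_path):
    frames, fps = extract_frames(video_path)
    # detect_fall_windows is the sliding-window helper sketched in the Overview.
    times = detect_fall_windows(score_frames(frames), fps)
    if not times:
        return "No fall detected"
    return "Possible fall near: " + ", ".join(f"{t}s" for t in times)

demo = gr.Interface(fn=analyze, inputs=gr.Video(), outputs="text",
                    title="Video Human Fall Detector")
demo.launch()
```

Scoring frames in small batches keeps GPU memory bounded on longer clips; the same code also runs on a CPU-only Space, just more slowly.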
## Customization
- Model: Replace `"openai/clip-vit-base-patch32"` in `app.py` with your own LightCLIP model checkpoint if available.
- Threshold & Window Size: Adjust parameters such as the detection threshold, window size, and stride for better results on your dataset (see the configuration sketch after this list).
- Deployment: This demo is configured to run on a GPU-backed Hugging Face Space for real-time inference.
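As a concrete example, if the tunable values are exposed as module-level constants (an assumption about how `app.py` is organized, matching the sketches above), customization amounts to editing a few lines:

```python
# Illustrative configuration constants; the names are assumptions, not the
# actual variables in app.py.
MODEL_NAME = "openai/clip-vit-base-patch32"  # swap in your LightCLIP checkpoint
WINDOW_SIZE = 16      # frames per sliding window
STRIDE = 8            # frames advanced between consecutive windows
FALL_THRESHOLD = 0.8  # average 'fall' score needed to flag a window
```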
Enjoy experimenting with fall detection!