---
title: Video Human Fall Detector
emoji: 🐠
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Fall Detection Demo using LightCLIP
---

# Fall Detection Demo using LightCLIP on Hugging Face Spaces

This project demonstrates a lightweight, transformer-based approach to detecting human falls in video clips using a vision–language model (VLM). The demo is designed for complex scenes with multiple people, obstacles, and varying lighting conditions. It uses a sliding-window technique to score multiple consecutive frames for robust detection and aggregates predictions over time to reduce false alarms.

## Overview

The demo uses a pre-trained LightCLIP (or CLIP) model to compute image–text similarity scores between video frames and two natural-language prompts:

- **Fall Prompt:** "A person falling on the ground."
- **Non-Fall Prompt:** "A person standing or walking."

For each window of frames extracted from the video, the model computes a similarity score per frame. The scores are averaged over the sliding window, and if the average score for the fall prompt exceeds a defined threshold, a fall event is registered along with an approximate timestamp.
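
A minimal sketch of this zero-shot scoring step, using the Hugging Face Transformers CLIP API rather than the exact code in `app.py` (the checkpoint name is the default mentioned under Customization, and `fall_score` is an illustrative helper name):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # swap in a LightCLIP checkpoint if available
PROMPTS = ["A person falling on the ground.", "A person standing or walking."]

model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def fall_score(frame: Image.Image) -> float:
    """Return the softmax probability of the fall prompt for a single frame."""
    inputs = processor(text=PROMPTS, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()     # index 0 = fall prompt
```

A window is then flagged when the mean of these per-frame scores crosses the detection threshold, as described under Code Overview.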

## Project Files

- **app.py:** The main application file containing the Gradio demo.
- **requirements.txt:** Lists the required Python libraries.
- **README.md:** This file.

## How to Run

1. **Clone or download the repository** into your Hugging Face Space.
2. Make sure the Space is set to use a **GPU hardware plan**.
3. Spaces automatically installs the required libraries from `requirements.txt`.
4. Launch the demo by running `app.py`; Gradio starts the web interface (a minimal skeleton of this entry point is sketched below).
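
The real interface is defined in `app.py`; as a rough illustration of the entry point that Spaces launches, a minimal Gradio skeleton might look like this (the `detect_falls` function and the component labels are placeholders):

```python
import gradio as gr

def detect_falls(video_path: str):
    # Placeholder for the real pipeline: extract frames, score them with
    # LightCLIP/CLIP, aggregate scores over sliding windows, and report timestamps.
    return "No fall detected", None

demo = gr.Interface(
    fn=detect_falls,
    inputs=gr.Video(label="Upload a video clip"),
    outputs=[gr.Textbox(label="Detection result"), gr.Image(label="Representative frame")],
    title="Fall Detection Demo using LightCLIP",
)

if __name__ == "__main__":
    demo.launch()
```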

## Code Overview

- **Frame Extraction:** The video is read with OpenCV and frames are extracted and resized to 224×224.
- **LightCLIP Inference:** The Hugging Face Transformers library loads a CLIP model (acting as LightCLIP), computes an image embedding for each frame, and compares it to the text embeddings of the fall and non-fall prompts.
- **Temporal Aggregation:** A sliding window (e.g. 16 frames with a stride of 8) is used to average the per-frame fall scores. Windows whose average exceeds a threshold (e.g. 0.8) are flagged as fall events.
- **User Interface:** A simple Gradio UI lets users upload a video clip and displays the detection result together with a representative frame and the list of detected fall times. A sketch of the frame-extraction and aggregation steps follows this list.
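
Below is a minimal sketch of the frame-extraction and temporal-aggregation steps, assuming the `fall_score` helper sketched in the Overview section; the constants mirror the example values above and the function names are illustrative, not the exact ones in `app.py`:

```python
import cv2
from PIL import Image

WINDOW_SIZE = 16   # frames per window (example value from above)
STRIDE = 8         # frames to advance between windows
THRESHOLD = 0.8    # average fall score needed to flag a window

def extract_frames(video_path: str, size=(224, 224)):
    """Read a video with OpenCV and return resized RGB PIL frames plus their timestamps."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames, timestamps = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
        frames.append(Image.fromarray(frame))
        timestamps.append(index / fps)
        index += 1
    cap.release()
    return frames, timestamps

def detect_fall_events(frames, timestamps):
    """Average per-frame fall scores over sliding windows and return approximate fall times."""
    scores = [fall_score(f) for f in frames]  # fall_score as sketched in the Overview
    events = []
    for start in range(0, max(len(scores) - WINDOW_SIZE, 0) + 1, STRIDE):
        window = scores[start:start + WINDOW_SIZE]
        if window and sum(window) / len(window) >= THRESHOLD:
            events.append(timestamps[start])  # approximate time of the detected fall
    return events
```

Averaging over a window rather than reacting to single frames is what keeps brief occlusions or awkward poses from triggering false alarms.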

## Customization

- **Model:** Replace `"openai/clip-vit-base-patch32"` in `app.py` with your own LightCLIP model checkpoint if available (see the sketch after this list).
- **Threshold & Window Size:** Adjust the detection threshold, window size, and stride for better results on your dataset.
- **Deployment:** This demo is configured to run on a GPU-backed Hugging Face Space for real-time inference.
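
For example, pointing the loaders at a custom checkpoint is a one-line change (the checkpoint name below is a hypothetical placeholder):

```python
from transformers import CLIPModel, CLIPProcessor

# Hypothetical checkpoint name; replace with your own LightCLIP weights on the Hub.
MODEL_NAME = "your-username/lightclip-fall-detector"

model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
```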

Enjoy experimenting with fall detection!
|