---
title: Video Human Fall Detector
emoji: 🐠
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Fall Detection Demo using LightCLIP
---
# Fall Detection Demo using LightCLIP on Hugging Face Spaces
This project demonstrates a lightweight, transformer-based approach to detecting human falls in video clips using a vision–language model (VLM). The demo is designed for complex scenes with multiple people, obstacles, and varying lighting conditions. It uses a sliding-window technique to score multiple frames for robust detection and aggregates predictions over time to reduce false alarms.
## Overview
The demo uses a pre-trained LightCLIP (or CLIP) model to compute image–text similarity scores between video frames and natural language prompts. Two prompts are used:
- Fall Prompt: "A person falling on the ground."
- Non-Fall Prompt: "A person standing or walking."
For each window of frames extracted from the video, the model computes similarity scores for each frame. The scores are aggregated over a sliding window, and if the average score for the "fall" prompt exceeds a defined threshold, a fall event is registered along with an approximate timestamp.
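The snippet below is a minimal sketch of this aggregation step, assuming the per-frame scores for the fall prompt have already been computed. The helper name `detect_fall_windows` is illustrative; its defaults (16-frame window, stride 8, threshold 0.8) simply mirror the values quoted later in this README.

```python
import numpy as np

def detect_fall_windows(fall_scores, fps, window_size=16, stride=8, threshold=0.8):
    """Aggregate per-frame 'fall' scores over sliding windows.

    fall_scores: one score per frame for the fall prompt.
    Returns approximate timestamps (seconds) of windows whose average
    score exceeds the threshold.
    """
    scores = np.asarray(fall_scores, dtype=float)
    timestamps = []
    for start in range(0, max(len(scores) - window_size + 1, 1), stride):
        window = scores[start:start + window_size]
        if window.size and window.mean() > threshold:
            # Report the window midpoint as the approximate fall time.
            timestamps.append(round((start + window_size / 2) / fps, 1))
    return timestamps
```

On a 30 fps clip, a 16-frame window covers roughly half a second, so increasing `window_size` trades responsiveness for fewer spurious detections.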
## Project Files
- `app.py`: The main application file containing the Gradio demo.
- `requirements.txt`: Lists all the required Python libraries.
- `README.md`: This file.
## How to Run
- Clone or download the repository into your Hugging Face Space.
- Ensure the Space is set to use a GPU hardware plan.
- Spaces will automatically install the required libraries from `requirements.txt`.
- Launch the demo by running `app.py` (Gradio will start the web interface).
## Code Overview
- Frame Extraction: The video is processed using OpenCV to extract frames (resized to 224×224).
- LightCLIP Inference: The demo uses the Hugging Face Transformers library to load a CLIP model (acting as LightCLIP). It computes image embeddings for each frame and compares them to text embeddings of the fall and non-fall descriptions.
- Temporal Aggregation: A sliding window (e.g. 16 frames with a stride of 8) is used to calculate average "fall" scores. Windows exceeding a threshold (e.g. 0.8) are flagged as fall events.
- User Interface: A simple Gradio UI allows users to upload a video clip and displays the detection result along with a representative frame and list of detected fall times.
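The following is a minimal sketch of how these pieces could fit together; it is not the exact contents of `app.py`. It uses the stock `openai/clip-vit-base-patch32` checkpoint as a stand-in for LightCLIP, reuses the hypothetical `detect_fall_windows` helper sketched in the Overview, and the function names (`extract_frames`, `score_frames`, `analyze`) are illustrative. For brevity it returns only a text summary, whereas the real demo also shows a representative frame.

```python
import cv2
import torch
import gradio as gr
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # stand-in for a LightCLIP checkpoint
PROMPTS = ["A person falling on the ground.", "A person standing or walking."]

model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def extract_frames(video_path, size=(224, 224)):
    """Read frames with OpenCV, convert BGR -> RGB, and resize to 224x224."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), size))
    cap.release()
    return frames, fps

@torch.no_grad()
def score_frames(frames, batch_size=32):
    """Return the softmax probability of the fall prompt for each frame."""
    scores = []
    for i in range(0, len(frames), batch_size):
        inputs = processor(text=PROMPTS, images=frames[i:i + batch_size],
                           return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image             # shape (batch, 2)
        scores.extend(logits.softmax(dim=-1)[:, 0].tolist())  # index 0 = fall prompt
    return scores

def analyze(video_path):
    frames, fps = extract_frames(video_path)
    # detect_fall_windows is the sliding-window helper sketched in the Overview.
    times = detect_fall_windows(score_frames(frames), fps)
    if not times:
        return "No fall detected"
    return "Possible fall near: " + ", ".join(f"{t}s" for t in times)

demo = gr.Interface(fn=analyze, inputs=gr.Video(), outputs="text",
                    title="Video Human Fall Detector")
demo.launch()
```

Scoring frames in small batches keeps GPU memory bounded on longer clips; the same code also runs on a CPU-only Space, just more slowly.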
## Customization
- Model: Replace `"openai/clip-vit-base-patch32"` in `app.py` with your own LightCLIP model checkpoint if available.
- Threshold & Window Size: Adjust parameters such as the detection threshold, window size, and stride for better results on your dataset (see the configuration sketch after this list).
- Deployment: This demo is configured to run on a GPU-backed Hugging Face Space for real-time inference.
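As a concrete example, if the tunable values are exposed as module-level constants (an assumption about how `app.py` is organized, matching the sketches above), customization amounts to editing a few lines:

```python
# Illustrative configuration constants; the names are assumptions, not the
# actual variables in app.py.
MODEL_NAME = "openai/clip-vit-base-patch32"  # swap in your LightCLIP checkpoint
WINDOW_SIZE = 16      # frames per sliding window
STRIDE = 8            # frames advanced between consecutive windows
FALL_THRESHOLD = 0.8  # average 'fall' score needed to flag a window
```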
Enjoy experimenting with fall detection!