---
title: Video Human Fall Detector
emoji: 🐠
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Fall Detection Demo using LightCLIP
---

# Fall Detection Demo using LightCLIP on Hugging Face Spaces

This project demonstrates a lightweight, transformer-based approach to detecting human falls in video clips using a vision–language model (VLM). The demo is designed for complex scenes with multiple people, obstacles, and varying lighting conditions. It uses a sliding-window technique to score multiple consecutive frames for robust detection and aggregates predictions over time to reduce false alarms.

## Overview

The demo uses a pre-trained LightCLIP (or CLIP) model to compute image–text similarity scores between video frames and two natural-language prompts:

- **Fall Prompt:** "A person falling on the ground."
- **Non-Fall Prompt:** "A person standing or walking."

For each window of frames extracted from the video, the model computes a similarity score per frame. The scores are averaged over the sliding window, and if the average score for the fall prompt exceeds a defined threshold, a fall event is registered along with an approximate timestamp.
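
A minimal sketch of this zero-shot scoring step, using the Hugging Face Transformers CLIP API rather than the exact code in `app.py` (the checkpoint name is the default mentioned under Customization, and `fall_score` is an illustrative helper name):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # swap in a LightCLIP checkpoint if available
PROMPTS = ["A person falling on the ground.", "A person standing or walking."]

model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def fall_score(frame: Image.Image) -> float:
    """Return the softmax probability of the fall prompt for a single frame."""
    inputs = processor(text=PROMPTS, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()     # index 0 = fall prompt
```

A window is then flagged when the mean of these per-frame scores crosses the detection threshold, as described under Code Overview.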

## Project Files

- **app.py:** The main application file containing the Gradio demo.
- **requirements.txt:** Lists the required Python libraries.
- **README.md:** This file.

## How to Run

1. **Clone or download the repository** into your Hugging Face Space.
2. Make sure the Space is set to use a **GPU hardware plan**.
3. Spaces automatically installs the required libraries from `requirements.txt`.
4. Launch the demo by running `app.py`; Gradio starts the web interface (a minimal skeleton of this entry point is sketched below).
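
The real interface is defined in `app.py`; as a rough illustration of the entry point that Spaces launches, a minimal Gradio skeleton might look like this (the `detect_falls` function and the component labels are placeholders):

```python
import gradio as gr

def detect_falls(video_path: str):
    # Placeholder for the real pipeline: extract frames, score them with
    # LightCLIP/CLIP, aggregate scores over sliding windows, and report timestamps.
    return "No fall detected", None

demo = gr.Interface(
    fn=detect_falls,
    inputs=gr.Video(label="Upload a video clip"),
    outputs=[gr.Textbox(label="Detection result"), gr.Image(label="Representative frame")],
    title="Fall Detection Demo using LightCLIP",
)

if __name__ == "__main__":
    demo.launch()
```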

## Code Overview

- **Frame Extraction:** The video is read with OpenCV and frames are extracted and resized to 224×224.
- **LightCLIP Inference:** The Hugging Face Transformers library loads a CLIP model (acting as LightCLIP), computes an image embedding for each frame, and compares it to the text embeddings of the fall and non-fall prompts.
- **Temporal Aggregation:** A sliding window (e.g. 16 frames with a stride of 8) is used to average the per-frame fall scores. Windows whose average exceeds a threshold (e.g. 0.8) are flagged as fall events.
- **User Interface:** A simple Gradio UI lets users upload a video clip and displays the detection result together with a representative frame and the list of detected fall times. A sketch of the frame-extraction and aggregation steps follows this list.
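
Below is a minimal sketch of the frame-extraction and temporal-aggregation steps, assuming the `fall_score` helper sketched in the Overview section; the constants mirror the example values above and the function names are illustrative, not the exact ones in `app.py`:

```python
import cv2
from PIL import Image

WINDOW_SIZE = 16   # frames per window (example value from above)
STRIDE = 8         # frames to advance between windows
THRESHOLD = 0.8    # average fall score needed to flag a window

def extract_frames(video_path: str, size=(224, 224)):
    """Read a video with OpenCV and return resized RGB PIL frames plus their timestamps."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    frames, timestamps = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
        frames.append(Image.fromarray(frame))
        timestamps.append(index / fps)
        index += 1
    cap.release()
    return frames, timestamps

def detect_fall_events(frames, timestamps):
    """Average per-frame fall scores over sliding windows and return approximate fall times."""
    scores = [fall_score(f) for f in frames]  # fall_score as sketched in the Overview
    events = []
    for start in range(0, max(len(scores) - WINDOW_SIZE, 0) + 1, STRIDE):
        window = scores[start:start + WINDOW_SIZE]
        if window and sum(window) / len(window) >= THRESHOLD:
            events.append(timestamps[start])  # approximate time of the detected fall
    return events
```

Averaging over a window rather than reacting to single frames is what keeps brief occlusions or awkward poses from triggering false alarms.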

## Customization

- **Model:** Replace `"openai/clip-vit-base-patch32"` in `app.py` with your own LightCLIP model checkpoint if available (see the sketch after this list).
- **Threshold & Window Size:** Adjust the detection threshold, window size, and stride for better results on your dataset.
- **Deployment:** This demo is configured to run on a GPU-backed Hugging Face Space for real-time inference.
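
For example, pointing the loaders at a custom checkpoint is a one-line change (the checkpoint name below is a hypothetical placeholder):

```python
from transformers import CLIPModel, CLIPProcessor

# Hypothetical checkpoint name; replace with your own LightCLIP weights on the Hub.
MODEL_NAME = "your-username/lightclip-fall-detector"

model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
```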

Enjoy experimenting with fall detection!
|