---
title: RealtimeMonocularDepthModel
emoji: π
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.7.1
pinned: false
license: mit
short_description: Real-Time Monocular Depth Estimation for AR
---
# Real-time Depth Estimation using Knowledge Distillation
This project demonstrates real-time depth estimation using a compressed student model trained through knowledge distillation. Here's how it works:
## Knowledge Distillation
The CompressedStudentModel was trained using knowledge distillation from a larger, more complex teacher model (DPT). This technique allows the smaller student model to learn from the teacher's predictions, effectively transferring knowledge and achieving comparable performance with reduced computational requirements.
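The training loop below is a minimal PyTorch sketch of this idea: the frozen DPT teacher produces soft depth targets and the student is optimized to match them. The function name, loss choice, and tensor shapes are assumptions, not the project's exact training code.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, optimizer):
    """One hypothetical distillation step: the student regresses the
    teacher's depth predictions (loss choice and shapes are assumptions)."""
    teacher.eval()
    with torch.no_grad():
        teacher_depth = teacher(images)              # soft targets from DPT, (B, 1, H, W)
    student_depth = student(images)                  # student prediction, (B, 1, h, w)
    # Match spatial sizes in case teacher and student output different resolutions.
    teacher_depth = F.interpolate(
        teacher_depth, size=student_depth.shape[-2:],
        mode="bilinear", align_corners=False)
    loss = F.l1_loss(student_depth, teacher_depth)   # distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```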
## Model Architecture
The student model uses an encoder-decoder architecture optimized for efficient depth estimation:
- Encoder: Extracts hierarchical features through convolutional layers and max pooling.
- Decoder: Upsamples features to produce a high-resolution depth map.
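A minimal PyTorch sketch of such an encoder-decoder student is shown below; the layer counts and channel widths are assumptions and will not match the trained checkpoint exactly.

```python
import torch.nn as nn

class CompressedStudentModel(nn.Module):
    """Minimal sketch of the encoder-decoder student; layer sizes are assumed."""
    def __init__(self):
        super().__init__()
        # Encoder: convolutions + max pooling extract hierarchical features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # 200 -> 100
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                   # 100 -> 50
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsampling restores a full-resolution, single-channel depth map.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),                    # 1-channel depth output
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```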
## Real-time Processing
The model is designed for real-time inference on webcam input:
- Each frame is preprocessed and resized to 200x200 pixels.
- The frame is passed through the model to generate a depth map.
- The depth map is visualized as a 3D surface plot using matplotlib.
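A sketch of this per-frame pipeline, assuming the model takes normalized (1, 3, 200, 200) tensors and that the incoming frame is an RGB NumPy array (as Gradio's webcam component provides):

```python
import cv2
import torch

def depth_from_frame(frame_rgb, model, device="cpu"):
    """Hypothetical per-frame inference: resize to 200x200, normalize,
    run the model, and return the depth map as a NumPy array."""
    frame = cv2.resize(frame_rgb, (200, 200))
    tensor = torch.from_numpy(frame).float().permute(2, 0, 1) / 255.0
    tensor = tensor.unsqueeze(0).to(device)          # (1, 3, 200, 200)
    model.eval()
    with torch.no_grad():
        depth = model(tensor)                        # (1, 1, 200, 200)
    return depth.squeeze().cpu().numpy()             # (200, 200)
```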
## 3D Visualization
The depth map is rendered as an interactive 3D surface, providing an intuitive representation of the scene's depth structure. The plot uses a viridis colormap to represent depth values, with warmer colors indicating closer objects and cooler colors for more distant ones.
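The rendering step might look like the following matplotlib sketch (figure size and labels are assumptions): `plot_surface` with the viridis colormap produces the 3D surface described above.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_depth_surface(depth_map):
    """Render an (H, W) depth map as a 3D surface using the viridis colormap."""
    h, w = depth_map.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(111, projection="3d")
    ax.plot_surface(xs, ys, depth_map, cmap="viridis")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("depth")
    return fig
```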
## Usage
To use this depth estimation tool:
- Ensure your webcam is connected and functioning.
- The interface will display your webcam feed and the corresponding 3D depth visualization in real-time.
- Move objects or your camera to see how the depth map changes dynamically.
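As a rough illustration, the Gradio wiring could resemble the sketch below, reusing the hypothetical `depth_from_frame` and `plot_depth_surface` helpers from earlier; the exact component arguments may differ across Gradio SDK versions.

```python
import gradio as gr

# `model` is assumed to be a loaded CompressedStudentModel checkpoint.
def process(frame):
    depth = depth_from_frame(frame, model)     # frame arrives as an RGB array
    return plot_depth_surface(depth)           # matplotlib figure for gr.Plot

demo = gr.Interface(
    fn=process,
    inputs=gr.Image(sources=["webcam"], streaming=True),
    outputs=gr.Plot(label="3D depth surface"),
    live=True,
)

if __name__ == "__main__":
    demo.launch()
```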
This project showcases the potential of compressed models and knowledge distillation for building efficient, real-time computer vision applications; the same approach could power depth sensing in Augmented Reality.