Spaces: FQiao/SoundingStreet

FQiao committed 5 days ago

Commit 0d5b5bc (verified) · 1 Parent(s): 3324de2

Update README.md

Files changed (1)

README.md +12 -50

@@ -1,50 +1,12 @@
-**A training-free pipeline utilizing pre-trained generative models to synthesize sound for any street on Earth with available Street View panoramic images.**
-1.  Change to this directory:
-    ```
-    cd SoundingStreet
-    ```
-2. Create the conda environment:
-    ```
-    conda env create -f environment.yml
-    conda activate geosynthsound
-    ```
-3. Make sure to create necessary directories:
-    ```
-    mkdir -p logs output
-    ```
-4. Download checkpoint for depth estimator model:
-    ```
-    wget https://ommer-lab.com/files/depthfm/depthfm-v1.ckpt -P external_models/depth-fm/checkpoints/
-    ```
-5. Run the `SoundingStreet` demo:
-    ```
-    python main.py --panoramic --location "52.3436723,4.8529625"
-    ```
-    Intermediate files such as the downloaded panoramic image and perspective cut-outs can be found in `./logs/`, and output audios for each view as well as the composite audio for the location are saved as `./output/panoramic_composition.wav`
-## Acknowledgements
-- **InternVL2.5-8B-MPO**
-  For vision-language modeling, we employ InternVL2.5-8B-MPO, which is released under the MIT License.
-  GitHub: https://github.com/OpenGVLab/InternVL
-- **Grounding DINO**
-  We use Grounding DINO for open-set object detection. Grounding DINO is released under the Apache 2.0 License.
-  GitHub: https://github.com/IDEA-Research/GroundingDINO
-- **DepthFM**
-  We utilize the DepthFM model for monocular depth estimation. DepthFM is released under the MIT License.
-  GitHub: https://github.com/CompVis/depth-fm
-- **TangoFlux**
-  We incorporate TangoFlux for text-to-audio generation. TangoFlux is available for non-commercial research use only and is subject to the Stability AI Community License, WavCaps license, and the original licenses of the datasets used in training.
-  GitHub: https://github.com/declare-lab/TangoFlux
-Our repository's license and usage terms adhere to the respective licenses of these models.

+---
+title: SoundingStreet
+emoji: 🚀
+colorFrom: blue
+colorTo: pink
+sdk: gradio
+sdk_version: 5.21.0
+app_file: app.py
+pinned: false
+license: mit
+short_description: Sound Generation
+---