FQiao committed on
Commit 0d5b5bc · verified · 1 Parent(s): 3324de2

Update README.md

Files changed (1)
  1. README.md +12 -50
README.md CHANGED
@@ -1,50 +1,12 @@
- **A training-free pipeline utilizing pre-trained generative models to synthesize sound for any street on Earth with available Street View panoramic images.**
-
- 1. Change to this directory:
- ```
- cd SoundingStreet
- ```
-
- 2. Create the conda environment:
- ```
- conda env create -f environment.yml
- conda activate geosynthsound
- ```
-
- 3. Make sure to create necessary directories:
- ```
- mkdir -p logs output
- ```
-
- 4. Download checkpoint for depth estimator model:
- ```
- wget https://ommer-lab.com/files/depthfm/depthfm-v1.ckpt -P external_models/depth-fm/checkpoints/
- ```
-
- 5. Run the `SoundingStreet` demo:
- ```
- python main.py --panoramic --location "52.3436723,4.8529625"
- ```
- Intermediate files such as the downloaded panoramic image and perspective cut-outs can be found in `./logs/`, and output audios for each view as well as the composite audio for the location are saved as `./output/panoramic_composition.wav`
-
-
- ## Acknowledgements
-
- - **InternVL2.5-8B-MPO**
- For vision-language modeling, we employ InternVL2.5-8B-MPO, which is released under the MIT License.
- GitHub: https://github.com/OpenGVLab/InternVL
-
- - **Grounding DINO**
- We use Grounding DINO for open-set object detection. Grounding DINO is released under the Apache 2.0 License.
- GitHub: https://github.com/IDEA-Research/GroundingDINO
-
- - **DepthFM**
- We utilize the DepthFM model for monocular depth estimation. DepthFM is released under the MIT License.
- GitHub: https://github.com/CompVis/depth-fm
-
- - **TangoFlux**
- We incorporate TangoFlux for text-to-audio generation. TangoFlux is available for non-commercial research use only and is subject to the Stability AI Community License, WavCaps license, and the original licenses of the datasets used in training.
- GitHub: https://github.com/declare-lab/TangoFlux
-
-
- Our repository's license and usage terms adhere to the respective licenses of these models.
+ ---
+ title: SoundingStreet
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 5.21.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ short_description: Sound Generation
+ ---