Voice-over speed is way too fast
Most text fed to the model is read out extremely fast (roughly 1.5x-3x normal speed).
Changing the speed parameter to 0.8 doesn't do much.
Are there special instructions for controlling this?
Tried locally with an RTX 4090; same issue as on the HF demo.
Thanks for the interest. This is a known issue with our model. It will need some more research to fix.
I recommend opening an issue or putting it on the feature list.
I spent some time on this and saw the same issue in videos people share when showcasing the model, but I haven't found anything indicating it's a known problem.
Anyway, thanks for the model. This is an amazing step forward, and if the speed issue is fixed, it's going to get major adoption [one person waiting for it ;)].
Same here, the output speed makes it unusable for my use cases. Although the speed can be corrected in post-processing (see the sketch below), it seems to vary within a single inference run, so a uniform fix doesn't fully work.
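For reference, here's the kind of post-processing fix I mean: a minimal sketch assuming the output is a WAV file and the read speed is roughly constant (which, as noted, it often isn't). It uses librosa's phase-vocoder time stretch, which changes duration without shifting pitch; the 1.5 factor is just an example.

```python
import librosa
import soundfile as sf

# Load the generated audio at its native sample rate.
y, sr = librosa.load("tts_output.wav", sr=None)

# If the model reads ~1.5x too fast, stretch by 1/1.5 to compensate.
# rate < 1 slows the audio down without changing pitch.
slowed = librosa.effects.time_stretch(y, rate=1 / 1.5)

sf.write("tts_output_slowed.wav", slowed, sr)
```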
Is it because of the max length? It's capped at around 3000 tokens, but I thought that was enough for longer text.
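If the cap is the culprit, one workaround might be to keep each call well under it by synthesizing sentence-sized chunks and concatenating the audio. This is purely a hypothetical sketch: `model.generate` and its return type are my assumptions, not this model's actual API.

```python
import re
import numpy as np

def synthesize_long(model, text):
    """Synthesize long text in sentence-sized chunks to stay under the token cap."""
    # Naive sentence split; a tokenizer-aware chunker would be safer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # `model.generate(sentence) -> np.ndarray` is a hypothetical interface.
    chunks = [model.generate(s) for s in sentences if s]
    return np.concatenate(chunks)
```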