davidberenstein1957 committed on
Commit
0c350fd
·
1 Parent(s): 14b802d

fix: update InferBench description in app.py to emphasize speed instead of compression, and add a link to a related blog post for further information

Files changed (1): app.py +3 −1
app.py CHANGED
@@ -84,7 +84,7 @@ with gr.Blocks("ParityError/Interstellar", css=custom_css) as demo:
     We are Pruna AI, an open source AI optimisation engine and we simply make your models cheaper, faster, smaller, greener!

     # 📊 About InferBench
-    InferBench is a leaderboard for inference providers, focusing on cost, quality, and compression.
+    InferBench is a leaderboard for inference providers, focusing on cost, quality, and speed.
     Over the past few years, we’ve observed outstanding progress in image generation models fueled by ever-larger architectures.
     Due to their size, state-of-the-art models such as FLUX take more than 6 seconds to generate a single image on a high-end H100 GPU.
     While compression techniques can reduce inference time, their impact on quality often remains unclear.
@@ -96,6 +96,8 @@ with gr.Blocks("ParityError/Interstellar", css=custom_css) as demo:

     FLUX-juiced was obtained using a combination of compilation and caching algorithms and we are proud to say that it consistently outperforms alternatives, while delivering performance on par with the original model.
     This combination is available in our Pruna Pro package and can be applied to almost every image generation model.
+
+    A full blogpost on the method can be found [here](https://pruna.ai/blog/flux-juiced). # TODO: Add link
     """
     )
     with gr.Column(scale=1):
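The substance of the commit is a one-word edit to the description string passed to `gr.Markdown`. A minimal sanity check of the updated copy might look like the sketch below (plain Python, no Gradio required; the `ABOUT_INFERBENCH` variable name is illustrative, not taken from app.py):

```python
# Excerpt of the updated InferBench description committed to app.py.
# (ABOUT_INFERBENCH is an illustrative name, not the variable used in the app.)
ABOUT_INFERBENCH = (
    "# 📊 About InferBench\n"
    "InferBench is a leaderboard for inference providers, "
    "focusing on cost, quality, and speed.\n"
)

# The commit swaps "compression" for "speed" in the focus line.
assert "speed" in ABOUT_INFERBENCH
assert "compression" not in ABOUT_INFERBENCH
```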