[FEEDBACK] Inference Providers

#49
by julien-c - opened
Hugging Face org

Any inference provider you love, and that you'd like to be able to access directly from the Hub?

Hugging Face org
edited Jan 28

Love that I can call DeepSeek R1 directly from the Hub 🔥

```
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
```

Is it possible to set a monthly payment budget or rate limits for all the external providers? I don't see such options in the billing tab. In case a key or session token is stolen, it could be quite dangerous for my thin wallet :(

Hugging Face org

@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000), but we'll add spending limits in the future

Thanks for your quick reply, good to know!

Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...

Could be good to add featherless.ai

TitanML !!

Just signed up with HF and had some questions for the general community to help us get started. We plan to use the Cerebras Inference Provider using direct calls rather than routing through HF itself.

With a Pro subscription, are there any limits to token usage or queuing constraints when using a custom API key and direct calls? The free tier on Cerebras did have such constraints.

Thanks in advance

Hey all, I'd like to make nCompass (https://docs.ncompass.tech/api-reference/quickstart) an inference provider on HF. We build GPU optimizations to be able to support an API without rate limits by maximizing GPU utilization. I would really appreciate it if someone could help us with the process of becoming an inference provider.

Hi, I have a problem using smolagents' HfApiModel inference. I noticed that even though I belong to an enterprise organization, the Inference API uses the credits of my free account

screenshot 1.png

and not those of the organization

screenshot 2.png

yet I read here (https://huggingface.co./docs/inference-providers/en/pricing)
that it should automatically use those of the organization.
I also wrote here (https://discuss.huggingface.co/t/hugging-face-payment-error-402-youve-exceeded-monthly-quota/144968/10?u=alexman83) and it seems there is a bug...
How can we solve it? Using this service is important for our company...

Thank you for your help!

Hugging Face org

With a Pro subscription, are there any limits to token usage or queuing constraints when using a custom API key and direct calls? The free tier on Cerebras did have such constraints.

@sh8459131 When using a custom key, requests are forwarded to Cerebras directly, so their limits will apply
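
For illustration, a minimal sketch of such a direct call through huggingface_hub (the placeholder key and the model name are assumptions; with your own Cerebras key the request goes straight to Cerebras, so only their plan limits apply):

```
from huggingface_hub import InferenceClient

# Passing your own Cerebras API key (placeholder below) forwards the
# request directly to Cerebras, so their plan limits apply, not HF's.
client = InferenceClient(
    provider="cerebras",
    api_key="csk-xxxxxxxxxxxxxxxxxxxxxxxx"
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed example of a Cerebras-served model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100
)

print(completion.choices[0].message)
```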

Hugging Face org

@alexman83 can you share some sample code you're using? We might need to update smolagents to expose the new bill_to parameter. cc @albertvillanova for viz

@julien-c of course!
Thanks!

```
from smolagents import CodeAgent, HfApiModel
from extraction_smolagents.custom_tools import CSVRetrieverTool

from huggingface_hub import login
login()

prompt_template = """
# Prompt for Analyzing and Extracting Topics from a User Request
You are an expert analyst of television content. You must extract a list of topics from a user request as follows.

## Step 1: Analyze the user request
- From the user request, given after the word 'richiesta', understand which topics the user is interested in

## Step 2: Compare the extracted themes with the provided ones
- Read the csv file 'topics_info.csv', which contains the topic name (name column), the representative words (representation column) and the relevant documents (representative_docs)
- Compare the representative words of the various topics with the themes extracted in Step 1 and keep only the rows of the topics that satisfy this requirement
- Now, for the topics kept in the previous step, analyze the representative documents, check which ones are similar to the themes extracted from the user request in Step 1, and keep them

## Step 3: Output generation
Generate a json file containing:
- the list of extracted topics, using the value of the name column
- the reason why they were chosen

Organize the json file as in the following example:

json
{
    "topics": [<topic_1>, <topic_2>, <topic_3>],
    "motivazione": <motivazione>
}
"""

retriever = CSVRetrieverTool()
llm_model = HfApiModel(model_id='Qwen/Qwen2.5-Coder-32B-Instruct')
agent = CodeAgent(
    tools=[retriever],
    model=llm_model,
    verbosity_level=2,
    additional_authorized_imports=['pandas']
)

question = prompt_template + '\n' + "I want music-related topics"
answer = agent.run(question)
print(f"Answer: {answer}")
```

This is the custom class for reading the CSV:

```
from smolagents import Tool
import pandas as pd

class CSVRetrieverTool(Tool):
    
    name = "csv_retriever"
    description = "Uses the provided path to access a csv file using pandas dataframe"
    inputs = {
        "path": {
            "type": "string",
            "description": "The path containing the filename of the csv to read",
        }
    }
    output_type = "string"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def forward(self, path) -> str:
        df = pd.read_csv(path)
        return df.to_string()
```
Hugging Face org

@alexman83 Merve ( @merve ) opened https://github.com/huggingface/smolagents/pull/1260 which will expose the bill_to param in smolagents' InferenceClient 🔥

Hugging Face org

(you'll need to upgrade your smolagents version)
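
Until that lands, a minimal sketch of the same workaround with huggingface_hub directly, assuming a hypothetical organization name my-company; the bill_to argument is the parameter mentioned above and charges the request to the organization rather than the personal account:

```
from huggingface_hub import InferenceClient

# bill_to routes usage to the organization's billing instead of the
# personal account; "my-company" is a placeholder org name.
client = InferenceClient(bill_to="my-company")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=50
)

print(completion.choices[0].message)
```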
