TITLE = """<h1 align="center" id="space-title">FinanceBench Leaderboard</h1>""" | |
INTRODUCTION_TEXT = """ | |
FinanceBench is a benchmark which aims at evaluating next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc). | |
## Leaderboard | |
Submission made by our team are labelled "financebench". While we report average scores over different runs when possible in our paper, we only report the best run in the leaderboard. | |
See below for submissions. | |
""" | |
SUBMISSION_TEXT = """ | |
## Submissions | |
Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split. | |
Each question calls for an answer that is either a string (one or a few words), a number, or a comma separated list of strings or floats, unless specified otherwise. There is only one correct answer. | |
Hence, evaluation is done via quasi exact match between a model’s answer and the ground truth (up to some normalization that is tied to the “type” of the ground truth). | |
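For intuition, here is a hypothetical sketch of such a normalization (the function names and exact rules are illustrative, not the official scorer):

```python
def normalize(value: str) -> str:
    # Illustrative only: lowercase, trim, and drop characters that the
    # answer format forbids anyway ($, %, thousands separators).
    v = value.strip().lower()
    for ch in ("$", "%", ","):
        v = v.replace(ch, "")
    return v.strip()

def quasi_exact_match(prediction: str, truth: str) -> bool:
    # Compare a model answer to the ground truth after normalization.
    return normalize(prediction) == normalize(truth)
```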
In our evaluation, we use a system prompt to instruct the model about the required format:
```
You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER]. YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise. If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
```
We advise you to use the system prompt provided in the paper to ensure your agents answer in the expected format. In practice, GPT-4-level models follow it easily.
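As an illustration, a minimal (unofficial) way to pull the final answer out of a raw model output that follows this template:

```python
def extract_final_answer(text: str) -> str:
    # The template ends with "FINAL ANSWER: ..."; take the last occurrence
    # so earlier mentions in the reasoning do not confuse the parser.
    marker = "FINAL ANSWER:"
    idx = text.rfind(marker)
    if idx == -1:
        return text.strip()  # fall back to the raw output
    return text[idx + len(marker):].strip()
```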
We expect submissions to be JSON Lines files with the following format; the first two fields are mandatory, and `reasoning_trace` is optional:
```
{"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
{"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
```
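Before uploading, you can sanity-check your file with a small script along these lines (`check_submission` is a hypothetical helper, not part of our tooling):

```python
import json

def check_submission(path: str) -> int:
    # Every line must be valid JSON carrying the two mandatory fields;
    # `reasoning_trace` may be absent. Returns the number of records checked.
    n = 0
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            record = json.loads(line)
            for field in ("task_id", "model_answer"):
                if field not in record:
                    raise ValueError(f"line {i}: missing mandatory field {field!r}")
            n += 1
    return n
```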
""" | |
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" | |
CITATION_BUTTON_TEXT = r"""@misc{ | |
}""" | |
def format_error(msg):
    return f"<p style='color: red; font-size: 20px; text-align: center;'>{msg}</p>"


def format_warning(msg):
    return f"<p style='color: orange; font-size: 20px; text-align: center;'>{msg}</p>"


def format_log(msg):
    return f"<p style='color: green; font-size: 20px; text-align: center;'>{msg}</p>"


def model_hyperlink(link, model_name):
    return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline; text-decoration-style: dotted;">{model_name}</a>'