commit
app.py
CHANGED
@@ -66,7 +66,7 @@ def main():
  gr.Markdown("# Dataset")
  gr.Markdown("""
  - [Armenian Unified Exams](https://dimord.am/public/tests): collection of High School graduation test exams used in 2025 in Armenia. The highest achievable score per test is 20. The data is extracted from PDFs and manually prepared for LLM evaluation.
- - MMLU-Pro-Hy: a massive multi-task test in MCQA format, inspired by the original [MMLU benchmark](https://arxiv.org/abs/2406.01574), adapted for the Armenian language. Currently, a stratified sample is used for evaluation summing up to
+ - MMLU-Pro-Hy: a massive multi-task test in MCQA format, inspired by the original [MMLU benchmark](https://arxiv.org/abs/2406.01574), adapted for the Armenian language. Currently, a stratified sample is used for evaluation, summing up to 1000 questions in total. The Armenian version is generated through machine translation. The resulting dataset underwent extensive post-processing to ensure that a high-quality subsample is selected for evaluation.
  """
  )
  gr.Markdown("## Submission Guide")
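Note on the "stratified sample" wording in the new description: the commit does not say which variable the sample is stratified on or how the 1000 questions are allocated. The sketch below is one plausible reading, not the authors' method. It assumes each question carries a `category` field and uses proportional allocation with largest-remainder rounding; `stratified_sample`, `category`, and the pool sizes are all hypothetical names and numbers chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, total, seed=0):
    """Draw exactly `total` items, keeping each stratum's share of the pool.

    `key` maps an item to its stratum label (assumed here: a subject category).
    """
    if total > len(items):
        raise ValueError("total exceeds the number of available items")

    rng = random.Random(seed)

    # Group items by stratum.
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    # Proportional quota per stratum, rounded down first.
    n = len(items)
    exact = {k: total * len(v) / n for k, v in strata.items()}
    quota = {k: int(x) for k, x in exact.items()}

    # Hand the remaining slots to the strata with the largest fractional parts
    # so that the quotas sum exactly to `total`.
    leftover = total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1

    # Sample without replacement inside each stratum.
    sample = []
    for k, group in strata.items():
        sample.extend(rng.sample(group, quota[k]))
    return sample


# Hypothetical pool: 10,000 questions across three made-up categories.
pool = (
    [{"id": i, "category": "math"} for i in range(6000)]
    + [{"id": i, "category": "law"} for i in range(6000, 9000)]
    + [{"id": i, "category": "history"} for i in range(9000, 10000)]
)
subset = stratified_sample(pool, key=lambda q: q["category"], total=1000)
```

Fixing the seed keeps the evaluation subset reproducible across runs, which matters if leaderboard scores are to stay comparable between submissions.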