# HTML heading used as the page title.
TITLE = """<h1 align="center" id="space-title">KUMO Benchmark</h1>"""

DESCRIPTION = """
## Generative Evaluation of Complex Reasoning in Large Language Models

✨ KUMO is a novel benchmark designed to systematically evaluate the complex reasoning capabilities of Large Language Models (LLMs) through procedurally generated reasoning games. Explore the limits of LLM reasoning and track model performance on our interactive leaderboard.


"""

ABOUT = """

## About KUMO Benchmark

KUMO is a novel benchmark designed to systematically evaluate the complex reasoning capabilities of Large Language Models (LLMs) through procedurally generated reasoning games. Explore the limits of LLM reasoning and track model performance on our interactive leaderboard.


"""