---
title: SE-Arena
emoji: 🛠️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
hf_oauth: true
pinned: false
short_description: The chatbot arena for software engineering
---

# SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering

Welcome to **SE Arena**, an open-source platform designed for evaluating software engineering-focused foundation models (FMs), particularly large language models (LLMs). SE Arena benchmarks models in the iterative, context-rich workflows characteristic of software engineering (SE) tasks.

## Key Features

- **Multi-Round Conversational Workflows**: Evaluate models through extended, context-dependent interactions that mirror real-world SE processes.
- **RepoChat Integration**: Automatically inject repository context (issues, commits, PRs) into conversations for more realistic evaluations.
- **Advanced Evaluation Metrics**: Assess models using a comprehensive suite of metrics, including:
  - Traditional metrics: Elo score and average win rate
  - Network-based metrics: eigenvector centrality and PageRank score
  - Community detection: Newman modularity score
  - Consistency score: quantifies model determinism and reliability through self-play matches
- **Transparent, Open-Source Leaderboard**: View real-time model rankings across diverse SE workflows with full transparency.
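To illustrate how pairwise votes feed a leaderboard, here is a minimal Elo-update sketch. The function name, starting rating, and K-factor are illustrative choices, not SE Arena's actual implementation:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Update Elo ratings for models A and B after one head-to-head vote.

    score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Two models start at 1000; A wins the first vote.
ra, rb = elo_update(1000, 1000, 1.0)  # -> (1016.0, 984.0)
```

Between equally rated models the winner gains exactly `k / 2` points; as a model's rating grows, beating weaker opponents yields ever-smaller gains, which is why the leaderboard complements Elo with the network-based metrics above.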

## Why SE Arena?

Existing evaluation frameworks (such as Chatbot Arena, WebDev Arena, and Copilot Arena) often fail to capture the complex, iterative nature of SE tasks. SE Arena fills these gaps by:

- Supporting context-rich, multi-turn evaluations that capture iterative workflows
- Integrating repository-level context through RepoChat to simulate real-world development scenarios
- Providing multidimensional metrics for nuanced model comparisons
- Covering the full breadth of SE tasks, not just code generation
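One reason network-based metrics add nuance over raw win rate: they credit a model not just for how often it wins, but for *whom* it beats. A self-contained PageRank-style sketch over a hypothetical win graph (the model names, vote data, and damping factor are made up for illustration; this is not SE Arena's code):

```python
def pagerank(wins, d=0.85, iters=100):
    """Rank models from pairwise vote outcomes.

    Each loss is an endorsement edge loser -> winner, so beating
    strong models transfers more ranking mass than beating weak ones.

    wins: list of (winner, loser) tuples.
    """
    nodes = sorted({m for pair in wins for m in pair})
    out = {m: [] for m in nodes}
    for winner, loser in wins:
        out[loser].append(winner)

    rank = {m: 1.0 / len(nodes) for m in nodes}
    for _ in range(iters):
        new = {m: (1 - d) / len(nodes) for m in nodes}
        for m in nodes:
            if out[m]:
                share = d * rank[m] / len(out[m])
                for w in out[m]:
                    new[w] += share
            else:
                # Dangling node (never lost): spread its mass evenly.
                for n in nodes:
                    new[n] += d * rank[m] / len(nodes)
        rank = new
    return rank

votes = [("A", "B"), ("A", "C"), ("B", "C")]
scores = pagerank(votes)
# "A" never loses and "B" beats "C", so A ranks above B above C.
```

A model with a mediocre win rate can still rank highly here if its few wins came against top-ranked opponents, which is exactly the nuance a flat average win rate misses.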

## How It Works

1. **Submit a Prompt**: Sign in and enter your SE-related task (optionally include a repository URL for RepoChat context)
2. **Compare Responses**: Two anonymous models respond to your query
3. **Continue the Conversation**: Test contextual understanding over multiple rounds
4. **Vote**: Choose the better model at any point, with the ability to reassess after multiple turns

## Getting Started

### Prerequisites

- A [Hugging Face](https://huggingface.co) account
- Basic understanding of software engineering workflows

### Usage

1. Navigate to the [SE Arena platform](https://huggingface.co/spaces/SE-Arena/Software-Engineering-Arena)
2. Sign in with your Hugging Face account
3. Enter your SE task prompt (optionally include a repository URL for RepoChat)
4. Engage in multi-round interactions and vote on model performance

## Contributing

We welcome contributions from the community! Here's how you can help:

1. **Submit SE Tasks**: Share your real-world SE problems to enrich our evaluation dataset
2. **Report Issues**: Found a bug or have a feature request? Open an issue in this repository
3. **Enhance the Codebase**: Fork the repository, make your changes, and submit a pull request

## Privacy Policy

Your interactions are anonymized and used solely for improving SE Arena and FM benchmarking. By using SE Arena, you agree to our Terms of Service.

## Future Plans

- **Analysis of Real-World SE Workloads**: Identify common patterns and challenges in user-submitted tasks
- **Multi-Round Evaluation Metrics**: Develop specialized metrics for assessing model adaptation over successive turns
- **Enhanced Community Engagement**: Enable broader participation through voting and contributions
- **Expanded FM Coverage**: Include domain-specific and multimodal foundation models
- **Advanced Context Compression**: Integrate techniques like LongRoPE and SelfExtend to manage long-term memory

## Contact

For inquiries or feedback, please [open an issue](https://github.com/SE-Arena/Software-Engineering-Arena/issues/new) in this repository. We welcome your contributions and suggestions!