Update app.py
app.py
CHANGED
@@ -62,34 +62,27 @@ Here is an outline of some of the most exciting recent developments in AI:
|
|
62 |
|
63 |
- 📚 Datasets:
|
64 |
|
65 |
-
|
|
|
|
|
66 |
- [Universal Dependencies official website.](https://universaldependencies.org/)
|
67 |
-
|
68 |
-
- WMT 2014: The fourth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
|
69 |
- [WMT14 website.](http://www.statmt.org/wmt14/)
|
70 |
-
|
71 |
-
- The Pile: An English language corpus of diverse text, sourced from various places on the internet.
|
72 |
- [The Pile official website.](https://pile.eleuther.ai/)
|
73 |
-
|
74 |
-
- HumanEval: A dataset of English sentences, annotated with human judgments on a range of linguistic qualities.
|
75 |
- [HumanEval: An Evaluation Benchmark for Language Understanding](https://github.com/google-research-datasets/humaneval) by Gabriel Ilharco, Daniel Loureiro, Pedro Rodriguez, and Afonso Mendes.
|
76 |
-
|
77 |
-
- FLORES-101: A dataset of parallel sentences in 101 languages, designed for multilingual machine translation.
|
78 |
- [FLORES-101: A Massively Multilingual Parallel Corpus for Language Understanding](https://flores101.opennmt.net/) by Aman Madaan, Shruti Rijhwani, Raghav Gupta, and Mitesh M. Khapra.
|
79 |
-
|
80 |
-
- CrowS-Pairs: A dataset of sentence pairs, designed for evaluating the plausibility of generated text.
|
81 |
- [CrowS-Pairs: A Challenge Dataset for Plausible Plausibility Judgments](https://github.com/stanford-cogsci/crows-pairs) by Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Pascale Fung, and Caiming Xiong.
|
82 |
-
|
83 |
-
- WikiLingua: A dataset of parallel sentences in 75 languages, sourced from Wikipedia.
|
84 |
- [WikiLingua: A New Benchmark Dataset for Cross-Lingual Wikification](https://arxiv.org/abs/2105.08031) by Jiarui Yao, Yanqiao Zhu, Ruihan Bao, Guosheng Lin, Lidong Bing, and Bei Shi.
|
85 |
-
|
86 |
-
- MTEB: A dataset of English sentences, annotated with their entailment relationships with respect to other sentences.
|
87 |
- [Multi-Task Evaluation Benchmark for Natural Language Inference](https://github.com/google-research-datasets/mteb) by Michał Lukasik, Marcin Junczys-Dowmunt, and Houda Bouamor.
|
88 |
-
|
89 |
-
- xP3: A dataset of English sentences, annotated with their paraphrase relationships with respect to other sentences.
|
90 |
- [xP3: A Large-Scale Evaluation Benchmark for Paraphrase Identification in Context](https://github.com/nyu-dl/xp3) by Aniket Didolkar, James Mayfield, Markus Saers, and Jason Baldridge.
|
91 |
-
|
92 |
-
- DiaBLa: A dataset of English dialogue, annotated with dialogue acts.
|
93 |
- [A Large-Scale Corpus for Conversation Disentanglement](https://github.com/HLTCHKUST/DiaBLA) by Samuel Broscheit, António Branco, and André F. T. Martins.
|
94 |
|
95 |
|
|
|
62 |
|
63 |
- 📚 Datasets:
|
64 |
|
65 |
+
**Datasets:**
|
66 |
+
|
67 |
+
1. - **Universal Dependencies:** A collection of annotated corpora for natural language processing in a range of languages, with a focus on dependency parsing.
|
68 |
- [Universal Dependencies official website.](https://universaldependencies.org/)
|
69 |
+
2. - **WMT 2014:** The fourth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
|
|
|
70 |
- [WMT14 website.](http://www.statmt.org/wmt14/)
|
71 |
+
3. - **The Pile:** An English language corpus of diverse text, sourced from various places on the internet.
|
|
|
72 |
- [The Pile official website.](https://pile.eleuther.ai/)
|
73 |
+
4. - **HumanEval:** A dataset of English sentences, annotated with human judgments on a range of linguistic qualities.
|
|
|
74 |
- [HumanEval: An Evaluation Benchmark for Language Understanding](https://github.com/google-research-datasets/humaneval) by Gabriel Ilharco, Daniel Loureiro, Pedro Rodriguez, and Afonso Mendes.
|
75 |
+
5. - **FLORES-101:** A dataset of parallel sentences in 101 languages, designed for multilingual machine translation.
|
|
|
76 |
- [FLORES-101: A Massively Multilingual Parallel Corpus for Language Understanding](https://flores101.opennmt.net/) by Aman Madaan, Shruti Rijhwani, Raghav Gupta, and Mitesh M. Khapra.
|
77 |
+
6. - **CrowS-Pairs:** A dataset of sentence pairs, designed for evaluating the plausibility of generated text.
|
|
|
78 |
- [CrowS-Pairs: A Challenge Dataset for Plausible Plausibility Judgments](https://github.com/stanford-cogsci/crows-pairs) by Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Pascale Fung, and Caiming Xiong.
|
79 |
+
7. - **WikiLingua:** A dataset of parallel sentences in 75 languages, sourced from Wikipedia.
|
|
|
80 |
- [WikiLingua: A New Benchmark Dataset for Cross-Lingual Wikification](https://arxiv.org/abs/2105.08031) by Jiarui Yao, Yanqiao Zhu, Ruihan Bao, Guosheng Lin, Lidong Bing, and Bei Shi.
|
81 |
+
8. - **MTEB:** A dataset of English sentences, annotated with their entailment relationships with respect to other sentences.
|
|
|
82 |
- [Multi-Task Evaluation Benchmark for Natural Language Inference](https://github.com/google-research-datasets/mteb) by Michał Lukasik, Marcin Junczys-Dowmunt, and Houda Bouamor.
|
83 |
+
9. - **xP3:** A dataset of English sentences, annotated with their paraphrase relationships with respect to other sentences.
|
|
|
84 |
- [xP3: A Large-Scale Evaluation Benchmark for Paraphrase Identification in Context](https://github.com/nyu-dl/xp3) by Aniket Didolkar, James Mayfield, Markus Saers, and Jason Baldridge.
|
85 |
+
10. - **DiaBLa:** A dataset of English dialogue, annotated with dialogue acts.
|
|
|
86 |
- [A Large-Scale Corpus for Conversation Disentanglement](https://github.com/HLTCHKUST/DiaBLA) by Samuel Broscheit, António Branco, and André F. T. Martins.
|
87 |
|
88 |
|