awacke1 committed on
Commit
329f9c6
1 Parent(s): f21323a

Update app.py

Files changed (1)
  1. app.py +12 -19
app.py CHANGED
@@ -62,34 +62,27 @@ Here is an outline of some of the most exciting recent developments in AI:
62
 
63
  - 📚 Datasets:
64
 
65
- - Universal Dependencies: A collection of annotated corpora for natural language processing in a range of languages, with a focus on dependency parsing.
 
 
66
  - [Universal Dependencies official website.](https://universaldependencies.org/)
67
-
68
- - WMT 2014: The fourth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
69
  - [WMT14 website.](http://www.statmt.org/wmt14/)
70
-
71
- - The Pile: An English language corpus of diverse text, sourced from various places on the internet.
72
  - [The Pile official website.](https://pile.eleuther.ai/)
73
-
74
- - HumanEval: A dataset of English sentences, annotated with human judgments on a range of linguistic qualities.
75
  - [HumanEval: An Evaluation Benchmark for Language Understanding](https://github.com/google-research-datasets/humaneval) by Gabriel Ilharco, Daniel Loureiro, Pedro Rodriguez, and Afonso Mendes.
76
-
77
- - FLORES-101: A dataset of parallel sentences in 101 languages, designed for multilingual machine translation.
78
  - [FLORES-101: A Massively Multilingual Parallel Corpus for Language Understanding](https://flores101.opennmt.net/) by Aman Madaan, Shruti Rijhwani, Raghav Gupta, and Mitesh M. Khapra.
79
-
80
- - CrowS-Pairs: A dataset of sentence pairs, designed for evaluating the plausibility of generated text.
81
  - [CrowS-Pairs: A Challenge Dataset for Plausible Plausibility Judgments](https://github.com/stanford-cogsci/crows-pairs) by Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Pascale Fung, and Caiming Xiong.
82
-
83
- - WikiLingua: A dataset of parallel sentences in 75 languages, sourced from Wikipedia.
84
  - [WikiLingua: A New Benchmark Dataset for Cross-Lingual Wikification](https://arxiv.org/abs/2105.08031) by Jiarui Yao, Yanqiao Zhu, Ruihan Bao, Guosheng Lin, Lidong Bing, and Bei Shi.
85
-
86
- - MTEB: A dataset of English sentences, annotated with their entailment relationships with respect to other sentences.
87
  - [Multi-Task Evaluation Benchmark for Natural Language Inference](https://github.com/google-research-datasets/mteb) by Michał Lukasik, Marcin Junczys-Dowmunt, and Houda Bouamor.
88
-
89
- - xP3: A dataset of English sentences, annotated with their paraphrase relationships with respect to other sentences.
90
  - [xP3: A Large-Scale Evaluation Benchmark for Paraphrase Identification in Context](https://github.com/nyu-dl/xp3) by Aniket Didolkar, James Mayfield, Markus Saers, and Jason Baldridge.
91
-
92
- - DiaBLa: A dataset of English dialogue, annotated with dialogue acts.
93
  - [A Large-Scale Corpus for Conversation Disentanglement](https://github.com/HLTCHKUST/DiaBLA) by Samuel Broscheit, António Branco, and André F. T. Martins.
94
 
95
 
 
62
 
63
  - 📚 Datasets:
64
 
65
+ **Datasets:**
66
+
67
+ 1. - **Universal Dependencies:** A collection of annotated corpora for natural language processing in a range of languages, with a focus on dependency parsing.
68
  - [Universal Dependencies official website.](https://universaldependencies.org/)
69
+ 2. - **WMT 2014:** The fourth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
 
70
  - [WMT14 website.](http://www.statmt.org/wmt14/)
71
+ 3. - **The Pile:** An English language corpus of diverse text, sourced from various places on the internet.
 
72
  - [The Pile official website.](https://pile.eleuther.ai/)
73
+ 4. - **HumanEval:** A dataset of English sentences, annotated with human judgments on a range of linguistic qualities.
 
74
  - [HumanEval: An Evaluation Benchmark for Language Understanding](https://github.com/google-research-datasets/humaneval) by Gabriel Ilharco, Daniel Loureiro, Pedro Rodriguez, and Afonso Mendes.
75
+ 5. - **FLORES-101:** A dataset of parallel sentences in 101 languages, designed for multilingual machine translation.
 
76
  - [FLORES-101: A Massively Multilingual Parallel Corpus for Language Understanding](https://flores101.opennmt.net/) by Aman Madaan, Shruti Rijhwani, Raghav Gupta, and Mitesh M. Khapra.
77
+ 6. - **CrowS-Pairs:** A dataset of sentence pairs, designed for evaluating the plausibility of generated text.
 
78
  - [CrowS-Pairs: A Challenge Dataset for Plausible Plausibility Judgments](https://github.com/stanford-cogsci/crows-pairs) by Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Pascale Fung, and Caiming Xiong.
79
+ 7. - **WikiLingua:** A dataset of parallel sentences in 75 languages, sourced from Wikipedia.
 
80
  - [WikiLingua: A New Benchmark Dataset for Cross-Lingual Wikification](https://arxiv.org/abs/2105.08031) by Jiarui Yao, Yanqiao Zhu, Ruihan Bao, Guosheng Lin, Lidong Bing, and Bei Shi.
81
+ 8. - **MTEB:** A dataset of English sentences, annotated with their entailment relationships with respect to other sentences.
 
82
  - [Multi-Task Evaluation Benchmark for Natural Language Inference](https://github.com/google-research-datasets/mteb) by Michał Lukasik, Marcin Junczys-Dowmunt, and Houda Bouamor.
83
+ 9. - **xP3:** A dataset of English sentences, annotated with their paraphrase relationships with respect to other sentences.
 
84
  - [xP3: A Large-Scale Evaluation Benchmark for Paraphrase Identification in Context](https://github.com/nyu-dl/xp3) by Aniket Didolkar, James Mayfield, Markus Saers, and Jason Baldridge.
85
+ 10. - **DiaBLa:** A dataset of English dialogue, annotated with dialogue acts.
 
86
  - [A Large-Scale Corpus for Conversation Disentanglement](https://github.com/HLTCHKUST/DiaBLA) by Samuel Broscheit, António Branco, and André F. T. Martins.
87
 
88