We have published the following linguistic datasets to date. All datasets except MECORE-EN annotate
semantic properties of lexical items in a specific category, based on original elicited judgments. MECORE-EN
instead records naturally occurring examples of clausal embedding in English, annotated with their syntactic
specifications. Please refer to the associated publications for details.

| Name | Empirical Domain | Sample Languages | Data Source | Publication |
| --- | --- | --- | --- | --- |
| MultiCoS | connectives | 24 languages | Elicitation | LREC 2026 |
| MECORE-EN | clause-embedding predicates | English | Web-crawled corpora | SCiL 2025 |
| LiSU-Modals | modal auxiliaries | 24 languages | Elicitation | Linguistic Variation 2024 |
| MECORE-XLing | clause-embedding predicates | 14 languages | Elicitation | SigTyp 2023 |