Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
This corpus contains sound recordings and transcripts of two dialects of an Australian Aboriginal language, Jaminjung and Ngaliwurru. The materials were recorded and transcribed by Eva Schultze-Berndt between 1993 and 1998. This subcorpus contains transcripts, annotations and recordings of Jaminjung data. This file giv…
The main purpose of this archive is the documentation of Trumai, a genetically isolated language spoken in Brazil (Xingu reserve). Trumai is an endangered language, with a reduced number of speakers. The archive has linguistic and non-linguistic materials, as well as some studies about the Trumai language and culture. T…
This archive has been created for the documentation of the Savosavo language together with the people of Savo Island and the Florida Islands, Central Province, Solomon Islands. The corpus, which is still under construction, contains data on two neighboring but unrelated languages, Savosavo and Gela. In addition, it pre…
The DK-CLARIN Reference Corpus of General Danish was collected as part of the DK-CLARIN project, WP2.1, 2008 - 2011. All texts are in XML TEIP5 format (TEIP5DKCLARIN-format), with tokenisation, ePOS-tagging, sentence and paragraph segmentation, and lemmatisation. The corpus comprises 45,113,245 words.
Texts in the Health and Medicine Domain come from netpatient.dk, Søfartsstyrelsen, Sundhedsstyrelsen, regionH, Libris, Aktuel Naturvidenskab and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 3,972,573 words in 3273 files. Communicative setting/Number of files: expert->expe…
Texts in the Agriculture domain come from Danmarks JordbrugsForskning and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 2,376,029 words in 216 files. Communicative setting/Number of files: expert->expert (45) expert->advanced (24) expert->basic (142) advanced->basic (5). …
Texts in the Environment Domain come from Hovedland, Danske Miljøundersøgelser, Det Økologiske Råd and Aktuel Naturvidenskab (via DMI). The corpus consists of 1,478,298 words in 93 files. Communicative setting/Number of files: expert->expert (2) expert->advanced (23) expert->basic (68). All texts are in XML TEIP5 fo…
Texts in the IT Domain come from Libris, Open Office, Aktuel Naturvidenskab and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 1,101,059 words in 66 files. Communicative setting/Number of files: expert->advanced (5) expert->basic (61). All texts are in XML TEIP5 format (TE…
The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, of the period 2002-2010 from 12 of the biggest Danish companies. All texts are in XML TEIP5 format (TEIP5DKCLARIN-format), with tokenisation, pos-tagging, sentence and pa…
The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different gran…