Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
This corpus contains sound recordings and transcripts of two dialects of an Australian Aboriginal language, Jaminjung and Ngaliwurru. The materials were recorded and transcribed by Eva Schultze-Berndt between 1993 and 1998. This subcorpus contains transcripts, annotations and recordings of Jaminjung data. This file giv…
The main purpose of this archive is the documentation of Trumai, a genetically isolated language spoken in Brazil (Xingu reserve). Trumai is an endangered language, with a reduced number of speakers. The archive has linguistic and non-linguistic materials, as well as some studies about the Trumai language and culture. T…
This archive has been created for the documentation of the Savosavo language together with the people of Savo Island and the Florida Islands, Central Province, Solomon Islands. The corpus, which is still under construction, contains data on two neighboring but unrelated languages, Savosavo and Gela. In addition, it pre…
The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. The treebank is available in three versions: A downloadable version in conllx format, a searchable version in the search inter…
The SemDax Corpus is a Danish human-annotated corpus relying on the combined wordnet and dictionary resources: DanNet and Den Danske Ordbog, and available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different gran…
Texts in the IT Domain come from Libris, Open Office, Aktuel Naturvidenskab and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 1,101,059 words in 66 files. Communicative setting/Number of files: expert->advanced (5) expert->basic (61). All texts are in XML TEIP5 format (TE…
The corpus consists of press releases from the European Commission Press Release Database (Rapid) harvested in 2009 (http://europa.eu/rapid/search.htm). Each of the 5330 press releases (files) exists in Danish, English and German with approx. 3,000,000 words for each language. All texts are in XML TEIP5 format (TEIP5DKCLA…
Texts in the Health and Medicine Domain come from netpatient.dk, Søfartsstyrelsen, Sundhedsstyrelsen, regionH, Libris, Aktuel Naturvidenskab and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 3,972,573 words in 3273 files. Communicative setting/Number of files: expert->expe…
Texts in the Nanotechnology domain come from iNano (Interdisciplinary Nanoscience Center, AU), Nano (DTU), Niels Bohr Institutet, Forskningscenter Risø, Ministeriet for Sundhed og Forebyggelse (via DTU), Miljøstyrelsen, Aktuel Naturvidenskab and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The co…
Texts in the Agriculture domain come from Danmarks JordbrugsForskning and have been collected in the DK-CLARIN project, WP2.2, 2008 - 2011. The corpus consists of 2,376,029 words in 216 files. Communicative setting/Number of files: expert->expert (45) expert->advanced (24) expert->basic (142) advanced->basic (5). …