Home > Science > Social Sciences > Linguistics > Computational Linguistics > Corpus Analysis
The study of language through computerized corpora, or enormous samples of machine-readable text drawn from authentic language situations.
http://www.ling.gu.se/~lager/taglog.html
A 1996 thesis by Torbjörn Lager. Abstract available, as well as full text in PostScript and PDF formats.
http://www.natcorp.ox.ac.uk/
The BNC is a balanced synchronic text corpus containing 100 million words annotated with parts of speech.
http://www.corpus.bham.ac.uk/
At the University of Birmingham, England. Information on programmes, research and available resources.
http://juppiter.fltr.ucl.ac.be/FLTR/GERM/ETAN/CECL/cecl.html
At the Catholic University of Louvain, this institute focuses on cross-linguistic corpora and learner corpora. Research, events, staff, publications.
http://subidadecliticos.blogspot.com/
Thesis study by Kertes Gábor that analyses the phenomenon of clitic climbing or clitic promotion. [Parallel Spanish and English]
http://www.lancs.ac.uk/fss/courses/ling/corpus/
Online lessons intended to supplement the book by Tony McEnery and Andrew Wilson. Introductory information on the field.
http://catalog.elra.info/
Various language resources and evaluation packages in the field of Human Language Technology (HLT) are available at ELRA (European Language Resources Association). Distribution is taken care of by ELRA's operational body: ELDA.
http://corpus.nytud.hu/mnsz/index_eng.html
More than 150 million Hungarian words, a model of Hungarian language of the 1990s. Free and extensive query system. [Hungarian, English]
http://www.ldc.upenn.edu/
The Linguistic Data Consortium (LDC) creates, collects and distributes speech and text databases, annotated corpora, treebanks, lexicons and other linguistic resources for research, education and development.
http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/
A French-German parallel corpus consisting of articles from Le Monde Diplomatique and die Tageszeitung, manually aligned and part-of-speech tagged.
http://www.psych.rl.ac.uk/
Web access to a large database of linguistic and psycholinguistic (but not semantic) data derived from a variety of sources.
http://nkjp.pl/
The National Corpus of Polish is a publicly available, large, balanced and linguistically annotated corpus of Polish.
http://www.bultreebank.org/ProgramSProLaC03.html
Held at Lancaster University. Presented papers are available in PDF format.
http://www.cs.vassar.edu/sigann/
A subgroup of the Association for Computational Linguistics (ACL), this group is concerned with all aspects of linguistic annotation of language resources (linguistic corpora), especially the advancement of interoperability. Sponsors the annual Linguistic Annotation Workshop (LAW).
http://www.sigwac.org.uk/
A subgroup of the Association for Computational Linguistics (ACL) which promotes interest in the use of the Internet as a source of linguistic data, and as an object of study in its own right. Organizes the WAC workshops.