The EMILLE project was undertaken by the Unit for Computer Corpus Research on Language (UCREL) at Lancaster, together with a range of partners in Europe and South Asia.

Tony McEnery
The project was managed by a team led by Anthony McEnery as principal investigator with Robert Gaizauskas as co-investigator. Anthony McEnery is Professor of Linguistics at Lancaster University, and has been involved with corpus building and exploitation at Lancaster for fifteen years. He worked on the BNC, and was involved with the ET10/63, CRATER and MULTEXT projects. He was principal investigator on the MILLE and MILLEFT projects. He is now Director of Research at the Arts and Humanities Research Council.
Robert Gaizauskas
Robert Gaizauskas, Reader in Computer Science at Sheffield University, has been an investigator on eleven funded LE projects in the last seven years and has published over 40 refereed papers in Language Engineering during this period. As co-investigator on both EPSRC-supported GATE projects, he was well placed to guide the GATE extensions in the EMILLE project. He is also closely involved with the Computational Linguistics UK (CLUK) organisation.
Andrew Hardie
Andrew Hardie is a lecturer in Linguistics at Lancaster University. He worked as a research associate on the project (April 2002 - August 2003), overseeing the collection, encoding and development of the corpora and managing the project transcribers. He is the author of Unicodify, the text-format software used to generate the texts in the written corpus, and developed the Unitag software and its instantiation as a POS tagger for Urdu.
Paul Baker
Paul Baker is a lecturer in Linguistics at Lancaster University. He was formerly (September 2000 - April 2002) the research associate on the project. He was previously the principal RA on the MILLE project and has extensive experience of NIML corpus building, corpus encoding standards and corpus validation guidelines.
Hamish Cunningham
Hamish Cunningham worked on GATE for 7 years (and NLE infrastructure for a decade) and has run the GATE project for 5 years. His ambition is to become an IT manager.

The Central Institute of Indian Languages provided the project with access to a large pool of copyright-cleared texts gathered through its own data collection efforts in India. We would particularly like to acknowledge the assistance of B.D. Jayaram and Prof Udaya Narayana Singh on the project.
