EMILLE (Enabling Minority Language Engineering) was a three-year EPSRC project at Lancaster University and Sheffield University. Its end product was a 97-million-word electronic corpus of South Asian languages, especially those spoken in the UK.
Obtaining a copy of the EMILLE Corpus
Related research: the Nepali Grammar Project
एमिली | ਏਮਿਲੀ | એમિલી | এমিলী | ایملی