SLENC is a corpus of approximately 32 million words extracted from the online editions of two prominent English-language newspapers published in Sri Lanka: the Daily News and the Daily Mirror. The Daily News represents the voice of the state, as it is published by the Associated Newspapers of Ceylon Limited, a government-owned corporation. The Daily Mirror, on the other hand, is published by Wijeya Newspapers Limited, a privately founded media company. Both newspapers are dailies, published from Monday to Saturday.
Apart from their popularity, which is evident from their circulation figures, a principal reason for choosing these two newspapers as sources of data for SLENC is that SLENC intentionally follows the design of the South Asian Varieties of English (SAVE) Corpus (Bernaisch et al., 2011), compiled by the Department of English, Justus Liebig University, Giessen, Germany. The SAVE Corpus aims to provide a snapshot of acrolectal written English in six South Asian countries. It assumes that the English found in each country's prominent English-language newspapers very likely represents that country's most prestigious variety, and that this is also the variety most likely to be codified eventually (Bernaisch et al., 2011, p. 1). Mukherjee (2012) observes that "there is a small but influential minority for whom English is a first language in Sri Lanka. Their usage exerts an enormous normative influence on language in the media" (p. 198). While we cannot confirm that every text in SLENC was written by someone for whom English is a first language, it is a fair assumption that many of the texts are indeed representative of this variety.
SLENC is intended for diachronic analyses of Sri Lankan English alongside older newspaper corpora, for training staff and students in the use of annotation and corpus-analysis software, and for research in the digital humanities and natural language processing.
Works cited
Bernaisch, T., Koch, C., Mukherjee, J., & Schilk, M. (2011). Manual to the South Asian Varieties of English (SAVE) Corpus. Giessen: Justus Liebig University, Department of English.
Mukherjee, J. (2012). English in South Asia – Ambinormative orientations and the role of corpora: The state of the debate in Sri Lanka. doi:10.1007/978-94-007-4578-0_12