We are very pleased to announce the first public release of the Oxford-NINJAL Corpus of Old Japanese (ONCOJ).
Old Japanese is the earliest attested stage of the Japanese language (mainly the 8th century AD). Most of the texts from the period are poetry. The ONCOJ is an ongoing, long-term collaborative research project between the Research Centre for Japanese Language and Linguistics at the University of Oxford and the National Institute for Japanese Language and Linguistics, Tokyo.
The corpus is available through this website: http://oncoj.ninjal.ac.jp/
The ONCOJ contains the texts in original script and in a phonemic transcription. It is lemmatized and has annotation for mode of writing (phonographic or logographic), morphology, constituency, and grammatical function. This release presents the poetic texts from the period, approximately 90,000 words of text.
The corpus is searchable through a suite of online search facilities, and both the full corpus data and individual search results are downloadable for offline use. The data is primarily presented in a Penn Historical-style bracketed tree format, but will also soon be available in a TEI-convertible XML format.
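For readers who plan to work with the downloaded data offline, the following is a minimal sketch of how a bracketed tree of this general kind could be read in Python. It is not part of the ONCOJ tooling. The function names and the sample tree, including its labels and words, are hypothetical and chosen only for illustration, and the actual corpus files may carry additional markup that this sketch does not handle.

```python
def parse_bracketed(text):
    """Parse one well-formed bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        # A node may lack a label (e.g. an outer wrapper bracket).
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested constituent
            else:
                children.append(tokens[pos])    # terminal (word) token
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the tree")
    return tree


def leaves(tree):
    """Return the terminal tokens of a parsed tree, left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


# Hypothetical sample, NOT taken from the corpus.
SAMPLE = "(IP-MAT (NP-SBJ (N hana)) (VB saku))"

if __name__ == "__main__":
    tree = parse_bracketed(SAMPLE)
    print(tree[0])        # label of the root node
    print(leaves(tree))   # terminal tokens in order
```

The sketch keeps each node as a plain (label, children) tuple so that search results saved from the website could be walked recursively without any third-party library.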
The Project Director is Bjarke Frellesvig (Professor of Japanese Linguistics, University of Oxford).