Odia Language Resources
Computer work in the Odia language requires some special techniques and tools. Some of these tools have been developed by Srujanika and others have been adapted from other sources. They are presented here for use by anyone interested. It is also hoped that users will develop these further and add new ones as needed.
Also provided are links to other tools which have been found useful in our work. There may be other such material which we do not know about; we will add it if it is brought to our notice.
There are now many websites where digital Odia material is available. Srujanika is associated with some of these either directly or indirectly. Some links to such websites and groups are given here.
Odia Unicode Keyboard
Odia Corpus Tools
Websites with Odia Material
Odia Unicode Keyboard
The keyboard is of paramount importance for working with computers. The ISCII encoding and the INSCRIPT keyboard layout have been developed by C-DAC for the standardisation of Indian language computing. Although a notification has been issued for their implementation, it has been largely ignored, and many different encodings and keyboard layouts for 8-bit fonts have been marketed. This has led to a serious problem of non-interchangeability of data composed with different engines. (Major coding charts for Odia).
Unicode encoding, developed later, has the ability to overcome this restriction (Odia Unicode Chart). But it is hardly used for Odia language work. One of the problems was the faulty keyboard layout in MS Windows 7. Several keyboards have been developed to overcome this problem, but most do not adhere to the INSCRIPT keymap. Presented here is a keyboard layout developed by Srujanika based on INSCRIPT. The downloadable zip file contains the necessary files for both Windows and Linux, the keymap and instructions for use. It will work with Windows 7, 8 and 10. Click here for keyboard download.
It is difficult to convey the correct pronunciation of Odia words written in the Roman alphabet. Several methods have been developed to overcome this difficulty, and the use of diacritic marks is one of these. However, there is no standardisation for the use of diacritic marks for Odia. This, together with the difficulty of typing these marks without specialised software, has hampered their use. Srujanika uses a simple but adequate method using only six marks, which can be typed with the capabilities of the operating system.
The diacritic marks are a part of the extended Roman characters under the Unicode system. The decimal codes for these characters are used for typing purposes. A list of the marks used, their Unicode and decimal codes and instructions for use are given here. The desired letter is typed first and, keeping the Alt key depressed, the decimal code is entered on the numeric keypad (Num Lock must be on). The combined character with the diacritic mark is formed when the Alt key is released. A font like Arial or Times New Roman must be used since not all fonts have the diacritic characters. It is also best to use a plain text editor like Notepad or WordPad for typing these; word processors like MS Word or LibreOffice Writer do not form the combined characters in some cases. However, the typed material can be copied and pasted into other applications.
Sometimes two diacritic marks are needed on a single character. This can be done by entering the decimal codes one after the other with the Alt key pressed. Diacritics for some old characters not used now but found in old texts are also included.
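The way a base letter and a diacritic mark combine into one character can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the two marks used (combining macron, decimal 772, and combining dot below, decimal 803) are examples chosen here and are not necessarily among the six marks in the Srujanika chart, and the helper name add_marks is hypothetical.

```python
import unicodedata

# Example marks only (an assumption); the marks actually used by Srujanika
# are listed in its downloadable chart.
COMBINING_MACRON = 772      # U+0304
COMBINING_DOT_BELOW = 803   # U+0323


def add_marks(base, *decimal_codes):
    """Append combining marks, given as decimal code points, to a base letter.

    NFC normalisation then merges the sequence into a single precomposed
    character whenever Unicode defines one.
    """
    raw = base + "".join(chr(code) for code in decimal_codes)
    return unicodedata.normalize("NFC", raw)


long_a = add_marks("a", COMBINING_MACRON)         # a with macron
dotted_d = add_marks("d", COMBINING_DOT_BELOW)    # d with dot below
# Two marks on one letter: codes are supplied one after the other.
two_marks = add_marks("a", COMBINING_DOT_BELOW, COMBINING_MACRON)

print(long_a, dotted_d, two_marks)
```

When no single precomposed character exists for a combination, the marks remain as separate combining characters after the letter, and it is then up to the font and the application to draw them correctly.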
Odia Diacritic Chart and Instructions can be downloaded here
Programs for converting text composed in 8-bit fonts to Unicode are given here. Please click on any title to download it.
1. deisciify [Runs on Linux] – Text files with correct ISCII encoding can be converted to Unicode with this program. It can convert multiple files at a time in batch mode. This will work with text composed with the GIST engine, e.g., LEAP Office, iLEAP, LEAPpp, ISM, as these follow ISCII encoding. It will not work with Shreelipi, Akruti and other engines not conforming to the ISCII standard. It will not work on native PageMaker files even when composed with GIST, but the file contents can be converted after exporting as text.
2. convert [Runs on Linux] – PageMaker files composed with GIST can be converted to Unicode with this program after exporting the contents as text.
3. correction [Runs on Linux] – Problems with the interchanging of ba/wa and ja/ya are often faced when converting 8-bit text to Unicode. This program corrects such mistakes.
4. PageMaker files prepared with various engines like GIST, Shreelipi and Akruti can be converted with the specific programs given here. These converters were developed by Srujanika in an early phase of the work and have been further developed by the Wikipedia group – GIST, Akruti, Shreelipi. The unified converter developed by the Wikipedia group can be used for different engines – Wikipedia converter. These converters will work on both Linux and Windows.
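The basic idea behind converting ISCII-encoded text to Unicode (item 1 above) is a table lookup from each 8-bit code to an Odia Unicode character. The Python sketch below shows only that idea and is not the deisciify program. The table is deliberately partial (five codes, an assumption made to keep the example short), the function name iscii_to_unicode is hypothetical, and special ISCII sequences such as nukta combinations and script-switching attribute codes are not handled.

```python
# Partial, illustrative table: ISCII byte -> Odia Unicode character.
# A real converter needs the complete ISCII chart.
ISCII_TO_ODIA = {
    0xA4: "\u0B05",  # vowel A
    0xB3: "\u0B15",  # consonant KA
    0xDA: "\u0B3E",  # vowel sign AA
    0xE8: "\u0B4D",  # halant (virama)
    0xEA: "\u0964",  # danda
}


def iscii_to_unicode(data: bytes) -> str:
    """Convert ISCII bytes to a Unicode string using the table above.

    ASCII bytes pass through unchanged; high bytes missing from the
    partial table become U+FFFD so that gaps are easy to spot.
    """
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(ISCII_TO_ODIA.get(byte, "\uFFFD"))
    return "".join(out)


# KA + AA sign, a space, then KA + halant + KA, and a danda.
sample = bytes([0xB3, 0xDA, 0x20, 0xB3, 0xE8, 0xB3, 0xEA])
converted = iscii_to_unicode(sample)
print(converted)
```

Text from engines that do not follow ISCII cannot be converted with one shared table of this kind, which is why separate converters are listed for them.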
Building and analysing word corpora form an important part of linguistic studies. This can be done effectively through computer programs these days. Selecting single instances of words with multiple occurrences and counting their frequencies can be done in an automated manner with the odisort program [Runs on Linux].
deisciify, convert, correction, odisort (Utility Programs) download: These four programs are best run from a single directory as these have common dependencies. These can be downloaded together along with instructions for use here. Utility Programs.
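The kind of processing described for corpus work, keeping one instance of each word and counting how often it occurs, can be sketched in a few lines of Python. This is not the odisort program; it is a minimal illustration that assumes the input is already Unicode text and treats any run of characters from the Odia Unicode block as a word. The names ODIA_WORD and word_frequencies are hypothetical.

```python
import re
import unicodedata
from collections import Counter

# A word is taken to be a run of characters from the Odia block
# (U+0B00 to U+0B7F), plus ZWNJ/ZWJ, which can occur inside words.
ODIA_WORD = re.compile(r"[\u0B00-\u0B7F\u200C\u200D]+")


def word_frequencies(text):
    """Return a Counter mapping each distinct Odia word to its frequency."""
    # Normalise first so that equivalent spellings are counted together.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ODIA_WORD.findall(normalized))


sample = "ଭାଷା ଓ ସାହିତ୍ୟ। ଭାଷା"
counts = word_frequencies(sample)

# List each word once, most frequent first.
for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, freq)
```

The danda and other punctuation fall outside the Odia block, so they act as word separators here. Odia digits lie inside the block and would be counted as words unless they are filtered out.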
Websites Hosting Odia Material
Odia digital material is now available at many places on the internet. Some of these are:
1. Internet Archive
3. NIT – Rourkela
5. Odia Virtual Academy
Useful Websites of Associates
6. Arvind Gupta: It hosts all of Srujanika’s publications along with an extensive collection of material on education and science.
7. Eklavya: Hosts many interesting books and magazines relating to education and science, mainly in Hindi and English. Very useful for children’s literature.