ನಮಸ್ಕಾರ (Namaskāra)! This is not a post on fintech, or even technology for that matter. This is the story of a product of tenacity, selflessness, and passion, a product that will transcend and outlive most technology we know of. This is the story of a massive dictionary that will become the window to a language spoken by tens of millions of people for generations to come, a resource its author has donated to posterity. This is the story of Krishna, Alar, his Kannada-English dictionary, and its accidental discovery and open sourcing at an unlikely place: Zerodha, a stock brokerage. This post is also a personal note, something I have not attempted in a long time.

I have been running Olam, an English-Malayalam and Malayalam-Malayalam dictionary, since 2010. It was built out of frustration at not having an easily accessible online Malayalam dictionary, and at dictionary websites that insulted the reader's intelligence with poor usability, ad-ridden spamminess, and no respect for language. Olam's website has stayed exactly the same for 10 years. It has an input box that responds to dictionary lookups in under ~50 ms, exactly as it did in 2010, and it is actively used by millions of Malayalam speakers.

The first version of the Olam corpus was seeded with unattributed word lists I scraped together from random parts of the web, and several thousand entries I entered myself. Since then, the English-Malayalam dictionary has been expanding slowly with crowdsourced entries. The entire Olam corpus is open source (licensed under the ODbL), or rather, open data.

While the English-Malayalam corpus is crowdsourced, the Malayalam-Malayalam corpus (now known as the Datuk Corpus) came out of the mammoth digitisation project that the late "Datuk" K. Joseph undertook in the late 90s. He single-handedly digitised an out-of-copyright Malayalam-Malayalam dictionary, along with many other books, and posted them online, spending copious amounts of his retirement on the work. He was a Malayali settled in Malaysia, a prominent and active social worker and educator. The Malaysian government conferred the title "Datuk" upon him in recognition of his exemplary service to the country, and the title ended up becoming his nickname too. I do not know the origin of the dictionary Datuk digitised, but it is poignant to think that the original author's work lives on after a century.

I discovered the RTF file Datuk had posted a decade earlier on an inactive Yahoo Groups page around the time I was working on Olam. Needless to say, I was stunned by the scope of this project, and immediately started working on integrating it into Olam. It took more than two years of on-and-off work to convert the text from the original ASCII input to Unicode, and to clean, structure, and correct close to 200,000 entries. The dataset was named the Datuk Corpus, and was published on Olam in 2013.

To set up the dictionary server on WAMP:

1. Open wamp -> bin -> apache -> apache2.4 -> conf -> extra -> "nf" in Notepad or any other text editor.
2. Find the word "Require" and replace that whole line with `Require all granted`.
3. Create a new database called "Dictionary".
4. Click on Import and select the dictionary.sql file from the Dictionary_Server folder.

Output example: enter ಹುಡುಕು (huḍuku, "to search") and click the search icon to get its meaning.
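The "Require" edit changes Apache 2.4's access control for the dictionary's web directory. Because the target filename is garbled in the steps, the excerpt below is only a sketch. It assumes the file is a virtual-host configuration such as httpd-vhosts.conf. The paths, the ServerName, and the previous `Require local` value are hypothetical placeholders, not the actual contents of a WAMP install.

```apache
# Hypothetical excerpt: ServerName and paths are placeholders, not copied from a real install.
<VirtualHost *:80>
    ServerName localhost
    DocumentRoot "c:/wamp64/www"
    <Directory "c:/wamp64/www/">
        AllowOverride All
        # Assumed original line: "Require local" (only this machine may connect).
        # Replacement: allow requests from any client.
        Require all granted
    </Directory>
</VirtualHost>
```

`Require all granted` admits requests from any client that can reach the server, such as other machines on the same network. `Require local` admits only the local machine. Apache has to be restarted before the change takes effect.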