JLect's Integration of JMdict - A Sorting Comparison

  1. 5 years ago

    Zachary

    Mar 2013 Administrator

    I've finally given in and integrated Jim Breen's dictionary (JMdict) into JLect's database. This means that you can now easily search for standard Japanese terms as well as dialectal terms. However, priority will always be given to JLect's own dictionary over JMdict.

    For those interested, I wrote a short write-up on how JLect's implementation of the JMdict file compares, in terms of sorting, to other major online Japanese dictionaries based on either the JMdict or EDICT files. You can read the article here: http://www.jlect.com/downloads/JLect-JMDict-Sorting-Comparison.pdf

  2. Greetings.

    It is a pity the "sorting comparison" write-up only addresses English words. The WWWJDIC server is designed really as a Japanese-English service and it is made clear in the User Guide that attempting to use it in the English->Japanese direction "can result in misleading results" (see: http://www.csse.monash.edu.au/~jwb/wwwjdicinf.html#opins_tag )

    The User Guide suggests trying a combination of the "common word" and "exact match" criteria to narrow down a search using an English word. For "cow" this option brings up a single entry (牛/うし) which is appropriate. The "exact match" option also ignores the "to" at the start of verb glosses, so when searching for "sleep" it returns (in order): 寝る, 眠る, 睡眠, 眠り, 寝 and お休み. Again an appropriate list.
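
    As a rough illustration of the "common word" plus "exact match" combination described above, here is a minimal sketch. It is not WWWJDIC's actual code: the entry layout, the `common` flag, and the function names are assumptions made up for this example, and the sample data is a tiny hand-written stand-in for JMdict.

```python
# Hypothetical toy entries; "common" stands in for JMdict's priority markers.
ENTRIES = [
    {"headword": "寝る", "glosses": ["to sleep", "to lie down"], "common": True},
    {"headword": "眠る", "glosses": ["to sleep"], "common": True},
    {"headword": "睡眠", "glosses": ["sleep"], "common": True},
    {"headword": "寝不足", "glosses": ["lack of sleep"], "common": True},
    {"headword": "牛", "glosses": ["cow", "cattle"], "common": True},
]


def normalize_gloss(gloss):
    """Lower-case a gloss and drop a leading "to " so verb glosses match bare keys."""
    gloss = gloss.strip().lower()
    if gloss.startswith("to "):
        gloss = gloss[3:]
    return gloss


def exact_common_matches(query, entries):
    """Return common entries that have a gloss equal to the query after normalization."""
    key = query.strip().lower()
    return [
        entry
        for entry in entries
        if entry["common"]
        and any(normalize_gloss(gloss) == key for gloss in entry["glosses"])
    ]


if __name__ == "__main__":
    for entry in exact_common_matches("sleep", ENTRIES):
        print(entry["headword"])
```

    In this sketch a gloss such as "lack of sleep" contains the key but is not equal to it, so 寝不足 is left out of the results for "sleep".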

    WWWJDIC indexes at the token level, and the display is just an unordered list of the entries containing the token. There is no sorting as such. If you search for a Japanese word using a kana key, it displays the "common" entries first, followed by the rest. (It does two passes over the list.) See http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1MUJ%E3%81%93%E3%81%86%E3%81%98%E3%82%87%E3%81%86 for an example of this.
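
    The two-pass display described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not WWWJDIC's implementation: the index structure, the `common` flags, and the function names are hypothetical, and the four こうじょう entries are hand-picked sample data.

```python
from collections import defaultdict

# Hypothetical toy entries in arbitrary stored order; the "common" flags are illustrative.
ENTRIES = [
    {"kanji": "口上", "kana": "こうじょう", "gloss": "verbal message", "common": False},
    {"kanji": "工場", "kana": "こうじょう", "gloss": "factory", "common": True},
    {"kanji": "交情", "kana": "こうじょう", "gloss": "intimacy", "common": False},
    {"kanji": "向上", "kana": "こうじょう", "gloss": "improvement", "common": True},
]


def build_index(entries):
    """Map each kana key to the positions of the entries that carry it."""
    index = defaultdict(list)
    for position, entry in enumerate(entries):
        index[entry["kana"]].append(position)
    return index


def lookup(key, entries, index):
    """Return matches with no sorting beyond 'common first, then the rest'."""
    hits = [entries[position] for position in index.get(key, [])]
    first_pass = [entry for entry in hits if entry["common"]]
    second_pass = [entry for entry in hits if not entry["common"]]
    return first_pass + second_pass


if __name__ == "__main__":
    index = build_index(ENTRIES)
    for entry in lookup("こうじょう", ENTRIES, index):
        print(entry["kanji"], entry["gloss"])
```

    Within each pass the entries simply keep the order in which they were stored, which matches the point that there is no sorting as such.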

    For digging into JMdict via WWWJDIC, the new Advanced Search option (http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1P ) provides a useful tool, as you can use a combination of kanji, kana and English keys, including ones you want excluded.

  3. Zachary

    Jun 2013 Administrator

    Hello Jim,

    Thank you for your input and for the links. The reason the article deals chiefly with English word searches is that searching for something like 寝る in a data source that's Japanese-to-English oriented will inevitably yield the same results regardless of the interface, and kana-based searches are generally organized quite predictably (though I admit I could still have touched on them). Considering this, I chose to focus on English-to-Japanese searches to see how different dictionaries compared in terms of sorting when using more or less the same source data.

    The idea behind the write-up was to see how I could better optimize English-to-Japanese searches for language learners, using the data as is, with the main goal being to improve JLect's implementation and share my thoughts along the way. I realize this may not have been clear, and I understand that other dictionaries provide several advanced features and other tools, unlike JLect, which has none, but my intention was just to test a basic search.

    Regarding the WWWJDIC interface specifically, the lack of any specific output order is still an order in and of itself, hence the dictionary's inclusion – I still wanted to see how the results compared and to know the output method used. Nonetheless, I do admit the exact word-match is a nice option that I hadn't explored at the time.

    I hope this clarifies where I was coming from; I think the nice thing is that each of the dictionaries is different and has its own focus and features. And of course, I am grateful for all of your work.

    Out of curiosity, however, is there any reason why searching for the key string "to sleep" yields few results (and none when selecting common words or exact match)?
