150,000 Free Japanese-English Sentences

Have you ever heard of the Tanaka Corpus? If you’ve been using Jim Breen’s fantastic free online Japanese-English dictionary JDIC, you may have unwittingly encountered it. The Tanaka Corpus is a collection of over 150,000 sentence pairs ideal for students learning Japanese. Perfect for repetitions in your favourite SRS software!

Here’s a brief introduction. If you just want to get started searching for example sentences, click the link below!

Search Tatoeba.org using a variety of different languages!

The corpus was compiled by Professor Yasuhito Tanaka at Hyogo University and his students, as described in his Pacling2001 paper. At Pacling2001 Professor Tanaka released copies of the corpus, and stated that it is in the public domain. According to Professor Christian Boitet, Professor Tanaka did not think the collection was of a very good standard. (Sadly, Prof. Tanaka died in early 2003.)

At the 2002 Papillon workshop in Tokyo, Professor Boitet included a copy of the corpus in a CD, distributed to participants, and suggested that it may serve as examples in a dictionary. Jim Breen realised it had the potential to be a source of example sentences in the WWWJDIC server. He edited, reformatted and indexed the corpus and linked it at the word level to the dictionary function in the server.

The inclusion of the Corpus in the WWWJDIC server exposed it to a wide audience, and a number of other systems incorporated the corpus into their operation. It also began to be used in some research projects in natural language processing.

In 2006 the Corpus was incorporated into the Tatoeba Project being developed by Trang Ho to provide a sentence-based multi-lingual resource. That project is now the “home” of the Corpus.

Source

Warning

There are, however, several caveats that the avid student of Japanese must be aware of in order to use this resource safely and effectively. Although the original Tanaka Corpus has come a long way and has been cut from 212,000 sentences to 150,000 (mostly by removing duplicates and correcting errors), there are still areas where it is unreliable.

・ Many of the English sentences do not sound natural. Some use old-fashioned English, and others are contrived sentences originally taken from textbooks.
・ Some of the Japanese sentences sound unnatural too, such as stiff, literal translations from English, or sentences which sound odd out of context.

Most people reading this will no doubt be native English speakers, so dodgy English sentences shouldn’t worry us much as we can work around them. Unnatural Japanese sentences, however, are a problem. This means the resource is best suited to intermediate students who already have a firm grasp of the language and have a good chance of being able to distinguish between a natural and an unnatural Japanese sentence.

Beginners can benefit too, but be sure to check as much as you can with native or fluent speakers to make sure you are not memorising an odd expression. Also be aware of gender differences in language so that you don’t end up sounding like the opposite sex when you speak!

There are many other websites and applications that use this public domain database as their primary data source but do not tell you about the problems with it, so double-check your favourite application to see where it gets its example sentences from. It takes a lot of work to compile original Japanese-English sentence pairs, so there is a good chance that its makers took the easy route and used the Tanaka Corpus.

You can download the latest version from Jim Breen’s site here. You can also browse the corpus on ManyThings, or search Tatoeba for example sentences containing a particular word in a variety of different languages! If all else fails, here is a mirror on Gakuu (file mirrored on 21st November 2011), but please be aware that this copy will quickly become outdated.
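
If you would rather search a downloaded copy yourself, here is a minimal Python sketch of the idea. It assumes each line holds a Japanese sentence and its English translation separated by a tab. That layout is an assumption for illustration, so check the file you actually downloaded and adjust the parsing to match. The function names and the three sample sentences are made up for this example and are not taken from the corpus.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (japanese, english)


def parse_pairs(lines: Iterable[str]) -> List[Pair]:
    """Parse 'Japanese<TAB>English' lines (an assumed layout) into pairs."""
    pairs: List[Pair] = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that do not match the assumed layout
        japanese, english = line.split("\t", 1)
        pairs.append((japanese.strip(), english.strip()))
    return pairs


def find_examples(pairs: List[Pair], word: str, limit: int = 5) -> List[Pair]:
    """Return up to `limit` pairs whose Japanese side contains `word`."""
    return [pair for pair in pairs if word in pair[0]][:limit]


# Made-up sample lines standing in for a downloaded file.
sample = [
    "私は学生です。\tI am a student.",
    "猫が好きです。\tI like cats.",
    "彼は学生ではありません。\tHe is not a student.",
]

pairs = parse_pairs(sample)
matches = find_examples(pairs, "学生")
for japanese, english in matches:
    # Tab-separated output is a common import format for flashcard tools.
    print(f"{japanese}\t{english}")
```

This is a plain substring match, so it will also find the word inside longer words. Given the caveats above, it is still worth checking each sentence you pick with a native or fluent speaker before adding it to your SRS deck.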

The corpus is available free to all under a Creative Commons CC BY licence.

8 Responses to Gakuu. Studying with Real Japanese

  1. Sabrina September 29, 2010 at 5:48 pm #

    This is pretty cool! Right now I’m studying on Textfugu which is great! I believe Gakuu can be really helpful too. What I like most is that it’s for pre-intermediate to advanced level students. So you can’t say it’s the same thing over and over again… students can actually improve their knowledge here. Thank you very much for creating this site. Have a nice day :)
    -Sabrina

    • Gakuranman September 30, 2010 at 1:09 am #

      Hi Sabrina! Thank you for your comment :).

      That’s definitely our aim. I love Textfugu for beginners and for really giving students a solid grounding in the language, but after (and even while) studying the basics, it can really help to encounter raw Japanese material. You don’t have to understand everything at first, but feeling challenged and picking up little bits here and there on top of the basics helps expand your mind. Let me know if you have any more questions! More demonstration material will be up soon! We currently have a special launch sale price for early adopters, so check out the pricing page if interested :).

      • Sabrina September 30, 2010 at 5:14 pm #

        Thanks for your reply. :) Unfortunately I’m still miles away from the intermediate level. But I’ll definitely return to Gakuu when I get to that point. Anyway, I’m looking forward to the extra demonstration material. :) Keep up the good work.

        • Gakuranman October 1, 2010 at 12:21 am #

          Sure thing :). Let me know if you have any other questions or suggestions for things you’d like to see on Gakuu!

  2. missingno15 October 1, 2010 at 7:31 pm #

    When I looked at this, I first thought to myself, “aw hell no, gakuranman is doing the same thing as koichi…even the website layout is similar”. But then I realized “it’s aimed at pre-intermediate to advanced level students” which is perfect for my situation right now because I now really want to excel way past beginner. So basically, Gakuu really complements Textfugu. Can’t wait for more lessons to see what this is gonna be like so I can decide if it’s worth getting.

    • Gakuranman October 1, 2010 at 7:37 pm #

      Hey there! Thanks for dropping by :). No way – Koichi and I are buds. I’ve always loved teaching the more advanced stuff so it worked out perfectly. I’ll be adding more stuff in the coming days, so please stay tuned!

  3. DumbOtaku (percent20) October 3, 2010 at 12:58 am #

    This is really cool. I am glad to see more online content going beyond just teaching hiragana and katakana. That is what I try to do on my blog, but with too little consistency. Glad to see an expert do it, btw already a signed-up paid member now. :)

    • Gakuranman October 3, 2010 at 2:24 pm #

      Glad to have you man! Look forward to hearing any suggestions you have for the site and future lessons :)
