One of the most confusing features of German for learners is the noun gender system. Every German noun has one of three genders (masculine, feminine, or neuter), but unlike in many other languages, these genders do not seem to follow any logical rule. Despite this, native speakers and experienced learners can often “guess” the gender of a noun correctly by intuition. This led me to conclude that some underlying rules must exist. Furthermore, if humans can build an intuitive model of these rules, perhaps we can build a computer-based model too, and figure out what the rules actually are!

This post is about my initial exploration of modeling German noun genders with some simple machine learning and statistics. I attempt to find some rules that might aid German learners (including myself) in figuring out noun genders.

Do you have any ideas for improving this approach? Do you have any experience with analysing German grammar? I'd love to hear about it, so feel free to write to me. I hope you've found some of the information here useful, and to those of you also learning German, good luck going forward!