Name Gender Classifier

Predicts whether a name is male or female.



How does it work? What does it do?

I built this classifier using the Naive Bayes algorithm. Let x be a name and y its label, male or female. From the training set we estimate P(y), the probability that a name is male or female, and P(x|y), the probability of picking this name given that we are looking only at male names or only at female names. Bayes' rule then gives P(y|x) = P(x|y)P(y)/P(x). Because P(x) is the same for both labels, it is enough to compare P(x|y)P(y) for each label and pick the larger one. This approach assumes that the features of a name are independent of each other given the label, so P(x|y) is the product of the per-feature probabilities. That assumption rarely holds exactly, but it makes for easier math.

The error of this classifier is about 18% after being trained on about 200 names (100 male, 100 female). This can be improved by changing how I vectorize the names. For example, I could add a feature for "names ending in a" or "starting with J". This might improve accuracy because many female names end with "a", like Jessica or Melissa. It's funny: this classifier thinks the name Jane is male!

I am planning on uploading more machine learning projects to my webpage eventually. Imagine what these algorithms can do when trained with the end goal of keeping you scrolling on Instagram longer!
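
Here is a minimal sketch of the Naive Bayes calculation described above. It is not the exact code of this project. The feature set (letters present, first letter, last letter), the smoothing constant `alpha`, the names `extract_features` and `NameNaiveBayes`, and the tiny training list are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_features(name):
    """Turn a name into feature tokens (a hypothetical vectorization)."""
    name = name.lower()
    if not name:
        return []
    tokens = [f"has:{c}" for c in sorted(set(name))]  # which letters appear
    tokens.append(f"first:{name[0]}")                 # e.g. "starting with J"
    tokens.append(f"last:{name[-1]}")                 # e.g. "ending in a"
    return tokens


class NameNaiveBayes:
    """Naive Bayes over feature tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # smoothing strength (assumed)
        self.class_counts = Counter()             # names seen per label
        self.token_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()                        # every token seen in training

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for tok in extract_features(name):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, name):
        """Return log(P(x|y) * P(y)) for each label y; P(x) is never needed."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log P(y)
            denom = (sum(self.token_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in extract_features(name):
                if tok not in self.vocab:
                    continue  # tokens never seen in training carry no evidence
                count = self.token_counts[label][tok]
                # Independence assumption: add one log-probability per feature.
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy data invented for this example, far smaller than the ~200 names above.
TOY_NAMES = ["Anna", "Maria", "Julia", "Laura", "Sofia",
             "John", "Mark", "Peter", "David", "Robert"]
TOY_LABELS = ["female"] * 5 + ["male"] * 5

model = NameNaiveBayes().fit(TOY_NAMES, TOY_LABELS)
for candidate in ["Olivia", "Victor"]:
    print(candidate, model.predict(candidate))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a feature that was never seen with one label from forcing that label's score to zero.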