Saturday, August 2, 2008

Classifier angst

The Bayesian classifier is working...now I'm on to the decision tree classifier. What I noticed immediately is that these two classifiers don't appear to be suited to the same types of data. Dr. Hung gave me several sets of data, and I think they would all be classified more accurately by the Bayesian classifier.

Before I started this project, I hadn't really thought about how different kinds of data need to be categorized in different ways. The book I've been using for my algorithms is the excellent Programming Collective Intelligence by Toby Segaran. In the chapter where he compares these classifiers, he points out that data can look meaningful, but if it's manipulated in the wrong way, the results will be meaningless. It's very important to pick the right type of classifier for the right type of data.

My challenge in coding the decision tree is to write it so that I can use the iris measurement data AND the data in the Collective Intelligence book. My time is starting to run very short, but so far the DTC's logic is much easier to code and test.
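To make that challenge concrete, here is a minimal sketch of one way a single split function could serve both kinds of data: compare with >= when the split value is a number (as with iris measurements) and with == when it is a label (as with the categorical columns in the book's data). The name divide_rows and the sample rows are my own illustration, not the code from this project, and I'm assuming Python since that's what the book uses.

```python
from numbers import Number


def divide_rows(rows, column, value):
    """Split rows into (matching, non_matching) using one column.

    Numeric split values are compared with >=, anything else with ==.
    This is an illustrative helper, not the project's actual code.
    """
    if isinstance(value, Number) and not isinstance(value, bool):
        # Numeric column, e.g. a petal length in the iris data
        def matches(row):
            return row[column] >= value
    else:
        # Categorical column, e.g. a referrer name or country
        def matches(row):
            return row[column] == value

    matching = [row for row in rows if matches(row)]
    non_matching = [row for row in rows if not matches(row)]
    return matching, non_matching


# A few illustrative rows; the last field in each row is the category.
iris_rows = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
]
visitor_rows = [
    ("slashdot", "USA", "yes", 18, "None"),
    ("google", "France", "yes", 23, "Premium"),
    ("digg", "USA", "yes", 24, "Basic"),
]

# Numeric split: petal length (column 2) >= 2.5
long_petals, short_petals = divide_rows(iris_rows, 2, 2.5)

# Categorical split: referrer (column 0) == "google"
from_google, from_elsewhere = divide_rows(visitor_rows, 0, "google")
```

With a split like this, the rest of the tree-building logic wouldn't need to know which kind of column it is looking at, which is the part that would let the same tree handle both data sets.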

The categories for both of these have been pretty simple. I'm wondering what would happen if my categories got more complex.

2 comments:

Ntino said...

the book by Segaran is indeed a very good one on classification algorithms...it's published by O'Reilly and what I love about books from this house is that they are practical! They have real-world examples and code that get you started right away, unlike textbooks....

Marlena Compton said...

I completely agree with ntino's comment. I've waded through "the" AI textbook by Russell & Norvig and can honestly say that Segaran's book gave me a real-world grounding that was lacking in Russell & Norvig. The two books together really are a fantastic one-two punch for learning AI...just be sure to stay on your toes and avoid getting knocked out! This topic requires Focus (note the capital F).