During the first semester of my junior year I studied abroad at the University of East Anglia. One of my classes there was Information Retrieval, taught by Professor Cox and Dr. Dan Smith. This was one of my assignments, and I would like to thank them for allowing me to post my results and the example emails we were given. Being myself, I went a bit beyond the actual requirements.

This project was about labeling email messages as spam or ham using Machine Learning. One of the goals of Machine Learning, and the approach used in this project, is to learn from labeled examples and then use what was learned to classify future, unseen emails.
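
To make the "learn from labeled examples, then classify unseen emails" idea concrete, here is a minimal sketch in Python. It is not the code from the project: the tiny training set, the whitespace tokenizer, and the function names `tokenize`, `train`, and `classify` are all made up for illustration, and the model is a bare-bones multinomial Naive Bayes with add-one smoothing.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_emails):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training emails
    for text, label in labeled_emails:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior for the given text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Start from the log prior: how common this label was in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for words that
            # appeared in training, but not under this label.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples, invented for this sketch.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

model = train(TRAINING)
print(classify("win a cash prize", model))
print(classify("monday meeting notes", model))
```

With this toy data the first message leans towards spam and the second towards ham, because the words they share with the training emails are concentrated in one class. A real filter would need far more examples and a better tokenizer.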

You can download the NetBeans project and example emails here

For those more familiar with Machine Learning, I implemented the following algorithms.

  • ID3 (Decision Tree)
  • Naive Bayes (Multinomial & Multivariate)
  • K Means
  • K Nearest Neighbor
  • Rocchio
  • Support Vector Machine (Linear, Polynomial, & RBF Kernels)
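
As a taste of one of the less familiar entries in that list, here is a rough Python sketch of a Rocchio-style classifier: average the word vectors of each class into a centroid, then assign a new email to the class whose centroid is most similar. This is an illustration under simplifying assumptions, not the project's implementation: it uses raw term counts rather than tf-idf weights, cosine similarity as the comparison, and invented names (`vectorize`, `cosine`, `train_centroids`, `classify_rocchio`) and example messages.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn text into a bag-of-words vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labeled_emails):
    """Average the vectors of each class into one centroid per label."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled_emails:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {word: total / counts[label] for word, total in vec.items()}
        for label, vec in sums.items()
    }


def classify_rocchio(text, centroids):
    """Pick the label whose centroid is most similar to the email."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical labeled examples, invented for this sketch.
EXAMPLES = [
    ("free prize now", "spam"),
    ("claim free cash", "spam"),
    ("project status update", "ham"),
    ("status meeting today", "ham"),
]

centroids = train_centroids(EXAMPLES)
print(classify_rocchio("free prize inside", centroids))
print(classify_rocchio("meeting status", centroids))
```

Training here is just one pass to build the centroids, and classification is one similarity computation per class, which is what makes this family of classifiers cheap compared with K Nearest Neighbor, where every stored example is compared against the new email.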