Thanks to my semester abroad at the University of East Anglia, I learned that I have a great fondness for the field of Machine Learning (ML). We used the Weka project, which I did not have a great fondness for. I found Weka to be abnormally slow and unintuitive to use, both as a tool and as a platform. So I started working on my own library. I wanted a tool in Java for statistical analysis, so JSAT (Java Statistical Analysis Tool) was born.

JSAT is hosted over at Google Code, and provided under the GPL 3 for non-commercial use. As it is a learning project for myself, I chose to implement all functionality myself; no outside projects were used. This includes the matrix computations used by many algorithms. While pure performance is not my goal (C/C++ is better for heavily numerical code like this), I did want performance to be reasonable for small to medium sized problems: efficient enough that one could use a powerful server to get a bigger job done on time. As such, much of the library is multi-threaded and cache friendly, and uses efficient algorithms.

I am also eating my own dog food, and have been using JSAT to do my assignments as I focus my master's studies on Machine Learning. I've used JSAT for comparing algorithm properties and synthetic data sets, and for clustering tweets from Twitter.

You can find more details at the project's page and wiki, including some simple examples to help get started.