For the first semester of my junior year, I studied abroad at the University of East Anglia. One of my classes there was Information Retrieval, taught by Professor Stephen Cox and Dr. Dan Smith. This was one of my assignments, and it shares some code that later became part of my spam classification assignment. I would like to thank them for graciously allowing me to keep and show my results, and for allowing me to do this project as an alternative examination.

This project was set for me in particular, in place of a normal exam: UEA holds its exams at the end of the year, and I would not be staying for the whole year. Search Light was inspired by OS X's Spotlight, which searches the computer for documents related to a query. However, Spotlight does not return documents ordered by relevance.

Search Light is given one or more directories, which it scans recursively for text files and then processes. Once that is complete, the user can give Search Light a query, and it returns documents sorted by their relevance to that query. Some techniques for improving results, such as Rocchio relevance feedback and stemming, are also available.
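To give a feel for what "sorted by relevance" can mean, here is a minimal sketch of one common approach: TF-IDF weighting with cosine similarity. This is not the actual Search Light code, and I am only assuming this weighting scheme for illustration; the class name RankingSketch and its methods are made up for this example, and the documents are passed in as strings rather than read from disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Search Light implementation.
class RankingSketch {

    // Split text into lowercase alphanumeric tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Count how often each term occurs in a token list.
    static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    // Build a TF-IDF vector: term count * log(numDocs / documentFrequency).
    static Map<String, Double> weigh(Map<String, Integer> counts,
                                     Map<String, Integer> docFreq, int numDocs) {
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer df = docFreq.get(e.getKey());
            if (df == null) continue; // term never appears in the collection
            vec.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
        }
        return vec;
    }

    // Cosine similarity between two sparse vectors (0 if either is all zeros).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) normB += w * w;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Return document names ordered from most to least similar to the query.
    static List<String> search(Map<String, String> docs, String query) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map.Entry<String, String> d : docs.entrySet()) {
            Map<String, Integer> c = termCounts(tokenize(d.getValue()));
            counts.put(d.getKey(), c);
            for (String term : c.keySet()) docFreq.merge(term, 1, Integer::sum);
        }
        int n = docs.size();
        Map<String, Double> q = weigh(termCounts(tokenize(query)), docFreq, n);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> c : counts.entrySet()) {
            scores.put(c.getKey(), cosine(q, weigh(c.getValue(), docFreq, n)));
        }
        List<String> ranked = new ArrayList<>(docs.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "wheat prices rise as wheat exports grow");
        docs.put("b.txt", "oil prices fall");
        docs.put("c.txt", "central bank holds interest rates");
        System.out.println(search(docs, "wheat exports"));
    }
}
```

In this toy collection only a.txt shares any terms with the query, so it is ranked first; the other two documents score zero. A real index would be built once and reused across queries instead of being recomputed on every search.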

You can download the Search Light jar here.

Search Light was tested with the Reuters-21578 corpus with excellent results. To use Search Light, go to File->Add Folders to add the folder you want to recursively scan for text files. Once the scan is complete (a dialog box will show up if it takes a while), click File->Build Index. Once the index is built, you can type in a query and hit Search. Double-click on a result to open it in the default viewer on your computer. For best results, go to Edit->Feature Selection and select either stemming or stop list + stemming before you build the index. You can rebuild the index at any time.
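As a rough idea of what the two feature selection options do to the text before indexing, here is a small sketch. It is an assumption-laden illustration rather than Search Light's own code: the stop list is a tiny made-up sample, the stem method is a crude suffix stripper and not a real stemming algorithm such as Porter's, and the names FeatureSelectionSketch and selectFeatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Search Light implementation.
class FeatureSelectionSketch {

    // A tiny sample stop list; a real one would be much longer.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "a", "an", "of", "and", "to", "in", "is", "for"));

    // Crude suffix stripping, standing in for a proper stemmer.
    // Only strips a suffix if at least three characters remain.
    static String stem(String word) {
        String[] suffixes = {"ing", "ed", "es", "s"};
        for (String s : suffixes) {
            if (word.length() > s.length() + 2 && word.endsWith(s)) {
                return word.substring(0, word.length() - s.length());
            }
        }
        return word;
    }

    // Tokenize the text, then optionally drop stop words and stem the rest.
    static List<String> selectFeatures(String text, boolean useStopList,
                                       boolean useStemming) {
        List<String> features = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (t.isEmpty()) continue;
            if (useStopList && STOP_WORDS.contains(t)) continue;
            features.add(useStemming ? stem(t) : t);
        }
        return features;
    }

    public static void main(String[] args) {
        // Stop list + stemming: prints [bank, rais, lend]
        System.out.println(selectFeatures("The banks raised lending", true, true));
    }
}
```

The point of both steps is that a query for "bank" can match a document that only says "banks", and that very common words like "the" do not influence the ranking. Because the index stores the processed terms, changing these options only takes effect after the index is rebuilt.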