Welcome...

My name is Tom Austin. I am currently a PhD student in the SLANG lab of the CS department at UC Santa Cruz. Until recently, I worked for Knight Ridder Digital (formerly KnightRidder.com, formerly Knight Ridder New Media, ... at this point I lose track of the names). If you are viewing this page and happen to be looking for software engineers, here is my resume.

XMUltra

XMUltra is a feed-processing framework built by Knight Ridder for handling its news stories. They briefly open-sourced it in 2003 before withdrawing it. After McClatchy merged with Knight Ridder, they agreed to open-source it again, which was very, very cool of them. I've been busy updating it for general use.

Think of it as Ant for data feeds. Once you use it, feed processing is never the same again. It is available at http://xmultra.sourceforge.net.

JOMP -- JavaScript One-metaclass Metaobject Protocol

For my master's thesis at San Jose State I explored metaobject protocols for different languages. I've added some new MOP features to Mozilla's Rhino JavaScript, and as a proof of concept I am working on integrating this into RhinoFaces. RhinoFaces is built on JavaServer Faces, but it is patterned more after Ruby on Rails.

This seems to be about the first paper on MOPs for languages with prototype-based object systems. It has some interesting characteristics. If you are interested in programming language design, take a look.

I~Poper

Recently, I have been working on I~Poper, a site for building narrated videos. It is built on PHP and MySQL. Check it out.

Data Harvester

Print classified ads, charged by the line, are by their very nature terse and cryptic. They tend to have little or no fielded data.

Until recently, this was not an issue. But now that newspapers compete online, often on the same sites as online-only ads, their ads get lost in the shuffle. See Apartments.com for a good example of this.

A tool is needed to parse through the text of ads and extract useful information from this text. This will give the ads the extra metadata that they need to compete online. To date, there has been no good commercial solution for this.

Working for a newspaper company, I had given this issue a lot of thought. I was taking a class in "Computers and Written Language" at San Jose State from Professor John Fry. For this class project, I wrote a tool to solve this problem.

DataHarvester is the result. It is written in Ruby and built upon XML files of regular expressions. It includes scripts to evaluate the success rate of the model. If you are interested in investigating it, you may download the source here. I used this as a proof of concept at my job, which went over well. I've rebuilt this in Java, and now I'm back to extending the Ruby version... Long story.
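To give a feel for the approach, here is a minimal sketch of regular-expression rules kept in XML and applied to an ad. It is not DataHarvester's actual code or rule format: it is written in Python rather than Ruby, and the XML layout, the field names, the patterns, and the sample ad are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file: each <field> names one piece of metadata and
# holds a regular expression whose first group captures its value.
RULES_XML = r"""
<rules>
  <field name="bedrooms">(\d+)\s*(?:br|bdrm|bed)\b</field>
  <field name="bathrooms">(\d+(?:\.\d)?)\s*ba\b</field>
  <field name="rent">\$\s*(\d[\d,]*)</field>
</rules>
"""


def load_rules(xml_text):
    """Compile every <field> pattern, keyed by the field's name."""
    root = ET.fromstring(xml_text)
    return {
        field.get("name"): re.compile(field.text.strip(), re.IGNORECASE)
        for field in root.iter("field")
    }


def extract_fields(ad_text, rules):
    """Return the fields whose pattern matches somewhere in the ad text."""
    fields = {}
    for name, pattern in rules.items():
        match = pattern.search(ad_text)
        if match:
            fields[name] = match.group(1)
    return fields


# A made-up classified ad in the usual terse style.
ad = "SANTA CRUZ 2br/1ba nr beach, w/d, no pets. $1,850/mo"
print(extract_fields(ad, load_rules(RULES_XML)))
# prints {'bedrooms': '2', 'bathrooms': '1', 'rent': '1,850'}
```

Keeping the patterns in a data file rather than in code means a new ad category can get its own rules without touching the extraction logic.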

Other projects

These began as projects for San Jose State, but they came out particularly well, so I decided to put them up here.


Questions or comments? Contact me at admin@bias2build.com.

Also, check out my blog at http://tomTheMighty.blogspot.com/
