Spidr and Raingrams are back, now with specs


Raingrams is back in action. After it sat on RubyForge for quite some time, I was asked to add some features to the general purpose Ngrams Ruby library. I ended up refactoring the code to handle probability calculations better (the Maximum Likelihood Estimation (MLE) is only recalculated when the set of ngrams changes), removing the Unigram model (kinda pointless in an ngrams library), allowing a trained model to be dumped to a file using Marshal, and adding the ability to generate random text from trained models. Raingrams also received a total of 133 new spec tests.
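To give a rough idea of how those pieces fit together, here is a minimal sketch of a toy bigram model. It is not the Raingrams API: the class `ToyBigramModel` and all of its methods are hypothetical names made up for illustration. It only shows the general technique of caching the MLE table until the ngram counts change, round-tripping a trained model through Marshal, and walking the probabilities to generate random text.

```ruby
# A toy bigram model for illustration only. This is NOT the Raingrams API;
# every name below is hypothetical.
class ToyBigramModel
  def initialize
    # Plain nested hashes (no default procs), so Marshal can dump the model.
    @counts = {}
    @probabilities = nil # cached MLE table; nil means "needs recalculating"
  end

  # Counts every adjacent word pair in the text.
  def train(text)
    text.split.each_cons(2) do |prev, word|
      (@counts[prev] ||= Hash.new(0))[word] += 1
    end
    @probabilities = nil # the set of ngrams changed, so drop the cache
    self
  end

  # MLE: P(word | prev) = count(prev, word) / count(prev, anything).
  # Only recomputed when the cache was invalidated by #train.
  def probabilities
    @probabilities ||= @counts.transform_values do |followers|
      total = followers.values.sum.to_f
      followers.transform_values { |count| count / total }
    end
  end

  def probability(prev, word)
    probabilities.fetch(prev, {}).fetch(word, 0.0)
  end

  # Generates up to `length` words by sampling each next word
  # according to its probability of following the previous one.
  def random_text(start, length, rng = Random.new)
    words = [start]
    while words.length < length
      followers = probabilities[words.last]
      break if followers.nil? # dead end: nothing ever followed this word

      roll = rng.rand
      cumulative = 0.0
      chosen = followers.find { |_, prob| (cumulative += prob) >= roll }
      # Fall back to the last follower in case of floating point rounding.
      words << (chosen || followers.to_a.last).first
    end
    words.join(' ')
  end
end

model = ToyBigramModel.new.train('the cat sat on the mat')

# Round-trip through Marshal; writing the dumped string to disk with
# File.binwrite and reading it back with File.binread works the same way.
restored = Marshal.load(Marshal.dump(model))

puts restored.probability('the', 'cat')
puts restored.random_text('the', 5)
```

The one thing to watch for with Marshal is that it refuses to dump hashes that have a default proc, which is why the sketch builds the nested count hashes by hand.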

Install Raingrams:

$ sudo gem install raingrams

Spidr also received some new spec tests. After fixing a link handling bug in Spidr 0.1.1, I decided to create a Web Spider Obstacle Course for testing purposes. The course contains all manner of links (remote, local, relative, absolute, JavaScript, empty URLs and infinitely looping links). The course also provides a JSON file containing spec information for how a web-spider should navigate the links. I also wrote an RSpec helper which imports the spec information from the JSON file and auto-generates spec tests for how Spidr::Agent should navigate the links in the obstacle course.
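The general idea of driving specs from a JSON file can be sketched roughly as follows. This is not the actual helper, and the JSON layout here (the `link`, `behavior` and `message` keys, and the `follow`/`ignore` values) is an assumption made up for the example; the real course file may be structured differently. The function `course_checks` is likewise a hypothetical name.

```ruby
require 'json'
require 'set'

# Hypothetical spec data; the real obstacle course JSON may look different.
COURSE_SPECS = <<~COURSE
  [
    {"link": "/course/relative.html", "behavior": "follow",
     "message": "should follow relative links"},
    {"link": "javascript:fail()", "behavior": "ignore",
     "message": "should ignore javascript: links"},
    {"link": "", "behavior": "ignore",
     "message": "should ignore empty links"}
  ]
COURSE

# Turns each JSON entry into a [message, check] pair. Each check is a
# lambda that takes the set of links a spider visited and returns true
# when the spider behaved as the entry expects.
def course_checks(json)
  JSON.parse(json).map do |entry|
    link = entry.fetch('link')
    check =
      case entry.fetch('behavior')
      when 'follow' then ->(visited) { visited.include?(link) }
      when 'ignore' then ->(visited) { !visited.include?(link) }
      else raise ArgumentError, "unknown behavior: #{entry['behavior']}"
      end

    [entry.fetch('message'), check]
  end
end

# Stand-in for the links a spider actually visited during a crawl.
visited = Set['/course/relative.html']

course_checks(COURSE_SPECS).each do |message, check|
  puts "#{check.call(visited) ? 'PASS' : 'FAIL'}: #{message}"
end
```

Inside an RSpec `describe` block, the same loop could define one example per entry, along the lines of `it(message) { expect(check.call(visited)).to be true }`, so adding a new obstacle to the course only means adding an entry to the JSON file.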

Install Spidr:

$ sudo gem install spidr