Comment Re:Okay kids...(in Ruby) (Score 2, Interesting) 104

I couldn't resist. In Ruby, using the beautiful (but much underrated) hpricot library:

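require 'hpricot'
require 'open-uri'  # so open() also accepts a URL (Ruby < 3.0)
doc = Hpricot(open(html_document))
(doc/"a").each { |a| puts a.attributes['href'] }  # print every link target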

Check it out. I've been using it for a project, and it's fast and easy to use (it supports both XPath and CSS selectors for picking out links). For spidering, take a look at the Ruby mechanize library, which is like Perl's WWW::Mechanize but also uses hpricot, so parsing the returned document is much easier.
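One thing to watch if you go on to spider with those links: the values you get back from a.attributes['href'] are often relative, or point at mailto: addresses and other sites. Here's a rough sketch of cleaning them up using only Ruby's standard uri library. The same_host_links helper is my own made-up name, not part of hpricot or mechanize, and the "same host only" rule is just an assumption about what a small spider would want.

```ruby
require 'uri'

# Hypothetical helper (not part of hpricot or mechanize): turn raw href
# values from a page into absolute, de-duplicated URLs on the same host.
def same_host_links(base_url, hrefs)
  host = URI.parse(base_url).host

  # Resolve each href against the page it came from.
  resolved = hrefs.compact.map do |href|
    begin
      URI.join(base_url, href.strip)
    rescue URI::Error
      nil # skip hrefs that are not valid URIs
    end
  end

  resolved.compact
          .select { |uri| uri.is_a?(URI::HTTP) && uri.host == host } # http/https, same host
          .map { |uri| uri.fragment = nil; uri.to_s }                # drop "#anchor" parts
          .uniq
end
```

You would call it as same_host_links(page_url, hrefs), where hrefs is the array of href strings collected from the document and page_url is the address the document was fetched from.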
