Lesson 4: Fetching a Web Page

Here is a small Ruby program that downloads the Wikipedia entry for whatever string you put into st, in this case 'Book'. Wikipedia likes the first letter capitalized.


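require 'net/http'
require 'logger'

# Inside Rails, logger is already provided; a standalone script has to create one.
# The log/ directory must already exist.
logger = Logger.new('log/development.log')

st = 'Book'
begin
  # Wikipedia serves its pages over HTTPS, so pass a full https URI.
  page = Net::HTTP.get(URI('https://en.wikipedia.org/wiki/' + st))
  File.open(st + '.html', 'w') { |f| f.puts page }
rescue StandardError
  puts 'Is the Internet on?'
  logger.info($!.to_s)
end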

All we are doing here is invoking the get function from the net/http library, which fetches the desired web page; the line after it saves the page as a text file with an .html extension. We enclose this part in begin ... rescue. The code that follows rescue is executed in case something goes wrong. In our case, the most likely reason is that the computer isn't on the Net, so that is what we ask. However, to be able to diagnose the fault properly, you do need more information. When an exception is raised, and independent of any subsequent exception handling, Ruby places a reference to the exception in the global variable $!. We simply write its contents to log/development.log using the instruction logger.info($!.to_s). Inside a Rails application, logger is provided for you.
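To see what ends up in $! without needing a network failure, here is a minimal sketch that raises an error on purpose. The helper name log_failure and the StringIO buffer (standing in for log/development.log) are assumptions made for illustration, not part of the lesson's program.

```ruby
require 'logger'
require 'socket'    # defines SocketError, the error raised when a host name cannot be resolved
require 'stringio'

# Hypothetical helper: runs the given block and logs any failure it raises.
def log_failure(logger)
  yield
rescue StandardError => e
  # Inside the rescue clause, e and $! refer to the same exception object.
  logger.info("#{e.class}: #{e.message}")
  logger.info("same object as $!: #{e.equal?($!)}")
  nil
end

log_buffer = StringIO.new        # assumption: stands in for log/development.log
logger = Logger.new(log_buffer)

# Simulate the failure you would get with the Internet switched off.
log_failure(logger) { raise SocketError, 'simulated lookup failure' }

puts log_buffer.string
```

Naming the exception with rescue ... => e gives you the same object as $!, and many Ruby programmers find the named form easier to read.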


Feel free to intersperse your code with logger.info statements that write intermediate values of crucial variables to the development log for debugging purposes. Later, such statements also help you track what users did on your site.
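As a rough sketch of what such interspersed statements might look like, the snippet below logs each intermediate value on the way to building the request. It assumes a logger that writes to the terminal rather than to log/development.log, and the page string is a placeholder standing in for a real download.

```ruby
require 'logger'

# Assumption: log to the terminal so the output is visible without Rails.
logger = Logger.new($stdout)

st = 'Book'
logger.info("st=#{st.inspect}")

path = '/wiki/' + st
logger.info("path=#{path.inspect}")

file_name = st + '.html'
logger.info("file_name=#{file_name.inspect}")

# Placeholder content; in the real program this comes from Net::HTTP.get.
page = '<html><body>placeholder</body></html>'
logger.info("page size=#{page.bytesize} bytes")
```

Using inspect puts quotes around strings in the log, which makes stray spaces or empty values easy to spot.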


In the next lesson, we learn how to use the amazing Slicer-Dicer called Hpricot, after which we integrate all this with Rails.