From Grant at The HoBB
Wednesday, October 1st, 2008
Grant wanted local (offline) versions of some HoBB sites, and of the content they had published on other sites that might be discontinued.
Grant’s solution was to copy and paste the appropriate content into local documents, which was a very time-consuming process.
I created an offline spidering project that followed all the links from the home page of each site and downloaded the entire content as files, which can then be browsed completely offline in a browser.
One of the sites was about 700 MB in size, with over 16,000 pages. Can you imagine how long it would have taken to find, copy and paste the content?
The computer did it in about 6 hours. I think doing it manually could have taken weeks, if not months.
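For the curious, the spidering idea described above can be sketched in a few lines of Python. This is not the code from the actual project, only a rough illustration under some assumptions: the names `spider`, `extract_links`, `local_path` and `fetch` are made up for this example, only pages on the starting site are followed, and each page is saved under a folder layout chosen here for simplicity.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (without #fragments) for the links in html."""
    parser = LinkCollector()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def local_path(url, root):
    """Map a URL to a file under root (hypothetical layout: root/host/path)."""
    parts = urlparse(url)
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    return Path(root) / parts.netloc / path


def fetch(url):
    """Download a page as text; a real run would also need error handling."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def spider(start_url, root, fetch_page=fetch, max_pages=1000):
    """Breadth-first crawl of one site, saving each page under root."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        html = fetch_page(url)
        target = local_path(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        saved.append(url)
        for link in extract_links(html, url):
            # Stay on the starting site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

A call such as `spider("http://example.com/", "offline_copy")` would start at the home page and work outwards link by link. A full offline copy would also need to fetch images and stylesheets and rewrite the links inside each saved page so they point at the local files; the sketch leaves those steps out.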
Here’s part of the conversation:
BHARAT:
Grant,
I’ll start downloading the sites over the next few days.
GRANT:
Hi Bharat …. Does this mean you found a ‘number crunching’ way to grab everything? I hope you are not doing it the way you suggested I do it - it will take you an age!!
Best wishes, thanks as I marvel at either your stamina or extreme know-how!
BHARAT:
Just extreme know-how - the computer did what it’s made for.
GRANT:
I’ll pre-marvel at the computer as well as you. … I just enjoy marvelling. :)
Thank you


