
From Grant at The HoBB

Wednesday, October 1st, 2008

Grant wanted local (offline) copies of some HoBB sites, along with HoBB content published on other sites that might be discontinued.

Grant’s solution was to copy and paste the relevant content into local documents, which was a very time-consuming process.

I created an offline spidering project that followed every link from the home page of each site and downloaded the site's entire content as files that can be browsed completely offline in a browser.
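For readers curious how a spider like this works, here is a minimal sketch in Python. It is not the code I used for Grant's sites, just an illustration of the idea under some simplifying assumptions: it only follows ordinary `<a href>` links, stays on the same host as the home page, and saves HTML pages only (no images or stylesheets). The names `spider`, `local_path`, `save_pages` and the `max_pages` limit are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download one page over the network and return it as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def spider(home_url, fetch=fetch_page, max_pages=100):
    """Follow links breadth-first from home_url, staying on the same host.

    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    host = urlparse(home_url).netloc
    queue = deque([home_url])
    seen = {home_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # assumption: silently skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def local_path(url):
    """Map a URL to a relative file path (query strings are ignored)."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return path.lstrip("/")


def save_pages(pages, out_dir):
    """Write every fetched page into out_dir, mirroring the URL paths."""
    for url, html in pages.items():
        target = Path(out_dir) / local_path(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
```

Calling `save_pages(spider("http://example.test/"), "offline_copy")` would crawl a (placeholder) site and write its pages to a folder. A real offline copy also needs the links inside each page rewritten to point at the local files, plus images and other assets, which this sketch leaves out.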

One of the sites was about 700 MB in size, with over 16,000 pages. Can you imagine how long it would have taken to find, copy and paste all that content?

The computer did it in about 6 hours. Done manually, I think it could have taken weeks, if not months.

Here’s part of the conversation:

BHARAT:
Grant,
I’ll start downloading the sites over the next few days.

GRANT:
Hi Bharat …. Does this mean you found a ‘number crunching’ way to grab everything? I hope you are not doing it the way you suggested I do it - it will take you an age!!

Best wishes, thanks as I marvel at either your stamina or extreme know-how!

BHARAT:
Just extreme know-how - the computer did what it’s made for.

GRANT:
I’ll pre-marvel at the computer as well as you. … I just enjoy marvelling. :)

Thank you

Posted after Start of Transformation in: Testimonials from People , The HoBB
