
From Grant at The HoBB

Wednesday, October 1st, 2008

Grant wanted local (offline) copies of some HoBB sites, and of HoBB content published on other sites that might be discontinued.

Grant’s solution was to copy and paste the relevant content into local documents, which was a very time-consuming process.

I created an offline spidering project that started at the home page of each site, followed all the links, and downloaded the entire content of the site as files that can be browsed completely offline in a web browser.
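For the curious, the idea behind a spider like this can be sketched in Python. This is an illustration only, not the code from the project: the function names (`extract_links`, `local_path`, `fetch`, `spider`), the file-naming scheme, and the `max_pages` limit are all assumptions made for the example. It uses only the standard library, stays on the starting site, and saves each page under an output folder.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs (fragments stripped) for all links in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def local_path(url, out_dir):
    """Map a URL to a file path under out_dir (hypothetical naming scheme).

    Query strings are ignored, so pages that differ only by query
    would overwrite each other in this simple version.
    """
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return os.path.join(out_dir, *path.strip("/").split("/"))


def fetch(url):
    """Download a page and return its text (assumes roughly UTF-8 HTML)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def spider(start_url, out_dir, fetch=fetch, max_pages=20000):
    """Breadth-first crawl from start_url, saving same-host pages to out_dir."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        target = local_path(url, out_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(html)
        saved.append(url)
        # Queue every unseen link that stays on the same host.
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling `spider("http://example.com/", "offline_copy")` (a placeholder address) would walk the site from its home page and write the pages into the `offline_copy` folder. A fuller tool would also download images and stylesheets, rewrite absolute links so they point at the saved files, and pause between requests to be polite to the server.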

One of the sites was about 700 MB in size, with over 16,000 pages. Can you imagine how long it would have taken to find, copy and paste the content?

The computer did it in about 6 hours. Done manually, I think it could have taken weeks, if not months.

Here’s part of the conversation:

BHARAT:
Grant,
I’ll start downloading the sites over the next few days.

GRANT:
Hi Bharat …. Does this mean you found a ‘number crunching’ way to grab everything? I hope you are not doing it the way you suggested I do it – it will take you an age!!

Best wishes, thanks as I marvel at either your stamina or extreme know-how!

BHARAT:
Just extreme know-how – the computer did what it’s made for.

GRANT:
I’ll pre-marvel at the computer as well as you. … I just enjoy marvelling. :)

Thank you


Posted in: Testimonials / References, The HoBB
