
Ever wanted your own personal copy of Amazon? Well, if you're mad enough, and have a big enough hard drive and a very fast internet connection, you can!

There are a few programs capable of ripping websites; the one I use is HTTrack Website Copier, which:

Allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online.

If you point HTTrack at a blog, it will rip the entire site and store it as static pages on your PC. It cannot get the PHP code, because PHP runs on the server and only the HTML it generates is ever sent to the browser.
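To get a feel for what a mirroring tool has to do with each page, here is a minimal Python sketch. It is not HTTrack's actual code, just an illustration under simple assumptions: it pulls the links out of a page, keeps the ones on the same site, and works out where each would be saved locally. The names LinkCollector, local_path and same_site, and the example.com URLs, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class LinkCollector(HTMLParser):
    """Collect href/src targets from a page, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def local_path(url):
    """Map a URL to a relative file path inside the mirror folder.

    Simplification: the query string is ignored, so two URLs that differ
    only in their query would collide. A real tool handles this properly.
    """
    parts = urlparse(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return posixpath.join(parts.netloc, path.lstrip("/"))


def same_site(url, root):
    """True if both URLs point at the same host."""
    return urlparse(url).netloc == urlparse(root).netloc


# Hypothetical page content; a real mirror would download this.
page_url = "http://example.com/blog/"
html = (
    '<a href="post.php?id=1">Post</a> '
    '<img src="/img/logo.png"> '
    '<a href="http://other.example.org/">Elsewhere</a>'
)

collector = LinkCollector(page_url)
collector.feed(html)

# Only follow links that stay on the site being mirrored.
to_fetch = [u for u in collector.links if same_site(u, page_url)]
for u in to_fetch:
    print(u, "->", local_path(u))
```

Note that whatever gets saved for post.php is the HTML the server sent back, never the PHP source itself.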

Playing with local copies of websites is a good way to learn, and this tool makes getting such local copies very easy.

HTTrack have a few rules they ask users to follow, such as:

We hereby ask people using this source NOT to use it in purpose of grabbing
emails addresses, or collecting any other private informations on persons.
This would disgrace our work, and spoil the many hours we spent on it.

Elsewhere there are warnings about not stealing copyrighted content and so on, which you should respect. But if all you want to do is keep a local copy of a site, back up your own work, or just play with something, there should be no issue.

Using HTTrack is very simple. After installing, you choose the folder where the copy of the site is to be stored, give the copied site a name and, if you need to, set some more advanced options. For example, you can control the way the link and site structure is preserved, or filter out different types of file, as shown below:

httrack3.png
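The same choices (destination folder, filters, depth) can also be passed to the httrack command-line program. The Python sketch below only builds the argument list, so it is safe to run without HTTrack installed. It assumes the options as I understand them from the HTTrack documentation: -O for the output folder, -rN for the mirror depth, and +pattern / -pattern for include and exclude filters. Check httrack --help on your own install before relying on it. The function name build_httrack_command and the example URL and folder are made up for illustration.

```python
import shlex


def build_httrack_command(url, output_dir, include=(), exclude=(), depth=None):
    """Return an argv list for the httrack command-line program.

    Assumed options (verify against `httrack --help`):
      -O <dir>    folder where the mirror is stored
      -rN         maximum link depth to follow
      +<pattern>  include filter, -<pattern> exclude filter
    """
    cmd = ["httrack", url, "-O", output_dir]
    if depth is not None:
        cmd.append(f"-r{depth}")
    cmd += [f"+{pattern}" for pattern in include]
    cmd += [f"-{pattern}" for pattern in exclude]
    return cmd


cmd = build_httrack_command(
    "http://example.com/",          # hypothetical site to copy
    "./mirrors/example",            # hypothetical destination folder
    include=["*.png", "*.css"],
    exclude=["*.zip", "*.exe"],
    depth=3,
)

# Print a shell-quoted version you could paste into a terminal.
print(shlex.join(cmd))

# To actually start the copy (requires httrack on your PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than one long string means the filter patterns reach httrack exactly as written, without the shell trying to expand the asterisks.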

There are lots of options for controlling how this software makes copies. Once you are happy with your choices, you can start ripping:

httrack4.png

This program is a lot of fun, particularly watching the screen above, which shows the ripping operation in progress!


    14 Comments

    Comment by Jon Lee

    I used to use a program called WebGator or something way back in the day to download entire websites overnight so I wouldn’t have to wait for pages to load the next day (14.4k modem :D )
    With broadband, I find myself using this type of program to download entire blogs as reading material for long commutes.

    Comment by SiteLogic

    My first modem was 2k. The internet wasn’t up to much at the time; I used it for accessing BBS systems and sending mail via FidoNet. Don’t know if you ever messed with that stuff, it was fun while it lasted….

     
     
    Comment by Andrea Micheloni

    HTTrack rocks!
    Just have to say the Linux version is cooler; it starts a little web server to be accessed at http://127.0.0.1/, and the whole process is shown there, in the web interface :) which looks like the HTTrack homepage, by the way…

    Comment by SiteLogic

    Thanks for that, I have not tried the Linux version, I will have to do that :-)

    Generally Linux is better for all things web related, the only exception being graphics software…

    Comment by Jake

    Pardon my ignorance, but how does graphics software qualify as being Internet related? Web design?

    Between Pixel, Inkscape, Karbon14, GIMP, and Krita something can be done, and when they don’t work, there is Crossover and Wine.

    Comment by SiteLogic

    Yup all I use graphics software for is editing images for sites….

    Yeah, you can do a lot with GIMP etc., but IMO the Windows apps are better, if you can afford them… I don’t like using Wine, it always seems a bit glitchy…

    Comment by Jake

    See, everyone immediately associates GIMP with Linux graphics. There are really many more, though only 2 other ones I mentioned were raster.

    Wine can be made to work correctly with a bit of work. If you don’t feel like tinkering, Crossover Office can usually do the job right out of the box. I’ve only ever used it with PS7 though myself.

    And what’s wrong with me… mentioning two commercial products.

     
     
     
     
     
    Comment by apostolos

    OK, it is a method, but not the absolute method.

    On Linux web servers, mod_security can deny access to web site ripping software like this. And HTTrack is one of the tools it blocks :(

    Comment by SiteLogic

    Yeah, I know they can be blocked along with wget and other ‘grabbers’. I don’t know what proportion of sites actually choose to block them though, and it’s a fun program all the same ;-)

     
     
    Comment by HMTKSteve

    I wonder if there is any sort of web site ripping software that gets around “members only” areas :)

    Comment by SiteLogic

    Not unless it knows the password.

     
     
    Comment by Web Design Media

    I think using wget under Linux is a much better solution… no need for a graphical interface to download a website.

    Comment by SiteLogic

    Yup, wget is very useful, but I am not aware of a Windows equivalent… and HTTrack does a good job.

    Comment by Web Design Media

    Hope Microsoft hears about it… and maybe one day we’ll just have to enter a command line in MS-DOS to download a site, just as with wget :)

     
     
     
