With the terminal: Download a complete website with Wget

There is nothing better than Wikipedia to explain what this tool is:

GNU Wget is a free software tool that makes it simple to download content from web servers. Its name derives from World Wide Web (w) and "get", meaning: get from the WWW.

Currently it supports downloads using the HTTP, HTTPS and FTP protocols.

Among its most outstanding features are easy recursive downloading of complex mirrors, conversion of links so that HTML content can be browsed locally, proxy support, and more.

It's true that there are other applications that help with this kind of job, such as HTTrack, or even Firefox extensions like ScrapBook, but nothing beats the simplicity of a terminal 😀

Doing the magic

I was struck by the movie The Social Network, where the Mark Zuckerberg character says «a bit of wget magic» as he is about to download the photos for Facemash 😀 And it's true: wget lets you do magic with the right parameters.

Let's look at a couple of examples, starting with the simplest use of the tool.

To download a single page:

$ wget https://blog.desdelinux.net/con-el-terminal-bajar-un-sitio-web-completo-con-wget

To download the entire site recursively, including images and other types of data:

$ wget -r https://blog.desdelinux.net/

And here comes the magic. As the Humans article explains, many sites check the browser's identity (the user agent) in order to apply various restrictions. With wget we can get around this as follows:

$ wget -r -p -U Mozilla https://blog.desdelinux.net/

We can also pause between pages and limit the download rate; otherwise the site owner may notice that we are downloading the whole site with wget:

$ wget --wait=20 --limit-rate=20K -r -p -U Mozilla https://blog.desdelinux.net/
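
The link conversion mentioned earlier is what makes the copy browsable offline. A minimal sketch combining the standard mirroring options (the URL is just the same example blog):

$ wget -m -k -p --no-parent -U Mozilla https://blog.desdelinux.net/

Here -m (--mirror) enables recursive downloading with timestamping, -k (--convert-links) rewrites the links to point to the local copies, and -p (--page-requisites) pulls in the CSS, JavaScript and images each page needs.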



  1.   pandev92 said

    Is there something to download only the images xd?

    1.    Courage said
      1.    pandev92 said

        lol oo xd

    2.    KZKG ^ Gaara said

      man wget 😉

      1.    pandev92 said

        Life is too short to read mans.

        1.    KZKG ^ Gaara said

          Life is too short to fill your brain with information, but it's still worth trying 🙂

          1.    pandev92 said

            Information is only worth so much; I prefer to fill it with women, games and money if possible XD.

          2.    Courage said

            You're always fucking thinking about women. From now on you will be listening to Daddy Yankee, Don Omar and Wisin y Yandel like KZKG ^ Gaara does.

            Better dedicate yourself to money, which is the most important thing in this life.

            1.    KZKG ^ Gaara said

              There are things that are worth much more than money ... for example, going down in history, making a difference, being remembered for how much you managed to contribute to the world, and not for how much money you had when you died 😉

              Try not to become a man of success but rather a man of value, Albert Einstein.


          3.    Courage said

            And can a beggar living under a bridge do that without having a penny?

            Well, no

          4.    Courage said

            *to have

          5.    pandev92 said

            Courage, I had my reggaeton phase, but not anymore, that was years ago; I only listen to Japanese music and classical music now, and as for the money… we're working on it :).

          6.    pandev92 said

            I don't care about being remembered, Gaara; when I'm dead I'll be dead and to hell with everyone else, since I won't even be able to know what they think of me. What good is being remembered if you can't even be proud of it xD.

    3.    hypersayan_x said

      To download only specific file types you can use filters (see the sketch just below):

      https://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html

      And a tip: if you're going to clone a very large site, it's advisable to do it through a proxy such as Tor, because otherwise some sites will block your IP for several hours or days once you exceed a certain number of consecutive requests.
      That happened to me once when I wanted to clone a wiki.
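
      A rough sketch of such a filter, accepting only common image types (the URL and the extension list are placeholders to adjust to your case):

      $ wget -r -nd -A jpg,jpeg,png,gif https://example.com/

      -A takes a comma-separated list of suffixes to accept (-R rejects the listed suffixes instead), and -nd keeps wget from recreating the remote directory tree locally.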

    4.    mdir said

      An extension, which I use in Firefox, downloads only images; it's called "Save Images 0.94"

  2.   Brown said

    Hey, a question hehe, where do the files I download get saved? You're going to want to kill me, right? LOL

    1.    KZKG ^ Gaara said

      The files are downloaded to the folder where you are located in the terminal when executing wget 😉

  3.   auroszx said

    Ahh, I didn't imagine that wget could have such an interesting use… Now, regarding the use that Courage mentions… No words 😉

  4.   Carlos-Xfce said

    Does anyone know if there is a WordPress plug-in that prevents Wget from downloading your blog?

  5.   darzee said

    Well, that comes in really handy!! Thank you

  6.   piolavski said

    Very good, I'll try it and see how it goes, thanks for the contribution.

  7.   lyairmg said

    Although I consider myself a beginner, this is easy for me; now I'll try to combine it with other things and see what comes out…

  8.   oswaldo said

    I hope you can help me because it's due on Monday, December 3, 2012.

    The project to be developed is the following:

    Relocation of a website, adjusting the href references.
    1.- Given a Web site, download the complete site to a local directory using the wget command, and then, with a script of your own, perform the following operations:

    1.1.- Create a separate directory for each type of content: images (gif, jpeg, etc.), videos (avi, mpg, etc.), audio (mp3, wav, etc.) and web content (HTML, JavaScript, etc.).

    1.2.- Once this content has been relocated, adjust the references so they point to the local location of each resource on the site.

    1.3.- Start a Web server and configure the directory holding the Web site backup as the root directory of the local Web server.

    1.4.- Note: the wget command may only be used with the following options:
    --recursive
    --domains
    --page-requisites
    If for some reason more options are needed, use whatever is necessary.
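
    A minimal sketch of the download step using only those options (the domain and URL are placeholders):

    $ wget --recursive --domains=example.com --page-requisites http://example.com/

    Here --domains limits recursion to the listed domain(s) and --page-requisites also fetches the CSS, images and scripts each page needs.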

    1.    KZKG ^ Gaara said

      For the download I think you have the solution in the post; as for moving the files and replacing the paths, I had to do something similar a while ago at work. Here is the script I used: http://paste.desdelinux.net/4670

      Adapt it to your file types and paths, that is, to how the .HTML files of your site are structured and so on.

      It's not a 100% solution, since you will have to make a few adjustments, but I guarantee it covers 70 or 80% of the work 😉

      1.    oswaldo said

        Thanks KZKG ^ Gaara, it has been a great help to me.

  9.   debt said

    I have always used HTTrack. I'm going to try ScrapBook for Firefox, but I love wget. Thank you!

  10.   Daniel PZ said

    Man, the command did not work for me ... this one did work well for me:

    wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com

    1.    Daniel said

      Thanks a lot! I used it with the parameters proposed by Daniel PZ and I had no problems 🙂

  11.   Ruben Almaguer said

    Thanks, man. I did that with wget on my Puppy Linux but I didn't know how to do it from the terminal. Greetings.

  12.   stubborn said

    Where are the downloaded pages saved?

    1.    Hache said

      In the directory where you have the terminal open. By default, your user's home folder, unless you specify another path.

  13.   fernando said

    Does it also download the links? So if there is a link to a PDF or another document, does it download that too?

  14.   river said

    What can I do to download my entire blog? I tried it, but what I downloaded seems to be in code or blocked; despite taking many hours to download, only the initial page can be read. What do you recommend for downloading my blog? Thanks, Raul.

  15.   leo said

    Hello, one question: is it possible to replace the links inside the HTML so that you can later browse the downloaded page as if it were the original?

    What happens is that when I download the page and open it from the downloaded files, it doesn't pick up the .css or .js, and the links on the page take me back to the page on the Internet.