There is nothing better than Wikipedia to explain what this tool is:
GNU Wget is a free software tool that makes it easy to download content from web servers. Its name derives from World Wide Web (the "w") and "get"; in other words: get from the WWW.
It currently supports downloads over the HTTP, HTTPS, and FTP protocols.
Among wget's most notable features are easy recursive downloading of complex mirrors, conversion of links so HTML content can be viewed locally, proxy support, and more.
It is true that there are other applications that help with this kind of job, such as HTTrack or even Firefox extensions like ScrapBook, but nothing beats the simplicity of a terminal 😀
Doing the magic
I was curious because in the movie The Social Network, Mark Zuckerberg's character says «a little wget magic» when he is about to download the photos for Facemash 😀 And it's true: wget lets you do magic with the right parameters.
Let's look at a couple of examples, starting with the simplest use of the tool.
To download a single page:
$ wget https://blog.desdelinux.net/con-el-terminal-bajar-un-sitio-web-completo-con-wget
To download the entire site recursively, including images and other types of data:
$ wget -r https://blog.desdelinux.net/
And here comes the magic. As they explain in the Humans article, many sites check the browser's identity to apply various restrictions. With wget we can get around this as follows:
$ wget -r -p -U Mozilla https://blog.desdelinux.net/
We can also pause between pages; otherwise the site's owner may realize that we are downloading the whole site with wget.
$ wget --wait=20 --limit-rate=20K -r -p -U Mozilla https://blog.desdelinux.net/
Is there something to download only the images, xd?
http://buscon.rae.es/draeI/SrvltConsulta?TIPO_BUS=3&LEMA=vicio
It's as if I just read your mind hahahaha
lol oo xd
man wget 😉
Life is too short to read man pages.
Life is too short to fill your brain with information, but it's still worth trying 🙂
Information is only worth so much; I'd rather fill it with women, games, and money, if possible XD.
You're always thinking about women. From now on you'll be listening to Daddy Yankee, Don Omar, and Wisin y Yandel like KZKG^Gaara does.
Better dedicate yourself to money, which is the most important thing in this life.
There are things worth much more than money... for example, going down in history, making a difference, being remembered for how much you contributed to the world, and not for how much money you had when you died 😉
«Try not to become a man of success, but rather a man of value.» Albert Einstein.
And can a beggar living under a bridge do that without having a penny?
Well, no
*to have
Courage, I had my reggaeton phase but not anymore; that was years ago. Now I only listen to Japanese music and classical music, and as for the money... we're working on it :).
I don't care about being remembered, Gaara; when I'm dead I'll be dead, and to hell with the others, since I won't even be able to know what they think of me. What is being remembered worth if you can't be proud of it xD.
To download only a specific type of file you can use filters:
https://www.gnu.org/software/wget/manual/html_node/Types-of-Files.html
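Following that manual page, a couple of hedged examples (the extension lists and URL are just placeholders to adapt):

```shell
# Accept list (-A): recursively fetch only image files
$ wget -r -A jpg,jpeg,png,gif https://blog.desdelinux.net/
# Reject list (-R): fetch everything except PDFs and ZIPs
$ wget -r -R pdf,zip https://blog.desdelinux.net/
```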
And a tip: if you are going to clone a very large site, it is advisable to do it through a proxy such as Tor, because otherwise certain sites will block your IP for several hours or days once you reach a certain number of consecutive requests.
That happened to me a while back when I wanted to clone a wiki.
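For the record, wget has no native SOCKS support, so it cannot talk to Tor's SOCKS port directly; one common workaround is wrapping the call with torsocks (this assumes a tor daemon running with its default configuration, which is an assumption about your setup):

```shell
# torsocks routes wget's connections through the local tor daemon
# (default SOCKS port 9050); --wait and --limit-rate keep the crawl polite
$ torsocks wget --wait=20 --limit-rate=20K -r -p -U Mozilla https://blog.desdelinux.net/
```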
There is an extension I use in Firefox that downloads only the images; it's called «Save Images 0.94».
Eh, a question hehe: where are the files I download saved? They're going to want to kill me, right? LOL
The files are downloaded to the directory the terminal is in when you run wget 😉
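If you would rather not depend on where the terminal happens to be, the -P (--directory-prefix) option sets the destination directory explicitly; the path below is just an example:

```shell
# Save the whole download under ~/mirrors/desdelinux instead of the current directory
$ wget -r -P ~/mirrors/desdelinux https://blog.desdelinux.net/
```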
Ahh, I didn't imagine that wget could have such an interesting use… Now, regarding the use that Courage mentions… No words 😉
Does anyone know if there is a WordPress plugin that prevents wget from downloading your blog?
Well, that suits me great!! Thank you.
Very good, let's try to see how, thanks for the contribution.
Although I consider myself a beginner, this is easy for me; now I will try to mix it with other things and see what comes out….
I hope you can help me, because it is due on Monday, December 3, 2012.
The project to be developed is the following:
Relocation of a website by adjusting the href references.
1.- Given a website, download the complete site to a local directory using the wget command. Then, using a script of your own authorship, perform the following operations:
1.1.- Create a separate directory for each type of content: images (gif, jpeg, etc.), videos (avi, mpg, etc.), audio (mp3, wav, etc.), and web content (HTML, JavaScript, etc.).
1.2.-Once each of these contents has been relocated, carry out the adjustment of the references to the local locations of each resource on the site.
1.3.-Activate a Web server, and configure the root directory where the Web site backup is located as the root directory of the local Web server.
1.4.- Note: the wget command may only be used with the following options:
--recursive
--domains
--page-requisites
If for some reason more options are necessary, use whatever is needed.
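A minimal sketch of how steps 1 and 1.1 could look; the directory names and extension lists below are assumptions, since the assignment does not fix them:

```shell
#!/bin/sh
# Step 1: mirror the site with only the allowed options (placeholder URL):
#   wget --recursive --domains=example.com --page-requisites http://example.com/

# Step 1.1: relocate the mirrored files into one directory per content type.
move_type() {
    # $1 = destination subdirectory; remaining arguments = file extensions
    dest="$OUT_DIR/$1"; shift
    mkdir -p "$dest"
    for ext in "$@"; do
        find "$SITE_DIR" -type f -name "*.$ext" -exec mv {} "$dest/" \;
    done
}

# Tiny self-contained demo using a fake mirror in temporary directories:
SITE_DIR=$(mktemp -d)
OUT_DIR=$(mktemp -d)
touch "$SITE_DIR/logo.png" "$SITE_DIR/index.html" "$SITE_DIR/song.mp3"

move_type images gif jpg jpeg png
move_type videos avi mpg mp4
move_type audio  mp3 wav
move_type web    html htm js css
```

For step 1.2, a sed pass over the relocated .html files can then rewrite the href/src attributes to point at the new directories.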
For the download, I think this post gives you the solution; now, as for moving files and replacing the paths, I had to do something similar a while ago at work. Here is the script I used: http://paste.desdelinux.net/4670
Modify it taking into account the file types and paths, that is, how your site's .html files are laid out and so on.
It's not 100% of the solution, because you'll have to make some adjustments or changes, but I guarantee it's 70 or 80% of the whole job 😉
Thanks KZKG^Gaara, it has been a great help to me.
I have always used HTTrack. I'm going to try ScrapBook for Firefox, but I love wget. Thank you!
Man, that command didn't work for me... this one did work well:
wget --random-wait -r -p -e robots=off -U mozilla http://www.example.com
Thanks a lot! I used it with the parameters proposed by Daniel PZ and had no problems 🙂
Thanks, man. I did that with wget on my Puppy Linux, but I didn't know how to do it in the terminal. Greetings.
Where are the downloaded pages saved?
Wherever you have the terminal open; by default, in your user's home folder, unless you specify another path.
Does it also download the links? So if there is a link to a PDF or another document, does it download that too?
What can I do to download my entire blog? I tried, but what I got seems to be in code or blocked; even though it took many hours to download, only the initial page can be read. What do you recommend for downloading my blog? Thanks, Raul.
Hello, a question: is it possible to replace the links within the HTML, so that you can later browse the downloaded page as if it were the original?
What happens is that I download the page, and when I open it from the downloaded files it doesn't pick up the .css or .js, and the links on the page take me to the live page on the Internet.
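That is exactly what the -p (--page-requisites) and -k (--convert-links) options are for; a sketch, with the URL as a placeholder:

```shell
# -p also fetches the CSS, JS and images each page needs;
# -k rewrites the links afterwards so the copy can be browsed offline;
# -E saves pages with an .html extension so the browser opens them directly
$ wget -r -p -k -E https://blog.desdelinux.net/
```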