Squid is not just a proxy and cache service; it can do much more: manage ACLs (access control lists), filter content, and even filter SSL in transparent mode (an interception setup where users do not have to configure a proxy in their browsers; it sits in the middle, like a man-in-the-middle, and nobody knows it is there). Still, I often see the full potential of this application wasted because people do not know how to configure each of its parts.
Now, the most interesting thing Squid does (in my opinion) is caching. You may ask: why cache? The reason is simple: better management of your speed and bandwidth. Think about it: 1,000 people in your company visiting the same common pages every 5 minutes, Google, Hotmail, Gmail, etc. Why download the same images, banners, advertising, and HTML content over and over again? All of that is static; it does not change very often. It is better to keep it stored on your local network and deliver a copy that is still recent according to the settings you chose.
How do you do it? Simple, with the following directive:
refresh_pattern [-i] regex min percent max [options]
As I always say, don't believe everything you read, so I invite you to check the official source. I recommend reading the manual for this directive HERE.
The refresh_pattern directive will always be the keyword we use to add new caching rules.
Important: the order of your cache rules matters, because Squid stops at the first rule whose regular expression matches the object; it will not keep reading the rest of your rules.
Regular expressions are case-sensitive, so flv is not the same as FLV, but you can avoid this if you wish by using the -i option. The rule would then start with refresh_pattern -i.
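For instance (the pattern and times below are only illustrative, not a recommendation):

```
# Case-sensitive: this rule matches only lowercase .flv
refresh_pattern \.flv$ 1440 50% 10080

# Case-insensitive: this rule also matches .FLV, .Flv, etc.
refresh_pattern -i \.flv$ 1440 50% 10080
```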
'Min' is the time (in minutes) during which an object is considered "recent or fresh" when it carries no explicit expiration header. Squid's documentation recommends 0, on the grounds that some dynamic applications can behave strangely otherwise; pure blah blah blah, really this value should be whatever number you consider useful and effective for the elements you want to cache. Example: for jpg images, 1440 minutes (one day) seems fine to me; it is not as if the images in a post change every 5 minutes.
'Percent' is the percentage of an object's age (measured since its last modification) during which it is still considered "recent or fresh". Let me explain: if a user keeps reloading a page to see the latest changes, Squid checks how far along the object is between min and max; once it has completed, say, 50% of that time, Squid will re-download the object from the internet and hand over a new copy.
'Max' is the upper limit (greater than or equal to 'Min') on how long an object is considered "recent or fresh". Suppose an image on some page was requested only once by a user; if that object has already passed its min time but not its max, then when it is requested again a cached copy will be delivered.
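Putting the three values together, a hypothetical rule (the numbers are only for illustration) reads like this:

```
#                  regex      min   pct  max
refresh_pattern -i \.jpe?g$   1440  50%  10080
# min = 1440  -> a cached jpg younger than 1 day is always fresh
# pct = 50%   -> between min and max, it stays fresh while its time in
#                the cache is under 50% of its age since last modification
# max = 10080 -> after 7 days in the cache it is always stale
```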
Options:
override-expire
override-lastmod
reload-into-ims
ignore-reload
ignore-no-store
ignore-private
max-stale=NN
refresh-ims
store-stale
These options mostly exist to override default behaviors defined by applications and protocols, in order to guarantee effective use of the cache.
override-expire
It enforces the minimum time of an object even if the server sent a shorter expiration time (things like the Expires header or Cache-Control: max-age). If we do this, a warning will appear saying that it "VIOLATES the HTTP standard", but those are just warnings we can ignore. If the time the server sends is longer, then Squid will honor the server's expiration time instead.
override-lastmod
Enforces the minimum time of an object, even if that object was recently modified.
reload-into-ims
The short explanation is that when we press the refresh button or make a no-cache request, Squid converts it into an If-Modified-Since request, so it can still deliver the cached copy if the object has not been modified since, or if the page provides no validation headers.
ignore-reload
Ignores the user pressing the reload or refresh button.
ignore-no-store
Ignores any header rule saying not to cache, for example on video content.
ignore-private
Ignores headers marking content as private, which should normally not be cached; example: Facebook content.
refresh-ims
Squid contacts the server to verify whether its object is still the newest; if it is, the cached copy is delivered.
store-stale
Squid will store all such responses, even those without an expiration date; this is usually impractical, since they often cannot be reused. If you decide to enable it, you must also declare max-stale=NN.
max-stale=NN
If you enabled the option above, you must declare a maximum lifetime for those responses. Squid does not serve objects of this kind directly, but it can revalidate them with the origin server.
Here is a summary of how the "FRESH" state is decided according to the values we have discussed:
- FRESH if expire > now, else STALE
- STALE if age > max
- FRESH if lm-factor < percent, else STALE
- FRESH if age < min, else STALE
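As a worked example (with hypothetical numbers), suppose this rule and an image that arrives with no Expires header:

```
refresh_pattern -i \.jpe?g$ 1440 50% 43200
# The image entered the cache 2 days ago (age = 2880 min) and was last
# modified 10 days before Squid fetched it (lm-age = 14400 min):
#   age > max?           2880 > 43200 -> no, keep checking
#   lm-factor = age / lm-age = 2880 / 14400 = 20%
#   lm-factor < percent? 20% < 50%    -> FRESH, serve the cached copy
```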
Here is an example configuration for a company with plenty of disk space, good hardware, and good bandwidth:
refresh_pattern -i \.(3gp|7z|ace|asx|bin|deb|divx|dvr-ms|ram|rpm|exe|inc|cab|qt)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v)|arj|lha|lzh|zip|tar)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(jp(e?g|e|2)|gif|png|bmp|tiff?|ico|swf|dat|ad|txt|dll)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(avi|ac4|mp(e?g|a|e|1|2|3|4)|mk(a|v)|ms(i|u|p)|og(x|v|a|g)|rm|r(a|p)m|snd|vob)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(ppt(x)?|pps|pdf|rtf|wax|wm(a|v)|wmx|wpl|cb(r|z|t)|xls(x)?|doc(x)?|flv|x-flv)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
cache_mem 8092 MB
Now, the cache does not live only on the hard disk; we can also cache in RAM. This value applies to each Squid process, so keep it in mind when you use redirectors like squidGuard.
maximum_object_size_in_memory 1024 KB
The maximum size of an object that Squid will keep in RAM. You can also declare a minimum size.
memory_replacement_policy heap GDSF
cache_replacement_policy heap GDSF
As you can see, one directive sets the replacement policy for the RAM cache and the other for the disk cache. There are two heap policies: GDSF and LFUDA. The first seeks to improve the cache hit ratio by keeping many small objects on hand; the second seeks the opposite, keeping objects in the cache regardless of their size.
The question you are probably asking right now is: which one should I use? Well, if in your environment users make many queries and few downloads, use GDSF; if, on the contrary, they make many downloads and few queries, use LFUDA. I especially recommend LFUDA when you are going to cache on, say, 1 TB of disk; it is more efficient there.
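A common combination, following that logic (a sketch; adjust it to your own workload), is GDSF for the RAM cache and LFUDA for a large disk cache:

```
# Many small objects in RAM: GDSF favors the hit ratio
memory_replacement_policy heap GDSF
# Large disk cache (e.g. around 1 TB): LFUDA favors the byte hit ratio
cache_replacement_policy heap LFUDA
```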
maximum_object_size 4 MB
The maximum size an object can have to be stored on disk.
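For example (the sizes are illustrative; tune them to your disk space):

```
# Do not store objects larger than 512 MB on disk
maximum_object_size 512 MB
# Optionally skip tiny objects too; 0 KB (the default) means no minimum
minimum_object_size 0 KB
```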
cache_dir aufs /media/proxy249/cache 100 16 256
This sets where the cache will be stored. Pay attention here: it matters whether you use ufs, aufs, or diskd. All three work more or less the same; the difference is that aufs and diskd use separate processes for the I/O operations on the hard disk, which prevents the Squid process from hanging during those operations. With diskd you can additionally specify the number of threads dedicated to this task. I recommend aufs if you have good hardware.
The size is 100 (megabytes); you could put 100000, which is almost 100 GB, depending on what you have available. 16 is the number of first-level directories and 256 the number of second-level subdirectories. You can play with both values depending on how fast your disks are and how many resources you have.
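For example, a roughly 100 GB cache keeping the default directory layout (the path is the one used above; remember to create the structure before starting Squid):

```
# 100000 MB on disk, 16 first-level and 256 second-level directories
cache_dir aufs /media/proxy249/cache 100000 16 256
# Create the directory structure once, with Squid stopped:
#   squid -z
```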
cache_swap_low 90
cache_swap_high 95
These options control object replacement: they are the low and high watermarks, expressed as percentages (%), at which Squid starts and intensifies cache eviction. In a very small cache the 5% gap between them might represent only a few hundred objects, but in very large caches we would be talking about thousands of MB.
Well, that's all for now. Leave your comments, and for those who told me they could not cache and filter HTTPS (SSL) pages in Squid 3.5 or higher: I will bring that to you soon, so stay tuned to this blog.
Excellent complement to the first part!
There is a lot of literature about Squid, but material that gets to the point on its most practical options, with explanations and realistic usage scenarios, is not always at hand!
As always, I look forward to the third part!
Thanks for your comment. That's right: a concise explanation of all the relevant elements, plus a best-practice setup. As always, I am attentive to your comments and your own experiences.
Hello, I have a problem with windows updates and antivirus. I have approximately 120 pc at my institution. Could you give me an idea of how to improve this situation. Thanks for your help and congratulations on the article.
Hello, thanks for participating. Yes, I can help you, but explain your problem in detail. Can't you download the updates? Did you set the proxy in Internet Options and the same proxy in your browser's settings? Did you check the ports? Or do you want to cache these updates?
What I need is that every time a computer downloads a Windows or antivirus update, it stays in the cache for about a month. That way I would save bandwidth, since every morning all the computers start downloading the same updates at the same time and the connection saturates.
Thanks for your help.
A server with Squid will do, as these are simple unencrypted HTTP downloads. Other caching solutions are WSUS and Altiris, which are common in companies.
Thanks Mario I will keep it in mind.
OK, understood. Check this link: http://wiki.squid-cache.org/SquidFaq/WindowsUpdate. To cache antivirus updates, you have to find out where the updates are downloaded from and with what extension (for example .exe), and cache that.
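As a sketch of what that wiki page describes (the domains, sizes, and extensions here are illustrative; check the page for the current recommended list):

```
# Let large update packages into the cache
maximum_object_size 6 GB
# Download whole files even when clients request byte ranges,
# and do not abort the transfer when a client disconnects
range_offset_limit -1
quick_abort_min -1 KB
# Keep update packages around for up to 30 days (43200 min)
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[iuf]|dat|zip|psf)$ 4320 80% 43200 reload-into-ims
refresh_pattern -i update.microsoft.com/.*\.(cab|exe|ms[iuf]|dat|zip|psf)$ 4320 80% 43200 reload-into-ims
```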
Thanks for your help.
Good morning friends, could you help me with my case? I have Squid 2.7.STABLE9 on Debian 6, everything configured, and when I deploy it in a 10-PC environment it works normally. The problem comes when I deploy it for 90 PCs: it only works for a few seconds and then everyone is left without internet. Could you help me?
Excellent explanation, basic but very clear and precise. Personally, it is the best explanation I have been able to read.
I have a question, is it possible to cache Android applications such as apk and xapk?
And what would be the correct way to configure dynamic caching regardless of the origin of the files?
I use pfSense 2.4.5.