Squid cache - part 2

Squid is not only a proxy and cache service; it can do much more: manage ACLs (access control lists), filter content, and even perform SSL filtering in transparent mode (intercepting traffic without users having to configure a proxy in their browsers, much like a man-in-the-middle: nobody knows it is there). Yet I commonly see the full potential of this application wasted simply because people do not know how to configure each of its parts.

Now, the most interesting thing Squid does (in my opinion) is caching. You may ask: why cache? The reason is simple: making better use of your speed and bandwidth. Think about it: 1,000 people in your company visiting common pages (Google, Hotmail, Gmail, etc.) every 5 minutes, downloading the same images, banners, advertising, and HTML content over and over again. All of these are static things that do not change very often; it is better to store them on your local network and deliver a copy that your configuration considers recent enough.

How to do it? Simple, with the following directive:

refresh_pattern [-i] regex min percent max [options]

As I always say, don't believe everything you read, so I invite you to check the official source. I recommend you read the manual for this directive HERE

The refresh_pattern directive will always be our label for adding new caching rules.

Important: your cache rules are evaluated sequentially. Once an object matches a rule, Squid stops there and does not read the rest of your rules.

Regular expressions are case-sensitive, so flv is not the same as FLV, but you can avoid this if you wish by using the -i option. It would then look like this: refresh_pattern -i

'Min' is the time (in minutes) during which an object will be considered «recent or fresh» when it carries no explicit expiry header. By default Squid recommends 0, on the grounds that some dynamic applications can behave strangely otherwise; in practice this value should be whatever number you consider useful and effective for the elements you want to cache. Example: for jpg images, 1440 minutes (one day) seems fine to me; it is not as if the images in a blog post change every 5 minutes.

'Percent' is the percentage of an object's age (measured since its last modification) for which it is still considered «recent or fresh». Let me explain: if you keep reloading a page to see its latest changes, Squid may decide that once, say, 50% of the time between min and max has elapsed, it should re-download that object from the internet and give you a new copy. For example, an object last modified 10 hours before Squid fetched it, and requested again 5 hours after the fetch, has an lm-factor of 50%.

'Max' is the upper limit (greater than or equal to 'Min') on how long an object is considered «recent or fresh». Suppose an image on some page was only requested once by a user; that object has already passed its min time but not its max, so when it is requested again, a cached copy will be delivered.
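As an illustration, a single rule combining the three fields might look like this (the extension and the times are just an example, not a recommendation):

```
refresh_pattern -i \.jpg$ 1440 50% 10080
```

Here jpg objects are always fresh for up to 1440 minutes (one day), never fresh beyond 10080 minutes (one week), and in between they stay fresh while their age is below 50% of the time since their last modification.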

Options:
override-expire
override-lastmod
reload-into-ims
ignore-reload
ignore-no-store
ignore-private
max-stale=NN
refresh-ims
store-stale

These options mostly exist to override behaviors pre-established by languages and protocols, in order to guarantee effective use of the cache.

override-expire

Enforces the minimum time of an object even if the server sent a shorter expiry (through headers such as Cache-Control: max-age). If we do this, warnings will appear saying things like "VIOLATES the HTTP standard", but those are just warnings we can ignore. If the expiry time a server sends is longer, then Squid will honor the server's expiry instead.
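For instance, to keep style sheets in cache for at least a day even when the server sends a shorter expiry (the pattern and times here are just an example):

```
refresh_pattern -i \.css$ 1440 50% 2880 override-expire
```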

override-lastmod

Enforces the minimum time of an object, even if that object was recently modified.

reload-into-ims

The short explanation is that when we press the refresh button or the client sends a no-cache request, Squid turns it into an If-Modified-Since request, so it can still deliver from cache if the object has not been "modified since" and/or the page sends no validation headers.

ignore-reload

Ignores users pressing the reload or refresh button.

ignore-no-store

Ignores any no-store rule in the headers that forbids caching; useful, for example, for videos.

ignore-private

Ignores any private rule in the headers on content that should not be cached; Facebook content is an example.

refresh-ims

Makes Squid contact the origin server to check whether the cached object is still the newest version; if it is, Squid delivers from cache.

store-stale

Squid will store responses even if they carry no expiry date. This is usually impractical, since such responses typically cannot be reused. If you decide to enable it, you must also declare max-stale=NN.

max-stale=NN

If you enabled the option above, you must declare a maximum stale lifetime for such responses. Squid does not serve objects of this kind directly, but it can revalidate them with the origin server.

Here is a summary of how the «FRESH» state is determined according to the values we have discussed:

  • FRESH if expires > now, else STALE
  • STALE if age > max
  • FRESH if lm-factor < percent, else STALE
  • FRESH if age < min, else STALE
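The rules above can be sketched in Python. This is my own simplified illustration (not Squid's actual code; real header handling is far more involved), with all times expressed in minutes:

```python
def is_fresh(now, expires, last_modified, fetched, min_m, percent, max_m):
    """Simplified refresh_pattern freshness check; all times in minutes."""
    age = now - fetched                     # how long ago Squid stored the object
    if expires is not None:                 # explicit expiry from the server wins
        return expires > now
    if age > max_m:                         # beyond max: always stale
        return False
    if last_modified is not None:
        lm_age = fetched - last_modified    # object's age when Squid fetched it
        if lm_age > 0:
            lm_factor = age / lm_age * 100  # percent of that age elapsed since fetch
            return lm_factor < percent
    return age < min_m                      # otherwise fall back to the min threshold
```

So an object with no expiry header and no Last-Modified date is fresh only while its age stays under min; one with a Last-Modified date is judged by the percent rule instead.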

Here is an example configuration for a company with plenty of disk space, good hardware, and good bandwidth:

refresh_pattern -i \.(3gp|7z|ace|asx|bin|deb|divx|dvr-ms|ram|rpm|exe|inc|cab|qt)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims

refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v)|arj|lha|lzh|zip|tar)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims

refresh_pattern -i \.(jp(e?g|e|2)|gif|png|bmp|tiff?|ico|swf|dat|ad|txt|dll)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims

refresh_pattern -i \.(avi|ac4|mp(e?g|a|e|1|2|3|4)|mk(a|v)|ms(i|u|p)|og(x|v|a|g)|rm|r(a|p)m|snd|vob)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims

refresh_pattern -i \.(ppt(x)?|pps|pdf|rtf|wax|wm(a|v)|wmx|wpl|cb(r|z|t)|xls(x)?|doc(x)?|flv|x-flv)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims

cache_mem 8092 MB

Now, the cache does not live only on the hard disk; we can also cache in RAM. This value applies to each Squid process, so take it into account when you use redirectors such as squidGuard.

maximum_object_size_in_memory 1024 KB

The maximum size an object may have for Squid to keep it in RAM. You can also declare a minimum.
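For reference, the analogous limits for the on-disk cache are set with minimum_object_size and maximum_object_size (the values below are just illustrative):

```
maximum_object_size_in_memory 1024 KB
minimum_object_size 0 KB
maximum_object_size 4 MB
```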


memory_replacement_policy heap GDSF
cache_replacement_policy heap GDSF

As you can see, one is the replacement policy for the RAM cache and the other for the disk cache. There are two policies: GDSF and LFUDA. The first seeks to improve the cache hit ratio by keeping many small objects at hand; the second seeks the opposite, keeping popular objects in cache regardless of their size.

The question you are probably asking right now is: which one should I use? Well, if in your environment people make many small requests and few large downloads, use GDSF; if, on the contrary, they do many large downloads and few small requests, use LFUDA. And I do recommend LFUDA when you are going to cache on, say, 1 TB of disk; it is more efficient there.
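For a download-heavy environment with a large disk cache, a sketch of the combination this advice suggests would be GDSF for the small objects in RAM and LFUDA on disk:

```
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
```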

maximum_object_size 4 MB

The maximum size an object can have to be stored on disk.

cache_dir aufs /media/proxy249/cache 100 16 256

This is where the cache will be stored. Pay attention here: what matters is whether you use ufs, aufs, or diskd. All three work more or less the same; the difference is that aufs and diskd do the disk I/O in separate threads or processes, which prevents Squid from hanging during those operations. Additionally, with diskd you can specify how many helper processes you will have for this task. I recommend aufs if you have good hardware.

The size is 100 (megabytes); you could put 100000, which is roughly 100 GB, depending on what you have available. 16 is the number of first-level directories and 256 the number of subdirectories in each. You can play with both values depending on how fast your disks are and how many resources you have.
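For example, a roughly 100 GB aufs cache with the default directory layout (reusing the path from the example above) would be declared as:

```
cache_dir aufs /media/proxy249/cache 100000 16 256
```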


cache_swap_low 90
cache_swap_high 95

These options are the low and high watermarks for object replacement, expressed as percentages (%). In a very small cache the 5% gap between them might represent only a few hundred objects, but in very large caches it can amount to thousands of MB, so replacement behaves very differently depending on cache size.

Well, that's all for now; leave your comments. And for those of you who told me you could not cache and filter HTTPS (SSL) pages in Squid 3.5 or higher: I will cover that soon, so stay tuned to this blog.



  1.   Jose Albert said

    Excellent complement to the first part!

    There is a lot of literature about Squid, but material that gets to the point on its most practical options, with explanations and realistic use cases, is not always at hand!

    As always, I look forward to the third part of it!

    1.    brodydalle said

      Thanks for your comment. That's right: a concise explanation of all the relevant elements, plus a best-practice setup. As always, I welcome your comments and your own experiences.

  2.   artus said

    Hello, I have a problem with Windows updates and antivirus. I have approximately 120 PCs at my institution. Could you give me an idea of how to improve this situation? Thanks for your help and congratulations on the article.

    1.    brodydalle said

      Hello, thanks for participating. Sure, I can help you, but explain your problem clearly: can't you download the updates? Did you set the proxy in Internet Options and the same in your browser's proxy settings? Did you check the ports? Or do you want to cache these updates?

      1.    artus said

        What I need is that every time a computer downloads a Windows or antivirus update, it stays in the cache for about a month. That way I would save bandwidth, since every morning all the computers start downloading the same updates and the connection saturates.

        Thanks for your help.

    2.    Mario said

      A server with Squid will do, since these are simple unencrypted HTTP downloads. Other caching solutions are WSUS and Altiris, common in companies.

      1.    artus said

        Thanks Mario I will keep it in mind.

    3.    brodydalle said

      OK, understood. Check this link: http://wiki.squid-cache.org/SquidFaq/WindowsUpdate. To cache antivirus updates, you have to find out where they are downloaded from and with what extension (e.g. .exe), and cache that...

  3.   artus said

    Thanks for your help.

  4.   Erick said

    Good morning friends, could you help me with my case? I have Squid 2.7.STABLE9 on Debian 6, everything configured, and in a 10-PC environment mail works normally. The problem arises when I deploy it for 90 PCs: it only works for a few seconds and then everyone is left without internet. Could you help me?

  5.   JOSE RIVAS said

    Excellent explanation: basic, but very clear and precise. Personally, it is the best explanation I have been able to read.
    I have a question: is it possible to cache Android application files such as apk and xapk?
    And what would be the correct way to configure dynamic caching regardless of the origin of the files?
    I use pfSense 2.4.5.