Squid is not just a proxy and cache service; it can do much more: manage ACLs (access control lists), filter content, and even filter SSL in transparent mode (an interception setup where users do not have to configure a proxy in their browsers; it sits in the middle, like a man-in-the-middle, and nobody knows it is there). Still, I often see the full potential of this application wasted because people do not know how to configure each of its parts.
Now, the most interesting thing Squid does (in my opinion) is caching. You may ask: why cache? The reason is simple: better management of your speed and bandwidth. Think about it: 1,000 people in your company visiting the same common pages every 5 minutes, Google, Hotmail, Gmail, etc. Why download the same images, banners, advertising, and HTML content over and over again? All of that is static; it does not change very often. It is better to keep it stored on your local network and deliver a copy that is still recent according to the settings you chose.
How do you do it? Simple, with the following directive:
refresh_pattern [-i] regex min percent max [options]
As I always say, don't believe everything you read, so I invite you to check the official source. I recommend reading the manual for this directive HERE.
The refresh_pattern directive will always be the keyword we use to add new caching rules.
Important: the order of your cache rules matters, because Squid stops at the first rule whose regular expression matches the object; it will not keep reading the rest of your rules.
Regular expressions are case-sensitive, so flv is not the same as FLV, but you can avoid this if you wish by using the -i option. The rule would then start with refresh_pattern -i.
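For instance (the pattern and times below are only illustrative, not a recommendation):

```
# Case-sensitive: this rule matches only lowercase .flv
refresh_pattern \.flv$ 1440 50% 10080

# Case-insensitive: this rule also matches .FLV, .Flv, etc.
refresh_pattern -i \.flv$ 1440 50% 10080
```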
'Min' is the time (in minutes) during which an object is considered "recent or fresh" when it carries no explicit expiration header. Squid's documentation recommends 0, on the grounds that some dynamic applications can behave strangely otherwise; pure blah blah blah, really this value should be whatever number you consider useful and effective for the elements you want to cache. Example: for jpg images, 1440 minutes (one day) seems fine to me; it is not as if the images in a post change every 5 minutes.
'Percent' is the percentage of an object's age (measured since its last modification) during which it is still considered "recent or fresh". Let me explain: if a user keeps reloading a page to see the latest changes, Squid checks how far along the object is between min and max; once it has completed, say, 50% of that time, Squid will re-download the object from the internet and hand over a new copy.
'Max' is the upper limit (greater than or equal to 'Min') on how long an object is considered "recent or fresh". Suppose an image on some page was requested only once by a user; if that object has already passed its min time but not its max, then when it is requested again a cached copy will be delivered.
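Putting the three values together, a hypothetical rule (the numbers are only for illustration) reads like this:

```
#                  regex      min   pct  max
refresh_pattern -i \.jpe?g$   1440  50%  10080
# min = 1440  -> a cached jpg younger than 1 day is always fresh
# pct = 50%   -> between min and max, it stays fresh while its time in
#                the cache is under 50% of its age since last modification
# max = 10080 -> after 7 days in the cache it is always stale
```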
Options:
override-expire
override-lastmod
reload-into-ims
ignore-reload
ignore-no-store
ignore-private
max-stale=NN
refresh-ims
store-stale
These options mostly exist to override default behaviors defined by applications and protocols, in order to guarantee effective use of the cache.
override-expire
It enforces the minimum time of an object even if the server sent a shorter expiration time (things like the Expires header or Cache-Control: max-age). If we do this, a warning will appear saying that it "VIOLATES the HTTP standard", but those are just warnings we can ignore. If the time the server sends is longer, then Squid will honor the server's expiration time instead.
override-lastmod
Enforces the minimum time of an object, even if that object was recently modified.
reload-into-ims
The short explanation is that when we press the refresh button or make a no-cache request, Squid converts it into an If-Modified-Since request, so it can still deliver the cached copy if the object has not been modified since, or if the page provides no validation headers.
ignore-reload
Ignores the user pressing the reload or refresh button.
ignore-no-store
Ignores any header rule saying not to cache, for example on video content.
ignore-private
Ignores headers marking content as private, which should normally not be cached; example: Facebook content.
refresh-ims
Squid contacts the server to verify whether its object is still the newest; if it is, the cached copy is delivered.
store-stale
Squid will store all such responses, even those without an expiration date; this is usually impractical, since they often cannot be reused. If you decide to enable it, you must also declare max-stale=NN.
max-stale=NN
If you enabled the option above, you must declare a maximum lifetime for those responses. Squid does not serve objects of this kind directly, but it can revalidate them with the origin server.
Here is a summary of how the "FRESH" state is decided according to the values we have discussed:
- FRESH if expire > now, else STALE
- STALE if age > max
- FRESH if lm-factor < percent, else STALE
- FRESH if age < min, else STALE
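As a worked example (with hypothetical numbers), suppose this rule and an image that arrives with no Expires header:

```
refresh_pattern -i \.jpe?g$ 1440 50% 43200
# The image entered the cache 2 days ago (age = 2880 min) and was last
# modified 10 days before Squid fetched it (lm-age = 14400 min):
#   age > max?           2880 > 43200 -> no, keep checking
#   lm-factor = age / lm-age = 2880 / 14400 = 20%
#   lm-factor < percent? 20% < 50%    -> FRESH, serve the cached copy
```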
Here is an example configuration for a company with plenty of disk space, good hardware, and good bandwidth:
refresh_pattern -i \.(3gp|7z|ace|asx|bin|deb|divx|dvr-ms|ram|rpm|exe|inc|cab|qt)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(rar|jar|gz|tgz|bz2|iso|m1v|m2(v|p)|mo(d|v)|arj|lha|lzh|zip|tar)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(jp(e?g|e|2)|gif|png|bmp|tiff?|ico|swf|dat|ad|txt|dll)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(avi|ac4|mp(e?g|a|e|1|2|3|4)|mk(a|v)|ms(i|u|p)|og(x|v|a|g)|rm|r(a|p)m|snd|vob)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(ppt(x)?|pps|pdf|rtf|wax|wm(a|v)|wmx|wpl|cb(r|z|t)|xls(x)?|doc(x)?|flv|x-flv)$ 43200 99% 43200 ignore-no-store ignore-must-revalidate override-expire override-lastmod reload-into-ims
cache_mem 8092 MB
Now, the cache does not live only on the hard disk; we can also cache in RAM. This value applies to each Squid process, so keep it in mind when you use redirectors like squidGuard.
maximum_object_size_in_memory 1024 KB
The maximum size of an object that Squid will keep in RAM. You can also declare a minimum size.
memory_replacement_policy heap GDSF
cache_replacement_policy heap GDSF
As you can see, one directive sets the replacement policy for the RAM cache and the other for the disk cache. There are two heap policies: GDSF and LFUDA. The first seeks to improve the cache hit ratio by keeping many small objects on hand; the second seeks the opposite, keeping objects in the cache regardless of their size.
The question you are probably asking right now is: which one should I use? Well, if in your environment users make many queries and few downloads, use GDSF; if, on the contrary, they make many downloads and few queries, use LFUDA. I especially recommend LFUDA when you are going to cache on, say, 1 TB of disk; it is more efficient there.
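A common combination, following that logic (a sketch; adjust it to your own workload), is GDSF for the RAM cache and LFUDA for a large disk cache:

```
# Many small objects in RAM: GDSF favors the hit ratio
memory_replacement_policy heap GDSF
# Large disk cache (e.g. around 1 TB): LFUDA favors the byte hit ratio
cache_replacement_policy heap LFUDA
```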
maximum_object_size 4 MB
The maximum size an object can have to be stored on disk.
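For example (the sizes are illustrative; tune them to your disk space):

```
# Do not store objects larger than 512 MB on disk
maximum_object_size 512 MB
# Optionally skip tiny objects too; 0 KB (the default) means no minimum
minimum_object_size 0 KB
```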
cache_dir aufs /media/proxy249/cache 100 16 256
This sets where the cache will be stored. Pay attention here: it matters whether you use ufs, aufs, or diskd. All three work more or less the same; the difference is that aufs and diskd use separate processes for the I/O operations on the hard disk, which prevents the Squid process from hanging during those operations. With diskd you can additionally specify the number of threads dedicated to this task. I recommend aufs if you have good hardware.
The size is 100 (megabytes); you could put 100000, which is almost 100 GB, depending on what you have available. 16 is the number of first-level directories and 256 the number of second-level subdirectories. You can play with both values depending on how fast your disks are and how many resources you have.
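For example, a roughly 100 GB cache keeping the default directory layout (the path is the one used above; remember to create the structure before starting Squid):

```
# 100000 MB on disk, 16 first-level and 256 second-level directories
cache_dir aufs /media/proxy249/cache 100000 16 256
# Create the directory structure once, with Squid stopped:
#   squid -z
```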
cache_swap_low 90
cache_swap_high 95
These options control object replacement: they are the low and high watermarks, expressed as percentages (%), at which Squid starts and intensifies cache eviction. In a very small cache the 5% gap between them might represent only a few hundred objects, but in very large caches we would be talking about thousands of MB.
Well, that's all for now. Leave your comments, and for those who told me they could not cache and filter HTTPS (SSL) pages in Squid 3.5 or higher: I will bring that to you soon, so stay tuned to this blog.
Excellent complement to the first part!
There is a lot of literature about Squid, but material that gets to the point on its most practical options, with explanations and realistic usage scenarios, is not always at hand!
As always, I look forward to the third part!
Thanks for your comment. That's right: a concise explanation of all the relevant elements, plus a best-practice setup. As always, I am attentive to your comments and your own experiences.
Hello, I have a problem with windows updates and antivirus. I have approximately 120 pc at my institution. Could you give me an idea of how to improve this situation. Thanks for your help and congratulations on the article.
Hello, thanks for participating. Yes, I can help you, but explain your problem in detail. Can't you download the updates? Did you set the proxy in Internet Options and the same proxy in your browser's settings? Did you check the ports? Or do you want to cache these updates?
What I need is that every time a computer downloads a Windows or antivirus update, it stays in the cache for about a month. That way I would save bandwidth, since every morning all the computers start downloading the same updates at the same time and the connection saturates.
Thanks for your help.
A server with Squid will do, as these are simple unencrypted HTTP downloads. Other caching solutions are WSUS and Altiris, which are common in companies.
Thanks Mario I will keep it in mind.
OK, understood. Check this link: http://wiki.squid-cache.org/SquidFaq/WindowsUpdate. To cache antivirus updates, you have to find out where the updates are downloaded from and with what extension (for example .exe), and cache that.
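As a sketch of what that wiki page describes (the domains, sizes, and extensions here are illustrative; check the page for the current recommended list):

```
# Let large update packages into the cache
maximum_object_size 6 GB
# Download whole files even when clients request byte ranges,
# and do not abort the transfer when a client disconnects
range_offset_limit -1
quick_abort_min -1 KB
# Keep update packages around for up to 30 days (43200 min)
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[iuf]|dat|zip|psf)$ 4320 80% 43200 reload-into-ims
refresh_pattern -i update.microsoft.com/.*\.(cab|exe|ms[iuf]|dat|zip|psf)$ 4320 80% 43200 reload-into-ims
```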
Thanks for your help.
Good morning friends, could you help me with my case? I have Squid 2.7.STABLE9 on Debian 6, everything configured, and when I deploy it in a 10-PC environment it works normally. The problem comes when I deploy it for 90 PCs: it only works for a few seconds and then everyone is left without internet. Could you help me?
Excellent explanation, basic but very clear and precise. Personally, it is the best explanation I have been able to read.
I have a question, is it possible to cache Android applications such as apk and xapk?
And what would be the correct way to configure dynamic caching regardless of the origin of the files?
I use pfSense 2.4.5.