6 min read

In this article by Alex Kapranoff, the author of the book Nginx Troubleshooting, explains how all browsers (and even many non-browser HTTP clients) support client-side caching. It is a part of the HTTP standard, albeit one of the most complex caching to understand. Web servers do not control client-side caching to full extent, obviously, but they may issue recommendations about what to cache and how, in the form of special HTTP response headers. This is a topic thoroughly discussed in many great articles and guides, so we will mention it shortly, and with a lean towards problems you may face and how to troubleshoot them.

(For more resources related to this topic, see here.)

In spite of the fact that browsers have been supporting caching on their side for at least 20 years, configuring cache headers was always a little confusing mostly due to the fact that there two sets of headers designed for the same purpose but having different scopes and totally different formats.

There is the Expires: header, which was designed as a quick and dirty solution and also the new (relatively) almost omnipotent Cache-Control: header, which tries to support all the different ways an HTTP cache could work.

This is an example of a modern HTTP request-response pair containing the caching headers. First is the request headers sent from the browser (here Firefox 41, but it does not matter):

User-Agent:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0"
Accept:"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
Accept-Encoding:"gzip, deflate"
Connection:"keep-alive"
Cache-Control:"max-age=0"

Then, the response headers is:

Cache-Control:"max-age=1800"
Content-Encoding:"gzip"
Content-Type:"text/html; charset=UTF-8"
Date:"Sun, 10 Oct 2015 13:42:34 GMT"
Expires:"Sun, 10 Oct 2015 14:12:34 GMT"

We highlighted the parts that are relevant. Note that some directives may be sent by both sides of the conversation. First, the browser sent the Cache-Control: max-age=0 header because the user pressed the F5 key. This is an indication that the user wants to receive a response that is fresh. Normally, the request will not contain this header and will allow any intermediate cache to respond with a stale but still nonexpired response.

In this case, the server we talked to responded with a gzipped HTML page encoded in UTF-8 and indicated that the response is okay to use for half an hour. It used both mechanisms available, the modern Cache-Control:max-age=1800 header and the very old Expires:Sun, 10 Oct 2015 14:12:34 GMT header.

The X-Cache: “EXPIRED” header is not a standard HTTP header but was also probably (there is no way to know for sure from the outside) emitted by Nginx. It may be an indication that there are, indeed, intermediate caching proxies between the client and the server, and one of them added this header for debugging purposes. The header may also show that the backend software uses some internal caching.

Another possible source of this header is a debugging technique used to find problems in the Nginx cache configuration. The idea is to use the cache hit or miss status, which is available in one of the handy internal Nginx variables as a value for an extra header and then to be able to monitor the status from the client side. This is the code that will add such a header:

add_header X-Cache $upstream_cache_status;

Nginx has a special directive that transparently sets up both of standard cache control headers, and it is named expires. This is a piece of the nginx.conf file using the expires directive:

location ~* \.(?:css|js)$ {
  expires 1y;
  add_header Cache-Control "public";
}

First, the pattern uses the so-called noncapturing parenthesis, which is a feature first appeared in Perl regular expressions. The effect of this regexp is the same as of a simpler \.(css|js)$ pattern, but the regular expression engine is specifically instructed not to create a variable containing the actual string from inside the parenthesis. This is a simple optimization.

Then, the expires directive declares that the content of the css and js files will expire after a year of storage. The actual headers as received by the client will look like this:

Server: nginx/1.9.8 (Ubuntu)
Date: Fri, 11 Mar 2016 22:01:04 GMT
Content-Type: text/css
Last-Modified: Thu, 10 Mar 2016 05:45:39 GMT
Expires: Sat, 11 Mar 2017 22:01:04 GMT
Cache-Control: max-age=31536000

The last two lines contain the same information in wildly different forms. The Expires: header is exactly one year after the date in the Date: header, whereas Cache-Control: specifies the age in seconds so that the client do the date arithmetics itself.

The last directive in the provided configuration extract adds another Cache-Control: header with a value of public explicitly. What this means is that the content of the HTTP resource is not access-controlled and therefore may be cached not only for one particular user but also anywhere else. A simple and effective strategy that was used in offices to minimize consumed bandwidth is to have an office-wide caching proxy server. When one user requested a resource from a website on the Internet and that resource had a Cache-Control: public designation, the company cache server would store that to serve to other users on the office network.

This may not be as popular today due to cheap bandwidth, but because history has a tendency to repeat itself, you need to know how and why Cache-Control: public works.

The Nginx expires directive is surprisingly expressive. It may take a number of different values. See this table:

off

This value turns off Nginx cache headers logic. Nothing will be added, and more importantly, existing headers received from upstreams will not be modified.

epoch

This is an artificial value used to purge a stored resource from all caches by setting the Expires header to “1 January, 1970 00:00:01 GMT”.

max

This is the opposite of the “epoch” value. The Expires header will be equal to “31 December 2037 23:59:59 GMT”, and the Cache-Control max-age set to 10 years. This basically means that the HTTP responses are guaranteed to never change, so clients are free to never request the same thing twice and may use their own stored values.

Specific time

An actual specific time value means an expiry deadline from the time of the respective request. For example, expires 10w; A negative value for this directive will emit a special header Cache-Control: no-cache.

“modified” specific time

If you add the keyword “modified” before the time value, then the expiration moment will be computed relatively to the modification time of the file that is served.

“@” specific time

A time with an @ prefix specifies an absolute time-of-day expiry. This should be less than 24 hours. For example, Expires @17h;.

Many web applications choose to emit the caching headers themselves, and this is a good thing. They have more information about which resources change often and which never change. Tampering with the headers that you receive from the upstream may or may not be a thing you want to do. Sometimes, adding headers to a response while proxying it may produce a conflicting set of headers and therefore create an unpredictable behavior.

The static files that you serve with Nginx yourself should have the expires directive in place. However, the general advice about upstreams is to always examine the caching headers you get and refrain from overoptimizing by setting up more aggressive caching policy.

Resources for Article:


Further resources on this subject:


LEAVE A REPLY

Please enter your comment!
Please enter your name here