When it comes to website performance, Varnish is a hot technology. With a simple installation and configuration, it’s possible to boost the performance of any website and serve up to a million pages with only a small virtual private server. In this article, I’ll show you four possible configurations that will help you improve the response time of your site, whether you serve hundreds, thousands or millions of pages.
An Introduction to Varnish
Varnish Cache is a Web accelerator whose goal is to cache website content. It’s an open-source project that aims to optimize and speed up access to websites non-invasively, without changing the code and without requiring you to put your hands into your website.
It was the creators of Varnish Cache who called it a Web accelerator, because its primary objective is to improve and speed up a website’s front end. Varnish achieves this by storing copies of the pages served by the Web server in its cache. The next time the same page is requested, Varnish will serve the copy instead of requesting the page from the Web server, resulting in a tremendous performance boost.
Another of the key features of Varnish Cache, in addition to its performance, is the flexibility of its configuration language, VCL. VCL makes it possible to write policies on how incoming requests should be handled. In such a policy, you can decide what content you want to serve, from where you want to get the content and how the request or response should be altered.
In the following configuration examples, I’ll show you which VCL rules to use to achieve various goals, from simply caching images and static files, to using Varnish in a distributed environment or having it act as a load balancer.
All the following examples are for Varnish 3.x. Please note that Varnish 2.x uses different syntax and rules, so these examples are not compatible with that version.
The following are the main states of Varnish, which we’ll use in the VCL configuration file:
recv
This is the first function that is called when receiving a request. Here we can manipulate the request before going to check whether it is present in the cache. If a request cannot be put in a cache, the back-end server to which the request will be sent can also be chosen in this phase.
pass
We can use this function when we want to send the request to the Web server without caching the answer.
pipe
This function bypasses Varnish and sends the request to the Web server.
lookup
With a lookup, Varnish asks to verify whether the response is present and valid in the cache.
fetch
This function is called after the content has been retrieved from the back end, which happens after a pass or a miss. Here we can alter the back-end response (beresp) before it is stored in the cache.
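To give a feel for how these states fit together, here is a minimal VCL 3.x skeleton. It is only a sketch, not part of the configurations below, and the back-end address is just a placeholder for your own Web server:
# Minimal skeleton showing where the states fit in a VCL 3.x file.
# The back-end address is a placeholder - point it at your own Web server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Ask the cache for the object; if it is missing, Varnish fetches it.
    return (lookup);
    # Alternatives: return (pass);  # fetch from the back end, don't cache
    #               return (pipe);  # hand the connection straight to the back end
}

sub vcl_fetch {
    # beresp is the back-end response; it can be altered here before caching.
    return (deliver);
}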
The Basics: Cache Images
So let’s look at an example configuration. In this first example, we’ll cache only the images and static files, such as CSS files. This configuration is really useful when you don’t know much about the website that you want to boost, because you can safely assume that all images, CSS and JavaScript files are the same for all users. The HTTP protocol uses cookies to distinguish users, so we have to strip them from these requests to make them all look the same to Varnish:
sub vcl_recv {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        unset req.http.cookie;
        unset req.http.Vary;
        return (lookup);
    }
}

# Strip the cookie before the object is inserted into the cache.
sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        unset beresp.http.set-cookie;
    }
}
And that’s it. With this VCL file you can easily cache static content.
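To try it out, save the VCL (together with a back-end definition like the one in the skeleton above) and start the daemon; then request the same static file twice and check the response headers. The file path, port, cache size and URL below are only examples:
# Start Varnish on port 80 with 256 MB of in-memory cache.
varnishd -f /etc/varnish/default.vcl -a :80 -s malloc,256m

# Fetch any image or CSS file twice and look at the headers: on the second
# response, Age should be greater than zero and X-Varnish should contain two
# IDs, which means the object came from the cache.
curl -s -D - -o /dev/null http://www.example.com/css/style.css
curl -s -D - -o /dev/null http://www.example.com/css/style.css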
The Standard: Cache Images and Pages
Usually, you don’t just want to cache the static content of your website; you also want to cache some of the dynamic pages that are generated by your Web server but that are the same for all users, or at least for all of your anonymous users. At this point, you must choose which pages can be cached and which cannot.
A good example is WordPress, one of the most commonly used content management systems. WordPress generates website pages dynamically with PHP and queries to a MySQL database. This is nice because you can easily update your website from the administration interface with a few clicks, but it’s also expensive in terms of the resources used. Why run the same PHP script and MySQL query every time a user lands on the home page? We can use Varnish to cache the most visited pages and achieve incredible results.
These are some rules that can be useful in a WordPress installation:
sub vcl_recv{
# Let's make sure we aren't compressing already compressed formats.
if (req.http.Accept-Encoding) {
if (req.url ~ ".(jpg|png|gif|gz|tgz|bz2|mp3|mp4|m4v)(?. * |)$") {
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
remove req.http.Accept-Encoding;
}
}
if (req.url ~ "^/$") {
unset req.http.cookie;
}
# Unset all cookies if not WordPress admin - otherwise login will fail
if (!(req.url ~ "wp-(login| admin )")) {
unset req.http.cookie;
return(lookup);
}
# If you request the special pages go directly to them
if (req.url ~ "wp-(login| admin )") {
return (pipe);
}
}
sub vcl_miss {
if (!(req.url ~ "wp-(login| admin )")) {
unset req.http.cookie;
}
if (req.url ~ "^/[^?]+.(jpeg|jpg|png|gif|ico|js|css|txt|gz|zip|lzma|bz2|tgz|tbz|html|htm)(?.|)$") {
unset req.http.cookie;
set req.url = regsub(req.url, "?.$", "");
}
if (req.url ~ "^/$") {
unset req.http.cookie;
}
}
sub vcl_fetch {
if (req.url ~ "^/$") {
unset beresp.http.set-cookie;
}
# Unset all cookies if not WordPress admin - otherwise login will fail
if (!(req.url ~ "wp-(login| admin )")) {
unset beresp.http.set-cookie;
}
}
You can see that in this example we cache all of the pages of our website, but the ones that contain "wp-admin" or "wp-login" in the URL are "special" locations used to log in to WordPress as an administrator. For those, we want to talk directly to the Web server and bypass the Varnish cache.
Naturally, if you use Drupal, Joomla or a custom-made website, you’ll have to change these rules, but the goal is always the same: send the dynamic and administrative pages to your back end, and cache everything you can.
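Whatever system you use, it helps to verify which pages are actually served from the cache. A small vcl_deliver snippet (not required, just a common debugging aid) adds a header that you can inspect from the browser or with curl:
sub vcl_deliver {
    # obj.hits counts how many times this object has been delivered from cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
    return (deliver);
}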
The Standard++: Increase Server Resilience
Sometimes Web servers become slow because they are under a high load. Varnish can help with this, too. We can use some special directives to tell Varnish to avoid talking to the back end if it is down or answering too slowly. For these cases, Varnish uses the "grace" directive.
Grace in the scope of Varnish means delivering otherwise expired objects when circumstances call for it. This can happen because:
- The back-end director selected is down
- A different thread has already made a request to the back end that’s not yet finished.
Both cases are handled the same in VCL:
sub vcl_recv {
if (req.backend.healthy) {
set req.grace = 30s;
} else {
set req.grace = 1h;
}
}
sub vcl_fetch {
set beresp.grace = 1h;
}
This configuration tells Varnish to test the back end and raise the grace period if it has problems. The example above also introduces the directive "req.backend.healthy", which is used to check the state of a back end. This is really useful when you have multiple back ends, so let’s take a look at a more advanced example.
Advanced Use: Create a Resilient Web Server in a Distributed Environment
This is our final configuration file, with all of the options we have seen so far and the definition of two back ends, including some special directives for the probe, which is how Varnish determines whether a Web server is alive or not. An expanded example of such a probe follows the list of parameters below.
.url
Varnish will make requests to the back end with this URL.
.timeout
Determines how fast the probe must finish. You must specify a time unit with a number, such as "0.1 s", "1230 ms" or even "1 h".
.interval
How long to wait between polls. You must specify a time unit here also. Notice that this is not a "rate" but an "interval". The lowest poll rate is (.timeout + .interval).
.window
How many of the latest polls to consider when determining whether the back end is healthy.
.threshold
How many of the .window last polls must be good for the back end to be declared healthy.
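Put together in a back-end definition, a probe looks like this (the same values as in the full configuration below, expanded onto several lines for readability; /status.php is simply a page on the Web server that answers quickly):
backend web1 {
    .host = "10.100.0.1";
    .port = "8080";
    .probe = {
        .url = "/status.php";  # page Varnish polls to check health
        .timeout = 1s;         # the probe must answer within one second
        .interval = 5s;        # poll every five seconds
        .window = 5;           # consider the last five polls...
        .threshold = 3;        # ...and require at least three good ones
    }
}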
Now we can use the directive "req.backend.healthy" and get a Boolean result that tells us whether the back end(s) are alive or not.
#
# Customized VCL file for serving up a WordPress site with multiple back-ends.
#
# Define the internal network subnet.
# These are used below to allow internal access to certain files while not
# allowing access from the public internet.
acl internal {
"10.100.0.0"/24;
}
# Define the list of our back ends (web servers); they listen on port 8080.
backend web1 { .host = "10.100.0.1"; .port = "8080"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}
backend web2 { .host = "10.100.0.2"; .port = "8080"; .probe = { .url = "/status.php"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }}
# Define the director that determines how to distribute incoming requests.
director default_director round-robin {
{ .backend = web1; }
{ .backend = web2; }
}
# Respond to incoming requests.
sub vcl_recv {
set req.backend = default_director;
# Use anonymous, cached pages if all backends are down.
if (!req.backend.healthy) {
unset req.http.Cookie;
set req.grace = 6h;
} else {
set req.grace = 30s;
}
# Unset all cookies if not WordPress admin - otherwise login will fail
if (!(req.url ~ "wp-(login| admin )")) {
unset req.http.cookie;
return(lookup);
}
# If you request the special pages go directly to them
if (req.url ~ "wp-(login| admin )") {
return (pipe);
}
# Always cache the following file types for all users.
if (req.url ~ "(?i).(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(?[ a- z0-9]+)?$") {
unset req.http.Cookie;
}
}
# Code determining what to do when serving items from the web servers.
sub vcl_fetch {
# Don't allow static files to set cookies.
if (req.url ~ "(?i).(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(?[ a- z0-9]+)?$") {
# beresp == Back-end response from the web server.
unset beresp.http.set-cookie;
}
# Allow items to be stale if needed.
set beresp.grace = 6h;
}
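Once the file is saved, you can load it into a running Varnish instance without restarting it. The configuration name ("resilient" here) and the file path are arbitrary:
# Compile and load the new VCL under a name of your choice, then activate it.
varnishadm vcl.load resilient /etc/varnish/default.vcl
varnishadm vcl.use resilient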
A Powerful Tool
These are just some examples that can help you get started in using Varnish. This tool is really powerful and can help you achieve a great performance boost without buying more hardware or virtual machines. For many website administrators, that’s a real benefit.