Smart generation of Gzip files for nginx

As the complexity of today’s sites increases, so do the challenges of keeping a site loading fast and bandwidth usage low. Minified scripts, concatenated CSS, image sprites and even hand-crafted static HTML are all used for speedy delivery. This article discusses some lesser-known features of nginx that can lead to a significant speed increase.
In my quest for performance, I switched one of our high-traffic sites from Apache to nginx. It was a perfect candidate, as most of it is static with only client-side functionality and some AJAX calls; less than 10% of it has server-side functionality.

Like Apache, nginx has an on-the-fly compression feature, enabled via the gzip on directive.
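A minimal sketch of that configuration (the gzip_types list and minimum length are illustrative choices, not recommendations):

gzip on;
gzip_comp_level 1;
gzip_min_length 1024;
gzip_types text/css application/javascript application/xml;

(text/html is always compressed when gzip is on, so it doesn’t need to be listed.)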

When benchmarking the site, I noticed an increase in Time to First Byte (TTFB). This was to be expected – after all, compressing a file does incur some overhead. Of course, the time lost on compression is made up many times over by the shorter time needed to download the file, but it still got me thinking – wouldn’t it be possible to have both a small file and a great TTFB?

Gzip static files

nginx has an option called gzip_static. When it is turned on and a request is made for a file, say, style.css, nginx first looks for style.css.gz and, if it exists, sends it back directly, with no compression overhead. If the .gz file does not exist, the request is handled normally (and the file is compressed on the fly if gzip is on).

So the code might look like this:

location ~* \.(html|css|js|xml)$
{
    gzip_static on;
}

(Be careful when placing the rules so they don’t override other file rules! nginx is a bit peculiar in this matter: among regex locations, the first one that matches wins.)
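For example, if another regex location matching the same extensions appears earlier in the configuration, the gzip_static block is never reached (the expires value below is purely illustrative):

location ~* \.(css|js)$
{
    # placed above the gzip_static location, this block
    # swallows all .css/.js requests before it is reached
    expires 30d;
}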

Now the TTFB drops from 0.3s to 0.09s!
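One quick way to measure TTFB yourself is curl’s timing variables (the URL is a placeholder):

curl -s -o /dev/null -H "Accept-Encoding: gzip" \
     -w "TTFB: %{time_starttransfer}s\n" http://example.com/style.css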

There’s just one problem – nginx does not generate or update the .gz files itself. This is a nuisance.

Automate the Gzip generation with cron

The simplest choice is to batch generate the gzip files:

#!/bin/bash

FILETYPES=( "*.html" "*.css" "*.js" "*.xml" )
DIRECTORIES="/var/www/"
export MIN_SIZE=1024    # exported so the subshell spawned by find can see it

for currentdir in $DIRECTORIES
do
   for i in "${FILETYPES[@]}"
   do
      # pass the filename as a positional argument instead of
      # substituting {} into the command string
      find "$currentdir" -iname "$i" -exec bash -c '
         PLAINFILE="$1"
         GZIPPEDFILE="$1.gz"
         if [ -e "$GZIPPEDFILE" ]; then
            # regenerate only when the source is newer than the .gz
            if [ "$(stat --printf=%Y "$PLAINFILE")" -gt "$(stat --printf=%Y "$GZIPPEDFILE")" ]; then
               gzip -1 -f -c "$PLAINFILE" > "$GZIPPEDFILE"
            fi
         elif [ "$(stat --printf=%s "$PLAINFILE")" -gt "$MIN_SIZE" ]; then
            gzip -1 -c "$PLAINFILE" > "$GZIPPEDFILE"
         fi' _ {} \;
   done
done

You would save this script and run it every hour or so via a cron job. The script searches for all files with the specified extensions inside the target directory and, if a file is larger than the minimum size, compresses it with gzip. If the .gz file already exists, the script compares modification times and updates the archive only when necessary.
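An illustrative crontab entry to run it hourly (the script path is an assumption):

0 * * * * /usr/local/bin/gzip-static.sh >/dev/null 2>&1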

This works, but I still wasn’t happy. Often, only one file changes, but when it does, you want the .gz companion to be updated now, not within the next hour. And what should happen when one of the uncompressed files is deleted? Its stale .gz companion would linger and keep being served.

The naive option would be to continuously poll the directory for changes; I shiver just thinking about it. If the idea crossed your mind, just say no.

Monitoring changes and generating gzip files as needed

Wouldn’t it be great if modern OSes notified us when a file is added, modified or deleted? But wait – of course they do. On Linux it’s the inotify kernel subsystem. Unfortunately, I couldn’t find mature high-level tools that take advantage of inotify. The most popular is incron, but it lacks the ability to monitor subdirectories recursively, so it’s pretty useless for this task.

The only thing I could use was inotify-tools, which works, albeit at a somewhat low level.

You install it with apt-get install inotify-tools (or your distro’s package manager).

Afterwards you work with the inotifywait command, which can monitor a directory for changes.
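To get a feel for its output, you can watch a directory in one terminal and touch a file in another (the path and output here are illustrative):

$ inotifywait -m -r -e MODIFY --format "%w%f" /var/www/
/var/www/css/style.css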

So, consider these two Bash scripts:

notify-edit.sh:

#!/bin/bash

inotifywait -m -q -e CREATE -e MODIFY -e MOVED_TO -r "/var/www/" --format "%w%f" \
    --excludei '\.(jpg|png|gif|ico|log|sql|zip|gz|pdf|php|swf|ttf|eot|woff)$' |
while read -r FILE
do
    # only (re)compress the types we serve with gzip_static
    case "$FILE" in
        *.html|*.css|*.js|*.xml) gzip -1 -f -c "$FILE" > "$FILE.gz" ;;
    esac
done

notify-delete.sh:

#!/bin/bash

inotifywait -m -q -e DELETE -e MOVED_FROM -r "/var/www/" --format "%w%f" \
    --excludei '\.(jpg|png|gif|ico|log|sql|zip|gz|pdf|php|swf|ttf|eot|woff)$' |
while read -r FILE
do
    # remove the orphaned .gz companion
    case "$FILE" in
        *.html|*.css|*.js|*.xml) rm -f "$FILE.gz" ;;
    esac
done

The first script listens for create, modify and moved-to events in the monitored directory tree. For performance reasons it filters out unwanted file types. It would have been nicer if there were an option to exclude everything except a specified pattern; a patch exists that adds an --includei parameter, but it is not included in the main branch. The names of created or modified files are piped to a short loop that further checks the file extension and compresses only the file types we want. The second script is similar, monitoring the directory for deleted and moved-out files.

To run the scripts in the background, enter:

nohup ./notify-edit.sh &
nohup ./notify-delete.sh &

As soon as a file is created or modified, a corresponding gzip version is (re)created. The .gz is deleted when the original file is deleted. If a file is moved from one folder to another, the corresponding gzip is deleted and then recreated at the new location.
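A quick smoke test (file name and output are illustrative):

$ echo 'body { margin: 0; }' > /var/www/test.css
$ ls /var/www/test.css*
/var/www/test.css  /var/www/test.css.gz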

A note on gzip compression

You may have noticed that I set the compression level to 1 (minimum). The natural tendency is to set the compression level to max, especially since the compression is done out-of-band. However, the default nginx compression level is also 1, and I ran some tests on various file types – HTML, JavaScript and CSS.

Typically, level 1 already brings an 80% saving in file size. Going to level 6 brings only another 3% saving; going all the way to 9 improves compression by just another 1%. At the same time, compression time shoots up: level 6 is over 70% more expensive than level 1, and level 9 is almost 120% more expensive.
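If you want to reproduce these numbers on your own files, a quick test loop might look like this (site.js is a placeholder; the -f format string is GNU time syntax):

for level in 1 6 9
do
    /usr/bin/time -f "level $level: %e s" gzip -$level -c site.js > /dev/null
    echo "level $level: $(gzip -$level -c site.js | wc -c) bytes"
done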

To summarize: a 4% compression improvement costs more than double the compression time. For archival, where the space occupied is the most important factor, it makes sense to use higher compression levels (though even there it makes little sense to go above 6). On a server, the balance between file size and CPU usage hugely favors lower compression levels.

Another note: I’ve seen nginx tutorials where even image files (JPEG, PNG, etc.) were included in gzip compression. There is no other way to put it but to call it like it is: dangerously stupid. Image files, videos, PDFs and most other binary formats are already compressed. Gzipping them not only brings no discernible benefit, it also slows down the browser, which now has to decompress them as well.

Extending

The inotify-tools approach explained here can be used to perform other server-side operations, for example recompressing JPEGs, optimizing PNGs, minifying CSS and JS files and more.
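As a sketch, the same watcher pattern could hand freshly written PNGs to an optimizer (this assumes optipng is installed; the watched path is illustrative):

#!/bin/bash

inotifywait -m -q -e CLOSE_WRITE -r "/var/www/" --format "%w%f" |
while read -r FILE
do
    # losslessly optimize PNGs as they are written; optipng leaves
    # already-optimized files untouched, so the rewrite it triggers
    # settles after a single extra pass
    case "$FILE" in
        *.png) optipng -quiet "$FILE" ;;
    esac
done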

Armand Niculescu

Senior full-stack developer and graphic designer with over 25 years of experience, Armand has taken on many challenges, from coding to project management and marketing.

4 Responses

  1. I’m getting a lot of errors:
    bash: line 5: [: 6820: unary operator expected
    bash: line 5: [: 47238: unary operator expected
    bash: line 5: [: 15838: unary operator expected

    Using bash and shell in CentOS 6.4, x86_64 architecture.

  2. miniz.c v1.10 includes an optimized real-time compressor written specifically for compression level 1 (MZ_BEST_SPEED). miniz.c’s level 1 compression ratio is around 5-9% higher than other real-time compressors, such as minilzo, fastlz, or liblzf. miniz.c’s level 1 data consumption rate on a Core i7 3.2 GHz typically ranges between 70-120.5 MB/sec. Between levels 2-9, miniz.c is designed to compare favorably against zlib, where it typically has roughly equal or better performance.

  3. A slightly optimized version of the code for CentOS (should work on other distributions as well):

    #!/bin/bash

    FILETYPES=( "*.html" "*.css" "*.js" "*.xml" "*.txt" )
    DIRECTORIES="/path/to/site/dir/"
    MIN_SIZE=512

    for currentDir in $DIRECTORIES; do
        for f in "${FILETYPES[@]}"; do
            files="$(find $currentDir -iname "$f")"
            echo "$files" | while read file; do
                PLAINFILE=$file
                GZIPPEDFILE=$file.gz
                if [[ -e "$GZIPPEDFILE" ]]; then
                    if [[ `stat --printf=%Y "$PLAINFILE"` -gt `stat --printf=%Y "$GZIPPEDFILE"` ]]; then
                        echo ".gz is older, updating $GZIPPEDFILE..."
                        gzip -2 -f -c "$PLAINFILE" > "$GZIPPEDFILE"
                    fi
                    if [[ `stat --printf=%s "$PLAINFILE"` -le $MIN_SIZE ]]; then
                        echo "Uncompressed size is less than minimum ($(stat --printf=%s "$PLAINFILE")), removing $GZIPPEDFILE"
                        rm -f "$GZIPPEDFILE"
                    fi
                elif [[ `stat --printf=%s "$PLAINFILE"` -gt $MIN_SIZE ]]; then
                    echo "Creating .gz for $PLAINFILE..."
                    gzip -2 -c "$PLAINFILE" > "$GZIPPEDFILE"
                fi
            done
        done
    done

    exit
