Smart generation of Gzip files for nginx

As the complexity of today’s sites increases, so do the challenges of keeping a site loading fast and bandwidth usage low. Minified scripts, concatenated CSS, image sprites and even hand-crafted static HTML are all used for speedy delivery. This article discusses some lesser-known features of nginx that can lead to a significant speed increase.

In my quest for performance, I switched one of our high-traffic sites from Apache to nginx. It was a perfect candidate, as most of it is static with only client-side functionality and some AJAX calls; less than 10% of it has server-side functionality.

Like Apache, nginx has an on-the-fly compression feature, enabled via the gzip on directive.
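For reference, a typical setup looks something like this (the MIME type list here is only an illustration; adjust it to the content you actually serve):

```nginx
gzip on;
gzip_comp_level 1;     # the nginx default; see the note on compression levels below
gzip_min_length 1024;  # skip responses too small to be worth compressing
# text/html is always compressed when gzip is on; list the other types explicitly:
gzip_types text/css application/javascript text/xml;
```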

When benchmarking the site, I noticed an increase in Time to First Byte (TTFB). This was to be expected – after all, compressing a file does incur some overhead. Of course, the time lost on compression is made up many times over by the shorter download, but it still got me thinking – wouldn’t it be possible to have both a small file and a great TTFB?

GZip Static files

Nginx has an option called gzip_static. When it is turned on and a request is made for a file, say, style.css, nginx looks first for style.css.gz and, if it exists, sends it back directly without any compression overhead. If the .gz file does not exist, the file is compressed on the fly as usual.
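To see the mechanism in action, you can pre-compress a single file by hand; gzip’s -k flag (available in gzip 1.6 and later) keeps the original file alongside the .gz:

```shell
# Create a sample stylesheet and pre-compress it for gzip_static.
# -1 = fastest compression level, -k = keep the original file.
echo "body { margin: 0; padding: 0; }" > style.css
gzip -1 -k style.css

ls style.css style.css.gz   # both files now sit side by side
```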

So the code might look like this:

location ~* \.(html|css|js|xml)$
{
    gzip_static on;
}

(Be careful when placing the rules so they don’t overwrite other file rules! nginx is a bit peculiar in this matter.)

Now the TTFB drops from 0.3s to 0.09s!

There’s just one problem – nginx does not generate or update the .gz files itself. This is a nuisance.

Automate the Gzip generation with cron

The simplest choice is to batch generate the gzip files:

#! /bin/bash

FILETYPES=( "*.html" "*.css" "*.js" "*.xml" )
DIRECTORIES="/var/www/"
export MIN_SIZE=1024

for currentdir in $DIRECTORIES
do
   for i in "${FILETYPES[@]}"
   do
      find "$currentdir" -iname "$i" -exec bash -c '
         PLAINFILE="$1"
         GZIPPEDFILE="$1.gz"
         if [ -e "$GZIPPEDFILE" ]
         then if [ "$(stat --printf=%Y "$PLAINFILE")" -gt "$(stat --printf=%Y "$GZIPPEDFILE")" ]
              then gzip -1 -f -c "$PLAINFILE" > "$GZIPPEDFILE"
              fi
         elif [ "$(stat --printf=%s "$PLAINFILE")" -gt "$MIN_SIZE" ]
         then gzip -1 -c "$PLAINFILE" > "$GZIPPEDFILE"
         fi' _ {} \;
   done
done

(Note: MIN_SIZE must be exported, since the single-quoted bash -c body runs in a child shell; the file name is passed as a positional argument rather than interpolated with {}, and all variables are quoted – unquoted stat output is what produces “unary operator expected” errors on paths with spaces.)

You would save this script and run it every hour or so via a cron job. The script searches for all files with the specified extensions inside the target directory and if the file size is larger than specified, compresses it with gzip. If the .gz file already exists, it looks at the modification time and updates only if necessary.
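Assuming the script is saved as /usr/local/bin/gzip-static.sh (a hypothetical path) and made executable, the hourly crontab entry would look like:

```cron
# m h dom mon dow  command
0 * * * * /usr/local/bin/gzip-static.sh >/dev/null 2>&1
```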

This works, but I still wasn’t happy. Often only one file changes, but when it does, you want the .gz companion to be updated now, not within the next hour. Also, what should happen when one of the uncompressed files is deleted?

The naive option would be to continuously poll the directory for changes; I shiver just thinking of this. If the idea crossed your mind, just say No.

Monitoring changes and generating gzip files as needed

Wouldn’t it be great if modern OSes notified us when a file is added, modified or deleted? But wait – of course they do. On Linux it’s the inotify kernel subsystem. Unfortunately, I couldn’t find mature high-level tools that take advantage of inotify. The most popular is incron, but it lacks the ability to monitor subdirectories recursively, so it’s pretty useless for this task.

The only thing I could use is inotify-tools, which works, albeit at a somewhat low level.

You install it with apt-get install inotify-tools (or your distro’s package manager).

Afterwards you work with the inotifywait command, which can monitor a directory for changes.

So, consider these two Bash scripts:

notify-edit.sh:

#!/bin/bash                               
 
inotifywait -m -q -e create -e modify -e moved_to -r "/var/www/" --format "%w%f" \
	--excludei '\.(jpg|png|gif|ico|log|sql|zip|gz|pdf|php|swf|ttf|eot|woff)$' |
	while read -r file
	do
		if [[ $file =~ \.(html|css|js|xml)$ ]]
		then
			gzip -f -c -1 "$file" > "$file.gz"
		fi
	done

notify-delete.sh:

#!/bin/bash 
inotifywait -m -q -e delete -e moved_from -r "/var/www/" --format "%w%f" \
	--excludei '\.(jpg|png|gif|ico|log|sql|zip|gz|pdf|php|swf|ttf|eot|woff)$' |
	while read -r file
	do
		if [[ -f $file.gz ]]
		then
			rm "$file.gz"
		fi
	done

The first script listens for files being created, modified or moved into the monitored directory. For performance reasons it filters out unwanted file types with --excludei (note the exclusion regex must not end with a trailing |, or it will match every file). It would have been better if there were an option to exclude everything except a specified pattern; there is a patch that adds an --includei parameter, but it is not included in the main branch. The created or modified file names are piped to a Bash loop that further checks the file extension and compresses only the file types we want. The second script is similar, monitoring the directory for files that are deleted or moved out.
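The extension check inside the loop is a plain Bash regular-expression match; it can be exercised on its own to confirm which paths would be compressed (the wants_gzip helper name is mine, not part of the scripts above):

```shell
# Returns success (0) only for the file types we want to compress.
wants_gzip() {
    [[ $1 =~ \.(html|css|js|xml)$ ]]
}

for f in index.html app.js photo.jpg style.css archive.gz; do
    if wants_gzip "$f"; then
        echo "compress: $f"
    else
        echo "skip:     $f"
    fi
done
```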

To run the scripts, you enter:

nohup ./notify-edit.sh &
nohup ./notify-delete.sh &

As soon as a file is created or modified, a corresponding gzip version is (re)created. The .gz is deleted when the original file is deleted. If a file is moved from one folder to another, the corresponding gzip is deleted and then recreated at the new location.

A note on gzip compression

You may have noticed that I set the compression level to 1 (minimum). The natural tendency is to set the compression level to max, especially since the compression is done ahead of time. However, the default nginx compression level is still 1, and I ran some tests on various file types – HTML, JavaScript and CSS.

Typically, using level 1 already brings about an 80% saving in file size. Going to 6 adds only another 3%; pushing the level to 9 improves compression by just 1% more. At the same time, compression time shoots up: level 6 is over 70% more expensive and level 9 almost 120% more expensive.

To summarize: a 4% compression improvement costs over double the compression time. For archival, where the space occupied is the most important factor, it makes sense to use higher compression levels (though even there it rarely pays to go over 6). On a server, the balance between file size and CPU usage hugely favors lower compression levels.
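You can reproduce this kind of measurement yourself; the exact percentages vary with the input, but the shape of the curve is the same. A quick sketch using a repetitive, CSS-like sample file:

```shell
# Compare gzip output sizes at levels 1, 6 and 9 on a compressible sample.
sample=$(mktemp)
for i in $(seq 1 2000); do
    echo "div.item-$i { margin: 0 auto; padding: 4px; color: #333; }"
done > "$sample"

echo "original: $(wc -c < "$sample") bytes"
for level in 1 6 9; do
    size=$(gzip "-$level" -c "$sample" | wc -c)
    echo "level $level: $size bytes"
done
rm -f "$sample"
```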

Another note: I’ve seen nginx tutorials where even image files (JPEG, PNG, etc.) were included in gzip compression. There’s no other way to put it but to call it what it is: dangerously stupid. Image files, videos, PDFs and most other binary formats are already compressed. Gzipping them not only brings no discernible benefit, it also slows down the browser, which now has to decompress them as well.
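It is easy to demonstrate why: already-compressed data does not shrink again. Running gzip over a blob of random bytes (a stand-in for JPEG/PNG payloads, which are similarly incompressible) actually produces slightly larger output, because of the gzip header and framing overhead:

```shell
# Random bytes are incompressible, just like JPEG/PNG payloads.
blob=$(mktemp)
head -c 100000 /dev/urandom > "$blob"

orig=$(wc -c < "$blob")
gzipped=$(gzip -9 -c "$blob" | wc -c)
echo "original: $orig bytes, gzipped: $gzipped bytes"
rm -f "$blob"
```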

Extending

The concepts explained here with inotify-tools can be used to perform other server-side operations on change, for example recompressing JPEGs, optimizing PNGs, or minifying CSS and JS files.

4 replies
  1. Axel says:

    I’m getting a lot of errors:
    bash: line 5: [: 6820: unary operator expected
    bash: line 5: [: 47238: unary operator expected
    bash: line 5: [: 15838: unary operator expected

    Using bash and shell in CentOS 6.4, x86_64 architecture.

  2. Frankie P. Hansen says:

    miniz.c v1.10 includes an optimized real-time compressor written specifically for compression level 1 (MZ_BEST_SPEED). miniz.c’s level 1 compression ratio is around 5-9% higher than other real-time compressors, such as minilzo, fastlz, or liblzf. miniz.c’s level 1 data consumption rate on a Core i7 3.2 GHz typically ranges between 70-120.5 MB/sec. Between levels 2-9, miniz.c is designed to compare favorably against zlib, where it typically has roughly equal or better performance.

  3. Alexandr Priezzhev says:

    A bit optimized for CentOS code (should work in other builds as well):

    #! /bin/bash
    FILETYPES=( "*.html" "*.css" "*.js" "*.xml" "*.txt" )
    DIRECTORIES="/path/to/site/dir/"
    MIN_SIZE=512

    for currentDir in $DIRECTORIES; do
        for f in "${FILETYPES[@]}"; do
            files="$(find $currentDir -iname "$f")"
            echo "$files" | while read file; do
                PLAINFILE=$file
                GZIPPEDFILE=$file.gz
                if [[ -e "$GZIPPEDFILE" ]]; then
                    if [[ `stat --printf=%Y "$PLAINFILE"` -gt `stat --printf=%Y "$GZIPPEDFILE"` ]]; then
                        echo .gz is older, updating "$GZIPPEDFILE"...
                        gzip -2 -f -c "$PLAINFILE" > "$GZIPPEDFILE"
                    fi
                    if [[ `stat --printf=%s "$PLAINFILE"` -le $MIN_SIZE ]]; then
                        echo Uncompressed size is less than minimum "($(stat --printf=%s "$PLAINFILE"))", removing "$GZIPPEDFILE"
                        rm -f "$GZIPPEDFILE"
                    fi
                elif [[ `stat --printf=%s "$PLAINFILE"` -gt $MIN_SIZE ]]; then
                    echo Creating .gz for "$PLAINFILE"...
                    gzip -2 -c "$PLAINFILE" > "$GZIPPEDFILE"
                fi
            done
        done
    done

    exit
