In this article by Rahul Sharma, author of the book NGINX High Performance, we will cover the following topics:

  • NGINX configuration syntax
  • Configuring NGINX workers
  • Configuring NGINX I/O
  • Configuring TCP
  • Setting up the server

NGINX configuration syntax

This section covers the NGINX configuration syntax in detail. The complete configuration file has a logical structure that is composed of directives grouped into a number of sections. A section defines the configuration for a particular NGINX module; for example, the http section defines the configuration for the ngx_http_core module.

An NGINX configuration has the following syntax:

  • Valid directives begin with a directive name, followed by an argument or a series of arguments separated by spaces.
  • All valid directives end with a semicolon (;).
  • Sections are defined with curly braces ({}).
  • Sections can be nested in one another. A nested section defines a module valid under the enclosing section, for example, the server section under the http section.
  • Configuration outside any section is part of the NGINX global configuration.
  • The lines starting with the hash (#) sign are comments.
  • Configurations can be split into multiple files, which can be grouped using the include directive. This helps in organizing code into logical components. Inclusions are processed recursively, that is, an include file can further have include statements.
  • Spaces, tabs, and newline characters are not significant in an NGINX configuration. They are not interpreted by the NGINX engine, but they help to make the configuration more readable.

Thus, the complete file looks like the following code:

#The configuration begins here
global1 value1;
#This defines a new section
section {
    sectionvar1 value1;
    include file1;
    subsection {
        subsectionvar1 value1;
    }
}
#The section ends here
global2 value2;
# The configuration ends here

NGINX provides the -t option, which can be used to test and verify the configuration written in the file. If the file or any of the included files contains any errors, it prints the line numbers causing the issue:

$ sudo nginx -t

This checks the validity of the default configuration file. If the configuration is written in a file other than the default one, use the -c option to test it.
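
For example, to validate a configuration stored outside the default location (the path below is only an illustration):

$ sudo nginx -t -c /etc/nginx/custom.conf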

You cannot test partial configurations, for example, a server section for your domain defined in a standalone file. Any attempt to test such a file on its own will throw errors. The file has to be complete in all respects.

Now that we have a clear idea of the NGINX configuration syntax, we will try to play around with the default configuration. This article only aims to discuss the parts of the configuration that have an impact on performance.

The NGINX catalog has a large number of modules that can be configured for various purposes. This article does not try to cover all of them, as the details are beyond the scope of the book. Please refer to the NGINX documentation at http://nginx.org/en/docs/ to know more about the modules.

Configuring NGINX workers

NGINX runs a fixed number of worker processes as per the specified configuration. In the following sections, we will work with NGINX worker parameters. These parameters are mostly part of the NGINX global context.

worker_processes

The worker_processes directive controls the number of workers:

worker_processes 1;

The default value for this is 1, that is, NGINX runs only one worker. The value should be tuned to an optimal number depending on the number of cores available, disks, the network subsystem, server load, and so on.

As a starting point, set the value to the number of cores available. Determine the number of cores available using lscpu:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4

The same can be accomplished by grepping /proc/cpuinfo:

$ cat /proc/cpuinfo | grep 'processor' | wc -l
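
Alternatively, the nproc utility from GNU coreutils prints the same count:

$ nproc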

Now, set the directive to this value:

# One worker per CPU-core.
worker_processes 4;

Alternatively, the directive can have auto as its value. This determines the number of cores and spawns an equal number of workers.
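
The following code shows this:

# Detect the core count and spawn one worker per core
worker_processes auto;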

When NGINX is running with SSL, it is a good idea to have multiple workers. SSL handshake is blocking in nature and involves disk I/O. Thus, using multiple workers leads to improved performance.

accept_mutex

Since we have configured multiple workers in NGINX, we should also configure the flags that impact worker selection. The accept_mutex parameter available under the events section will enable each of the available workers to accept new connections one by one. By default, the flag is set to on. The following code shows this:

events {
    accept_mutex on;
}

If the flag is turned to off, all of the available workers will wake up from the waiting state, but only one worker will process the connection. This results in the Thundering Herd phenomenon, which is repeated a number of times per second. The phenomenon causes reduced server performance, as all the woken-up workers consume CPU time before going back to the wait state. This results in unproductive CPU cycles and wasted context switches.

accept_mutex_delay

When accept_mutex is enabled, only one worker, which has the mutex lock, accepts connections, while others wait for their turn. The accept_mutex_delay corresponds to the timeframe for which the worker would wait, and after which it tries to acquire the mutex lock and starts accepting new connections. The directive is available under the events section with a default value of 500 milliseconds. The following code shows this:

events {
    accept_mutex_delay 500ms;
}

worker_connections

The next configuration to look at is worker_connections, with a default value of 512. The directive is present under the events section. The directive sets the maximum number of simultaneous connections that can be opened by a worker process. The following code shows this:

events {
    worker_connections 512;
}

Increase worker_connections to something like 1,024 to accept more simultaneous connections.

The value of worker_connections does not directly translate into the number of clients that can be served simultaneously. Each browser opens a number of parallel connections to download various components that compose a web page, for example, images, scripts, and so on. Different browsers have different values for this, for example, IE works with two parallel connections while Chrome opens six connections. The number of connections also includes sockets opened with the upstream server, if any.
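
As a rough, illustrative estimate (assuming each client opens two parallel connections and no upstream connections are involved), the capacity works out to:

max_clients = worker_processes * worker_connections / 2
            = 4 * 1024 / 2
            = 2048 simultaneous clients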

worker_rlimit_nofile

The number of simultaneous connections is limited by the number of file descriptors available on the system, as each socket will open a file descriptor. If NGINX tries to open more sockets than the available file descriptors, it will lead to the Too many open files message in the error.log.

Check the number of file descriptors using ulimit:

$ ulimit -n

Now, increase this to a value more than worker_processes * worker_connections. The value should be increased for the user that runs the worker processes. Check the user directive to get the username.
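
On most Linux distributions, the limit can be raised in /etc/security/limits.conf. The www-data user below is an assumption; substitute the user from your user directive:

www-data soft nofile 20960
www-data hard nofile 20960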

NGINX provides the worker_rlimit_nofile directive, which can be an alternative way of setting the available file descriptors rather than modifying ulimit. Setting the directive has a similar impact as updating ulimit for the worker user. The value of this directive overrides the ulimit value set for the user. The directive is not present by default. Set a large value to handle a large number of simultaneous connections. The following code shows this:

worker_rlimit_nofile 20960;

To determine the OS limits imposed on a process, read the file /proc/$pid/limits. $pid corresponds to the PID of the process.
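
For example, the following checks the open file limit of a running worker (assuming pgrep is available and a worker process is running):

$ grep 'Max open files' /proc/$(pgrep -f 'nginx: worker' | head -n 1)/limits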

multi_accept

The multi_accept flag enables an NGINX worker to accept as many connections as possible when it gets the notification of a new connection. The purpose of this flag is to accept all connections in the listen queue at once. If the directive is disabled, a worker process will accept connections one by one. The following code shows this:

events {
    multi_accept on;
}

The directive is available under the events section with the default value off.

If the server has a constant stream of incoming connections, enabling multi_accept may result in a worker accepting more connections than the number specified in worker_connections. The overflow will lead to performance loss, as the excess accepted connections will not get processed.

use

NGINX provides several methods for connection processing. Each of the available methods allows NGINX workers to monitor multiple socket file descriptors and detect when data is available for reading/writing. These calls allow NGINX to process multiple socket streams without getting stuck in any one of them. The methods are platform-dependent, and the configure command, used to build NGINX, selects the most efficient method available on the platform. If we want to use other methods, they must be enabled first in NGINX.

The use directive allows us to override the default method with the method specified. The directive is part of the events section:

events {
    use select;
}

NGINX supports the following methods of processing connections:

  • select: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-select_module or --without-select_module configuration parameter.
  • poll: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-poll_module or --without-poll_module configuration parameter.
  • kqueue: This is an efficient method of processing connections available on FreeBSD 4.1, OpenBSD 2.9+, NetBSD 2.0, and OS X.

    There are the additional directives kqueue_changes and kqueue_events. These directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 512.

    The kqueue method will ignore the multi_accept directive if it has been enabled.

  • epoll: This is an efficient method of processing connections available on Linux 2.6+. The method is similar to the FreeBSD kqueue.

    There is also the additional directive epoll_events. This specifies the number of events that NGINX will pass to the kernel. The default value for this is 512.

  • /dev/poll: This is an efficient method of processing connections available on Solaris 7 11/99+, HP/UX 11.22+, IRIX 6.5.15+, and Tru64 UNIX 5.1A+.

    This has the additional directives, devpoll_events and devpoll_changes. The directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 32.

  • eventport: This is an efficient method of processing connections available on Solaris 10. The method requires necessary security patches to avoid kernel crash issues.
  • rtsig: Real-time signals is a connection processing method available on Linux 2.2+. The method has some limitations. On older kernels, there is a system-wide limit of 1,024 signals. For high loads, the limit needs to be increased by setting the rtsig-max parameter. For kernel 2.6+, instead of the system-wide limit, there is a limit on the number of outstanding signals for each process. NGINX provides the worker_rlimit_sigpending parameter to modify the limit for each of the worker processes:
    worker_rlimit_sigpending 512;

    The parameter is part of the NGINX global configuration.

    If the queue overflows, NGINX drains the queue and uses the poll method to process the unhandled events. When the condition is back to normal, NGINX switches back to the rtsig method of connection processing. NGINX provides the rtsig_overflow_events, rtsig_overflow_test, and rtsig_overflow_threshold parameters to control how a signal queue is handled on overflows.

    The rtsig_overflow_events parameter defines the number of events passed to poll.

    The rtsig_overflow_test parameter defines the number of events handled by poll, after which NGINX will drain the queue.

    Before draining the signal queue, NGINX checks how full it is. If the fill factor is larger than the specified rtsig_overflow_threshold, it will drain the queue.

    The rtsig method requires accept_mutex to be set. The method also enables the multi_accept parameter.

Configuring NGINX I/O

NGINX can also take advantage of the Sendfile and direct I/O options available in the kernel. In the following sections, we will try to configure parameters available for disk I/O.

Sendfile

When a file is transferred by an application, the kernel first reads the data into its own buffers and then copies it to the application buffers. The application, in turn, writes the data back to the kernel to send it to the destination. The Sendfile method is an improved method of data transfer, in which data is copied between file descriptors within the OS kernel space, that is, without transferring data to the application buffers. This results in improved utilization of the operating system's resources.

The method can be enabled using the sendfile directive. The directive is available for the http, server, and location sections:

http {
    sendfile on;
}

The flag is set to off by default.

Direct I/O

The OS kernel usually tries to optimize and cache any read/write requests. Since the data is cached within the kernel, any subsequent read request to the same place will be much faster because there’s no need to read the information from slow disks.

Direct I/O is a feature of the filesystem where reads and writes go directly from the applications to the disk, thus bypassing all OS caches. This results in better utilization of CPU cycles and improved cache effectiveness.

The method is used in places where the data has a poor hit ratio. Such data does not need to be in any cache and can be loaded when required. It can be used to serve large files. The directio directive enables the feature. The directive is available for the http, server, and location sections:

location /video/ {
    directio 4m;
}

Any file larger than the size specified in the directive will be loaded using direct I/O. The parameter is disabled by default.

The use of direct I/O to serve a request will automatically disable Sendfile for the particular request.

Direct I/O depends on the block size while doing a data transfer. NGINX has the directio_alignment directive to set the block size. The directive is present under the http, server, and location sections:

location /video/ {
    directio 4m;
    directio_alignment 512;
}

The default value of 512 bytes works well on most systems, unless the files are served from a Linux implementation of XFS. In that case, the size should be increased to 4 KB.
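
A sketch for that case (assuming the files are served from an XFS volume on Linux):

location /video/ {
    directio 4m;
    # XFS on Linux requires a larger alignment
    directio_alignment 4k;
}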

Asynchronous I/O

Asynchronous I/O allows a process to initiate I/O operations without having to block or wait for it to complete.

The aio directive is available under the http, server, and location sections of an NGINX configuration. Depending on the section, the parameter will perform asynchronous I/O for the matching requests. The parameter works on Linux kernel 2.6.22+ and FreeBSD 4.3. The following code shows this:

location /data {
    aio on;
}

By default, the parameter is set to off. On Linux, aio needs to be enabled with directio, while on FreeBSD, sendfile needs to be disabled for aio to take effect.
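
On Linux, the pairing looks like the following minimal sketch (the location path is illustrative):

location /data {
    # On Linux, aio takes effect only together with directio
    directio 512;
    aio on;
}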

If NGINX has not been configured with the --with-file-aio option, any use of the aio directive will cause the unknown directive "aio" error.

The directive has a special value of threads, which enables multithreading for send and read operations. The multithreading support is only available on the Linux platform and can only be used with the epoll, kqueue, or eventport methods of processing requests.

In order to use the threads value, configure multithreading in the NGINX binary using the --with-threads option. After this, add a thread pool in the NGINX global context using the thread_pool directive. Use the same pool in the aio configuration:

thread_pool io_pool threads=16;
http {
    # ...
    location /data {
        sendfile on;
        aio      threads=io_pool;
    }
}

Mixing them up

The three directives can be mixed together to achieve different objectives on different platforms. The following configuration will use sendfile for files with size smaller than what is specified in directio. Files served by directio will be read using asynchronous I/O:

location /archived-data/ {
    sendfile on;
    aio on;
    directio 4m;
}

The aio directive has a sendfile value, which is available only on the FreeBSD platform. The value can be used to perform Sendfile in an asynchronous manner:

location /archived-data/ {
    sendfile on;
    aio sendfile;
}

NGINX invokes the sendfile() system call in a way that does not block on disk I/O; if the call returns with no data in memory, NGINX initiates the data transfer in an asynchronous manner.

Configuring TCP

HTTP is an application-layer protocol that uses TCP as the transport layer. In TCP, data is transferred in the form of blocks known as TCP packets. NGINX provides directives to alter the behavior of the underlying TCP stack. These parameters alter flags for an individual socket connection.

TCP_NODELAY

TCP/IP networks have the "small packet" problem, where single-character messages can cause congestion on a highly loaded network. Such packets are 41 bytes in size, where 40 bytes are for the TCP/IP headers and only 1 byte carries useful information. These small packets have a huge overhead, around 4,000 percent, and can saturate a network.

John Nagle solved the problem (Nagle's algorithm) by not sending the small packets immediately. All such packets are collected for some amount of time and then sent in one go as a single packet. This results in improved efficiency of the underlying network. Thus, a typical TCP/IP stack waits for up to 200 milliseconds before sending the data packets to the client.

It is important to note that the problem exists with applications such as Telnet, where each keystroke is sent over the wire. The problem is not relevant to a web server, which serves static files. The files will mostly form full TCP packets, which can be sent immediately instead of waiting for 200 milliseconds.

The TCP_NODELAY option can be used while opening a socket to disable Nagle’s buffering algorithm and send the data as soon as it is available. NGINX provides the tcp_nodelay directive to enable this option. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
    tcp_nodelay on;
}

The directive is enabled by default.

NGINX uses tcp_nodelay for connections in the keep-alive mode.

TCP_CORK

As an alternative to Nagle’s algorithm, Linux provides the TCP_CORK option. The option tells the TCP stack to append packets and send them when they are full or when the application instructs to send the packet by explicitly removing TCP_CORK. This results in an optimal amount of data packets being sent and, thus, improves the efficiency of the network. The TCP_CORK option is available as the TCP_NOPUSH flag on FreeBSD and Mac OS.

NGINX provides the tcp_nopush directive to enable TCP_CORK over the connection socket. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
    tcp_nopush on;
}

The directive is disabled by default.

NGINX uses tcp_nopush for requests served with sendfile.

Setting them up

The two directives discussed previously do mutually exclusive things: the former makes sure that network latency is reduced, while the latter tries to optimize the data packets sent. Even so, an application should set both of these options to get efficient data transfer.

Enabling tcp_nopush along with sendfile makes sure that while transferring a file, the kernel creates the maximum number of full TCP packets before sending them over the wire. The last packet(s) can be partial TCP packets, which could end up waiting with TCP_CORK enabled. NGINX makes sure it removes TCP_CORK to send these packets. Since tcp_nodelay is also set, these packets are sent immediately over the network, that is, without any delay.

Setting up the server

The following configuration sums up all the changes proposed in the preceding sections:

worker_processes 3;
worker_rlimit_nofile 8000;

events {
    multi_accept on;
    use epoll;
    worker_connections 1024;
}

http {
    sendfile on;
    aio on;
    directio 4m;
    tcp_nopush on;
    tcp_nodelay on;
    # Rest of the NGINX configuration removed for brevity
}

It is assumed that NGINX runs on a quad-core server. Thus, three worker processes have been spawned to take advantage of three of the four available cores, leaving one core free for other processes.

Each of the workers has been configured to work with 1,024 connections. Correspondingly, the nofile limit has been increased to 8,000. By default, all worker processes operate with the accept mutex; thus, the flag has not been set explicitly. Each worker accepts multiple connections in one go using the epoll method.

In the http section, NGINX has been configured to serve files larger than 4 MB using direct I/O, while efficiently buffering smaller files using Sendfile. TCP options have also been set up to efficiently utilize the available network.

Measuring gains

It is time to test the changes and make sure that they have given performance gain.

Run a series of tests using Siege/JMeter to get new performance numbers. The tests should be performed with the same configuration to get a comparable output:

$ siege -b -c 790 -r 50 -q http://192.168.2.100/hello

Transactions:              79000 hits
Availability:              100.00 %
Elapsed time:              24.25 secs
Data transferred:          12.54 MB
Response time:             0.20 secs
Transaction rate:          3257.73 trans/sec
Throughput:                0.52 MB/sec
Concurrency:               660.70
Successful transactions:   39500
Failed transactions:       0
Longest transaction:       3.45
Shortest transaction:      0.00

The results from Siege should be evaluated and compared to the baseline.

  • Throughput: The transaction rate reports this as roughly 3,258 requests/second
  • Error rate: Availability is reported as 100 percent; thus, the error rate is 0 percent
  • Response time: The results show a response time of 0.20 seconds

Thus, these new numbers demonstrate performance improvement in various respects.

After the server configuration is updated with all the changes, reperform all tests with increased numbers. The aim should be to determine the new baseline numbers for the updated configuration.

Summary

The article started with an overview of the NGINX configuration syntax. Going further, we discussed worker_connections and the related parameters. These allow you to take advantage of the available hardware. The article also talked about the different event processing mechanisms available on different platforms. The configuration discussed helped in processing more requests, thus improving the overall throughput.

NGINX is primarily a web server; thus, it has to serve all kinds of static content. Large files can take advantage of direct I/O, while smaller content can take advantage of Sendfile. The different disk modes make sure that we have an optimal configuration to serve the content.

In the TCP stack, we discussed the flags available to alter the default behavior of TCP sockets. The tcp_nodelay directive helps in improving latency. The tcp_nopush directive can help in efficiently delivering the content. Both these flags lead to improved response time.

In the last part of the article, we applied all the changes to our server and then did performance tests to determine the effectiveness of the changes done. In the next article, we will try to configure buffers, timeouts, and compression to improve the utilization of the available network.
