Nginx
Deep dive on Nginx variables and directive execution order:
The lifetime of Nginx variable containers is bound to the request being
processed and is independent of the location being handled.
Ways to do "internal redirection":
rewrite (with the "last" flag)
echo_exec
proxy_pass (?)
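A minimal sketch of internal redirection with rewrite and echo_exec (location names are made up for illustration; echo/echo_exec assume the ngx_echo module is available):

location /old {
    # internal redirect: nginx re-runs location matching for /new;
    # the client still sees the original URI
    rewrite ^/old$ /new last;
}

location /new {
    echo "now serving /new";
}

location /jump {
    # echo_exec (from the ngx_echo module) also performs an internal redirect
    echo_exec /new;
}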
Subrequest:
A subrequest is an abstract
invocation for decomposing the task of the main request into smaller
"internal requests" that can be served independently by multiple
different location blocks, either in series or in parallel.
"Subrequests" can also be recursive: any subrequest can initiate more
sub-subrequests, targeting other location blocks
or even the current location itself.
Sample subrequest directives:
echo_location (independent variable containers)
ngx_lua (ngx.location.capture)
auth_request (shares variable containers with the parent request)
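A minimal sketch of a subrequest via echo_location; the child location modifies $foo, but the parent's copy is unaffected because the variable containers are independent (location names are illustrative):

location /main {
    set $foo "main";
    echo_location /sub;            # issue a subrequest to /sub
    echo "main sees foo = $foo";   # still prints "main"
}

location /sub {
    set $foo "sub";
    echo "sub sees foo = $foo";    # prints "sub"
}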
When a variable is being created at
"configure time", the creating Nginx module must make a decision on
whether to allocate a value container for it and whether to attach a custom
"get handler" and/or a "set handler" to it.
Those variables owning a value container are
called "indexed variables" in Nginx's terminology. Otherwise, they
are said to be not indexed.
Built-in variables:
Most built-in variables are read-only, and they are not simple value containers:
their values are computed on demand by get/set handlers. For example:
$args (writable; backed by the request's query string)
$arg_XXX (read-only; looked up from the query string on each access)
Most built-in variables are sensitive to the subrequest context.
Built-in variables tied to the main request only:
$request_method (returns the method of the main request, even when read in a subrequest)
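A small sketch of the main-request behavior: POSTing to /main below should print POST for both the parent and the subrequest, because $request_method always reflects the main request (locations are illustrative; ngx_echo is assumed):

location /main {
    echo "main method: $request_method";
    echo_location /sub;
}

location /sub {
    # still reports the *main* request's method, e.g. POST
    echo "sub method: $request_method";
}

Try it with: curl --data hello 'http://localhost:8080/main'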
Lua:
https://en.wikipedia.org/wiki/Lua_(programming_language)
The ngx_lua module embeds the Lua language interpreter (or LuaJIT's just-in-time compiler) into the Nginx core, allowing Nginx users to run their own Lua programs directly inside the server. The user can insert Lua code into different running phases of the server to fulfill different requirements. Such Lua code is either specified directly as a literal string in the Nginx configuration file, or resides in external .lua source files (or Lua binary bytecode files) whose paths are specified in the Nginx configuration.
We cannot directly write something like
$arg_name
in our Lua code. Instead, we reference Nginx variables in
Lua by means of the ngx.var
API provided by the ngx_lua module.
For example, to reference the Nginx variable $VARIABLE
in Lua, we just write ngx.var.VARIABLE.
When the Nginx variable $arg_name takes the special value "not found" (or
"invalid"), ngx.var.arg_name evaluates to the nil value in the Lua world. It
is also worth noting that we use the Lua function ngx.say to print out the
response body contents; it is functionally equivalent to the Nginx echo
directive (see the sketch after the directive list below).
Main ngx_lua directives:
init_by_lua
set_by_lua
rewrite_by_lua
access_by_lua
content_by_lua
log_by_lua
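A minimal sketch of reading an Nginx variable from Lua and writing the response with ngx.say (the /test location is illustrative; content_by_lua_block is the block form of content_by_lua):

location /test {
    content_by_lua_block {
        -- ngx.var.arg_name reads the Nginx variable $arg_name;
        -- it is nil when the "name" query argument is absent
        local name = ngx.var.arg_name
        if name == nil then
            ngx.say("no name argument given")
        else
            ngx.say("hello, ", name)
        end
    }
}

Try it with: curl 'http://localhost:8080/test?name=tom'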
Using the LuaJIT FFI library to call C functions from Lua code:
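A minimal sketch of calling a libc function through LuaJIT's FFI (this assumes LuaJIT, which OpenResty bundles by default; the /pid location is illustrative):

location /pid {
    content_by_lua_block {
        local ffi = require "ffi"
        -- declare the libc function we want to call; libc is already loaded,
        -- so it is reachable through the default ffi.C namespace
        ffi.cdef[[
            int getpid(void);
        ]]
        ngx.say("worker pid: ", ffi.C.getpid())
    }
}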
Directive execution order
The mini-language of the Nginx configuration is "declarative", not "procedural": directives run according to the phase they belong to, not in the order they appear in the file.
3 major phases:
rewrite (set, set_unescape_uri, set_by_lua, rewrite; rewrite_by_lua runs at the tail of the phase)
access (allow, deny; access_by_lua runs at the tail of the phase): ACL checks such as guarding user clearance, checking user origins, examining source-IP validity, etc.
content (echo, echo_exec, proxy_pass, echo_location, content_by_lua): generate content and output the HTTP response.
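A minimal sketch of the declarative ordering (location name is illustrative): set belongs to the rewrite phase and echo to the content phase, so the set runs first even though it is written last:

location /order {
    echo "name: $name";   # content phase: runs second, prints "name: tom"
    set $name "tom";      # rewrite phase: runs first despite being written last
}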
11 Phases:
Order | Phase          | Notes                                                    | Coexist in the same phase | Allow ngx modules
1     | post-read      |                                                          |     |
2     | server-rewrite | Set values directly under 'server'                       |     |
3     | find-config    | Match the request to a 'location'                        |     | no
4     | rewrite        |                                                          | yes |
5     | post-rewrite   | Perform internal redirects registered in the rewrite phase |   | no
6     | pre-access     |                                                          |     |
7     | access         |                                                          | yes |
8     | post-access    | 'satisfy' check for 'all' or 'any'                       |     | no
9     | try-files      |                                                          |     | no
10    | content        |                                                          | no  |
11    | log            |                                                          |     |
Commands in each phase:
Phase          | Module           | Commands                                    | Tail
post-read      | ngx_realip       | set_real_ip_from (directly under 'server')  |
               |                  | real_ip_header                              |
server-rewrite | ngx_rewrite      | set                                         |
               |                  | rewrite (directly under 'server')           |
rewrite        | ngx_rewrite      | set                                         |
               |                  | rewrite (directly under 'location')         |
               | ngx_lua          | rewrite_by_lua                              | yes
               |                  | rewrite_by_lua_file                         |
               |                  | set_by_lua                                  |
               |                  | set_by_lua_file                             |
               | ngx_set_misc     | set_unescape_uri                            |
               | ngx_headers_more | more_set_input_headers                      | yes
pre-access     | ngx_limit_req    |                                             |
               | ngx_limit_zone   |                                             |
               | ngx_realip       | set_real_ip_from (under 'location')         |
               |                  | real_ip_header                              |
access         | ngx_access       | deny, allow                                 |
               | ngx_lua          | access_by_lua                               | yes
               | ngx_auth_request |                                             |
content        | ngx_echo         | echo                                        |
               |                  | echo_exec                                   |
               |                  | echo_location                               |
               | ngx_proxy        | proxy_pass                                  |
               | ?                | fastcgi_pass                                |
               |                  | fastcgi_param                               |
               | ngx_index        | index (URIs ending with /)                  |
               |                  | autoindex                                   |
               | ngx_static       | (URIs not ending with /)                    |
               | ngx_lua          | content_by_lua                              |
               |                  | content_by_lua_file                         |
output filter  | ngx_echo         | echo_before_body                            |
               |                  | echo_after_body                             |
Order of Lua Nginx Module directives
Directive                         | Context                                   | Description
init_by_lua*, init_worker_by_lua* | http                                      | Initialize global configuration / preload Lua modules
set_by_lua*                       | server, server if, location, location if  | Set an Nginx variable; this is blocking, so the Lua code must be very fast
rewrite_by_lua*                   | http, server, location, location if       | Runs in the rewrite phase; can implement complex forwarding/redirect logic
access_by_lua*                    | http, server, location, location if       | Runs in the request-access phase; used for access control
content_by_lua*                   | location, location if                     | Content handler; processes the request and emits the response
header_filter_by_lua*             | http, server, location, location if       | Set response headers and cookies
body_filter_by_lua*               | http, server, location, location if       | Filter the response body, e.g. truncate or replace content
log_by_lua*                       | http, server, location, location if       | Runs in the log phase, e.g. to record request counts / compute average response time
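A minimal sketch touching several of these phases in one location (the location name, token check, header, and log message are purely illustrative):

location /api {
    access_by_lua_block {
        -- access phase: trivial token check for illustration only
        if ngx.var.arg_token ~= "secret" then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }
    content_by_lua_block {
        -- content phase: produce the response body
        ngx.say("ok")
    }
    header_filter_by_lua_block {
        -- header filter phase: add a response header
        ngx.header["X-Served-By"] = "openresty"
    }
    log_by_lua_block {
        -- log phase: runs after the response has been sent
        ngx.log(ngx.INFO, "served /api for ", ngx.var.remote_addr)
    }
}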
Starting, stopping, and reloading configuration:
[root@sv3-dsappweb1-devr1 ~]# service_nginx status/restart/reload/…
('reload' did not pick up a newly defined server in this setup, so 'restart' was still needed)
Nginx service script:
[root@sv3-dsappweb1-devr1 nginx]# cat /lib/systemd/system/nginx.service
[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/local/pan-openresty/bin/openresty -t -c /etc/nginx/nginx.conf
ExecStart=/usr/local/pan-openresty/bin/openresty -c /etc/nginx/nginx.conf
ExecReload=/usr/local/pan-openresty/bin/openresty -c /etc/nginx/nginx.conf -s reload
# ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true
[Install]
WantedBy=multi-user.target
The master process supports the following signals:
TERM, INT | fast shutdown
QUIT      | graceful shutdown
HUP       | changing configuration, keeping up with a changed time zone (only for FreeBSD and Linux), starting new worker processes with a new configuration, graceful shutdown of old worker processes
USR1      | re-opening log files
USR2      | upgrading an executable file
WINCH     | graceful shutdown of worker processes
Add cookie header:
$ curl --cookie user=agentzh 'http://localhost:8080/test'
Do post:
$ curl --data hello 'http://localhost:8080/main'
Set header:
$ curl -H 'X-My-IP: 1.2.3.4' localhost:8080/test
Send 100000 requests:
ab -k -c1 -n100000 'http://127.0.0.1:8080/hello'
nginx Internals
As was mentioned before, the nginx
codebase consists of a core and a number of modules. The core of nginx is
responsible for providing the foundation of the web server, web and mail reverse
proxy functionalities; it enables the use of underlying network protocols,
builds the necessary run-time environment, and ensures seamless interaction
between different modules. However, most of the protocol- and
application-specific features are done by nginx modules, not the core.
Internally, nginx processes
connections through a pipeline, or chain, of modules. In other words, for every
operation there's a module which is doing the relevant work; e.g., compression,
modifying content, executing server-side includes, communicating to the
upstream application servers through FastCGI or uwsgi protocols, or talking to
memcached.
There are a couple of nginx modules
that sit somewhere between the core and the real "functional"
modules. These modules are http and mail. These two
modules provide an additional level of abstraction between the core and
lower-level components. In these modules, the handling of the sequence of
events associated with a respective application layer protocol like HTTP, SMTP
or IMAP is implemented. In combination with the nginx core, these upper-level
modules are responsible for maintaining the right order of calls to the
respective functional modules. While the HTTP protocol is currently implemented
as part of the http
module, there are plans
to separate it into a functional module in the future, due to the need to
support other protocols like SPDY (see "SPDY:
An experimental protocol for a faster web").
The functional modules can be
divided into event modules, phase handlers, output filters, variable handlers,
protocols, upstreams and load balancers. Most of these modules complement the
HTTP functionality of nginx, though event modules and protocols are also used
for mail. Event modules provide a particular OS-dependent event notification
mechanism like kqueue or epoll. The event
module that nginx uses depends on the operating system capabilities and build
configuration. Protocol modules allow nginx to communicate through HTTPS,
TLS/SSL, SMTP, POP3 and IMAP.
A typical HTTP request processing
cycle looks like the following.
1. Client sends HTTP request.
2. nginx core chooses the appropriate phase handler based on the configured location matching the request.
3. If configured to do so, a load balancer picks an upstream server for proxying.
4. Phase handler does its job and passes each output buffer to the first filter.
5. First filter passes the output to the second filter.
6. Second filter passes the output to third (and so on).
7. Final response is sent to the client.
nginx module invocation is
extremely customizable. It is performed through a series of callbacks using
pointers to the executable functions. However, the downside of this is that it
may place a big burden on programmers who would like to write their own
modules, because they must define exactly how and when the module should run.
Both the nginx API and developers' documentation are being improved and made
more available to alleviate this.
Some examples of where a module can
attach are:
· Before the
configuration file is read and processed
· For each
configuration directive for the location and the server where it appears
· When the
main configuration is initialized
· When the
server (i.e., host/port) is initialized
· When the
server configuration is merged with the main configuration
· When the
location configuration is initialized or merged with its parent server
configuration
· When the
master process starts or exits
· When a new
worker process starts or exits
· When
handling a request
· When
filtering the response header and the body
· When
picking, initiating and re-initiating a request to an upstream server
· When
processing the response from an upstream server
· When
finishing an interaction with an upstream server
Inside a worker, the sequence of actions leading to the run-loop where the response is generated looks like the following:
1. Begin ngx_worker_process_cycle().
2. Process events with OS-specific mechanisms (such as epoll or kqueue).
3. Accept events and dispatch the relevant actions.
4. Process/proxy request header and body.
5. Generate response content (header, body) and stream it to the client.
6. Finalize request.
7. Re-initialize timers and events.
The run-loop itself (steps 5 and 6)
ensures incremental generation of a response and streaming it to the client.
A more detailed view of processing an HTTP request might look like this:
1. Initialize request processing.
2. Process header.
3. Process body.
4. Call the associated handler.
5. Run through the processing phases.
Which brings us to the phases. When
nginx handles an HTTP request, it passes it through a number of processing
phases. At each phase there are handlers to call. In general, phase handlers
process a request and produce the relevant output. Phase handlers are attached
to the locations defined in the configuration file.
Phase handlers typically do four
things: get the location configuration, generate an appropriate response, send
the header, and send the body. A handler has one argument: a specific structure
describing the request. A request structure has a lot of useful information
about the client request, such as the request method, URI, and header.
When the HTTP request header is
read, nginx does a lookup of the associated virtual server configuration. If
the virtual server is found, the request goes through six phases:
1. server rewrite phase
2. location phase
3. location rewrite phase (which can bring the request back to the previous phase)
4. access control phase
5. try_files phase
6. log phase
In an attempt to generate the necessary content in response to the request,
nginx passes the request to a suitable content handler. Depending on the exact
location configuration, nginx may try so-called unconditional handlers first,
like perl, proxy_pass, flv, mp4, etc. If the request does not match any of the
above content handlers, it is picked by one of the following handlers, in this
exact order: random index, index, autoindex, gzip_static, static.
Indexing module details can be found in the nginx documentation, but these are
the modules which handle requests with a trailing slash. If a specialized
module like mp4 or autoindex isn't appropriate, the content is considered to be
just a file or directory on disk (that is, static) and is served by the static
content handler. For a directory it would automatically rewrite the URI so that
the trailing slash is always there (and then issue an HTTP redirect).
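A small configuration sketch of the trailing-slash handlers described above (paths and location name are illustrative):

location /files/ {
    root /data/www;
    index index.html;   # for URIs ending in "/", try the index file first
    autoindex on;       # otherwise generate a directory listing
    # URIs not ending in "/" fall through to the static content handler
}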
The content handlers' content is
then passed to the filters. Filters are also attached to locations, and there
can be several filters configured for a location. Filters do the task of
manipulating the output produced by a handler. The order of filter execution is
determined at compile time. For the out-of-the-box filters it's predefined, and
for a third-party filter it can be configured at the build stage. In the
existing nginx implementation, filters can only do outbound changes and there
is currently no mechanism to write and attach filters to do input content
transformation. Input filtering will appear in future versions of nginx.
Filters follow a particular design
pattern. A filter gets called, starts working, and calls the next filter until
the final filter in the chain is called. After that, nginx finalizes the
response. Filters don't have to wait for the previous filter to finish. The
next filter in a chain can start its own work as soon as the input from the
previous one is available (functionally much like the Unix pipeline). In turn,
the output response being generated can be passed to the client before the
entire response from the upstream server is received.
There are header filters and body
filters; nginx feeds the header and the body of the response to the associated
filters separately.
A header filter consists of three basic steps:
1. Decide whether to operate on this response.
2. Operate on the response.
3. Call the next filter.
Body filters transform the generated content. Examples of body filters include:
· server-side includes
· XSLT filtering
· image filtering (for instance, resizing images on the fly)
· charset modification
· gzip compression
· chunked encoding
After the filter chain, the response is passed to the writer. Along with the
writer there are a couple of additional special purpose filters, namely the
copy filter and the postpone filter. The copy filter is responsible for filling
memory buffers with the relevant response content which might be stored in a
proxy temporary directory. The postpone filter is used for subrequests.
Subrequests are a very important
mechanism for request/response processing. Subrequests are also one of the most
powerful aspects of nginx. With subrequests nginx can return the results from a
different URL than the one the client originally requested. Some web frameworks
call this an internal redirect. However, nginx goes further—not only can
filters perform multiple subrequests and combine the outputs into a single
response, but subrequests can also be nested and hierarchical. A subrequest can
perform its own sub-subrequest, and a sub-subrequest can initiate
sub-sub-subrequests. Subrequests can map to files on the hard disk, other
handlers, or upstream servers. Subrequests are most useful for inserting
additional content based on data from the original response. For example, the
SSI (server-side include) module uses a filter to parse the contents of the
returned document, and then replaces
include
directives
with the contents of specified URLs. Or, it can be an example of making a
filter that treats the entire contents of a document as a URL to be retrieved,
and then appends the new document to the URL itself.
Upstream and load balancers are also worth describing briefly. Upstreams are
used to implement what can be identified as a content handler which is a
reverse proxy (the proxy_pass handler). Upstream modules mostly prepare the
request to be sent to an upstream server (or "backend") and receive the
response from the upstream server. There are no calls to output filters here.
What an upstream module does exactly is set callbacks to be invoked when the
upstream server is ready to be written to and read from. Callbacks implementing
the following functionality exist:
· Crafting a
request buffer (or a chain of them) to be sent to the upstream server
· Re-initializing/resetting
the connection to the upstream server (which happens right before creating the
request again)
· Processing
the first bits of an upstream response and saving pointers to the payload
received from the upstream server
· Aborting
requests (which happens when the client terminates prematurely)
· Finalizing
the request when nginx finishes reading from the upstream server
· Trimming
the response body (e.g. removing a trailer)
Load balancer modules attach to the proxy_pass handler to provide the ability
to choose an upstream server when more than one upstream server is eligible. A
load balancer registers an enabling configuration file directive, provides
additional upstream initialization functions (to resolve upstream names in DNS,
etc.), initializes the connection structures, decides where to route the
requests, and updates stats information. Currently nginx supports two standard
disciplines for load balancing to upstream servers: round-robin and ip-hash.
Upstream and load balancing
handling mechanisms include algorithms to detect failed upstream servers and to
re-route new requests to the remaining ones—though a lot of additional work is
planned to enhance this functionality. In general, more work on load balancers
is planned, and in the next versions of nginx the mechanisms for distributing
the load across different upstream servers as well as health checks will be
greatly improved.
There are also a couple of other interesting modules which provide an
additional set of variables for use in the configuration file. While the
variables in nginx are created and updated across different modules, there are
two modules that are entirely dedicated to variables: geo and map. The geo
module is used to facilitate tracking of clients based on their IP addresses.
This module can create arbitrary variables that depend on the client's IP
address. The other module, map, allows for the creation of variables from other
variables, essentially providing the ability to do flexible mappings of
hostnames and other run-time variables. This kind of module may be called the
variable handler.
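A minimal sketch of the two variable-handler modules (the networks, hostnames, and pool names are made up; both blocks live in the http context):

# geo: derive a variable from the client's IP address
geo $is_internal {
    default         0;
    10.0.0.0/8      1;
    192.168.0.0/16  1;
}

# map: derive one variable from another at run time
map $host $backend_pool {
    default            "generic_backend";
    "api.example.com"  "api_backend";
}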
Memory allocation mechanisms implemented inside a single nginx worker were, to
some extent, inspired by Apache. A high-level description of nginx memory
management would be the following: For each connection, the necessary memory
buffers are dynamically allocated, linked, used for storing and manipulating
the header and body of the request and the response, and then freed upon
connection release. It is very important to note that nginx tries to avoid
copying data in memory as much as possible and most of the data is passed along
by pointer values, not by calling memcpy.
Going a bit deeper, when the
response is generated by a module, the retrieved content is put in a memory
buffer which is then added to a buffer chain link. Subsequent processing works
with this buffer chain link as well. Buffer chains are quite complicated in
nginx because there are several processing scenarios which differ depending on
the module type. For instance, it can be quite tricky to manage the buffers
precisely while implementing a body filter module. Such a module can only
operate on one buffer (chain link) at a time and it must decide whether to
overwrite the input buffer, replace the buffer with a newly allocated buffer,
or insert a new buffer before or after the buffer in question. To complicate
things, sometimes a module will receive several buffers so that it has an
incomplete buffer chain that it must operate on. However, at this time nginx
provides only a low-level API for manipulating buffer chains, so before doing
any actual implementation a third-party module developer should become really
fluent with this arcane part of nginx.
A note on the above approach is
that there are memory buffers allocated for the entire life of a connection,
thus for long-lived connections some extra memory is kept. At the same time, on
an idle keepalive connection, nginx spends just 550 bytes of memory. A possible
optimization for future releases of nginx would be to reuse and share memory
buffers for long-lived connections.
The task of managing memory allocation
is done by the nginx pool allocator. Shared memory areas are used to accept
mutex, cache metadata, the SSL session cache and the information associated
with bandwidth policing and management (limits). There is a slab allocator
implemented in nginx to manage shared memory allocation. To allow simultaneous
safe use of shared memory, a number of locking mechanisms are available
(mutexes and semaphores). In order to organize complex data structures, nginx
also provides a red-black tree implementation. Red-black trees are used to keep
cache metadata in shared memory, track non-regex location definitions and for a
couple of other tasks.
Unfortunately, all of the above was
never described in a consistent and simple manner, making the job of developing
third-party extensions for nginx quite complicated. Although some good
documents on nginx internals exist—for instance, those produced by Evan
Miller—such documents required a huge reverse engineering effort, and the
implementation of nginx modules is still a black art for many.
Despite certain difficulties
associated with third-party module development, the nginx user community
recently saw a lot of useful third-party modules. There is, for instance, an
embedded Lua interpreter module for nginx, additional modules for load
balancing, full WebDAV support, advanced cache control and other interesting
third-party work that the authors of this chapter encourage and will support in
the future.
upstream
upstream is Nginx's HTTP Upstream module. Through a simple scheduling algorithm, it load-balances client requests across backend servers. In the configuration above, the upstream directive defines a load balancer named test.net; the name can be anything and is simply referenced later where needed.
Load-balancing algorithms supported by upstream
Nginx's load-balancing module currently supports six scheduling algorithms, described below; the last two are third-party algorithms.
· Round-robin (default): requests are assigned to the backend servers one by one in time order; if a backend server goes down, the failed system is removed automatically so that user access is not affected. weight specifies the round-robin weight; the higher the weight, the higher the probability of being selected; it is mainly used when backend servers have uneven performance.
· ip_hash: each request is assigned according to the hash of the client IP, so visitors from the same IP always reach the same backend server, which effectively solves the session-sharing problem of dynamic pages.
· fair: a smarter algorithm than the two above. It balances load based on page size and load time, i.e. it assigns requests according to backend response time, with shorter response times served first. Nginx does not support fair natively; to use this algorithm you must download the upstream_fair module.
· url_hash: assigns requests according to the hash of the requested URL, so that each URL is directed to the same backend server, which further improves the efficiency of backend cache servers. Nginx does not support url_hash natively; to use this algorithm you must install the Nginx hash package.
· least_conn: least-connections algorithm; each time, the backend selected is the server with the fewest current connections (the connection counts are not shared; each worker keeps its own array recording the connection count per backend server).
· hash: this hash module supports two hashing modes: plain hash and consistent hash (consistent).
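A minimal sketch of an upstream block using the built-in algorithms (the server addresses are placeholders; the test.net name mirrors the discussion above):

upstream test.net {
    # round-robin is the default; "weight" skews the distribution
    server 10.0.0.11:8080 weight=3;
    server 10.0.0.12:8080;

    # switch algorithms by enabling one of these instead:
    # ip_hash;                        # pin clients from the same IP to one backend
    # least_conn;                     # pick the backend with the fewest connections
    # hash $request_uri consistent;   # URI-based consistent hashing
}

server {
    listen 80;
    location / {
        proxy_pass http://test.net;
    }
}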
Related topic: GCP load balancer
Two important features:
Session affinity:
Session affinity sends all requests from the same client to the same virtual machine instance as long as the instance stays healthy and has capacity.
GCP HTTP(S) Load Balancing offers two types of session affinity:
· client IP affinity — forwards all requests from the same client IP address to the same instance.
· generated cookie affinity — sets a client cookie, then sends all requests with that cookie to the same instance.
WebSocket proxy support
OpenResty: also called "ngx_openresty". It is the parent project of lua-nginx-module: nginx bundled with lua-nginx-module as well as other popular nginx/Lua modules.
OpenResty Best Practice book, major points:
Uses 'epoll' to monitor multiple file descriptors.
Uses mmap to share the same memory between kernel space and user space.
Do JWT-based service-to-service authentication in OpenResty.
The JWT library: https://github.com/SkyLothar/lua-resty-jwt
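A minimal sketch of verifying a bearer token with lua-resty-jwt in the access phase (the location, shared secret, header handling, and "backend" upstream are illustrative placeholders; check the library's README for the exact API):

location /internal-api {
    access_by_lua_block {
        local jwt = require "resty.jwt"

        -- expect "Authorization: Bearer <token>"
        local auth = ngx.var.http_authorization or ""
        local token = auth:match("Bearer%s+(.+)")
        if not token then
            return ngx.exit(ngx.HTTP_UNAUTHORIZED)
        end

        -- "my-shared-secret" stands in for the real signing key
        local jwt_obj = jwt:verify("my-shared-secret", token)
        if not jwt_obj.verified then
            ngx.log(ngx.WARN, "jwt rejected: ", jwt_obj.reason)
            return ngx.exit(ngx.HTTP_UNAUTHORIZED)
        end
    }
    proxy_pass http://backend;
}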
Install OpenResty on Mac:
(read http://nginx.org/en/docs/beginners_guide.html to understand the configuration)
brew install openresty
cd /usr/local/etc/openresty
(edit nginx.conf and the Lua files there)
openresty (to run)
proxy_pass when the host contains a variable:
NGINX needs to talk to a DNS server (configured with the resolver directive) to resolve backend servers defined in variables at run time.
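A minimal sketch (the resolver address, backend hostname, and port are placeholders):

location /dyn {
    # required because the proxy_pass target is only known at run time
    resolver 8.8.8.8 valid=30s;

    set $backend "backend.example.com";
    proxy_pass http://$backend:8080;
}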