Saturday, December 22, 2018

Nginx/Openresty


Nginx

A deep dive into Nginx variables and directive execution order:



The lifetime of Nginx variable containers is bound to the request being processed and is independent of location.

Ways to do “internal redirection”:

rewrite
echo_exec
proxy_pass (?)

Subrequest:

A subrequest is an abstract invocation for decomposing the task of the main request into smaller "internal requests" that can be served independently by multiple different location blocks, either in series or in parallel. "Subrequests" can also be recursive: any subrequest can initiate more sub-subrequests, targeting other location blocks or even the current location itself.

Sample directives that issue subrequests:

            echo_location (independent variables)
            lua
            auth_request (shared variables)
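As a sketch of how subrequests decompose a main request across locations, assuming the third-party ngx_echo module is compiled in (the location names are hypothetical):

```nginx
location /main {
    # Each echo_location issues a subrequest served by another location block.
    echo_location /foo;
    echo_location /bar;
    echo "took $request_time sec in total.";
}

location /foo {
    echo foo;
}

location /bar {
    echo bar;
}
```

The subrequests here run in series; echo_location_async would dispatch them in parallel.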

When a variable is being created at "configure time", the creating Nginx module must make a decision on whether to allocate a value container for it and whether to attach a custom "get handler" and/or a "set handler" to it.
Those variables owning a value container are called "indexed variables" in Nginx's terminology. Otherwise, they are said to be not indexed.
Built-in variables:


Most built-in variables are read-only; they are not simple value containers. The following, however, are writable:

$args
$arg_XXX

Most built-in variables are sensitive to the subrequest context.

Built-in variables for Main Requests only:

            $request_method

Lua:
https://en.wikipedia.org/wiki/Lua_(programming_language)
The ngx_lua module embeds the Lua language interpreter (or LuaJIT's just-in-time compiler) into the Nginx core, allowing Nginx users to run their own Lua programs directly inside the server. The user can insert Lua code into different running phases of the server to fulfill different requirements. Such Lua code is either specified directly as literal strings in the Nginx configuration file, or resides in external .lua source files (or Lua binary bytecode files) whose paths are specified in the Nginx configuration.
We cannot directly write something like $arg_name in our Lua code. Instead, we reference Nginx variables in Lua through the ngx.var API provided by the ngx_lua module. For example, to reference the Nginx variable $VARIABLE in Lua, we write ngx.var.VARIABLE. When the Nginx variable $arg_name takes the special value "not found" (or "invalid"), ngx.var.arg_name evaluates to nil in the Lua world. It should also be noted that the Lua function ngx.say prints out response body contents and is functionally equivalent to the Nginx echo directive.
access_by_lua
init_by_lua
log_by_lua
content_by_lua
set_by_lua
rewrite_by_lua
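A minimal content_by_lua sketch tying the above together (the /hello location is hypothetical):

```nginx
location /hello {
    content_by_lua_block {
        -- ngx.var.arg_name is nil when the ?name= query argument is absent
        local name = ngx.var.arg_name
        ngx.say("hello, ", name or "anonymous")
    }
}
```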

Using the FFI library to call C functions from Lua code:
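A minimal sketch of the FFI, assuming LuaJIT (the ffi module is LuaJIT-specific): declare a C prototype with ffi.cdef, then call it through the ffi.C namespace.

```lua
-- LuaJIT FFI example: call the C library's getpid() directly.
local ffi = require("ffi")

ffi.cdef[[
int getpid(void);
]]

print("current pid: " .. ffi.C.getpid())
```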

Directive execution order
The Nginx configuration mini-language is “declarative”, not “procedural”.
3 major phases:
rewrite (set, set_unescape_uri, set_by_lua, rewrite; rewrite_by_lua runs at the phase tail)
access (allow, deny; access_by_lua runs at the phase tail): ACL checks such as guarding user clearance, checking user origins, examining source-IP validity, etc.
content (echo, echo_exec, proxy_pass, echo_location, content_by_lua): generate content and output the HTTP response.
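Because the configuration is declarative, directives run in phase order, not in the order they appear in the file. In this sketch (ngx_echo assumed), the rewrite-phase set runs before the content-phase echo even though it is written after it:

```nginx
location /test {
    # 'echo' is written first, but it belongs to the content phase,
    # so the rewrite-phase 'set' below still executes before it.
    echo "a = $a";
    set $a 32;
}
# GET /test prints "a = 32", not an empty value.
```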
11 Phases (phases marked “no hooks” cannot be hooked by Nginx modules):
  1.  post-read
  2.  server-rewrite   runs rewrite-phase directives placed directly under ‘server’
  3.  find-config      matches the request to a ‘location’ (no hooks)
  4.  rewrite          module hooks allowed
  5.  post-rewrite     performs internal redirects requested in the rewrite phase (no hooks)
  6.  pre-access
  7.  access           module hooks allowed
  8.  post-access      implements the ‘satisfy all’/‘satisfy any’ check (no hooks)
  9.  try-files        (no hooks)
  10. content
  11. log
Commands in each phase (“tail” marks directives that run at the end of the phase):

post-read:
    ngx_realip: set_real_ip_from (directly under ‘server’), real_ip_header
server-rewrite:
    ngx_rewrite: set, rewrite (directly under ‘server’)
rewrite:
    ngx_rewrite: set, rewrite (directly under ‘location’)
    ngx_lua: rewrite_by_lua (tail), rewrite_by_lua_file, set_by_lua, set_by_lua_file
    ngx_set_misc: set_unescape_uri
    ngx_headers_more: more_set_input_headers (tail)
pre-access:
    ngx_limit_req, ngx_limit_zone
    ngx_realip: set_real_ip_from (under ‘location’), real_ip_header
access:
    ngx_access: deny, allow
    ngx_lua: access_by_lua (tail)
    ngx_auth_request
content:
    ngx_echo: echo, echo_exec, echo_location
    ngx_proxy: proxy_pass
    ngx_fastcgi: fastcgi_pass, fastcgi_param
    ngx_index: index (for URIs ending with ‘/’)
    ngx_autoindex: autoindex
    ngx_static: (for URIs not ending with ‘/’)
    ngx_lua: content_by_lua, content_by_lua_file
output filter:
    ngx_echo: echo_before_body, echo_after_body





Order of Lua Nginx Module directives:

init_by_lua*, init_worker_by_lua*
    context: http
    initialize global settings / preload Lua modules
set_by_lua*
    context: server, server if, location, location if
    set an Nginx variable; this is blocking, so the Lua code must be very fast
rewrite_by_lua*
    context: http, server, location, location if
    rewrite-phase processing; can implement complex forwarding/redirect logic
access_by_lua*
    context: http, server, location, location if
    access-phase processing, used for access control
content_by_lua*
    context: location, location if
    content handler: receives the request and produces the response
header_filter_by_lua*
    context: http, server, location, location if
    set response headers and cookies
body_filter_by_lua*
    context: http, server, location, location if
    filter the response data, e.g. truncating or replacing content
log_by_lua*
    context: http, server, location, location if
    log-phase processing, e.g. recording hit counts or average response time
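To illustrate these contexts, a hypothetical location wiring several phase hooks together (the /api path and its logic are made up):

```nginx
location /api {
    # set an Nginx variable (blocking; keep the Lua code fast)
    set_by_lua_block $now { return ngx.time() }

    # access-phase check
    access_by_lua_block {
        if ngx.var.arg_token == nil then
            return ngx.exit(ngx.HTTP_FORBIDDEN)
        end
    }

    # content handler
    content_by_lua_block {
        ngx.say("served at ", ngx.var.now)
    }

    # log-phase bookkeeping
    log_by_lua_block {
        ngx.log(ngx.INFO, "request finished: ", ngx.var.request_uri)
    }
}
```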

Starting, stopping, and reloading configuration:
[root@sv3-dsappweb1-devr1 ~]# service_nginx status/restart/reload/…

(‘reload’ does not work for a newly defined server here, so ‘restart’ is still needed)

Nginx service script:

[root@sv3-dsappweb1-devr1 nginx]# cat /lib/systemd/system/nginx.service
[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/local/pan-openresty/bin/openresty -t -c /etc/nginx/nginx.conf
ExecStart=/usr/local/pan-openresty/bin/openresty  -c /etc/nginx/nginx.conf
ExecReload=/usr/local/pan-openresty/bin/openresty  -c /etc/nginx/nginx.conf -s reload
# ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

The master process supports the following signals:
TERM, INT   fast shutdown
QUIT        graceful shutdown
HUP         changing configuration, keeping up with a changed time zone (only for FreeBSD and Linux), starting new worker processes with a new configuration, graceful shutdown of old worker processes
USR1        re-opening log files
USR2        upgrading an executable file
WINCH       graceful shutdown of worker processes

Add a cookie header:
$ curl --cookie user=agentzh 'http://localhost:8080/test'

Send a POST request:
$ curl --data hello 'http://localhost:8080/main'

Set a request header:
$ curl -H 'X-My-IP: 1.2.3.4' localhost:8080/test

Send 100,000 keep-alive requests with ApacheBench:
$ ab -k -c1 -n100000 'http://127.0.0.1:8080/hello'


nginx Internals


As was mentioned before, the nginx codebase consists of a core and a number of modules. The core of nginx is responsible for providing the foundation of the web server, web and mail reverse proxy functionalities; it enables the use of underlying network protocols, builds the necessary run-time environment, and ensures seamless interaction between different modules. However, most of the protocol- and application-specific features are done by nginx modules, not the core.
Internally, nginx processes connections through a pipeline, or chain, of modules. In other words, for every operation there's a module which is doing the relevant work; e.g., compression, modifying content, executing server-side includes, communicating to the upstream application servers through FastCGI or uwsgi protocols, or talking to memcached.
There are a couple of nginx modules that sit somewhere between the core and the real "functional" modules. These modules are http and mail. These two modules provide an additional level of abstraction between the core and lower-level components. In these modules, the handling of the sequence of events associated with a respective application layer protocol like HTTP, SMTP or IMAP is implemented. In combination with the nginx core, these upper-level modules are responsible for maintaining the right order of calls to the respective functional modules. While the HTTP protocol is currently implemented as part of the http module, there are plans to separate it into a functional module in the future, due to the need to support other protocols like SPDY (see "SPDY: An experimental protocol for a faster web").
The functional modules can be divided into event modules, phase handlers, output filters, variable handlers, protocols, upstreams and load balancers. Most of these modules complement the HTTP functionality of nginx, though event modules and protocols are also used for mail. Event modules provide a particular OS-dependent event notification mechanism like kqueue or epoll. The event module that nginx uses depends on the operating system capabilities and build configuration. Protocol modules allow nginx to communicate through HTTPS, TLS/SSL, SMTP, POP3 and IMAP.
A typical HTTP request processing cycle looks like the following.
1.     Client sends HTTP request.
2.     nginx core chooses the appropriate phase handler based on the configured location matching the request.
3.     If configured to do so, a load balancer picks an upstream server for proxying.
4.     Phase handler does its job and passes each output buffer to the first filter.
5.     First filter passes the output to the second filter.
6.     Second filter passes the output to third (and so on).
7.     Final response is sent to the client.
nginx module invocation is extremely customizable. It is performed through a series of callbacks using pointers to the executable functions. However, the downside of this is that it may place a big burden on programmers who would like to write their own modules, because they must define exactly how and when the module should run. Both the nginx API and developers' documentation are being improved and made more available to alleviate this.
Some examples of where a module can attach are:
·       Before the configuration file is read and processed
·       For each configuration directive for the location and the server where it appears
·       When the main configuration is initialized
·       When the server (i.e., host/port) is initialized
·       When the server configuration is merged with the main configuration
·       When the location configuration is initialized or merged with its parent server configuration
·       When the master process starts or exits
·       When a new worker process starts or exits
·       When handling a request
·       When filtering the response header and the body
·       When picking, initiating and re-initiating a request to an upstream server
·       When processing the response from an upstream server
·       When finishing an interaction with an upstream server
Inside a worker, the sequence of actions leading to the run-loop where the response is generated looks like the following:
1.     Begin ngx_worker_process_cycle().
2.     Process events with OS specific mechanisms (such as epoll or kqueue).
3.     Accept events and dispatch the relevant actions.
4.     Process/proxy request header and body.
5.     Generate response content (header, body) and stream it to the client.
6.     Finalize request.
7.     Re-initialize timers and events.
The run-loop itself (steps 5 and 6) ensures incremental generation of a response and streaming it to the client.
A more detailed view of processing an HTTP request might look like this:
1.     Initialize request processing.
2.     Process header.
3.     Process body.
4.     Call the associated handler.
5.     Run through the processing phases.
Which brings us to the phases. When nginx handles an HTTP request, it passes it through a number of processing phases. At each phase there are handlers to call. In general, phase handlers process a request and produce the relevant output. Phase handlers are attached to the locations defined in the configuration file.
Phase handlers typically do four things: get the location configuration, generate an appropriate response, send the header, and send the body. A handler has one argument: a specific structure describing the request. A request structure has a lot of useful information about the client request, such as the request method, URI, and header.
When the HTTP request header is read, nginx does a lookup of the associated virtual server configuration. If the virtual server is found, the request goes through six phases:
1.     server rewrite phase
2.     location phase
3.     location rewrite phase (which can bring the request back to the previous phase)
4.     access control phase
5.     try_files phase
6.     log phase
In an attempt to generate the necessary content in response to the request, nginx passes the request to a suitable content handler. Depending on the exact location configuration, nginx may try so-called unconditional handlers first, like perl, proxy_pass, flv, mp4, etc. If the request does not match any of the above content handlers, it is picked by one of the following handlers, in this exact order: random index, index, autoindex, gzip_static, static.
Indexing module details can be found in the nginx documentation, but these are the modules which handle requests with a trailing slash. If a specialized module like mp4 or autoindex isn't appropriate, the content is considered to be just a file or directory on disk (that is, static) and is served by the static content handler. For a directory it would automatically rewrite the URI so that the trailing slash is always there (and then issue an HTTP redirect).
The content handlers' content is then passed to the filters. Filters are also attached to locations, and there can be several filters configured for a location. Filters do the task of manipulating the output produced by a handler. The order of filter execution is determined at compile time. For the out-of-the-box filters it's predefined, and for a third-party filter it can be configured at the build stage. In the existing nginx implementation, filters can only do outbound changes and there is currently no mechanism to write and attach filters to do input content transformation. Input filtering will appear in future versions of nginx.
Filters follow a particular design pattern. A filter gets called, starts working, and calls the next filter until the final filter in the chain is called. After that, nginx finalizes the response. Filters don't have to wait for the previous filter to finish. The next filter in a chain can start its own work as soon as the input from the previous one is available (functionally much like the Unix pipeline). In turn, the output response being generated can be passed to the client before the entire response from the upstream server is received.
There are header filters and body filters; nginx feeds the header and the body of the response to the associated filters separately.
A header filter consists of three basic steps:
1.     Decide whether to operate on this response.
2.     Operate on the response.
3.     Call the next filter.
Body filters transform the generated content. Examples of body filters include:
·       server-side includes
·       XSLT filtering
·       image filtering (for instance, resizing images on the fly)
·       charset modification
·       gzip compression
·       chunked encoding
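At the OpenResty level, body filtering can be sketched with the ngx_lua body_filter_by_lua directive (the location and the http://backend upstream are hypothetical; this is not part of the original chapter):

```nginx
location /filtered {
    proxy_pass http://backend;

    # Rewrite each response-body chunk as it streams through the filter chain;
    # ngx.arg[1] holds the current chunk of output data.
    body_filter_by_lua_block {
        ngx.arg[1] = string.gsub(ngx.arg[1], "http:", "https:")
    }
}
```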
After the filter chain, the response is passed to the writer. Along with the writer there are a couple of additional special purpose filters, namely the copy filter, and the postpone filter. The copy filter is responsible for filling memory buffers with the relevant response content which might be stored in a proxy temporary directory. The postpone filter is used for subrequests.
Subrequests are a very important mechanism for request/response processing. Subrequests are also one of the most powerful aspects of nginx. With subrequests nginx can return the results from a different URL than the one the client originally requested. Some web frameworks call this an internal redirect. However, nginx goes further—not only can filters perform multiple subrequests and combine the outputs into a single response, but subrequests can also be nested and hierarchical. A subrequest can perform its own sub-subrequest, and a sub-subrequest can initiate sub-sub-subrequests. Subrequests can map to files on the hard disk, other handlers, or upstream servers. Subrequests are most useful for inserting additional content based on data from the original response. For example, the SSI (server-side include) module uses a filter to parse the contents of the returned document, and then replaces include directives with the contents of specified URLs. Or, it can be an example of making a filter that treats the entire contents of a document as a URL to be retrieved, and then appends the new document to the URL itself.
Upstream and load balancers are also worth describing briefly. Upstreams are used to implement what can be identified as a content handler which is a reverse proxy (proxy_pass handler). Upstream modules mostly prepare the request to be sent to an upstream server (or "backend") and receive the response from the upstream server. There are no calls to output filters here. What an upstream module does exactly is set callbacks to be invoked when the upstream server is ready to be written to and read from. Callbacks implementing the following functionality exist:
·       Crafting a request buffer (or a chain of them) to be sent to the upstream server
·       Re-initializing/resetting the connection to the upstream server (which happens right before creating the request again)
·       Processing the first bits of an upstream response and saving pointers to the payload received from the upstream server
·       Aborting requests (which happens when the client terminates prematurely)
·       Finalizing the request when nginx finishes reading from the upstream server
·       Trimming the response body (e.g. removing a trailer)
Load balancer modules attach to the proxy_pass handler to provide the ability to choose an upstream server when more than one upstream server is eligible. A load balancer registers an enabling configuration file directive, provides additional upstream initialization functions (to resolve upstream names in DNS, etc.), initializes the connection structures, decides where to route the requests, and updates stats information. Currently nginx supports two standard disciplines for load balancing to upstream servers: round-robin and ip-hash.
Upstream and load balancing handling mechanisms include algorithms to detect failed upstream servers and to re-route new requests to the remaining ones—though a lot of additional work is planned to enhance this functionality. In general, more work on load balancers is planned, and in the next versions of nginx the mechanisms for distributing the load across different upstream servers as well as health checks will be greatly improved.
There are also a couple of other interesting modules which provide an additional set of variables for use in the configuration file. While the variables in nginx are created and updated across different modules, there are two modules that are entirely dedicated to variables: geo and map. The geo module is used to facilitate tracking of clients based on their IP addresses. This module can create arbitrary variables that depend on the client's IP address. The other module, map, allows for the creation of variables from other variables, essentially providing the ability to do flexible mappings of hostnames and other run-time variables. This kind of module may be called the variable handler.
Memory allocation mechanisms implemented inside a single nginx worker were, to some extent, inspired by Apache. A high-level description of nginx memory management would be the following: For each connection, the necessary memory buffers are dynamically allocated, linked, used for storing and manipulating the header and body of the request and the response, and then freed upon connection release. It is very important to note that nginx tries to avoid copying data in memory as much as possible and most of the data is passed along by pointer values, not by calling memcpy.
Going a bit deeper, when the response is generated by a module, the retrieved content is put in a memory buffer which is then added to a buffer chain link. Subsequent processing works with this buffer chain link as well. Buffer chains are quite complicated in nginx because there are several processing scenarios which differ depending on the module type. For instance, it can be quite tricky to manage the buffers precisely while implementing a body filter module. Such a module can only operate on one buffer (chain link) at a time and it must decide whether to overwrite the input buffer, replace the buffer with a newly allocated buffer, or insert a new buffer before or after the buffer in question. To complicate things, sometimes a module will receive several buffers so that it has an incomplete buffer chain that it must operate on. However, at this time nginx provides only a low-level API for manipulating buffer chains, so before doing any actual implementation a third-party module developer should become really fluent with this arcane part of nginx.
A note on the above approach is that there are memory buffers allocated for the entire life of a connection, thus for long-lived connections some extra memory is kept. At the same time, on an idle keepalive connection, nginx spends just 550 bytes of memory. A possible optimization for future releases of nginx would be to reuse and share memory buffers for long-lived connections.
The task of managing memory allocation is done by the nginx pool allocator. Shared memory areas are used to accept mutex, cache metadata, the SSL session cache and the information associated with bandwidth policing and management (limits). There is a slab allocator implemented in nginx to manage shared memory allocation. To allow simultaneous safe use of shared memory, a number of locking mechanisms are available (mutexes and semaphores). In order to organize complex data structures, nginx also provides a red-black tree implementation. Red-black trees are used to keep cache metadata in shared memory, track non-regex location definitions and for a couple of other tasks.
Unfortunately, all of the above was never described in a consistent and simple manner, making the job of developing third-party extensions for nginx quite complicated. Although some good documents on nginx internals exist—for instance, those produced by Evan Miller—such documents required a huge reverse engineering effort, and the implementation of nginx modules is still a black art for many.
Despite certain difficulties associated with third-party module development, the nginx user community recently saw a lot of useful third-party modules. There is, for instance, an embedded Lua interpreter module for nginx, additional modules for load balancing, full WebDAV support, advanced cache control and other interesting third-party work that the authors of this chapter encourage and will support in the future.


upstream
The Nginx HTTP Upstream module implements load balancing from client IPs to backend servers through a simple scheduling algorithm. In the configuration above, the upstream directive defines a load balancer named test.net. The name can be anything; it is simply referenced later wherever it is needed.
Load-balancing algorithms supported by upstream:
Nginx's load-balancing module currently supports six scheduling algorithms, described below; the last two are third-party.
·       Round-robin (default): each request is assigned to a different backend server in turn; if a backend goes down, the failed system is removed automatically so user traffic is unaffected. ‘weight’ sets the round-robin weight: the larger the value, the higher the probability of being selected. Mainly used when backend servers have uneven capacity.
·       ip_hash: each request is assigned by a hash of the client IP, so visitors from the same IP always reach the same backend server, which effectively solves the session-sharing problem of dynamic pages.
·       fair: a smarter algorithm than the two above. It balances load based on page size and load time, i.e. assigns requests according to backend response time, favoring the fastest backends. Stock Nginx does not support fair; it requires the third-party upstream_fair module.
·       url_hash: assigns requests by a hash of the requested URL, so each URL is directed to the same backend server, which further improves the efficiency of backend cache servers. Stock Nginx does not support url_hash; it requires the Nginx hash package.
·       least_conn: least-connections balancing; each time it picks the backend server with the fewest current connections (the connection counts are not shared: each worker keeps its own per-backend array of connection counts).
·       hash: this hash module supports two hashing modes, a plain hash and a consistent hash.
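A minimal upstream sketch using the test.net name from the notes above (the backend addresses are hypothetical):

```nginx
# Round-robin (the default) with weights.
upstream test.net {
    # ip_hash;                # uncomment to pin clients to a backend by IP
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://test.net;
    }
}
```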
Related Topic: GCP load balancer:

Two important features:

Session affinity:

Session affinity sends all requests from the same client to the same virtual machine instance as long as the instance stays healthy and has capacity.
GCP HTTP(S) Load Balancing offers two types of session affinity:
·       Client IP affinity: forwards all requests from the same client IP address to the same instance.
·       Generated cookie affinity: sets a client cookie, then sends all requests with that cookie to the same instance.

WebSocket proxy support




OpenResty is also called “ngx_openresty”. It is the parent project of lua-nginx-module: nginx bundled with lua-nginx-module as well as other popular nginx/Lua modules.

OpenResty Best Practice Book:


Major points:

Uses ‘epoll’ to monitor multiple file descriptors.
Uses mmap to share the same memory between kernel space and user space.

Do JWT based service to service authentication in openresty:

the jwt library:
https://github.com/SkyLothar/lua-resty-jwt
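A sketch of service-to-service JWT verification in the access phase using lua-resty-jwt (the location, the shared secret, and the backend upstream are hypothetical):

```nginx
location /internal-api {
    access_by_lua_block {
        local jwt = require "resty.jwt"

        -- Expect "Authorization: Bearer <token>"
        local auth = ngx.var.http_authorization
        local token = auth and auth:match("Bearer%s+(.+)")
        if not token then
            return ngx.exit(ngx.HTTP_UNAUTHORIZED)
        end

        -- "my-secret" is a placeholder; load the real key from a secure store.
        local verified = jwt:verify("my-secret", token)
        if not verified.verified then
            ngx.log(ngx.WARN, "jwt rejected: ", verified.reason)
            return ngx.exit(ngx.HTTP_UNAUTHORIZED)
        end
    }

    # 'backend_service' is a hypothetical upstream.
    proxy_pass http://backend_service;
}
```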

Install Openresty on Mac:
(read http://nginx.org/en/docs/beginners_guide.html to understand configuration)
brew install openresty
cd /usr/local/etc/openresty
(edit nginx.conf and the lua/ directory there)
openresty (to run)


proxy_pass when host contains variable


Nginx needs to talk to a DNS server at request time to resolve backend hosts that are defined in variables, so a ‘resolver’ directive must be configured.
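A sketch of this (the resolver address and backend host are hypothetical):

```nginx
location /proxy {
    # Required: with a variable in proxy_pass, the host is resolved at
    # request time through this resolver instead of once at startup.
    resolver 8.8.8.8 valid=30s;

    set $backend "backend.example.com";
    proxy_pass http://$backend:8080;
}
```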