理解Docker容器端口映射
在” 理解Docker單機容器網絡 “一文中,還有一個Docker容器網絡的功能尚未提及,那就是Docker容器的端口映射。即將容器的服務端口P’ 綁定到宿主機的端口P上,最終達到一種效果:外部程序通過宿主機的P端口訪問,就像直接訪問Docker容器網絡內部容器提供的服務一樣。
Docker針對端口映射前后有兩種方案,一種是1.7版本之前docker-proxy+iptables DNAT 的方式;另一種則是1.7版本(及之后)提供的完全由iptables DNAT實現的端口映射。不過在目前docker 1.9.1中,前一種方式依舊是默認方式。但是從Docker 1.7版本起,Docker提供了一個配置項:–userland-proxy,以讓Docker用戶決定是否啟用docker-proxy,默認為true,即啟用docker-proxy。本文續前文,繼續探討使用端口映射時Docker容器網絡的通信流程。
本文中的實驗環境依舊保持與上文相同:docker 1.9.1,ubuntu 12.04宿主機,docker image基于官方ubuntu 14.04 image做的一些軟件安裝。
一、–userland-proxy=true(defaut)的情況下端口映射
我們首先在實驗環境下采用默認的方式進行端口映射,即–userland-proxy=true。
我們來建立一個 新container – container3(172.17.0.4),實現了0.0.0.0:12580 -> container3:12580。
$docker run -it --name container3 -p 12580:12580 dockernetworking/ubuntu:14.04 /bin/bash
這個命令執行后,iptables增加了三條rules:
filter forward鏈: Chain DOCKER (1 references) pkts bytes target prot opt in out source destination 0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.4 tcp dpt:12580 nat output鏈: Chain DOCKER (1 references) pkts bytes target prot opt in out source destination 0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:12580 to:172.17.0.4:12580 nat postrouting鏈: Chain POSTROUTING (policy ACCEPT 24 packets, 1472 bytes) pkts bytes target prot opt in out source destination 0 0 MASQUERADE tcp -- * * 172.17.0.4 172.17.0.4 tcp dpt:12580
我們可以看到了一個DNAT target,是在nat output鏈中,這個是一個關鍵點。同樣是考慮到調試的方便,在這新增的rules前面,增加LOG target,新的iptables導出內容為:
iptables.portmap.stage1.rules # Generated by iptables-save v1.4.12 on Fri Jan 15 15:31:06 2016 *raw : PREROUTING ACCEPT [5737658:60554342802] :OUTPUT ACCEPT [4294004:56674784720] -A PREROUTING -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-RawPrerouting:" --log-level 7 -A PREROUTING -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-RawPrerouting:" --log-level 7 -A OUTPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-RawOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-RawOutput:" --log-level 7 COMMIT # Completed on Fri Jan 15 15:31:06 2016 # Generated by iptables-save v1.4.12 on Fri Jan 15 15:31:06 2016 *filter :INPUT ACCEPT [4444190:53498587744] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [4292173:56674165678] : DOCKER - [0:0] :FwdId0Od0 - [0:0] :FwdId0Ond0 - [0:0] :FwdOd0 - [0:0] -A INPUT ! -i lo -p icmp -j LOG --log-prefix "[TonyBai]-EnterFilterInput:" --log-level 7 -A INPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-FilterInput:" --log-level 7 -A INPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-FilterInput:" --log-level 7 -A FORWARD -o docker0 -j DOCKER -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j FwdOd0 -A FORWARD -i docker0 ! -o docker0 -j FwdId0Ond0 -A FORWARD -i docker0 -o docker0 -j FwdId0Od0 -A OUTPUT ! -s 127.0.0.1/32 -p icmp -j LOG --log-prefix "[TonyBai]-EnterFilterOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-FilterOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-FilterOutput:" --log-level 7 -A DOCKER -d 172.17.0.4/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-PortmapFowardDocker:" --log-level 7 -A DOCKER -d 172.17.0.4/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 12580 -j ACCEPT -A FwdId0Od0 -i docker0 -o docker0 -j LOG --log-prefix "[TonyBai]-FwdId0Od0:" --log-level 7 -A FwdId0Od0 -i docker0 -o docker0 -j ACCEPT -A FwdId0Ond0 -i docker0 ! -o docker0 -j LOG --log-prefix "[TonyBai]-FwdId0Ond0:" --log-level 7 -A FwdId0Ond0 -i docker0 ! -o docker0 -j ACCEPT -A FwdOd0 -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j LOG --log-prefix "[TonyBai]-FwdOd0:" --log-level 7 -A FwdOd0 -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT COMMIT # Completed on Fri Jan 15 15:31:06 2016 # Generated by iptables-save v1.4.12 on Fri Jan 15 15:31:06 2016 *nat : PREROUTING ACCEPT [24690:5091417] :INPUT ACCEPT [10942:2271167] :OUTPUT ACCEPT [7756:523318] : POSTROUTING ACCEPT [7759:523498] : DOCKER - [0:0] :LogNatPostRouting - [0:0] -A PREROUTING -p icmp -j LOG --log-prefix "[TonyBai]-Enter iptables:" --log-level 7 -A PREROUTING -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatPrerouting:" --log-level 7 -A PREROUTING -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-NatPrerouting:" --log-level 7 -A INPUT ! -i lo -p icmp -j LOG --log-prefix "[TonyBai]-EnterNatInput:" --log-level 7 -A INPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatInput:" --log-level 7 -A INPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-NatInput:" --log-level 7 -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j LogNatPostRouting -A POSTROUTING -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-PortmapNatPostRouting:" --log-level 7 -A POSTROUTING -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j MASQUERADE -A DOCKER ! -i docker0 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-PortmapNatOutputDocker:" --log-level 7 -A DOCKER ! -i docker0 -p tcp -m tcp --dport 12580 -j DNAT --to-destination 172.17.0.4:12580 -A LogNatPostRouting -s 172.17.0.0/16 ! -o docker0 -j LOG --log-prefix "[TonyBai]-NatPostRouting:" --log-level 7 -A LogNatPostRouting -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE COMMIT # Completed on Fri Jan 15 15:31:06 2016
另外我們可以查看到宿主機中多了一個進程,這就是前面所說的docker-proxy,每增加一個端口映射,宿主機就會多出一個docker-proxy進程:
root 5742 2113 0 08:48 ? 00:00:00 docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 12580 -container-ip 172.17.0.4 -container-port 12580
1、從10.10.126.187訪問宿主機(10.10.126.101)的12580端口
10.10.126.187是與101在同一直連網路的主機,我們在其上執行telnet 10.10.126.101 12580。如果container3中有server在監聽12580,則建立連接和數據通信(發送一個hello)的過程如下。
【187到101的tcp握手sync包】
101從eth0網卡收到目的地址是自己的sync數據包:
Jan 15 16:04:54 pc-baim kernel: [28410.162828] [TonyBai]-RawPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=32617 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=5840 RES=0x00 SYN URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.162862] [TonyBai]-NatPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=32617 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=5840 RES=0x00 SYN URGP=0
由于目的地址就是自己,因此在iptables中走input chain將數據包發給user層:
Jan 15 16:04:54 pc-baim kernel: [28410.162885] [TonyBai]-FilterInput:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=32617 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=5840 RES=0x00 SYN URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.162900] [TonyBai]-NatInput:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=32617 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=5840 RES=0x00 SYN URGP=0
【101回復ack sync包】
101上的用戶層是docker-proxy在監聽12580端口,當收到sync后,會回復ack sync。由于是user空間自產包,路由后走output鏈。
Jan 15 16:04:54 pc-baim kernel: [28410.162933] [TonyBai]-RawOutput:IN= OUT=eth0 SRC=10.10.126.101 DST=10.10.126.187 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=33250 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.162948] [TonyBai]-FilterOutput:IN= OUT=eth0 SRC=10.10.126.101 DST=10.10.126.187 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=33250 WINDOW=28960 RES=0x00 ACK SYN URGP=0
【187回復ack,101與187握手完成】
187回復握手過程最后的一個ack。這個過程與sync類似:
Jan 15 16:04:54 pc-baim kernel: [28410.163397] [TonyBai]-RawPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=32618 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=92 RES=0x00 ACK URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.163437] [TonyBai]-FilterInput:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=32618 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=92 RES=0x00 ACK URGP=0
重點是接下來發生的事情:101上的docker-proxy向container3上的server程序建立tcp連接!
【host向container3發送sync】
Jan 15 16:04:54 pc-baim kernel: [28410.163863] [TonyBai]-RawOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=5768 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.163901] [TonyBai]-FilterOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=5768 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0
我們看到SYN數據包源地址用的是172.17.0.1,不知是否是docker-proxy內部有意選擇了網橋的ip。由于是user層發出的包,于是走iptables output鏈。
【container3回復ack sync】
container3回復ack sync,目的地址是172.17.0.1,host從docker0網卡收到ack sync數據,路由后發現是發給自己的包,于是走input chain.
Jan 15 16:04:54 pc-baim kernel: [28410.164000] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=43771 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.164026] [TonyBai]-FilterInput:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=43771 WINDOW=28960 RES=0x00 ACK SYN URGP=0
【host回復ack,host與container3握手完成】
host回復握手過程最后的一個ack。user空間自產數據包,于是走output chain:
Jan 15 16:04:54 pc-baim kernel: [28410.164049] [TonyBai]-RawOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=5769 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0 Jan 15 16:04:54 pc-baim kernel: [28410.164058] [TonyBai]-FilterOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=5769 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0
【187 在已經建立的連接上發送”hello”】
187發送hello to host,docker-proxy收到hello數據:
Jan 15 16:04:58 pc-baim kernel: [28413.840854] [TonyBai]-RawPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=59 TOS=0x10 PREC=0x00 TTL=64 ID=32619 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=92 RES=0x00 ACK PSH URGP=0 Jan 15 16:04:58 pc-baim kernel: [28413.840874] [TonyBai]-FilterInput:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:19:bb:5e:0a:86:08:00 SRC=10.10.126.187 DST=10.10.126.101 LEN=59 TOS=0x10 PREC=0x00 TTL=64 ID=32619 DF PROTO=TCP SPT=33250 DPT=12580 WINDOW=92 RES=0x00 ACK PSH URGP=0
【host返回 ack push】
Jan 15 16:04:58 pc-baim kernel: [28413.840893] [TonyBai]-RawOutput:IN= OUT=eth0 SRC=10.10.126.101 DST=10.10.126.187 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=22415 DF PROTO=TCP SPT=12580 DPT=33250 WINDOW=227 RES=0x00 ACK URGP=0 Jan 15 16:04:58 pc-baim kernel: [28413.840902] [TonyBai]-FilterOutput:IN= OUT=eth0 SRC=10.10.126.101 DST=10.10.126.187 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=22415 DF PROTO=TCP SPT=12580 DPT=33250 WINDOW=227 RES=0x00 ACK URGP=0
接下來,docker-proxy將hello從已有連接上轉發給container3。
【host轉發hello到container3】
Jan 15 16:04:58 pc-baim kernel: [28413.841000] [TonyBai]-RawOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=59 TOS=0x00 PREC=0x00 TTL=64 ID=5770 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=229 RES=0x00 ACK PSH URGP=0 Jan 15 16:04:58 pc-baim kernel: [28413.841026] [TonyBai]-FilterOutput:IN= OUT=docker0 SRC=172.17.0.1 DST=172.17.0.4 LEN=59 TOS=0x00 PREC=0x00 TTL=64 ID=5770 DF PROTO=TCP SPT=43771 DPT=12580 WINDOW=229 RES=0x00 ACK PSH URGP=0
【container3回復ack 】
Jan 15 16:04:58 pc-baim kernel: [28413.841101] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=61139 DF PROTO=TCP SPT=12580 DPT=43771 WINDOW=227 RES=0x00 ACK URGP=0 Jan 15 16:04:58 pc-baim kernel: [28413.841119] [TonyBai]-FilterInput:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=61139 DF PROTO=TCP SPT=12580 DPT=43771 WINDOW=227 RES=0x00 ACK URGP=0
通信過程到此結束。通過這個過程,我們至少了解到兩點:
1、docker-proxy將外部建立在host:12580上的連接上的數據轉發到container中,反之亦然,如果container 通過與host已經建立的連接向外發送數據,docker-proxy也會將數據轉發給187。2、通過iptables log輸出我們可以看到:為了port map而添加的DNAT和MASQUERADE 并沒有被匹配到,也就是說在這個過程中并沒有用到DNAT,而是完全依靠docker-proxy做的4層代理。
2、從宿主機上訪問10.10.126.101:12580
我們在宿主機本機上訪問10.10.126.101:12580,看看這個通信過程與上面的是否有差異。
【與本機12580端口建立連接,發送sync包】
由于是user層發送數據包,因此走iptables output鏈。
Jan 15 16:40:15 pc-baim kernel: [30532.594545] [TonyBai]-RawOutput:IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=53747 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0
在output鏈上,匹配到nat output上的規則:
Chain DOCKER (1 references) pkts bytes target prot opt in out source destination 1 60 LOG tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:12580 LOG flags 0 level 7 prefix "[TonyBai]-PortmapNatOutputDoc" 1 60 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:12580 to:172.17.0.4:12580
于是這里將做一個DNAT,數據包的目的地址10.10.126.101被替換為172.17.0.4。
Jan 15 16:40:15 pc-baim kernel: [30532.594561] [TonyBai]-PortmapNatOutputDoc IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=53747 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 15 16:40:15 pc-baim kernel: [30532.594572] [TonyBai]-FilterOutput:IN= OUT=lo SRC=10.10.126.101 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=53747 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0
DNAT后,將按照目的地址做一個重新路由:叫實際路由。消息實際重定向到docker0進行封包發送,sync包直接進入到container3 中。
【container3發送ack sync包】
docker0出來的ack sync 通過input chain送到user空間。這塊應該由一個自動un-DNAT,將172.17.0.4自動轉回10.10.126.101,但通過iptables日志無法確認這點。
Jan 15 16:40:15 pc-baim kernel: [30532.594615] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.126.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=48039 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 15 16:40:15 pc-baim kernel: [30532.594624] [TonyBai]-FilterInput:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.126.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=48039 WINDOW=28960 RES=0x00 ACK SYN URGP=0
【host發送ack,完成握手】
host回復ack。user層自產包,走output鏈,看rawoutput,dst依舊是126.101(telnet自然不應該知道 172.17.0.4的存在),但是filter output 前,iptables對該地址自動做了dnat,無需重新進入到nat output鏈,因為之前已經進過了。在filter output中,我們看到dst ip已經變成了container3的ip地址:
Jan 15 16:40:15 pc-baim kernel: [30532.594637] [TonyBai]-RawOutput:IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=53748 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=342 RES=0x00 ACK URGP=0 Jan 15 16:40:15 pc-baim kernel: [30532.594643] [TonyBai]-FilterOutput:IN= OUT=lo SRC=10.10.126.101 DST=172.17.0.4 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=53748 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=342 RES=0x00 ACK URGP=0
【host發送hello】
這個過程同上,不贅述。
Jan 15 16:40:18 pc-baim kernel: [30535.344921] [TonyBai]-RawOutput:IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=59 TOS=0x10 PREC=0x00 TTL=64 ID=53749 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=342 RES=0x00 ACK PSH URGP=0 Jan 15 16:40:18 pc-baim kernel: [30535.344956] [TonyBai]-FilterOutput:IN= OUT=lo SRC=10.10.126.101 DST=172.17.0.4 LEN=59 TOS=0x10 PREC=0x00 TTL=64 ID=53749 DF PROTO=TCP SPT=48039 DPT=12580 WINDOW=342 RES=0x00 ACK PSH URGP=0
【container回復ack】
不贅述。
Jan 15 16:40:18 pc-baim kernel: [30535.345027] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.126.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=43021 DF PROTO=TCP SPT=12580 DPT=48039 WINDOW=227 RES=0x00 ACK URGP=0 Jan 15 16:40:18 pc-baim kernel: [30535.345056] [TonyBai]-FilterInput:IN=docker0 OUT= PHYSIN=vethf0cc298 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.126.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=43021 DF PROTO=TCP SPT=12580 DPT=48039 WINDOW=227 RES=0x00 ACK URGP=0
從這個過程可以看到,在宿主機上訪問container的映射端口,通信流程不走docker-proxy,而是直接通過output 的dnat將數據包被直接轉給container中的server程序。
3、container to container
在container1中telnet 10.10.126.101 12580會發生什么呢?這里就不長篇大論的列log了,直接給出結論:通過docker-proxy轉發,因為不滿足nat output中DNAT的匹配條件。
二、在–userland-proxy=false的情況下
我們修改了一下/etc/default/docker配置,為DOCKER_OPTS增加一個option: –userland-proxy=false。
DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4 --userland-proxy=false"
重啟docker daemon并清理iptables規則(-F),并啟動做端口映射的container3。啟動后,你會發現之前的docker-proxy并沒有出現在啟動進程列表中,iptables的規則與–userland-proxy=true時也有所不同:
$ sudo iptables -nL -v Chain INPUT (policy ACCEPT 1645 packets, 368K bytes) pkts bytes target prot opt in out source destination Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination 0 0 DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED 0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0 Chain OUTPUT (policy ACCEPT 263 packets, 134K bytes) pkts bytes target prot opt in out source destination Chain DOCKER (1 references) pkts bytes target prot opt in out source destination 0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.4 tcp dpt:12580 $ sudo iptables -t nat -nL -v Chain PREROUTING (policy ACCEPT 209 packets, 65375 bytes) pkts bytes target prot opt in out source destination 71 49357 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL Chain INPUT (policy ACCEPT 98 packets, 39060 bytes) pkts bytes target prot opt in out source destination Chain OUTPUT (policy ACCEPT 34 packets, 2096 bytes) pkts bytes target prot opt in out source destination 21 1302 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL Chain POSTROUTING (policy ACCEPT 34 packets, 2096 bytes) pkts bytes target prot opt in out source destination 0 0 MASQUERADE all -- * docker0 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match src-type LOCAL 0 0 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0 0 0 MASQUERADE tcp -- * * 172.17.0.4 172.17.0.4 tcp dpt:12580 Chain DOCKER (2 references) pkts bytes target prot opt in out source destination 0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:12580 to:172.17.0.4:12580
可以看到nat表中prerouting鏈增加了target為DOCKER鏈的規則,并且Docker鏈中對dnat的匹配條件也放開了,只要是dst-type是LOCAL的,dport=12580的,都將ip映射為172.17.0.4。
由于iptables的規則有所變化,因此因此我的log target的匹配條件也該調整一下了,調整后的iptables為:
iptables.portmap.stage1.tmp.rules # Generated by iptables-save v1.4.12 on Mon Jan 18 09:06:06 2016 *mangle : POSTROUTING ACCEPT [0:0] -A POSTROUTING -o docker0 -m addrtype --src-type LOCAL -j LOG --log-prefix "[TonyBai]-manglepost1" --log-level 7 -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j LOG --log-prefix "[TonyBai]-manglepost2" --log-level 7 -A POSTROUTING -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-manglepost3" --log-level 7 COMMIT *raw : PREROUTING ACCEPT [1008742:377375989] :OUTPUT ACCEPT [426678:274235692] -A PREROUTING -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-RawPrerouting:" --log-level 7 -A PREROUTING -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-RawPrerouting:" --log-level 7 -A OUTPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-RawOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-RawOutput:" --log-level 7 COMMIT # Completed on Mon Jan 18 09:06:06 2016 # Generated by iptables-save v1.4.12 on Mon Jan 18 09:06:06 2016 *filter :INPUT ACCEPT [187016:64478647] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [81342:51955911] : DOCKER - [0:0] :FwdId0Od0 - [0:0] :FwdId0Ond0 - [0:0] :FwdOd0 - [0:0] -A INPUT ! -i lo -p icmp -j LOG --log-prefix "[TonyBai]-EnterFilterInput:" --log-level 7 -A INPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-FilterInput:" --log-level 7 -A INPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-FilterInput:" --log-level 7 -A FORWARD -o docker0 -j DOCKER -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j FwdOd0 -A FORWARD -i docker0 ! -o docker0 -j FwdId0Ond0 -A FORWARD -i docker0 -o docker0 -j FwdId0Od0 -A OUTPUT ! -s 127.0.0.1/32 -p icmp -j LOG --log-prefix "[TonyBai]-EnterFilterOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-FilterOutput:" --log-level 7 -A OUTPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-FilterOutput:" --log-level 7 -A DOCKER -d 172.17.0.4/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-PortmapFowardDocker" --log-level 7 -A DOCKER -d 172.17.0.4/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 12580 -j ACCEPT -A FwdId0Od0 -i docker0 -o docker0 -j LOG --log-prefix "[TonyBai]-FwdId0Od0:" --log-level 7 -A FwdId0Od0 -i docker0 -o docker0 -j ACCEPT -A FwdId0Ond0 -i docker0 ! -o docker0 -j LOG --log-prefix "[TonyBai]-FwdId0Ond0:" --log-level 7 -A FwdId0Ond0 -i docker0 ! -o docker0 -j ACCEPT -A FwdOd0 -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j LOG --log-prefix "[TonyBai]-FwdOd0:" --log-level 7 -A FwdOd0 -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT COMMIT # Completed on Mon Jan 18 09:06:06 2016 # Generated by iptables-save v1.4.12 on Mon Jan 18 09:06:06 2016 *nat : PREROUTING ACCEPT [34423:7014094] :INPUT ACCEPT [9475:1880078] :OUTPUT ACCEPT [3524:218202] : POSTROUTING ACCEPT [3508:217098] : DOCKER - [0:0] :LogNatPostRouting1 - [0:0] :LogNatPostRouting2 - [0:0] :LogNatPostRouting3 - [0:0] -A PREROUTING -p icmp -j LOG --log-prefix "[TonyBai]-Enter iptables:" --log-level 7 -A PREROUTING -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatPrerouting:" --log-level 7 -A PREROUTING -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-NatPrerouting:" --log-level 7 -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER -A INPUT ! -i lo -p icmp -j LOG --log-prefix "[TonyBai]-EnterNatInput:" --log-level 7 -A INPUT -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatInput:" --log-level 7 -A INPUT -p tcp -m tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-NatInput:" --log-level 7 -A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER -A POSTROUTING -p tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatPostrouteEnter" --log-level 7 -A POSTROUTING -p tcp --sport 12580 -j LOG --log-prefix "[TonyBai]-NatPostrouteEnter" --log-level 7 -A POSTROUTING -o docker0 -m addrtype --src-type LOCAL -j LogNatPostRouting1 -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j LogNatPostRouting2 -A POSTROUTING -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j LogNatPostRouting3 -A DOCKER -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-PortmapNatPrerouting" --log-level 7 -A DOCKER -p tcp -m tcp --dport 12580 -j DNAT --to-destination 172.17.0.4:12580 -A LogNatPostRouting1 -o docker0 -m addrtype --src-type LOCAL -j LOG --log-prefix "[TonyBai]-NatPost1" --log-level 7 -A LogNatPostRouting1 -o docker0 -m addrtype --src-type LOCAL -j MASQUERADE -A LogNatPostRouting2 -s 172.17.0.0/16 ! -o docker0 -j LOG --log-prefix "[TonyBai]-NatPost2" --log-level 7 -A LogNatPostRouting2 -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE -A LogNatPostRouting3 -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j LOG --log-prefix "[TonyBai]-NatPost3" --log-level 7 -A LogNatPostRouting3 -s 172.17.0.4/32 -d 172.17.0.4/32 -p tcp -m tcp --dport 12580 -j MASQUERADE COMMIT # Completed on Mon Jan 18 09:06:06 2016
接下來,我們按照上面的方法再做一遍實驗例子,看看通信流程有何不同。這次我們將187主機換為10.10.105.71,其他無差別。
1、 在71上telnet 10.10.126.101 12580
宿主機從eth0接口收到syn,nat prerouting中做DNAT。路由后,通過forward鏈轉發到docker0:
Jan 18 13:35:55 pc-baim kernel: [278835.389225] [TonyBai]-RawPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=63 ID=61480 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389275] [TonyBai]-NatPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=63 ID=61480 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389290] [TonyBai]-PortmapNatPreroutinIN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=63 ID=61480 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389326] [TonyBai]-PortmapFowardDockerIN=eth0 OUT=docker0 MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=62 ID=61480 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389339] [TonyBai]-NatPostrouteEnterIN= OUT=docker0 SRC=10.10.105.71 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=62 ID=61480 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0
接下來從docker0網卡收到container3的ack syn應答,在從eth0轉發出去前自動un-DNAT, src ip從172.17.0.4變為101.0126.101,但這個在日志中看不出來。
Jan 18 13:35:55 pc-baim kernel: [278835.389496] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=veth0d66af2 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.105.71 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=41502 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389519] [TonyBai]-FwdId0Ond0:IN=docker0 OUT=eth0 PHYSIN=veth0d66af2 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.105.71 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=TCP SPT=12580 DPT=41502 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.389528] [TonyBai]-manglepost2IN= OUT=eth0 PHYSIN=veth0d66af2 SRC=172.17.0.4 DST=10.10.105.71 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=TCP SPT=12580 DPT=41502 WINDOW=28960 RES=0x00 ACK SYN URGP=0
回送ack,這回無需再匹配natprerouting鏈,前面進過鏈一次,后續自動進行DNAT:
Jan 18 13:35:55 pc-baim kernel: [278835.390079] [TonyBai]-RawPrerouting:IN=eth0 OUT= MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=63 ID=61481 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0 Jan 18 13:35:55 pc-baim kernel: [278835.390149] [TonyBai]-PortmapFowardDockerIN=eth0 OUT=docker0 MAC=2c:59:e5:01:98:28:00:23:89:7d:b6:b1:08:00 SRC=10.10.105.71 DST=172.17.0.4 LEN=52 TOS=0x10 PREC=0x00 TTL=62 ID=61481 DF PROTO=TCP SPT=41502 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0
這次我們看到,在這種方式下,外部流量也是通過DNAT方式導入到container中的。
2、在宿主機上 telnet 10.10.126.101 12580
telnet發起tcp握手,syn包進入output鏈,匹配到nat output規則,做DNAT。目的ip轉換為172.17.0.4。注意繼續向下,我們看iptables匹配到了NatPost1,也就是規則:
-A LogNatPostRouting1 -o docker0 -m addrtype --src-type LOCAL -j MASQUERADE
即將源地址偽裝為出口網卡docker0的當前地址:172.0.0.1。于是實際上進入到container3的syn數據包的源地址為172.0.0.1,目的地址:172.0.0.4。
Jan 18 13:49:43 pc-baim kernel: [279663.426497] [TonyBai]-RawOutput:IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426526] [TonyBai]-PortmapNatPreroutinIN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426545] [TonyBai]-FilterOutput:IN= OUT=lo SRC=10.10.126.101 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426553] [TonyBai]-manglepost1IN= OUT=docker0 SRC=10.10.126.101 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426561] [TonyBai]-NatPostrouteEnterIN= OUT=docker0 SRC=10.10.126.101 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426567] [TonyBai]-NatPost1IN= OUT=docker0 SRC=10.10.126.101 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=40854 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=43690 RES=0x00 SYN URGP=0
container3返回ack,從宿主機角度來看,相當于從docker0網卡收到ack。我們看到進來的原始數據:dst = 172.17.0.1,這是上面MASQUERADE的作用。在進入input鏈前,做自動un-SNAT,目的地址由172.17.0.1轉換為10.10.126.101。在真正送到user層之前(output鏈等同的左邊同緯度位置),做自動un-DNAT(但在下面日志中看不出來),src由172.17.0.4變為10.10.126.101。數據包的變換總體次序依次為:即DNAT -> SNAT -> (應答包)un-SNAT -> un-DNAT。
Jan 18 13:49:43 pc-baim kernel: [279663.426646] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=veth0d66af2 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=52736 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426665] [TonyBai]-FilterInput:IN=docker0 OUT= PHYSIN=veth0d66af2 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=10.10.126.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=52736 WINDOW=28960 RES=0x00 ACK SYN URGP=0
宿主機回復ack,握手完成。由于之前走過nat output和post鏈,因此這里不會再匹配,而是自動DNAT和SNAT:
Jan 18 13:49:43 pc-baim kernel: [279663.426690] [TonyBai]-RawOutput:IN= OUT=lo SRC=10.10.126.101 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=40855 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=342 RES=0x00 ACK URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426707] [TonyBai]-FilterOutput:IN= OUT=lo SRC=10.10.126.101 DST=172.17.0.4 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=40855 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=342 RES=0x00 ACK URGP=0 Jan 18 13:49:43 pc-baim kernel: [279663.426719] [TonyBai]-manglepost1IN= OUT=docker0 SRC=10.10.126.101 DST=172.17.0.4 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=40855 DF PROTO=TCP SPT=52736 DPT=12580 WINDOW=342 RES=0x00 ACK URGP=0
3、從container1 telnet 10.10.126.101 12580
container1向服務發起tcp連接,宿主機從docker0網卡收到sync包。
Jan 18 13:51:10 pc-baim kernel: [279750.806496] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=veth44a97d7 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=31888 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:51:10 pc-baim kernel: [279750.806519] [TonyBai]-NatPrerouting:IN=docker0 OUT= PHYSIN=veth44a97d7 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=31888 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:51:10 pc-baim kernel: [279750.806531] [TonyBai]-PortmapNatPreroutinIN=docker0 OUT= PHYSIN=veth44a97d7 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=10.10.126.101 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=31888 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0
做DNAT后,再次路由到docker0,于是走forward鏈,但是沒有匹配上nat postrouting,也就沒有做SNAT:
Jan 18 13:51:10 pc-baim kernel: [279750.806581] [TonyBai]-FwdId0Od0:IN=docker0 OUT=docker0 PHYSIN=veth44a97d7 PHYSOUT=veth0d66af2 MAC=02:42:ac:11:00:04:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=31888 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0 Jan 18 13:51:10 pc-baim kernel: [279750.806608] [TonyBai]-NatPostrouteEnterIN= OUT=docker0 PHYSIN=veth44a97d7 PHYSOUT=veth0d66af2 SRC=172.17.0.2 DST=172.17.0.4 LEN=60 TOS=0x10 PREC=0x00 TTL=64 ID=31888 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=29200 RES=0x00 SYN URGP=0
container3回復ack sync。宿主機從docker0收到ack sync包,目的地址172.17.0.2,再次路由到docker0。
Jan 18 13:51:10 pc-baim kernel: [279750.806719] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=veth0d66af2 MAC=02:42:ac:11:00:02:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=54408 WINDOW=28960 RES=0x00 ACK SYN URGP=0 Jan 18 13:51:10 pc-baim kernel: [279750.806746] [TonyBai]-FwdOd0:IN=docker0 OUT=docker0 PHYSIN=veth0d66af2 PHYSOUT=veth44a97d7 MAC=02:42:ac:11:00:02:02:42:ac:11:00:04:08:00 SRC=172.17.0.4 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=12580 DPT=54408 WINDOW=28960 RES=0x00 ACK SYN URGP=0
由于之前docker0上做過DNAT,因此從docker0回到172.17.0.2時,src地址會自動un-DNAT,從172.17.0.4改為10.10.126.101,不過在上面日志中看不出這一點。
172.17.0.2回復ack,握手完成,DNAT自動進行:
Jan 18 13:51:10 pc-baim kernel: [279750.806823] [TonyBai]-RawPrerouting:IN=docker0 OUT= PHYSIN=veth44a97d7 MAC=02:42:23:39:fd:f5:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=10.10.126.101 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31889 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0 Jan 18 13:51:10 pc-baim kernel: [279750.806852] [TonyBai]-FwdOd0:IN=docker0 OUT=docker0 PHYSIN=veth44a97d7 PHYSOUT=veth0d66af2 MAC=02:42:ac:11:00:04:02:42:ac:11:00:02:08:00 SRC=172.17.0.2 DST=172.17.0.4 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=31889 DF PROTO=TCP SPT=54408 DPT=12580 WINDOW=229 RES=0x00 ACK URGP=0
三、網絡性能考量
docker-proxy常被docker使用者詬病,一是因為每個映射端口都要啟動一個docker-proxy進程,映射端口多了,大量進程被創建、被調度勢必消耗大量系統資源;二來,在高負載場合,docker-proxy的轉發性能也力不從心。理論上,docker-proxy代理轉發流量的方式在性能方面要比單純iptables DNAT要弱上一些。不過我在單機上通過 sparkyfish 測試的結果倒是二者相差不大,估計是因為我僅僅啟動了一個docker-proxy,系統負荷并不大的緣故。
? 2016,bigwhite. 版權所有.
來自: http://tonybai.com/2016/01/18/understanding-binding-docker-container-ports-to-host/