High-Performance TCP Stack: mTCP

Posted by jopen 11 years ago | 69K reads | Tags: mTCP, Network Technology

mTCP is a high-performance user-level TCP stack for multicore systems. mTCP is optimized throughout, from packet I/O to TCP connection management.

Besides adopting well-known techniques, our mTCP stack (1) translates expensive system calls into shared memory accesses between two threads within the same CPU core, (2) allows efficient flow-level event aggregation, and (3) performs batch processing of RX/TX packets for high I/O efficiency. On an 8-core machine, mTCP improves the performance of small-message transactions by a factor of 25 compared with the latest Linux TCP stack (kernel version 3.10.12) and by a factor of 3 compared with the best-performing research system. It also improves the performance of various popular applications by 33% (SSLShader) to 320% (lighttpd) compared with running them on the Linux stack.
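Point (1) replaces system calls with shared memory accesses between an application thread and a stack thread on the same core. A common way to build such a channel is a lock-free single-producer/single-consumer ring. The sketch below assumes that design. The names struct sock_req, struct req_ring, ring_push() and ring_pop() are hypothetical and are not mTCP's API. mTCP's actual queues may be organized differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: what an application thread would otherwise
   hand to the kernel through a system call such as write(). */
struct sock_req {
    int sockid;      /* socket identifier */
    int op;          /* illustrative codes, e.g. 0 = read, 1 = write */
    size_t len;      /* payload length in bytes */
};

#define RING_CAP 8   /* must be a power of two */

/* Single-producer/single-consumer ring shared by the application thread
   (producer) and the stack thread (consumer) on the same core. */
struct req_ring {
    struct sock_req slots[RING_CAP];
    _Atomic size_t head;   /* next slot to read; advanced only by consumer */
    _Atomic size_t tail;   /* next slot to write; advanced only by producer */
};

static void ring_init(struct req_ring *r) {
    assert((RING_CAP & (RING_CAP - 1)) == 0);
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: enqueue a request instead of entering the kernel. */
static bool ring_push(struct req_ring *r, struct sock_req req) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)
        return false;                              /* ring is full */
    r->slots[tail & (RING_CAP - 1)] = req;
    /* release: the slot contents become visible before the new tail */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer: the stack thread drains pending requests. */
static bool ring_pop(struct req_ring *r, struct sock_req *out) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                              /* ring is empty */
    *out = r->slots[head & (RING_CAP - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Because the two threads share a core and communicate only through the ring, a request costs a few memory operations rather than a user/kernel mode switch.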


Why User-Level TCP?

Many high-performance network applications spend a significant portion of their CPU cycles on TCP processing in the kernel (e.g., lighttpd spends ~80% of its cycles inside the kernel). Even worse, these CPU cycles are not used effectively: according to our measurements, Linux spends more than 4x as many cycles as mTCP to handle the same number of TCP transactions.

Can we, then, design a user-level TCP stack that incorporates all existing optimizations into a single system? Can we bring the performance of existing packet I/O libraries to the TCP stack? To answer these questions, we build a TCP stack at the user level. User-level TCP is attractive for several reasons:

Easily depart from the kernel's complexity
Directly benefit from the optimizations in the high performance packet I/O libraries
Naturally aggregate flow-level events by packet-level I/O batching
Easily preserve the existing application programming interface
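The third point, aggregating flow-level events by batching packet-level I/O, can be illustrated with a short sketch. Suppose one receive call returns a batch of packets. The stack can then walk the batch and emit at most one readiness event per flow instead of one per packet. The types struct pkt and struct event, the function aggregate_events() and the fixed flow table size are all hypothetical simplifications and are not mTCP's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FLOWS 16   /* illustrative flow table size */

/* Hypothetical received-packet descriptor. */
struct pkt {
    int flow_id;       /* index of the TCP flow, 0 <= flow_id < MAX_FLOWS */
    size_t payload;    /* payload bytes carried by this packet */
};

/* Hypothetical readiness event delivered to the application. */
struct event {
    int flow_id;
    size_t bytes;      /* total payload received for this flow in the batch */
};

/* Collapse one batch of packets into at most one event per flow, in order
   of each flow's first appearance. Returns the number of events in out[]. */
static size_t aggregate_events(const struct pkt *batch, size_t n,
                               struct event *out) {
    bool seen[MAX_FLOWS];
    size_t slot[MAX_FLOWS];   /* index in out[] assigned to each seen flow */
    size_t nev = 0;
    memset(seen, 0, sizeof seen);
    for (size_t i = 0; i < n; i++) {
        int f = batch[i].flow_id;
        assert(f >= 0 && f < MAX_FLOWS);
        if (!seen[f]) {                 /* first packet of this flow */
            seen[f] = true;
            slot[f] = nev;
            out[nev].flow_id = f;
            out[nev].bytes = 0;
            nev++;
        }
        out[slot[f]].bytes += batch[i].payload;
    }
    return nev;
}
```

For example, a batch of five packets spread over three flows produces three events. The application is woken once per flow with the combined byte count.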

Event-Driven Packet I/O Library

Several packet I/O systems allow high-speed packet I/O (~100M packets/s) from a user-level application. However, they are not suitable for implementing a transport layer because (i) they waste CPU cycles by polling NICs and (ii) they do not allow multiplexing between RX and TX.

To address these challenges, we extend the PacketShader I/O engine (PSIO) for efficient event-driven packet I/O. The new event-driven interface, ps_select(), works similarly to select() except that it operates on the TX/RX queues of the NIC ports of interest. For example, mTCP specifies the NIC interfaces of interest for RX and/or TX events along with a timeout in microseconds, and ps_select() returns immediately if any events of interest are available.
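The real signature of ps_select() is not given here, so the sketch below only mimics the described semantics using POSIX select(). Each RX or TX queue is modelled as a file descriptor, for example one end of a pipe. The call waits for readability (RX) or writability (TX) with a timeout in microseconds and returns at once if an event is already pending. The names wait_queue_events(), EV_RX and EV_TX are hypothetical. A real implementation would wait on NIC queues inside the driver rather than on file descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe()/write() can provide stand-in queues */

#define EV_RX 1   /* hypothetical flag: the RX queue has packets */
#define EV_TX 2   /* hypothetical flag: the TX queue has room */

/* Wait until a queue of interest is ready or timeout_us expires.
   Pass -1 for a queue you are not interested in.
   Returns a mask of EV_RX/EV_TX, 0 on timeout, or -1 on error. */
static int wait_queue_events(int rx_fd, int tx_fd, long timeout_us) {
    fd_set rfds, wfds;
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    if (rx_fd >= 0) FD_SET(rx_fd, &rfds);
    if (tx_fd >= 0) FD_SET(tx_fd, &wfds);
    int maxfd = rx_fd > tx_fd ? rx_fd : tx_fd;
    assert(maxfd >= 0);   /* at least one queue of interest */

    struct timeval tv = { .tv_sec = timeout_us / 1000000,
                          .tv_usec = timeout_us % 1000000 };
    /* Sleeps instead of busy-polling; returns immediately if ready. */
    int n = select(maxfd + 1, &rfds, &wfds, NULL, &tv);
    if (n < 0) return -1;
    int mask = 0;
    if (rx_fd >= 0 && FD_ISSET(rx_fd, &rfds)) mask |= EV_RX;
    if (tx_fd >= 0 && FD_ISSET(tx_fd, &wfds)) mask |= EV_TX;
    return mask;
}
```

Because one call waits on both RX and TX interest, a single thread can multiplex receive and transmit without spinning. This addresses both objections raised against polling-based packet I/O.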

Using PSIO makes it possible to amortize the overhead of various system calls and context switches throughout the system, in addition to eliminating per-packet memory allocation and DMA overhead. For more details about PSIO, please refer to the PacketShader project page.
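The following toy cost model shows why batching amortizes per-call overhead. Each I/O call pays a fixed cost, and each packet pays its own processing cost. Moving n packets in batches of up to b therefore costs ceil(n/b) fixed charges instead of n. The per_call and per_packet values are illustrative assumptions, not PSIO measurements. The helpers io_calls() and cost_per_packet() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model in arbitrary units. */
struct cost_model {
    double per_call;     /* fixed cost of one system call / context switch */
    double per_packet;   /* cost of processing one packet */
};

/* Number of I/O calls needed to move n packets in batches of up to b. */
static size_t io_calls(size_t n, size_t b) {
    assert(b > 0);
    return (n + b - 1) / b;   /* ceiling division */
}

/* Average cost per packet when each call's overhead is shared by a batch. */
static double cost_per_packet(struct cost_model m, size_t n, size_t b) {
    assert(n > 0);
    return (io_calls(n, b) * m.per_call + n * m.per_packet) / (double)n;
}
```

With per_call = 1000 and per_packet = 100, moving 1024 packets one at a time costs 1100 units per packet. With batches of 64, it costs (16 * 1000 + 1024 * 100) / 1024 = 115.625 units per packet. The fixed overhead shrinks roughly in proportion to the batch size.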