==================
IP over InfiniBand
==================

  The ib_ipoib driver is an implementation of the IP over InfiniBand
  protocol as specified by RFC 4391 and RFC 4392, issued by the IETF
  ipoib working group. It is a "native" implementation in the sense of
  setting the interface type to ARPHRD_INFINIBAND and the hardware
  address length to 20 (earlier proprietary implementations
  masqueraded to the kernel as Ethernet interfaces).

Partitions and P_Keys
=====================

  When the IPoIB driver is loaded, it creates one interface for each
  port using the P_Key at index 0. To create an interface with a
  different P_Key, write the desired P_Key into the main interface's
  /sys/class/net/<intf name>/create_child file. For example::

    echo 0x8001 > /sys/class/net/ib0/create_child

  This will create an interface named ib0.8001 with P_Key 0x8001. To
  remove a subinterface, use the "delete_child" file::

    echo 0x8001 > /sys/class/net/ib0/delete_child

  The P_Key for any interface is given by the "pkey" file, and the
  main interface of a subinterface is given by the "parent" file.
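
  As a rough illustration of these two attribute files, the sketch
  below prints both for one interface. The helper name ipoib_info and
  the SYSFS_NET override are assumptions made for this example and are
  not part of the driver. The code also assumes that only subinterfaces
  expose a "parent" file.

```shell
# Hypothetical helper (not part of the driver): print an interface's
# P_Key and, for a subinterface, its parent.
# SYSFS_NET is an assumed override for the sysfs location.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

ipoib_info() {
    dev="$1"
    pkey=$(cat "$SYSFS_NET/$dev/pkey") || return 1
    # Assumption: main interfaces have no "parent" file.
    if [ -r "$SYSFS_NET/$dev/parent" ]; then
        parent=$(cat "$SYSFS_NET/$dev/parent")
    else
        parent="-"
    fi
    printf '%s pkey=%s parent=%s\n' "$dev" "$pkey" "$parent"
}

# Usage: ipoib_info ib0.8001
```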

  Child interfaces can also be created and deleted through IPoIB's
  rtnl_link_ops; children created either way behave the same.
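
  A minimal sketch of the rtnetlink route, assuming an iproute2 build
  that knows the "ipoib" link type. The wrapper names are hypothetical,
  and the child is named <parent>.<pkey> only to mirror the naming that
  the create_child file produces.

```shell
# Sketch: manage a P_Key child through rtnetlink with iproute2 (needs root).
# The function names are hypothetical, chosen for this example.
ipoib_add_child() {
    parent="$1"; pkey="$2"
    # ${pkey#0x} strips the "0x" prefix, giving e.g. ib0.8001
    ip link add link "$parent" name "$parent.${pkey#0x}" type ipoib pkey "$pkey"
}

ipoib_del_child() {
    ip link delete "$1"
}

# Usage:
#   ipoib_add_child ib0 0x8001
#   ipoib_del_child ib0.8001
```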

Datagram vs Connected modes
===========================

  The IPoIB driver supports two modes of operation: datagram and
  connected. The mode is set and read through an interface's
  /sys/class/net/<intf name>/mode file.
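
  Reading and writing the mode file might look like the sketch below.
  The helper name ipoib_mode and the SYSFS_NET override are assumptions
  for illustration. The values written are "datagram" and "connected",
  matching the two modes named above.

```shell
# Hypothetical helper: read an interface's IPoIB mode, optionally
# setting it first. SYSFS_NET is an assumed override for the sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

ipoib_mode() {
    dev="$1"
    if [ "$#" -ge 2 ]; then
        # Writing requires root; pass "datagram" or "connected".
        echo "$2" > "$SYSFS_NET/$dev/mode" || return 1
    fi
    cat "$SYSFS_NET/$dev/mode"
}

# Usage:
#   ipoib_mode ib0             # show the current mode
#   ipoib_mode ib0 connected   # switch to connected mode
```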

  In datagram mode, the IB UD (Unreliable Datagram) transport is used,
  so the interface MTU is equal to the IB L2 MTU minus the IPoIB
  encapsulation header (4 bytes). For example, in a typical IB
  fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
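
  The same arithmetic as a small sketch. The helper name ipoib_ud_mtu
  is hypothetical, and the 4096-byte case is an additional IB MTU
  chosen only for illustration.

```shell
# Sketch of the datagram-mode MTU arithmetic described above.
IPOIB_ENCAP_LEN=4   # IPoIB encapsulation header, in bytes

# ipoib_ud_mtu is a hypothetical helper taking the IB L2 MTU in bytes.
ipoib_ud_mtu() {
    echo $(( $1 - IPOIB_ENCAP_LEN ))
}

ipoib_ud_mtu 2048   # prints 2044
ipoib_ud_mtu 4096   # prints 4092
```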

  In connected mode, the IB RC (Reliable Connected) transport is used.
  Connected mode takes advantage of the connected nature of the IB
  transport and allows an MTU up to the maximal IP packet size of 64K.
  This reduces the number of IP packets needed to carry large UDP
  datagrams, TCP segments, etc., and improves performance for large
  messages.

  In connected mode, the interface's UD QP is still used for multicast
  and for communication with peers that don't support connected mode.
  In this case, RX emulation of ICMP PMTU packets is used to cause the
  networking stack to use the smaller UD MTU for these neighbours.

Stateless offloads
==================

  If the IB HW supports IPoIB stateless offloads, IPoIB advertises
  TCP/IP checksum and/or Large Send (LSO) offloading capability to the
  network stack.

  Large Receive (LRO) offloading is also implemented and may be turned
  on/off using ethtool calls. Currently LRO is supported only for
  checksum offload capable devices.

  Stateless offloads are supported only in datagram mode.
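
  The ethtool calls mentioned above could be wrapped as in the sketch
  below. The function names are hypothetical; the "-k"/"-K" options and
  the "lro" keyword are ordinary ethtool usage. Whether the setting
  takes effect depends on the device and the kernel in use.

```shell
# Sketch: inspect and toggle LRO with ethtool (changing it needs root).
ipoib_show_offloads() {
    ethtool -k "$1"
}

ipoib_set_lro() {
    dev="$1"; state="$2"    # state is "on" or "off"
    ethtool -K "$dev" lro "$state"
}

# Usage:
#   ipoib_show_offloads ib0
#   ipoib_set_lro ib0 off
```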

Interrupt moderation
====================

  If the underlying IB device supports CQ event moderation, one can
  use ethtool to set interrupt mitigation parameters and thus reduce
  the overhead incurred by handling interrupts. The main code path of
  IPoIB doesn't use events for TX completion signaling so only RX
  moderation is supported.
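
  A possible way to set RX moderation with ethtool's coalescing
  options. The wrapper name is hypothetical, and the numbers in the
  usage comment are arbitrary examples rather than tuning advice.

```shell
# Sketch: RX interrupt moderation via ethtool coalescing options (root).
ipoib_set_rx_moderation() {
    dev="$1"; usecs="$2"; frames="$3"
    ethtool -C "$dev" rx-usecs "$usecs" rx-frames "$frames"
}

# Usage:
#   ipoib_set_rx_moderation ib0 16 44
#   ethtool -c ib0        # show the current settings
```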

Debugging Information
=====================

  By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
  to 'y', tracing messages are compiled into the driver. They are
  turned on by setting the module parameters debug_level and
  mcast_debug_level to 1. These parameters can be controlled at
  runtime through files in /sys/module/ib_ipoib/.
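
  A sketch of changing these parameters at runtime. It assumes the
  usual /sys/module/<module>/parameters/ layout for module parameters.
  The helper name and the SYSFS_MODULE override are hypothetical.

```shell
# Hypothetical helper: set an ib_ipoib module parameter at runtime (root).
# Assumes parameters live under /sys/module/ib_ipoib/parameters/.
# SYSFS_MODULE is an assumed override for the sysfs location.
SYSFS_MODULE="${SYSFS_MODULE:-/sys/module}"

ipoib_set_param() {
    name="$1"; value="$2"
    echo "$value" > "$SYSFS_MODULE/ib_ipoib/parameters/$name"
}

# Usage:
#   ipoib_set_param debug_level 1
#   ipoib_set_param mcast_debug_level 1
```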

  CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
  virtual filesystem. By mounting this filesystem, for example with::

    mount -t debugfs none /sys/kernel/debug

  it is possible to get statistics about multicast groups from the
  files /sys/kernel/debug/ipoib/ib0_mcg and so on.
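
  Reading one of these files could be sketched as follows. The helper
  name and the DEBUGFS override are assumptions for this example; the
  file name follows the <intf name>_mcg pattern shown above.

```shell
# Hypothetical helper: dump the multicast group statistics for one
# interface. DEBUGFS is an assumed override for the debugfs mount point.
DEBUGFS="${DEBUGFS:-/sys/kernel/debug}"

ipoib_dump_mcg() {
    file="$DEBUGFS/ipoib/${1}_mcg"
    if [ ! -r "$file" ]; then
        # Either debugfs is not mounted or the debug option is disabled.
        echo "cannot read $file" >&2
        return 1
    fi
    cat "$file"
}

# Usage: ipoib_dump_mcg ib0
```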

  The performance impact of CONFIG_INFINIBAND_IPOIB_DEBUG is
  negligible, so it is safe to enable it with debug_level set to 0 for
  normal operation.

  CONFIG_INFINIBAND_IPOIB_DEBUG_DATA enables even more debug output in
  the data path when data_debug_level is set to 1. However, even with
  the output disabled, enabling this configuration option will affect
  performance, because it adds tests to the fast path.

References
==========

  Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
    http://ietf.org/rfc/rfc4391.txt

  IP over InfiniBand (IPoIB) Architecture (RFC 4392)
    http://ietf.org/rfc/rfc4392.txt

  IP over InfiniBand: Connected Mode (RFC 4755)
    http://ietf.org/rfc/rfc4755.txt