NTB: Add new Memory Windows API documentation
# NTB Drivers

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. Additionally, they provide special
status bits to make sure the information isn't overwritten by another peer.
Doorbell registers provide a way for peers to send interrupt events. Memory
windows allow translated read and write access to the peer memory.

## NTB Core Driver (ntb)

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers. The term "client" is used here to mean an upper layer
component making use of the NTB API. The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

## NTB Client Drivers

NTB client drivers should register with the NTB core driver. After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed. The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

### NTB Typical client driver implementation

The primary purpose of NTB is to share a piece of memory between at least two
systems. NTB device features like scratchpad and message registers are
therefore mainly used to perform the proper memory window initialization.
Typically there are two types of memory window interfaces supported by the NTB
API: inbound translation configured on the local NTB port, and outbound
translation configured by the peer on the peer NTB port. The first type is
depicted in the following figure:

Inbound translation:
 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

A typical scenario for initializing a memory window of the first type is:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) notify the peer device that initialization has been performed (for instance,
via scratchpad or message registers), 4) the peer device maps the corresponding
outbound memory window to gain access to the shared memory region.

The second type of interface, in which the shared windows are initialized by
the peer device, is depicted in the following figure:

Outbound translation:
 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario for initializing the second type of interface is:
1) allocate a memory region, 2) deliver the translated address to the peer
device, 3) the peer puts the translated address into the NTB config, 4) the
peer device maps the outbound memory window to gain access to the shared memory
region.

As one can see, the described scenarios can be combined into one portable
algorithm.
 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to the peer device
 Peer device:
  1) Initialize the memory window with the received address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window
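
The control flow of the portable algorithm can be sketched as a small userspace
model. This is not kernel code: `struct port_model` and the `model_*` functions
are hypothetical stand-ins for the local and peer translation setup, and the
"send to peer" step is reduced to passing the address as an argument. The
sketch only illustrates that both initialization attempts are made and that
either one may fail when the hardware does not support it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one NTB port; not a kernel structure. */
struct port_model {
    bool can_set_xlat;   /* this port supports programming a translation */
    bool xlat_valid;     /* a translated address has been programmed */
    uint64_t xlat_addr;  /* the programmed translated address */
};

/* Stand-in for programming a translation; fails when unsupported. */
int model_set_xlat(struct port_model *port, uint64_t addr)
{
    if (!port->can_set_xlat)
        return -EOPNOTSUPP;
    port->xlat_addr = addr;
    port->xlat_valid = true;
    return 0;
}

/*
 * Runs the combined sequence for an already allocated region at addr.
 * Returns 0 if at least one side accepted the translated address.
 */
int model_setup_window(struct port_model *local, struct port_model *peer,
                       uint64_t addr)
{
    /* Local step 2: inbound translation on the local port (may fail). */
    int local_rc = model_set_xlat(local, addr);

    /* Local step 3 is modelled by handing addr to the peer directly. */

    /* Peer step 1: outbound translation on the peer port (may fail). */
    int peer_rc = model_set_xlat(peer, addr);

    if (local_rc && peer_rc)
        return -EOPNOTSUPP;

    /* Peer step 2 (mapping the outbound window) would follow here. */
    return 0;
}
```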

In accordance with this scenario, the NTB Memory Window API can be used as
follows:
 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges that can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if local translated address setting is not supported).
  5) Send the translated base address (usually together with the memory
     window number) to the peer device using, for instance, scratchpad or
     message registers.
 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified
     memory window. It may fail if the received address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window and so gain access to the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() called on the peer with port index pidx.
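
Steps 2) and 3) of the local sequence amount to fitting a requested size into
the limits reported by the hardware. The following userspace sketch shows one
way to do that arithmetic. It assumes the restrictions consist of an address
alignment, a size alignment and a maximum size; `struct mw_limits` and the
`mw_*` helpers are hypothetical names introduced for illustration, and
overflow of very large sizes is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the restrictions retrieved in step 2). */
struct mw_limits {
    uint64_t addr_align;  /* required alignment of the base address */
    uint64_t size_align;  /* required granularity of the window size */
    uint64_t size_max;    /* largest window size that is accepted */
};

/* Round value up to a multiple of align; 0 or 1 means no constraint. */
uint64_t mw_round_up(uint64_t value, uint64_t align)
{
    if (align <= 1)
        return value;
    return (value + align - 1) / align * align;
}

/* Returns the buffer size to allocate, or 0 if the request cannot fit. */
uint64_t mw_buffer_size(const struct mw_limits *lim, uint64_t wanted)
{
    uint64_t size = mw_round_up(wanted, lim->size_align);

    if (size == 0 || size > lim->size_max)
        return 0;
    return size;
}

/* Checks that an allocated base address satisfies the alignment limit. */
bool mw_addr_ok(const struct mw_limits *lim, uint64_t addr)
{
    return lim->addr_align <= 1 || addr % lim->addr_align == 0;
}
```

With 4096-byte alignment limits, for example, a 5000-byte request is rounded
up to 8192 bytes before allocation.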

### NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev. These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data. The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data. The NTB Netdev then creates an ethernet device using a
Transport queue pair. Network data is copied between socket buffers and the
Transport queue pair buffer. The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

### NTB Ping Pong Test Client (ntb\_pingpong)

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as an example of a simple NTB
client. Ping Pong enables the link when started, waits for the NTB link to come
up, and then proceeds to read and write the doorbell and scratchpad registers
of the NTB. The peers interrupt each other using a bit mask of doorbell bits,
which is shifted by one in each round, to test the behavior of multiple
doorbell bits and interrupt vectors. The Ping Pong driver also reads the first
local scratchpad, and writes the value plus one to the first peer scratchpad,
each round before writing the peer doorbell register.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
  registers. By default, Ping Pong will not attempt to exercise such
  hardware. You may override this behavior at your own risk by setting
  unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
  interrupt event and setting the peer doorbell register for the next
  round.
* init\_db - Specify the doorbell bits to start a new series of rounds. A new
  series begins once all the doorbell bits have been shifted out of
  range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
  then to observe debugging output on the console.
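
The round-by-round behavior described above (shift the doorbell bits by one,
restart from init\_db once they leave the valid range, add one to the
scratchpad value) can be illustrated with a userspace model. This is a sketch
based only on the description in this section, not the driver's code;
`struct pp_model`, its fields, and the width of the valid doorbell mask are
assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical model of the Ping Pong state; not the driver's structure. */
struct pp_model {
    uint64_t db_valid_mask;  /* doorbell bits the modelled hardware has */
    uint64_t init_db;        /* bits that start a new series of rounds */
    uint64_t db_bits;        /* bits to ring in the current round */
    uint32_t spad;           /* value carried in the first scratchpad */
};

void pp_model_init(struct pp_model *pp, uint64_t valid_mask, uint64_t init_db)
{
    pp->db_valid_mask = valid_mask;
    pp->init_db = init_db & valid_mask;
    pp->db_bits = pp->init_db;
    pp->spad = 0;
}

/* Performs one round and returns the doorbell bits rung in that round. */
uint64_t pp_model_round(struct pp_model *pp)
{
    uint64_t rung = pp->db_bits;

    /* The scratchpad value is passed on incremented by one. */
    pp->spad += 1;

    /* Shift by one for the next round, dropping out-of-range bits. */
    pp->db_bits = (pp->db_bits << 1) & pp->db_valid_mask;

    /* All bits shifted out of range: begin a new series. */
    if (pp->db_bits == 0)
        pp->db_bits = pp->init_db;

    return rung;
}
```

With a valid mask of `0x7` and an initial value of `0x3`, successive rounds in
this model ring `0x3`, `0x6`, `0x4`, and then `0x3` again.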

### NTB Tool Test Client (ntb\_tool)

The Tool test client is intended primarily for debugging NTB hardware and
drivers. The Tool provides access through debugfs for reading, setting, and
clearing the NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/ - A directory in debugfs will be created for each
  NTB device probed by the tool. This directory is shortened to *hw*
  below.
* *hw*/db - This file is used to read, set, and clear the local doorbell. Not
  all operations may be supported by all hardware. To read the doorbell,
  read the file. To set the doorbell, write `s` followed by the bits to
  set (eg: `echo 's 0x0101' > db`). To clear the doorbell, write `c`
  followed by the bits to clear.
* *hw*/mask - This file is used to read, set, and clear the local doorbell mask.
  See *db* for details.
* *hw*/peer\_db - This file is used to read, set, and clear the peer doorbell.
  See *db* for details.
* *hw*/peer\_mask - This file is used to read, set, and clear the peer doorbell
  mask. See *db* for details.
* *hw*/spad - This file is used to read and write local scratchpads. To read
  the values of all scratchpads, read the file. To write values, write a
  series of pairs of scratchpad number and value
  (eg: `echo '4 0x123 7 0xabc' > spad`
  # to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad - This file is used to read and write peer scratchpads. See
  *spad* for details.
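
A test program can drive the doorbell files in the same way as the `echo`
examples above. The sketch below builds the `s`/`c` command string and writes
it to a caller-supplied path. The helper names are hypothetical, the trailing
newline simply mirrors what `echo` sends, and the actual path of the *hw*
directory depends on the system, so it is left as a parameter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats a doorbell command such as "s 0x101\n" into buf.
 * op must be 's' (set) or 'c' (clear). Returns the number of characters
 * written, or -1 on an invalid op or a buffer that is too small.
 */
int tool_format_db_cmd(char *buf, size_t len, char op, uint64_t bits)
{
    int n;

    if (op != 's' && op != 'c')
        return -1;

    n = snprintf(buf, len, "%c 0x%llx\n", op, (unsigned long long)bits);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Writes a prepared command to the given file, e.g. a db file under hw. */
int tool_write_cmd(const char *path, const char *cmd)
{
    FILE *f = fopen(path, "w");
    int rc;

    if (!f)
        return -1;
    rc = (fputs(cmd, f) < 0) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```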

## NTB Hardware Drivers

NTB hardware drivers should register devices with the NTB core driver. After
registering, the clients' probe and remove functions will be called.

### NTB Intel Hardware Driver (ntb\_hw\_intel)

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx - If the peer NTB is to be accessed via a memory window, then use
  this memory window to access the peer NTB. A zero or positive value
  counts from the first memory window index, and a negative value counts
  from the last memory window index. Both sides MUST set the same value
  here! The default value is `-1`.
* b2b\_mw\_share - If the peer NTB is to be accessed via a memory window, and if
  the memory window is large enough, still allow the client to use the
  second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64 - If using B2B topology on Xeon hardware, use
  this 64 bit address on the bus between the NTB devices for the window
  at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
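
The sign convention of b2b\_mw\_idx can be made concrete with a small helper.
This is a sketch of the rule as stated above, assuming that a negative value
is resolved by adding the number of memory windows (so `-1` selects the last
one); the function name and the range check are illustrative and are not taken
from the driver.

```c
/*
 * Resolves a b2b_mw_idx style value against the number of memory windows.
 * Returns the resolved index in [0, mw_count), or -1 if it is out of range.
 */
int b2b_resolve_mw_idx(int b2b_mw_idx, int mw_count)
{
    int idx = b2b_mw_idx;

    /* Negative values count back from the last memory window. */
    if (idx < 0)
        idx += mw_count;

    if (idx < 0 || idx >= mw_count)
        return -1;
    return idx;
}
```

With four memory windows, the default of `-1` resolves to index 3 in this
sketch, and `0` resolves to index 0.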
210* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_bar2\_addr64*.