Commit | Line | Data |
---|---|---|
09bbf055 MCC |
1 | ======================================== |
2 | Symmetric Communication Interface (SCIF) | |
3 | ======================================== | |
4 | ||
7df20f2d SD |
5 | The Symmetric Communication Interface (SCIF, pronounced "skiff") is a |
6 | low-level communications API across PCIe, currently implemented for MIC. At | |
7 | present, SCIF provides inter-node communication within a single host platform, | |
8 | where a node is a MIC coprocessor or a Xeon-based host. SCIF abstracts the details of | |
9 | communicating over the PCIe bus while providing an API that is symmetric | |
10 | across all the nodes in the PCIe network. An important design objective for SCIF | |
11 | is to deliver the maximum possible performance given the communication | |
12 | abilities of the hardware. SCIF has been used to implement an offload compiler | |
13 | runtime and OFED support for MPI implementations for MIC coprocessors. | |
14 | ||
09bbf055 MCC |
15 | SCIF API Components |
16 | =================== | |
17 | ||
7df20f2d | 18 | The SCIF API has the following parts: |
09bbf055 | 19 | |
7df20f2d SD |
20 | 1. Connection establishment using a client-server model |
21 | 2. Byte stream messaging intended for short messages | |
22 | 3. Node enumeration to determine online nodes | |
23 | 4. Poll semantics for detection of incoming connections and messages | |
24 | 5. Memory registration to pin down pages | |
25 | 6. Remote memory mapping for low-latency CPU accesses via mmap | |
26 | 7. Remote DMA (RDMA) for high-bandwidth DMA transfers | |
27 | 8. Fence APIs for RDMA synchronization | |
28 | ||
29 | SCIF exposes the notion of a connection that peer processes on nodes in a | |
30 | SCIF PCIe "network" can use to share memory "windows" and to communicate. A | |
31 | process on a SCIF node initiates a SCIF connection to a peer process on a | |
32 | different node via a SCIF "endpoint". SCIF endpoints support messaging APIs | |
33 | that are similar to connection-oriented socket APIs. Connected SCIF endpoints | |
34 | can also register local memory and then transfer data using DMA, CPU | |
35 | copies, or remote memory mapping via mmap. SCIF supports both user-mode and | |
36 | kernel-mode clients, which are functionally equivalent. | |
37 | ||
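The analogy with connection-oriented sockets can be made concrete with a small sketch. The C function below does not call SCIF at all: it uses POSIX AF_UNIX stream sockets inside one process, and the comments mark which SCIF call each step loosely corresponds to. The function name `socket_analogy_roundtrip` and the `/tmp` socket path are hypothetical names chosen for this illustration; SCIF endpoints are not addressed by a filesystem path.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Socket analogy only: none of these calls are SCIF calls. Each step is
 * annotated with the SCIF call it loosely corresponds to.
 * Returns the number of bytes received into reply, or -1 on failure.
 */
int socket_analogy_roundtrip(const char *msg, char *reply, size_t cap)
{
	struct sockaddr_un addr;
	struct pollfd pfd = { .fd = -1, .events = POLLIN, .revents = 0 };
	size_t len = strlen(msg);
	ssize_t got;
	int lfd, cfd, afd = -1;
	int n = -1;

	assert(cap > len);	/* reply must hold the message plus a NUL */

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	snprintf(addr.sun_path, sizeof(addr.sun_path),
		 "/tmp/scif_analogy_%ld.sock", (long)getpid());
	unlink(addr.sun_path);	/* remove a stale socket file, if any */

	lfd = socket(AF_UNIX, SOCK_STREAM, 0);	/* ~ scif_open(), listener */
	cfd = socket(AF_UNIX, SOCK_STREAM, 0);	/* ~ scif_open(), client */
	if (lfd < 0 || cfd < 0)
		goto out;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;			/* ~ scif_bind() */
	if (listen(lfd, 1) < 0)			/* ~ scif_listen() */
		goto out;
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;			/* ~ scif_connect() */

	/* Poll the listener for the pending connection, then accept it. */
	pfd.fd = lfd;
	if (poll(&pfd, 1, 1000) != 1)
		goto out;
	afd = accept(lfd, NULL, NULL);		/* ~ scif_accept() */
	if (afd < 0)
		goto out;

	/* Short byte-stream message from the client to the accepted side. */
	if (send(cfd, msg, len, 0) != (ssize_t)len)	/* ~ scif_send() */
		goto out;
	pfd.fd = afd;
	if (poll(&pfd, 1, 1000) != 1)
		goto out;
	got = recv(afd, reply, cap - 1, 0);	/* ~ scif_recv() */
	if (got >= 0) {
		reply[got] = '\0';
		n = (int)got;
	}
out:
	if (afd >= 0)
		close(afd);			/* ~ scif_close() */
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	unlink(addr.sun_path);
	return n;
}
```

Calling `socket_analogy_roundtrip("hello", buf, sizeof(buf))` with a 32-byte `buf` runs the whole open, bind, listen, connect, accept, send and receive sequence. Memory registration, RDMA and fences have no counterpart in this sketch.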
09bbf055 MCC |
38 | SCIF Performance for MIC |
39 | ======================== | |
40 | ||
7df20f2d | 41 | A DMA bandwidth comparison between the TCP (over Ethernet over PCIe) stack and |
09bbf055 MCC |
42 | SCIF shows the performance advantages of SCIF for HPC applications and |
43 | runtimes:: | |
7df20f2d SD |
44 | |
45 |   Comparison of TCP and SCIF based BW | |
46 | ||
47 |   Throughput (GB/sec) | |
48 |     8 +                                             PCIe Bandwidth ****** | |
49 |       +                                                        TCP ###### | |
50 |     7 +    **************************************             SCIF %%%%%% | |
51 |       |                       %%%%%%%%%%%%%%%%%%% | |
52 |     6 +                   %%%% | |
53 |       |                 %% | |
54 |       |              %%% | |
55 |     5 +            %% | |
56 |       |          %% | |
57 |     4 +        %% | |
58 |       |      %% | |
59 |     3 +    %% | |
60 |       |   % | |
61 |     2 +  %% | |
62 |       | %% | |
63 |       |% | |
64 |     1 + | |
65 |       +    ###################################### | |
66 |     0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+- | |
67 |       1       10     100      1000   10000   100000 | |
68 |         Transfer Size (KBytes) | |
69 | ||
70 | SCIF allows memory sharing via mmap(..) between processes on different PCIe | |
71 | nodes and thus provides bare-metal PCIe latency. The round-trip SCIF mmap | |
72 | latency from the host to an x100 MIC for an 8-byte message is 0.44 microseconds. | |
73 | ||
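To make the mmap-based sharing model concrete, here is a minimal sketch that uses an ordinary anonymous shared mapping between a parent and a child process on one machine. It is only an analogy: no SCIF call or PCIe access is involved, so it says nothing about the latency figure above, and the function name `shared_window_analogy` is hypothetical. What it does show is the programming model, in which one side stores into a mapped page and the other side sees the data with a plain load.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Shared-memory analogy only: an anonymous MAP_SHARED mapping stands in
 * for a remote memory window. The child plays the peer and stores an
 * 8-byte message; the parent reads it back after the child has exited.
 * Returns 0 on success, -1 on failure.
 */
int shared_window_analogy(uint64_t value, uint64_t *out)
{
	uint64_t *win;
	pid_t pid;
	int status;
	int ret = -1;

	assert(out != NULL);

	/* One shared page region, visible to both processes after fork(). */
	win = mmap(NULL, sizeof(*win), PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (win == MAP_FAILED)
		return -1;
	*win = 0;

	pid = fork();
	if (pid == 0) {
		*win = value;	/* plain CPU store into the shared window */
		_exit(0);
	}

	/* Waiting for the child guarantees its store is visible here. */
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		*out = *win;	/* plain CPU load from the shared window */
		ret = 0;
	}

	munmap(win, sizeof(*win));
	return ret;
}
```

The sketch synchronizes by waiting for the child to exit, which is far coarser than anything a real application would use; it is chosen here only to keep the example short and deterministic.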
74 | SCIF has a user space library which is a thin IOCTL wrapper providing a user | |
75 | space API similar to the kernel API in scif.h. The SCIF user space library | |
76 | is distributed at https://software.intel.com/en-us/mic-developer | |
77 | ||
78 | The following pseudocode shows how two applications on two PCIe | |
09bbf055 | 79 | nodes would typically use the SCIF API:: |
7df20f2d | 80 | |
09bbf055 | 81 |   Process A (on node A)               Process B (on node B) |
7df20f2d | 82 | |
09bbf055 MCC |
83 |   /* get online node information */ |
84 |   scif_get_node_ids(..)               scif_get_node_ids(..) | |
85 |   scif_open(..)                       scif_open(..) | |
86 |   scif_bind(..)                       scif_bind(..) | |
87 |   scif_listen(..) | |
88 |   scif_accept(..)                     scif_connect(..) | |
89 |   /* SCIF connection established */ | |
7df20f2d | 90 | |
09bbf055 MCC |
91 |   /* Send and receive short messages */ |
92 |   scif_send(..)/scif_recv(..)         scif_send(..)/scif_recv(..) | |
7df20f2d | 93 | |
09bbf055 MCC |
94 |   /* Register memory */ |
95 |   scif_register(..)                   scif_register(..) | |
7df20f2d | 96 | |
09bbf055 MCC |
97 |   /* RDMA */ |
98 |   scif_readfrom(..)/scif_writeto(..)  scif_readfrom(..)/scif_writeto(..) | |
7df20f2d | 99 | |
09bbf055 MCC |
100 |   /* Fence DMAs */ |
101 |   scif_fence_signal(..)               scif_fence_signal(..) | |
7df20f2d | 102 | |
09bbf055 | 103 |   mmap(..)                            mmap(..) |
7df20f2d | 104 | |
09bbf055 | 105 |   /* Access remote registered memory */ |
7df20f2d | 106 | |
09bbf055 MCC |
107 |   /* Close the endpoints */ |
108 |   scif_close(..)                      scif_close(..) |