.. SPDX-License-Identifier: GPL-2.0

=============
Page Pool API
=============

The page_pool allocator is optimized for the XDP mode that uses one frame
per page, but it can fall back on the regular page allocator APIs.

Basic use involves replacing alloc_pages() calls with the
page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
in place of dev_alloc_pages().

The API keeps track of in-flight pages in order to let API users know when
it is safe to free a page_pool object. Thus, API users must call
page_pool_release_page() when a page leaves the page_pool, or call
page_pool_put_page() where appropriate, in order to maintain correct
accounting.

API users must call page_pool_put_page() exactly once per page: it will
either recycle the page or, in case of refcnt > 1, release the DMA mapping
and the in-flight state accounting.
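
As a minimal sketch of this allocate/use/return cycle (``rxq`` below is a
hypothetical driver-private structure carrying the pool pointer; only the
page_pool_*() calls are real API, and the full-page put variant described
later is used for brevity):

.. code-block:: c

    struct page *page;

    /* in place of dev_alloc_pages() */
    page = page_pool_dev_alloc_pages(rxq->page_pool);
    if (!page)
        return -ENOMEM;

    /* ... post the page to hardware, receive a frame ... */

    /* Single put: the pool recycles the page if refcnt == 1,
     * otherwise it unmaps the page and drops it from the
     * in-flight accounting.
     */
    page_pool_put_full_page(rxq->page_pool, page, false);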

Architecture overview
=====================

.. code-block:: none

    +------------------+
    |      Driver      |
    +------------------+
             ^
             |
             |
             |
             v
    +--------------------------------------------+
    |               request memory               |
    +--------------------------------------------+
        ^                                  ^
        |                                  |
        | Pool empty                       | Pool has entries
        |                                  |
        v                                  v
    +-----------------------+    +------------------------+
    | alloc (and map) pages |    |  get page from cache   |
    +-----------------------+    +------------------------+
              ^                              ^
              |                              |
              | cache available              | No entries, refill
              |                              | from ptr-ring
              |                              |
              v                              v
       +-----------------+         +------------------+
       |    Fast cache   |         |  ptr-ring cache  |
       +-----------------+         +------------------+

API interface
=============
The number of pools created **must** match the number of hardware queues,
unless hardware restrictions make that impossible. Anything else would defeat
the purpose of page_pool, which is to allocate pages quickly from a cache
without locking. This lockless guarantee naturally comes from running under a
NAPI softirq. The protection doesn't strictly have to be NAPI; any guarantee
that allocating a page will cause no race conditions is enough.

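For example, a driver running one NAPI context per hardware Rx queue would
create one pool per queue, along these lines (``priv``, ``rxq`` and the ring
sizes are hypothetical driver-private state, not part of the API):

.. code-block:: c

    int i;

    for (i = 0; i < priv->num_rx_queues; i++) {
        struct page_pool_params pp_params = { 0 };

        pp_params.pool_size = priv->rx_ring_size;
        pp_params.nid = NUMA_NO_NODE;
        pp_params.dev = priv->dev;
        pp_params.dma_dir = DMA_FROM_DEVICE;

        priv->rxq[i].page_pool = page_pool_create(&pp_params);
        if (IS_ERR(priv->rxq[i].page_pool))
            return PTR_ERR(priv->rxq[i].page_pool);
    }
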
* page_pool_create(): Create a pool.
    * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
    * order: 2^order pages on allocation
    * pool_size: size of the ptr_ring
    * nid: preferred NUMA node for allocation
    * dev: struct device. Used on DMA operations
    * dma_dir: DMA direction
    * max_len: max DMA sync memory size
    * offset: DMA address offset

* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
  driver bumps the refcnt > 1, this will unmap the page. If the page refcnt is
  1, the allocator owns the page and will try to recycle it in one of the pool
  caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
  using dma_sync_single_range_for_device().

* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA
  sync the entire memory area configured as pool->max_len.

* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but the
  caller must guarantee a safe context (e.g. NAPI), since it will recycle the
  page directly into the pool fast cache.

* page_pool_release_page(): Unmap the page (if mapped) and account for it on
  the in-flight counters.

* page_pool_dev_alloc_pages(): Get a page from the page allocator or the
  page_pool caches.

* page_pool_get_dma_addr(): Retrieve the stored DMA address.

* page_pool_get_dma_dir(): Retrieve the stored DMA direction. A usage sketch
  for both DMA helpers follows this list.

* page_pool_put_page_bulk(): Tries to refill a number of pages into the
  ptr_ring cache while holding the ptr_ring producer lock. If the ptr_ring is
  full, page_pool_put_page_bulk() will release the leftover pages to the page
  allocator. page_pool_put_page_bulk() is suitable for running inside the
  driver NAPI tx completion loop for the XDP_REDIRECT use case. Please note
  that the caller must not use the data area after running
  page_pool_put_page_bulk(), as this function overwrites it. A usage sketch
  follows this list, after the DMA helper sketch.

* page_pool_get_stats(): Retrieve statistics about the page_pool. This API
  is only available if the kernel has been configured with
  ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller-allocated ``struct
  page_pool_stats`` is passed to this API, which fills it in. The caller
  can then report those stats to the user (perhaps via ethtool, debugfs,
  etc.). See below for an example usage of this API.

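As a sketch of how the two DMA helpers above are typically combined on a
driver's Rx refill path, assuming the pool was created with PP_FLAG_DMA_MAP
but without PP_FLAG_DMA_SYNC_DEV (``rx_desc`` and ``frame_len`` are
hypothetical driver names):

.. code-block:: c

    dma_addr_t dma;
    enum dma_data_direction dma_dir;

    dma = page_pool_get_dma_addr(page);
    dma_dir = page_pool_get_dma_dir(page_pool);

    /* make the buffer visible to the device before posting it */
    dma_sync_single_for_device(dev, dma, frame_len, dma_dir);
    rx_desc->addr = dma;

And a sketch of page_pool_put_page_bulk() in an XDP_REDIRECT tx completion
loop; ``tx_desc_completed()`` and ``tx_desc_page()`` are hypothetical driver
helpers, and only the page_pool_put_page_bulk() call is real API:

.. code-block:: c

    void *data[16];
    int count = 0;

    /* gather the pages of the tx descriptors completed by hardware */
    while (tx_desc_completed(txq) && count < 16)
        data[count++] = tx_desc_page(txq);

    if (count)
        page_pool_put_page_bulk(page_pool, data, count);
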
Stats API and structures
------------------------
If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API
``page_pool_get_stats()`` and structures described below are available. It
takes a pointer to a ``struct page_pool`` and a pointer to a ``struct
page_pool_stats`` allocated by the caller.

The API will fill in the provided ``struct page_pool_stats`` with
statistics about the page_pool.

The stats structure has the following fields::

    struct page_pool_stats {
        struct page_pool_alloc_stats alloc_stats;
        struct page_pool_recycle_stats recycle_stats;
    };

The ``struct page_pool_alloc_stats`` has the following fields:
  * ``fast``: successful fast path allocations
  * ``slow``: slow path order-0 allocations
  * ``slow_high_order``: slow path high order allocations
  * ``empty``: ptr ring is empty, so a slow path allocation was forced
  * ``refill``: an allocation which triggered a refill of the cache
  * ``waive``: pages obtained from the ptr ring that cannot be added to
    the cache due to a NUMA mismatch

The ``struct page_pool_recycle_stats`` has the following fields:
  * ``cached``: recycling placed page in the page pool cache
  * ``cache_full``: page pool cache was full
  * ``ring``: page placed into the ptr ring
  * ``ring_full``: page released from page pool because the ptr ring was full
  * ``released_refcnt``: page released (and not recycled) because refcnt > 1

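For instance, a driver could fold these counters into totals before
reporting them to userspace; the helper below is a hypothetical sketch, not
part of the API (all fields are u64 counters):

.. code-block:: c

    static u64 total_alloc(const struct page_pool_stats *stats)
    {
        return stats->alloc_stats.fast +
               stats->alloc_stats.slow +
               stats->alloc_stats.slow_high_order;
    }
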
Coding examples
===============

Registration
------------

.. code-block:: c

    /* Page pool registration */
    struct page_pool_params pp_params = { 0 };
    struct xdp_rxq_info xdp_rxq;
    int err;

    pp_params.order = 0;
    /* internal DMA mapping in page_pool */
    pp_params.flags = PP_FLAG_DMA_MAP;
    pp_params.pool_size = DESC_NUM;
    pp_params.nid = NUMA_NO_NODE;
    pp_params.dev = priv->dev;
    pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
    /* page_pool_create() returns an ERR_PTR() on failure */
    page_pool = page_pool_create(&pp_params);
    if (IS_ERR(page_pool)) {
        err = PTR_ERR(page_pool);
        goto err_out;
    }

    err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
    if (err)
        goto err_out;

    err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
    if (err)
        goto err_out;

NAPI poller
-----------

.. code-block:: c

    /* NAPI Rx poller */
    enum dma_data_direction dma_dir;

    dma_dir = page_pool_get_dma_dir(dring->page_pool);
    while (done < budget) {
        if (some error)
            page_pool_recycle_direct(page_pool, page);
        if (packet_is_xdp) {
            if (xdp_verdict == XDP_DROP)
                page_pool_recycle_direct(page_pool, page);
        } else if (packet_is_skb) {
            page_pool_release_page(page_pool, page);
            new_page = page_pool_dev_alloc_pages(page_pool);
        }
    }

Stats
-----

.. code-block:: c

    #ifdef CONFIG_PAGE_POOL_STATS
    /* retrieve stats */
    struct page_pool_stats stats = { 0 };
    if (page_pool_get_stats(page_pool, &stats)) {
        /* perhaps the driver reports statistics with ethtool */
        ethtool_print_allocation_stats(&stats.alloc_stats);
        ethtool_print_recycle_stats(&stats.recycle_stats);
    }
    #endif

Driver unload
-------------

.. code-block:: c

    /* Driver unload */
    page_pool_put_full_page(page_pool, page, false);
    xdp_rxq_info_unreg(&xdp_rxq);