============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

  - Slab allocation of small objects of unknown type (kmalloc)
  - Slab allocation of small objects of known type
  - Page allocation
  - Per-CPU Allocator Activity
  - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc		call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree		call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are getting significantly
internally fragmented as a result of the allocation pattern; this internal
fragmentation shows up as the difference between bytes_alloc and
bytes_req. By correlating kmalloc with kfree, it may be possible to
identify memory leaks and the call sites that allocated the leaked memory.
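As a rough illustration of the correlation described above, the following Python sketch pairs kmalloc/kmalloc_node events with kfree by ptr and sums bytes_alloc - bytes_req per call_site. It assumes the ftrace text output format, where each event line ends in ``name: key=value ...`` as in the formats listed above. The helper names (``parse_event``, ``analyse_kmalloc``) are hypothetical and not part of any kernel tool.

```python
import re
from collections import defaultdict

# Assumption: ftrace text output, where each event line ends in
# "<event>: key=value key=value ..." as in the formats listed above.
EVENT_RE = re.compile(r"\s(?P<name>\w+):\s(?P<fields>\S+=.*)$")


def parse_event(line):
    """Return (event_name, {field: value}), or None for non-event lines."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    tokens = m.group("fields").split()
    fields = dict(kv.split("=", 1) for kv in tokens if "=" in kv)
    return m.group("name"), fields


def analyse_kmalloc(lines):
    """Pair kmalloc/kmalloc_node with kfree by ptr.

    Returns (outstanding, waste), both keyed by call_site:
      outstanding: allocations with no matching kfree in the trace
      waste:       total bytes_alloc - bytes_req (internal fragmentation)
    """
    live = {}  # ptr -> call_site of the still-live allocation
    waste = defaultdict(int)
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        name, f = event
        if name in ("kmalloc", "kmalloc_node"):
            live[f["ptr"]] = f["call_site"]
            waste[f["call_site"]] += int(f["bytes_alloc"]) - int(f["bytes_req"])
        elif name == "kfree":
            # Frees of objects allocated before tracing began are ignored.
            live.pop(f["ptr"], None)
    outstanding = defaultdict(int)
    for call_site in live.values():
        outstanding[call_site] += 1
    return dict(outstanding), dict(waste)
```

An allocation still outstanding when the trace ends is only a leak candidate: it may simply be long-lived.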


2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free		call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.
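One way to do that extrapolation is to resolve each call_site against the kernel symbol table and group by the calling function and object size. The sketch below assumes call_site is printed as a raw hexadecimal address (as in the ``%lx`` format above) and that symbol lines in ``/proc/kallsyms`` format (``address type name [module]``) are available; depending on ``kptr_restrict``, the addresses in that file may read as zero for unprivileged users. Events are passed in already split into ``(name, fields)`` pairs, and all function names are hypothetical.

```python
import bisect
from collections import Counter


def load_kallsyms(lines):
    """Build sorted address/name lists from /proc/kallsyms-style lines.

    Each line is "<hex address> <type> <name> [module]". Only text (code)
    symbols, type 't' or 'T', are kept because call_site points into code.
    """
    syms = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[1] in ("t", "T"):
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return [a for a, _ in syms], [n for _, n in syms]


def resolve(addrs, names, call_site):
    """Map a hex call_site to "symbol+0xoffset" using the nearest lower symbol."""
    addr = int(call_site, 16)
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return call_site  # below every known symbol: leave unresolved
    return f"{names[i]}+{addr - addrs[i]:#x}"


def allocations_by_site(events, addrs, names):
    """Count kmem_cache_alloc* events per (calling function, bytes_alloc).

    events: (event_name, fields) pairs with string field values.
    """
    counts = Counter()
    for name, fields in events:
        if name in ("kmem_cache_alloc", "kmem_cache_alloc_node"):
            function = resolve(addrs, names, fields["call_site"]).split("+")[0]
            counts[(function, int(fields["bytes_alloc"]))] += 1
    return counts
```

Grouping by bytes_alloc as well as by function helps, because the object size often narrows down which cache a given site allocates from.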

3. Page allocation
==================
::

  mm_page_alloc		  page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free		  page=%p pfn=%lu order=%d
  mm_page_free_batched	  page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important
because a high rate of it implies heavy activity on the zone->lock. Taking
this lock impairs performance by disabling interrupts, dirtying cache lines
between CPUs and serialising many CPUs.
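To put a number on zone->lock activity, a sketch like the following counts mm_page_alloc events (and the pages they represent, 2^order each) against mm_page_alloc_zone_locked events, split by the percpu_refill field described in the per-CPU section. It assumes events have already been parsed into ``(name, fields)`` pairs with string values; ``zone_lock_summary`` is a hypothetical helper.

```python
from collections import Counter


def zone_lock_summary(events):
    """Compare page allocations with allocations made under zone->lock.

    events: (event_name, fields) pairs with string field values.
    """
    allocs = pages = 0
    locked = Counter()
    for name, f in events:
        if name == "mm_page_alloc":
            allocs += 1
            pages += 1 << int(f["order"])  # an order-n block is 2**n pages
        elif name == "mm_page_alloc_zone_locked":
            kind = "refill" if int(f["percpu_refill"]) else "direct"
            locked[kind] += 1
    return {
        "allocs": allocs,
        "pages": pages,
        "direct": locked["direct"],
        "refill": locked["refill"],
        # Fraction of allocations that went straight to the buddy allocator.
        "direct_ratio": locked["direct"] / allocs if allocs else 0.0,
    }
```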

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in a batch, mm_page_free_batched is also triggered.
Broadly speaking, pages are taken off the LRU in bulk under the LRU lock
and freed in a batch with a page list. Significant amounts of activity here
could indicate that the system is under memory pressure and can also
indicate contention on the lruvec->lru_lock.
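The description above implies that a batched free emits both mm_page_free and mm_page_free_batched, while a direct free emits only mm_page_free. Under that assumption, the number of direct (unbatched) frees can be estimated by subtraction, as in this hypothetical helper operating on already-parsed ``(name, fields)`` pairs:

```python
def free_path_summary(events):
    """Estimate how many page frees were batched versus direct.

    Assumes a batched free emits both mm_page_free and
    mm_page_free_batched, and a direct free emits only mm_page_free,
    so direct = freed - batched.
    """
    freed = batched = 0
    for name, _fields in events:
        if name == "mm_page_free":
            freed += 1
        elif name == "mm_page_free_batched":
            batched += 1
    direct = max(freed - batched, 0)  # clamp in case events were lost
    return {
        "freed": freed,
        "batched": batched,
        "direct": direct,
        "direct_ratio": direct / freed if freed else 0.0,
    }
```

A high direct_ratio would point at callers that might benefit from batching.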

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked	page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain		page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the buddy allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
number of writes to struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the percpu_refill field indicating whether the allocation was a per-CPU
refill.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers an mm_page_pcpu_drain event.

The events are emitted per page so that pages can be tracked between
allocation and freeing. A number of drain or refill events that occur
consecutively imply that the zone->lock was taken once. Large numbers of
per-CPU refills and drains could imply an imbalance between CPUs where too
much work is being concentrated in one place. They could also indicate that
the per-CPU lists should be larger. Finally, large numbers of refills on one
CPU and drains on another could be a factor in causing large amounts of
cache line bounces due to writes between CPUs, and it is worth investigating
whether pages can be allocated and freed on the same CPU through some
algorithm change.
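The heuristic that consecutive refill or drain events correspond to one zone->lock acquisition can be sketched as follows. This is an approximation under stated assumptions: events arrive in trace order as ``(name, fields)`` pairs, a run is a sequence of same-kind events on the same CPU, and any other event ends the run. ``pcp_activity`` is a hypothetical name.

```python
from collections import Counter


def pcp_activity(events):
    """Tally per-CPU refills/drains and estimate zone->lock acquisitions.

    events: (event_name, fields) pairs in trace order. Each run of
    consecutive same-kind events on the same CPU counts as one estimated
    lock acquisition; this is a heuristic, not an exact count.
    """
    refills = Counter()  # cpu -> pages refilled into its per-CPU list
    drains = Counter()   # cpu -> pages drained from its per-CPU list
    lock_runs = 0
    prev = None  # (kind, cpu) of the previous refill/drain event
    for name, f in events:
        if name == "mm_page_alloc_zone_locked" and int(f["percpu_refill"]):
            kind, counter = "refill", refills
        elif name == "mm_page_pcpu_drain":
            kind, counter = "drain", drains
        else:
            prev = None  # any other event ends the current run
            continue
        cpu = int(f["cpu"])
        counter[cpu] += 1
        key = (kind, cpu)
        if key != prev:
            lock_runs += 1
        prev = key
    return refills, drains, lock_runs
```

Comparing the two counters per CPU shows whether refills are concentrated on some CPUs and drains on others.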

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag		page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will
succeed. For some types of hardware this matters, although high-order
allocations are avoided where possible. If the system is using huge pages
and needs to be able to resize the pool over the lifetime of the system,
this event is important.

A large number of these events implies that memory is fragmenting and that
high-order allocations will start failing at some point in the future. One
means of reducing the occurrence of this event is to increase
min_free_kbytes in increments of ``3 * pageblock_size * nr_online_nodes``,
where pageblock_size is usually the default hugepage size.
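As a worked example of the min_free_kbytes increment, the sketch below computes ``3 * pageblock_size * nr_online_nodes`` in kilobytes, deriving pageblock_size from the pageblock_order field reported by mm_page_alloc_extfrag, and counts events whose fragmenting field is set. It assumes 4 KiB base pages (pass ``page_size`` otherwise) and events already parsed into ``(name, fields)`` pairs; the function names are hypothetical.

```python
def pageblock_bytes(pageblock_order, page_size=4096):
    """Bytes in a pageblock: 2**pageblock_order base pages.

    Assumes 4 KiB base pages unless page_size is given.
    """
    return (1 << pageblock_order) * page_size


def min_free_kbytes_step(pageblock_size, nr_online_nodes):
    """Suggested min_free_kbytes increment, in KB:
    3 * pageblock_size * nr_online_nodes, with pageblock_size in bytes.
    """
    return 3 * pageblock_size * nr_online_nodes // 1024


def fragmenting_events(events):
    """Count mm_page_alloc_extfrag events with a non-zero fragmenting field."""
    return sum(
        1
        for name, f in events
        if name == "mm_page_alloc_extfrag" and int(f["fragmenting"])
    )
```

With pageblock_order=9 and 4 KiB pages, a pageblock is 2 MiB, so the increment is 6144 KB per online node.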