From: Linus Torvalds
Date: Tue, 9 Jan 2024 19:18:47 +0000 (-0800)
Subject: Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel...
X-Git-Tag: v6.8-rc1~180
X-Git-Url: https://git.kernel.dk/?a=commitdiff_plain;h=fb46e22a9e3863e08aef8815df9f17d0f4b9aede;p=linux-block.git

Merge tag 'mm-stable-2024-01-08-15-31' of git://git./linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

 - Peng Zhang has done some maple tree maintenance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

 - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
   Vishal Verma has altered the interworking between memory-hotplug and
   dax/kmem so that newly added 'device memory' can more easily have its
   memmap placed within that newly added memory.

 - Matthew Wilcox continues folio-related work (including a few fixes)
   in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

 - Kefeng Wang has also contributed folio-related work in the series
   'mm: cleanup and use more folio in page fault'

 - Jim Cromie has improved the kmemleak reporting output in the series
   'tweak kmemleak report format'.

 - In the series 'stackdepot: allow evicting stack traces' Andrey
   Konovalov permits clients (in this case KASAN) to cause eviction of
   no longer needed stack traces.

 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series 'mm:
   page_alloc: fixes for high atomic reserve calculations'.

 - Dmitry Rokosov has added to the samples/ directory some sample code
   for a userspace memcg event listener application. See the series
   'samples: introduce cgroup events listeners'.

 - Some maple tree maintenance work from Liam Howlett in the series
   'maple_tree: iterator state changes'.

 - Nhat Pham has improved zswap's approach to writeback in the series
   'workload-specific and memory pressure-driven zswap writeback'.

 - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
   series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

 - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
   memcg: subtree stats flushing and thresholds'.

 - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has
   added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.

 - Matthew Wilcox has also contributed some cleanup and maintenance work
   against the buffer_head code in the series 'More buffer_head
   cleanups'.

 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY's alloc/copy/free.

 - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
   Add ksm advisor'. This is a governor which tunes KSM's scanning
   aggressiveness in response to userspace's current needs.

 - Chengming Zhou has optimized zswap's temporary working memory use in
   the series 'mm/zswap: dstmem reuse optimizations and cleanups'.
 - Matthew Wilcox has performed some maintenance work on the writeback
   code, both in the core code and within filesystems. The series is
   'Clean up the writeback paths'.

 - Andrey Konovalov has optimized KASAN's handling of alloc and free
   stack traces for secondary-level allocators, in the series 'kasan:
   save mempool stack traces'.

 - Andrey also performed some KASAN maintenance work in the series
   'kasan: assorted clean-ups'.

 - David Hildenbrand has gone to town on the rmap code. Cleanups, more
   pte batching, folio conversions and more. See the series 'mm/rmap:
   interface overhaul'.

 - Kinsey Ho has contributed some maintenance work on the MGLRU code in
   the series 'mm/mglru: Kconfig cleanup'.

 - Matthew Wilcox has contributed lruvec page accounting code cleanups
   in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
---

fb46e22a9e3863e08aef8815df9f17d0f4b9aede
diff --cc fs/buffer.c
index 5ffc44ab4854,5c29850e4781..d3bcf601d3e5
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@@ -1064,17 -1073,14 +1073,14 @@@ static bool grow_dev_folio(struct block
 	 * lock to be atomic wrt __find_get_block(), which does not
 	 * run under the folio lock.
 	 */
-	spin_lock(&inode->i_mapping->private_lock);
+	spin_lock(&inode->i_mapping->i_private_lock);
 	link_dev_buffers(folio, bh);
-	end_block = folio_init_buffers(folio, bdev,
-			(sector_t)index << sizebits, size);
+	end_block = folio_init_buffers(folio, bdev, size);
-	spin_unlock(&inode->i_mapping->private_lock);
+	spin_unlock(&inode->i_mapping->i_private_lock);
- done:
-	ret = (block < end_block) ? 1 : -ENXIO;
- failed:
+ unlock:
 	folio_unlock(folio);
 	folio_put(folio);
-	return ret;
+	return block < end_block;
 }

 /*
diff --cc include/linux/slab.h
index b2015d0e01ad,d63823e518c0..b5f5ee8308d0
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@@ -307,10 -308,19 +307,10 @@@ static inline unsigned int arch_slab_mi
  * (PAGE_SIZE*2). Larger requests are passed to the page allocator.
  */
 #define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
- #define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT)
+ #define KMALLOC_SHIFT_MAX (MAX_PAGE_ORDER + PAGE_SHIFT)
 #ifndef KMALLOC_SHIFT_LOW
-#define KMALLOC_SHIFT_LOW 5
-#endif
-#endif
-
-#ifdef CONFIG_SLUB
-#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1)
-#define KMALLOC_SHIFT_MAX (MAX_PAGE_ORDER + PAGE_SHIFT)
-#ifndef KMALLOC_SHIFT_LOW
 #define KMALLOC_SHIFT_LOW 3
 #endif
-#endif

 /* Maximum allocatable size */
 #define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_MAX)
diff --cc mm/kasan/quarantine.c
index 138c57b836f2,8afa77bc5d3b..3ba02efb952a
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@@ -143,7 -143,10 +143,9 @@@ static void *qlink_to_object(struct qli
 static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
 {
 	void *object = qlink_to_object(qlink, cache);
-	struct kasan_free_meta *meta = kasan_get_free_meta(cache, object);
+	struct kasan_free_meta *free_meta = kasan_get_free_meta(cache, object);
-	unsigned long flags;
+
+	kasan_release_object_meta(cache, object);

 	/*
 	 * If init_on_free is enabled and KASAN's free metadata is stored in
@@@ -153,15 -156,15 +155,9 @@@
 	 */
 	if (slab_want_init_on_free(cache) &&
 	    cache->kasan_info.free_meta_offset == 0)
-		memzero_explicit(meta, sizeof(*meta));
-
-	/*
-	 * As the object now gets freed from the quarantine, assume that its
-	 * free track is no longer valid.
-	 */
-	*(u8 *)kasan_mem_to_shadow(object) = KASAN_SLAB_FREE;
+		memzero_explicit(free_meta, sizeof(*free_meta));

-	if (IS_ENABLED(CONFIG_SLAB))
-		local_irq_save(flags);
-
 	___cache_free(cache, object, _THIS_IP_);
-
-	if (IS_ENABLED(CONFIG_SLAB))
-		local_irq_restore(flags);
 }

 static void qlist_free_all(struct qlist_head *q, struct kmem_cache *cache)
diff --cc mm/mempool.c
index 4759be0ff9de,cb7b4b56cec1..dbbf0e9fb424
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@@ -89,28 -97,29 +97,29 @@@ static void poison_element(mempool_t *p
 	} else if (pool->alloc == mempool_alloc_pages) {
 		/* Mempools backed by page allocator */
 		int order = (int)(long)pool->pool_data;
-		void *addr = kmap_atomic((struct page *)element);
+		void *addr = kmap_local_page((struct page *)element);

 		__poison_element(addr, 1UL << (PAGE_SHIFT + order));
-		kunmap_atomic(addr);
+		kunmap_local(addr);
 	}
 }

-#else /* CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON */
+#else /* CONFIG_SLUB_DEBUG_ON */

 static inline void check_element(mempool_t *pool, void *element)
 {
 }

 static inline void poison_element(mempool_t *pool, void *element)
 {
 }
-#endif /* CONFIG_DEBUG_SLAB || CONFIG_SLUB_DEBUG_ON */
+#endif /* CONFIG_SLUB_DEBUG_ON */

- static __always_inline void kasan_poison_element(mempool_t *pool, void *element)
+ static __always_inline bool kasan_poison_element(mempool_t *pool, void *element)
 {
 	if (pool->alloc == mempool_alloc_slab || pool->alloc == mempool_kmalloc)
-		kasan_slab_free_mempool(element);
+		return kasan_mempool_poison_object(element);
 	else if (pool->alloc == mempool_alloc_pages)
-		kasan_poison_pages(element, (unsigned long)pool->pool_data,
-				false);
+		return kasan_mempool_poison_pages(element,
+				(unsigned long)pool->pool_data);
+	return true;
 }

 static void kasan_unpoison_element(mempool_t *pool, void *element)
diff --cc mm/slub.c
index fac07382d3a6,ba162e661e2e..2ef88bbf56a3
--- a/mm/slub.c
+++ b/mm/slub.c
@@@ -3876,148 -3503,37 +3883,148 @@@ void *kmem_cache_alloc_lru(struct kmem_
 	return ret;
 }
+EXPORT_SYMBOL(kmem_cache_alloc_lru);

-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+/**
+ * kmem_cache_alloc_node - Allocate an object on the specified node
+ * @s: The cache to allocate from.
+ * @gfpflags: See kmalloc().
+ * @node: node number of the target node.
+ *
+ * Identical to kmem_cache_alloc but it will allocate memory on the given
+ * node, which can improve the performance for cpu bound structures.
+ *
+ * Fallback to other node is possible if __GFP_THISNODE is not set.
+ *
+ * Return: pointer to the new object or %NULL in case of error
+ */
+void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
 {
-	return __kmem_cache_alloc_lru(s, NULL, gfpflags);
+	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
+
+	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
+
+	return ret;
 }
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(kmem_cache_alloc_node);

-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
-		gfp_t gfpflags)
+/*
+ * To avoid unnecessary overhead, we pass through large allocation requests
+ * directly to the page allocator. We use __GFP_COMP, because we will need to
+ * know the allocation order to free the pages properly in kfree.
+ */
+static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
-	struct page *page;
-	return __kmem_cache_alloc_lru(s, lru, gfpflags);
++	struct folio *folio;
+	void *ptr = NULL;
+	unsigned int order = get_order(size);
+
+	if (unlikely(flags & GFP_SLAB_BUG_MASK))
+		flags = kmalloc_fix_flags(flags);
+
+	flags |= __GFP_COMP;
-	page = alloc_pages_node(node, flags, order);
-	if (page) {
-		ptr = page_address(page);
-		mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
++	folio = (struct folio *)alloc_pages_node(node, flags, order);
++	if (folio) {
++		ptr = folio_address(folio);
++		lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,
+				PAGE_SIZE << order);
+	}
+
+	ptr = kasan_kmalloc_large(ptr, size, flags);
+	/* As ptr might get tagged, call kmemleak hook after KASAN. */
+	kmemleak_alloc(ptr, size, 1, flags);
+	kmsan_kmalloc_large(ptr, size, flags);
+
+	return ptr;
 }

-EXPORT_SYMBOL(kmem_cache_alloc_lru);

-void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
-			int node, size_t orig_size,
-			unsigned long caller)
+void *kmalloc_large(size_t size, gfp_t flags)
 {
-	return slab_alloc_node(s, NULL, gfpflags, node,
-			caller, orig_size);
+	void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
+
+	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+		      flags, NUMA_NO_NODE);
+	return ret;
 }
+EXPORT_SYMBOL(kmalloc_large);

-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
+void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
-	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);
+	void *ret = __kmalloc_large_node(size, flags, node);

-	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, node);
+	trace_kmalloc(_RET_IP_, ret, size, PAGE_SIZE << get_order(size),
+		      flags, node);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_large_node);

+static __always_inline
+void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
+			unsigned long caller)
+{
+	struct kmem_cache *s;
+	void *ret;
+
+	if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) {
+		ret = __kmalloc_large_node(size, flags, node);
+		trace_kmalloc(caller, ret, size,
+			      PAGE_SIZE << get_order(size), flags, node);
+		return ret;
+	}
+
+	if (unlikely(!size))
+		return ZERO_SIZE_PTR;
+
+	s = kmalloc_slab(size, flags, caller);
+
+	ret = slab_alloc_node(s, NULL, flags, node, caller, size);
+	ret = kasan_kmalloc(s, ret, size, flags);
+	trace_kmalloc(caller, ret, size, s->size, flags, node);
 	return ret;
 }
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+
+void *__kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	return __do_kmalloc_node(size, flags, node, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc_node);
+
+void *__kmalloc(size_t size, gfp_t flags)
+{
+	return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
+}
+EXPORT_SYMBOL(__kmalloc);
+
+void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
+				  int node, unsigned long caller)
+{
+	return __do_kmalloc_node(size, flags, node, caller);
+}
+EXPORT_SYMBOL(__kmalloc_node_track_caller);
+
+void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+{
+	void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
+				    _RET_IP_, size);
+
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
+
+	ret = kasan_kmalloc(s, ret, size, gfpflags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_trace);
+
+void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+			 int node, size_t size)
+{
+	void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, size);
+
+	trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, node);
+
+	ret = kasan_kmalloc(s, ret, size, gfpflags);
+	return ret;
+}
+EXPORT_SYMBOL(kmalloc_node_trace);

 static noinline void free_to_partial_list(
 	struct kmem_cache *s, struct slab *slab,
@@@ -4357,52 -3839,6 +4364,52 @@@ void kmem_cache_free(struct kmem_cache
 }
 EXPORT_SYMBOL(kmem_cache_free);

+static void free_large_kmalloc(struct folio *folio, void *object)
+{
+	unsigned int order = folio_order(folio);
+
+	if (WARN_ON_ONCE(order == 0))
+		pr_warn_once("object pointer: 0x%p\n", object);
+
+	kmemleak_free(object);
+	kasan_kfree_large(object);
+	kmsan_kfree_large(object);
+
-	mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
++	lruvec_stat_mod_folio(folio, NR_SLAB_UNRECLAIMABLE_B,
+			-(PAGE_SIZE << order));
-	__free_pages(folio_page(folio, 0), order);
++	folio_put(folio);
+}
+
+/**
+ * kfree - free previously allocated memory
+ * @object: pointer returned by kmalloc() or kmem_cache_alloc()
+ *
+ * If @object is NULL, no operation is performed.
+ */
+void kfree(const void *object)
+{
+	struct folio *folio;
+	struct slab *slab;
+	struct kmem_cache *s;
+	void *x = (void *)object;
+
+	trace_kfree(_RET_IP_, object);
+
+	if (unlikely(ZERO_OR_NULL_PTR(object)))
+		return;
+
+	folio = virt_to_folio(object);
+	if (unlikely(!folio_test_slab(folio))) {
+		free_large_kmalloc(folio, (void *)object);
+		return;
+	}
+
+	slab = folio_slab(folio);
+	s = slab->slab_cache;
+	slab_free(s, slab, x, _RET_IP_);
+}
+EXPORT_SYMBOL(kfree);
+
 struct detached_freelist {
 	struct slab *slab;
 	void *tail;
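
To close, a minimal usage sketch of the kfree() path added above (not part of
the patch; the 64 KiB figure assumes 4 KiB pages, where the slab.h hunk puts
KMALLOC_MAX_CACHE_SIZE at 8 KiB):

	/*
	 * Sketch only: both pointers come back through the same kfree().
	 * The small object lives in a kmalloc slab bucket; the large one
	 * exceeds KMALLOC_MAX_CACHE_SIZE, so __kmalloc_large_node() handed
	 * it to the page allocator as a __GFP_COMP folio.
	 */
	void *small = kmalloc(64, GFP_KERNEL);
	void *large = kmalloc(64 * 1024, GFP_KERNEL);

	kfree(small);	/* folio_test_slab() true  -> slab_free() */
	kfree(large);	/* folio_test_slab() false -> free_large_kmalloc(),
			 * i.e. lruvec_stat_mod_folio() + folio_put() */

kfree(NULL) remains a no-op, per the kernel-doc comment above, so neither call
needs a NULL check even if an allocation fails.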