commit 4e68c9b0763ff55eaa69d6e519f07515f1c9037b
Author: Greg Kroah-Hartman
Date:   Sun Jul 11 12:48:13 2021 +0200

    Linux 4.14.239

    Link: https://lore.kernel.org/r/20210709131627.928131764@linuxfoundation.org
    Tested-by: Jon Hunter
    Tested-by: Linux Kernel Functional Testing
    Tested-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman

commit c75310b5e17d0369fec2ab28c748fccf1c2b626f
Author: Juergen Gross
Date:   Wed Jun 23 15:09:13 2021 +0200

    xen/events: reset active flag for lateeoi events later

    commit 3de218ff39b9e3f0d453fe3154f12a174de44b25 upstream.

    In order to avoid a race condition for user events when changing
    cpu affinity reset the active flag only when EOI-ing the event.

    This is working fine as all user events are lateeoi events. Note that
    lateeoi_ack_mask_dynirq() is not modified as there is no explicit call
    to xen_irq_lateeoi() expected later.

    Cc: stable@vger.kernel.org
    Reported-by: Julien Grall
    Fixes: b6622798bc50b62 ("xen/events: avoid handling the same event on two cpus at the same time")
    Tested-by: Julien Grall
    Signed-off-by: Juergen Gross
    Reviewed-by: Boris Ostrovsky
    Link: https://lore.kernel.org/r/20210623130913.9405-1-jgross@suse.com
    Signed-off-by: Juergen Gross
    Signed-off-by: Greg Kroah-Hartman

commit 5f0185cd37347267ff06dd61cd0131b27f164ac5
Author: Petr Mladek
Date:   Thu Jun 24 18:39:48 2021 -0700

    kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()

    commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.

    The system might hang with the following backtrace:

        schedule+0x80/0x100
        schedule_timeout+0x48/0x138
        wait_for_common+0xa4/0x134
        wait_for_completion+0x1c/0x2c
        kthread_flush_work+0x114/0x1cc
        kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
        kthread_cancel_delayed_work_sync+0x18/0x2c
        xxxx_pm_notify+0xb0/0xd8
        blocking_notifier_call_chain_robust+0x80/0x194
        pm_notifier_call_chain_robust+0x28/0x4c
        suspend_prepare+0x40/0x260
        enter_state+0x80/0x3f4
        pm_suspend+0x60/0xdc
        state_store+0x108/0x144
        kobj_attr_store+0x38/0x88
        sysfs_kf_write+0x64/0xc0
        kernfs_fop_write_iter+0x108/0x1d0
        vfs_write+0x2f4/0x368
        ksys_write+0x7c/0xec

    It is caused by the following race between kthread_mod_delayed_work()
    and kthread_cancel_delayed_work_sync():

    CPU0                                CPU1

    Context: Thread A                   Context: Thread B

    kthread_mod_delayed_work()
      spin_lock()
      __kthread_cancel_work()
        spin_unlock()
        del_timer_sync()
                                        kthread_cancel_delayed_work_sync()
                                          spin_lock()
                                          __kthread_cancel_work()
                                            spin_unlock()
                                            del_timer_sync()
                                            spin_lock()

                                          work->canceling++
                                          spin_unlock
        spin_lock()
      queue_delayed_work()
        // dwork is put into the worker->delayed_work_list

      spin_unlock()

                                          kthread_flush_work()
                                            // flush_work is put at the tail of the dwork

                                            wait_for_completion()

    Context: IRQ

      kthread_delayed_work_timer_fn()
        spin_lock()
        list_del_init(&work->node);
        spin_unlock()

    BANG: flush_work is no longer linked and will never get processed.

    The problem is that kthread_mod_delayed_work() checks the
    work->canceling flag before canceling the timer.

    A simple solution is to (re)check work->canceling after
    __kthread_cancel_work(). But then it is not clear what should be
    returned when __kthread_cancel_work() removed the work from the queue
    (list) and it can't queue it again with the new @delay.

    The return value might be used for reference counting. The caller has
    to know whether a new work has been queued or an existing one was
    replaced.

    The proper solution is that kthread_mod_delayed_work() will remove the
    work from the queue (list) _only_ when work->canceling is not set. The
    flag must be checked after the timer is stopped and the remaining
    operations can be done under worker->lock.

    Note that kthread_mod_delayed_work() could remove the timer and then
    bail out. It is fine. The other canceling caller needs to cancel the
    timer as well. The important thing is that the queue (list)
    manipulation is done atomically under worker->lock.

    Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
    Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work")
    Signed-off-by: Petr Mladek
    Reported-by: Martin Liu
    Cc: Minchan Kim
    Cc: Nathan Chancellor
    Cc: Nick Desaulniers
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 5f7c8a41b8a96709c165f41cc793c8a0a6eee160
Author: Petr Mladek
Date:   Thu Jun 24 18:39:45 2021 -0700

    kthread_worker: split code for canceling the delayed work timer

    commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.

    Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
    and kthread_cancel_delayed_work_sync()".

    This patchset fixes the race between kthread_mod_delayed_work() and
    kthread_cancel_delayed_work_sync() including proper return value
    handling.

    This patch (of 2):

    Simple code refactoring as a preparation step for fixing a race between
    kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().

    It does not modify the existing behavior.
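
The ordering that this two-patch series relies on (stop the timer first, then check work->canceling, and only then touch the queue) can be sketched as a single-threaded userspace model. This is an illustration only, not the kernel code: the toy_* names, the struct layout, and the choice to return true when another canceler is active are assumptions of the sketch, and real locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a delayed kthread work item; not the kernel type. */
struct toy_dwork {
	bool timer_pending;	/* delayed-work timer is armed */
	bool queued;		/* work is linked into a worker list */
	int canceling;		/* nonzero while a cancel_sync caller is active */
	unsigned long delay;	/* delay the timer was last armed with */
};

/* Stop the timer only; the list entry is left alone. */
static inline void toy_cancel_timer(struct toy_dwork *w)
{
	w->timer_pending = false;
}

/* Unlink the work if it is linked; report whether something was removed. */
static inline bool toy_cancel_work(struct toy_dwork *w)
{
	bool was_queued = w->queued;

	w->queued = false;
	return was_queued;
}

/*
 * Modelled modify operation: returns true when an existing work was
 * already pending, false when a new work was queued.
 */
static inline bool toy_mod_delayed_work(struct toy_dwork *w, unsigned long delay)
{
	bool ret;

	assert(w != NULL);

	/* Step 1: the timer must be stopped before the flag is checked. */
	toy_cancel_timer(w);

	/* Step 2: never fight with another canceler over the list entry. */
	if (w->canceling)
		return true;	/* assumption: number of queued works unchanged */

	/* Step 3: safe to unlink and queue again with the new delay. */
	ret = toy_cancel_work(w);
	w->queued = true;
	w->timer_pending = true;
	w->delay = delay;
	return ret;
}
```

With canceling set, the model stops the timer and bails out without unlinking or re-queuing anything, which is the "remove the timer and then bail out" case the commit message calls fine.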

    Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
    Signed-off-by: Petr Mladek
    Cc: Martin Liu
    Cc: Minchan Kim
    Cc: Nathan Chancellor
    Cc: Nick Desaulniers
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit d63af6c931f73b4597e37816ed4e77ee690ada82
Author: Sean Young
Date:   Sun Oct 8 12:12:16 2017 -0400

    kfifo: DECLARE_KIFO_PTR(fifo, u64) does not work on arm 32 bit

    commit 8a866fee3909c49738e1c4429a8d2b9bf27e015d upstream.

    If you try to store u64 in a kfifo (or a struct with u64 members),
    then the buf member of __STRUCT_KFIFO_PTR will cause 4 bytes
    padding due to alignment (note that struct __kfifo is 20 bytes
    on 32 bit).

    That in turn causes the __is_kfifo_ptr() to fail, which is caught
    by kfifo_alloc(), which now returns EINVAL.

    So, ensure that __is_kfifo_ptr() compares to the right structure.

    Signed-off-by: Sean Young
    Acked-by: Stefani Seibold
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Matthew Weber
    Signed-off-by: Greg Kroah-Hartman

commit fb2479ddfb0b1eae0d60868aa444aace4ce6287c
Author: Christian König
Date:   Fri Jun 11 14:34:50 2021 +0200

    drm/nouveau: fix dma_address check for CPU/GPU sync

    [ Upstream commit d330099115597bbc238d6758a4930e72b49ea9ba ]

    AGP for example doesn't have a dma_address array.

    Signed-off-by: Christian König
    Acked-by: Alex Deucher
    Link: https://patchwork.freedesktop.org/patch/msgid/20210614110517.1624-1-christian.koenig@amd.com
    Signed-off-by: Sasha Levin

commit 4164c07e5062cba5e6666df81d7fce1e4f54c317
Author: ManYi Li
Date:   Fri Jun 11 17:44:02 2021 +0800

    scsi: sr: Return appropriate error code when disk is ejected

    [ Upstream commit 7dd753ca59d6c8cc09aa1ed24f7657524803c7f3 ]

    Handle a reported media event code of 3. This indicates that the media
    has been removed from the drive and user intervention is required to
    proceed. Return DISK_EVENT_EJECT_REQUEST in that case.
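
As a rough illustration of the sr change, the mapping from a reported media event code to a disk event flag might look like the sketch below. Only the handling of code 3 comes from the commit message; the toy_* names, the flag values, and the meaning given to codes 1 and 2 are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's DISK_EVENT_* bit flags. */
enum toy_disk_event {
	TOY_DISK_EVENT_NONE          = 0,
	TOY_DISK_EVENT_MEDIA_CHANGE  = 1 << 0,
	TOY_DISK_EVENT_EJECT_REQUEST = 1 << 1,
};

/* Translate a reported media event code into an event flag. */
static inline unsigned int toy_media_event_to_disk_event(unsigned int code)
{
	switch (code) {
	case 1:		/* assumed: eject requested */
		return TOY_DISK_EVENT_EJECT_REQUEST;
	case 2:		/* assumed: new media arrived */
		return TOY_DISK_EVENT_MEDIA_CHANGE;
	case 3:		/* from the commit: media removed, user must intervene */
		return TOY_DISK_EVENT_EJECT_REQUEST;
	default:	/* anything else is reported as no event */
		return TOY_DISK_EVENT_NONE;
	}
}

/* Small self-check showing the intended use. */
static inline void toy_sr_demo(void)
{
	assert(toy_media_event_to_disk_event(3) == TOY_DISK_EVENT_EJECT_REQUEST);
	assert(toy_media_event_to_disk_event(0) == TOY_DISK_EVENT_NONE);
}
```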

    Link: https://lore.kernel.org/r/20210611094402.23884-1-limanyi@uniontech.com
    Signed-off-by: ManYi Li
    Signed-off-by: Martin K. Petersen
    Signed-off-by: Sasha Levin

commit c5bb56066fac7d7fdd51f3e8127a9704386ba694
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:52 2021 -0700

    mm, futex: fix shared futex pgoff on shmem huge page

    [ Upstream commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 ]

    If more than one futex is placed on a shmem huge page, it can happen
    that waking the second wakes the first instead, and leaves the second
    waiting: the key's shared.pgoff is wrong.

    When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when
    generating futex_key") went in, the only shared huge pages came from
    hugetlbfs, and the code added to deal with its exceptional page->index
    was put into hugetlb source. Then that was missed when 4.8 added shmem
    huge pages.

    page_to_pgoff() is what others use for this nowadays: except that, as
    currently written, it gives the right answer on hugetlbfs head, but
    nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific
    hugetlb_basepage_index() on PageHuge tails as well as on head.

    Yes, it's unconventional to declare hugetlb_basepage_index() there in
    pagemap.h, rather than in hugetlb.h; but I do not expect anything but
    page_to_pgoff() ever to need it.

    [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]

    Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
    Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
    Reported-by: Neel Natu
    Signed-off-by: Hugh Dickins
    Reviewed-by: Matthew Wilcox (Oracle)
    Acked-by: Thomas Gleixner
    Cc: "Kirill A. Shutemov"
    Cc: Zhang Yi
    Cc: Mel Gorman
    Cc: Mike Kravetz
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Note on stable backport: leave redundant #include in kernel/futex.c,
    to avoid conflict over the header files included.
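
Why a wrong shared.pgoff makes one futex wake another can be shown with a small userspace model of a huge page as an array of subpages. The struct and both helpers are hypothetical: toy_pgoff_head_only() is a deliberately broken variant used only to show two keys colliding, and it is not claimed to be the exact pre-fix calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy subpage of a compound (huge) page; not the kernel's struct page. */
struct toy_page {
	unsigned long index;	/* file offset in pages, kept on the head only */
	struct toy_page *head;	/* compound head (the head points to itself) */
};

/* Broken variant: every subpage reports the head's offset. */
static inline unsigned long toy_pgoff_head_only(const struct toy_page *page)
{
	assert(page != NULL && page->head != NULL);
	return page->head->index;
}

/* Per-subpage variant: head offset plus distance from the head. */
static inline unsigned long toy_pgoff(const struct toy_page *page)
{
	assert(page != NULL && page->head != NULL);
	return page->head->index + (unsigned long)(page - page->head);
}
```

Two futexes at the same in-page offset of different subpages get identical keys under the broken variant, so a wake aimed at one can land on the other; the per-subpage variant keeps the keys distinct.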
    Signed-off-by: Hugh Dickins
    Signed-off-by: Sasha Levin

commit fc308458ef456f488d4de30d27eefa0904835d53
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:30 2021 -0700

    mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()

    [ Upstream commit a7a69d8ba88d8dcee7ef00e91d413a4bd003a814 ]

    Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
    ptlock in the PVMW_SYNC case? That too might have been responsible for
    BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
    I've never seen any.

    Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com
    Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Tested-by: Wang Yugui
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 3a5f1cdac2f698a9c708429aa23c3bccd5c1ccee
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:26 2021 -0700

    mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes

    [ Upstream commit a9a7504d9beaf395481faa91e70e2fd08f7a3dde ]

    Running certain tests with a DEBUG_VM kernel would crash within hours,
    on the total_mapcount BUG() in split_huge_page_to_list(), while trying
    to free up some memory by punching a hole in a shmem huge page: split's
    try_to_unmap() was unable to find all the mappings of the page (which,
    on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

    Crash dumps showed two tail pages of a shmem huge page remained mapped
    by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
    of a long unmapped range; and no page table had yet been allocated for
    the head of the huge page to be mapped into.

    Although designed to handle these odd misaligned huge-page-mapped-by-pte
    cases, page_vma_mapped_walk() falls short by returning false prematurely
    when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
    are cases when a huge page may span the boundary, with ptes present in
    the next.

    Restructure page_vma_mapped_walk() as a loop to continue in these cases,
    while keeping its layout much as before. Add a step_forward() helper to
    advance pvmw->address across those boundaries: originally I tried to use
    mm's standard p?d_addr_end() macros, but hit the same crash 512 times
    less often: because of the way redundant levels are folded together, but
    folded differently in different configurations, it was just too
    difficult to use them correctly; and step_forward() is simpler anyway.

    Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com
    Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 329d4fb943b042db257f56b00a9da70631d36b3d
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:23 2021 -0700

    mm: page_vma_mapped_walk(): get vma_address_end() earlier

    [ Upstream commit a765c417d876cc635f628365ec9aa6f09470069a ]

    page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the
    start, rather than later at next_pte.

    It's a little unnecessary overhead on the first call, but makes for a
    simpler loop in the following commit.

    Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit ca054d41da1b96b2c3f1f556efabf19c54abff50
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:20 2021 -0700

    mm: page_vma_mapped_walk(): use goto instead of while (1)

    [ Upstream commit 474466301dfd8b39a10c01db740645f3f7ae9a28 ]

    page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
    and use "goto this_pte", in place of the "while (1)" loop at the end.

    Link: https://lkml.kernel.org/r/a52b234a-851-3616-2525-f42736e8934@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 72b2b0d093c5a3cf6513c5f4c2aecb6ad8faff2e
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:17 2021 -0700

    mm: page_vma_mapped_walk(): add a level of indentation

    [ Upstream commit b3807a91aca7d21c05d5790612e49969117a72b9 ]

    page_vma_mapped_walk() cleanup: add a level of indentation to much of
    the body, making no functional change in this commit, but reducing the
    later diff when this is all converted to a loop.

    [hughd@google.com: : page_vma_mapped_walk(): add a level of indentation fix]
      Link: https://lkml.kernel.org/r/7f817555-3ce1-c785-e438-87d8efdcaf26@google.com

    Link: https://lkml.kernel.org/r/efde211-f3e2-fe54-977-ef481419e7f3@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 1c1ea4e4397022a89be0699239db1c90cfa5ba48
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:14 2021 -0700

    mm: page_vma_mapped_walk(): crossing page table boundary

    [ Upstream commit 448282487483d6fa5b2eeeafaa0acc681e544a9c ]

    page_vma_mapped_walk() cleanup: adjust the test for crossing page table
    boundary - I believe pvmw->address is always page-aligned, but nothing
    else here assumed that; and remember to reset pvmw->pte to NULL after
    unmapping the page table, though I never saw any bug from that.

    Link: https://lkml.kernel.org/r/799b3f9c-2a9e-dfef-5d89-26e9f76fd97@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 43d40057fdc5df5f0809ee2a13d436be9adcbc96
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:10 2021 -0700

    mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block

    [ Upstream commit e2e1d4076c77b3671cf8ce702535ae7dee3acf89 ]

    page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
    follow the same "return not_found, return not_found, return true"
    pattern as the block above it (note: returning not_found there is never
    premature, since existence or prior existence of huge pmd guarantees
    good alignment).

    Link: https://lkml.kernel.org/r/378c8650-1488-2edf-9647-32a53cf2e21@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 3d98b8080cffa74d1a833a1ca639985573668627
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:07 2021 -0700

    mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd

    [ Upstream commit 3306d3119ceacc43ea8b141a73e21fea68eec30c ]

    page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
    use it in subsequent tests, instead of repeatedly dereferencing pointer.

    Link: https://lkml.kernel.org/r/53fbc9d-891e-46b2-cb4b-468c3b19238e@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 084d41a8294988b1f124e22837f85cb6391ae82b
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:04 2021 -0700

    mm: page_vma_mapped_walk(): settle PageHuge on entry

    [ Upstream commit 6d0fd5987657cb0c9756ce684e3a74c0f6351728 ]

    page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case out of
    the way at the start, so no need to worry about it later.

    Link: https://lkml.kernel.org/r/e31a483c-6d73-a6bb-26c5-43c3b880a2@google.com
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Alistair Popple
    Cc: "Kirill A. Shutemov"
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Wang Yugui
    Cc: Will Deacon
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 66c488875de24d10c6e2bd26e226a4bf39ae3e1e
Author: Hugh Dickins
Date:   Thu Jun 24 18:39:01 2021 -0700

    mm: page_vma_mapped_walk(): use page for pvmw->page

    [ Upstream commit f003c03bd29e6f46fef1b9a8e8d636ac732286d5 ]

    Patch series "mm: page_vma_mapped_walk() cleanup and THP fixes".

    I've marked all of these for stable: many are merely cleanups, but I
    think they are much better before the main fix than after.

    This patch (of 11):

    page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page
    was used, sometimes pvmw->page itself: use the local copy "page"
    throughout.

    Link: https://lkml.kernel.org/r/589b358c-febc-c88e-d4c2-7834b37fa7bf@google.com
    Link: https://lkml.kernel.org/r/88e67645-f467-c279-bf5e-af4b5c6b13eb@google.com
    Signed-off-by: Hugh Dickins
    Reviewed-by: Alistair Popple
    Acked-by: Kirill A. Shutemov
    Reviewed-by: Peter Xu
    Cc: Yang Shi
    Cc: Wang Yugui
    Cc: Matthew Wilcox
    Cc: Ralph Campbell
    Cc: Zi Yan
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit b5acf9a91826ed8a5f9141a7e0c7420740a8894e
Author: Yang Shi
Date:   Tue Jun 15 18:24:07 2021 -0700

    mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

    [ Upstream commit 504e070dc08f757bccaed6d05c0f53ecbfac8a23 ]

    When debugging the bug reported by Wang Yugui [1], try_to_unmap() may
    fail, but the first VM_BUG_ON_PAGE() just checks page_mapcount(), so it
    may miss the failure when the head page is unmapped but another subpage
    is mapped. Then the second DEBUG_VM BUG() that checks total mapcount
    would catch it. This may cause some confusion.

    As this is not a fatal issue, consolidate the two DEBUG_VM checks into
    one VM_WARN_ON_ONCE_PAGE().

    [1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/

    Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
    Signed-off-by: Yang Shi
    Reviewed-by: Zi Yan
    Acked-by: Kirill A. Shutemov
    Signed-off-by: Hugh Dickins
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Jue Wang
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Note on stable backport: fixed up variables, split_queue_lock, tree_lock
    in split_huge_page_to_list(), and conflict on ttu_flags in unmap_page().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Sasha Levin

commit d5d912c4c36f97112dd1545bfbfc71c06201d345
Author: Jue Wang
Date:   Tue Jun 15 18:24:00 2021 -0700

    mm/thp: fix page_address_in_vma() on file THP tails

    [ Upstream commit 31657170deaf1d8d2f6a1955fbc6fa9d228be036 ]

    Anon THP tails were already supported, but memory-failure may need to
    use page_address_in_vma() on file THP tails, which its page->mapping
    check did not permit: fix it.

    hughd adds: no current usage is known to hit the issue, but this does
    fix a subtle trap in a general helper: best fixed in stable sooner than
    later.

    Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
    Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
    Signed-off-by: Jue Wang
    Signed-off-by: Hugh Dickins
    Reviewed-by: Matthew Wilcox (Oracle)
    Reviewed-by: Yang Shi
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 4dfa0d6f482311db9e89f53da73d121c47ca2a7d
Author: Hugh Dickins
Date:   Tue Jun 15 18:23:56 2021 -0700

    mm/thp: fix vma_address() if virtual address below file offset

    [ Upstream commit 494334e43c16d63b878536a26505397fce6ff3a2 ]

    Running certain tests with a DEBUG_VM kernel would crash within hours,
    on the total_mapcount BUG() in split_huge_page_to_list(), while trying
    to free up some memory by punching a hole in a shmem huge page: split's
    try_to_unmap() was unable to find all the mappings of the page (which,
    on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).

    When that BUG() was changed to a WARN(), it would later crash on the
    VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
    mm/internal.h:vma_address(), used by rmap_walk_file() for
    try_to_unmap().

    vma_address() is usually correct, but there's a wraparound case when the
    vm_start address is unusually low, but vm_pgoff not so low:
    vma_address() chooses max(start, vma->vm_start), but that decides on the
    wrong address, because start has become almost ULONG_MAX.

    Rewrite vma_address() to be more careful about vm_pgoff; move the
    VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
    be safely used from page_mapped_in_vma() and page_address_in_vma() too.

    Add vma_address_end() to apply similar care to end address calculation,
    in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
    though it raises a question of whether callers would do better to supply
    pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
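
The wraparound described above is plain unsigned arithmetic, and a userspace model makes it easy to see. The sketch below is an approximation written for illustration: the toy_* names and struct are hypothetical, the page size is assumed to be 4 KiB, and the "new" variant only follows the approach the message describes (compare page offsets first, return -EFAULT on error) rather than reproducing the kernel function.

```c
#include <assert.h>
#include <errno.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Toy subset of a vma; not the kernel's struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;	/* first mapped address */
	unsigned long vm_end;	/* first address past the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Old-style calculation: the subtraction wraps when pgoff < vm_pgoff. */
static inline unsigned long toy_vma_address_old(unsigned long pgoff,
						const struct toy_vma *vma)
{
	unsigned long start = vma->vm_start +
		((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);

	/* max(start, vm_start) picks the wrapped value, not vm_start. */
	return start > vma->vm_start ? start : vma->vm_start;
}

/* Careful calculation: compare page offsets before forming an address. */
static inline unsigned long toy_vma_address_new(unsigned long pgoff,
						unsigned long nr_pages,
						const struct toy_vma *vma)
{
	unsigned long address;

	assert(nr_pages >= 1);
	if (pgoff >= vma->vm_pgoff) {
		address = vma->vm_start +
			((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
		/* Reject addresses beyond the vma (or wrapped through 0). */
		if (address < vma->vm_start || address >= vma->vm_end)
			address = (unsigned long)-EFAULT;
	} else if (pgoff + nr_pages - 1 >= vma->vm_pgoff) {
		/* The page starts before the vma but reaches into it. */
		address = vma->vm_start;
	} else {
		address = (unsigned long)-EFAULT;
	}
	return address;
}
```

For a vma at 0x2000 with vm_pgoff 3 and a 512-page huge page at file offset 0, the old calculation yields an address just below ULONG_MAX, while the careful one returns vm_start.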

    An irritation is that their apparent generality breaks down on KSM
    pages, which cannot be located by the page->index that page_to_pgoff()
    uses: as commit 4b0ece6fa016 ("mm: migrate: fix remove_migration_pte()
    for ksm pages") once discovered. I dithered over the best thing to do
    about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
    vma_address() and vma_address_end(); though the only place in danger of
    using it on them was try_to_unmap_one().

    Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
    head page, instead of thp_size(): to make the right calculation on a
    hugetlbfs page, whether or not THPs are configured. try_to_unmap() is
    used on hugetlbfs pages, but perhaps the wrong calculation never
    mattered.

    Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
    Fixes: a8fa41ad2f6f ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Jue Wang
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Note on stable backport: fixed up conflicts on intervening thp_size(),
    and mmu_notifier_range initializations; substitute for compound_nr().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Sasha Levin

commit 97cd3badbd3432cf40143f2553a3c1ab38346847
Author: Hugh Dickins
Date:   Tue Jun 15 18:23:53 2021 -0700

    mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

    [ Upstream commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 ]

    Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
    (!unmap_success): with dump_page() showing mapcount:1, but then its raw
    struct page output showing _mapcount ffffffff i.e. mapcount 0.

    And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
    it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
    and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
    all indicative of some mapcount difficulty in development here perhaps.
    But the !CONFIG_DEBUG_VM path handles the failures correctly and
    silently.

    I believe the problem is that once a racing unmap has cleared pte or
    pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
    from try_to_unmap() before the racing task has reached decrementing
    mapcount.

    Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
    follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
    TTU_SYNC to the options, and passing that from unmap_page().

    When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
    for both: the slight overhead added should rarely matter, except perhaps
    if splitting sparsely-populated multiply-mapped shmem. Once confident
    that bugs are fixed, TTU_SYNC here can be removed, and the race
    tolerated.

    Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
    Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
    Signed-off-by: Hugh Dickins
    Cc: Alistair Popple
    Cc: Jan Kara
    Cc: Jue Wang
    Cc: Kirill A. Shutemov
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Miaohe Lin
    Cc: Minchan Kim
    Cc: Naoya Horiguchi
    Cc: Oscar Salvador
    Cc: Peter Xu
    Cc: Ralph Campbell
    Cc: Shakeel Butt
    Cc: Wang Yugui
    Cc: Yang Shi
    Cc: Zi Yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Note on stable backport: upstream TTU_SYNC 0x10 takes the value which
    5.11 commit 013339df116c ("mm/rmap: always do TTU_IGNORE_ACCESS") freed.
    It is very tempting to backport that commit (as 5.10 already did) and
    make no change here; but on reflection, good as that commit is, I'm
    reluctant to include any possible side-effect of it in this series.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Sasha Levin

commit 1decdcdf8ac3ea56b6001b3c60f35c7b15daabb3
Author: Miaohe Lin
Date:   Thu Feb 25 17:18:03 2021 -0800

    mm/rmap: use page_not_mapped in try_to_unmap()

    [ Upstream commit b7e188ec98b1644ff70a6d3624ea16aadc39f5e0 ]

    page_mapcount_is_zero() calculates accurately how many mappings a
    hugepage has in order to check against 0 only. This is a waste of cpu
    time. We can do this via page_not_mapped() to save some possible
    atomic_read cycles. Remove the function page_mapcount_is_zero() as it's
    not used anymore and move page_not_mapped() above try_to_unmap() to
    avoid identifier undeclared compilation error.

    Link: https://lkml.kernel.org/r/20210130084904.35307-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit a369974d1547fc41bdab5e19186303c0ac31dc5a
Author: Miaohe Lin
Date:   Thu Feb 25 17:17:56 2021 -0800

    mm/rmap: remove unneeded semicolon in page_not_mapped()

    [ Upstream commit e0af87ff7afcde2660be44302836d2d5618185af ]

    Remove extra semicolon without any functional change intended.

    Link: https://lkml.kernel.org/r/20210127093425.39640-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

commit 37a4a68cd12bfa404bea4dea71cbef35cd63614e
Author: Alex Shi
Date:   Fri Dec 18 14:01:31 2020 -0800

    mm: add VM_WARN_ON_ONCE_PAGE() macro

    [ Upstream commit a4055888629bc0467d12d912cd7c90acdf3d9b12 part ]

    Add VM_WARN_ON_ONCE_PAGE() macro.

    Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.com
    Signed-off-by: Alex Shi
    Acked-by: Michal Hocko
    Acked-by: Hugh Dickins
    Acked-by: Johannes Weiner
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Note on stable backport: original commit was titled
    mm/memcg: warning on !memcg after readahead page charged
    which included uses of this macro in mm/memcontrol.c: here omitted.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Sasha Levin

commit 951fe4bf532512fe8e88408d329a64119b25a854
Author: Michal Hocko
Date:   Thu Apr 5 16:25:30 2018 -0700

    include/linux/mmdebug.h: make VM_WARN* non-rvals

    [ Upstream commit 91241681c62a5a690c88eb2aca027f094125eaac ]

    At present the construct

        if (VM_WARN(...))

    will compile OK with CONFIG_DEBUG_VM=y and will fail with
    CONFIG_DEBUG_VM=n. The reason is that VM_{WARN,BUG}* have always been
    special wrt. {WARN/BUG}* and never generate any code when DEBUG_VM is
    disabled. So we cannot really use it in conditionals.

    We considered changing things so that this construct works in both cases
    but that might cause unwanted code generation with CONFIG_DEBUG_VM=n.
    It is safer and simpler to make the build fail in both cases.

    [akpm@linux-foundation.org: changelog]
    Signed-off-by: Michal Hocko
    Reviewed-by: Andrew Morton
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
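
One way to make a warning macro unusable as a condition in both configurations is to give it type void in each. The userspace sketch below illustrates that idea under stated assumptions: the TOY_* macros, toy_warn_on(), and the TOY_DEBUG_VM switch are hypothetical stand-ins, not the definitions in include/linux/mmdebug.h.

```c
#include <assert.h>

/* Set to 0 to model a build without the debug option. */
#define TOY_DEBUG_VM 1

static int toy_warn_count;	/* number of warnings "printed" so far */

/* WARN_ON-like helper: records the warning and returns the condition. */
static inline int toy_warn_on(int cond)
{
	if (cond)
		toy_warn_count++;
	return cond;
}

#if TOY_DEBUG_VM
/* Debug build: evaluate and warn, but discard the value. */
#define TOY_VM_WARN_ON(cond)	((void)toy_warn_on(!!(cond)))
#else
/* Non-debug build: type-check the expression without evaluating it. */
#define TOY_VM_WARN_ON(cond)	((void)sizeof((long)(cond)))
#endif

/* Usage: the macro is a statement, never a condition. */
static inline void toy_vm_warn_demo(void)
{
	toy_warn_count = 0;
	TOY_VM_WARN_ON(1);		/* fine as a statement */
	assert(toy_warn_count == 1);	/* the debug variant evaluated it once */
	/* if (TOY_VM_WARN_ON(1)) ...	   would not compile: void value */
}
```

Because both variants expand to a void expression, a conditional built on the macro fails to compile regardless of TOY_DEBUG_VM, which mirrors the "fail in both cases" choice described in the message.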