From patchwork Tue Feb 20 20:32:53 2024
X-Patchwork-Submitter: Maxwell Bland
X-Patchwork-Id: 774669
From: Maxwell Bland
To: linux-arm-kernel@lists.infradead.org
Cc: gregkh@linuxfoundation.org, agordeev@linux.ibm.com, akpm@linux-foundation.org, andreyknvl@gmail.com, andrii@kernel.org, aneesh.kumar@kernel.org, aou@eecs.berkeley.edu, ardb@kernel.org, arnd@arndb.de, ast@kernel.org, borntraeger@linux.ibm.com, bpf@vger.kernel.org, brauner@kernel.org, catalin.marinas@arm.com, christophe.leroy@csgroup.eu, cl@linux.com, daniel@iogearbox.net, dave.hansen@linux.intel.com, david@redhat.com, dennis@kernel.org, dvyukov@google.com, glider@google.com, gor@linux.ibm.com, guoren@kernel.org, haoluo@google.com, hca@linux.ibm.com, hch@infradead.org, john.fastabend@gmail.com, jolsa@kernel.org, kasan-dev@googlegroups.com, kpsingh@kernel.org, linux-arch@vger.kernel.org, linux@armlinux.org.uk, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, lstoakes@gmail.com, mark.rutland@arm.com, martin.lau@linux.dev, meted@linux.ibm.com, michael.christie@oracle.com, mjguzik@gmail.com, mpe@ellerman.id.au, mst@redhat.com, muchun.song@linux.dev, naveen.n.rao@linux.ibm.com, npiggin@gmail.com, palmer@dabbelt.com, paul.walmsley@sifive.com, quic_nprakash@quicinc.com, quic_pkondeti@quicinc.com, rick.p.edgecombe@intel.com, ryabinin.a.a@gmail.com, ryan.roberts@arm.com, samitolvanen@google.com, sdf@google.com, song@kernel.org, surenb@google.com, svens@linux.ibm.com, tj@kernel.org, urezki@gmail.com, vincenzo.frascino@arm.com, will@kernel.org, wuqiang.matt@bytedance.com, yonghong.song@linux.dev, zlim.lnx@gmail.com, mbland@motorola.com, awheeler@motorola.com
Subject: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides
Date: Tue, 20 Feb 2024 14:32:53 -0600
Message-Id: <20240220203256.31153-2-mbland@motorola.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20240220203256.31153-1-mbland@motorola.com>
References: <20240220203256.31153-1-mbland@motorola.com>
X-Mailing-List: linux-efi@vger.kernel.org

Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes enforcing appropriate code and data separation untenable on certain microarchitectures: VMALLOC_START and VMALLOC_END are monolithic while use of the vmalloc interface is non-monolithic. In particular, appropriate randomness in ASLR requires that code regions fall somewhere between VMALLOC_START and VMALLOC_END, but this intermingles code pages with data pages, meaning code-specific protections, such as arm64's PXNTable, cannot be performantly enforced at runtime.

This patch solves the problem by allowing architectures to override the vmalloc wrapper functions: it ensures the rest of the kernel does not reimplement __vmalloc_node by calling __vmalloc_node_range with the same parameters, and it adds a __weak tag to those wrappers that call __vmalloc_node_range with parameters repeating those of __vmalloc_node.

Two benefits of this approach are (1) greater flexibility for each architecture in handling virtual memory without compromising the kernel's vmalloc logic and (2) more uniform use of the __vmalloc_node interface, reserving the more specialized __vmalloc_node_range for more specialized cases, such as KASAN's shadow memory.
Signed-off-by: Maxwell Bland --- arch/arm/kernel/irq.c | 2 +- arch/arm64/include/asm/vmap_stack.h | 2 +- arch/arm64/kernel/efi.c | 2 +- arch/powerpc/kernel/irq.c | 2 +- arch/riscv/include/asm/irq_stack.h | 2 +- arch/s390/hypfs/hypfs_diag.c | 2 +- arch/s390/kernel/setup.c | 6 ++--- arch/s390/kernel/sthyi.c | 2 +- include/linux/vmalloc.h | 15 ++++++++++- kernel/bpf/syscall.c | 4 +-- kernel/fork.c | 4 +-- kernel/scs.c | 3 +-- lib/objpool.c | 2 +- lib/test_vmalloc.c | 6 ++--- mm/util.c | 3 +-- mm/vmalloc.c | 39 +++++++++++------------------ 16 files changed, 47 insertions(+), 49 deletions(-) diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c index fe28fc1f759d..109f4f363621 100644 --- a/arch/arm/kernel/irq.c +++ b/arch/arm/kernel/irq.c @@ -61,7 +61,7 @@ static void __init init_irq_stacks(void) THREAD_SIZE_ORDER); else stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, - THREADINFO_GFP, NUMA_NO_NODE, + THREADINFO_GFP, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (WARN_ON(!stack)) diff --git a/arch/arm64/include/asm/vmap_stack.h b/arch/arm64/include/asm/vmap_stack.h index 20873099c035..57a7eaa720d5 100644 --- a/arch/arm64/include/asm/vmap_stack.h +++ b/arch/arm64/include/asm/vmap_stack.h @@ -21,7 +21,7 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node) BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK)); - p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node, + p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, 0, node, __builtin_return_address(0)); return kasan_reset_tag(p); } diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 0228001347be..48efa31a9161 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -205,7 +205,7 @@ static int __init arm64_efi_rt_init(void) return 0; p = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, GFP_KERNEL, - NUMA_NO_NODE, &&l); + 0, NUMA_NO_NODE, &&l); l: if (!p) { pr_warn("Failed to allocate EFI runtime stack\n"); clear_bit(EFI_RUNTIME_SERVICES, 
&efi.flags); diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 6f7d4edaa0bc..ceb7ea07ca28 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -308,7 +308,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ) static void *__init alloc_vm_stack(void) { return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP, - NUMA_NO_NODE, (void *)_RET_IP_); + 0, NUMA_NO_NODE, (void *)_RET_IP_); } static void __init vmap_irqstack_init(void) diff --git a/arch/riscv/include/asm/irq_stack.h b/arch/riscv/include/asm/irq_stack.h index 6441ded3b0cf..d2410735bde0 100644 --- a/arch/riscv/include/asm/irq_stack.h +++ b/arch/riscv/include/asm/irq_stack.h @@ -24,7 +24,7 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t stack_size, int node) { void *p; - p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node, + p = __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, 0, node, __builtin_return_address(0)); return kasan_reset_tag(p); } diff --git a/arch/s390/hypfs/hypfs_diag.c b/arch/s390/hypfs/hypfs_diag.c index 279b7bba4d43..16359d854288 100644 --- a/arch/s390/hypfs/hypfs_diag.c +++ b/arch/s390/hypfs/hypfs_diag.c @@ -70,7 +70,7 @@ void *diag204_get_buffer(enum diag204_format fmt, int *pages) return ERR_PTR(-EOPNOTSUPP); } diag204_buf = __vmalloc_node(array_size(*pages, PAGE_SIZE), - PAGE_SIZE, GFP_KERNEL, NUMA_NO_NODE, + PAGE_SIZE, GFP_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (!diag204_buf) return ERR_PTR(-ENOMEM); diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c index d1f3b56e7afc..2c25b4e9f20a 100644 --- a/arch/s390/kernel/setup.c +++ b/arch/s390/kernel/setup.c @@ -254,7 +254,7 @@ static void __init conmode_default(void) cpcmd("QUERY TERM", query_buffer, 1024, NULL); ptr = strstr(query_buffer, "CONMODE"); /* - * Set the conmode to 3215 so that the device recognition + * Set the conmode to 3215 so that the device recognition * will set the cu_type of the console to 3215. 
If the * conmode is 3270 and we don't set it back then both * 3215 and the 3270 driver will try to access the console @@ -314,7 +314,7 @@ static inline void setup_zfcpdump(void) {} /* * Reboot, halt and power_off stubs. They just call _machine_restart, - * _machine_halt or _machine_power_off. + * _machine_halt or _machine_power_off. */ void machine_restart(char *command) @@ -364,7 +364,7 @@ unsigned long stack_alloc(void) void *ret; ret = __vmalloc_node(THREAD_SIZE, THREAD_SIZE, THREADINFO_GFP, - NUMA_NO_NODE, __builtin_return_address(0)); + 0, NUMA_NO_NODE, __builtin_return_address(0)); kmemleak_not_leak(ret); return (unsigned long)ret; #else diff --git a/arch/s390/kernel/sthyi.c b/arch/s390/kernel/sthyi.c index 30bb20461db4..5bf239bcdae9 100644 --- a/arch/s390/kernel/sthyi.c +++ b/arch/s390/kernel/sthyi.c @@ -318,7 +318,7 @@ static void fill_diag(struct sthyi_sctns *sctns) return; diag204_buf = __vmalloc_node(array_size(pages, PAGE_SIZE), - PAGE_SIZE, GFP_KERNEL, NUMA_NO_NODE, + PAGE_SIZE, GFP_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); if (!diag204_buf) return; diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index c720be70c8dd..f13bd711ad7d 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -150,7 +150,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align, pgprot_t prot, unsigned long vm_flags, int node, const void *caller) __alloc_size(1); void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, - int node, const void *caller) __alloc_size(1); + unsigned long vm_flags, int node, const void *caller) + __alloc_size(1); void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1); extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2); @@ -295,4 +296,16 @@ bool vmalloc_dump_obj(void *object); static inline bool vmalloc_dump_obj(void *object) { return false; } #endif +#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) +#define 
GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL) +#elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) +#define GFP_VMALLOC32 (GFP_DMA | GFP_KERNEL) +#else +/* + * 64b systems should always have either DMA or DMA32 zones. For others + * GFP_DMA32 should do the right thing and use the normal zone. + */ +#define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL) +#endif + #endif /* _LINUX_VMALLOC_H */ diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a1f18681721c..79c11307ff40 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -303,8 +303,8 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) return area; } - return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, - gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL, + return __vmalloc_node(size, align, + gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, flags, numa_node, __builtin_return_address(0)); } diff --git a/kernel/fork.c b/kernel/fork.c index 0d944e92a43f..800bb1c76000 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -304,10 +304,8 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node) * so memcg accounting is performed manually on assigning/releasing * stacks to tasks. Drop __GFP_ACCOUNT. 
*/ - stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, - VMALLOC_START, VMALLOC_END, + stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP & ~__GFP_ACCOUNT, - PAGE_KERNEL, 0, node, __builtin_return_address(0)); if (!stack) return -ENOMEM; diff --git a/kernel/scs.c b/kernel/scs.c index d7809affe740..5b89fb08a392 100644 --- a/kernel/scs.c +++ b/kernel/scs.c @@ -43,8 +43,7 @@ static void *__scs_alloc(int node) } } - s = __vmalloc_node_range(SCS_SIZE, 1, VMALLOC_START, VMALLOC_END, - GFP_SCS, PAGE_KERNEL, 0, node, + s = __vmalloc_node(SCS_SIZE, 1, GFP_SCS, 0, node, __builtin_return_address(0)); out: diff --git a/lib/objpool.c b/lib/objpool.c index cfdc02420884..f0acd421a652 100644 --- a/lib/objpool.c +++ b/lib/objpool.c @@ -80,7 +80,7 @@ objpool_init_percpu_slots(struct objpool_head *pool, int nr_objs, slot = kmalloc_node(size, pool->gfp, cpu_to_node(i)); else slot = __vmalloc_node(size, sizeof(void *), pool->gfp, - cpu_to_node(i), __builtin_return_address(0)); + 0, cpu_to_node(i), __builtin_return_address(0)); if (!slot) return -ENOMEM; memset(slot, 0, size); diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c index 3718d9886407..6bde73f892f9 100644 --- a/lib/test_vmalloc.c +++ b/lib/test_vmalloc.c @@ -97,7 +97,7 @@ static int random_size_align_alloc_test(void) size = ((rnd % 10) + 1) * PAGE_SIZE; ptr = __vmalloc_node(size, align, GFP_KERNEL | __GFP_ZERO, 0, - __builtin_return_address(0)); + 0, __builtin_return_address(0)); if (!ptr) return -1; @@ -120,7 +120,7 @@ static int align_shift_alloc_test(void) align = ((unsigned long) 1) << i; ptr = __vmalloc_node(PAGE_SIZE, align, GFP_KERNEL|__GFP_ZERO, 0, - __builtin_return_address(0)); + 0, __builtin_return_address(0)); if (!ptr) return -1; @@ -138,7 +138,7 @@ static int fix_align_alloc_test(void) for (i = 0; i < test_loop_count; i++) { ptr = __vmalloc_node(5 * PAGE_SIZE, THREAD_ALIGN << 1, GFP_KERNEL | __GFP_ZERO, 0, - __builtin_return_address(0)); + 0, __builtin_return_address(0)); if (!ptr) return 
-1; diff --git a/mm/util.c b/mm/util.c index 5a6a9802583b..c6b7111215e2 100644 --- a/mm/util.c +++ b/mm/util.c @@ -639,8 +639,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) * about the resulting pointer, and cannot play * protection games. */ - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, - flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP, + return __vmalloc_node(size, 1, flags, VM_ALLOW_HUGE_VMAP, node, __builtin_return_address(0)); } EXPORT_SYMBOL(kvmalloc_node); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d12a17fc0c17..18ece28e79d3 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3119,7 +3119,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) { - area->pages = __vmalloc_node(array_size, 1, nested_gfp, node, + area->pages = __vmalloc_node(array_size, 1, nested_gfp, 0, node, area->caller); } else { area->pages = kmalloc_node(array_size, nested_gfp, node); @@ -3379,11 +3379,12 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, * * Return: pointer to the allocated memory or %NULL on error */ -void *__vmalloc_node(unsigned long size, unsigned long align, - gfp_t gfp_mask, int node, const void *caller) +__weak void *__vmalloc_node(unsigned long size, unsigned long align, + gfp_t gfp_mask, unsigned long vm_flags, int node, + const void *caller) { return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, - gfp_mask, PAGE_KERNEL, 0, node, caller); + gfp_mask, PAGE_KERNEL, vm_flags, node, caller); } /* * This is only for performance analysis of vmalloc and stress purpose. 
@@ -3396,7 +3397,7 @@ EXPORT_SYMBOL_GPL(__vmalloc_node); void *__vmalloc(unsigned long size, gfp_t gfp_mask) { - return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE, + return __vmalloc_node(size, 1, gfp_mask, 0, NUMA_NO_NODE, __builtin_return_address(0)); } EXPORT_SYMBOL(__vmalloc); @@ -3415,7 +3416,7 @@ EXPORT_SYMBOL(__vmalloc); */ void *vmalloc(unsigned long size) { - return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE, + return __vmalloc_node(size, 1, GFP_KERNEL, 0, NUMA_NO_NODE, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc); @@ -3432,7 +3433,7 @@ EXPORT_SYMBOL(vmalloc); * * Return: pointer to the allocated memory or %NULL on error */ -void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) +__weak void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) { return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP, @@ -3455,7 +3456,7 @@ EXPORT_SYMBOL_GPL(vmalloc_huge); */ void *vzalloc(unsigned long size) { - return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, + return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, 0, NUMA_NO_NODE, __builtin_return_address(0)); } EXPORT_SYMBOL(vzalloc); @@ -3469,7 +3470,7 @@ EXPORT_SYMBOL(vzalloc); * * Return: pointer to the allocated memory or %NULL on error */ -void *vmalloc_user(unsigned long size) +__weak void *vmalloc_user(unsigned long size) { return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL, @@ -3493,7 +3494,7 @@ EXPORT_SYMBOL(vmalloc_user); */ void *vmalloc_node(unsigned long size, int node) { - return __vmalloc_node(size, 1, GFP_KERNEL, node, + return __vmalloc_node(size, 1, GFP_KERNEL, 0, node, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_node); @@ -3511,23 +3512,11 @@ EXPORT_SYMBOL(vmalloc_node); */ void *vzalloc_node(unsigned long size, int node) { - return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node, + return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, 0, 
node, __builtin_return_address(0)); } EXPORT_SYMBOL(vzalloc_node); -#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) -#define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL) -#elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) -#define GFP_VMALLOC32 (GFP_DMA | GFP_KERNEL) -#else -/* - * 64b systems should always have either DMA or DMA32 zones. For others - * GFP_DMA32 should do the right thing and use the normal zone. - */ -#define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL) -#endif - /** * vmalloc_32 - allocate virtually contiguous memory (32bit addressable) * @size: allocation size * @@ -3539,7 +3528,7 @@ EXPORT_SYMBOL(vzalloc_node); */ void *vmalloc_32(unsigned long size) { - return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE, + return __vmalloc_node(size, 1, GFP_VMALLOC32, 0, NUMA_NO_NODE, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_32); @@ -3553,7 +3542,7 @@ EXPORT_SYMBOL(vmalloc_32); * * Return: pointer to the allocated memory or %NULL on error */ -void *vmalloc_32_user(unsigned long size) +__weak void *vmalloc_32_user(unsigned long size) { return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
From patchwork Tue Feb 20 20:32:54 2024
X-Patchwork-Submitter: Maxwell Bland
X-Patchwork-Id: 774670
From: Maxwell Bland
To: linux-arm-kernel@lists.infradead.org
Cc: gregkh@linuxfoundation.org, agordeev@linux.ibm.com, akpm@linux-foundation.org, andreyknvl@gmail.com, andrii@kernel.org, aneesh.kumar@kernel.org, aou@eecs.berkeley.edu, ardb@kernel.org, arnd@arndb.de, ast@kernel.org, borntraeger@linux.ibm.com, bpf@vger.kernel.org, brauner@kernel.org, catalin.marinas@arm.com, christophe.leroy@csgroup.eu, cl@linux.com, daniel@iogearbox.net, dave.hansen@linux.intel.com, david@redhat.com, dennis@kernel.org, dvyukov@google.com, glider@google.com, gor@linux.ibm.com, guoren@kernel.org, haoluo@google.com, hca@linux.ibm.com, hch@infradead.org, john.fastabend@gmail.com, jolsa@kernel.org, kasan-dev@googlegroups.com, kpsingh@kernel.org, linux-arch@vger.kernel.org, linux@armlinux.org.uk, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, lstoakes@gmail.com, mark.rutland@arm.com, martin.lau@linux.dev, meted@linux.ibm.com, michael.christie@oracle.com, mjguzik@gmail.com, mpe@ellerman.id.au, mst@redhat.com, muchun.song@linux.dev, naveen.n.rao@linux.ibm.com, npiggin@gmail.com,
palmer@dabbelt.com, paul.walmsley@sifive.com, quic_nprakash@quicinc.com, quic_pkondeti@quicinc.com, rick.p.edgecombe@intel.com, ryabinin.a.a@gmail.com, ryan.roberts@arm.com, samitolvanen@google.com, sdf@google.com, song@kernel.org, surenb@google.com, svens@linux.ibm.com, tj@kernel.org, urezki@gmail.com, vincenzo.frascino@arm.com, will@kernel.org, wuqiang.matt@bytedance.com, yonghong.song@linux.dev, zlim.lnx@gmail.com, mbland@motorola.com, awheeler@motorola.com
Subject: [PATCH 2/4] mm: pgalloc: support address-conditional pmd allocation
Date: Tue, 20 Feb 2024 14:32:54 -0600
Message-Id: <20240220203256.31153-3-mbland@motorola.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20240220203256.31153-1-mbland@motorola.com>
References: <20240220203256.31153-1-mbland@motorola.com>
X-Mailing-List: linux-efi@vger.kernel.org

While other descriptors (e.g. pud) allow allocations conditional on which virtual address is allocated, pmd descriptor allocations do not. However, adding support for this is straightforward and is beneficial to future kernel development targeting the PMD memory granularity. As many architectures already implement pmd_populate_kernel in an address-generic manner, it is necessary to roll out support incrementally.
For this purpose a preprocessor flag, __HAVE_ARCH_ADDR_COND_PMD, is introduced to capture whether the architecture supports some feature requiring PMD allocation conditional on virtual address. Some microarchitectures (e.g. arm64) support configurations for table descriptors, for example to enforce Privilege eXecute Never, which benefit from knowing the virtual memory addresses referenced by PMDs. Thus two major arguments in favor of this change are (1) uniformity of allocation between PMD and other table descriptor types and (2) the capability of address-specific PMD allocation. Signed-off-by: Maxwell Bland --- include/asm-generic/pgalloc.h | 18 ++++++++++++++++++ include/linux/mm.h | 4 ++-- mm/hugetlb_vmemmap.c | 4 ++-- mm/kasan/init.c | 22 +++++++++++++--------- mm/memory.c | 4 ++-- mm/percpu.c | 2 +- mm/pgalloc-track.h | 3 ++- mm/sparse-vmemmap.c | 2 +- 8 files changed, 41 insertions(+), 18 deletions(-) diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h index 879e5f8aa5e9..e5cdce77c6e4 100644 --- a/include/asm-generic/pgalloc.h +++ b/include/asm-generic/pgalloc.h @@ -142,6 +142,24 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) } #endif +#ifdef __HAVE_ARCH_ADDR_COND_PMD +static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, + pte_t *ptep, unsigned long address); +#else +static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, + pte_t *ptep); +#endif + +static inline void pmd_populate_kernel_at(struct mm_struct *mm, pmd_t *pmdp, + pte_t *ptep, unsigned long address) +{ +#ifdef __HAVE_ARCH_ADDR_COND_PMD + pmd_populate_kernel(mm, pmdp, ptep, address); +#else + pmd_populate_kernel(mm, pmdp, ptep); +#endif +} + #ifndef __HAVE_ARCH_PMD_FREE static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd) { diff --git a/include/linux/mm.h b/include/linux/mm.h index f5a97dec5169..6a9d5ded428d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2782,7 +2782,7 @@ static
 inline void mm_dec_nr_ptes(struct mm_struct *mm) {}
 #endif
 
 int __pte_alloc(struct mm_struct *mm, pmd_t *pmd);
-int __pte_alloc_kernel(pmd_t *pmd);
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long address);
 
 #if defined(CONFIG_MMU)
@@ -2977,7 +2977,7 @@ pte_t *pte_offset_map_nolock(struct mm_struct *mm, pmd_t *pmd,
 		NULL : pte_offset_map_lock(mm, pmd, address, ptlp))
 
 #define pte_alloc_kernel(pmd, address)			\
-	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd))? \
+	((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address)) ? \
 		NULL: pte_offset_kernel(pmd, address))
 
 #if USE_SPLIT_PMD_PTLOCKS
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index da177e49d956..1f5664b656f1 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -58,7 +58,7 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 	if (!pgtable)
 		return -ENOMEM;
 
-	pmd_populate_kernel(&init_mm, &__pmd, pgtable);
+	pmd_populate_kernel_at(&init_mm, &__pmd, pgtable, addr);
 
 	for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
 		pte_t entry, *pte;
@@ -81,7 +81,7 @@ static int vmemmap_split_pmd(pmd_t *pmd, struct page *head, unsigned long start,
 		/* Make pte visible before pmd. See comment in pmd_install(). */
 		smp_wmb();
-		pmd_populate_kernel(&init_mm, pmd, pgtable);
+		pmd_populate_kernel_at(&init_mm, pmd, pgtable, addr);
 
 		if (!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH))
 			flush_tlb_kernel_range(start, start + PMD_SIZE);
 	} else {
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 89895f38f722..1e31d965a14e 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -116,8 +116,9 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
 		next = pmd_addr_end(addr, end);
 
 		if (IS_ALIGNED(addr, PMD_SIZE) && end - addr >= PMD_SIZE) {
-			pmd_populate_kernel(&init_mm, pmd,
-					lm_alias(kasan_early_shadow_pte));
+			pmd_populate_kernel_at(&init_mm, pmd,
+					lm_alias(kasan_early_shadow_pte),
+					addr);
 			continue;
 		}
 
@@ -131,7 +132,7 @@ static int __ref zero_pmd_populate(pud_t *pud, unsigned long addr,
 			if (!p)
 				return -ENOMEM;
 
-			pmd_populate_kernel(&init_mm, pmd, p);
+			pmd_populate_kernel_at(&init_mm, pmd, p, addr);
 		}
 		zero_pte_populate(pmd, addr, next);
 	} while (pmd++, addr = next, addr != end);
@@ -157,8 +158,9 @@ static int __ref zero_pud_populate(p4d_t *p4d, unsigned long addr,
 			pud_populate(&init_mm, pud,
 					lm_alias(kasan_early_shadow_pmd));
 			pmd = pmd_offset(pud, addr);
-			pmd_populate_kernel(&init_mm, pmd,
-					lm_alias(kasan_early_shadow_pte));
+			pmd_populate_kernel_at(&init_mm, pmd,
+					lm_alias(kasan_early_shadow_pte),
+					addr);
 			continue;
 		}
 
@@ -203,8 +205,9 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
 			pud_populate(&init_mm, pud,
 					lm_alias(kasan_early_shadow_pmd));
 			pmd = pmd_offset(pud, addr);
-			pmd_populate_kernel(&init_mm, pmd,
-					lm_alias(kasan_early_shadow_pte));
+			pmd_populate_kernel_at(&init_mm, pmd,
+					lm_alias(kasan_early_shadow_pte),
+					addr);
 			continue;
 		}
 
@@ -266,8 +269,9 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
 			pud_populate(&init_mm, pud,
 					lm_alias(kasan_early_shadow_pmd));
 			pmd = pmd_offset(pud, addr);
-			pmd_populate_kernel_at(&init_mm, pmd,
-					lm_alias(kasan_early_shadow_pte));
+			pmd_populate_kernel_at(&init_mm, pmd,
+					lm_alias(kasan_early_shadow_pte),
+					addr);
 			continue;
 		}
 
diff --git a/mm/memory.c b/mm/memory.c
index 15f8b10ea17c..15702822d904 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -447,7 +447,7 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
 	return 0;
 }
 
-int __pte_alloc_kernel(pmd_t *pmd)
+int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
 {
 	pte_t *new = pte_alloc_one_kernel(&init_mm);
 	if (!new)
@@ -456,7 +456,7 @@ int __pte_alloc_kernel(pmd_t *pmd)
 	spin_lock(&init_mm.page_table_lock);
 	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
 		smp_wmb(); /* See comment in pmd_install() */
-		pmd_populate_kernel(&init_mm, pmd, new);
+		pmd_populate_kernel_at(&init_mm, pmd, new, address);
 		new = NULL;
 	}
 	spin_unlock(&init_mm.page_table_lock);
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..7312e584c1b5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3238,7 +3238,7 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
 		new = memblock_alloc(PTE_TABLE_SIZE, PTE_TABLE_SIZE);
 		if (!new)
 			goto err_alloc;
-		pmd_populate_kernel(&init_mm, pmd, new);
+		pmd_populate_kernel_at(&init_mm, pmd, new, addr);
 	}
 
 	return;
diff --git a/mm/pgalloc-track.h b/mm/pgalloc-track.h
index e9e879de8649..0984681c03d4 100644
--- a/mm/pgalloc-track.h
+++ b/mm/pgalloc-track.h
@@ -45,7 +45,8 @@ static inline pmd_t *pmd_alloc_track(struct mm_struct *mm, pud_t *pud,
 
 #define pte_alloc_kernel_track(pmd, address, mask)			\
 	((unlikely(pmd_none(*(pmd))) &&					\
-	  (__pte_alloc_kernel(pmd) || ({*(mask)|=PGTBL_PMD_MODIFIED;0;})))?\
+	  (__pte_alloc_kernel(pmd, address) ||				\
+	   ({*(mask) |= PGTBL_PMD_MODIFIED; 0; }))) ? \
 		NULL: pte_offset_kernel(pmd, address))
 
 #endif /* _LINUX_PGALLOC_TRACK_H */
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a2cbe44c48e1..d876cc4dc700 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -191,7 +191,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node)
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
-		pmd_populate_kernel(&init_mm, pmd, p);
+		pmd_populate_kernel_at(&init_mm, pmd, p, addr);
 	}
 	return pmd;
 }

From patchwork Tue Feb 20 20:32:55 2024
X-Patchwork-Submitter: Maxwell Bland
X-Patchwork-Id: 774364
From: Maxwell Bland
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 3/4] arm64: separate code and data virtual memory allocation
Date: Tue, 20 Feb 2024 14:32:55 -0600
Message-Id: <20240220203256.31153-4-mbland@motorola.com>
In-Reply-To: <20240220203256.31153-1-mbland@motorola.com>
References: <20240220203256.31153-1-mbland@motorola.com>

Current BPF and kprobe instruction allocation interfaces do not match
the base kernel and intermingle code and data pages within the same
sections. In the case of BPF, this appears to be a result of code
duplication between the kernel's JIT compiler and arm64's JIT. However,
this is no longer necessary given the possibility of overriding vmalloc
wrapper functions.

arm64's vmalloc_node routines now include a layer of indirection which
splits the vmalloc region into two segments surrounding the middle
module_alloc region determined by ASLR. To support this,
code_region_start and code_region_end are defined to match the 2GB
boundary chosen by the kernel module ASLR initialization routine.

The result is a large benefit to overall kernel security, as code pages
now remain protected by this ASLR routine and protections can be
defined linearly for code regions rather than through PTE-level
tracking.
Signed-off-by: Maxwell Bland
---
 arch/arm64/include/asm/vmalloc.h   |  3 ++
 arch/arm64/kernel/module.c         |  7 ++++
 arch/arm64/kernel/probes/kprobes.c |  2 +-
 arch/arm64/mm/Makefile             |  3 +-
 arch/arm64/mm/vmalloc.c            | 57 ++++++++++++++++++++++++++++++
 arch/arm64/net/bpf_jit_comp.c      |  5 +--
 6 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/mm/vmalloc.c

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 38fafffe699f..dbcf8ad20265 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -31,4 +31,7 @@ static inline pgprot_t arch_vmap_pgprot_tagged(pgprot_t prot)
 	return pgprot_tagged(prot);
 }
 
+extern unsigned long code_region_start __ro_after_init;
+extern unsigned long code_region_end __ro_after_init;
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index dd851297596e..c4fe753a71a9 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -29,6 +29,10 @@
 static u64 module_direct_base __ro_after_init = 0;
 static u64 module_plt_base __ro_after_init = 0;
 
+/* For pre-init vmalloc, assume the worst-case code range */
+unsigned long code_region_start __ro_after_init = (u64) (_end - SZ_2G);
+unsigned long code_region_end __ro_after_init = (u64) (_text + SZ_2G);
+
 /*
  * Choose a random page-aligned base address for a window of 'size' bytes which
  * entirely contains the interval [start, end - 1].
@@ -101,6 +105,9 @@ static int __init module_init_limits(void)
 		module_plt_base = random_bounding_box(SZ_2G, min, max);
 	}
 
+	code_region_start = module_plt_base;
+	code_region_end = module_plt_base + SZ_2G;
+
 	pr_info("%llu pages in range for non-PLT usage",
 		module_direct_base ? (SZ_128M - kernel_size) / PAGE_SIZE : 0);
 	pr_info("%llu pages in range for PLT usage",
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 70b91a8c6bb3..c9e109d6c8bc 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -131,7 +131,7 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
 
 void *alloc_insn_page(void)
 {
-	return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range(PAGE_SIZE, 1, code_region_start, code_region_end,
 			GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d..730b805d8388 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -2,7 +2,8 @@
 obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   cache.o copypage.o flush.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
-				   context.o proc.o pageattr.o fixmap.o
+				   context.o proc.o pageattr.o fixmap.o \
+				   vmalloc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_PTDUMP_CORE)	+= ptdump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
diff --git a/arch/arm64/mm/vmalloc.c b/arch/arm64/mm/vmalloc.c
new file mode 100644
index 000000000000..b6d2fa841f90
--- /dev/null
+++ b/arch/arm64/mm/vmalloc.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include
+#include
+
+static void *__vmalloc_node_range_split(unsigned long size, unsigned long align,
+		unsigned long start, unsigned long end,
+		unsigned long exclusion_start, unsigned long exclusion_end, gfp_t gfp_mask,
+		pgprot_t prot, unsigned long vm_flags, int node,
+		const void *caller)
+{
+	void *res = NULL;
+
+	res = __vmalloc_node_range(size, align, start, exclusion_start,
+			gfp_mask, prot, vm_flags, node, caller);
+	if (!res)
+		res = __vmalloc_node_range(size, align, exclusion_end, end,
+				gfp_mask, prot, vm_flags, node, caller);
+
+	return res;
+}
+
+void *__vmalloc_node(unsigned long size, unsigned long align,
+		gfp_t gfp_mask, unsigned long vm_flags, int node,
+		const void *caller)
+{
+	return __vmalloc_node_range_split(size, align, VMALLOC_START,
+			VMALLOC_END, code_region_start, code_region_end,
+			gfp_mask, PAGE_KERNEL, vm_flags, node, caller);
+}
+
+void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+{
+	return __vmalloc_node_range_split(size, 1, VMALLOC_START, VMALLOC_END,
+			code_region_start, code_region_end,
+			gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+			NUMA_NO_NODE, __builtin_return_address(0));
+}
+
+void *vmalloc_user(unsigned long size)
+{
+	return __vmalloc_node_range_split(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+			code_region_start, code_region_end,
+			GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+			VM_USERMAP, NUMA_NO_NODE,
+			__builtin_return_address(0));
+}
+
+void *vmalloc_32_user(unsigned long size)
+{
+	return __vmalloc_node_range_split(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+			code_region_start, code_region_end,
+			GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+			VM_USERMAP, NUMA_NO_NODE,
+			__builtin_return_address(0));
+}
+
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8955da5c47cf..40426f3a9bdf 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1690,12 +1691,12 @@ u64 bpf_jit_alloc_exec_limit(void)
 void *bpf_jit_alloc_exec(unsigned long size)
 {
 	/* Memory is intended to be executable, reset the pointer tag. */
-	return kasan_reset_tag(vmalloc(size));
+	return kasan_reset_tag(module_alloc(size));
 }
 
 void bpf_jit_free_exec(void *addr)
 {
-	return vfree(addr);
+	return module_memfree(addr);
 }
 
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls.
 */

From patchwork Tue Feb 20 20:32:56 2024
X-Patchwork-Submitter: Maxwell Bland
X-Patchwork-Id: 774365
From: Maxwell Bland
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 4/4] arm64: dynamic enforcement of pmd-level PXNTable
Date: Tue, 20 Feb 2024 14:32:56 -0600
Message-Id: <20240220203256.31153-5-mbland@motorola.com>
In-Reply-To: <20240220203256.31153-1-mbland@motorola.com>
References: <20240220203256.31153-1-mbland@motorola.com>
In an attempt to protect against write-then-execute attacks, wherein an
adversary stages malicious code into a data page and then later uses a
write gadget to mark the data page executable, arm64 enforces PXNTable
when allocating pmd descriptors during the init process. However, these
protections are not maintained for dynamic memory allocations, creating
an extensive threat surface for write-then-execute attacks targeting
pages allocated through the vmalloc interface.

Straightforward modifications to the pgalloc interface allow for the
dynamic enforcement of PXNTable, restricting writable and
privileged-executable code pages to known kernel text, bpf-allocated
programs, and kprobe-allocated pages, all of which have more extensive
verification interfaces than the generic vmalloc region.

This patch adds a preprocessor define to check whether a pmd is
allocated by vmalloc and exists outside of a known code region, and if
so, marks the pmd as PXNTable, protecting over 100 last-level page
tables from manipulation in the process.

Signed-off-by: Maxwell Bland
---
 arch/arm64/include/asm/pgalloc.h | 11 +++++++++--
 arch/arm64/include/asm/vmalloc.h |  5 +++++
 arch/arm64/mm/trans_pgd.c        |  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 237224484d0f..5e9262241e8b 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -13,6 +13,7 @@
 #include
 #include
 
+#define __HAVE_ARCH_ADDR_COND_PMD
 #define __HAVE_ARCH_PGD_FREE
 #include
 
@@ -74,10 +75,16 @@ static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t ptep,
  * of the mm address space.
  */
 static inline void
-pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
+pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep,
+		    unsigned long address)
 {
+	pmdval_t pmd = PMD_TYPE_TABLE | PMD_TABLE_UXN;
 	VM_BUG_ON(mm && mm != &init_mm);
-	__pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
+	if (IS_DATA_VMALLOC_ADDR(address) &&
+	    IS_DATA_VMALLOC_ADDR(address + PMD_SIZE)) {
+		pmd |= PMD_TABLE_PXN;
+	}
+	__pmd_populate(pmdp, __pa(ptep), pmd);
 }
 
 static inline void
diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index dbcf8ad20265..6f254ab83f4a 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -34,4 +34,9 @@ static inline pgprot_t arch_vmap_pgprot_tagged(pgprot_t prot)
 extern unsigned long code_region_start __ro_after_init;
 extern unsigned long code_region_end __ro_after_init;
 
+#define IS_DATA_VMALLOC_ADDR(vaddr) (((vaddr) < code_region_start || \
+				      (vaddr) > code_region_end) && \
+				     ((vaddr) >= VMALLOC_START && \
+				      (vaddr) < VMALLOC_END))
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 7b14df3c6477..7f903c51e1eb 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -69,7 +69,7 @@ static int copy_pte(struct trans_pgd_info *info, pmd_t *dst_pmdp,
 	dst_ptep = trans_alloc(info);
 	if (!dst_ptep)
 		return -ENOMEM;
-	pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
+	pmd_populate_kernel_at(NULL, dst_pmdp, dst_ptep, addr);
 
 	dst_ptep = pte_offset_kernel(dst_pmdp, start);
 	src_ptep = pte_offset_kernel(src_pmdp, start);