From patchwork Wed Oct 4 10:16:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Prathamesh Kulkarni X-Patchwork-Id: 729280 Delivered-To: patch@linaro.org Received: by 2002:a5d:4a4e:0:b0:31d:da82:a3b4 with SMTP id v14csp260630wrs; Wed, 4 Oct 2023 03:17:03 -0700 (PDT) X-Google-Smtp-Source: AGHT+IECT1glVnEgeONSfOv8zniIxPlwEwUVhuPH1wZf0cRmFV3NroCAD4BLUDJwEbI1Plk2WYT7 X-Received: by 2002:a17:907:77c7:b0:9ae:1872:d01a with SMTP id kz7-20020a17090777c700b009ae1872d01amr1361028ejc.76.1696414622789; Wed, 04 Oct 2023 03:17:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696414622; cv=none; d=google.com; s=arc-20160816; b=vD/iWgEgHURc9etmKh1fdZWop+QguFdSXbTV/kVx5GFhE2u11WFit9q9j97+Wtq0rc LrYP0EcxiPKA8lrRcL+J1HMVus708JjnJYe92QaPqyPSYLRDo+FuXsTW7ReJwEmOP3nQ JeQM2t3raxYRqBfeEJMq7ZAy72mjiP4T1BnnriaP+2COdYXzNQ7PvOEImYCBgi2MzA8i Sa8afJNnVp5f7nUTB8IDYNQ/ztrnkKYMst2mxnpOw9IC6WmdVnnCr0L2VKWO4dwclKGb zmv5RPiUVTcINQJOxG7aF0ljULT/6uvTHOR1Witt3w4V0j1t8Q8GbTswg6obU+lLYvwI 2Xdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:to:subject:message-id:date:from :mime-version:dkim-signature:dmarc-filter:delivered-to; bh=wGUWkmjS8V4x6CWvwnRSAEFiUKIlhft+KGO7tTM7K0o=; fh=mQwGPwJ8ccpLWXgwK1sB6HsFvBUL8YaB4cdGvFKB/fo=; b=NMkmfZKJWNiaGJbqxuvd1gNPoWnUTyicc8sdCQnnbOj9ELaabCdAG+yLhi2UHhYe+B k0pDfcMYWUpksmHCjNeeCFwzuoGTzqSpAmo3LC6fufXbcPblZu/arrSqmYN8kJhPasGZ UltgFlY29GgYMe/6QFnrSc37kMQAMlNsuFD00/xYnbzoBpE8FQITu9Dy1bsDMYIj/1nu JZvF5q0TqS6KuwfphUC0yBrV0TOSRfl7yX6DtFyt2d6zxhmK1TAzGTFh4uwHIhRQo+xf YuBEpkaKPU5rapT9C9Uegz15jVrl8AWStCx8DDugZZx8toowWEYeJT1Zyeqe3rF73Z9T PU8A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=IRz6wzEV; spf=pass (google.com: domain of gcc-patches-bounces+patch=linaro.org@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+patch=linaro.org@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from server2.sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id k10-20020a1709065fca00b00993a68a3af5si1577398ejv.529.2023.10.04.03.17.02 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 Oct 2023 03:17:02 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+patch=linaro.org@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=IRz6wzEV; spf=pass (google.com: domain of gcc-patches-bounces+patch=linaro.org@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+patch=linaro.org@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6E0A03856965 for ; Wed, 4 Oct 2023 10:17:01 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by sourceware.org (Postfix) with ESMTPS id 287FC3857705 for ; Wed, 4 Oct 2023 10:16:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 287FC3857705 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wr1-x431.google.com with SMTP id ffacd0b85a97d-3215f19a13aso1896803f8f.3 for ; Wed, 04 Oct 2023 03:16:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1696414610; x=1697019410; darn=gcc.gnu.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=wGUWkmjS8V4x6CWvwnRSAEFiUKIlhft+KGO7tTM7K0o=; b=IRz6wzEV75f6RTfvp9A6MfolQJ8zDGPieVHOeOjCr507oBV1CkoPoISfOE5bs3eLEl uLXLBmD/Al9GI2sbq6yyyVo7wPk/EKCNiA/ZscuFkb7Caq4P6FDuUoVPfYgMbc/2zcj4 VDMcSX7Cg8tosqjGX/+Qq+2iMgqymroWg7GJ6YvIL++zbksM+Vi+nhSHya90WVSuVB26 clyca87o1hULTMhmce9iHSnw1wPtW5yeRkObTgsa6TOlAx3KaHsEuygMla3+27woX0QQ S3JNefzpbEASZrgQJJzEjTk5j7UlN/kv4wZ0Cgdt7m73EScVGn1GWjeHwJA7fXhKHTXS OPxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696414610; x=1697019410; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=wGUWkmjS8V4x6CWvwnRSAEFiUKIlhft+KGO7tTM7K0o=; b=nr6QVapHsnO+mafU7ROiR8vEqPemWez2dOL0s1jvDe4PDHnUsoSxoZd8kt+aqUTs2t c/nyiMp8xQzAfCbfAn5hYn4cI26ABRIcU/M1app8tmLFrkERG/+2DrnSNt7zQybsNZ5F 1DPCSDIs1xLArwohR8wq32ltzDw3ykrwuere+7+REOMe4MHkOPKM/BmMsyWxQk4rbQKb 1IPL/1FdMMU4zY2UEtvXhqdhjjZCWiwl8IzVC+AecO4oLV6CknUMpHq0uh+dUB+lYNFK LUR4Sbpd6xh1UTTZnz1OASeCKPDSmGn4cnRSucQn/vrOwPLNgjCevOXBh2mxJBQbI3ZN DH7A== X-Gm-Message-State: AOJu0Yyoz9JXUNJ+Uv+Xama748xvTe5FvJdGRpzQEpv06025EyE/JZMP nQPS1SmipYnCbosYXEm1hCGuHj/PefJ2ObkMX76WqpH/KG1sbGEgffA= X-Received: by 2002:a5d:6408:0:b0:320:38:9e14 with SMTP id z8-20020a5d6408000000b0032000389e14mr1522016wru.7.1696414609964; Wed, 04 Oct 2023 03:16:49 -0700 (PDT) MIME-Version: 1.0 From: Prathamesh Kulkarni Date: Wed, 4 Oct 2023 15:46:14 +0530 Message-ID: Subject: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding To: gcc Patches , Richard Sandiford X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+patch=linaro.org@gcc.gnu.org Hi, The attached patch attempts to fix PR111648. As mentioned in PR, the issue is when a1 is a multiple of vector length, we end up creating following encoding in result: { base_elem, arg[0], arg[1], ... } (assuming S = 1), where arg is chosen input vector, which is incorrect, since the encoding originally in arg would be: { arg[0], arg[1], arg[2], ... } For the test-case mentioned in PR, vectorizer pass creates VEC_PERM_EXPR where: arg0: { -16, -9, -10, -11 } arg1: { -12, -5, -6, -7 } sel = { 3, 4, 5, 6 } arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. Since a1 = 4 and arg_len = 4, it ended up creating the result with following encoding: res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3 = { -11, -12, -5 } So for res[3], it used S = (-5) - (-12) = 7 And hence computed it as -5 + 7 = 2. instead of selecting arg1[2], ie, -6. The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple of vector length, so a1 ... ae select elements only from stepped part of the pattern from input vector and return false for this case. Since the vectors are VLS, fold_vec_perm_cst then sets: res_npatterns = res_nelts res_nelts_per_pattern = 1 which seems to fix the issue by encoding all the elements. The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively, which used a1 = 0, and thus selected arg1[0]. I removed Case 4 because it was already covered in test_nunits_min_4, and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... } and added a new Case 9 to test for this issue. Passes bootstrap+test on aarch64-linux-gnu with and without SVE, and on x86_64-linux-gnu. Does the patch look OK ? Thanks, Prathamesh [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding. gcc/ChangeLog: PR tree-optimization/111648 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1 is a multiple of vector length. (test_nunits_min_2): Remove Case 4 and move Case 5 to ... (test_nunits_min_4): ... here and rename case numbers. Also add Case 9. gcc/testsuite/ChangeLog: PR tree-optimization/111648 * gcc.dg/vect/pr111648.c: New test. diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 4f8561509ff..c5f421d6b76 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, return false; } - /* Ensure that the stepped sequence always selects from the same - input pattern. */ + /* Ensure that the stepped sequence always selects from the stepped + part of same input pattern. */ unsigned arg_npatterns = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0) : VECTOR_CST_NPATTERNS (arg1); @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1, *reason = "step is not multiple of npatterns"; return false; } + + /* If a1 is a multiple of len, it will select base element of input + vector resulting in following encoding: + { base_elem, arg[0], arg[1], ... } where arg is the chosen input + vector. This encoding is not originally present in arg, since it's + defined as: + { arg[0], arg[1], arg[2], ... }. */ + + if (multiple_p (a1, arg_len)) + { + if (reason) + *reason = "selecting base element of input vector"; + return false; + } } return true; @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode) tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) }; validate_res (2, 2, res, expected_res); } - - /* Case 4: mask = {0, 0, 1, ...} // (1, 3) - Test that the stepped sequence of the pattern selects from - same input pattern. Since input vectors have npatterns = 2, - and step (a2 - a1) = 1, step is not a multiple of npatterns - in input vector. So return NULL_TREE. */ - { - tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1); - tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1); - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); - - vec_perm_builder builder (len, 1, 3); - poly_uint64 mask_elems[] = { 0, 0, 1 }; - builder_push_elems (builder, mask_elems); - - vec_perm_indices sel (builder, 2, len); - const char *reason; - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, - &reason); - ASSERT_TRUE (res == NULL_TREE); - ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); - } - - /* Case 5: mask = {len, 0, 1, ...} // (1, 3) - Test that stepped sequence of the pattern selects from arg0. - res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3) */ - { - tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); - tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); - poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); - - vec_perm_builder builder (len, 1, 3); - poly_uint64 mask_elems[] = { len, 0, 1 }; - builder_push_elems (builder, mask_elems); - - vec_perm_indices sel (builder, 2, len); - tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); - - tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) }; - validate_res (1, 3, res, expected_res); - } } } @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode) validate_res (1, 3, res, expected_res); } - /* Case 4: + /* Case 4: mask = {len, 1, 2, ...} // (1, 3) + Test that stepped sequence of the pattern selects from arg0. + res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3) */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1); + tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { len, 1, 2 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel); + + tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) }; + validate_res (1, 3, res, expected_res); + } + + /* Case 5: sel = { len, 0, 2, ... } // (1, 3) This should return NULL because we cross the input vectors. Because, @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode) ASSERT_TRUE (!strcmp (reason, "crossed input vectors")); } - /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2 + /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2 mask = { 0, len, 1, len + 1, ...} // (2, 2) res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2) @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode) validate_res (2, 2, res, expected_res); } - /* Case 6: Test combination in sel, where one pattern is dup and other + /* Case 7: Test combination in sel, where one pattern is dup and other is stepped sequence. sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3) res = { arg0[0], arg0[0], arg0[0], @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode) validate_res (2, 3, res, expected_res); } - /* Case 7: PR111048: Check that we set arg_npatterns correctly, + /* Case 8: PR111048: Check that we set arg_npatterns correctly, when arg0, arg1 and sel have different number of patterns. arg0 is of shape (1, 1) arg1 is of shape (4, 1) @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode) ASSERT_TRUE (res == NULL_TREE); ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns")); } + + /* Case 9: PR111648 - a1 is multiple of vector length, + which results in incorrect encoding. Verify that we return + NULL for this case. + sel = { base_elem, len, len+1, ... } // (1, 3) + In this case, the single pattern is: { base_elem len, len+1, ...} + Let's assume that base_elem is used for indexing into arg0, + and a1 ... ae chooses elements from arg1. + So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3) + Which creates an incorrect encoding with S = arg1[1] - arg1[0] + while the original encoding in arg1 is + arg1: { arg1[0], arg1[1], arg1[2], ... } + with S = arg1[2] - arg1[1]. + + As a concrete example, for above PR: + arg0: { -16, -9, -10, -11 } + arg1: { -12, -5, -6, -7 } + sel = { 3, 4, 5, 6 } + + arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3. + Since a1 = 4 and arg_len = 4, it ended up creating the result with + following encoding: + res = { arg0[3], arg1[0], arg1[1] } // (1, 3) + = { -11, -12, -5 } + + So for res[3], it used S = (-5) - (-12) = 7 + And hence computed it as -5 + 7 = 2. + instead of arg1[2], ie, -6, which is the correct value. + Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case. */ + { + tree arg0 = build_vec_cst_rand (vmode, 1, 3); + tree arg1 = build_vec_cst_rand (vmode, 1, 3); + poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)); + + vec_perm_builder builder (len, 1, 3); + poly_uint64 mask_elems[] = { 0, len, len+1 }; + builder_push_elems (builder, mask_elems); + + vec_perm_indices sel (builder, 2, len); + const char *reason; + tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason); + ASSERT_TRUE (res == NULL_TREE); + ASSERT_TRUE (!strcmp (reason, + "selecting base element of input vector")); + } } } diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c new file mode 100644 index 00000000000..093e2b02654 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int a; +int *b = &a; +static int **c = &b; +static int d; +short e; +short f; + +_Bool foo () +{ + f = -21; + for (; f < -5; f++) { + e = f ^ 3; + d = *b; + **c = e; + } + + return d == -6; +} + +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */