x86: Fast-math float-to-BF16 truncation via PSRLD
This patch has been committed to the master branch: 8718727509b – x86: Implement Fast-Math Float Truncation to BF16 via PSRLD Instruction
This patch optimizes float to __bf16 conversion under -ffast-math by using a simple bit shift (psrld $16) instead of a dedicated conversion instruction, removing the requirement for AVX512-BF16 or AVX-NE-CONVERT hardware.
The Insight
BF16…
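To make the bit-shift idea concrete, here is a minimal scalar sketch in portable C. It is not code from the patch: the helper names `float_to_bf16_trunc`, `bf16_to_float`, and `bf16_trunc_self_check` are hypothetical, and the sketch assumes `float` is IEEE-754 binary32. It keeps only the upper 16 bits of the float's bit pattern, which is what a 16-bit logical right shift of each 32-bit lane would produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): truncate a float to a BF16
   bit pattern by keeping the upper 16 bits (sign, 8 exponent bits,
   top 7 mantissa bits). Assumes float is IEEE-754 binary32. */
uint16_t float_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bits without aliasing UB */
    return (uint16_t)(bits >> 16);   /* scalar analogue of a 16-bit lane shift */
}

/* Hypothetical helper: widen a BF16 bit pattern back to float by
   placing it in the upper half of a 32-bit word. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Small self-check showing that the low 16 mantissa bits are dropped,
   not rounded. */
void bf16_trunc_self_check(void)
{
    assert(float_to_bf16_trunc(1.0f) == 0x3F80);
    /* 1 + 2^-8 is not representable in BF16; truncation falls back to 1.0 */
    assert(bf16_to_float(float_to_bf16_trunc(1.00390625f)) == 1.0f);
    /* 1 + 2^-7 is representable and survives the round trip */
    assert(bf16_to_float(float_to_bf16_trunc(1.0078125f)) == 1.0078125f);
}
```

Note that this truncates rather than rounds to nearest, so it can differ in the last BF16 bit from a rounding conversion. That loss of accuracy is the trade-off the excerpt associates with -ffast-math.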
Month: September 2024
x86: Extend AVX512 vectorized popcount to small vector modes
This patch has been committed to the master branch: 85910e650a6 – x86: Extend AVX512 Vectorization for Popcount in Various Modes
This patch enables vectorized popcount (population count / bit counting) for small vector modes that were previously unhandled: V2QI, V4QI, V8QI, V2HI, V4HI, and V2SI. These are the partial-vectorization modes used when loop trip counts…
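As a rough illustration of the kind of source loop these small modes correspond to, the sketch below counts bits over eight bytes (the shape of V8QI) and over two 32-bit words (the shape of V2SI). The function names are hypothetical and the code is not taken from the patch. Whether GCC actually emits vector popcount instructions for these loops depends on the compiler version and target flags; something like `-O2 -mavx512bitalg -mavx512vpopcntdq -mavx512vl` is an assumption here, not something the excerpt states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: per-element popcount over 8 bytes.
   The fixed trip count of 8 matches the V8QI shape named above. */
void popcount_u8x8(const uint8_t *restrict src, uint8_t *restrict dst)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)__builtin_popcount(src[i]);
}

/* Hypothetical example: per-element popcount over two 32-bit words,
   matching the V2SI shape. */
void popcount_u32x2(const uint32_t *restrict src, uint32_t *restrict dst)
{
    for (int i = 0; i < 2; i++)
        dst[i] = (uint32_t)__builtin_popcount(src[i]);
}

/* Small self-check of the scalar semantics; the vectorized code must
   produce the same per-element results. */
void popcount_self_check(void)
{
    const uint8_t in[8] = {0, 1, 3, 7, 15, 255, 128, 170};
    uint8_t out[8];
    popcount_u8x8(in, out);
    assert(out[0] == 0 && out[5] == 8 && out[7] == 4);
}
```

Inspecting the assembly (for example with `gcc -S`) is the reliable way to confirm whether a given compiler build vectorizes these loops.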
i386: Integrate BFmode in ix86_preferred_simd_mode for auto-vectorization
This patch has been committed to the master branch: b851bce473d – i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode
This small but important patch tells GCC’s auto-vectorizer how to choose the best SIMD register width for BF16 operations, enabling automatic vectorization of BF16 loops without manual intrinsics.
The Problem
GCC’s vectorizer queries the target backend…
i386: Native BF16 comparisons with AVX10.2 – vcmppbf16 and vcomsbf16
These patches have been committed to the master branch:
f77435aa391 – i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2
89d50c45048 – i386: Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16
61622cfa463 – i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2
These three patches enable native BF16 comparison support in GCC’s x86 backend using AVX10.2 instructions. Together they cover…