
[compiler-rt][SelectionDAG] Add extendbfsf2 libcall and use it for bf16 extends...

 11 months ago
source link: https://reviews.llvm.org/D151436

Details
Summary

Previously, lowering a bf16 extend resulted in an assertion failure (reproducible on RISC-V with soft FP). The existing code path assumes a libcall is present, and adding the libcall seems like the easiest fix. This libcall _is_ provided by libgcc, which perhaps provides its own motivation for adding it here.

The legalisation code in LegalizeDAG lowers to an anyext and shift, which might be an alternative. However, supporting that would be more invasive than just adding an extra case to the existing libcall lowering logic, and these soft-float targets are unlikely to be ones where we care strongly about BF16 support beyond wanting some basic support for completeness.
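A minimal C sketch of what the anyext+shift lowering computes, assuming the bf16 value is held as a raw 16-bit pattern. The function name `bf16_bits_to_f32` is hypothetical; this models the arithmetic only, not the SelectionDAG nodes LegalizeDAG actually emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling the anyext+shift lowering.
 * After an anyext from i16 to i32 the upper 16 bits are unspecified, but
 * the left shift by 16 discards them, so a zero-extension is equivalent. */
float bf16_bits_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16; /* bf16 bits become the high half */
    float f;
    memcpy(&f, &bits, sizeof f);       /* bitcast i32 -> f32 */
    return f;
}
```

For example, the bf16 pattern `0x3f80` becomes the f32 pattern `0x3f800000`, i.e. 1.0.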

I haven't been able to convince myself that the anyext+shift lowering is always identical to the more elaborate extension performed by the libcall (and if it is, why do the trunc and extend libcalls even exist?). I know @craig.topper was involved in a previous discussion on this, so I'd appreciate your view.

Diff Detail

Unit Tests: Failed

Time | Test
50 ms | x64 debian > LLVM.CodeGen/RISCV::bfloat.ll
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=riscv32 -verify-machineinstrs < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/RISCV/bfloat.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/RISCV/bfloat.ll -check-prefix=RV32I-ILP32
60,050 ms | x64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; "/etc/cmake/bin/cmake" "/var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone" -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE="/usr/bin/python3.9"

Event Timeline

asb created this revision. Thu, May 25, 6:04 AM
asb requested review of this revision. Thu, May 25, 6:04 AM

fp32 has more mantissa bits than bfloat16 (23 vs. 7), but both formats have the same number of exponent bits (8).
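Written out, the two encodings share the sign bit, the 8-bit exponent field $e$ and the exponent bias of 127; only the mantissa width differs. If the 7-bit bf16 mantissa $m$ is extended by appending 16 zero bits, the 23-bit f32 mantissa is $m' = 2^{16} m$, and the value is unchanged for both normals and subnormals:

```latex
\begin{aligned}
\text{bf16 normal:}\quad & (-1)^s\, 2^{e-127}\left(1 + \frac{m}{2^{7}}\right) \\
\text{f32 normal:}\quad & (-1)^s\, 2^{e-127}\left(1 + \frac{m'}{2^{23}}\right)
  = (-1)^s\, 2^{e-127}\left(1 + \frac{2^{16} m}{2^{23}}\right)
  = (-1)^s\, 2^{e-127}\left(1 + \frac{m}{2^{7}}\right) \\
\text{subnormal } (e = 0):\quad & (-1)^s\, 2^{-126}\,\frac{m'}{2^{23}}
  = (-1)^s\, 2^{-126}\,\frac{m}{2^{7}}
\end{aligned}
```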

The trunc libcall exists because the extra mantissa bits in fp32 need to be rounded away when converting to bfloat16. Also, some f32 subnormal values can't be represented exactly in bfloat16. So the conversion can't be done as an integer truncate.
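To illustrate why truncation needs more than an integer truncate, here is a hypothetical sketch of f32-to-bf16 conversion using round-to-nearest-even, next to a plain bit truncation. It is not compiler-rt's or libgcc's implementation, and it does not model exception flags; all function names are illustrative. It also shows one further hazard of a plain truncate: an f32 NaN whose payload lies only in the low 16 bits truncates to the bf16 infinity encoding.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (bitcast). */
float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical f32 -> bf16 conversion with round-to-nearest-even. */
uint16_t f32_to_bf16_rne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and top payload bits, and force the quiet bit so
         * the result cannot collapse into the infinity encoding. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Adding 0x7fff plus the lowest kept bit carries into bit 16 when the
     * discarded half is above the midpoint, and rounds exact ties towards
     * an even result. Finite values too large for bf16 carry into the
     * infinity encoding. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}

/* Plain integer truncation, shown for comparison. */
uint16_t f32_to_bf16_truncate(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}
```

For example, the f32 pattern `0x3f80c000` lies above the midpoint between bf16 `0x3f80` and `0x3f81`, so rounding gives `0x3f81` while truncation gives `0x3f80`.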

For extend, we should only need to append 0s to the end of the mantissa. +0.0 and -0.0 are encoded with an all-zero mantissa and exponent in both encodings. Infinity is encoded with a special (all-ones) exponent and an all-zero mantissa in both formats. NaN uses the same exponent as infinity but a non-zero mantissa; if the mantissa is already non-zero, appending zeros doesn't change that. Appending zeros to the mantissa of normals and denormals shouldn't change their value.
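Since bf16 has only 65536 bit patterns, this argument can be checked exhaustively. The sketch below, with hypothetical function names, compares the shift-based extension against a reference that decodes each bf16 value from its sign, exponent and mantissa fields and rebuilds it arithmetically. It assumes `float` is IEEE 754 binary32 and only checks that NaNs stay NaNs, without comparing payloads.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Shift-based extension: bf16 bits become the high half of an f32. */
static float extend_by_shift(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* 2^e by repeated doubling/halving; exact in double for the range used. */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

/* Reference: decode the bf16 fields (1 sign, 8 exponent, 7 mantissa bits,
 * bias 127) and compute the value arithmetically. */
static float extend_by_value(uint16_t h) {
    int sign = h >> 15;
    int exp = (h >> 7) & 0xff;
    int man = h & 0x7f;
    double mag;
    if (exp == 0xff)
        mag = man ? NAN : INFINITY;              /* NaN or infinity */
    else if (exp == 0)
        mag = man * pow2(-126 - 7);              /* zero or subnormal */
    else
        mag = (man + 128) * pow2(exp - 127 - 7); /* normal, implicit 1 */
    return (float)(sign ? -mag : mag);
}

/* Count bf16 patterns on which the two extensions disagree. */
int count_mismatches(void) {
    int mismatches = 0;
    for (uint32_t h = 0; h <= 0xffffu; ++h) {
        float a = extend_by_shift((uint16_t)h);
        float b = extend_by_value((uint16_t)h);
        if (isnan(b))
            mismatches += !isnan(a);
        else if (a != b || (signbit(a) != 0) != (signbit(b) != 0))
            mismatches += 1; /* also distinguishes +0.0 from -0.0 */
    }
    return mismatches;
}
```

`count_mismatches()` is expected to return 0, consistent with the reasoning above.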


And then we'd just lose out on FE_INVALID being set if the input is a signalling NaN. It seems libgcc does have some support for setting these exception flags (on some platforms at least, with the right support hooks implemented), while compiler-rt has none. So I think that justifies libgcc providing the libcall. Thanks for helping clear that up.
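The inputs where the two approaches can observably differ are signalling NaNs. A minimal sketch of classifying them, assuming the IEEE 754-2008 convention that the most significant mantissa bit is the quiet bit; the function names are hypothetical. Because bf16's quiet bit (bit 6) lands on f32's quiet bit (bit 22) after the shift, the shift-based extension passes an sNaN through unchanged and raises no flag, whereas an IEEE 754 format conversion signals invalid and returns a quiet NaN.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the bf16 pattern is a signalling NaN (quiet bit clear). */
bool bf16_is_snan(uint16_t h) {
    bool all_ones_exp = (h & 0x7f80u) == 0x7f80u;
    bool nonzero_man  = (h & 0x007fu) != 0;
    bool quiet_bit    = (h & 0x0040u) != 0;
    return all_ones_exp && nonzero_man && !quiet_bit;
}

/* True if the f32 pattern is a signalling NaN (quiet bit clear). */
bool f32_is_snan(uint32_t bits) {
    return (bits & 0x7f800000u) == 0x7f800000u &&
           (bits & 0x007fffffu) != 0 &&
           (bits & 0x00400000u) == 0;
}

/* Raw shift-based extension of the bit pattern. No floating-point
 * operation is performed, so no exception flag can be raised. */
uint32_t bf16_bits_extend(uint16_t h) {
    return (uint32_t)h << 16;
}
```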


You would only need to worry about sNaNs with the constrained fptrunc.

