Shreesh Adiga
|
5085432f8b
|
avutil/crc: add aarch64 NEON PMULL+EOR3 SIMD implementation for av_crc
Implemented the clmul (carry-less multiplication) algorithm for aarch64
using the PMULL and EOR3 instructions. The logic and structure are the
same as in the x86 clmul implementation, with the constants slightly
rearranged to suit the PMULL and PMULL2 instructions.
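
For readers unfamiliar with the primitives, the following is a minimal portable C model of what the instructions compute, not the code from this commit. All names (`u128_sketch`, `clmul64_sketch`, `eor3_sketch`, `fold_sketch`) are hypothetical, and the folding step is an assumption based on how clmul-based CRC routines are commonly structured: each 64-bit half of the state is multiplied by a precomputed constant (low halves as PMULL does, high halves as PMULL2 does), and the two products are combined with the next data block in a single three-way XOR, as EOR3 does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value, standing in for one NEON vector register. */
typedef struct { uint64_t lo, hi; } u128_sketch;

/* Carry-less 64x64 -> 128 bit multiply: schoolbook multiplication in which
 * partial products are combined with XOR instead of addition. This models
 * the result of one PMULL on 64-bit polynomial operands. */
static u128_sketch clmul64_sketch(uint64_t a, uint64_t b)
{
    u128_sketch r = { 0, 0 };
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i)                      /* avoid an undefined 64-bit shift */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Three-way XOR, the operation EOR3 performs on whole vectors. */
static uint64_t eor3_sketch(uint64_t a, uint64_t b, uint64_t c)
{
    return a ^ b ^ c;
}

/* Assumed shape of one folding step: multiply the low halves (PMULL),
 * multiply the high halves (PMULL2), then XOR both products with the
 * next 16 bytes of input. The constants in k are placeholders here. */
static u128_sketch fold_sketch(u128_sketch state, u128_sketch k,
                               u128_sketch data)
{
    u128_sketch p_lo = clmul64_sketch(state.lo, k.lo);
    u128_sketch p_hi = clmul64_sketch(state.hi, k.hi);
    u128_sketch r = {
        eor3_sketch(p_lo.lo, p_hi.lo, data.lo),
        eor3_sketch(p_lo.hi, p_hi.hi, data.hi),
    };
    return r;
}
```

As a quick sanity check of the model, `clmul64_sketch(3, 3)` yields 5 in the low half, because (x + 1)^2 = x^2 + 1 when coefficients are added modulo 2.
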
Benchmarked on Android (Termux) on a MediaTek Dimensity 9400 SoC:
./tests/checkasm/checkasm --test=crc --bench --runs=12
benchmarking with native FFmpeg timers
nop: 0.2
checkasm: SVE 128 bits, using random seed 2502847808
checkasm: bench runs 4096 (1 << 12)
CRC:
- crc.crc [OK]
PMULL:
- crc.crc [OK]
checkasm: all 10 tests passed
crc_8_ATM_c:                       26.0 ( 1.00x)
crc_8_ATM_pmull_eor3:               0.7 (37.17x)
crc_8_EBU_c:                       46.4 ( 1.00x)
crc_8_EBU_pmull_eor3:               1.5 (31.47x)
crc_16_ANSI_c:                     36.3 ( 1.00x)
crc_16_ANSI_pmull_eor3:             1.1 (31.70x)
crc_16_ANSI_LE_c:                  90.9 ( 1.00x)
crc_16_ANSI_LE_pmull_eor3:          2.8 (32.30x)
crc_16_CCITT_c:                   118.0 ( 1.00x)
crc_16_CCITT_pmull_eor3:            3.7 (32.00x)
crc_24_IEEE_c:                      1.6 ( 1.00x)
crc_24_IEEE_pmull_eor3:             0.1 (12.19x)
crc_32_IEEE_c:                     45.2 ( 1.00x)
crc_32_IEEE_pmull_eor3:             1.4 (31.39x)
crc_32_IEEE_LE_c:                  49.1 ( 1.00x)
crc_32_IEEE_LE_crc:                 2.5 (19.51x)
crc_32_IEEE_LE_pmull_eor3:          1.5 (32.84x)
crc_custom_polynomial_c:           45.3 ( 1.00x)
crc_custom_polynomial_pmull_eor3:   1.3 (35.16x)
|
2026-03-11 14:03:36 +00:00 |
|