FFmpeg/libavcodec
Jun Zhao 0886e50c6b lavc/hevc: add aarch64 neon for 8-bit dequant
Implement a NEON-optimized version of the HEVC dequant function for 8-bit depth.

The NEON implementation uses srshr (Signed Rounding Shift Right), which
performs both the rounding-offset addition and the right shift in a single
instruction.

Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers

Performance benchmark on Apple M4 (lower is better; speedup over C in parentheses):
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_8_c:                                   11.3 ( 1.00x)
hevc_dequant_4x4_8_neon:                                 6.3 ( 1.78x)

hevc_dequant_8x8_8_c:                                   33.9 ( 1.00x)
hevc_dequant_8x8_8_neon:                                 6.6 ( 5.11x)

hevc_dequant_16x16_8_c:                                153.8 ( 1.00x)
hevc_dequant_16x16_8_neon:                               9.0 (17.02x)

hevc_dequant_32x32_8_c:                                 78.1 ( 1.00x)
hevc_dequant_32x32_8_neon:                              31.9 ( 2.45x)

Note on Performance Anomaly:
hevc_dequant_32x32_8_c is faster than hevc_dequant_16x16_8_c (78.1 vs 153.8)
even though it processes four times as many coefficients. This is due to Clang
auto-vectorizing the C code only for sizes >= 32x32.
Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00