Jun Zhao
8966101fa6
lavc/hevc: add aarch64 neon for 12-bit dequant
Implement NEON optimization for HEVC dequant at 12-bit depth.
For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size, which gives
shift = 1 for 4x4, 0 for 8x8, -1 for 16x16 and -2 for 32x32. When shift
is negative, we use shl (shift left) by -shift instead of srshr.
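The shift selection described above can be sketched in scalar C. This is only an illustrative model, not the code from this commit: the function name dequant_12bit_ref is hypothetical, and the treatment of the zero-shift (8x8) case as a no-op is an assumption that follows from the formula rather than something stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the 12-bit dequant shift logic.
 * Illustrative only; the name and structure are assumptions. */
static void dequant_12bit_ref(int16_t *coeffs, int log2_size)
{
    /* HEVC transform blocks are 4x4 .. 32x32, i.e. log2_size 2 .. 5. */
    assert(log2_size >= 2 && log2_size <= 5);

    const int shift = 15 - 12 - log2_size;   /* 1, 0, -1, -2 */
    const int count = 1 << (2 * log2_size);  /* coefficients in the block */

    if (shift > 0) {
        /* Rounding right shift: the scalar counterpart of srshr.
         * Relies on arithmetic >> for negative ints, which common
         * compilers provide. */
        const int offset = 1 << (shift - 1);
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((coeffs[i] + offset) >> shift);
    } else if (shift < 0) {
        /* Negative shift: plain left shift by -shift, the scalar
         * counterpart of shl. The unsigned cast avoids shifting a
         * negative value; the result wraps to 16 bits on common
         * compilers. */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((uint16_t)coeffs[i] << -shift);
    }
    /* shift == 0 (8x8): coefficients are left unchanged. */
}
```

With this model, a 4x4 block halves each coefficient with rounding, an 8x8 block is untouched, and 16x16 and 32x32 blocks are multiplied by 2 and 4 respectively.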
Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_12_c:          9.9 ( 1.00x)
hevc_dequant_4x4_12_neon:       5.7 ( 1.74x)
hevc_dequant_8x8_12_c:          1.7 ( 1.00x)
hevc_dequant_8x8_12_neon:       1.3 ( 1.30x)
hevc_dequant_16x16_12_c:      131.1 ( 1.00x)
hevc_dequant_16x16_12_neon:     7.9 (16.52x)
hevc_dequant_32x32_12_c:       69.7 ( 1.00x)
hevc_dequant_32x32_12_neon:    28.4 ( 2.46x)
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00