FFmpeg/libavcodec
Jun Zhao 89c21b5ab7 lavc/hevc: add aarch64 NEON for Planar prediction
Add NEON-optimized implementation for HEVC intra Planar prediction at
8-bit depth, supporting all block sizes (4x4 to 32x32).

Planar prediction implements bilinear interpolation using an incremental
base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]), reducing
per-row computation from 4 multiply-adds to 1 subtract + 1 multiply.
Uses rshrn for rounded narrowing shifts, eliminating manual rounding
bias. All left[y] values are broadcast in the NEON domain, avoiding
GP-to-NEON transfers.
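
The incremental base update can be illustrated with a scalar C sketch. This is not the NEON code from the commit. It assumes the standard HEVC 8-bit planar formula, in which top[N] is the top-right neighbour and left[N] is the bottom-left neighbour. The names planar_direct, planar_incremental and planar_mismatches are hypothetical and exist only for this illustration. The rounding step written as (sum + (1 << (shift - 1))) >> shift is the scalar equivalent of what a rounding narrowing shift such as rshrn computes per lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct form: four products per sample.
 * top and left must each hold size + 1 samples. */
static void planar_direct(uint8_t *dst, ptrdiff_t stride,
                          const uint8_t *top, const uint8_t *left,
                          int log2_size)
{
    const int size = 1 << log2_size;

    for (int y = 0; y < size; y++)
        for (int x = 0; x < size; x++)
            dst[y * stride + x] =
                (uint8_t)(((size - 1 - x) * left[y] + (x + 1) * top[size] +
                           (size - 1 - y) * top[x] + (y + 1) * left[size] +
                           size) >> (log2_size + 1));
}

/* Incremental form: base[x] holds every term that does not depend on
 * left[y]. Moving to the next row subtracts (top[x] - left[size]). */
static void planar_incremental(uint8_t *dst, ptrdiff_t stride,
                               const uint8_t *top, const uint8_t *left,
                               int log2_size)
{
    const int size  = 1 << log2_size;
    const int shift = log2_size + 1;
    int base[32], dec[32];

    for (int x = 0; x < size; x++) {
        /* Row 0: (x+1)*top[N] + (N-1)*top[x] + 1*left[N] */
        base[x] = (x + 1) * top[size] + (size - 1) * top[x] + left[size];
        dec[x]  = top[x] - left[size];
    }

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            /* One multiply-add per sample, then a rounding shift. */
            int sum = base[x] + (size - 1 - x) * left[y];
            dst[y * stride + x] =
                (uint8_t)((sum + (1 << (shift - 1))) >> shift);
            base[x] -= dec[x];   /* base_{y+1}[x] = base_y[x] - dec[x] */
        }
    }
}

/* Hypothetical self-check: counts samples where the two forms differ
 * for a fixed, arbitrary set of neighbour values. */
static int planar_mismatches(int log2_size)
{
    uint8_t top[33], left[33], a[32 * 32], b[32 * 32];
    const int size = 1 << log2_size;
    int bad = 0;

    assert(log2_size >= 2 && log2_size <= 5);

    for (int i = 0; i <= size; i++) {
        top[i]  = (uint8_t)(i * 37 + 11);
        left[i] = (uint8_t)(255 - i * 29);
    }

    planar_direct(a, size, top, left, log2_size);
    planar_incremental(b, size, top, left, log2_size);

    for (int i = 0; i < size * size; i++)
        bad += a[i] != b[i];
    return bad;
}
```

Calling planar_mismatches with log2_size from 2 to 5 covers the 4x4 to 32x32 block sizes named above. In this sketch base[x] never exceeds 16 bits for 8-bit input, which is consistent with keeping the running values in 16-bit vector lanes, though the actual lane layout of the NEON code is not shown here.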

4x4 interleaves row computations across 4 rows to break dependencies.
16x16 uses v19-v22 for persistent base/decrement vectors, avoiding
callee-saved register spills. 32x32 processes 8 rows per loop iteration
(4 iterations total) to reduce code size while maintaining full NEON
utilization.

Speedup over C on Apple M4 (checkasm --bench):

    4x4: 2.25x    8x8: 6.40x    16x16: 9.72x    32x32: 3.21x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-03-30 14:32:10 +00:00