Add a NEON-optimized implementation of HEVC intra Planar prediction at
8-bit depth, supporting all block sizes (4x4 to 32x32).

Planar prediction is a bilinear interpolation. It is computed with an
incremental base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]),
which reduces the per-row computation from 4 multiply-adds to 1 subtract
+ 1 multiply. The final shift uses rshrn (rounding shift right narrow),
so no manual rounding bias is needed. All left[y] values are broadcast
in the NEON domain, avoiding GP-to-NEON transfers.
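
The incremental update can be illustrated with a scalar C sketch. This
is not the NEON code from this patch; the function names planar_ref,
planar_inc and planar_versions_match are hypothetical, and the sketch
assumes the usual HEVC layout in which top[N] and left[N] hold the
top-right and bottom-left reference samples. Here base[x] collects
every term that does not depend on left[y], and adding N before the
shift models what rshrn does by itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Direct planar formula: 4 multiply-adds per pixel (n = 1 << log2_size). */
static void planar_ref(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
                       const uint8_t *left, int log2_size)
{
    int n = 1 << log2_size;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = (uint8_t)
                (((n - 1 - x) * left[y] + (x + 1) * top[n] +
                  (n - 1 - y) * top[x] + (y + 1) * left[n] + n)
                 >> (log2_size + 1));
}

/* Incremental form: per pixel, 1 multiply-add and 1 subtract. */
static void planar_inc(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
                       const uint8_t *left, int log2_size)
{
    int n = 1 << log2_size;
    int base[32], dec[32];

    for (int x = 0; x < n; x++) {
        /* base_0[x]: every term of row 0 that does not involve left[y] */
        base[x] = (x + 1) * top[n] + (n - 1) * top[x] + left[n];
        /* constant per-row decrement */
        dec[x]  = top[x] - left[n];
    }
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            int sum = base[x] + (n - 1 - x) * left[y];
            /* "+ n" is the rounding that rshrn #(log2_size + 1) applies */
            dst[y * stride + x] = (uint8_t)((sum + n) >> (log2_size + 1));
            base[x] -= dec[x];  /* base_{y+1}[x] = base_y[x] - dec[x] */
        }
    }
}

/* Returns 1 when both versions agree on a deterministic test block. */
static int planar_versions_match(int log2_size)
{
    int n = 1 << log2_size;
    uint8_t top[33], left[33], a[32 * 32], b[32 * 32];
    uint32_t seed = 12345u;

    for (int i = 0; i <= n; i++) {
        seed = seed * 1103515245u + 12345u;
        top[i] = (uint8_t)(seed >> 16);
        seed = seed * 1103515245u + 12345u;
        left[i] = (uint8_t)(seed >> 16);
    }
    planar_ref(a, n, top, left, log2_size);
    planar_inc(b, n, top, left, log2_size);
    return memcmp(a, b, (size_t)n * n) == 0;
}
```

Calling planar_versions_match() with log2_size 2 through 5 compares the
two forms for the 4x4 to 32x32 block sizes. The forms are algebraically
identical, so the outputs should match exactly.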

The 4x4 path interleaves the computation of all 4 rows to break
dependency chains. The 16x16 path keeps its persistent base and
decrement vectors in v19-v22, so the callee-saved registers v8-v15 are
never touched and need no spills. The 32x32 path processes 8 rows per
loop iteration (4 iterations total), which reduces code size while
maintaining full NEON utilization.

Speedup over C on Apple M4 (checkasm --bench):

  4x4:   2.25x
  8x8:   6.40x
  16x16: 9.72x
  32x32: 3.21x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>