Files
FFmpeg/libswscale
Ramiro Polla 2517c328fc swscale/aarch64: add NEON sws_ops backend
This commit pieces together the previous few commits to implement the
NEON backend for sws_ops.

In essence, a tool which runs on the target (sws_ops_aarch64) is used
to enumerate all the functions that the backend needs to implement. The
list it generates is stored in the repository (ops_entries.c).

The list from above is used at build time by a code generator tool
(ops_asmgen) to implement all the sws_ops functions the NEON backend
supports, and generate a lookup function in C to retrieve the assembly
function pointers.

At runtime, the NEON backend fetches the function pointers to the
assembly functions and chains them together in a continuation-passing
style design, similar to the x86 backend.

The following speedup is observed from legacy swscale to NEON:
A520: Overall speedup=3.780x faster, min=0.137x max=91.928x
A720: Overall speedup=4.129x faster, min=0.234x max=92.424x

And the following from the C sws_ops implementation to NEON:
A520: Overall speedup=5.513x faster, min=0.927x max=14.169x
A720: Overall speedup=4.786x faster, min=0.585x max=20.157x

The slowdowns from legacy to NEON are the same for C/x86. Mostly low
bit-depth conversions that did not perform dithering in legacy.

The 0.585x outlier from C to NEON is gbrpf32le -> gbrapf32le, which is
mostly memcpy with the C implementation. All other conversions are
better.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-03-30 11:38:35 +00:00
..
2025-08-03 13:48:47 +02:00
2025-08-03 13:48:47 +02:00
2025-07-22 19:56:34 +02:00
2026-03-29 12:10:38 +02:00