Instead of this needlessly complicated dance of allocating on-stack copies
of SwsOpList only to iterate with AVERROR(EAGAIN).
This was originally thought to be useful for compiling multiple ops at once,
but even that can be solved in easier ways.
Signed-off-by: Niklas Haas <git@haasn.dev>
And plumb it all the way through to the SwsCompiledOp. This is cleaner than
setting this metadata up front in x86/ops.c; and more importantly, it
allows us to determine the amount of over-read programmatically during ops
setup.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Mainly so that implementations can consult sws->flags, to e.g. decide
whether the kernel needs to be bit-exact.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The pixel format for the process loops has already been checked for
validity at this point.
The switch added in e4abfb8e51 returns AVERROR(EINVAL) in the default
case without calling ff_sws_op_chain_free(chain), but there's no need
to free it since we mark this branch as unreachable.
Annoying C-ism: we can't overload on the function type, even though the
arguments will always be pointers. We can't even get away with using
(void *) in the function signature, despite casts to void * being
technically valid.
Avoid the issue altogether by moving the process loop into the
type-specific template, and simply referring to the correct compiled
process function at runtime. Hopefully, the compiler will be able to merge
these into a single implementation. GCC, at least, compiles them down into
a single implementation plus three stubs that just jmp to the correct one.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of in each read() function. Not only is this slightly faster, due
to promoting more tail calls, but it also allows us to have operation chains
that don't start with a read.
Also simplifies the implementations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on the exact
GCC version and flags, so in practice it's not a substitute for a SIMD
implementation.