Needed to allow us to phase out SwsComps.unused altogether.
It's worth pointing out the change in semantics: while `unused` tracks the
unused *input* components, the mask is defined as representing the
computed *output* components.
This is 90% the same, except for read/write, pack/unpack, and clear, which
are the only operations that can change the number of components.
Signed-off-by: Niklas Haas <git@haasn.dev>
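The input/output distinction described above can be illustrated with a minimal
sketch. Everything here is an assumption made for illustration: `CompMask`,
`OpKind`, `mask_first_n()` and `output_mask()` are hypothetical names, not the
actual libswscale definitions, and only read and clear are modelled among the
component-changing ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit component mask: bit i set means that output
 * component i is computed by the op. Not the real libswscale type. */
typedef uint8_t CompMask;

enum OpKind { OP_READ, OP_CLEAR, OP_OTHER };

/* Mask with the lowest n bits set, e.g. n = 3 gives 0x7. */
static CompMask mask_first_n(int n)
{
    assert(n >= 0 && n <= 4);
    return (CompMask) ((1u << n) - 1);
}

/* Derive the mask of computed *output* components from the input mask. */
static CompMask output_mask(enum OpKind kind, CompMask in,
                            int read_elems, CompMask cleared)
{
    switch (kind) {
    case OP_READ:  return mask_first_n(read_elems); /* read defines the components */
    case OP_CLEAR: return in | cleared;             /* clear adds components */
    default:       return in;                       /* component count unchanged */
    }
}
```

For any op that cannot change the number of components, the output mask is
simply the input mask, which is why the two representations mostly agree.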
Since this now has an explicit mask, we can just check that directly, instead
of relying on the unused comps hack/trick.
This also allows us to distinguish between fixed-value and arbitrary-value
clears by having the SwsOpEntry contain NAN values iff the entry supports
any clear value.
Signed-off-by: Niklas Haas <git@haasn.dev>
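A rough sketch of the NAN convention described above, assuming a simplified
entry layout. `ClearEntry` and `clear_entry_matches()` are hypothetical names
and do not reflect the real SwsOpEntry fields; the point is only that NAN acts
as a wildcard while any other value must match exactly.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical table entry: one expected clear value per component.
 * NAN means "this kernel accepts any clear value" for that component. */
typedef struct ClearEntry {
    unsigned mask;   /* components cleared by this kernel */
    float value[4];  /* required value, or NAN for "any" */
} ClearEntry;

static bool clear_entry_matches(const ClearEntry *e, unsigned op_mask,
                                const float op_value[4])
{
    assert(e && op_value);
    if (e->mask != op_mask)
        return false; /* the explicit mask is checked directly */
    for (int i = 0; i < 4; i++) {
        if (!(op_mask & (1u << i)))
            continue; /* component is not cleared, nothing to compare */
        if (isnan(e->value[i]))
            continue; /* wildcard: any clear value is accepted */
        if (e->value[i] != op_value[i])
            return false; /* fixed-value kernel, value mismatch */
    }
    return true;
}
```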
This has the side benefit of not relying on the q2pixel macro to avoid division
by zero, since we can now explicitly avoid operating on undefined clear values.
Signed-off-by: Niklas Haas <git@haasn.dev>
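To illustrate the point about undefined clear values, here is a small sketch
that converts only the components selected by a mask, so a zero denominator is
never divided by. The `Rational` type and `clear_values_to_float()` are
stand-ins invented for this example, not the actual code or the q2pixel macro.

```c
#include <assert.h>

/* Hypothetical rational; den == 0 marks an undefined value. */
typedef struct Rational { int num, den; } Rational;

/* Convert only the components selected by mask; others are left untouched. */
static void clear_values_to_float(const Rational q[4], unsigned mask,
                                  float out[4])
{
    for (int i = 0; i < 4; i++) {
        if (!(mask & (1u << i)))
            continue;           /* undefined component: never divided */
        assert(q[i].den != 0);  /* defined components must be valid */
        out[i] = (float) q[i].num / (float) q[i].den;
    }
}
```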
Instead of this needlessly complicated dance of allocating on-stack copies
of SwsOpList only to iterate with AVERROR(EAGAIN).
This was originally thought to be useful for compiling multiple ops at once,
but even that can be solved in easier ways.
Signed-off-by: Niklas Haas <git@haasn.dev>
Allows implementations to use more advanced logic to determine whether an
operation is compatible.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And plumb it all the way through to the SwsCompiledOp. This is cleaner than
setting up this metadata up-front in x86/ops.c; and more importantly, it
allows us to determine the amount of over-read programmatically during ops
setup.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Mainly so that implementations can consult sws->flags, to e.g. decide
whether the kernel needs to be bit-exact.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Mainly so setup functions can look at table->block_size, and perhaps
the table flags, as well as anything else we may add in the future.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is basically a cosmetic commit that groups all of the parameters to
setup() into a single struct, as well as the return type. This gives the
immediate benefit of freeing up 8 bytes per op table entry, though the
main motivation will come in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
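The shape of such a refactor might look roughly like the sketch below. All
names (`SetupParams`, `SetupResult`, `setup_fn`, `example_setup()`) and all
fields are hypothetical and chosen only for illustration; the real structs may
differ. The idea shown is that new inputs or outputs become new struct fields
without touching the function signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of everything a setup function may consult. */
typedef struct SetupParams {
    const void *op;  /* the operation being compiled */
    int block_size;  /* illustrative: taken from the op table */
    unsigned flags;  /* illustrative: context flags */
} SetupParams;

/* Hypothetical bundle of everything a setup function produces. */
typedef struct SetupResult {
    void *priv;      /* private data for the kernel */
    int over_read;   /* illustrative extra output */
} SetupResult;

/* One signature for every setup function. */
typedef int (*setup_fn)(const SetupParams *params, SetupResult *out);

static int example_setup(const SetupParams *params, SetupResult *out)
{
    assert(params && out);
    if (params->block_size <= 0)
        return -1;       /* reject invalid input */
    out->priv = NULL;    /* this example needs no private data */
    out->over_read = 0;  /* and reads nothing past the end */
    return 0;
}
```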
Passing a struct/union by value can generally be inefficient.
Additionally, when the struct/union is declared to be aligned,
whether it really stays aligned when passed as a parameter by
value is unclear.
This fixes build errors like this, with MSVC targeting 32 bit ARM:
libswscale/ops_chain.h(91): error C2719: 'unnamed-parameter': formal parameter with requested alignment of 16 won't be aligned
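A minimal sketch of the pattern, using a stand-in type. `Priv`, `Chain` and
`chain_append()` are hypothetical and only mimic the situation: a 16-byte
aligned union is passed by const pointer and copied inside the callee, so no
over-aligned by-value parameter is needed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an over-aligned private data union. */
typedef union Priv {
    alignas(16) uint8_t u8[16];
    uint32_t u32[4];
} Priv;

typedef struct Chain {
    Priv priv[8];
    int num;
} Chain;

/* Takes `const Priv *` instead of `Priv`, so the parameter itself carries
 * no alignment requirement; the copy is made into aligned storage. */
static int chain_append(Chain *chain, const Priv *priv)
{
    assert(chain && priv);
    if (chain->num >= 8)
        return -1; /* chain is full */
    chain->priv[chain->num++] = *priv;
    return 0;
}
```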
Instead of defining multiple patterns for the dither ops, just define a
single generic function that branches internally. The branch is well predicted
and very cheap; at least on my end, the difference is within margin of error.
Signed-off-by: Niklas Haas <git@haasn.dev>
If the branch is placed inside the loop, gcc at least falls back to scalar
code, so it is better to split it up and guard the entire loop.
Signed-off-by: Niklas Haas <git@haasn.dev>
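The loop structure described above can be sketched as follows. This is an
illustrative stand-in, not the actual dither kernel: `BLOCK`,
`dither_block()` and the `enabled` array are assumptions. The branch is
evaluated once per component and guards the whole inner loop, which keeps the
loop body free of control flow.

```c
#include <assert.h>

#define BLOCK 8 /* hypothetical block size */

/* Add one dither row to each enabled component of a block of pixels. */
static void dither_block(float px[4][BLOCK], const float row[BLOCK],
                         const int enabled[4])
{
    assert(px && row && enabled);
    for (int c = 0; c < 4; c++) {
        if (!enabled[c])
            continue; /* branch guards the entire loop below */
        for (int i = 0; i < BLOCK; i++)
            px[c][i] += row[i]; /* straight-line body, easy to vectorize */
    }
}
```

Testing `enabled[c]` inside the inner loop instead would compute the same
result, but it is the form the message above reports as falling back to scalar
code with gcc.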
This is useful especially for the special case of scaling by common
not-quite-power-of-two constants like 255 or 1023.
Signed-off-by: Niklas Haas <git@haasn.dev>
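For context, 255 and 1023 are 2^8 - 1 and 2^10 - 1. The message above does not
spell out how such constants are handled, so the following only shows a
well-known identity that makes them cheap, x * (2^n - 1) == (x << n) - x, and
is not necessarily what the commit implements. `scale_pow2_minus_1()` is a
hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by (2^n - 1) using a shift and a subtraction.
 * Arithmetic is unsigned, so it wraps modulo 2^32 exactly like x * k would. */
static uint32_t scale_pow2_minus_1(uint32_t x, int n)
{
    assert(n > 0 && n < 32);
    return (x << n) - x;
}
```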
This fixes the following compiler error, if compiling with MSVC
for ARM (32 bit):
src/libswscale/ops_chain.c(48): error C2719: 'priv': formal parameter with requested alignment of 16 won't be aligned
This change shouldn't affect the performance of this operation
(which in itself probably isn't relevant); rather than copying the
contents of the SwsOpPriv struct from the stack as a parameter,
it now gets copied straight from the caller function's stack frame.
Separately from this issue, MSVC 17.8 and 17.9 hit an internal
compiler error when compiling libswscale/ops.c, but older and
newer versions do compile it successfully.