Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible, but we can just use the special value -1 for this.
The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.
Additionally, it somewhat simplifies implementations.
Signed-off-by: Niklas Haas <git@haasn.dev>
This gives more information about each operation and helps catch issues
earlier on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Instead of awkwardly preserving these from the `SwsOp` itself. This
interpretation lessens the risk of bugs as a result of changing the plane
swizzle mask without updating the corresponding components.
After this commit, the plane swizzle mask is automatically taken into
account; i.e. the src_comps mask is always interpreted as if the read op
was in-order (unswizzled).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to have the execution code directly swizzle the plane
pointers, instead of swizzling the data via SWS_OP_SWIZZLE. This can be used
to, for example, extract a subset of the input/output planes for partial
processing of split graphs (e.g. subsampled chroma, or independent alpha),
or just to skip an SWS_OP_SWIZZLE operation.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And use it in ff_sws_compile_pass() instead of hard-coding the check there.
This check will become more sophisticated in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignored the effects of SWS_OP_RSHIFT, such as for p010 and related
MSB-aligned formats. (This is actually a bug)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explictly provided one.
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned in both practice (and in the actual backend).
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.
However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).
To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.
This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
Turns out these are not, in fact, purely informative - but the optimizer
can take them into account. This should be documented properly.
I tried to think of a way to avoid needing this in the optimizer, but any
way I could think of would require shoving this to SwsReadWriteOp, which I
am particularly unwilling to do.
This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.
As the function is entirely unused, just remove it.
This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].
[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.