This is far more commonly used without an offset than with one; having the
offset there prevents these special cases from doing much good.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
It was a bit clunky, lacked semantic contextual information, and made it
harder to reason about the effects of extending this struct. There should be
zero runtime overhead, since this is already a big union.
I made the changes in this commit by hand, but due to the length and noise
level of the commit, I used Opus 4.6 to verify that I did not accidentally
introduce any bugs or typos.
Signed-off-by: Niklas Haas <git@haasn.dev>
Just define these directly as integer arrays; there's really no point in
having them re-use SwsSwizzleOp; the only place this was ever even remotely
relevant was in the no-op check, which any decent compiler should already
be capable of optimizing into a single 32-bit comparison.
Signed-off-by: Niklas Haas <git@haasn.dev>
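A minimal sketch of the kind of check the commit refers to, using hypothetical
names (`ExampleSwizzle`, `example_swizzle_is_noop` are illustrative, not the
actual swscale API): a swizzle stored as four plain byte indices, with the
no-op test written as a byte comparison that a compiler can collapse into a
single 32-bit compare.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: a swizzle pattern stored as four plain integer indices,
 * as the commit suggests, rather than re-using SwsSwizzleOp. */
typedef struct ExampleSwizzle {
    uint8_t order[4]; /* destination component <- source component */
} ExampleSwizzle;

/* Compare against the identity pattern {0,1,2,3}; a four-byte memcmp is
 * something any decent compiler can turn into one 32-bit comparison. */
static int example_swizzle_is_noop(const ExampleSwizzle *s)
{
    static const uint8_t identity[4] = { 0, 1, 2, 3 };
    return memcmp(s->order, identity, sizeof(identity)) == 0;
}
```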
This allows reads to directly embed filter kernels. This is because, in
practice, a filter needs to be combined with a read anyways. To accomplish
this, we define filter ops as their semantic high-level operation types, and
then have the optimizer fuse them with the corresponding read/write ops
(where possible).
Ultimately, something like this will be needed anyways for subsampled formats,
and doing it here is just incredibly clean and beneficial compared to each
of the several alternative designs I explored.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This commit merely adds the definitions. The implementations will follow.
It may seem a bit impractical to have these filter ops given that they
break the usual 1:1 association between operation inputs and outputs, but
the design path I chose will have these filter "pseudo-ops" end up migrating
towards the read/write ops for CPU implementations, which don't benefit from
any ability to hide the intermediate memory internally the way e.g. a fused
Vulkan compute shader might.
What we gain from this design, on the other hand, is considerably cleaner
high-level code, which doesn't need to concern itself with low-level
execution details at all, and can just freely insert these ops wherever
it needs to. The dispatch layer will take care of actually executing these
by implicitly splitting apart subpasses.
To handle out-of-range values and so on, the filters by necessity have to
also convert the pixel range. I have settled on using floating point types
as the canonical intermediate format - not only does this save us from having
to define e.g. I32 as a new intermediate format, but it also allows these
operations to chain naturally into SWS_OP_DITHER, which will basically
always be needed after a filter pass anyways.
The one exception here is for point sampling, which would rather preserve
the input type. I'll worry about this optimization at a later point in time.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
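To illustrate why f32 chains naturally into a dither pass, here is a hedged
sketch under assumed semantics (the function names are hypothetical, and the
clamping/truncation details are illustrative, not the actual backend code):
the filter output stays in float, the dither offset is added in float, and
only the final conversion truncates back to the integer type.

```c
#include <stdint.h>

/* Hypothetical sketch: with f32 as the canonical intermediate, a filter's
 * range conversion composes naturally with a subsequent dither step. */
static float example_u8_to_f32(uint8_t x)
{
    return (float) x; /* value range preserved, just re-typed */
}

static uint8_t example_dither_to_u8(float x, float dither)
{
    /* add the dither offset before truncating back to the integer type */
    float v = x + dither;
    if (v < 0.0f)
        v = 0.0f;
    if (v > 255.0f)
        v = 255.0f;
    return (uint8_t) v; /* C float-to-int conversion truncates toward zero */
}
```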
This moves the logic from tests/sws_ops into the library itself, where it
can be reused by e.g. the aarch64 asmgen backend to iterate over all possible
operation types it can expect to see.
Signed-off-by: Niklas Haas <git@haasn.dev>
Annoyingly, access to order_src/dst requires access to the SwsOpList, so
we have to append that data after the fact.
Maybe this is another incremental tick in favor of `SwsReadWriteOp` in the
ever-present question in my head of whether the plane order should go there
or into SwsOpList.
Signed-off-by: Niklas Haas <git@haasn.dev>
More useful than just allowing it to "modify" the ops; in practice,
modification would leave the contents undefined anyways, so we might as well
have this function take care of freeing the list afterwards.
Will make things simpler with regards to subpass splitting.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This improves the debugging experience. These are all internal structs so
there is no need to worry about ABI stability as a result of adding flags.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize by introducing more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible; now we can simply use the special value -1 for this purpose.
The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.
Additionally, it somewhat simplifies implementations.
Signed-off-by: Niklas Haas <git@haasn.dev>
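A small sketch of the idea, with hypothetical names and an assumed
representation (a per-component index into the dither matrix, where -1 marks
a component to pass through untouched, e.g. a copied alpha plane):

```c
/* Hypothetical sketch of a per-component dither selector: the special
 * value -1 means "do not dither this component". */
static float example_dither_component(float value, int matrix_idx,
                                      const float *matrix)
{
    if (matrix_idx < 0) /* -1: leave this component untouched */
        return value;
    return value + matrix[matrix_idx];
}
```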
This gives more information about each operation and helps catch issues
earlier on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Instead of awkwardly preserving these from the `SwsOp` itself. This
interpretation lessens the risk of bugs as a result of changing the plane
swizzle mask without updating the corresponding components.
After this commit, the plane swizzle mask is automatically taken into
account; i.e. the src_comps mask is always interpreted as if the read op
was in-order (unswizzled).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to have the execution code directly swizzle the plane
pointers, instead of swizzling the data via SWS_OP_SWIZZLE. This makes it
possible, for example, to extract a subset of the input/output planes for
partial processing of split graphs (e.g. subsampled chroma, or independent
alpha), or simply to skip an SWS_OP_SWIZZLE operation.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
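The plane-pointer variant of a swizzle can be sketched as follows (hedged:
`example_swizzle_planes` and its signature are illustrative, not the actual
dispatch code); the point is that only four pointers move, no pixel data:

```c
#include <stdint.h>

/* Hypothetical sketch: instead of moving pixel data with SWS_OP_SWIZZLE,
 * re-order the plane pointers themselves before dispatch. */
static void example_swizzle_planes(uint8_t *dst[4], uint8_t *src[4],
                                   const int order[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[order[i]];
}
```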
And use it in ff_sws_compile_pass() instead of hard-coding the check there.
This check will become more sophisticated in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignores the effects of SWS_OP_RSHIFT, such as for p010 and related
MSB-aligned formats. (This is actually a bug.)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explicitly provided one.
This function was assuming that the bits are MSB-aligned, but they are
LSB-aligned both in practice and in the actual backend.
Also update the documentation of SwsPackOp to make this clearer.
Fixes an incorrect omission of a clamp after decoding e.g. rgb4, since
the max value range was incorrectly determined as 0 as a result of unpacking
the MSB bits instead of the LSB bits:
bgr4 -> gray:
[ u8 XXXX -> +XXX] SWS_OP_READ : 1 elem(s) packed >> 1
[ u8 .XXX -> +++X] SWS_OP_UNPACK : {1 2 1 0}
[ u8 ...X -> +++X] SWS_OP_SWIZZLE : 2103
[ u8 ...X -> +++X] SWS_OP_CONVERT : u8 -> f32
[f32 ...X -> .++X] SWS_OP_LINEAR : dot3 [...]
[f32 .XXX -> .++X] SWS_OP_DITHER : 16x16 matrix + {0 3 2 5}
+ [f32 .XXX -> .++X] SWS_OP_MIN : x <= {255 _ _ _}
[f32 .XXX -> +++X] SWS_OP_CONVERT : f32 -> u8
[ u8 .XXX -> +++X] SWS_OP_WRITE : 1 elem(s) planar >> 0
(X = unused, + = exact, 0 = zero)
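The range computation implied by the fix can be sketched in one line (the
function name is hypothetical): with LSB-aligned packing, the maximum value of
a `bits`-wide packed component is simply 2^bits - 1, independent of the
container depth, so e.g. a 3-bit rgb4 component has a max of 7 rather than
the 0 that the MSB-aligned interpretation produced.

```c
/* Hypothetical sketch: for an LSB-aligned packed component occupying
 * `bits` bits, every one of the low `bits` bits may be set, so the
 * maximum representable value is (2^bits - 1). */
static unsigned example_packed_max(int bits)
{
    return (1u << bits) - 1;
}
```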
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.
However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).
To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.
This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
It turns out these are not, in fact, purely informative: the optimizer
can take them into account. This should be documented properly.
I tried to think of a way to avoid needing this in the optimizer, but any
way I could think of would require shoving this to SwsReadWriteOp, which I
am particularly unwilling to do.
This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.
As the function is entirely unused, just remove it.
This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].
[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
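The fallback described above can be sketched roughly as follows (hedged: the
function name and the choice of zero-padding are assumptions for illustration;
the actual code may replicate the last pixel instead):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the padding fallback: when the remaining row data
 * is smaller than one block, copy the tail into a block-sized scratch
 * buffer so the kernel can always consume a full block safely. */
static void example_copy_padded(uint8_t *block, size_t block_size,
                                const uint8_t *row, size_t remaining)
{
    memcpy(block, row, remaining);
    /* pad the rest of the block; zeroing is the simplest choice here,
     * repeating the last pixel would also be plausible */
    memset(block + remaining, 0, block_size - remaining);
}
```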
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.