46 Commits

Author SHA1 Message Date
Niklas Haas
b4bcb00cd3 swscale/ops: add and use ff_sws_op_list_input/output()
Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
Niklas Haas
68f3886460 swscale/ops_dispatch: split off compile/dispatch code from ops.c
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.

Purely cosmetic; no functional change.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
Niklas Haas
4178c4d430 swscale/ops: remove unneeded macro
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
Niklas Haas
5384908e56 swscale/ops: move pass compilation logic to helper function
Purely cosmetic.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-05 23:34:56 +00:00
Niklas Haas
02e4b45f7f swscale/graph: reintroduce SwsFrame
AVFrame just really doesn't have the semantics we want. However, there a
tangible benefit to having SwsFrame act as a carbon copy of a (subset of)
AVFrame.

This partially reverts commit 67f3627267.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-01 21:57:53 +00:00
Niklas Haas
67f3627267 swscale/graph: nuke SwsImg
This has now become fully redundant with AVFrame, especially since the
existence of SwsPassBuffer. Delete it, simplifying a lot of things and
avoiding reinventing the wheel everywhere.

Also generally reduces overhead, since there is less redundant copying
going on.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-27 16:18:34 +00:00
Niklas Haas
363779a4bb swscale/ops: don't set src/dst_frame_ptr in op_pass_run()
Already set by setup().

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-27 16:18:34 +00:00
Niklas Haas
62dc591a80 swscale/ops: correctly shift pointers for last row handling
The current logic didn't take into account the possible plane shift. Just
re-compute the correctly shifted pointers using the row position.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-27 16:18:34 +00:00
Niklas Haas
bca806d4f9 swscale/ops: avoid stack copies of SwsImg
Instead, precompute the correctly swizzled data and stride in setup()
and just reference the SwsOpExec fields directly.

To avoid the stack copies in handle_tail() we can introduce a temporary
array to hold just the pointers.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-27 16:18:34 +00:00
Niklas Haas
79334c8ca1 swscale/ops: add subsampling shift to SwsOpExec
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-27 16:18:34 +00:00
Niklas Haas
841ca7a2cb swscale/format: pass SwsFormat by ref instead of by value where possible
The one exception is in adapt_colors(), which mutates these structs on
its own stack anyways.
2026-02-26 18:08:49 +00:00
Lynne
1d2e616d5f swscale: add a Vulkan backend for ops.c
Sponsored-by: Sovereign Tech Fund
2026-02-26 14:10:22 +01:00
Lynne
ad452205b6 swscale/ops: add SwsOpBackend.hw_format
Allows to filter hardware formats.

Sponsored-by: Sovereign Tech Fund
2026-02-26 14:10:22 +01:00
Lynne
c911295f09 swscale: forward original frame pointers to ops.c backend
Sponsored-by: Sovereign Tech Fund
2026-02-26 14:10:21 +01:00
Lynne
00907e1244 swscale/ops: realign after adding slice_align
This is a separate commit since it makes it easier to see the changes.

Sponsored-by: Sovereign Tech Fund
2026-02-26 14:10:21 +01:00
Lynne
9c51aa1824 swscale: add SwsCompiledOp.slice_align
Certain backends may not support (or need) slices, since they
would handle slicing themselves.

Sponsored-by: Sovereign Tech Fund
2026-02-26 14:10:21 +01:00
Niklas Haas
ef4a597ad8 swscale/ops: allow excluding components from SWS_OP_DITHER
We often need to dither only a subset of the components. Previously this
was not possible, but we can just use the special value -1 for this.

The main motivating factor is actually the fact that "unnecessary" dither ops
would otherwise frequently prevent plane splitting, since e.g. a copied
alpha plane has to come along for the ride through the whole F32/dither
pipeline.

Additionally, it somewhat simplifies implementations.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-26 13:09:14 +00:00
Niklas Haas
5649ac2b4d swscale/ops: avoid UB in handle_tail()
Stupid NULL + 0 rule.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-26 10:15:52 +00:00
Ramiro Polla
c9977acbc6 swscale/ops: clear range values SWS_OP_{MIN,MAX}
This gives partial range values on conversions with floats, after the
values have been clamped.

 gbrpf32be -> rgb8:
   [f32 XXXX -> zzzX] SWS_OP_READ         : 3 elem(s) planar >> 0
   [f32 ...X -> ...X] SWS_OP_SWAP_BYTES
   [f32 ...X -> ...X] SWS_OP_SWIZZLE      : 2013
   [f32 ...X -> ...X] SWS_OP_LINEAR       : diag3 [[7 0 0 0 0] [0 7 0 0 0] [0 0 3 0 0] [0 0 0 1 0]]
   [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
   [f32 ...X -> ...X] SWS_OP_MAX          : {0 0 0 0} <= x
+    min: {0, 0, 0, _}, max: {nan, nan, nan, _}
   [f32 ...X -> ...X] SWS_OP_MIN          : x <= {7 7 3 _}
+    min: {0, 0, 0, _}, max: {7, 7, 3, _}
   [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u8
+    min: {0, 0, 0, _}, max: {7, 7, 3, _}
   [ u8 ...X -> +XXX] SWS_OP_PACK         : {3 3 2 0}
-    min: {0, _, _, _}, max: {0, _, _, _}
+    min: {0, _, _, _}, max: {255, _, _, _}
   [ u8 .XXX -> +XXX] SWS_OP_WRITE        : 1 elem(s) packed >> 0
-    min: {0, _, _, _}, max: {0, _, _, _}
+    min: {0, _, _, _}, max: {255, _, _, _}
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-02-24 20:25:59 +01:00
Ramiro Polla
9fbb03f428 swscale/tests/sws_ops: don't print unused components in the output
Clean up the output by not printing the flags and range values of
unused components in ff_sws_op_list_print().

 rgb24 -> gray16le:
   [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
-    min: {0, 0, 0, nan}, max: {255, 255, 255, nan}
+    min: {0, 0, 0, _}, max: {255, 255, 255, _}
   [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
-    min: {0, 0, 0, nan}, max: {255, 255, 255, nan}
-  [f32 ...X -> .++X] SWS_OP_LINEAR       : dot3 [[76.843000 150.859000 29.298000 0 0] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0]]
-    min: {0, 0, 0, nan}, max: {65535, 255, 255, nan}
-  [f32 .XXX -> +++X] SWS_OP_CONVERT      : f32 -> u16
-    min: {0, 0, 0, nan}, max: {65535, 255, 255, nan}
-  [u16 .XXX -> +++X] SWS_OP_WRITE        : 1 elem(s) planar >> 0
-    min: {0, 0, 0, nan}, max: {65535, 255, 255, nan}
+    min: {0, 0, 0, _}, max: {255, 255, 255, _}
+  [f32 ...X -> .XXX] SWS_OP_LINEAR       : dot3 [[76.843000 150.859000 29.298000 0 0] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0]]
+    min: {0, _, _, _}, max: {65535, _, _, _}
+  [f32 .XXX -> +XXX] SWS_OP_CONVERT      : f32 -> u16
+    min: {0, _, _, _}, max: {65535, _, _, _}
+  [u16 .XXX -> +XXX] SWS_OP_WRITE        : 1 elem(s) planar >> 0
+    min: {0, _, _, _}, max: {65535, _, _, _}
     (X = unused, z = byteswapped, + = exact, 0 = zero)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-02-24 20:22:12 +01:00
Ramiro Polla
c7c8c31302 swscale/tests/sws_ops: print range values in the output
This gives more information about each operation and helps catch issues
earlier on.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-02-24 19:27:51 +01:00
Niklas Haas
da47951bd7 swscale/ops: lift read op metadata to SwsOpList
Instead of awkwardly preserving these from the `SwsOp` itself. This
interpretation lessens the risk of bugs as a result of changing the plane
swizzle mask without updating the corresponding components.

After this commit, the plane swizzle mask is automatically taken into
account; i.e. the src_comps mask is always interpreted as if the read op
was in-order (unswizzled).

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
1940662ac6 swscale/ops: take plane order into account during noop() check
This helper function now also takes into account the plane order, and only
returns true if the SwsOpList is a true no-op (i.e. the input image may be
exactly ref'd to the output, with no change in plane order, etc.)

As pointed out in the code, this is unlikely to actually matter, but still
technically correct.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
d0cb74881c swscale/ops: fix PRINTQ snprintf buffer size
There is no reason to subtract 1 here; snprintf guarantees zero-termination.
2026-02-19 19:44:46 +00:00
Niklas Haas
70d30056dc swscale/ops: also print plane order when swizzled
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
998cffb432 swscale/ops: add input/output plane swizzle mask to SwsOpList
This can be used to have the execution code directly swizzle the plane
pointers, instead of swizzling the data via SWS_OP_SWIZZLE. This can be used
to, for example, extract a subset of the input/output planes for partial
processing of split graphs (e.g. subsampled chroma, or independent alpha),
or just to skip an SWS_OP_SWIZZLE operation.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
2dfde1531d swscale/ops: reset comp flags on SWS_OP_CLEAR
Even if we clear to a non-integer value.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
9662d1fa97 swscale/optimizer: remove read+write optimization
This optimization is lossy, since it removes important information about the
number of planes to be copied. Subsumed by the more correct

Instead, move this code to the new ff_sws_op_list_is_noop().

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
e96332cb65 swscale/ops: add ff_sws_op_list_is_noop()
And use it in ff_sws_compile_pass() instead of hard-coding the check there.
This check will become more sophisticated in the following commits.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Niklas Haas
c0f49db53d swscale/ops: remove broken value range assumption hack
This information is now pre-filled automatically for SWS_OP_READ when
relevant.

yuv444p10msbbe -> rgb24:
   [u16 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) planar >> 0
   [u16 ...X -> +++X] SWS_OP_SWAP_BYTES
   [u16 ...X -> +++X] SWS_OP_RSHIFT       : >> 6
   [u16 ...X -> +++X] SWS_OP_CONVERT      : u16 -> f32
   [f32 ...X -> ...X] SWS_OP_LINEAR       : matrix3+off3 [...]
   [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 5}
   [f32 ...X -> ...X] SWS_OP_MAX          : {0 0 0 0} <= x
+  [f32 ...X -> ...X] SWS_OP_MIN          : x <= {255 255 255 _}
   [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u8
   [ u8 ...X -> +++X] SWS_OP_WRITE        : 3 elem(s) packed >> 0
    (X = unused, + = exact, 0 = zero)

(This clamp is needed and was incorrectly optimized away before, because the
`SWS_OP_RSHIFT` incorrectly distorted the value range assertion)
2025-12-24 16:37:22 +00:00
Niklas Haas
ede8318e9f swscale/ops: explicitly reset value range after SWS_OP_UNPACK
The current logic implicitly pulled the new value range out of SwsComps using
ff_sws_apply_op_q(), but this was quite ill-formed and not very robust. In
particular, it only worked because of the implicit assumption that the value
range was always set to 0b1111...111.

This actually poses a serious problem for 32-bit packed formats, whose
value range actually does not fit into AVRational. In the past, it only
worked because the value would implicitly overflow to -1, which SWS_OP_UNPACK
would then correctly extract the bits out from again.

In general, it's cleaner (and sufficient) to just explicitly reset the value
range on SWS_OP_UNPACK again.
2025-12-24 16:37:22 +00:00
Niklas Haas
a0032fb40f swscale/ops: use switch/case for updating SwsComps ranges
A bit more readable and makes it clear what the special cases are.
2025-12-24 16:37:22 +00:00
Niklas Haas
5f1be98f62 swscale/ops: add SWS_COMP_SWAPPED
This flag keeps track of whether a pixel is currently byte-swapped or
not. Not needed by current backends, but informative and useful for
catching potential endianness errors.

Updates a lot of FATE tests with a cosmetic diff like this:

 rgb24 -> gray16be:
   [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
   [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
   [f32 ...X -> .++X] SWS_OP_LINEAR       : dot3 [...]
   [f32 .XXX -> +++X] SWS_OP_CONVERT      : f32 -> u16
-  [u16 .XXX -> +++X] SWS_OP_SWAP_BYTES
-  [u16 .XXX -> +++X] SWS_OP_WRITE        : 1 elem(s) planar >> 0
-    (X = unused, + = exact, 0 = zero)
+  [u16 .XXX -> zzzX] SWS_OP_SWAP_BYTES
+  [u16 .XXX -> zzzX] SWS_OP_WRITE        : 1 elem(s) planar >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)

(The choice of `z` to represent swapped integers is arbitrary, but I think
it's visually evocative and distinct from the other symbols)
2025-12-24 16:37:22 +00:00
Niklas Haas
9586b81373 swscale/ops: don't strip SwsComps from SWS_OP_READ
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:

- It ignored the effects of SWS_OP_RSHIFT, such as for p010 and related
  MSB-aligned formats. (This is actually a bug)

- It adds a needless dependency on the "purely informative" src/dst fields
  inside SwsOpList.

- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
  the existing hack of simply ignoring SWAP_BYTES on the value range is not
  a very good solution here.

Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.

I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.

This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explictly provided one.
2025-12-24 16:37:22 +00:00
Niklas Haas
61eca588dc swscale/ops: move ff_sws_op_list_update_comps() to ops.c
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly suported part of the user-facing API (rather than just an
internal-use field).

The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
2025-12-24 16:37:22 +00:00
Niklas Haas
75ba2bf457 swscale/ops: correctly truncate on ff_sws_apply_op_q(SWS_OP_RSHIFT)
Instead of using a "precise" division, simulate the actual truncation.

Note that the division by `den` is unneeded in principle because the
denominator *should* always be 1 for an integer, but this way we don't
explode if the user should happen to pass `4/2` or something.

Fixes a lot of unnecessary clamps w.r.t. xv36, e.g.:

 xv36be -> yuv444p12be:
   [u16 XXXX -> ++++] SWS_OP_READ         : 4 elem(s) packed >> 0
   [u16 ...X -> ++++] SWS_OP_SWAP_BYTES
   [u16 ...X -> ++++] SWS_OP_SWIZZLE      : 1023
   [u16 ...X -> ++++] SWS_OP_RSHIFT       : >> 4
-  [u16 ...X -> ++++] SWS_OP_CONVERT      : u16 -> f32
-  [f32 ...X -> ++++] SWS_OP_MIN          : x <= {4095 4095 4095 _}
-  [f32 ...X -> ++++] SWS_OP_CONVERT      : f32 -> u16
   [u16 ...X -> ++++] SWS_OP_SWAP_BYTES
   [u16 ...X -> ++++] SWS_OP_WRITE        : 3 elem(s) planar >> 0
     (X = unused, + = exact, 0 = zero)
2025-12-20 13:52:45 +00:00
Niklas Haas
d1eaea1a03 swscale/ops: add type assertions to ff_sws_apply_op_q() 2025-12-20 13:52:45 +00:00
Niklas Haas
960cf3015e swscale/ops: add explicit row offset to SwsDitherOp
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.

However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).

To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.

This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
2025-12-15 14:31:58 +00:00
Martin Storsjö
3cc1dc3358 swscale: Remove the unused ff_sws_pixel_type_to_uint
This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.

As the function is entirely unused, just remove it.

This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].

[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
2025-11-21 21:07:34 +00:00
Andreas Rheinhardt
7dc4c4f6f5 swscale/ops: Fix linking with x86 assembly disabled
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-09-04 22:14:39 +02:00
Niklas Haas
982d3a98d0 swscale/x86: add SIMD backend
This covers most 8-bit and 16-bit ops, and some 32-bit ops. It also covers all
floating point operations. While this is not yet 100% coverage, it's good
enough for the vast majority of formats out there.

Of special note is the packed shuffle fast path, which uses pshufb at vector
sizes up to AVX512.
2025-09-01 19:28:36 +02:00
Niklas Haas
a151b426f9 swscale/ops_memcpy: add 'memcpy' backend for plane->plane copies
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.

25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
2025-09-01 19:28:36 +02:00
Niklas Haas
5aef513fb4 swscale/ops_backend: add reference backend basend on C templates
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.

In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so it practice it's not a substitute for a SIMD
implementation.
2025-09-01 19:28:36 +02:00
Niklas Haas
db2bc11a97 swscale/ops: add dispatch layer
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
2025-09-01 19:28:36 +02:00
Niklas Haas
d3ca0e300d swscale/ops_internal: add internal ops backend API
This adds an internal API for ops backends, which are responsible for
compiling op lists into executable functions.
2025-09-01 19:28:36 +02:00
Niklas Haas
16e191c8ef swscale/ops: introduce new low level framework
See docs/swscale-v2.txt for an in-depth introduction to the new approach.

This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
2025-09-01 19:28:36 +02:00