Makes various pieces of code that expect to get a SWS_OP_READ more robust,
and also allows us to generalize to introduce more input op types in the
future (in particular, I am looking ahead towards filter ops).
Signed-off-by: Niklas Haas <git@haasn.dev>
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.
Purely cosmetic; no functional change.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
AVFrame just really doesn't have the semantics we want. However, there is a
tangible benefit to having SwsFrame act as a carbon copy of (a subset of)
AVFrame.
This partially reverts commit 67f3627267.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This has now become fully redundant with AVFrame, especially given the
existence of SwsPassBuffer. Delete it, simplifying a lot of things and
avoiding reinventing the wheel everywhere.
Also generally reduces overhead, since there is less redundant copying
going on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The current logic didn't take into account the possible plane shift. Just
re-compute the correctly shifted pointers using the row position.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead, precompute the correctly swizzled data and stride in setup()
and just reference the SwsOpExec fields directly.
To avoid the stack copies in handle_tail() we can introduce a temporary
array to hold just the pointers.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
We often need to dither only a subset of the components. Previously this
was not possible, but we can just use the special value -1 for this.
The main motivating factor is that "unnecessary" dither ops would otherwise
frequently prevent plane splitting, since e.g. a copied alpha plane would
have to come along for the ride through the whole F32/dither pipeline.
Additionally, it somewhat simplifies implementations.
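The idea can be illustrated with a minimal sketch; the names and matrix here
are hypothetical stand-ins, not the actual swscale implementation. A
per-component offset of -1 marks a component that should pass through
undithered:

```c
#include <assert.h>

/* Hypothetical sketch: -1 in a per-component offset field means "do not
 * dither this component"; any other value rotates into a (toy) 1x4
 * dither matrix. Illustrative only. */
static const float matrix4[4] = { 0.0f, 0.5f, 0.25f, 0.75f };

static float dither_component(float value, int comp_offset, int pos)
{
    if (comp_offset < 0)        /* -1: component passes through untouched */
        return value;
    return value + matrix4[(pos + comp_offset) & 3];
}
```

With such a convention, a copied alpha plane keeps its offset at -1 and never
forces the dither op to touch it.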
Signed-off-by: Niklas Haas <git@haasn.dev>
This gives more information about each operation and helps catch issues
earlier on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Instead of awkwardly preserving these from the `SwsOp` itself. This
interpretation lessens the risk of bugs as a result of changing the plane
swizzle mask without updating the corresponding components.
After this commit, the plane swizzle mask is automatically taken into
account; i.e. the src_comps mask is always interpreted as if the read op
was in-order (unswizzled).
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This helper function now also takes into account the plane order, and only
returns true if the SwsOpList is a true no-op (i.e. the input image may be
exactly ref'd to the output, with no change in plane order, etc.)
As pointed out in the code, this is unlikely to actually matter, but still
technically correct.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to have the execution code directly swizzle the plane
pointers, instead of swizzling the data via SWS_OP_SWIZZLE. This can be used
to, for example, extract a subset of the input/output planes for partial
processing of split graphs (e.g. subsampled chroma, or independent alpha),
or just to skip an SWS_OP_SWIZZLE operation.
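A minimal sketch of the pointer-level swizzle, with an assumed 4-entry
`order` array (illustrative, not the actual SwsOpExec layout) where
`order[i]` names the source plane feeding output slot i:

```c
#include <stdint.h>

/* Illustrative sketch: apply a plane order (swizzle) to the plane pointer
 * array itself, instead of emitting a data-level SWS_OP_SWIZZLE. The
 * `order` array is a hypothetical stand-in for the new metadata. */
static void swizzle_plane_ptrs(uint8_t *dst[4], uint8_t *const src[4],
                               const int order[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[order[i]];
}
```

Because only four pointers are permuted per pass, this is effectively free
compared to shuffling the pixel data itself.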
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This optimization is lossy, since it removes important information about the
number of planes to be copied. Instead, move this code to the new, more
correct ff_sws_op_list_is_noop().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And use it in ff_sws_compile_pass() instead of hard-coding the check there.
This check will become more sophisticated in the following commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The current logic implicitly pulled the new value range out of SwsComps using
ff_sws_apply_op_q(), but this was quite ill-formed and not very robust. In
particular, it only worked because of the implicit assumption that the value
range was always set to 0b1111...111.
This actually poses a serious problem for 32-bit packed formats, whose
value range does not fit into AVRational. In the past, it only worked
because the value would implicitly overflow to -1, from which SWS_OP_UNPACK
would then correctly extract the bits again.
In general, it's cleaner (and sufficient) to just explicitly reset the value
range on SWS_OP_UNPACK again.
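The overflow in question can be demonstrated in isolation (without AVRational
itself): the value range maximum of a 32-bit packed format, 2^32 - 1, does
not fit into a signed 32-bit numerator and wraps to -1, i.e. the all-ones
bit pattern that the unpack op could still extract correct bits from:

```c
#include <stdint.h>

/* Minimal demonstration of the overflow described above: converting
 * 2^32 - 1 to a signed 32-bit integer is implementation-defined in C,
 * but yields -1 (all bits set) on two's-complement targets. */
static int32_t wrap_to_i32(uint64_t v)
{
    return (int32_t)v;
}
```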
The current behavior of assuming the value range implicitly on SWS_OP_READ
has a number of serious drawbacks and shortcomings:
- It ignores the effects of SWS_OP_RSHIFT, such as for p010 and related
MSB-aligned formats. (This is actually a bug)
- It adds a needless dependency on the "purely informative" src/dst fields
inside SwsOpList.
- It is difficult to reason about when acted upon by SWS_OP_SWAP_BYTES, and
the existing hack of simply ignoring SWAP_BYTES on the value range is not
a very good solution here.
Instead, we need a more principled way for the op list generating code
to communicate extra metadata about the operations read to the optimizer.
I think the simplest way of doing this is to allow the SwsComps field attached
to SWS_OP_READ to carry additional, user-provided information about the values
read.
This requires changing ff_sws_op_list_update_comps() slightly to not completely
overwrite SwsComps on SWS_OP_READ, but instead merge the implicit information
with the explicitly provided one.
I think this is ultimately a better home, since the semantics of this are
not really tied to optimization itself; and because I want to make it an
explicitly supported part of the user-facing API (rather than just an
internal-use field).
The secondary motivating reason here is that I intend to use internal
helpers of `ops.c` inside the next commit. (Though this is a weak reason
on its own, and not sufficient to justify this move by itself.)
To improve decorrelation between components, we offset the dither matrix
slightly for each component. This is currently done by adding a hard-coded
offset of {0, 3, 2, 5} to each of the four components, respectively.
However, this represents a serious challenge when re-ordering SwsDitherOp
past a swizzle, or when splitting an SwsOpList into multiple sub-operations
(e.g. for decoupling luma from subsampled chroma when they are independent).
To fix this on a fundamental level, we have to keep track of the offset per
channel as part of the SwsDitherOp metadata, and respect those values at
runtime.
This commit merely adds the metadata; the update to the underlying backends
will come in a follow-up commit. The FATE change is merely due to the
added offsets in the op list print-out.
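The runtime use of the per-channel offsets can be sketched as follows; the
matrix size, indexing scheme and function name are illustrative assumptions,
not the actual backend code:

```c
/* Sketch: instead of a hard-coded {0, 3, 2, 5}, each channel carries its
 * own offset in the op metadata, and the dither matrix row lookup is
 * rotated by it at runtime. 8x8 matrix assumed for illustration. */
#define MATRIX_SIZE 8

static int dither_index(int y, int x, int chan_offset)
{
    /* rotate the row lookup by the per-channel offset */
    return ((y + chan_offset) % MATRIX_SIZE) * MATRIX_SIZE + x;
}
```

With the offset stored in the metadata, swizzling or splitting the op list
only has to permute the offset array alongside the components.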
This function uses ff_sws_pixel_type_size to switch on the
size of the provided type. However, ff_sws_pixel_type_size returns
a size in bytes (from sizeof()), not a size in bits. Therefore,
this would previously never return the right thing but always
hit the av_unreachable() below.
As the function is entirely unused, just remove it.
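A hypothetical reconstruction of the bug pattern (names and cases are
illustrative stand-ins for the removed function, not the original source):

```c
/* The switch expected a size in bits, but ff_sws_pixel_type_size()
 * returns bytes (from sizeof), so no case could ever match and control
 * always reached the av_unreachable() path. */
static int type_size_bytes(int is_u8)   /* stand-in for ff_sws_pixel_type_size */
{
    return is_u8 ? 1 : 4;               /* sizeof(uint8_t), sizeof(float) */
}

static int matches_bit_case(int is_u8)
{
    switch (type_size_bytes(is_u8)) {
    case 8:  return 1;  /* intended for 8-bit types, never hit */
    case 32: return 1;  /* intended for 32-bit types, never hit */
    default: return 0;  /* always taken (the av_unreachable path) */
    }
}
```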
This fixes compilation with MSVC 2026 18.0 when targeting ARM64,
which previously hit an internal compiler error [1].
[1] https://developercommunity.visualstudio.com/t/Internal-Compiler-Error-targeting-ARM64-/10962922
This covers most 8-bit and 16-bit ops, and some 32-bit ops. It also covers all
floating point operations. While this is not yet 100% coverage, it's good
enough for the vast majority of formats out there.
Of special note is the packed shuffle fast path, which uses pshufb at vector
sizes up to AVX512.
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.
25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
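The decomposition idea can be sketched like this; the struct and row loop are
illustrative assumptions, not the actual swscale representation:

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: when an op list reduces to "copy these bytes, clear those",
 * each output row becomes a plain memcpy (from the input row) or memset
 * (with a constant fill, e.g. for an opaque alpha plane). */
typedef struct CopyOp {
    int is_clear;       /* 1: memset with `fill`; 0: memcpy from src */
    uint8_t fill;
    size_t len;         /* bytes per row */
} CopyOp;

static void run_row(const CopyOp *op, uint8_t *dst, const uint8_t *src)
{
    if (op->is_clear)
        memset(dst, op->fill, op->len);
    else
        memcpy(dst, src, op->len);
}
```

This explains the speedups quoted above: yuv444p -> yuva444p is three row
copies plus one row fill, with no per-pixel work at all.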
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so in practice it's not a substitute for a SIMD
implementation.
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
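The tail fallback can be sketched as follows, assuming a fixed block size and
a stand-in for the compiled op chain (both illustrative, not the actual
implementation):

```c
#include <string.h>
#include <stdint.h>

#define BLOCK 16  /* assumed SIMD block size, for illustration */

static void process_block(uint8_t *buf)  /* stand-in for the compiled ops */
{
    for (int i = 0; i < BLOCK; i++)
        buf[i] = (uint8_t)(buf[i] + 1);
}

static void process_row_with_tail(uint8_t *row, int width)
{
    int x = 0;
    for (; x + BLOCK <= width; x += BLOCK)
        process_block(&row[x]);
    if (x < width) {                      /* tail narrower than one block */
        uint8_t tmp[BLOCK] = { 0 };
        memcpy(tmp, &row[x], width - x);  /* pad into a temp buffer */
        process_block(tmp);               /* run one full block safely */
        memcpy(&row[x], tmp, width - x);  /* write back only valid pixels */
    }
}
```

The main loop thus never needs bounds checks; only the final partial block
pays for the extra copies.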
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.