Instead of implicitly relying on SwsComps.unused, which contains the exact
same information. (cf. ff_sws_op_list_update_comps)
Signed-off-by: Niklas Haas <git@haasn.dev>
The implementation of AARCH64_SWS_OP_LINEAR loops over elements of this mask
to determine which *output* rows to compute. However, the mask it uses is set
to `op->comps.unused`, which is a mask of unused *input* rows. As such, it
should be looking at `next->comps.unused` instead.
This did not result in problems in practice, because none of the linear
matrices happened to trigger this case (more input columns than output rows).
Signed-off-by: Niklas Haas <git@haasn.dev>
Needed to allow us to phase out SwsComps.unused altogether.
It's worth pointing out the change in semantics: while `unused` tracks the
unused *input* components, the new mask is defined as representing the
computed *output* components.
These are 90% the same, except for read/write, pack/unpack, and clear, which
are the only operations that can change the number of components.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes this logic a lot simpler and less brittle. We can trivially adjust the
list of linear masks that are required, whenever it changes as a result of any
future modifications.
Signed-off-by: Niklas Haas <git@haasn.dev>
Using libswscale/tests/sws_ops -summarize lets us see which kernels are
actually needed by real op lists.
Note: I'm working on a separate series which will make this kernel
implementation whack-a-mole game obsolete altogether, by generating a list
of all possible op kernels at compile time.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is far more commonly used without an offset than with one, so having the
offset there prevents these special cases from doing much good.
Signed-off-by: Niklas Haas <git@haasn.dev>
The first vector is %2, not %3. This was never triggered before because none
of the existing masks hit this exact case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Since this now has an explicit mask, we can just check that directly, instead
of relying on the unused comps hack.
Additionally, this allows us to distinguish between fixed-value and
arbitrary-value clears by having the SwsOpEntry contain NaN values iff it
supports any clear value.
Signed-off-by: Niklas Haas <git@haasn.dev>
This does come with a slight change in behavior, as we now don't print the
range information in the case that the range is only known for *unused*
components. However, in practice, that's already guaranteed by update_comps()
stripping the range info explicitly in this case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly excluding NAN values if ignore_den0 is set. This
gives callers more explicit control over which values to print, and in
doing so, makes sure "unintended" NaN values are properly printed as such.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
This accidentally unconditionally overwrote the entire clear mask, since
Q(n) always set the denominator to 1, resulting in all channels being
cleared instead of just the ones with nonzero denominators.
Signed-off-by: Niklas Haas <git@haasn.dev>
This currently completely fails for images smaller than 12x12; and even in that
case, the limited resolution makes these tests a bit useless.
At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.
Incidentally, 8x8 is the smallest size that passes the SSIM check.
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling, which the
previous code completely ignored.
Simpler, more robust, and more correct. Now valgrind passes for 100% of
format conversions for me, with and without scaling.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.
While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.
I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.
Signed-off-by: Niklas Haas <git@haasn.dev>
The x86 kernel, for example, assumes that at least one block is processed,
so avoid calling this with an empty width. This is currently only possible
when e.g.
operating on an unpadded, very small image whose total linesize is less than
a single block.
Signed-off-by: Niklas Haas <git@haasn.dev>
This code had two issues:
1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block
We can fix both of these issues by rewriting the way the tail size is derived.
In the non-offset case, and assuming only 1 tail block:
aligned_w - safe_width
= num_blocks * block_size - (num_blocks - 1) * block_size
= block_size
Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
tail_size = pass->width - safe_width <= aligned_w - safe_width
In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.
Signed-off-by: Niklas Haas <git@haasn.dev>
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.
If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.
Signed-off-by: Niklas Haas <git@haasn.dev>
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.
Signed-off-by: Niklas Haas <git@haasn.dev>
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that does not span a whole number of bytes. In this case,
`block_size_in` would previously be miscomputed (to e.g. 0), which is
obviously wrong.
Signed-off-by: Niklas Haas <git@haasn.dev>
As well as weird edge cases like trying to filter `monow` and pixels landing
in the middle of a byte. Realistically, this will never happen - we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.
However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.
Signed-off-by: Niklas Haas <git@haasn.dev>
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().
Signed-off-by: Niklas Haas <git@haasn.dev>
This code made the input read conditional on the byte count, but not the
output, leading to a lot of over-write for cases like 15, 5.
Signed-off-by: Niklas Haas <git@haasn.dev>
These align the filter size to a multiple of the internal tap grouping
(either 1/2/4 for vpgatherdd, or the XMM size for the 4x4 transposed kernel).
This may over-read past the natural end of the input buffer, if the aligned
size exceeds the true size.
Signed-off-by: Niklas Haas <git@haasn.dev>
When use_loop == true and idx < 0, we would incorrectly check
in_stride[idx], which is an out-of-bounds read. Reorder the conditions to
avoid that.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The overhead of the loop and memcpy call is less than the overhead of
possibly spilling into one extra unnecessary cache line. 64 is still a
good rule of thumb for L1 cache line size in 2026.
I leave it to future code archeologists to find and tweak this constant if
it ever becomes unnecessary.
Signed-off-by: Niklas Haas <git@haasn.dev>
If any of the dstStrides is not aligned mod 16, the warning
above this one will be triggered, setting stride_unaligned_warned,
so that the following check of stride_unaligned_warned will
always be false.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The glue code doesn't care about types, so long as the functions are
chained correctly. Let's not pretend there is any type safety there, as
the function pointers were cast anyway from unrelated types.
In particular, some f32 and u32 kernels are shared.
This fixes errors like so:
src/libswscale/ops_tmpl_int.c:471:1: runtime error: call to function linear_diagoff3_f32 through pointer to incorrect function type 'void (*)(struct SwsOpIter *, const struct SwsOpImpl *, unsigned int *, unsigned int *, unsigned int *, unsigned int *)'
libswscale/ops_tmpl_float.c:208: note: linear_diagoff3_f32 defined here
Fixes: #22332
It was added to force auto vectorization on GCC builds. Since then, auto
vectorization has been enabled for the whole code base (1464930696).
According to the GCC documentation, the optimize attribute should be used
for debugging purposes only; it is not suitable for production code.
In particular, it's unclear whether the attribute is actually applied, as it
is lost when the function is inlined, so its usage is quite fragile.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Possible now that the SSE2 function is available
even when the stack is not aligned.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
x86-32 lacks one GPR, so it needs to be read from the stack.
If the stack needs to be realigned, we can no longer access
the original location of one argument, so just request a bit
more stack size and copy said argument at a fixed offset from
the new stack.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the lower quadword needs to be rotated, because
the register is zero-extended immediately afterwards anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The original intent here was probably to make the ops code agnostic to
which operation is actually last in the list, but the existence of a
divergence between CONTINUE and FINISH already implies that we hard-code
the assumption that the final operation is a write op.
So we can just massively simplify this with a call/ret pair instead of
awkwardly exporting and then jumping back to the return label. This actually
collapses FINISH down into just a plain RET, since the op kernels already
don't set up any extra stack frame.
Signed-off-by: Niklas Haas <git@haasn.dev>