124056 Commits

Author SHA1 Message Date
Niklas Haas
0da2bbab68 swscale/ops_dispatch: re-indent (cosmetic) 2026-04-16 20:59:39 +00:00
Niklas Haas
4c19f82cc0 swscale/ops_dispatch: compute minimum needed tail size
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling, which the
previous code completely ignored.

Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
cd8ece4114 swscale/ops_dispatch: generalize the number of tail blocks
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.

While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.

I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
dba7b81b38 swscale/ops_dispatch: avoid calling comp->func with w=0
The x86 kernel, for example, assumes that at least one block is processed, so
avoid calling it with an empty width. This is currently only possible when,
for example, operating on an unpadded, very small image whose total linesize
is less than a single block.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
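A minimal sketch of the guard described above. The names `process_blocks` and `run_row` are hypothetical stand-ins, not the actual swscale dispatch code; the only point illustrated is that the kernel is skipped when no whole block fits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SIMD kernel that requires num_blocks >= 1. */
static size_t blocks_processed;

static void process_blocks(size_t num_blocks)
{
    assert(num_blocks >= 1); /* the kernel's precondition */
    blocks_processed += num_blocks;
}

/* Hands all whole blocks of a row to the kernel and returns the remaining
 * width in pixels, which is left for a separate tail path. */
static size_t run_row(size_t width, size_t block_size)
{
    size_t whole_blocks;

    assert(block_size > 0);
    whole_blocks = width / block_size;
    if (whole_blocks > 0)          /* never call the kernel with w = 0 */
        process_blocks(whole_blocks);
    return width - whole_blocks * block_size;
}
```

With a 5-pixel row and a 16-pixel block, `run_row` never invokes the kernel and reports all 5 pixels as remainder.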
Niklas Haas
35174913ac swscale/ops_dispatch: fix and generalize tail buffer size calculation
This code had two issues:

1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block

We can fix both of these issues by rewriting the way the tail size is derived.

In the non-offset case, and assuming only 1 tail block:
    aligned_w - safe_width
  = num_blocks * block_size - (num_blocks - 1) * block_size
  = block_size

Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
    tail_size = pass->width - safe_width <= aligned_w - safe_width

In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
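The derivation above can be sketched as follows. The struct and function names are illustrative assumptions that mirror the terms in the commit message (`aligned_w`, `safe_width`, tail blocks); this is not the actual ops_dispatch code and it ignores the input offset case.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: field names follow the commit message, not the code. */
typedef struct TailLayout {
    size_t aligned_w;   /* width rounded up to a whole number of blocks */
    size_t safe_width;  /* pixels that can be processed in place */
    size_t tail_alloc;  /* aligned_w - safe_width: pixels to allocate */
    size_t tail_used;   /* width - safe_width: pixels actually copied */
} TailLayout;

static TailLayout compute_tail(size_t width, size_t block_size,
                               size_t tail_blocks)
{
    TailLayout t;
    size_t num_blocks;

    assert(width > 0 && block_size > 0);
    num_blocks = (width + block_size - 1) / block_size;
    if (tail_blocks > num_blocks)
        tail_blocks = num_blocks;

    t.aligned_w  = num_blocks * block_size;
    t.safe_width = (num_blocks - tail_blocks) * block_size;
    t.tail_alloc = t.aligned_w - t.safe_width;
    t.tail_used  = width - t.safe_width;
    return t;
}
```

For `width = 100`, `block_size = 16` and one tail block this gives `aligned_w = 112`, `safe_width = 96`, `tail_alloc = 16` (exactly one block) and `tail_used = 4`, so the used size never exceeds the allocated size.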
Niklas Haas
f604add8c1 swscale/ops_dispatch: remove pointless AV_CEIL_RSHIFT()
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.

If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
dd8ff89adf swscale/ops_dispatch: add helper to explicitly control pixel->bytes rounding
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
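One way such a helper could look, as a sketch only. The name `pixels_to_bytes` and the `Rounding` enum are assumptions made for illustration; the commit does not show the real helper's signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the rounding direction is an explicit argument, so
 * the +7 bias cannot be silently forgotten or duplicated at a call site. */
enum Rounding { ROUND_DOWN, ROUND_UP };

static size_t pixels_to_bytes(size_t pixels, unsigned bits_per_pixel,
                              enum Rounding mode)
{
    size_t bits;

    assert(bits_per_pixel > 0);
    bits = pixels * bits_per_pixel;
    return (mode == ROUND_UP ? bits + 7 : bits) >> 3;
}
```

For ten 1-bit pixels, rounding up yields 2 bytes while rounding down yields 1, which is exactly the difference a stray +7 would make.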
Niklas Haas
16a57b2985 swscale/ops_dispatch: ensure block size is multiple of pixel size
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
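A sketch of the kind of check this implies, assuming the requirement is that a block must cover a whole number of bytes. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores the block size in bytes if a block of
 * `block_pixels` pixels covers a whole, non-zero number of bytes. */
static bool block_size_bytes(size_t block_pixels, unsigned bits_per_pixel,
                             size_t *out_bytes)
{
    size_t bits;

    assert(out_bytes);
    bits = block_pixels * bits_per_pixel;
    if (bits == 0 || bits % 8 != 0)
        return false;   /* a plain bits >> 3 would truncate here */
    *out_bytes = bits / 8;
    return true;
}
```

A 4-pixel block of a 1-bit format is rejected instead of silently becoming a 0-byte block, while a 16-pixel block of the same format is accepted as 2 bytes.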
Niklas Haas
86307dad4a swscale/ops_dispatch: make offset calculation code robust against overflow
As well as weird edge cases like trying to filter `monow` and pixels landing
in the middle of a byte. Realistically, this will never happen - we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.

However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
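The two checks mentioned above (overflow and byte alignment) could be combined along these lines. This is a sketch under assumed names; `pixel_to_byte_offset` is not a function from the swscale sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a pixel position to a byte offset. Fails if the bit offset would
 * overflow size_t or if the pixel does not start on a byte boundary. */
static bool pixel_to_byte_offset(size_t x, unsigned bits_per_pixel,
                                 size_t *out_offset)
{
    size_t bits;

    assert(out_offset && bits_per_pixel > 0);
    if (x > SIZE_MAX / bits_per_pixel)
        return false;               /* x * bits_per_pixel would overflow */
    bits = x * bits_per_pixel;
    if (bits % 8 != 0)
        return false;               /* lands in the middle of a byte */
    *out_offset = bits / 8;
    return true;
}
```

Pixel 3 of a 1-bit format is rejected because it starts mid-byte, whereas pixel 8 maps cleanly to byte offset 1.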
Niklas Haas
95e4f7cac5 swscale/ops_dispatch: fix rounding direction of plane_size
This is an upper bound, so it should be rounded up.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
c6e47b293d swscale/ops_dispatch: pre-emptively guard against int overflow
By using size_t whenever we compute derived figures.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
0524e66aec swscale/ops_dispatch: drop pointless const (cosmetic)
These are clearly not mutated within their constrained scope, and it just
wastes valuable horizontal space.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
c98810ac78 swscale/ops_dispatch: zero-init tail buffer
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
ba516a34cd swscale/x86/ops_int: use sized mov for packed_shuffle output
This code made the input read conditional on the byte count, but not the
output, leading to a lot of over-write for cases like 15, 5.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
Niklas Haas
4264045137 swscale/x86/ops: set missing over_read metadata on filter ops
These align the filter size to a multiple of the internal tap grouping
(either 1/2/4 for vpgatherdd, or the XMM size for the 4x4 transposed kernel).
This may over-read past the natural end of the input buffer, if the aligned
size exceeds the true size.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 20:59:39 +00:00
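To illustrate where the over-read comes from, here is a small sketch that computes how many taps past the true filter size an aligned kernel would touch. Both function names are assumptions for illustration, and the result is counted in taps rather than in the units used by the real `over_read` field.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds `n` up to the next multiple of `align` (align > 0). */
static size_t align_up(size_t n, size_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Number of taps a kernel may touch beyond the true filter size when it
 * processes taps in groups of `group`. */
static size_t filter_over_read_taps(size_t filter_size, size_t group)
{
    return align_up(filter_size, group) - filter_size;
}
```

A 5-tap filter processed in groups of 4 is padded to 8 taps, so up to 3 taps may fall past the natural end of the input; an 8-tap filter needs no padding.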
Peter von Kaenel
d013863f00 avcodec/lcevcdec: poll on LCEVC_Again from LCEVC_ReceiveDecoderPicture
The V-Nova LCEVC pipeline processes frames on internal background
worker threads. LCEVC_ReceiveDecoderPicture returns LCEVC_Again (-1)
when the worker has not yet completed the frame, which is the
documented "not ready, try again" response. The original code treated
any non-zero return as a fatal error (AVERROR_EXTERNAL), causing decode
to abort mid-stream.

Poll until LCEVC_Success or a genuine error is returned.

Signed-off-by: Peter von Kaenel <Peter.vonKaenel@harmonicinc.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-16 17:19:28 -03:00
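The polling behaviour can be sketched with a mock decoder. Everything here is hypothetical: the `Mock*` names stand in for the real LCEVC API, and only the "again" value of -1 and the zero success value follow from the commit message.

```c
#include <assert.h>

/* Mock status codes; MOCK_ERROR is an arbitrary illustrative value. */
enum MockStatus { MOCK_SUCCESS = 0, MOCK_AGAIN = -1, MOCK_ERROR = -2 };

/* Mock decoder: reports "again" until `pending` polls have elapsed.
 * A negative `pending` simulates a genuine failure. */
typedef struct MockDecoder {
    int pending;
    int polls;
} MockDecoder;

static enum MockStatus mock_receive_picture(MockDecoder *dec)
{
    dec->polls++;
    if (dec->pending < 0)
        return MOCK_ERROR;
    if (dec->pending > 0) {
        dec->pending--;
        return MOCK_AGAIN;
    }
    return MOCK_SUCCESS;
}

/* Keeps polling while the decoder says "again"; returns 0 on success and
 * a negative value on a genuine error. */
static int receive_picture_blocking(MockDecoder *dec)
{
    enum MockStatus res;

    assert(dec);
    do {
        res = mock_receive_picture(dec);
    } while (res == MOCK_AGAIN);
    return res == MOCK_SUCCESS ? 0 : -1;
}
```

The key difference from the old behaviour is that "again" is treated as a retry condition, while any other non-zero status still aborts.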
Andreas Rheinhardt
9ab37ef918 avcodec/packet: Remove always-true check
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
4b5e1d25c3 avcodec/decode: Short-circuit side-data processing
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
a85709537e avcodec/decode: Avoid temporary frame in ff_reget_buffer()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
b595b3075e avcodec/decode: Fix shadowing
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
f99e4a0f23 avcodec/decode: Optimize call away if possible
post_process_opaque is only used by LCEVC, so it is unused
on most builds.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
312bfd512d avcodec/decode: Remove always-true checks
dc->lcevc.ctx is only != NULL for video.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
2d062dd0c6 avcodec/decode: Make post_process_opaque a RefStruct reference
Avoids the post_process_opaque_free callback; the only user of
this is already a RefStruct reference and presumably other users
would want to use a pool for this, too, so they would use
RefStruct-objects, too.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Andreas Rheinhardt
0ee1947d9b avcodec/lcevcdec: Use pool to avoid allocations of FFLCEVCFrame
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 19:27:03 +00:00
Kacper Michajłow
03967fcff4 tests/checkasm/sw_ops: fix too large shift for int
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-16 18:56:22 +00:00
Kacper Michajłow
369dbbe488 swscale/ops_memcpy: guard exec->in_stride[-1] access
When use_loop == true and idx < 0, we would incorrectly check
in_stride[idx], which is OOB read. Reorder conditions to avoid that.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-16 18:56:22 +00:00
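The reordering relies on short-circuit evaluation of `&&`. A minimal sketch, where `plane_needs_loop` and the exact stride comparison are assumptions rather than the real ops_memcpy condition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: a plane needs the row loop if it has a valid index
 * and its stride differs from the tightly packed row size. */
static bool plane_needs_loop(const ptrdiff_t *in_stride, int idx,
                             ptrdiff_t packed_bytes)
{
    assert(in_stride);
    /* idx >= 0 is evaluated first, so in_stride[idx] is never read for
     * idx < 0 thanks to short-circuit evaluation of &&. */
    return idx >= 0 && in_stride[idx] != packed_bytes;
}
```

Placing the index test last would compile just as well but would read `in_stride[-1]` before rejecting the plane.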
Niklas Haas
1764683668 swscale/ops_backend: disable FP contraction where possible
In particular, Clang defaults to FP contraction enabled. GCC defaults to
off in standard C mode (-std=c11), but the C standard does not actually
require any particular default.

The #pragma STDC pragma, despite its name, warns on anything except Clang.

Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/22796
See-also: https://discourse.llvm.org/t/fp-contraction-fma-on-by-default/64975
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 17:19:51 +00:00
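A sketch of how the pragma might be guarded so that it is only emitted for Clang, as the message above suggests. The function names are illustrative; the actual change may use a different mechanism, such as compiler flags.

```c
#include <assert.h>

/* Only emit the pragma on Clang; per the commit message above, other
 * compilers warn about it. */
#if defined(__clang__)
#pragma STDC FP_CONTRACT OFF
#endif

/* With contraction disabled, a * b is rounded before c is added, rather
 * than being fused into a single FMA with one rounding step. */
static float mul_add(float a, float b, float c)
{
    return a * b + c;
}

/* These inputs are exactly representable, so the result is the same with
 * or without contraction; the check only confirms the code path works. */
static void mul_add_selfcheck(void)
{
    assert(mul_add(2.0f, 3.0f, 1.0f) == 7.0f);
}
```

Differences between fused and unfused evaluation only show up for inputs where the intermediate product needs rounding, which is what makes bit-exact comparisons against a reference backend fragile.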
Hassan Hany
3b19a61837 avcodec/vorbisdec: validate windowtype and transformtype 2026-04-16 10:24:41 +00:00
Gyan Doshi
5abc240a27 avcodec/videotoolboxenc: add missing field and rectify cap flags 2026-04-16 09:58:06 +00:00
Daniel Verkamp
8eae5de5af avformat/wavenc: Keep fmt chunk first for -rf64 auto
When the WAV muxer's `-rf64 auto` option is used, the output is intended
to be a normal WAV file if possible, only extended to RF64 format when
the file size grows too large. This was accomplished by reserving space
for the extra RF64-specific data using a standard JUNK chunk (ignored by
readers), then overwriting the reserved space later with a ds64 chunk if
needed.

In the original rf64 auto implementation, the JUNK chunk was placed
right after the RIFF/WAVE file header, before the fmt chunk; this is the
design suggested by the "Achieving compatibility between BWF and RF64"
section of the RF64 spec:

  RIFF 'WAVE' <JUNK chunk> <fmt-ck> ...

However, this approach means that the fmt chunk is no longer in its
conventional location at the beginning of the file, and some WAV-reading
tools are confused by this layout. For example, the `file` tool is not
able to show the format information for a file with the extra JUNK chunk
before fmt.

This change shuffles the order of the chunks for `-rf64 auto` mode so
that the reserved space follows fmt instead of preceding it:

  RIFF 'WAVE' <fmt-ck> <JUNK chunk> ...

With this small modification, tools expecting the fmt chunk to be the
first chunk in the file work with files produced by `-rf64 auto`.

This means the fmt chunk won't be in the location required by RF64, so
if the automatic RF64 conversion is triggered, the fmt chunk needs to be
relocated by rewriting it following the ds64 chunk during the conversion:

  RF64 'WAVE' <ds64 chunk> <fmt-ck> ...
2026-04-16 09:12:45 +00:00
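The new initial layout can be sketched as a few buffer writes. This is not the wavenc muxer code: the helper names are hypothetical, the chunk payloads are left zeroed, and the 16-byte fmt payload and 28-byte reservation are assumed sizes chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed payload sizes: a plain PCM fmt chunk and the reserved space. */
enum { FMT_SIZE = 16, RESERVE_SIZE = 28 };

/* Appends a chunk header (4-byte tag + little-endian 32-bit size) followed
 * by `size` zero bytes of payload; returns the new write position. */
static size_t put_chunk(uint8_t *buf, size_t pos, const char *tag,
                        uint32_t size)
{
    memcpy(buf + pos, tag, 4);
    buf[pos + 4] = size & 0xff;
    buf[pos + 5] = (size >> 8) & 0xff;
    buf[pos + 6] = (size >> 16) & 0xff;
    buf[pos + 7] = (size >> 24) & 0xff;
    memset(buf + pos + 8, 0, size);
    return pos + 8 + size;
}

/* Writes RIFF 'WAVE' <fmt-ck> <JUNK chunk> and returns the header length. */
static size_t write_auto_rf64_header(uint8_t *buf)
{
    size_t pos;

    assert(buf);
    memcpy(buf, "RIFF", 4);
    memset(buf + 4, 0, 4);          /* RIFF size, patched when finalizing */
    memcpy(buf + 8, "WAVE", 4);
    pos = 12;
    pos = put_chunk(buf, pos, "fmt ", FMT_SIZE);      /* fmt stays first */
    pos = put_chunk(buf, pos, "JUNK", RESERVE_SIZE);  /* reserved space */
    return pos;
}
```

Because the fmt chunk and the reserved chunk together occupy the same byte range in either order, a later conversion can write a same-sized ds64 chunk at offset 12 and rewrite fmt right after it without moving any audio data.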
Andreas Rheinhardt
39f34ee019 tests/checkasm/h264chroma: Use more realistic block sizes
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 07:36:01 +02:00
Andreas Rheinhardt
3de38c6b6e avcodec/h264chroma: Fix incorrect alignment documentation
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 07:36:01 +02:00
Andreas Rheinhardt
53a26f7d41 avcodec/mpegvideo_dec: Use C version of h264chroma mc2 functions
H.264 only uses these functions with height 2 or 4 and
the aarch64, arm and mips versions of them optimize based
on this. Yet this is not true when these functions are used
by the lowres code in mpegvideo_dec.c. So revert back to
the C versions of these functions for mpegvideo_dec so that
the H.264 decoder can still use fully optimized functions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-16 07:36:01 +02:00
James Almer
2feb213287 avcodec/lcevc: make CBS reallocate the LCEVC payload
Frame side data unfortunately lacks padding, which CBS needs, so we can't reuse
the existing AVBufferRef.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-15 16:13:44 -03:00
Niklas Haas
dcfd8ebe86 tests/checkasm/sw_ops: remove random value clears
These can randomly trigger the alpha/zero fast paths, resulting in spurious
tests or randomly diverging performance if the backend happens to implement
that particular fast path.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
80b86f0807 tests/checkasm/sw_ops: fix check_scale()
This was not actually testing the integer path. Additionally, for integer
scales, there is a special fast path for expansion from bits to full range,
which we should separate from the random value test.
2026-04-15 14:51:16 +00:00
Niklas Haas
e199d6b375 swscale/x86/ops: add missing component annotation on expand_bits
This only does a single component; so it should be marked as such.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
b6755b0158 swscale/ops_memcpy: always use loop on buffers with large padding
The overhead of the loop and memcpy call is less than the overhead of
possibly spilling into one extra unnecessary cache line. 64 is still a
good rule of thumb for L1 cache line size in 2026.

I leave it to future code archeologists to find and tweak this constant if
it ever becomes unnecessary.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
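A sketch of what such a heuristic might look like. The function name, the exact comparison, and the way padding is measured are all assumptions; only the constant 64 comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* rule-of-thumb L1 line size from the text */

/* Hypothetical heuristic: copy row by row when the per-row padding is at
 * least one cache line, so that the copy does not touch padding bytes. */
static bool should_copy_per_row(size_t stride, size_t row_bytes)
{
    assert(stride >= row_bytes);
    return stride - row_bytes >= CACHE_LINE_SIZE;
}
```

With tightly packed rows a single large copy stays attractive, while a stride that leaves 64 or more padding bytes per row switches to the per-row loop.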
Niklas Haas
026a6a3101 tests/checkasm/sw_ops: remove redundant filter tests
Most of these filters don't test anything meaningfully different relative to
each other; the only filters that really have special significance are POINT
(for now) and maybe BILINEAR down the line.

Apart from that, SINC, combined with the src size loop, already tests both
extreme cases (large and small filters), with large, oscillating unwindowed
weights.

The other filters are not adding anything of substance to this, while massively
slowing down the runtime of this test. We can, of course, change this if the
backends ever get more nuanced handling.

checkasm: all 855 tests passed (down from 1575)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
91582f7287 tests/checkasm/sw_ops: explicitly test all backends
The current code was a bit clumsy in that it always picked the first
available backend when choosing the new function. This meant that some x86
paths were not being tested at all, whenever the memcpy backend (which has
higher priority) could serve the request.

This change makes it so that each backend is explicitly tested against only
implementations provided by that same backend.

checkasm: all 1575 tests passed (up from 1305)

As an aside, it also lets us benchmark the memcpy backend directly against
the C reference backend.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
d5089a1c62 tests/checkasm/sw_ops: don't shadow 'report'
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
3c1781f931 tests/checkasm/sw_ops: separate op compilation from testing
This commit is purely moving around code; there is no functional change.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
e83de76f08 tests/checkasm/sw_ops: check all planes in CHECK_COMMON()
This can help e.g. properly test that the masked/excluded components are
left unmodified.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
eac90ce6ce tests/checkasm/sw_ops: set correct plane index order
All four components were accidentally being read/written to/from the same
plane.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Niklas Haas
590eb4b70d tests/checkasm/sw_ops: remove some unnecessary checks
These don't actually exist at runtime, and will soon be removed from the
backends as well.

This commit is intentionally a bit incomplete; as I will rewrite this
based on the auto-generated macros in the upcoming ops_micro series.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-15 14:51:16 +00:00
Stéphane Cerveau
3f9e04b489 vulkan: fix encode feedback query handling
Check that the driver supports both BUFFER_OFFSET and BYTES_WRITTEN
encode feedback flags before creating the query pool, failing with
EINVAL if either is missing.

Set these flags explicitly instead of masking off HAS_OVERRIDES with a
bitwise NOT, which could pass unrecognized bits from newer drivers to
vkCreateQueryPool, causing validation errors and crashes.
2026-04-14 21:31:45 +00:00
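The difference between the two approaches can be shown with mock flag bits. The `FB_*` constants and `select_feedback_flags` are hypothetical and do not correspond to the real Vulkan enum values or to the FFmpeg code; the `-1` return stands in for the EINVAL failure.

```c
#include <assert.h>
#include <stdint.h>

/* Mock flag bits for illustration; these are not the Vulkan enum values. */
enum {
    FB_BUFFER_OFFSET = 1u << 0,
    FB_BYTES_WRITTEN = 1u << 1,
    FB_HAS_OVERRIDES = 1u << 2,
};

/* Returns 0 and stores the flags to request, or -1 if a required flag is
 * missing from what the driver reports. */
static int select_feedback_flags(uint32_t supported, uint32_t *out_flags)
{
    const uint32_t required = FB_BUFFER_OFFSET | FB_BYTES_WRITTEN;

    assert(out_flags);
    if ((supported & required) != required)
        return -1;
    /* Explicit set, rather than supported & ~FB_HAS_OVERRIDES, which would
     * forward any unknown bits the driver happens to report. */
    *out_flags = required;
    return 0;
}
```

If a driver reports an extra bit such as `0x80`, masking with the bitwise NOT would keep that bit in the request, whereas the explicit set always yields exactly the two known flags.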
Vignesh Venkat
c8dd769217 ffprobe: Support printing SMPTE 2094 APP5 side data
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
2026-04-14 20:41:14 +00:00
Vignesh Venkat
37aefb6e40 avcodec/dav1d: Support parsing smpte 2094-50 metadata
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
2026-04-14 20:35:57 +00:00
Andreas Rheinhardt
d5fc732359 avcodec/codec_internal: Include avcodec.h for enum AVCodecConfig
Forward-declaring an enum is not legal C (the underlying type of
the enum may depend upon the enum constants, so this may cause
ABI issues with -fshort-enums); compilers warn about this
with -pedantic.

This essentially reverts 7e84865cff.
Notice that almost* all files that include codec_internal.h also
need to include avcodec.h, so this does not lead to unnecessary
rebuilds.

This addresses part of #22684.

*: The only file I am aware of that defines an FFCodec and does not
need AVCodecContext as complete type is null.c (but even it already
includes it implicitly); the avcodec.c test tool seems to be the only
file where this commit actually leads to an unnecessary avcodec.h
inclusion.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-14 16:04:47 +02:00
Andreas Rheinhardt
fc8c6d4665 swscale/swscale: Remove ineffective check
If any of the dstStrides is not aligned mod 16, the warning
above this one will be triggered, setting stride_unaligned_warned,
so that the following check for stride_unaligned_warned will
be always false.

Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-14 15:22:49 +02:00