124046 Commits

Author SHA1 Message Date
Zhao Zhili
a85a8e6757 configure: fix VSX remaining enabled when -mvsx is unsupported
When check_cflags -mvsx fails, the && short-circuit prevents
check_cc from running. Since check_cc is responsible for
disabling vsx on failure, skipping it leaves vsx incorrectly
enabled.

Fix by removing the && so check_cc always executes.
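
The failure mode can be modelled outside the shell. The following is a minimal C sketch, not the configure script itself: flag_supported() and compile_probe() are hypothetical stand-ins for check_cflags and check_cc, and vsx_enabled is an assumed global. It only illustrates how a short-circuit can skip the step that disables a feature.

```c
#include <stdbool.h>

/* Hypothetical feature state; the real script tracks this differently. */
static bool vsx_enabled = true;

/* Stand-in for check_cflags -mvsx: reports whether the flag is accepted. */
static bool flag_supported(bool supported)
{
    return supported;
}

/* Stand-in for check_cc: disables the feature as a side effect on failure. */
static bool compile_probe(bool compiles)
{
    if (!compiles)
        vsx_enabled = false;
    return compiles;
}

/* Old shape: the probe never runs when the flag check fails. */
bool detect_short_circuit(bool flag_ok, bool compiles)
{
    vsx_enabled = true;
    (void)(flag_supported(flag_ok) && compile_probe(compiles));
    return vsx_enabled;
}

/* New shape: the probe always runs, so it can always disable the feature. */
bool detect_always_probe(bool flag_ok, bool compiles)
{
    vsx_enabled = true;
    flag_supported(flag_ok);
    compile_probe(compiles);
    return vsx_enabled;
}
```

With an unsupported flag and a failing probe, detect_short_circuit() leaves the feature enabled while detect_always_probe() disables it.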

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-04-13 11:45:36 +00:00
Andreas Rheinhardt
32678dcc88 avcodec/x86/snowdsp_init: Remove disabled SSE2 functions
Disabled in 3e0f7126b5
(almost 20 years ago) and no one fixed them, so remove them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:56:35 +02:00
Andreas Rheinhardt
bd2964e611 avcodec/x86/snowdsp_init: Use standard init pattern
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:56:01 +02:00
Andreas Rheinhardt
338dc25642 avcodec/x86/snowdsp_init: Remove MMXEXT, SSE2 inner_add_yblock versions
They have been superseded by SSSE3; the SSE2 version was even disabled
(and segfaults if enabled).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:53:17 +02:00
Andreas Rheinhardt
5c830fccf4 avcodec/x86/snowdsp: Add SSSE3 inner_add_yblock
Compared to the MMX version, this version benefits from wider
registers and pmaddubsw. It also has fewer unnecessary loads
and stores: On x64, the MMX version has 12 unnecessary GPR loads
and 6 stores in each line when width is eight; for width 16,
there are 17 unnecessary GPR loads and six stores per line.
Even the 32bit SSSE3 version only has six loads and zero stores
per line more than the x64 version. Furthermore, in contrast
to the MMX version, the SSSE3 version also does not clobber
the array of block pointers given to it.

Benchmarks:
inner_add_yblock_2_c:                                   29.2 ( 1.00x)
inner_add_yblock_2_mmx:                                 32.5 ( 0.90x)
inner_add_yblock_2_ssse3:                               28.6 ( 1.02x)
inner_add_yblock_4_c:                                   85.2 ( 1.00x)
inner_add_yblock_4_mmx:                                 89.2 ( 0.96x)
inner_add_yblock_4_ssse3:                               84.5 ( 1.01x)
inner_add_yblock_8_c:                                  302.0 ( 1.00x)
inner_add_yblock_8_mmx:                                 77.0 ( 3.92x)
inner_add_yblock_8_ssse3:                               30.6 ( 9.85x)
inner_add_yblock_16_c:                                1164.7 ( 1.00x)
inner_add_yblock_16_mmx:                               260.4 ( 4.47x)
inner_add_yblock_16_ssse3:                              82.3 (14.15x)

Both the MMX and SSSE3 versions leave the size 2 and 4 cases
to ff_snow_inner_add_yblock_c() (but the MMX version has
a prologue at the beginning that it needs to undo before
the call, leading to the higher overhead for these sizes).
I don't know why the SSSE3 version is marginally faster than
the C version in these cases.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:51:35 +02:00
Andreas Rheinhardt
2fdccaf7d6 tests/checkasm/mpegvideo_unquantize: Fix precedence problem
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:51:35 +02:00
Andreas Rheinhardt
4f30bd6fba tests/checkasm/llvidencdsp: Fix nonsense randomization
The first loop was never entered due to a precedence problem;
the second loop initialized everything, although it was not intended
that way.
This has been added in 56b8769a1c.
Sorry for this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:51:34 +02:00
Andreas Rheinhardt
e0ed3fa834 tests/checkasm: Add snowdsp test
Only inner_add_yblock for now.
Hint: Said function uses a pointer to an array of pointers as parameter.
The MMX version clobbers the array in such a way that calling the
function repeatedly with the same arguments (as happens inside bench_new())
leads to buffer overflows and segfaults. Therefore CALL4 had to be
overridden to restore the original pointers. This workaround will be
removed soon when the MMX version is removed.
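
The workaround described above can be sketched as follows. This is an illustrative C model, not the checkasm code: sum_and_advance() is a hypothetical kernel that modifies its pointer array in place, and call_repeatedly() is an assumed wrapper that restores the array before every call.

```c
#include <string.h>

#define NB_BLOCKS 4

/* Hypothetical kernel: reads one byte per block and advances each
 * pointer in place, i.e. it clobbers the array it was given. */
static int sum_and_advance(const unsigned char **blocks, int stride)
{
    int sum = 0;
    for (int i = 0; i < NB_BLOCKS; i++) {
        sum += *blocks[i];
        blocks[i] += stride;
    }
    return sum;
}

/* Calls the kernel 'runs' times with identical arguments by restoring
 * the saved pointers before each call. */
int call_repeatedly(const unsigned char **blocks, int stride, int runs)
{
    const unsigned char *saved[NB_BLOCKS];
    int result = 0;

    memcpy(saved, blocks, sizeof(saved));
    for (int r = 0; r < runs; r++) {
        memcpy(blocks, saved, sizeof(saved)); /* undo the clobbering */
        result = sum_and_advance(blocks, stride);
    }
    memcpy(blocks, saved, sizeof(saved));     /* leave the caller's array intact */
    return result;
}
```

Without the restore step, each run would start one stride further into the buffers and eventually read past their end.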

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
764e021946 avcodec/snowdata: Add explicit alignment for obmc tables
This is in preparation for adding SSSE3 assembly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
28d0a5091a avcodec/snow_dwt: Remove pointless forward declaration
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
5f373872c0 avcodec/x86/snow_dwt: Avoid slice_buffer in inner_add_yblock
It is unnecessary and avoids the src_y parameter;
it also makes this function more ASM-friendly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
fd77f00a8f avcodec/snow: Avoid always-true branch
The input lines used in ff_snow_inner_add_yblock()
must always be set (because their values are used).
The MMX assembly always relied on this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
13d621cc7c avcodec/snow: Disable dead code in ff_snow_inner_add_yblock()
It is only used with add != 0 (and the assembly functions
only support this case).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
eed0830a0c avcodec/snowdata: Don't use 8 bits for six bits data
This has been done in 561a18d3ba
in order to avoid shifts, yet this rationale no longer applies
since d593e32983. So shift them back;
this is in preparation for using these coefficients together with
pmaddubsw.

Hint: 561a18d3ba also added a block
guarded by "if(LOG2_OBMC_MAX == 8". I changed the condition to remove
this check (i.e. kept the block) which should not change the output
at all. Yet all FATE tests pass if the block is completely
removed. I don't know if this block is necessary at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 12:46:24 +02:00
Andreas Rheinhardt
761b6f2359 swscale/x86/output: Remove obsolete MMXEXT function
Possible now that the SSE2 function is available
even when the stack is not aligned.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 08:46:44 +02:00
Andreas Rheinhardt
8a7c1f7fb8 swscale/x86/output: Make xmm functions usable even without aligned stack
x86-32 lacks one GPR, so it needs to be read from the stack.
If the stack needs to be realigned, we can no longer access
the original location of one argument, so just request a bit
more stack size and copy said argument at a fixed offset from
the new stack.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 08:46:44 +02:00
Andreas Rheinhardt
0bb161fd09 swscale/x86/output: Simplify creating dither register
Only the lower quadword needs to be rotated, because
the register is zero-extended immediately afterwards anyway.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 08:46:44 +02:00
Andreas Rheinhardt
f5c5bca803 swscale/x86/scale: Remove always-false mmsize checks
Forgotten in a05f22eaf3.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 08:46:44 +02:00
Andreas Rheinhardt
999ccf6495 swresample/x86/{audio_convert,rematrix}: Remove remnants of MMX
Forgotten in 2b94f23b06,
4e51e48ebd and
374b3ab03c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 01:16:46 +02:00
Andreas Rheinhardt
e29c7089d2 avcodec/x86/vp8dsp_loopfilter: Remove always-true mmsize checks
Forgotten in 6a551f1405.
Also fix the comment claiming that there are MMXEXT functions
in this file.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 00:41:22 +02:00
Andreas Rheinhardt
9f560c8c1a avcodec/x86/vp3dsp: Remove unused macros
Forgotten in a677b38298.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 00:41:22 +02:00
Jun Zhao
411484e8c9 lavc/videotoolbox_vp9: fix vpcC flags offset
Write the 24-bit vpcC flags field at the current cursor position after
the version byte. The previous code wrote to p+1 instead of p, leaving
one byte uninitialized between version and flags and shifting all
subsequent fields (profile, level, bitdepth, etc.) by one byte.
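
As an illustration of the cursor handling, here is a minimal C sketch. It is not the videotoolbox code: write_header() and put_be24() are hypothetical helpers, and the layout is simplified to a version byte, the 24-bit flags and a single following byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a 24-bit big-endian value at p and returns the advanced cursor. */
static uint8_t *put_be24(uint8_t *p, uint32_t v)
{
    p[0] = (v >> 16) & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = v & 0xff;
    return p + 3;
}

/* Hypothetical, simplified header: version (1 byte), flags (3 bytes),
 * profile (1 byte). Returns the number of bytes written. */
size_t write_header(uint8_t *buf, uint8_t version, uint32_t flags,
                    uint8_t profile)
{
    uint8_t *p = buf;

    *p++ = version;
    p = put_be24(p, flags); /* written at p, directly after the version byte */
    *p++ = profile;
    return (size_t)(p - buf);
}
```

Writing the flags at p + 1 instead would leave buf[1] untouched and push every later field one byte further.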

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-04-12 22:15:51 +00:00
Jun Zhao
57397a683d lavc/videotoolboxenc: return SEI parse errors
Return the actual find_sei_end() error when SEI appending fails instead of
reusing the previous status code. This preserves the real parse failure for
callers instead of reporting malformed SEI handling as success.
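
The difference between the two behaviours can be sketched in C. This is a simplified model, not the encoder code: parse_len() is a hypothetical stand-in for find_sei_end(), and ERR_INVALID is an assumed error value.

```c
#include <stddef.h>

#define ERR_INVALID (-1) /* hypothetical error code for this sketch */

/* Hypothetical parser: returns the payload length, or a negative
 * error code when the input is missing or empty. */
static int parse_len(const unsigned char *data, int size)
{
    if (!data || size <= 0)
        return ERR_INVALID;
    return size;
}

/* Old shape: on a parse failure, the status of an earlier step is returned. */
int append_stale(const unsigned char *data, int size)
{
    int status = 0; /* an earlier step succeeded */
    int len = parse_len(data, size);

    if (len < 0)
        return status; /* parse failure is reported as success */
    return 0;
}

/* New shape: the parser's own error is propagated to the caller. */
int append_fixed(const unsigned char *data, int size)
{
    int len = parse_len(data, size);

    if (len < 0)
        return len;
    return 0;
}
```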

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-04-12 22:15:51 +00:00
Niklas Haas
b09d57c41d avfilter/buffersrc: re-add missing overflow warning
This was originally introduced by commit 05d6cc116e. During the FFmpeg-libav
split, this function was refactored by commit 7e350379f8 into
av_buffersrc_add_frame(), replacing av_buffersrc_add_ref(). The new function
did not include the overflow warning, despite the same being done for
buffersink.

Then, when commit a05a44e205 merged the two functions back together, the
libav implementation was favored over the FFmpeg implementation, silently
removing the overflow warning in the process.

This commit re-adds that missing warning.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-12 20:02:18 +00:00
nyanmisaka
ab7b6ef0a2 hwcontext_vulkan: fix double free when vulkan_map_to_drm fails
The multiplanar image with storage_bit enabled fails to be exported
to DMA-BUF on the QCOM turnip driver, thus triggering this double-free issue.

```
[Parsed_hwmap_2 @ 0xffff5c002a70] Configure hwmap vulkan -> drm_prime.
[hwmap @ 0xffff5c001180] Filter input: vulkan, 1920x1080 (0).
[AVHWFramesContext @ 0xffff5c004e00] Unable to export the image as a FD!
free(): double free detected in tcache 2
Aborted
```

Additionally, add back an av_unused attribute. Otherwise, the compiler
will complain about unused variables when CUDA is not enabled.

Signed-off-by: nyanmisaka <nst799610810@gmail.com>
2026-04-12 20:50:38 +08:00
zuxy
56b97c03d4 avcodec/x86/h264_intrapred: Replace pred8x8_top_dc_8_mmxext with SSE2
More about deprecating MMX than any performance gain; nearly identical
performance numbers on my Zen 4 (1.36x vs c), but llvm-mca predicts
>60% perf gain on Intel CPUs newer than Skylake.

Signed-off-by: Zuxy Meng <zuxy.meng@gmail.com>
2026-04-11 19:11:46 -07:00
Niklas Haas
c29465bcb6 swscale/x86/ops: use plain ret instruction
The original intent here was probably to make the ops code agnostic to
which operation is actually last in the list, but the existence of a
divergence between CONTINUE and FINISH already implies that we hard-code
the assumption that the final operation is a write op.

So we can just massively simplify this with a call/ret pair instead of
awkwardly exporting and then jumping back to the return label. This actually
collapses FINISH down into just a plain RET, since the op kernels already
don't set up any extra stack frame.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-11 16:30:15 +00:00
Tymur Boiko
f7ca6f7481 vulkan: fix -Wdiscarded-qualifiers warning and misleading DRM modifier log
ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext
fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext
assigns const void * to void *, triggering the same warning at that line. The right
fix is a (void *) cast at the call site, same as done for buf_pnext.
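
The qualifier issue can be reproduced in isolation. The following C sketch uses hypothetical names (struct create_info, find_struct(), attach()) rather than the Vulkan code; it only shows a cast at the call site when a function returning const void * feeds a void * field.

```c
/* Hypothetical structure with a non-const chain pointer. */
struct create_info {
    void *pnext;
};

/* Hypothetical lookup that, like the function described above,
 * returns a pointer to const. */
static const void *find_struct(const void *chain)
{
    return chain;
}

/* Cast at the call site: no intermediate const variable is introduced,
 * so no later assignment discards the qualifier implicitly. */
void attach(struct create_info *info, const void *chain)
{
    info->pnext = (void *)find_struct(chain);
}
```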

Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in
try_export_flags to the DRM modifier path only: when has_mods is false the log
always printed mod[0]=0x0, which is misleading since no DRM modifier is involved.

Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
2026-04-11 12:50:07 +00:00
Kacper Michajłow
eaadd05232 .forgejo/CODEOWNERS: add myself for hls.*
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-11 01:58:35 +02:00
Kacper Michajłow
721545a3c2 MAINTAINERS: add myself as HLS demuxer maintainer
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-11 01:58:35 +02:00
Kacper Michajłow
cc41e6a462 tests/fate/hlsenc: add hls-event-no-endlist test
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-11 01:58:34 +02:00
Kacper Michajłow
6d98a9a2e8 avformat/hls: fix seeking in EVENT playlists that start mid-stream
HLS EVENT playlists (e.g. Twitch VODs) are seekable but not finished,
so live_start_index causes playback to begin near the end. The first
packet's DTS then becomes first_timestamp, creating a wrong mapping
between timestamps and segments.

Fix this by subtracting the cumulative duration of skipped segments from
first_timestamp so it reflects the true start of the playlist.
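
The correction amounts to a small piece of arithmetic, sketched here in C. The function and parameter names are hypothetical, and the units are whatever the durations and the DTS share; the real demuxer works on its own playlist structures.

```c
#include <stdint.h>

/* Given the durations of all segments, the index of the segment where
 * playback started, and the DTS of the first packet read, returns the
 * timestamp that corresponds to the start of the playlist. */
int64_t playlist_start_timestamp(const int64_t *durations, int start_index,
                                 int64_t first_dts)
{
    int64_t skipped = 0;

    /* Sum the durations of the segments that were skipped. */
    for (int i = 0; i < start_index; i++)
        skipped += durations[i];

    return first_dts - skipped;
}
```

For example, with four segments of 10 units each and playback starting at the fourth segment with a first DTS of 1030, the playlist start is 1000.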

Also set the per-stream start_time from first_timestamp so that the correct
time is reported, and reset pts_wrap_reference on seek to prevent bogus
wraparounds.


Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-11 01:58:34 +02:00
Niklas Haas
ef13a29d08 avfilter/framepool: fix frame pool uninit check
Fixes a memory leak caused by AV_MEDIA_TYPE_VIDEO == 0 being excluded by
the !pool->type check. We can just remove the entire check because
av_buffer_pool_uninit() is already safe on NULL.
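
A reduced C model of this bug is shown below. The enum, struct and function names are hypothetical; free() stands in for av_buffer_pool_uninit() because it is likewise safe to call on NULL.

```c
#include <stdlib.h>

/* Hypothetical media types; video is 0, as with AV_MEDIA_TYPE_VIDEO. */
enum media_type { MEDIA_VIDEO = 0, MEDIA_AUDIO = 1 };

struct pool {
    enum media_type type;
    void *buf;
};

/* Old shape: "!p->type" is true for video, so video pools are never freed.
 * Returns 1 if the buffer was released, 0 if the call was skipped. */
int pool_uninit_buggy(struct pool *p)
{
    if (!p->type)
        return 0;
    free(p->buf);
    p->buf = NULL;
    return 1;
}

/* New shape: no type check at all; freeing NULL is harmless. */
int pool_uninit_fixed(struct pool *p)
{
    free(p->buf);
    p->buf = NULL;
    return 1;
}
```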

Fixes: fe2691b3bb
Reported-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 22:02:00 +02:00
Alexandru Ardelean
e43aab67ed avdevice/v4l2: rename 'buff_data' -> 'buf_desc'
Since we've added a 'buf_data' struct, rename this to avoid any confusion
about this one.

Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
2026-04-10 16:02:28 +00:00
Alexandru Ardelean
1011e4d647 avdevice/v4l2: wrap buf_start and buf_len into a struct
This reduces the number of malloc() & free() calls, and structures the
data for the buffers a bit more neatly.
In case more per-buffer data needs to be added, having a separate struct
is useful.

Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
2026-04-10 16:02:28 +00:00
Alexandru Ardelean
24adcf3a72 avdevice/v4l2: fix potential memleak when allocating device buffers
In the loop which allocates the buffers for a V4L2 device, if failure
occurs for a certain buffer (e.g. 3rd of 4 buffers), then the previously
allocated buffers (and the buffer array) would not be freed in
mmap_init(). This would cause a leak.

This change handles the error cases of that loop to free all allocated
resources, so that when mmap_init() fails nothing is leaked.
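
The cleanup pattern can be sketched as follows. This is a generic C illustration, not the V4L2 code: malloc() stands in for the real per-buffer mapping, and the fail_at parameter is a hypothetical hook to simulate a failure at a given index.

```c
#include <stdlib.h>

struct buf {
    void *start;
    size_t len;
};

/* Allocates 'count' buffers of 'len' bytes each. If any allocation fails
 * (or is forced to fail via fail_at, -1 meaning never), everything that
 * was allocated so far is released and NULL is returned. */
struct buf *alloc_buffers(int count, size_t len, int fail_at)
{
    struct buf *bufs = calloc((size_t)count, sizeof(*bufs));

    if (!bufs)
        return NULL;

    for (int i = 0; i < count; i++) {
        bufs[i].start = (i == fail_at) ? NULL : malloc(len);
        if (!bufs[i].start) {
            /* Unwind: free the buffers allocated before index i ... */
            while (--i >= 0)
                free(bufs[i].start);
            /* ... and the array itself, so nothing is leaked. */
            free(bufs);
            return NULL;
        }
        bufs[i].len = len;
    }
    return bufs;
}

/* Releases a fully allocated buffer array. */
void free_buffers(struct buf *bufs, int count)
{
    for (int i = 0; i < count; i++)
        free(bufs[i].start);
    free(bufs);
}
```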

Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com>
2026-04-10 16:02:28 +00:00
Niklas Haas
0e983a0604 swscale: align allocated frame buffers to SwsPass hints
This avoids hitting the slow memcpy fallback paths altogether, whenever
swscale.c is handling plane allocation.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
b5573a8683 swscale/ops_dispatch: cosmetic
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
3a15990368 swscale/ops_dispatch: forward correct pass alignment
As a consequence of the fact that the frame pool API doesn't let us directly
access the linesize, we have to "un-translate" the over_read/write back to
the nearest multiple of the pixel size.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
5441395a48 swscale/graph: add optimal alignment/padding hints
Allows the pass buffer allocator to make smarter decisions based on the actual
alignment requirements of the specific pass.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
2deca0ec19 swscale: clean up allocated frames on error
Matches the semantics of sws_frame_begin(), which also cleans up any
allocated buffers on error.

This is an issue introduced by the commit that allowed ff_sws_graph_run()
to fail in the first place.

Fixes: 563cc8216b
2026-04-10 15:12:18 +02:00
Niklas Haas
6c89a30ecd swscale: add FFFramePool and use it for allocating planes
The major consequence of this is that we start allocating buffers per plane,
instead of allocating one contiguous buffer. This makes the no-op/refcopy
case slightly slower, but doesn't meaningfully affect the rest:

yuva444p -> yuva444p, time=157/1000 us (ref=78/1000 us), speedup=0.497x slower
Overall speedup=1.016x faster, min=0.983x max=1.092x

However, this is a necessary consequence of the desire to allow partial plane
allocations / single plane refcopies. This slowdown also does not affect
vf_scale, which already uses avfilter/framepool.c (via ff_get_video_buffer).

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
fe2691b3bb avfilter/framepool: stack-allocate FFFramePool
Saves a pointless free/alloc cycle on reinit. For the vast majority of filter
links, this is going to be allocated anyway; and on the occasions that it's not,
the waste is marginal.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
a2ca55c563 avfilter/framepool: remove unnecessary braces (style)
As per the FFmpeg coding style guidelines, braces should be avoided on
isolated single-line statement bodies.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
5c4490a0a6 avfilter/framepool: fix whitespace (cosmetic)
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
38543781cc avfilter/framepool: move variable declarations to site of definition
This is not C89 anymore.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
6efbd99e48 avfilter/framepool: remove check for impossible condition
FFALIGN(..., pool->align) = (...) & ~(pool->align - 1), so this condition
equates to: ((...) & ~(align - 1) & (align - 1)), which is trivially 0.

(Note that all expressions are of type `int`)
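
The identity can be checked directly. The sketch below defines a local ALIGN_UP macro for power-of-two alignments so that it is self-contained; misaligned_bits() is a hypothetical helper that evaluates the removed condition.

```c
/* Local align-up macro for power-of-two 'a'; defined here so the
 * sketch does not depend on any library header. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Evaluates the condition in question: the low bits of an aligned value.
 * The mask ~(a - 1) has already cleared exactly these bits. */
int misaligned_bits(int x, int a)
{
    return ALIGN_UP(x, a) & (a - 1);
}
```

For any non-negative x and power-of-two a, misaligned_bits(x, a) evaluates to 0, which is why the check could never trigger.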

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
0b43b8ef31 avfilter/framepool: make FFFramePool public
This struct is overall pretty trivial and there is little to no internal
state or invariants that need to be protected.

Making it public allows e.g. libswscale to allocate buffers for individual
planes directly.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
3e99631873 avfilter/framepool: remove pointless ternary (cosmetic)
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00
Niklas Haas
53ce7265ab avfilter/framepool: use strongly typed union of pixel/sample format
Replacing the generic `int format` field. This aids in debugging, as
e.g. gdb will tend to translate the strongly typed enums back into human
readable names automatically.
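
A minimal C sketch of the idea follows. All enum, struct and function names are hypothetical and only mirror the shape of such a field; they are not the FFFramePool definition.

```c
/* Hypothetical format enums standing in for pixel and sample formats. */
enum pix_fmt    { PIX_YUV420P, PIX_RGB24 };
enum sample_fmt { SAMPLE_U8, SAMPLE_S16 };
enum media_type { MEDIA_VIDEO, MEDIA_AUDIO };

struct pool_desc {
    enum media_type type;
    /* Instead of a generic "int format", each interpretation keeps its
     * enum type, so a debugger can print the enumerator's name. */
    union {
        enum pix_fmt    pix;
        enum sample_fmt sample;
    } format;
};

/* Selects the union member that matches the media type. */
const char *format_name(const struct pool_desc *d)
{
    if (d->type == MEDIA_VIDEO)
        return d->format.pix == PIX_RGB24 ? "rgb24" : "yuv420p";
    return d->format.sample == SAMPLE_S16 ? "s16" : "u8";
}
```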

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-10 15:12:18 +02:00