122974 Commits

Author SHA1 Message Date
Michael Niedermayer
f81d6479ec tools/target_dec_fuzzer: Adjust threshold for MPC8
Fixes: Timeout
Fixes: 471587345/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MPC8_fuzzer-4824233864921088

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:15:19 +01:00
Michael Niedermayer
c8b57f0a1e tools/target_dec_fuzzer: Adjust threshold for BFI
Fixes: timeout
Fixes: 471606773/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BFI_fuzzer-6707440390569984

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:14:44 +01:00
Michael Niedermayer
4446dfb0e3 avcodec/flashsv: Check for input space before (re)allocating frame
Fixes: Timeout
Fixes: 471605680/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV2_DEC_fuzzer-6210773459468288
Fixes: 471605920/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV_DEC_fuzzer-6230719287590912

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:59:44 +01:00
Michael Niedermayer
40cafc25cf avcodec/mdec: Check input space vs minimal block size
Fixes: Timeout
Fixes: 481006706/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MDEC_fuzzer-6122832651419648

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:54:38 +01:00
Michael Niedermayer
73681f888d avcodec/h264_parser: Check remaining input length in loop in scan_mmco_reset()
Fixes: read of uninitialized memory
Fixes: 476177761/clusterfuzz-testcase-minimized-ffmpeg_dem_H264_fuzzer-6400884824408064

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:43:28 +01:00
Niklas Haas
b21f1b6482 tests/swscale: don't pass fake object to av_opt_eval_*
This is UB, as the fake object may be used for logging.

Reported-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: ea791a4ef1
2026-02-23 20:55:27 +00:00
Niklas Haas
afdb683a3f swscale: avoid UB on interlaced frames
NULL+0 is UB.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
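As an illustration of the "NULL+0 is UB" point above, here is a minimal sketch of guarding pointer arithmetic on a possibly absent plane. The helper name `plane_offset` is hypothetical and not part of libswscale; it only shows the general pattern of skipping the addition when the base pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg API: advance a plane pointer by a byte
 * offset. A NULL plane is returned unchanged, so the expression
 * NULL + offset is never evaluated, not even for offset == 0. */
static uint8_t *plane_offset(uint8_t *base, ptrdiff_t offset)
{
    if (!base)
        return NULL;
    return base + offset;
}
```

A caller would use `plane_offset(data[i], y * linesize[i])` instead of adding directly, assuming unused planes are represented by NULL.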
Niklas Haas
d918551650 swscale/graph: switch SwsPass.output to refstruct
Allows multiple passes to share a single output buffer reference. We always
allocate an output buffer so that subpasses can share the same output buffer
reference while still allowing that reference to implicitly point to the
final output image.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
cc346b232d swscale/graph: store current pass input instead of global args
The global args to ff_sws_graph_run() really shouldn't matter inside thread
workers. If they ever do, it indicates a leaky abstraction. The only reason
it was needed in the first place was because of the way the input/output
buffers implicitly defaulted to the global args.

However, we can solve this much more elegantly by just calculating it in
ff_sws_graph_run() directly and storing the computed SwsImg inside the
execution state.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
1e071c8585 swscale/graph: omit memcpy() if src and dst are identical
This allows already referenced planes to be skipped, in the case of e.g.
only some of the output planes being successfully referenced. Also avoids
what is technically UB, if the user happens to call ff_sws_graph_run() after
already having ref'd an image.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
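The idea of skipping the copy when source and destination coincide can be sketched as follows. This is not the libswscale code; `copy_plane` and its parameters are hypothetical. The sketch relies only on the fact that `memcpy()` has undefined behavior for overlapping objects, which includes identical pointers with a nonzero size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not FFmpeg API: copy one plane row by row.
 * Returns the number of rows actually copied. */
static int copy_plane(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride,
                      size_t bytes_per_row, int rows)
{
    /* The plane is already referenced: nothing to do, and calling
     * memcpy() with dst == src would be undefined behavior. */
    if (dst == src)
        return 0;

    for (int y = 0; y < rows; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, bytes_per_row);
    return rows;
}
```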
Niklas Haas
b98751b13c swscale/graph: set up palette using current input image
Using the original input image here is completely wrong - the format/palette
could have been set to anything else in the meantime. At best, we would want to
use the original input to add_legacy_sws_pass(), but it's impossible for this
to differ from the per-pass input. The only time legacy subpasses are added
is when using cascaded contexts, but in this case, the only context actually
reading from the palette format would be the first one.

I'm not entirely sure why this code was originally written this way, but
I'm reasonably confident that it's not at all necessary. Tested extensively
on FATE, the self-test, and real-world files.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
0b446cdccd swscale/graph: switch to an AVBufferRef per plane
This annoyingly requires recreating some of the logic inside av_img_alloc(),
because there's no good existing helper accessible from libswscale
that gives per-plane allocations like this.

The new code is based off the calculations inside libavframe/bufferpool.c.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
afa08f4971 swscale/graph: duplicate buffer dimensions in SwsPassBuffer
When multiple passes share a buffer reference, the true buffer dimensions
may be different for each pass, depending on slice alignment. So we can't
rely on the pass dimensions being representative.

Instead, store this information in the SwsPassBuffer itself.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
fe25e54d0f swscale/graph: move output image into separate struct
I want to add more metadata to this and also turn it into a refstruct,
but get the cosmetic diff out of the way first.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
18060a8820 swscale/graph: simplify ff_sws_graph_run() API
There's little reason not to directly take an SwsImg here; it's already an
internally visible struct.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
e1fd274706 swscale/graph: check output plane pointer instead of pixel format
To see if the output buffers are allocated or not.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Marvin Scholz
64fafd63f0 avformat: remove HLS protocol
The use of this protocol was already discouraged and warned about
for years with the recommendation to use the HLS demuxer instead.
2026-02-23 20:20:20 +01:00
Niklas Haas
ea791a4ef1 swscale/tests/swscale: parse flags from string
We don't actually have an SwsContext yet at this point, so just use
AV_OPT_SEARCH_FAKE_OBJ. For the actual evaluation, the signature only
requires that we pass a "pointer to a struct that contains an AVClass as
its first member", so passing a double pointer to the class itself is
sufficient.
2026-02-23 19:23:09 +01:00
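The "struct that contains an AVClass as its first member" argument can be illustrated with stand-in types. `DemoClass`, `DemoContext` and `class_name_of` below are hypothetical and only mimic the layout convention; they are not the libavutil definitions.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a class descriptor (hypothetical, not AVClass). */
typedef struct DemoClass {
    const char *class_name;
} DemoClass;

/* Stand-in for a real context: the class pointer is the first member. */
typedef struct DemoContext {
    const DemoClass *cls;
    int flags;
} DemoContext;

/* Reads only the first member, so it works both for a real context and
 * for a bare pointer-to-class-pointer. */
static const char *class_name_of(void *obj)
{
    const DemoClass *cls = *(const DemoClass **)obj;
    return cls->class_name;
}
```

Under these assumptions, both `&ctx` (a real `DemoContext`) and `&cls_ptr` (a `const DemoClass *` variable) are acceptable arguments. Any callee that reads past the first member would access memory the fake object does not have, which is why such a shortcut is fragile.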
Marvin Scholz
fba9fc0c6b lavc: wmadec: limit variable scopes
Moves the loop variable declarations to the actual loops,
narrowing their scopes.
2026-02-23 15:29:27 +00:00
Marvin Scholz
d219be03d6 lavc: wmadec: assert channels count
This should never exceed MAX_CHANNELS, else there will be several
out of bounds writes.
2026-02-23 15:29:27 +00:00
Lynne
7b15039cdb Changelog: add changelog entry for Mps212
2026-02-23 07:57:57 +01:00
Lynne
baad75cafa aacdec_usac: add support for parsing Mps212 (MPEG surround)
This commit adds the full bitstream parsing for Mps212.
2026-02-23 07:57:57 +01:00
Lynne
86977fdb6b aacdec_tab: add Mps212 tables
To be used in the following commit.
2026-02-23 07:57:57 +01:00
Lynne
a4ab4a98c4 aacdec_tab: split up tables init
2026-02-23 07:57:57 +01:00
James Almer
40e0463113 avformat/mov: free item_name on infe entry parsing failure
Fixes regression since 28c330d0f3.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-22 23:16:15 -03:00
Michael Niedermayer
7e10579f49 avcodec/exr: fix AVERROR typo
Fixes: out of array read
Fixes: 485866440/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-4520520419966976

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 01:44:49 +00:00
James Almer
c3aa28f23d avformat/mov: check for EOF in more loops
Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-23 00:43:50 +00:00
James Almer
28c330d0f3 avformat/mov: abort if the queried item doesn't exist instead of overwriting it
The check for item presence was insufficient as it would result in the last
item in the array being overwritten if it existed even if the id didn't match.

Fixes: Assertion ref failed at src/libavformat/mov.c:10649
Fixes: clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5312542695292928
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg

Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-23 00:43:50 +00:00
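The class of bug described above (an entry being overwritten although its id does not match) can be sketched with a plain lookup. The `Item` type and both functions are hypothetical and are not taken from mov.c; the point is only that the caller aborts when the id is not found instead of falling back to some existing entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record, not the mov.c structure. */
typedef struct Item {
    int id;
    int value;
} Item;

/* Returns the item with a matching id, or NULL if there is none. */
static Item *find_item(Item *items, int nb_items, int id)
{
    for (int i = 0; i < nb_items; i++)
        if (items[i].id == id)
            return &items[i];
    return NULL;
}

/* Updates an item only when the queried id really exists.
 * Returns 0 on success and -1 if the id is unknown. */
static int set_item_value(Item *items, int nb_items, int id, int value)
{
    Item *item = find_item(items, nb_items, id);
    if (!item)
        return -1;
    item->value = value;
    return 0;
}
```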
Nariman-Sayed
9bc4109b23 avformat/tls_openssl: fix memory leak in cert_from_pem_string
When PEM_read_bio_X509 fails, the BIO was not freed, causing a memory leak.
Free the BIO before returning NULL to prevent the resource leak.
2026-02-22 22:39:43 +00:00
Andreas Rheinhardt
53a9a34e23 avcodec/snow: Reduce sizeof(SnowContext)
Each SubBand currently contains an array of 519 uint8_t[32],
yet most of these are unused: For both the decoder and the
encoder, at most 34 contexts are actually used: The only
variable index is context+2, where context is the result
of av_log2() and therefore in the 0..31 range.

There are also several accesses using compile-time indices,
the highest of which is 30. FATE passes with 31 contexts
and maybe these are enough, but I don't know.

Reducing the number to 34 reduces sizeof(SnowContext)
from 2141664B to 155104B here (on x64).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:16 +01:00
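The numbers in this message can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted above (519 and 34 contexts of 32 bytes each, an av_log2() result in 0..31, and the two sizeof values). The SubBand count it derives is an inference from those figures, not something the message states.

```c
#include <assert.h>
#include <stddef.h>

enum {
    OLD_CONTEXTS  = 519, /* contexts per SubBand before the change */
    NEW_CONTEXTS  = 34,  /* contexts per SubBand after the change */
    CONTEXT_BYTES = 32,  /* each context is a uint8_t[32] */
    MAX_LOG2      = 31   /* largest av_log2() result per the message */
};

/* Highest variable index: context + 2 with context in 0..31. */
static int highest_variable_index(void)
{
    return MAX_LOG2 + 2;
}

/* Bytes saved in every SubBand by dropping the unused contexts. */
static size_t bytes_saved_per_subband(void)
{
    return (size_t)(OLD_CONTEXTS - NEW_CONTEXTS) * CONTEXT_BYTES;
}

/* Number of SubBand instances implied by the total saving,
 * or 0 if the saving is not a whole multiple. */
static size_t implied_subband_count(size_t old_total, size_t new_total)
{
    size_t saved = old_total - new_total;
    size_t per   = bytes_saved_per_subband();
    return saved % per == 0 ? saved / per : 0;
}
```

Index 33 is the largest one reachable, so 34 contexts suffice. The quoted sizes differ by exactly 128 times the per-SubBand saving of 15520 bytes.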
Andreas Rheinhardt
bb92009386 avcodec/snow: Only allocate emu_edge_buffer for encoder
Also allocate it during init and move it to the encoder's context.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:15 +01:00
Michael Niedermayer
c7b5f1537d CONTRIBUTING.md: Add Forgejo
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-22 04:39:22 +00:00
Lynne
13e063ceec vulkan/ffv1: properly initialize the linecache 2026-02-22 03:39:23 +01:00
Michael Niedermayer
99515a3342 avcodec/jpeg2000htdec: Check Lcup and Lref
Fixes: use of uninitialized memory
Fixes: 482494999/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-6467586186608640

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-22 02:31:06 +00:00
Andreas Rheinhardt
6c1c1720cf avcodec/x86/vvc/dsp_init: Mark dsp init function as av_cold
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
af3f8f5bd2 avcodec/x86/vvc/of: Break dependency chain
Don't extract and update one word of one and the same register
at a time; use separate src and dst registers, so that pextrw
and bsr can be done in parallel. Also use movd instead of pinsrw
for the first word.

Old benchmarks:
apply_bdof_8_8x16_c:                                  3275.2 ( 1.00x)
apply_bdof_8_8x16_avx2:                                487.6 ( 6.72x)
apply_bdof_8_16x8_c:                                  3243.1 ( 1.00x)
apply_bdof_8_16x8_avx2:                                284.4 (11.40x)
apply_bdof_8_16x16_c:                                 6501.8 ( 1.00x)
apply_bdof_8_16x16_avx2:                               570.0 (11.41x)
apply_bdof_10_8x16_c:                                 3286.5 ( 1.00x)
apply_bdof_10_8x16_avx2:                               461.7 ( 7.12x)
apply_bdof_10_16x8_c:                                 3274.5 ( 1.00x)
apply_bdof_10_16x8_avx2:                               271.4 (12.06x)
apply_bdof_10_16x16_c:                                6590.0 ( 1.00x)
apply_bdof_10_16x16_avx2:                              543.9 (12.12x)
apply_bdof_12_8x16_c:                                 3307.6 ( 1.00x)
apply_bdof_12_8x16_avx2:                               462.2 ( 7.16x)
apply_bdof_12_16x8_c:                                 3287.4 ( 1.00x)
apply_bdof_12_16x8_avx2:                               271.8 (12.10x)
apply_bdof_12_16x16_c:                                6465.7 ( 1.00x)
apply_bdof_12_16x16_avx2:                              543.8 (11.89x)

New benchmarks:
apply_bdof_8_8x16_c:                                  3255.7 ( 1.00x)
apply_bdof_8_8x16_avx2:                                349.3 ( 9.32x)
apply_bdof_8_16x8_c:                                  3262.5 ( 1.00x)
apply_bdof_8_16x8_avx2:                                214.8 (15.19x)
apply_bdof_8_16x16_c:                                 6471.6 ( 1.00x)
apply_bdof_8_16x16_avx2:                               429.8 (15.06x)
apply_bdof_10_8x16_c:                                 3227.7 ( 1.00x)
apply_bdof_10_8x16_avx2:                               321.6 (10.04x)
apply_bdof_10_16x8_c:                                 3250.2 ( 1.00x)
apply_bdof_10_16x8_avx2:                               201.2 (16.16x)
apply_bdof_10_16x16_c:                                6476.5 ( 1.00x)
apply_bdof_10_16x16_avx2:                              400.9 (16.16x)
apply_bdof_12_8x16_c:                                 3230.7 ( 1.00x)
apply_bdof_12_8x16_avx2:                               321.8 (10.04x)
apply_bdof_12_16x8_c:                                 3210.5 ( 1.00x)
apply_bdof_12_16x8_avx2:                               200.9 (15.98x)
apply_bdof_12_16x16_c:                                6474.5 ( 1.00x)
apply_bdof_12_16x16_avx2:                              400.2 (16.18x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
19dc7b79a4 avcodec/x86/vvc/of: Unify shuffling
One can use the same shuffles for the width 8 and width 16
case if one also changes the permutation in vpermd (that always
follows pshufb for width 16).

This also allows loading it before checking width.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:03:22 +01:00
Andreas Rheinhardt
8e82416434 avcodec/x86/vvc/of: Avoid unused register
Avoids a push+pop.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:02:20 +01:00
Andreas Rheinhardt
81fb70c833 avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for w_avg
They only add overhead (in form of another function call,
sign-extending some parameters to 64bit (although the upper
bits are not used at all) and rederiving the actual number
of bits (from the maximum value (1<<bpp)-1)).

Old benchmarks:
w_avg_8_2x2_c:                                          16.4 ( 1.00x)
w_avg_8_2x2_avx2:                                       12.9 ( 1.27x)
w_avg_8_4x4_c:                                          48.0 ( 1.00x)
w_avg_8_4x4_avx2:                                       14.9 ( 3.23x)
w_avg_8_8x8_c:                                         168.2 ( 1.00x)
w_avg_8_8x8_avx2:                                       22.4 ( 7.49x)
w_avg_8_16x16_c:                                       396.5 ( 1.00x)
w_avg_8_16x16_avx2:                                     47.9 ( 8.28x)
w_avg_8_32x32_c:                                      1466.3 ( 1.00x)
w_avg_8_32x32_avx2:                                    172.8 ( 8.48x)
w_avg_8_64x64_c:                                      5629.3 ( 1.00x)
w_avg_8_64x64_avx2:                                    678.7 ( 8.29x)
w_avg_8_128x128_c:                                   22122.4 ( 1.00x)
w_avg_8_128x128_avx2:                                 2743.5 ( 8.06x)
w_avg_10_2x2_c:                                         18.7 ( 1.00x)
w_avg_10_2x2_avx2:                                      13.1 ( 1.43x)
w_avg_10_4x4_c:                                         50.3 ( 1.00x)
w_avg_10_4x4_avx2:                                      15.9 ( 3.17x)
w_avg_10_8x8_c:                                        109.3 ( 1.00x)
w_avg_10_8x8_avx2:                                      20.6 ( 5.30x)
w_avg_10_16x16_c:                                      395.5 ( 1.00x)
w_avg_10_16x16_avx2:                                    44.8 ( 8.83x)
w_avg_10_32x32_c:                                     1534.2 ( 1.00x)
w_avg_10_32x32_avx2:                                   141.4 (10.85x)
w_avg_10_64x64_c:                                     6003.6 ( 1.00x)
w_avg_10_64x64_avx2:                                   557.4 (10.77x)
w_avg_10_128x128_c:                                  23722.7 ( 1.00x)
w_avg_10_128x128_avx2:                                2205.0 (10.76x)
w_avg_12_2x2_c:                                         18.6 ( 1.00x)
w_avg_12_2x2_avx2:                                      13.1 ( 1.42x)
w_avg_12_4x4_c:                                         52.2 ( 1.00x)
w_avg_12_4x4_avx2:                                      16.1 ( 3.24x)
w_avg_12_8x8_c:                                        109.2 ( 1.00x)
w_avg_12_8x8_avx2:                                      20.6 ( 5.29x)
w_avg_12_16x16_c:                                      396.1 ( 1.00x)
w_avg_12_16x16_avx2:                                    45.0 ( 8.81x)
w_avg_12_32x32_c:                                     1532.6 ( 1.00x)
w_avg_12_32x32_avx2:                                   142.1 (10.79x)
w_avg_12_64x64_c:                                     6002.2 ( 1.00x)
w_avg_12_64x64_avx2:                                   557.3 (10.77x)
w_avg_12_128x128_c:                                  23748.7 ( 1.00x)
w_avg_12_128x128_avx2:                                2206.4 (10.76x)

New benchmarks:
w_avg_8_2x2_c:                                          16.0 ( 1.00x)
w_avg_8_2x2_avx2:                                        9.3 ( 1.71x)
w_avg_8_4x4_c:                                          48.4 ( 1.00x)
w_avg_8_4x4_avx2:                                       12.4 ( 3.91x)
w_avg_8_8x8_c:                                         168.7 ( 1.00x)
w_avg_8_8x8_avx2:                                       21.1 ( 8.00x)
w_avg_8_16x16_c:                                       394.5 ( 1.00x)
w_avg_8_16x16_avx2:                                     46.2 ( 8.54x)
w_avg_8_32x32_c:                                      1456.3 ( 1.00x)
w_avg_8_32x32_avx2:                                    171.8 ( 8.48x)
w_avg_8_64x64_c:                                      5636.2 ( 1.00x)
w_avg_8_64x64_avx2:                                    676.9 ( 8.33x)
w_avg_8_128x128_c:                                   22129.1 ( 1.00x)
w_avg_8_128x128_avx2:                                 2734.3 ( 8.09x)
w_avg_10_2x2_c:                                         18.7 ( 1.00x)
w_avg_10_2x2_avx2:                                      10.3 ( 1.82x)
w_avg_10_4x4_c:                                         50.8 ( 1.00x)
w_avg_10_4x4_avx2:                                      13.4 ( 3.79x)
w_avg_10_8x8_c:                                        109.7 ( 1.00x)
w_avg_10_8x8_avx2:                                      20.4 ( 5.38x)
w_avg_10_16x16_c:                                      395.2 ( 1.00x)
w_avg_10_16x16_avx2:                                    41.7 ( 9.48x)
w_avg_10_32x32_c:                                     1535.6 ( 1.00x)
w_avg_10_32x32_avx2:                                   137.9 (11.13x)
w_avg_10_64x64_c:                                     6002.1 ( 1.00x)
w_avg_10_64x64_avx2:                                   548.5 (10.94x)
w_avg_10_128x128_c:                                  23742.7 ( 1.00x)
w_avg_10_128x128_avx2:                                2179.8 (10.89x)
w_avg_12_2x2_c:                                         18.9 ( 1.00x)
w_avg_12_2x2_avx2:                                      10.3 ( 1.84x)
w_avg_12_4x4_c:                                         52.4 ( 1.00x)
w_avg_12_4x4_avx2:                                      13.4 ( 3.91x)
w_avg_12_8x8_c:                                        109.2 ( 1.00x)
w_avg_12_8x8_avx2:                                      20.3 ( 5.39x)
w_avg_12_16x16_c:                                      396.3 ( 1.00x)
w_avg_12_16x16_avx2:                                    41.7 ( 9.51x)
w_avg_12_32x32_c:                                     1532.6 ( 1.00x)
w_avg_12_32x32_avx2:                                   138.6 (11.06x)
w_avg_12_64x64_c:                                     5996.7 ( 1.00x)
w_avg_12_64x64_avx2:                                   549.6 (10.91x)
w_avg_12_128x128_c:                                  23738.0 ( 1.00x)
w_avg_12_128x128_avx2:                                2177.2 (10.90x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:01:27 +01:00
Andreas Rheinhardt
ea78402e9c avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for avg
Up until now, there were two averaging assembly functions,
one for eight bit content and one for <=16 bit content;
there are also three C-wrappers around these functions,
for 8, 10 and 12 bpp. These wrappers simply forward the
maximum permissible value (i.e. (1<<bpp)-1) and promote
some integer values to ptrdiff_t.

Yet these wrappers are absolutely useless: The assembly functions
rederive the bpp from the maximum and only the integer part
of the promoted ptrdiff_t values is ever used. Of course,
these wrappers also entail an additional call (not a tail call,
because the additional maximum parameter is passed on the stack).

Remove the wrappers and add per-bpp assembly functions instead.
Given that the only differences between 10 and 12 bits are some
constants in registers, the main part of these functions can be
shared (given that this code uses a jumptable, it can even
be done without adding any additional jump).

Old benchmarks:
avg_8_2x2_c:                                            11.4 ( 1.00x)
avg_8_2x2_avx2:                                          7.9 ( 1.44x)
avg_8_4x4_c:                                            30.7 ( 1.00x)
avg_8_4x4_avx2:                                         10.4 ( 2.95x)
avg_8_8x8_c:                                           134.5 ( 1.00x)
avg_8_8x8_avx2:                                         16.6 ( 8.12x)
avg_8_16x16_c:                                         255.6 ( 1.00x)
avg_8_16x16_avx2:                                       28.2 ( 9.07x)
avg_8_32x32_c:                                         897.7 ( 1.00x)
avg_8_32x32_avx2:                                       83.9 (10.70x)
avg_8_64x64_c:                                        3320.0 ( 1.00x)
avg_8_64x64_avx2:                                      321.1 (10.34x)
avg_8_128x128_c:                                     12981.8 ( 1.00x)
avg_8_128x128_avx2:                                   1480.1 ( 8.77x)
avg_10_2x2_c:                                           12.0 ( 1.00x)
avg_10_2x2_avx2:                                         8.4 ( 1.43x)
avg_10_4x4_c:                                           34.9 ( 1.00x)
avg_10_4x4_avx2:                                         9.8 ( 3.56x)
avg_10_8x8_c:                                           76.8 ( 1.00x)
avg_10_8x8_avx2:                                        15.1 ( 5.08x)
avg_10_16x16_c:                                        256.6 ( 1.00x)
avg_10_16x16_avx2:                                      25.1 (10.20x)
avg_10_32x32_c:                                        932.9 ( 1.00x)
avg_10_32x32_avx2:                                      73.4 (12.72x)
avg_10_64x64_c:                                       3517.9 ( 1.00x)
avg_10_64x64_avx2:                                     414.8 ( 8.48x)
avg_10_128x128_c:                                    13695.3 ( 1.00x)
avg_10_128x128_avx2:                                  1648.1 ( 8.31x)
avg_12_2x2_c:                                           13.1 ( 1.00x)
avg_12_2x2_avx2:                                         8.6 ( 1.53x)
avg_12_4x4_c:                                           35.4 ( 1.00x)
avg_12_4x4_avx2:                                        10.1 ( 3.49x)
avg_12_8x8_c:                                           76.6 ( 1.00x)
avg_12_8x8_avx2:                                        16.7 ( 4.60x)
avg_12_16x16_c:                                        256.6 ( 1.00x)
avg_12_16x16_avx2:                                      25.5 (10.07x)
avg_12_32x32_c:                                        933.2 ( 1.00x)
avg_12_32x32_avx2:                                      75.7 (12.34x)
avg_12_64x64_c:                                       3519.1 ( 1.00x)
avg_12_64x64_avx2:                                     416.8 ( 8.44x)
avg_12_128x128_c:                                    13695.1 ( 1.00x)
avg_12_128x128_avx2:                                  1651.6 ( 8.29x)

New benchmarks:
avg_8_2x2_c:                                            11.5 ( 1.00x)
avg_8_2x2_avx2:                                          6.0 ( 1.91x)
avg_8_4x4_c:                                            29.7 ( 1.00x)
avg_8_4x4_avx2:                                          8.0 ( 3.72x)
avg_8_8x8_c:                                           131.4 ( 1.00x)
avg_8_8x8_avx2:                                         12.2 (10.74x)
avg_8_16x16_c:                                         254.3 ( 1.00x)
avg_8_16x16_avx2:                                       24.8 (10.25x)
avg_8_32x32_c:                                         897.7 ( 1.00x)
avg_8_32x32_avx2:                                       77.8 (11.54x)
avg_8_64x64_c:                                        3321.3 ( 1.00x)
avg_8_64x64_avx2:                                      318.7 (10.42x)
avg_8_128x128_c:                                     12988.4 ( 1.00x)
avg_8_128x128_avx2:                                   1430.1 ( 9.08x)
avg_10_2x2_c:                                           12.1 ( 1.00x)
avg_10_2x2_avx2:                                         5.7 ( 2.13x)
avg_10_4x4_c:                                           35.0 ( 1.00x)
avg_10_4x4_avx2:                                         9.0 ( 3.88x)
avg_10_8x8_c:                                           77.2 ( 1.00x)
avg_10_8x8_avx2:                                        12.4 ( 6.24x)
avg_10_16x16_c:                                        256.2 ( 1.00x)
avg_10_16x16_avx2:                                      24.3 (10.56x)
avg_10_32x32_c:                                        932.9 ( 1.00x)
avg_10_32x32_avx2:                                      71.9 (12.97x)
avg_10_64x64_c:                                       3516.8 ( 1.00x)
avg_10_64x64_avx2:                                     414.7 ( 8.48x)
avg_10_128x128_c:                                    13693.7 ( 1.00x)
avg_10_128x128_avx2:                                  1609.3 ( 8.51x)
avg_12_2x2_c:                                           14.1 ( 1.00x)
avg_12_2x2_avx2:                                         5.7 ( 2.48x)
avg_12_4x4_c:                                           35.8 ( 1.00x)
avg_12_4x4_avx2:                                         9.0 ( 3.96x)
avg_12_8x8_c:                                           76.9 ( 1.00x)
avg_12_8x8_avx2:                                        12.4 ( 6.22x)
avg_12_16x16_c:                                        256.5 ( 1.00x)
avg_12_16x16_avx2:                                      24.4 (10.50x)
avg_12_32x32_c:                                        934.1 ( 1.00x)
avg_12_32x32_avx2:                                      72.0 (12.97x)
avg_12_64x64_c:                                       3518.2 ( 1.00x)
avg_12_64x64_avx2:                                     414.8 ( 8.48x)
avg_12_128x128_c:                                    13689.5 ( 1.00x)
avg_12_128x128_avx2:                                  1611.1 ( 8.50x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:58:33 +01:00
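To see why forwarding the maximum value carries no extra information, consider how the bit depth can be recovered from it. The function below is a hypothetical C illustration and not the assembly's actual method: for a maximum of the form (1<<bpp)-1, counting significant bits yields bpp again.

```c
#include <assert.h>

/* Hypothetical illustration: recover bpp from the maximum pixel value
 * (1 << bpp) - 1 by counting its significant bits. */
static int bpp_from_max(unsigned max_value)
{
    int bits = 0;
    while (max_value) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

Since the wrapper knows bpp at compile time and the callee immediately recomputes it, a per-bpp entry point can drop the parameter altogether.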
Andreas Rheinhardt
5a60b3f1a6 avcodec/x86/vvc/mc: Remove always-false branches
The C versions of the average and weighted average functions
contain "FFMAX(3, 15 - BIT_DEPTH)" and the code here followed
this; yet it is only instantiated for bit depths 8, 10 and 12,
for which the above is just 15-BIT_DEPTH. So the comparisons
are unnecessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
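The claim that the clamp is a no-op for the instantiated bit depths is easy to check. In this sketch `MAX_OF` is a local stand-in for FFmpeg's FFMAX macro and both function names are hypothetical.

```c
#include <assert.h>

/* Local stand-in for FFMAX. */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* Shift as written in the generic C version. */
static int shift_generic(int bit_depth)
{
    return MAX_OF(3, 15 - bit_depth);
}

/* Shift without the clamp; valid whenever 15 - bit_depth >= 3. */
static int shift_simplified(int bit_depth)
{
    return 15 - bit_depth;
}
```

For bit depths 8, 10 and 12 the two agree (7, 5 and 3). They would only diverge above 12 bits, for example at 14 bits, where the clamp yields 3 instead of 1.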
Andreas Rheinhardt
59f8ff4c18 avcodec/x86/vvc/mc: Remove unused constants
Also avoid overaligning .rodata.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
eabf52e787 avcodec/x86/vvc/mc: Avoid unused work
The high quadword of these registers is zero for width 2.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
9317fb2b2e avcodec/x86/vvc/mc: Avoid ymm registers where possible
Widths 2 and 4 fit into xmm registers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
caa0ae0cfb avcodec/x86/vvc/mc: Avoid pextr[dq], v{insert,extract}i128
Use mov[dq], movdqu instead if the least significant parts
are set (i.e. if the immediate value is 0x0).

Old benchmarks:
avg_8_2x2_c:                                            11.3 ( 1.00x)
avg_8_2x2_avx2:                                          7.5 ( 1.50x)
avg_8_4x4_c:                                            31.2 ( 1.00x)
avg_8_4x4_avx2:                                         10.7 ( 2.91x)
avg_8_8x8_c:                                           133.5 ( 1.00x)
avg_8_8x8_avx2:                                         21.2 ( 6.30x)
avg_8_16x16_c:                                         254.7 ( 1.00x)
avg_8_16x16_avx2:                                       30.1 ( 8.46x)
avg_8_32x32_c:                                         896.9 ( 1.00x)
avg_8_32x32_avx2:                                      103.9 ( 8.63x)
avg_8_64x64_c:                                        3320.7 ( 1.00x)
avg_8_64x64_avx2:                                      539.4 ( 6.16x)
avg_8_128x128_c:                                     12991.5 ( 1.00x)
avg_8_128x128_avx2:                                   1661.3 ( 7.82x)
avg_10_2x2_c:                                           21.3 ( 1.00x)
avg_10_2x2_avx2:                                         8.3 ( 2.55x)
avg_10_4x4_c:                                           34.9 ( 1.00x)
avg_10_4x4_avx2:                                        10.6 ( 3.28x)
avg_10_8x8_c:                                           76.3 ( 1.00x)
avg_10_8x8_avx2:                                        20.2 ( 3.77x)
avg_10_16x16_c:                                        255.9 ( 1.00x)
avg_10_16x16_avx2:                                      24.1 (10.60x)
avg_10_32x32_c:                                        932.4 ( 1.00x)
avg_10_32x32_avx2:                                      73.3 (12.72x)
avg_10_64x64_c:                                       3516.4 ( 1.00x)
avg_10_64x64_avx2:                                     601.7 ( 5.84x)
avg_10_128x128_c:                                    13690.6 ( 1.00x)
avg_10_128x128_avx2:                                  1613.2 ( 8.49x)
avg_12_2x2_c:                                           14.0 ( 1.00x)
avg_12_2x2_avx2:                                         8.3 ( 1.67x)
avg_12_4x4_c:                                           35.3 ( 1.00x)
avg_12_4x4_avx2:                                        10.9 ( 3.26x)
avg_12_8x8_c:                                           76.5 ( 1.00x)
avg_12_8x8_avx2:                                        20.3 ( 3.77x)
avg_12_16x16_c:                                        256.7 ( 1.00x)
avg_12_16x16_avx2:                                      24.1 (10.63x)
avg_12_32x32_c:                                        932.5 ( 1.00x)
avg_12_32x32_avx2:                                      73.3 (12.72x)
avg_12_64x64_c:                                       3520.5 ( 1.00x)
avg_12_64x64_avx2:                                     602.6 ( 5.84x)
avg_12_128x128_c:                                    13689.6 ( 1.00x)
avg_12_128x128_avx2:                                  1613.1 ( 8.49x)
w_avg_8_2x2_c:                                          16.7 ( 1.00x)
w_avg_8_2x2_avx2:                                       13.4 ( 1.25x)
w_avg_8_4x4_c:                                          44.5 ( 1.00x)
w_avg_8_4x4_avx2:                                       15.9 ( 2.81x)
w_avg_8_8x8_c:                                         166.1 ( 1.00x)
w_avg_8_8x8_avx2:                                       45.7 ( 3.63x)
w_avg_8_16x16_c:                                       392.9 ( 1.00x)
w_avg_8_16x16_avx2:                                     57.8 ( 6.80x)
w_avg_8_32x32_c:                                      1455.5 ( 1.00x)
w_avg_8_32x32_avx2:                                    215.0 ( 6.77x)
w_avg_8_64x64_c:                                      5621.8 ( 1.00x)
w_avg_8_64x64_avx2:                                    875.2 ( 6.42x)
w_avg_8_128x128_c:                                   22131.3 ( 1.00x)
w_avg_8_128x128_avx2:                                 3390.1 ( 6.53x)
w_avg_10_2x2_c:                                         18.0 ( 1.00x)
w_avg_10_2x2_avx2:                                      14.0 ( 1.28x)
w_avg_10_4x4_c:                                         53.9 ( 1.00x)
w_avg_10_4x4_avx2:                                      15.9 ( 3.40x)
w_avg_10_8x8_c:                                        109.5 ( 1.00x)
w_avg_10_8x8_avx2:                                      40.4 ( 2.71x)
w_avg_10_16x16_c:                                      395.7 ( 1.00x)
w_avg_10_16x16_avx2:                                    44.7 ( 8.86x)
w_avg_10_32x32_c:                                     1532.7 ( 1.00x)
w_avg_10_32x32_avx2:                                   142.4 (10.77x)
w_avg_10_64x64_c:                                     6007.7 ( 1.00x)
w_avg_10_64x64_avx2:                                   745.5 ( 8.06x)
w_avg_10_128x128_c:                                  23719.7 ( 1.00x)
w_avg_10_128x128_avx2:                                2217.7 (10.70x)
w_avg_12_2x2_c:                                         18.9 ( 1.00x)
w_avg_12_2x2_avx2:                                      13.6 ( 1.38x)
w_avg_12_4x4_c:                                         47.5 ( 1.00x)
w_avg_12_4x4_avx2:                                      15.9 ( 2.99x)
w_avg_12_8x8_c:                                        109.3 ( 1.00x)
w_avg_12_8x8_avx2:                                      40.9 ( 2.67x)
w_avg_12_16x16_c:                                      395.6 ( 1.00x)
w_avg_12_16x16_avx2:                                    44.8 ( 8.84x)
w_avg_12_32x32_c:                                     1531.0 ( 1.00x)
w_avg_12_32x32_avx2:                                   141.8 (10.80x)
w_avg_12_64x64_c:                                     6016.7 ( 1.00x)
w_avg_12_64x64_avx2:                                   732.8 ( 8.21x)
w_avg_12_128x128_c:                                  23762.2 ( 1.00x)
w_avg_12_128x128_avx2:                                2223.4 (10.69x)

New benchmarks:
avg_8_2x2_c:                                            11.3 ( 1.00x)
avg_8_2x2_avx2:                                          7.6 ( 1.49x)
avg_8_4x4_c:                                            31.2 ( 1.00x)
avg_8_4x4_avx2:                                         10.8 ( 2.89x)
avg_8_8x8_c:                                           131.6 ( 1.00x)
avg_8_8x8_avx2:                                         15.6 ( 8.42x)
avg_8_16x16_c:                                         255.3 ( 1.00x)
avg_8_16x16_avx2:                                       27.9 ( 9.16x)
avg_8_32x32_c:                                         897.9 ( 1.00x)
avg_8_32x32_avx2:                                       81.2 (11.06x)
avg_8_64x64_c:                                        3320.0 ( 1.00x)
avg_8_64x64_avx2:                                      335.1 ( 9.91x)
avg_8_128x128_c:                                     12999.1 ( 1.00x)
avg_8_128x128_avx2:                                   1456.3 ( 8.93x)
avg_10_2x2_c:                                           12.0 ( 1.00x)
avg_10_2x2_avx2:                                         8.6 ( 1.40x)
avg_10_4x4_c:                                           34.9 ( 1.00x)
avg_10_4x4_avx2:                                         9.7 ( 3.61x)
avg_10_8x8_c:                                           76.7 ( 1.00x)
avg_10_8x8_avx2:                                        16.3 ( 4.69x)
avg_10_16x16_c:                                        256.3 ( 1.00x)
avg_10_16x16_avx2:                                      25.2 (10.18x)
avg_10_32x32_c:                                        932.8 ( 1.00x)
avg_10_32x32_avx2:                                      73.3 (12.72x)
avg_10_64x64_c:                                       3518.8 ( 1.00x)
avg_10_64x64_avx2:                                     416.8 ( 8.44x)
avg_10_128x128_c:                                    13691.6 ( 1.00x)
avg_10_128x128_avx2:                                  1612.9 ( 8.49x)
avg_12_2x2_c:                                           14.1 ( 1.00x)
avg_12_2x2_avx2:                                         8.7 ( 1.62x)
avg_12_4x4_c:                                           35.7 ( 1.00x)
avg_12_4x4_avx2:                                         9.7 ( 3.68x)
avg_12_8x8_c:                                           77.0 ( 1.00x)
avg_12_8x8_avx2:                                        16.9 ( 4.57x)
avg_12_16x16_c:                                        256.2 ( 1.00x)
avg_12_16x16_avx2:                                      25.7 ( 9.96x)
avg_12_32x32_c:                                        933.5 ( 1.00x)
avg_12_32x32_avx2:                                      74.0 (12.62x)
avg_12_64x64_c:                                       3516.4 ( 1.00x)
avg_12_64x64_avx2:                                     408.7 ( 8.60x)
avg_12_128x128_c:                                    13691.6 ( 1.00x)
avg_12_128x128_avx2:                                  1613.8 ( 8.48x)
w_avg_8_2x2_c:                                          16.7 ( 1.00x)
w_avg_8_2x2_avx2:                                       14.0 ( 1.19x)
w_avg_8_4x4_c:                                          48.2 ( 1.00x)
w_avg_8_4x4_avx2:                                       16.1 ( 3.00x)
w_avg_8_8x8_c:                                         168.0 ( 1.00x)
w_avg_8_8x8_avx2:                                       22.5 ( 7.47x)
w_avg_8_16x16_c:                                       392.5 ( 1.00x)
w_avg_8_16x16_avx2:                                     47.9 ( 8.19x)
w_avg_8_32x32_c:                                      1453.7 ( 1.00x)
w_avg_8_32x32_avx2:                                    176.1 ( 8.26x)
w_avg_8_64x64_c:                                      5631.4 ( 1.00x)
w_avg_8_64x64_avx2:                                    690.8 ( 8.15x)
w_avg_8_128x128_c:                                   22139.5 ( 1.00x)
w_avg_8_128x128_avx2:                                 2742.4 ( 8.07x)
w_avg_10_2x2_c:                                         18.1 ( 1.00x)
w_avg_10_2x2_avx2:                                      13.8 ( 1.31x)
w_avg_10_4x4_c:                                         47.0 ( 1.00x)
w_avg_10_4x4_avx2:                                      16.4 ( 2.87x)
w_avg_10_8x8_c:                                        110.0 ( 1.00x)
w_avg_10_8x8_avx2:                                      21.6 ( 5.09x)
w_avg_10_16x16_c:                                      395.2 ( 1.00x)
w_avg_10_16x16_avx2:                                    45.4 ( 8.71x)
w_avg_10_32x32_c:                                     1533.8 ( 1.00x)
w_avg_10_32x32_avx2:                                   142.6 (10.76x)
w_avg_10_64x64_c:                                     6004.4 ( 1.00x)
w_avg_10_64x64_avx2:                                   672.8 ( 8.92x)
w_avg_10_128x128_c:                                  23748.5 ( 1.00x)
w_avg_10_128x128_avx2:                                2198.0 (10.80x)
w_avg_12_2x2_c:                                         17.2 ( 1.00x)
w_avg_12_2x2_avx2:                                      13.9 ( 1.24x)
w_avg_12_4x4_c:                                         51.4 ( 1.00x)
w_avg_12_4x4_avx2:                                      16.5 ( 3.11x)
w_avg_12_8x8_c:                                        109.1 ( 1.00x)
w_avg_12_8x8_avx2:                                      22.0 ( 4.96x)
w_avg_12_16x16_c:                                      395.9 ( 1.00x)
w_avg_12_16x16_avx2:                                    44.9 ( 8.81x)
w_avg_12_32x32_c:                                     1533.5 ( 1.00x)
w_avg_12_32x32_avx2:                                   142.3 (10.78x)
w_avg_12_64x64_c:                                     6002.0 ( 1.00x)
w_avg_12_64x64_avx2:                                   557.5 (10.77x)
w_avg_12_128x128_c:                                  23749.5 ( 1.00x)
w_avg_12_128x128_avx2:                                2202.0 (10.79x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
7bf9c1e3f6 avcodec/x86/vvc/mc: Avoid redundant clipping for 8bit
It is already done by packuswb.
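
As a rough illustration of why the extra clip is redundant, the sketch below models in scalar C what packuswb does to each 16-bit lane: it reads the lane as signed and saturates it to 0..255. The function names are hypothetical and are not taken from the FFmpeg sources. It assumes the averaged value fits in a signed 16-bit lane before packing.

```c
#include <stdint.h>

/* Scalar model of packuswb for one lane: signed 16-bit in,
 * unsigned 8-bit out, with unsigned saturation. */
static uint8_t pack_us_sat(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical "old" shape: explicit clip to the 8-bit range, then pack. */
static uint8_t clip_then_pack(int16_t v)
{
    int16_t clipped = v < 0 ? 0 : (v > 255 ? 255 : v);
    return pack_us_sat(clipped);
}

/* Returns 1 if both paths agree for every possible 16-bit input. */
static int clip_is_redundant(void)
{
    for (int v = INT16_MIN; v <= INT16_MAX; v++)
        if (clip_then_pack((int16_t)v) != pack_us_sat((int16_t)v))
            return 0;
    return 1;
}
```

Because the pack step already saturates, clipping first cannot change the result for any input, so the clip instructions can be dropped on the 8-bit path.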

Old benchmarks:
avg_8_2x2_c:                                            11.1 ( 1.00x)
avg_8_2x2_avx2:                                          8.6 ( 1.28x)
avg_8_4x4_c:                                            30.0 ( 1.00x)
avg_8_4x4_avx2:                                         10.8 ( 2.78x)
avg_8_8x8_c:                                           132.0 ( 1.00x)
avg_8_8x8_avx2:                                         25.7 ( 5.14x)
avg_8_16x16_c:                                         254.6 ( 1.00x)
avg_8_16x16_avx2:                                       33.2 ( 7.67x)
avg_8_32x32_c:                                         897.5 ( 1.00x)
avg_8_32x32_avx2:                                      115.6 ( 7.76x)
avg_8_64x64_c:                                        3316.9 ( 1.00x)
avg_8_64x64_avx2:                                      626.5 ( 5.29x)
avg_8_128x128_c:                                     12973.6 ( 1.00x)
avg_8_128x128_avx2:                                   1914.0 ( 6.78x)
w_avg_8_2x2_c:                                          16.7 ( 1.00x)
w_avg_8_2x2_avx2:                                       14.4 ( 1.16x)
w_avg_8_4x4_c:                                          48.2 ( 1.00x)
w_avg_8_4x4_avx2:                                       16.5 ( 2.92x)
w_avg_8_8x8_c:                                         168.1 ( 1.00x)
w_avg_8_8x8_avx2:                                       49.7 ( 3.38x)
w_avg_8_16x16_c:                                       392.4 ( 1.00x)
w_avg_8_16x16_avx2:                                     61.1 ( 6.43x)
w_avg_8_32x32_c:                                      1455.3 ( 1.00x)
w_avg_8_32x32_avx2:                                    224.6 ( 6.48x)
w_avg_8_64x64_c:                                      5632.1 ( 1.00x)
w_avg_8_64x64_avx2:                                    896.9 ( 6.28x)
w_avg_8_128x128_c:                                   22136.3 ( 1.00x)
w_avg_8_128x128_avx2:                                 3626.7 ( 6.10x)

New benchmarks:
avg_8_2x2_c:                                            12.3 ( 1.00x)
avg_8_2x2_avx2:                                          8.1 ( 1.52x)
avg_8_4x4_c:                                            30.3 ( 1.00x)
avg_8_4x4_avx2:                                         11.3 ( 2.67x)
avg_8_8x8_c:                                           131.8 ( 1.00x)
avg_8_8x8_avx2:                                         21.3 ( 6.20x)
avg_8_16x16_c:                                         255.0 ( 1.00x)
avg_8_16x16_avx2:                                       30.6 ( 8.33x)
avg_8_32x32_c:                                         898.5 ( 1.00x)
avg_8_32x32_avx2:                                      104.9 ( 8.57x)
avg_8_64x64_c:                                        3317.7 ( 1.00x)
avg_8_64x64_avx2:                                      540.9 ( 6.13x)
avg_8_128x128_c:                                     12986.5 ( 1.00x)
avg_8_128x128_avx2:                                   1663.4 ( 7.81x)
w_avg_8_2x2_c:                                          16.8 ( 1.00x)
w_avg_8_2x2_avx2:                                       13.9 ( 1.21x)
w_avg_8_4x4_c:                                          48.2 ( 1.00x)
w_avg_8_4x4_avx2:                                       16.2 ( 2.98x)
w_avg_8_8x8_c:                                         168.6 ( 1.00x)
w_avg_8_8x8_avx2:                                       46.3 ( 3.64x)
w_avg_8_16x16_c:                                       392.4 ( 1.00x)
w_avg_8_16x16_avx2:                                     57.7 ( 6.80x)
w_avg_8_32x32_c:                                      1454.6 ( 1.00x)
w_avg_8_32x32_avx2:                                    214.6 ( 6.78x)
w_avg_8_64x64_c:                                      5638.4 ( 1.00x)
w_avg_8_64x64_avx2:                                    875.6 ( 6.44x)
w_avg_8_128x128_c:                                   22133.5 ( 1.00x)
w_avg_8_128x128_avx2:                                 3334.3 ( 6.64x)

Also saves 550B of .text here. The improvements will likely
be even better on Win64, because this change avoids using two
nonvolatile registers in the weighted average case.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
b22b65f2f8 avformat/hlsenc: Return error upon error, fix shadowing
Introduced in 65fc0db581.
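
The general hazard named in the subject line can be sketched as follows. This is a generic illustration, not the hlsenc code: do_step(), run_shadowed() and run_fixed() are hypothetical names. An inner declaration of `ret` shadows the outer one, so the error code never reaches the caller.

```c
#include <errno.h>

/* Hypothetical stand-in for a call that can fail. */
static int do_step(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Buggy shape: the inner `ret` is a new variable, so the outer
 * `ret` keeps its initial value and the error is silently lost. */
static int run_shadowed(int fail)
{
    int ret = 0;
    {
        int ret = do_step(fail); /* shadows the outer ret */
        (void)ret;
    }
    return ret; /* always 0, even when do_step() failed */
}

/* Fixed shape: a single `ret`, and the error is returned. */
static int run_fixed(int fail)
{
    int ret = do_step(fail);
    if (ret < 0)
        return ret;
    return 0;
}
```

Compilers can flag this pattern with -Wshadow (GCC and Clang).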

Reviewed-by: Marvin Scholz <epirat07@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:23:00 +01:00
Michael Niedermayer
c98346ffaa avcodec/libtheoraenc: make keyframe mask unsigned and handle its larger range
Fixes: left shift of 1 by 31 places cannot be represented in type 'int'
Fixes: 473579864/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LIBTHEORA_fuzzer-5835688160591872
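
A minimal sketch of the class of bug reported above, assuming a 32-bit int: `1 << 31` shifts into the sign bit of a signed int, which is undefined behaviour, whereas the same shift on an unsigned operand is well defined. The helper low_bits_mask() is hypothetical and only illustrates the unsigned form. It is not the libtheoraenc code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` bits set, for bits in 0..31.
 * The shift is done on an unsigned 32-bit value, so bits == 31 is valid. */
static uint32_t low_bits_mask(unsigned bits)
{
    assert(bits < 32);
    return ((uint32_t)1 << bits) - 1;
}
```

With a signed `1`, the bits == 31 case is exactly what the sanitizer message describes. The unsigned variant yields 0x7FFFFFFF for that input.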

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-21 22:43:41 +00:00
Marvin Scholz
ca011ee754 avformat: Bump version and add APIChanges entry
Needed after the recent addition of the command APIs.
2026-02-21 20:03:52 +01:00
Andreas Rheinhardt
3be4545b67 avcodec/vvc/inter: Deduplicate applying averaging
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-21 12:48:50 +01:00