Michael Niedermayer
f81d6479ec
tools/target_dec_fuzzer: Adjust threshold for MPC8
...
Fixes: Timeout
Fixes: 471587345/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MPC8_fuzzer-4824233864921088
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:15:19 +01:00
Michael Niedermayer
c8b57f0a1e
tools/target_dec_fuzzer: Adjust threshold for BFI
...
Fixes: timeout
Fixes: 471606773/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BFI_fuzzer-6707440390569984
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 23:14:44 +01:00
Michael Niedermayer
4446dfb0e3
avcodec/flashsv: Check for input space before (re)allocating frame
...
Fixes: Timeout
Fixes: 471605680/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV2_DEC_fuzzer-6210773459468288
Fixes: 471605920/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FLASHSV_DEC_fuzzer-6230719287590912
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:59:44 +01:00
Michael Niedermayer
40cafc25cf
avcodec/mdec: Check input space vs minimal block size
...
Fixes: Timeout
Fixes: 481006706/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MDEC_fuzzer-6122832651419648
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:54:38 +01:00
Michael Niedermayer
73681f888d
avcodec/h264_parser: Check remaining input length in loop in scan_mmco_reset()
...
Fixes: read of uninitialized memory
Fixes: 476177761/clusterfuzz-testcase-minimized-ffmpeg_dem_H264_fuzzer-6400884824408064
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 22:43:28 +01:00
Niklas Haas
b21f1b6482
tests/swscale: don't pass fake object to av_opt_eval_*
...
This is UB, as the fake object may be used for logging.
Reported-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: ea791a4ef1
2026-02-23 20:55:27 +00:00
Niklas Haas
afdb683a3f
swscale: avoid UB on interlaced frames
...
NULL+0 is UB.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
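Note on afdb683a3f: in C, adding any offset to a null pointer is undefined, even an offset of zero. A minimal sketch of the usual guard follows. `plane_at` is a hypothetical helper written for illustration; it is not the function changed by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns base + offset, but never does pointer
 * arithmetic on a null base, since NULL + 0 is undefined behavior in C. */
static uint8_t *plane_at(uint8_t *base, ptrdiff_t offset)
{
    return base ? base + offset : NULL;
}

/* Usage: an absent plane stays NULL, a present plane is offset normally. */
static void plane_at_example(void)
{
    uint8_t buf[4] = { 0 };
    assert(plane_at(NULL, 0) == NULL);
    assert(plane_at(buf, 2) == buf + 2);
}
```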
Niklas Haas
d918551650
swscale/graph: switch SwsPass.output to refstruct
...
Allows multiple passes to share a single output buffer reference. We always
allocate an output buffer so that subpasses can share the same output buffer
reference while still allowing that reference to implicitly point to the
final output image.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
cc346b232d
swscale/graph: store current pass input instead of global args
...
The global args to ff_sws_graph_run() really shouldn't matter inside thread
workers. If they ever do, it indicates a leaky abstraction. The only reason
it was needed in the first place was because of the way the input/output
buffers implicitly defaulted to the global args.
However, we can solve this much more elegantly by just calculating it in
ff_sws_graph_run() directly and storing the computed SwsImg inside the
execution state.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
1e071c8585
swscale/graph: omit memcpy() if src and dst are identical
...
This allows already referenced planes to be skipped, in the case of e.g.
only some of the output planes being successfully referenced. Also avoids
what is technically UB, if the user happens to call ff_sws_graph_run() after
already having ref'd an image.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
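Note on 1e071c8585: memcpy() requires that source and destination do not overlap, so calling it with identical pointers is technically undefined. The sketch below shows the skip in isolation. `copy_plane` and its return convention are assumptions made for this example, not code from libswscale.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies one plane, skipping the copy when source and
 * destination are the same buffer.
 * Returns 1 if bytes were copied, 0 if the copy was skipped. */
static int copy_plane(uint8_t *dst, const uint8_t *src, size_t size)
{
    if (dst == src)
        return 0;
    memcpy(dst, src, size);
    return 1;
}

/* Usage: a plane that already aliases the source is left alone. */
static void copy_plane_example(void)
{
    uint8_t a[4] = { 1, 2, 3, 4 }, b[4] = { 0 };
    assert(copy_plane(a, a, sizeof(a)) == 0); /* same buffer: skipped */
    assert(copy_plane(b, a, sizeof(a)) == 1); /* distinct buffers: copied */
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```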
Niklas Haas
b98751b13c
swscale/graph: set up palette using current input image
...
Using the original input image here is completely wrong - the format/palette
could have been set to anything else in the meantime. At best, we would want to
use the original input to add_legacy_sws_pass(), but it's impossible for this
to differ from the per-pass input. The only time legacy subpasses are added
is when using cascaded contexts, but in this case, the only context actually
reading from the palette format would be the first one.
I'm not entirely sure why this code was originally written this way, but
I'm reasonably confident that it's not at all necessary. Tested extensively
on FATE, the self-test, and real-world files.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
0b446cdccd
swscale/graph: switch to an AVBufferRef per plane
...
This annoyingly requires recreating some of the logic inside av_img_alloc(),
because there's no good existing helper accessible from libswscale
that gives per-plane allocations like this.
The new code is based off the calculations inside libavframe/bufferpool.c.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
afa08f4971
swscale/graph: duplicate buffer dimensions in SwsPassBuffer
...
When multiple passes share a buffer reference, the true buffer dimensions
may be different for each pass, depending on slice alignment. So we can't
rely on the pass dimensions being representative.
Instead, store this information in the SwsPassBuffer itself.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
fe25e54d0f
swscale/graph: move output image into separate struct
...
I want to add more metadata to this and also turn it into a refstruct,
but get the cosmetic diff out of the way first.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
18060a8820
swscale/graph: simplify ff_sws_graph_run() API
...
There's little reason not to directly take an SwsImg here; it's already an
internally visible struct.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Niklas Haas
e1fd274706
swscale/graph: check output plane pointer instead of pixel format
...
To see if the output buffers are allocated or not.
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-23 19:39:17 +00:00
Marvin Scholz
64fafd63f0
avformat: remove HLS protocol
...
The use of this protocol was already discouraged and warned about
for years with the recommendation to use the HLS demuxer instead.
2026-02-23 20:20:20 +01:00
Niklas Haas
ea791a4ef1
swscale/tests/swscale: parse flags from string
...
We don't actually have an SwsContext yet at this point, so just use
AV_OPT_SEARCH_FAKE_OBJ. For the actual evaluation, the signature only
requires that we pass a "pointer to a struct that contains an AVClass as
its first member", so passing a double pointer to the class itself is
sufficient.
2026-02-23 19:23:09 +01:00
Marvin Scholz
fba9fc0c6b
lavc: wmadec: limit variable scopes
...
Moves the loop variable declarations to the actual loops,
narrowing their scopes.
2026-02-23 15:29:27 +00:00
Marvin Scholz
d219be03d6
lavc: wmadec: assert channels count
...
This should never exceed MAX_CHANNELS, else there will be several
out of bounds writes.
2026-02-23 15:29:27 +00:00
Lynne
7b15039cdb
Changelog: add changelog entry for Mps212
2026-02-23 07:57:57 +01:00
Lynne
baad75cafa
aacdec_usac: add support for parsing Mps212 (MPEG surround)
...
This commit adds the full bitstream parsing for Mps212.
2026-02-23 07:57:57 +01:00
Lynne
86977fdb6b
aacdec_tab: add Mps212 tables
...
To be used in the following commit.
2026-02-23 07:57:57 +01:00
Lynne
a4ab4a98c4
aacdec_tab: split up tables init
2026-02-23 07:57:57 +01:00
James Almer
40e0463113
avformat/mov: free item_name on infe entry parsing failure
...
Fixes regression since 28c330d0f3.
Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-22 23:16:15 -03:00
Michael Niedermayer
7e10579f49
avcodec/exr: fix AVERROR typo
...
Fixes: out of array read
Fixes: 485866440/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_DEC_fuzzer-4520520419966976
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-23 01:44:49 +00:00
James Almer
c3aa28f23d
avformat/mov: check for EOF in more loops
...
Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-23 00:43:50 +00:00
James Almer
28c330d0f3
avformat/mov: abort if the queried item doesn't exist instead of overwriting it
...
The check for item presence was insufficient as it would result in the last
item in the array being overwritten if it existed even if the id didn't match.
Fixes: Assertion ref failed at src/libavformat/mov.c:10649
Fixes: clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5312542695292928
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
2026-02-23 00:43:50 +00:00
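Note on 28c330d0f3: the message describes a lookup that fell through to the last array entry when no id matched. A minimal sketch of a lookup that reports "not found" instead is given here. The `Item` struct and `find_item` are hypothetical and only stand in for the real mov.c data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the real structure in mov.c differs. */
typedef struct Item {
    int id;
    const char *name;
} Item;

/* Returns the item whose id matches, or NULL when there is none.
 * Returning NULL lets the caller abort instead of overwriting
 * whatever entry the search loop happened to stop at. */
static Item *find_item(Item *items, int nb_items, int id)
{
    for (int i = 0; i < nb_items; i++)
        if (items[i].id == id)
            return &items[i];
    return NULL;
}

/* Usage: a missing id must not resolve to the last entry. */
static void find_item_example(void)
{
    Item items[2] = { { 1, "first" }, { 2, "second" } };
    assert(find_item(items, 2, 2) == &items[1]);
    assert(find_item(items, 2, 3) == NULL);
}
```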
Nariman-Sayed
9bc4109b23
avformat/tls_openssl: fix memory leak in cert_from_pem_string
...
When PEM_read_bio_X509 fails, the BIO was not freed, causing a memory leak.
Free the BIO before returning NULL to prevent the leak.
2026-02-22 22:39:43 +00:00
Andreas Rheinhardt
53a9a34e23
avcodec/snow: Reduce sizeof(SnowContext)
...
Each SubBand currently contains an array of 519 uint8_t[32],
yet most of these are unused: For both the decoder and the
encoder, at most 34 contexts are actually used: The only
variable index is context+2, where context is the result
of av_log2() and therefore in the 0..31 range.
There are also several accesses using compile-time indices,
the highest of which is 30. FATE passes with 31 contexts
and maybe these are enough, but I don't know.
Reducing the number to 34 reduces sizeof(SnowContext)
from 2141664B to 155104B here (on x64).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:16 +01:00
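Note on 53a9a34e23: the sizes quoted in the message can be checked with a little arithmetic. The sketch below assumes a local `log2_floor` as a stand-in for av_log2(). The helper names are made up for this example, and only the constants 519, 34 and 32 and the two byte counts come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

enum { OLD_CONTEXTS = 519, NEW_CONTEXTS = 34, CONTEXT_SIZE = 32 };

/* floor(log2(v)) for v >= 1 and 0 for v == 0: a local stand-in used here
 * in place of av_log2(), so results lie in 0..31 for 32-bit input. */
static int log2_floor(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Largest value the variable index "context + 2" can take. */
static int max_variable_index(void)
{
    return log2_floor(UINT32_MAX) + 2;
}

/* Bytes saved per context array when shrinking from 519 to 34 entries. */
static long bytes_saved_per_array(void)
{
    return (long)(OLD_CONTEXTS - NEW_CONTEXTS) * CONTEXT_SIZE;
}

static void snow_size_example(void)
{
    assert(max_variable_index() == 33);        /* so 34 entries suffice */
    assert(bytes_saved_per_array() == 15520);
    /* The reported total saving is a whole multiple of the per-array one. */
    assert((2141664L - 155104L) % bytes_saved_per_array() == 0);
}
```

The reported reduction of 1986560 bytes equals 128 times the per-array saving. That array count is inferred from the arithmetic and is not stated in the commit message.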
Andreas Rheinhardt
bb92009386
avcodec/snow: Only allocate emu_edge_buffer for encoder
...
Also allocate it during init and move it to the encoder's context.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 22:05:15 +01:00
Michael Niedermayer
c7b5f1537d
CONTRIBUTING.md: Add Forgejo
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-22 04:39:22 +00:00
Lynne
13e063ceec
vulkan/ffv1: properly initialize the linecache
2026-02-22 03:39:23 +01:00
Michael Niedermayer
99515a3342
avcodec/jpeg2000htdec: Check Lcup and Lref
...
Fixes: use of uninitialized memory
Fixes: 482494999/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-6467586186608640
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-22 02:31:06 +00:00
Andreas Rheinhardt
6c1c1720cf
avcodec/x86/vvc/dsp_init: Mark dsp init function as av_cold
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
af3f8f5bd2
avcodec/x86/vvc/of: Break dependency chain
...
Don't extract and update one word of one and the same register
at a time; use separate src and dst registers, so that pextrw
and bsr can be done in parallel. Also use movd instead of pinsrw
for the first word.
Old benchmarks:
apply_bdof_8_8x16_c: 3275.2 ( 1.00x)
apply_bdof_8_8x16_avx2: 487.6 ( 6.72x)
apply_bdof_8_16x8_c: 3243.1 ( 1.00x)
apply_bdof_8_16x8_avx2: 284.4 (11.40x)
apply_bdof_8_16x16_c: 6501.8 ( 1.00x)
apply_bdof_8_16x16_avx2: 570.0 (11.41x)
apply_bdof_10_8x16_c: 3286.5 ( 1.00x)
apply_bdof_10_8x16_avx2: 461.7 ( 7.12x)
apply_bdof_10_16x8_c: 3274.5 ( 1.00x)
apply_bdof_10_16x8_avx2: 271.4 (12.06x)
apply_bdof_10_16x16_c: 6590.0 ( 1.00x)
apply_bdof_10_16x16_avx2: 543.9 (12.12x)
apply_bdof_12_8x16_c: 3307.6 ( 1.00x)
apply_bdof_12_8x16_avx2: 462.2 ( 7.16x)
apply_bdof_12_16x8_c: 3287.4 ( 1.00x)
apply_bdof_12_16x8_avx2: 271.8 (12.10x)
apply_bdof_12_16x16_c: 6465.7 ( 1.00x)
apply_bdof_12_16x16_avx2: 543.8 (11.89x)
New benchmarks:
apply_bdof_8_8x16_c: 3255.7 ( 1.00x)
apply_bdof_8_8x16_avx2: 349.3 ( 9.32x)
apply_bdof_8_16x8_c: 3262.5 ( 1.00x)
apply_bdof_8_16x8_avx2: 214.8 (15.19x)
apply_bdof_8_16x16_c: 6471.6 ( 1.00x)
apply_bdof_8_16x16_avx2: 429.8 (15.06x)
apply_bdof_10_8x16_c: 3227.7 ( 1.00x)
apply_bdof_10_8x16_avx2: 321.6 (10.04x)
apply_bdof_10_16x8_c: 3250.2 ( 1.00x)
apply_bdof_10_16x8_avx2: 201.2 (16.16x)
apply_bdof_10_16x16_c: 6476.5 ( 1.00x)
apply_bdof_10_16x16_avx2: 400.9 (16.16x)
apply_bdof_12_8x16_c: 3230.7 ( 1.00x)
apply_bdof_12_8x16_avx2: 321.8 (10.04x)
apply_bdof_12_16x8_c: 3210.5 ( 1.00x)
apply_bdof_12_16x8_avx2: 200.9 (15.98x)
apply_bdof_12_16x16_c: 6474.5 ( 1.00x)
apply_bdof_12_16x16_avx2: 400.2 (16.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:05:12 +01:00
Andreas Rheinhardt
19dc7b79a4
avcodec/x86/vvc/of: Unify shuffling
...
One can use the same shuffles for the width 8 and width 16
case if one also changes the permutation in vpermd (that always
follows pshufb for width 16).
This also allows loading them before checking the width.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:03:22 +01:00
Andreas Rheinhardt
8e82416434
avcodec/x86/vvc/of: Avoid unused register
...
Avoids a push+pop.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:02:20 +01:00
Andreas Rheinhardt
81fb70c833
avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for w_avg
...
They only add overhead: another function call, sign-extension of
some parameters to 64 bits (although the upper bits are not used
at all), and rederiving the actual number of bits from the
maximum value (1<<bpp)-1.
Old benchmarks:
w_avg_8_2x2_c: 16.4 ( 1.00x)
w_avg_8_2x2_avx2: 12.9 ( 1.27x)
w_avg_8_4x4_c: 48.0 ( 1.00x)
w_avg_8_4x4_avx2: 14.9 ( 3.23x)
w_avg_8_8x8_c: 168.2 ( 1.00x)
w_avg_8_8x8_avx2: 22.4 ( 7.49x)
w_avg_8_16x16_c: 396.5 ( 1.00x)
w_avg_8_16x16_avx2: 47.9 ( 8.28x)
w_avg_8_32x32_c: 1466.3 ( 1.00x)
w_avg_8_32x32_avx2: 172.8 ( 8.48x)
w_avg_8_64x64_c: 5629.3 ( 1.00x)
w_avg_8_64x64_avx2: 678.7 ( 8.29x)
w_avg_8_128x128_c: 22122.4 ( 1.00x)
w_avg_8_128x128_avx2: 2743.5 ( 8.06x)
w_avg_10_2x2_c: 18.7 ( 1.00x)
w_avg_10_2x2_avx2: 13.1 ( 1.43x)
w_avg_10_4x4_c: 50.3 ( 1.00x)
w_avg_10_4x4_avx2: 15.9 ( 3.17x)
w_avg_10_8x8_c: 109.3 ( 1.00x)
w_avg_10_8x8_avx2: 20.6 ( 5.30x)
w_avg_10_16x16_c: 395.5 ( 1.00x)
w_avg_10_16x16_avx2: 44.8 ( 8.83x)
w_avg_10_32x32_c: 1534.2 ( 1.00x)
w_avg_10_32x32_avx2: 141.4 (10.85x)
w_avg_10_64x64_c: 6003.6 ( 1.00x)
w_avg_10_64x64_avx2: 557.4 (10.77x)
w_avg_10_128x128_c: 23722.7 ( 1.00x)
w_avg_10_128x128_avx2: 2205.0 (10.76x)
w_avg_12_2x2_c: 18.6 ( 1.00x)
w_avg_12_2x2_avx2: 13.1 ( 1.42x)
w_avg_12_4x4_c: 52.2 ( 1.00x)
w_avg_12_4x4_avx2: 16.1 ( 3.24x)
w_avg_12_8x8_c: 109.2 ( 1.00x)
w_avg_12_8x8_avx2: 20.6 ( 5.29x)
w_avg_12_16x16_c: 396.1 ( 1.00x)
w_avg_12_16x16_avx2: 45.0 ( 8.81x)
w_avg_12_32x32_c: 1532.6 ( 1.00x)
w_avg_12_32x32_avx2: 142.1 (10.79x)
w_avg_12_64x64_c: 6002.2 ( 1.00x)
w_avg_12_64x64_avx2: 557.3 (10.77x)
w_avg_12_128x128_c: 23748.7 ( 1.00x)
w_avg_12_128x128_avx2: 2206.4 (10.76x)
New benchmarks:
w_avg_8_2x2_c: 16.0 ( 1.00x)
w_avg_8_2x2_avx2: 9.3 ( 1.71x)
w_avg_8_4x4_c: 48.4 ( 1.00x)
w_avg_8_4x4_avx2: 12.4 ( 3.91x)
w_avg_8_8x8_c: 168.7 ( 1.00x)
w_avg_8_8x8_avx2: 21.1 ( 8.00x)
w_avg_8_16x16_c: 394.5 ( 1.00x)
w_avg_8_16x16_avx2: 46.2 ( 8.54x)
w_avg_8_32x32_c: 1456.3 ( 1.00x)
w_avg_8_32x32_avx2: 171.8 ( 8.48x)
w_avg_8_64x64_c: 5636.2 ( 1.00x)
w_avg_8_64x64_avx2: 676.9 ( 8.33x)
w_avg_8_128x128_c: 22129.1 ( 1.00x)
w_avg_8_128x128_avx2: 2734.3 ( 8.09x)
w_avg_10_2x2_c: 18.7 ( 1.00x)
w_avg_10_2x2_avx2: 10.3 ( 1.82x)
w_avg_10_4x4_c: 50.8 ( 1.00x)
w_avg_10_4x4_avx2: 13.4 ( 3.79x)
w_avg_10_8x8_c: 109.7 ( 1.00x)
w_avg_10_8x8_avx2: 20.4 ( 5.38x)
w_avg_10_16x16_c: 395.2 ( 1.00x)
w_avg_10_16x16_avx2: 41.7 ( 9.48x)
w_avg_10_32x32_c: 1535.6 ( 1.00x)
w_avg_10_32x32_avx2: 137.9 (11.13x)
w_avg_10_64x64_c: 6002.1 ( 1.00x)
w_avg_10_64x64_avx2: 548.5 (10.94x)
w_avg_10_128x128_c: 23742.7 ( 1.00x)
w_avg_10_128x128_avx2: 2179.8 (10.89x)
w_avg_12_2x2_c: 18.9 ( 1.00x)
w_avg_12_2x2_avx2: 10.3 ( 1.84x)
w_avg_12_4x4_c: 52.4 ( 1.00x)
w_avg_12_4x4_avx2: 13.4 ( 3.91x)
w_avg_12_8x8_c: 109.2 ( 1.00x)
w_avg_12_8x8_avx2: 20.3 ( 5.39x)
w_avg_12_16x16_c: 396.3 ( 1.00x)
w_avg_12_16x16_avx2: 41.7 ( 9.51x)
w_avg_12_32x32_c: 1532.6 ( 1.00x)
w_avg_12_32x32_avx2: 138.6 (11.06x)
w_avg_12_64x64_c: 5996.7 ( 1.00x)
w_avg_12_64x64_avx2: 549.6 (10.91x)
w_avg_12_128x128_c: 23738.0 ( 1.00x)
w_avg_12_128x128_avx2: 2177.2 (10.90x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 01:01:27 +01:00
Andreas Rheinhardt
ea78402e9c
avcodec/x86/vvc/mc,dsp_init: Avoid pointless wrappers for avg
...
Up until now, there were two averaging assembly functions,
one for eight bit content and one for <=16 bit content;
there are also three C-wrappers around these functions,
for 8, 10 and 12 bpp. These wrappers simply forward the
maximum permissible value (i.e. (1<<bpp)-1) and promote
some integer values to ptrdiff_t.
Yet these wrappers are absolutely useless: The assembly functions
rederive the bpp from the maximum and only the integer part
of the promoted ptrdiff_t values is ever used. Of course,
these wrappers also entail an additional call (not a tail call,
because the additional maximum parameter is passed on the stack).
Remove the wrappers and add per-bpp assembly functions instead.
Given that the only differences between 10 and 12 bits are some
constants in registers, the main part of these functions can be
shared (given that this code uses a jumptable, it can even
be done without adding any additional jump).
Old benchmarks:
avg_8_2x2_c: 11.4 ( 1.00x)
avg_8_2x2_avx2: 7.9 ( 1.44x)
avg_8_4x4_c: 30.7 ( 1.00x)
avg_8_4x4_avx2: 10.4 ( 2.95x)
avg_8_8x8_c: 134.5 ( 1.00x)
avg_8_8x8_avx2: 16.6 ( 8.12x)
avg_8_16x16_c: 255.6 ( 1.00x)
avg_8_16x16_avx2: 28.2 ( 9.07x)
avg_8_32x32_c: 897.7 ( 1.00x)
avg_8_32x32_avx2: 83.9 (10.70x)
avg_8_64x64_c: 3320.0 ( 1.00x)
avg_8_64x64_avx2: 321.1 (10.34x)
avg_8_128x128_c: 12981.8 ( 1.00x)
avg_8_128x128_avx2: 1480.1 ( 8.77x)
avg_10_2x2_c: 12.0 ( 1.00x)
avg_10_2x2_avx2: 8.4 ( 1.43x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 9.8 ( 3.56x)
avg_10_8x8_c: 76.8 ( 1.00x)
avg_10_8x8_avx2: 15.1 ( 5.08x)
avg_10_16x16_c: 256.6 ( 1.00x)
avg_10_16x16_avx2: 25.1 (10.20x)
avg_10_32x32_c: 932.9 ( 1.00x)
avg_10_32x32_avx2: 73.4 (12.72x)
avg_10_64x64_c: 3517.9 ( 1.00x)
avg_10_64x64_avx2: 414.8 ( 8.48x)
avg_10_128x128_c: 13695.3 ( 1.00x)
avg_10_128x128_avx2: 1648.1 ( 8.31x)
avg_12_2x2_c: 13.1 ( 1.00x)
avg_12_2x2_avx2: 8.6 ( 1.53x)
avg_12_4x4_c: 35.4 ( 1.00x)
avg_12_4x4_avx2: 10.1 ( 3.49x)
avg_12_8x8_c: 76.6 ( 1.00x)
avg_12_8x8_avx2: 16.7 ( 4.60x)
avg_12_16x16_c: 256.6 ( 1.00x)
avg_12_16x16_avx2: 25.5 (10.07x)
avg_12_32x32_c: 933.2 ( 1.00x)
avg_12_32x32_avx2: 75.7 (12.34x)
avg_12_64x64_c: 3519.1 ( 1.00x)
avg_12_64x64_avx2: 416.8 ( 8.44x)
avg_12_128x128_c: 13695.1 ( 1.00x)
avg_12_128x128_avx2: 1651.6 ( 8.29x)
New benchmarks:
avg_8_2x2_c: 11.5 ( 1.00x)
avg_8_2x2_avx2: 6.0 ( 1.91x)
avg_8_4x4_c: 29.7 ( 1.00x)
avg_8_4x4_avx2: 8.0 ( 3.72x)
avg_8_8x8_c: 131.4 ( 1.00x)
avg_8_8x8_avx2: 12.2 (10.74x)
avg_8_16x16_c: 254.3 ( 1.00x)
avg_8_16x16_avx2: 24.8 (10.25x)
avg_8_32x32_c: 897.7 ( 1.00x)
avg_8_32x32_avx2: 77.8 (11.54x)
avg_8_64x64_c: 3321.3 ( 1.00x)
avg_8_64x64_avx2: 318.7 (10.42x)
avg_8_128x128_c: 12988.4 ( 1.00x)
avg_8_128x128_avx2: 1430.1 ( 9.08x)
avg_10_2x2_c: 12.1 ( 1.00x)
avg_10_2x2_avx2: 5.7 ( 2.13x)
avg_10_4x4_c: 35.0 ( 1.00x)
avg_10_4x4_avx2: 9.0 ( 3.88x)
avg_10_8x8_c: 77.2 ( 1.00x)
avg_10_8x8_avx2: 12.4 ( 6.24x)
avg_10_16x16_c: 256.2 ( 1.00x)
avg_10_16x16_avx2: 24.3 (10.56x)
avg_10_32x32_c: 932.9 ( 1.00x)
avg_10_32x32_avx2: 71.9 (12.97x)
avg_10_64x64_c: 3516.8 ( 1.00x)
avg_10_64x64_avx2: 414.7 ( 8.48x)
avg_10_128x128_c: 13693.7 ( 1.00x)
avg_10_128x128_avx2: 1609.3 ( 8.51x)
avg_12_2x2_c: 14.1 ( 1.00x)
avg_12_2x2_avx2: 5.7 ( 2.48x)
avg_12_4x4_c: 35.8 ( 1.00x)
avg_12_4x4_avx2: 9.0 ( 3.96x)
avg_12_8x8_c: 76.9 ( 1.00x)
avg_12_8x8_avx2: 12.4 ( 6.22x)
avg_12_16x16_c: 256.5 ( 1.00x)
avg_12_16x16_avx2: 24.4 (10.50x)
avg_12_32x32_c: 934.1 ( 1.00x)
avg_12_32x32_avx2: 72.0 (12.97x)
avg_12_64x64_c: 3518.2 ( 1.00x)
avg_12_64x64_avx2: 414.8 ( 8.48x)
avg_12_128x128_c: 13689.5 ( 1.00x)
avg_12_128x128_avx2: 1611.1 ( 8.50x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:58:33 +01:00
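Note on ea78402e9c: the removed wrappers passed the maximum value (1<<bpp)-1, from which the callee had to recover bpp again. A scalar C sketch of that round trip follows. It is not the assembly, and `max_from_bpp` and `bpp_from_max` are names assumed for this example.

```c
#include <assert.h>

/* What a wrapper forwarded: the largest pixel value for a bit depth. */
static unsigned max_from_bpp(int bpp)
{
    return (1U << bpp) - 1;
}

/* What the callee then had to undo: recover bpp from (1 << bpp) - 1 by
 * counting how many times the value can be shifted right before it
 * reaches zero. */
static int bpp_from_max(unsigned max_value)
{
    int bpp = 0;
    while (max_value) {
        bpp++;
        max_value >>= 1;
    }
    return bpp;
}

/* Usage: the round trip is the identity for the three depths in use,
 * which is why per-bpp entry points make the detour unnecessary. */
static void bpp_roundtrip_example(void)
{
    assert(bpp_from_max(max_from_bpp(8)) == 8);
    assert(bpp_from_max(max_from_bpp(10)) == 10);
    assert(bpp_from_max(max_from_bpp(12)) == 12);
}
```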
Andreas Rheinhardt
5a60b3f1a6
avcodec/x86/vvc/mc: Remove always-false branches
...
The C versions of the average and weighted average functions
contain "FFMAX(3, 15 - BIT_DEPTH)" and the code here followed
this; yet it is only instantiated for bit depths 8, 10 and 12,
for which the above is just 15-BIT_DEPTH. So the comparisons
are unnecessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
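Note on 5a60b3f1a6: the claim that the clamp never triggers for bit depths 8, 10 and 12 can be checked directly. In this sketch FFMAX is redefined locally so the example is self-contained, and the two function names are assumptions for illustration.

```c
#include <assert.h>

/* Local copy of the macro so the example compiles on its own. */
#define FFMAX(a, b) ((a) > (b) ? (a) : (b))

/* Shift as written in the C versions. */
static int shift_clamped(int bit_depth)
{
    return FFMAX(3, 15 - bit_depth);
}

/* Shift without the clamp. */
static int shift_plain(int bit_depth)
{
    return 15 - bit_depth;
}

/* Usage: both agree for 8, 10 and 12; the clamp only matters above 12. */
static void shift_example(void)
{
    assert(shift_clamped(8) == shift_plain(8));   /* 7 */
    assert(shift_clamped(10) == shift_plain(10)); /* 5 */
    assert(shift_clamped(12) == shift_plain(12)); /* 3 */
    assert(shift_clamped(14) != shift_plain(14)); /* 3 vs 1 */
}
```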
Andreas Rheinhardt
59f8ff4c18
avcodec/x86/vvc/mc: Remove unused constants
...
Also avoid overaligning .rodata.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
eabf52e787
avcodec/x86/vvc/mc: Avoid unused work
...
The high quadword of these registers is zero for width 2.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
9317fb2b2e
avcodec/x86/vvc/mc: Avoid ymm registers where possible
...
Widths 2 and 4 fit into xmm registers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
caa0ae0cfb
avcodec/x86/vvc/mc: Avoid pextr[dq], v{insert,extract}i128
...
Use mov[dq], movdqu instead if the least significant parts
are set (i.e. if the immediate value is 0x0).
Old benchmarks:
avg_8_2x2_c: 11.3 ( 1.00x)
avg_8_2x2_avx2: 7.5 ( 1.50x)
avg_8_4x4_c: 31.2 ( 1.00x)
avg_8_4x4_avx2: 10.7 ( 2.91x)
avg_8_8x8_c: 133.5 ( 1.00x)
avg_8_8x8_avx2: 21.2 ( 6.30x)
avg_8_16x16_c: 254.7 ( 1.00x)
avg_8_16x16_avx2: 30.1 ( 8.46x)
avg_8_32x32_c: 896.9 ( 1.00x)
avg_8_32x32_avx2: 103.9 ( 8.63x)
avg_8_64x64_c: 3320.7 ( 1.00x)
avg_8_64x64_avx2: 539.4 ( 6.16x)
avg_8_128x128_c: 12991.5 ( 1.00x)
avg_8_128x128_avx2: 1661.3 ( 7.82x)
avg_10_2x2_c: 21.3 ( 1.00x)
avg_10_2x2_avx2: 8.3 ( 2.55x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 10.6 ( 3.28x)
avg_10_8x8_c: 76.3 ( 1.00x)
avg_10_8x8_avx2: 20.2 ( 3.77x)
avg_10_16x16_c: 255.9 ( 1.00x)
avg_10_16x16_avx2: 24.1 (10.60x)
avg_10_32x32_c: 932.4 ( 1.00x)
avg_10_32x32_avx2: 73.3 (12.72x)
avg_10_64x64_c: 3516.4 ( 1.00x)
avg_10_64x64_avx2: 601.7 ( 5.84x)
avg_10_128x128_c: 13690.6 ( 1.00x)
avg_10_128x128_avx2: 1613.2 ( 8.49x)
avg_12_2x2_c: 14.0 ( 1.00x)
avg_12_2x2_avx2: 8.3 ( 1.67x)
avg_12_4x4_c: 35.3 ( 1.00x)
avg_12_4x4_avx2: 10.9 ( 3.26x)
avg_12_8x8_c: 76.5 ( 1.00x)
avg_12_8x8_avx2: 20.3 ( 3.77x)
avg_12_16x16_c: 256.7 ( 1.00x)
avg_12_16x16_avx2: 24.1 (10.63x)
avg_12_32x32_c: 932.5 ( 1.00x)
avg_12_32x32_avx2: 73.3 (12.72x)
avg_12_64x64_c: 3520.5 ( 1.00x)
avg_12_64x64_avx2: 602.6 ( 5.84x)
avg_12_128x128_c: 13689.6 ( 1.00x)
avg_12_128x128_avx2: 1613.1 ( 8.49x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 13.4 ( 1.25x)
w_avg_8_4x4_c: 44.5 ( 1.00x)
w_avg_8_4x4_avx2: 15.9 ( 2.81x)
w_avg_8_8x8_c: 166.1 ( 1.00x)
w_avg_8_8x8_avx2: 45.7 ( 3.63x)
w_avg_8_16x16_c: 392.9 ( 1.00x)
w_avg_8_16x16_avx2: 57.8 ( 6.80x)
w_avg_8_32x32_c: 1455.5 ( 1.00x)
w_avg_8_32x32_avx2: 215.0 ( 6.77x)
w_avg_8_64x64_c: 5621.8 ( 1.00x)
w_avg_8_64x64_avx2: 875.2 ( 6.42x)
w_avg_8_128x128_c: 22131.3 ( 1.00x)
w_avg_8_128x128_avx2: 3390.1 ( 6.53x)
w_avg_10_2x2_c: 18.0 ( 1.00x)
w_avg_10_2x2_avx2: 14.0 ( 1.28x)
w_avg_10_4x4_c: 53.9 ( 1.00x)
w_avg_10_4x4_avx2: 15.9 ( 3.40x)
w_avg_10_8x8_c: 109.5 ( 1.00x)
w_avg_10_8x8_avx2: 40.4 ( 2.71x)
w_avg_10_16x16_c: 395.7 ( 1.00x)
w_avg_10_16x16_avx2: 44.7 ( 8.86x)
w_avg_10_32x32_c: 1532.7 ( 1.00x)
w_avg_10_32x32_avx2: 142.4 (10.77x)
w_avg_10_64x64_c: 6007.7 ( 1.00x)
w_avg_10_64x64_avx2: 745.5 ( 8.06x)
w_avg_10_128x128_c: 23719.7 ( 1.00x)
w_avg_10_128x128_avx2: 2217.7 (10.70x)
w_avg_12_2x2_c: 18.9 ( 1.00x)
w_avg_12_2x2_avx2: 13.6 ( 1.38x)
w_avg_12_4x4_c: 47.5 ( 1.00x)
w_avg_12_4x4_avx2: 15.9 ( 2.99x)
w_avg_12_8x8_c: 109.3 ( 1.00x)
w_avg_12_8x8_avx2: 40.9 ( 2.67x)
w_avg_12_16x16_c: 395.6 ( 1.00x)
w_avg_12_16x16_avx2: 44.8 ( 8.84x)
w_avg_12_32x32_c: 1531.0 ( 1.00x)
w_avg_12_32x32_avx2: 141.8 (10.80x)
w_avg_12_64x64_c: 6016.7 ( 1.00x)
w_avg_12_64x64_avx2: 732.8 ( 8.21x)
w_avg_12_128x128_c: 23762.2 ( 1.00x)
w_avg_12_128x128_avx2: 2223.4 (10.69x)
New benchmarks:
avg_8_2x2_c: 11.3 ( 1.00x)
avg_8_2x2_avx2: 7.6 ( 1.49x)
avg_8_4x4_c: 31.2 ( 1.00x)
avg_8_4x4_avx2: 10.8 ( 2.89x)
avg_8_8x8_c: 131.6 ( 1.00x)
avg_8_8x8_avx2: 15.6 ( 8.42x)
avg_8_16x16_c: 255.3 ( 1.00x)
avg_8_16x16_avx2: 27.9 ( 9.16x)
avg_8_32x32_c: 897.9 ( 1.00x)
avg_8_32x32_avx2: 81.2 (11.06x)
avg_8_64x64_c: 3320.0 ( 1.00x)
avg_8_64x64_avx2: 335.1 ( 9.91x)
avg_8_128x128_c: 12999.1 ( 1.00x)
avg_8_128x128_avx2: 1456.3 ( 8.93x)
avg_10_2x2_c: 12.0 ( 1.00x)
avg_10_2x2_avx2: 8.6 ( 1.40x)
avg_10_4x4_c: 34.9 ( 1.00x)
avg_10_4x4_avx2: 9.7 ( 3.61x)
avg_10_8x8_c: 76.7 ( 1.00x)
avg_10_8x8_avx2: 16.3 ( 4.69x)
avg_10_16x16_c: 256.3 ( 1.00x)
avg_10_16x16_avx2: 25.2 (10.18x)
avg_10_32x32_c: 932.8 ( 1.00x)
avg_10_32x32_avx2: 73.3 (12.72x)
avg_10_64x64_c: 3518.8 ( 1.00x)
avg_10_64x64_avx2: 416.8 ( 8.44x)
avg_10_128x128_c: 13691.6 ( 1.00x)
avg_10_128x128_avx2: 1612.9 ( 8.49x)
avg_12_2x2_c: 14.1 ( 1.00x)
avg_12_2x2_avx2: 8.7 ( 1.62x)
avg_12_4x4_c: 35.7 ( 1.00x)
avg_12_4x4_avx2: 9.7 ( 3.68x)
avg_12_8x8_c: 77.0 ( 1.00x)
avg_12_8x8_avx2: 16.9 ( 4.57x)
avg_12_16x16_c: 256.2 ( 1.00x)
avg_12_16x16_avx2: 25.7 ( 9.96x)
avg_12_32x32_c: 933.5 ( 1.00x)
avg_12_32x32_avx2: 74.0 (12.62x)
avg_12_64x64_c: 3516.4 ( 1.00x)
avg_12_64x64_avx2: 408.7 ( 8.60x)
avg_12_128x128_c: 13691.6 ( 1.00x)
avg_12_128x128_avx2: 1613.8 ( 8.48x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 14.0 ( 1.19x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.1 ( 3.00x)
w_avg_8_8x8_c: 168.0 ( 1.00x)
w_avg_8_8x8_avx2: 22.5 ( 7.47x)
w_avg_8_16x16_c: 392.5 ( 1.00x)
w_avg_8_16x16_avx2: 47.9 ( 8.19x)
w_avg_8_32x32_c: 1453.7 ( 1.00x)
w_avg_8_32x32_avx2: 176.1 ( 8.26x)
w_avg_8_64x64_c: 5631.4 ( 1.00x)
w_avg_8_64x64_avx2: 690.8 ( 8.15x)
w_avg_8_128x128_c: 22139.5 ( 1.00x)
w_avg_8_128x128_avx2: 2742.4 ( 8.07x)
w_avg_10_2x2_c: 18.1 ( 1.00x)
w_avg_10_2x2_avx2: 13.8 ( 1.31x)
w_avg_10_4x4_c: 47.0 ( 1.00x)
w_avg_10_4x4_avx2: 16.4 ( 2.87x)
w_avg_10_8x8_c: 110.0 ( 1.00x)
w_avg_10_8x8_avx2: 21.6 ( 5.09x)
w_avg_10_16x16_c: 395.2 ( 1.00x)
w_avg_10_16x16_avx2: 45.4 ( 8.71x)
w_avg_10_32x32_c: 1533.8 ( 1.00x)
w_avg_10_32x32_avx2: 142.6 (10.76x)
w_avg_10_64x64_c: 6004.4 ( 1.00x)
w_avg_10_64x64_avx2: 672.8 ( 8.92x)
w_avg_10_128x128_c: 23748.5 ( 1.00x)
w_avg_10_128x128_avx2: 2198.0 (10.80x)
w_avg_12_2x2_c: 17.2 ( 1.00x)
w_avg_12_2x2_avx2: 13.9 ( 1.24x)
w_avg_12_4x4_c: 51.4 ( 1.00x)
w_avg_12_4x4_avx2: 16.5 ( 3.11x)
w_avg_12_8x8_c: 109.1 ( 1.00x)
w_avg_12_8x8_avx2: 22.0 ( 4.96x)
w_avg_12_16x16_c: 395.9 ( 1.00x)
w_avg_12_16x16_avx2: 44.9 ( 8.81x)
w_avg_12_32x32_c: 1533.5 ( 1.00x)
w_avg_12_32x32_avx2: 142.3 (10.78x)
w_avg_12_64x64_c: 6002.0 ( 1.00x)
w_avg_12_64x64_avx2: 557.5 (10.77x)
w_avg_12_128x128_c: 23749.5 ( 1.00x)
w_avg_12_128x128_avx2: 2202.0 (10.79x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
Andreas Rheinhardt
7bf9c1e3f6
avcodec/x86/vvc/mc: Avoid redundant clipping for 8bit
...
It is already done by packuswb.
Old benchmarks:
avg_8_2x2_c: 11.1 ( 1.00x)
avg_8_2x2_avx2: 8.6 ( 1.28x)
avg_8_4x4_c: 30.0 ( 1.00x)
avg_8_4x4_avx2: 10.8 ( 2.78x)
avg_8_8x8_c: 132.0 ( 1.00x)
avg_8_8x8_avx2: 25.7 ( 5.14x)
avg_8_16x16_c: 254.6 ( 1.00x)
avg_8_16x16_avx2: 33.2 ( 7.67x)
avg_8_32x32_c: 897.5 ( 1.00x)
avg_8_32x32_avx2: 115.6 ( 7.76x)
avg_8_64x64_c: 3316.9 ( 1.00x)
avg_8_64x64_avx2: 626.5 ( 5.29x)
avg_8_128x128_c: 12973.6 ( 1.00x)
avg_8_128x128_avx2: 1914.0 ( 6.78x)
w_avg_8_2x2_c: 16.7 ( 1.00x)
w_avg_8_2x2_avx2: 14.4 ( 1.16x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.5 ( 2.92x)
w_avg_8_8x8_c: 168.1 ( 1.00x)
w_avg_8_8x8_avx2: 49.7 ( 3.38x)
w_avg_8_16x16_c: 392.4 ( 1.00x)
w_avg_8_16x16_avx2: 61.1 ( 6.43x)
w_avg_8_32x32_c: 1455.3 ( 1.00x)
w_avg_8_32x32_avx2: 224.6 ( 6.48x)
w_avg_8_64x64_c: 5632.1 ( 1.00x)
w_avg_8_64x64_avx2: 896.9 ( 6.28x)
w_avg_8_128x128_c: 22136.3 ( 1.00x)
w_avg_8_128x128_avx2: 3626.7 ( 6.10x)
New benchmarks:
avg_8_2x2_c: 12.3 ( 1.00x)
avg_8_2x2_avx2: 8.1 ( 1.52x)
avg_8_4x4_c: 30.3 ( 1.00x)
avg_8_4x4_avx2: 11.3 ( 2.67x)
avg_8_8x8_c: 131.8 ( 1.00x)
avg_8_8x8_avx2: 21.3 ( 6.20x)
avg_8_16x16_c: 255.0 ( 1.00x)
avg_8_16x16_avx2: 30.6 ( 8.33x)
avg_8_32x32_c: 898.5 ( 1.00x)
avg_8_32x32_avx2: 104.9 ( 8.57x)
avg_8_64x64_c: 3317.7 ( 1.00x)
avg_8_64x64_avx2: 540.9 ( 6.13x)
avg_8_128x128_c: 12986.5 ( 1.00x)
avg_8_128x128_avx2: 1663.4 ( 7.81x)
w_avg_8_2x2_c: 16.8 ( 1.00x)
w_avg_8_2x2_avx2: 13.9 ( 1.21x)
w_avg_8_4x4_c: 48.2 ( 1.00x)
w_avg_8_4x4_avx2: 16.2 ( 2.98x)
w_avg_8_8x8_c: 168.6 ( 1.00x)
w_avg_8_8x8_avx2: 46.3 ( 3.64x)
w_avg_8_16x16_c: 392.4 ( 1.00x)
w_avg_8_16x16_avx2: 57.7 ( 6.80x)
w_avg_8_32x32_c: 1454.6 ( 1.00x)
w_avg_8_32x32_avx2: 214.6 ( 6.78x)
w_avg_8_64x64_c: 5638.4 ( 1.00x)
w_avg_8_64x64_avx2: 875.6 ( 6.44x)
w_avg_8_128x128_c: 22133.5 ( 1.00x)
w_avg_8_128x128_avx2: 3334.3 ( 6.64x)
Also saves 550B of .text here. The improvements will likely
be even better on Win64, because it avoids using two nonvolatile
registers in the weighted average case.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:57:56 +01:00
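Note on 7bf9c1e3f6: packuswb converts signed 16-bit words to unsigned bytes with saturation, so a preceding clip to 0..255 changes nothing. The scalar model below illustrates this for a single lane. The function names are assumptions, and the model ignores everything about the real instruction except the per-lane saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one packuswb lane: signed 16-bit input, unsigned 8-bit
 * output, values outside 0..255 saturate to the nearest bound. */
static uint8_t pack_u8_sat(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Explicit clip to the 8-bit pixel range, as done before the pack. */
static int16_t clip_0_255(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}

/* Usage: clipping first never changes the packed result. */
static void pack_example(void)
{
    for (int v = INT16_MIN; v <= INT16_MAX; v++)
        assert(pack_u8_sat(clip_0_255((int16_t)v)) == pack_u8_sat((int16_t)v));
}
```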
Andreas Rheinhardt
b22b65f2f8
avformat/hlsenc: Return error upon error, fix shadowing
...
Introduced in 65fc0db581.
Reviewed-by: Marvin Scholz <epirat07@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-22 00:23:00 +01:00
Michael Niedermayer
c98346ffaa
avcodec/libtheoraenc: make keyframe mask unsigned and handle its larger range
...
Fixes: left shift of 1 by 31 places cannot be represented in type 'int'
Fixes: 473579864/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_LIBTHEORA_fuzzer-5835688160591872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-02-21 22:43:41 +00:00
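Note on c98346ffaa: shifting the int constant 1 left by 31 places overflows a 32-bit int, which is what the sanitizer reported. A minimal sketch of the unsigned form follows. `low_bits_mask` is a hypothetical helper, and how the encoder actually derives its keyframe mask is not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `shift` bits set, for
 * 0 <= shift <= 31. Using an unsigned 32-bit constant keeps a shift by
 * 31 well defined, unlike "1 << 31" on a 32-bit int. */
static uint32_t low_bits_mask(unsigned shift)
{
    assert(shift < 32);
    return (UINT32_C(1) << shift) - 1;
}

/* Usage: the largest shift produces a mask that needs the full
 * unsigned range to be represented safely during computation. */
static void mask_example(void)
{
    assert(low_bits_mask(0) == 0);
    assert(low_bits_mask(6) == 63);
    assert(low_bits_mask(31) == UINT32_C(0x7FFFFFFF));
}
```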
Marvin Scholz
ca011ee754
avformat: Bump version and add APIChanges entry
...
Needed after the recent addition of the command APIs.
2026-02-21 20:03:52 +01:00
Andreas Rheinhardt
3be4545b67
avcodec/vvc/inter: Deduplicate applying averaging
...
Reviewed-by: Frank Plowman <post@frankplowman.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-21 12:48:50 +01:00