Ramiro Polla
9ee6136ece
avcodec/mjpegdec: remove start_code field from MJpegDecodeContext
...
Instead, pass it as a parameter to the only function that uses it.
2026-02-09 17:52:01 +00:00
Andreas Rheinhardt
1218a8a922
avcodec/rangecoder: Fix indentation
...
Forgotten after 832649986c
and d147b3d7ec.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-09 17:36:40 +00:00
Hassan Hany
273b161a98
avcodec/exif: skip EXIF entries with invalid TIFF field type 0
...
EXIF IFD entries with TIFF field type 0 are invalid per the specification.
Without a check, exif_read_values() fails to allocate entry->value,
causing an out of memory error.
This patch skips such entries early during parsing, allowing decoding
to continue normally.
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/21623
2026-02-08 19:56:20 +00:00
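Illustrative sketch (not the actual patch): the EXIF fix described above amounts to rejecting an entry with field type 0 before any value storage is sized or allocated for it. The struct and function names below (`ifd_entry`, `filter_entries`) are hypothetical and do not correspond to FFmpeg's EXIF structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IFD entry; not FFmpeg's real EXIF entry type. */
struct ifd_entry {
    uint16_t tag;
    uint16_t type;   /* TIFF field type; 0 is not a valid type */
    uint32_t count;
};

/*
 * Copy entries from in[] to out[], skipping those with the invalid
 * field type 0. Returns the number of entries kept. Skipping happens
 * before any per-entry work, so parsing of later entries continues.
 */
static size_t filter_entries(const struct ifd_entry *in, size_t n,
                             struct ifd_entry *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i].type == 0)
            continue;            /* invalid entry: skip, do not fail */
        out[kept++] = in[i];
    }
    return kept;
}
```

Called on three entries where the middle one has type 0, this keeps the first and third and reports two entries.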
Michael Niedermayer
5f84a7263e
avcodec/adpcm: Check input buffer size
...
Larger values will lead to integer overflows in intermediates
No testcase
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-08 14:46:56 +00:00
Andreas Rheinhardt
0fefecd53f
Revert "avcodec/opus/parse: export the packet and extradata parsing functions"
...
This reverts commit aa20d7b3e8.
Adding these avpriv functions is absolutely overblown: Muxers
can get the desired duration in a few lines themselves.
In particular, using the parse functions from this file
necessitated parsing the extradata (and entailed exporting
the parsing function), although it was only used to know
whether the frames are self-delimiting, but everything of
interest to a muxer does not depend on this at all.
The commit to be reverted also made several structures
part of the ABI, which should be avoided in general.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-02-06 14:09:15 +01:00
Andreas Rheinhardt
12747e6296
avformat/matroskaenc: Parse Opus packet durations ourselves
...
This avoids avpriv functions from lavc/opus/parse.c
(which parse way more than we need, necessitating
parsing the extradata).
It furthermore makes the output of the muxer consistent,
i.e. no longer depending upon whether the Opus parser
or decoder are enabled (the avpriv functions would just
return AVERROR(ENOSYS)).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-02-06 14:05:14 +01:00
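Illustrative sketch (not the code added to matroskaenc): the duration of an Opus packet can be derived from its TOC byte as laid out in RFC 6716, section 3.1, without looking at the extradata. The function names are hypothetical; durations are returned in samples at 48 kHz, and packets longer than 120 ms are rejected as the RFC requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Samples (at 48 kHz) of a single frame for a TOC config value 0..31. */
static int opus_frame_samples(unsigned config)
{
    static const int silk[4] = { 480, 960, 1920, 2880 }; /* 10/20/40/60 ms */

    assert(config < 32);
    if (config < 12)                  /* SILK-only modes */
        return silk[config & 3];
    if (config < 16)                  /* hybrid modes: 10 or 20 ms */
        return (config & 1) ? 960 : 480;
    return 120 << (config & 3);       /* CELT-only: 2.5/5/10/20 ms */
}

/* Total samples in a packet, or -1 if the packet is malformed. */
static int opus_packet_samples(const uint8_t *buf, size_t size)
{
    int frame, nb_frames, total;

    if (size < 1)
        return -1;
    frame = opus_frame_samples(buf[0] >> 3);
    switch (buf[0] & 3) {             /* frame count code */
    case 0:
        nb_frames = 1;
        break;
    case 1:
    case 2:
        nb_frames = 2;
        break;
    default:                          /* code 3: count in the next byte */
        if (size < 2)
            return -1;
        nb_frames = buf[1] & 0x3F;
        if (nb_frames == 0)
            return -1;
        break;
    }
    total = frame * nb_frames;
    return total > 5760 ? -1 : total; /* at most 120 ms per packet */
}
```

Nothing here depends on whether the frames are self-delimiting, which matches the reasoning given for the revert above.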
Andreas Rheinhardt
853843d86f
avcodec/opus/parse: Move frame_duration tab into a file of its own
...
This is in preparation for duplicating it into libavformat.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-02-06 14:04:50 +01:00
James Almer
aa20d7b3e8
avcodec/opus/parse: export the packet and extradata parsing functions
...
Needed for the following commit.
Signed-off-by: James Almer <jamrial@gmail.com >
2026-02-05 23:21:49 -03:00
Michael Niedermayer
8f57b04fe5
avcodec/hevc/sei: Use get_bits64() in decode_nal_sei_3d_reference_displays_info()
...
Fixes: Assertion n>=0 && n<=32 failed at ./libavcodec/get_bits.h:426
Fixes: 468435217/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4644127078940672
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 20:20:08 +00:00
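Illustrative sketch (a toy bit reader, not FFmpeg's get_bits.h): a reader whose basic read asserts n <= 32 needs a separate 64-bit path for wider fields, which is the kind of situation the assertion in the commit above points at. The names `bitreader`, `read_bits` and `read_bits64` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bitreader {
    const uint8_t *buf;
    size_t size_bits;   /* number of valid bits in buf */
    size_t pos;         /* current position in bits */
};

/* Read n (0..32) bits, MSB first; bits past the end read as 0. */
static uint32_t read_bits(struct bitreader *br, int n)
{
    uint32_t v = 0;

    assert(n >= 0 && n <= 32);
    for (int i = 0; i < n; i++) {
        unsigned bit = 0;
        if (br->pos < br->size_bits)
            bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

/* Read n (0..64) bits by splitting into at most two 32-bit reads. */
static uint64_t read_bits64(struct bitreader *br, int n)
{
    uint64_t hi;

    assert(n >= 0 && n <= 64);
    if (n <= 32)
        return read_bits(br, n);
    hi = read_bits(br, n - 32);
    return (hi << 32) | read_bits(br, 32);
}
```

Calling `read_bits(&br, 40)` would trip the assertion, while `read_bits64(&br, 40)` handles the same field.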
Michael Niedermayer
af86f0ffcc
avcodec/dca_xll: Clear padding in ff_dca_xll_parse()
...
Fixes: Use of uninitialized memory
Fixes: 472020020/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6433045331902464
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 18:12:46 +01:00
Michael Niedermayer
189bc0aaf5
avcodec/dxv: Clear tex_data padding on reallocation
...
dxv assumes that newly reallocated memory in tex_data is initialized,
so the padding has to be cleared as well when ff_lzf_uncompress() reallocates the buffer.
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 16:29:08 +01:00
Michael Niedermayer
0f35146e27
avcodec/lzf: Remove size messing from ff_lzf_uncompress()
...
size represents the output size.
Changing it arbitrarily without resetting it on errors leaks uninitialized memory.
Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 16:29:08 +01:00
Michael Niedermayer
5db50e8775
avcodec/ffv1enc: refine end condition
...
In the case where the last sorted value was -1u and we were on the first
pass of run1, we failed to fill the last few values of bitmap.
No real-world testcase is known.
Fixes: use of uninitialized memory
Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 16:07:13 +01:00
Michael Niedermayer
11a5afea31
avcodec/dca_xll: Check get_rice_array()
...
Fixes: use of uninitialized memory
Fixes: 451655450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6527248623796224
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-05 14:37:59 +01:00
Jun Zhao
27dd2f1c70
lavc/hevc: fix missing # in ldrsw immediate offset
...
The ldrsw instruction requires immediate offset with # prefix.
This fixes the syntax error introduced in commit 26752368f0
(aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the
load_bi_w_pixels_param macro was added.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com >
2026-02-05 09:13:22 +08:00
Zhao Zhili
e250854ecf
aarch64/h264pred: disable inefficient functions
...
These assembly optimizations have been identified as "performance
regressions." Due to advancements in modern CPU micro-architectures
and compiler optimization, the C implementations now consistently
outperform these handwritten routines.
Test Name                 A55-clang        M1               A76-gcc-14       A510-clang       A715-clang       X3-clang
------------------------------------------------------------------------------------------------------------------------------
pred8x8_dc_8_neon          55.9 ( 0.79x)!    0.2 ( 0.31x)!   35.7 ( 0.63x)!   98.3 ( 0.37x)!   35.9 ( 0.45x)!   33.6 ( 0.38x)!
pred8x8_dc_10_neon         57.0 ( 1.04x)     0.3 ( 0.36x)!   35.9 ( 0.94x)!   98.2 ( 0.53x)!   35.8 ( 0.58x)!   33.2 ( 0.50x)!
pred8x8_dc_128_8_neon      26.0 ( 0.69x)!    0.1 ( 0.43x)!   15.3 ( 0.73x)!   46.4 ( 0.36x)!   10.6 ( 0.48x)!   10.3 ( 1.09x)
pred8x8_dc_128_10_neon     25.3 ( 0.99x)!    0.1 ( 0.42x)!   19.3 ( 0.48x)!   44.5 ( 0.42x)!   10.0 ( 0.61x)!   11.0 ( 1.00x)
pred8x8_left_dc_8_neon     46.9 ( 0.72x)!    0.2 ( 0.26x)!   30.2 ( 0.49x)!   71.4 ( 0.39x)!   29.8 ( 0.35x)!   26.5 ( 0.44x)!
pred8x8_left_dc_10_neon    45.4 ( 0.82x)!    0.2 ( 0.29x)!   28.1 ( 0.67x)!   70.2 ( 0.47x)!   30.0 ( 0.38x)!   26.5 ( 0.43x)!
pred16x16_dc_8_neon        74.4 ( 1.34x)     0.3 ( 0.62x)!   44.7 ( 0.89x)!  128.0 ( 0.79x)!   48.5 ( 0.67x)!   39.4 ( 0.71x)!
pred16x16_dc_128_8_neon    37.9 ( 0.79x)!    0.1 ( 0.60x)!   20.1 ( 0.80x)!   41.8 ( 0.46x)!   16.2 ( 0.81x)!   12.8 ( 0.95x)!
pred16x16_left_dc_8_neon   69.9 ( 1.19x)     0.3 ( 0.46x)!   49.6 ( 0.54x)!  116.8 ( 0.62x)!   52.8 ( 0.45x)!   44.2 ( 0.51x)!
pred8x8_hori_8_neon        30.6 ( 1.39x)     0.1 ( 0.45x)!   19.4 ( 0.81x)!   71.0 ( 0.50x)!   15.9 ( 0.55x)!   12.2 ( 0.94x)!
pred8x8_hori_10_neon*      29.3 ( 1.82x)     0.1 ( 0.59x)!   18.5 ( 1.56x)    68.9 ( 0.64x)!   15.8 ( 0.62x)!   11.8 ( 0.97x)!
pred8x8_top_dc_8_neon      35.8 ( 0.96x)!    0.1 ( 0.59x)!   16.8 ( 0.81x)!   58.9 ( 0.44x)!   11.3 ( 0.89x)!   11.4 ( 0.99x)!
pred8x8_top_dc_10_neon     37.4 ( 1.24x)     0.1 ( 0.92x)!   20.4 ( 0.81x)!   59.5 ( 0.69x)!   10.5 ( 1.48x)    11.8 ( 1.02x)
pred8x8_vertical_8_neon    18.3 ( 1.08x)     0.1 ( 0.54x)!   12.8 ( 0.89x)!   37.2 ( 0.40x)!    8.3 ( 0.77x)!   11.2 ( 1.00x)
pred8x8_vertical_10_neon   19.0 ( 1.24x)     0.1 ( 0.55x)!   15.3 ( 0.62x)!   39.7 ( 0.50x)!    8.2 ( 0.91x)!   11.1 ( 0.99x)!
* pred8x8_horizontal_10 also underperforms on newer architectures, but is useful on A55 and A76.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com >
2026-02-04 09:06:37 +00:00
Zhao Zhili
f54841d375
avcodec/aarch64: add pngdsp
...
Test Name                    A55-gcc-11        M1-clang          A76-gcc-12        A510-clang        X3-clang
---------------------------------------------------------------------------------------------------------------------
add_bytes_l2_4096_neon        1807.2 ( 2.01x)      1.6 ( 1.94x)    333.0 ( 6.35x)   1058.2 ( 2.34x)    214.3 ( 1.99x)
add_paeth_prediction_3_neon  33036.1 ( 2.41x)    145.1 ( 1.66x)  20443.3 ( 1.97x)  35225.1 ( 1.23x)  19420.8 ( 1.05x)
add_paeth_prediction_4_neon  24368.6 ( 3.26x)    106.7 ( 2.01x)  15163.8 ( 2.77x)  26454.7 ( 1.62x)  14319.0 ( 1.35x)
add_paeth_prediction_6_neon  17900.6 ( 4.44x)     72.0 ( 2.70x)  10214.3 ( 4.20x)  18296.9 ( 2.27x)   9693.1 ( 1.97x)
add_paeth_prediction_8_neon  12615.4 ( 6.31x)     54.1 ( 2.58x)   7706.0 ( 5.45x)  13733.3 ( 2.94x)   7272.6 ( 2.63x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com >
2026-02-04 12:05:35 +08:00
Oliver Chang
a795ca89fa
avcodec/qdm2: fix heap-use-after-free in qdm2_decode_frame
...
The `sub_packet` index in `QDM2Context` was not reset to 0 when
`qdm2_decode_frame` started processing a new packet. If an error
occurred during the decoding of a previous packet, `sub_packet` would
retain a non-zero value.
In subsequent calls to `qdm2_decode_frame` with a new packet, this
non-zero `sub_packet` value caused `qdm2_decode` to skip
`qdm2_decode_super_block`. This function is responsible for initializing
packet lists with pointers to the current packet's data. Skipping it led
to the use of stale pointers from the previous (freed) packet, resulting
in a heap-use-after-free vulnerability.
This patch explicitly resets `s->sub_packet = 0` at the beginning of
`qdm2_decode_frame`, ensuring correct initialization for each new
packet.
Fixes: OSS-Fuzz issue 476179569
(https://issues.oss-fuzz.com/issues/476179569).
2026-02-03 18:17:32 +00:00
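Illustrative sketch (a toy model, not the qdm2 decoder): the failure mode described above can be reproduced with a decoder that only binds its data pointer to the current packet when its sub-packet index is 0. All names (`toy_decoder`, `toy_decode_step`, `toy_decode_frame`) and the 0xFF error marker are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUB_PACKETS 4

struct toy_decoder {
    int sub_packet;        /* index of the next sub-packet to decode */
    const uint8_t *cur;    /* pointer into the packet being decoded */
};

/* One decoding step; only step 0 binds s->cur to the current packet. */
static int toy_decode_step(struct toy_decoder *s, const uint8_t *data)
{
    if (s->sub_packet == 0)
        s->cur = data;
    if (s->cur[s->sub_packet] == 0xFF)
        return -1;         /* error: sub_packet keeps its non-zero value */
    s->sub_packet = (s->sub_packet + 1) % SUB_PACKETS;
    return 0;
}

/* Decode one packet of SUB_PACKETS bytes. */
static int toy_decode_frame(struct toy_decoder *s, const uint8_t *data)
{
    assert(data != NULL);
    /* The reset: without it, a failed previous packet would make step 0
     * be skipped and s->cur would still point at the old packet. */
    s->sub_packet = 0;
    for (int i = 0; i < SUB_PACKETS; i++)
        if (toy_decode_step(s, data) < 0)
            return -1;
    return 0;
}
```

After a packet that fails at step 2, the index stays at 2; the reset at the top of `toy_decode_frame` is what guarantees the next packet rebinds `cur` instead of reusing the stale pointer.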
Michael Niedermayer
2df0ef601a
avcodec/jpeg2000dec: allow bpno of -1
...
Fixes: tickets/4663/levels30.jp2
The file now decodes without error messages or integer overflows.
Before the broader M_b check, the file decoded with error messages and integer overflows, but also without visual artifacts.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-03 12:39:32 +01:00
Michael Niedermayer
e1472a4e0c
avcodec/jpeg2000dec: allow M_b == 31
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-03 12:39:32 +01:00
Michael Niedermayer
8a3c7c9c32
avcodec/jpeg2000dec: Print bpno level when erroring out
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-03 12:39:32 +01:00
Michael Niedermayer
2efffa9ecd
avcodec/jpeg2000dec: Print M_b value when asking for a sample
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2026-02-03 12:39:31 +01:00
Frank Plowman
364d5dda91
lavc/vvc: Fix unchecked error codes from add_reconstructed_area
2026-01-31 13:46:13 +00:00
Frank Plowman
f9740eb969
lavc/vvc: Fix unchecked error codes from set_qp_y
...
Fixes: clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4957602162475008
2026-01-31 13:46:13 +00:00
Martin Storsjö
f74c551eaa
aarch64: Fix indentation of a few instructions
...
This file is exempt from the indent checker script, as there
are a few other bits in it that the script wants to reformat
into slightly worse form, or which might not warrant being
reformatted.
But these instructions should indeed be indented this way.
2026-01-30 05:21:27 +00:00
James Almer
041d108958
avcodec/opus/enc: don't remove more samples than needed from the last packet
...
The hardcoded extra 120 samples result in the side data reporting the need to
discard the entire packet rather than just the padding samples.
This is in line with the behavior of the libopus encoder.
Signed-off-by: James Almer <jamrial@gmail.com >
2026-01-29 21:09:02 -03:00
James Almer
c3aea7628c
avcodec/opus/enc: set avctx->frame_size to a better guess based on encoder configuration
...
Signed-off-by: James Almer <jamrial@gmail.com >
2026-01-29 21:09:02 -03:00
Andreas Rheinhardt
ca5504fb5c
avcodec/liblc3dec: Simplify sample fmt selection
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
ba1aea762b
avcodec/liblc3{dec,enc}: Simplify sample_size, is_planar check
...
The sample size is always sizeof(float), and whether the format is planar
is a simple if, given that these codecs only support float and planar float.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 14:08:15 +01:00
Andreas Rheinhardt
436b74b725
avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function
...
hevc_dequant_4x4_8_c (GCC): 20.2 ( 1.00x)
hevc_dequant_4x4_8_c (Clang): 21.7 ( 1.00x)
hevc_dequant_4x4_8_ssse3: 5.8 ( 3.51x)
hevc_dequant_8x8_8_c (GCC): 32.9 ( 1.00x)
hevc_dequant_8x8_8_c (Clang): 78.7 ( 1.00x)
hevc_dequant_8x8_8_ssse3: 6.8 ( 4.83x)
hevc_dequant_16x16_8_c (GCC): 105.1 ( 1.00x)
hevc_dequant_16x16_8_c (Clang): 151.1 ( 1.00x)
hevc_dequant_16x16_8_ssse3: 19.3 ( 5.45x)
hevc_dequant_32x32_8_c (GCC): 415.7 ( 1.00x)
hevc_dequant_32x32_8_c (Clang): 602.3 ( 1.00x)
hevc_dequant_32x32_8_ssse3: 78.2 ( 5.32x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
cf359a7907
avcodec/hevc/dsp: Add alignment for dequant
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
0c7f87b136
avcodec/hevc/dsp_template: Optimize impossible branches away
...
Saves 1856B of .text here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 12:25:33 +01:00
Andreas Rheinhardt
2729c52988
avcodec/x86/hevc/deblock: Reduce usage of GPRs
...
Don't use two GPRs to store two words from xmm registers;
shuffle these words so that they fit into one GPR.
This reduces the number of GPRs used and leads to tiny speedups
here. Also avoid rex prefixes whenever possible (for lines
that needed to be modified anyway).
Old benchmarks:
hevc_h_loop_filter_luma8_skip_c: 23.8 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2: 8.5 ( 2.80x)
hevc_h_loop_filter_luma8_skip_ssse3: 7.2 ( 3.29x)
hevc_h_loop_filter_luma8_skip_avx: 6.4 ( 3.71x)
hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2: 34.4 ( 4.37x)
hevc_h_loop_filter_luma8_strong_ssse3: 34.5 ( 4.36x)
hevc_h_loop_filter_luma8_strong_avx: 32.3 ( 4.65x)
hevc_h_loop_filter_luma8_weak_c: 103.2 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2: 34.5 ( 2.99x)
hevc_h_loop_filter_luma8_weak_ssse3: 7.3 (14.22x)
hevc_h_loop_filter_luma8_weak_avx: 32.4 ( 3.18x)
hevc_h_loop_filter_luma10_skip_c: 23.5 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2: 6.6 ( 3.58x)
hevc_h_loop_filter_luma10_skip_ssse3: 6.1 ( 3.86x)
hevc_h_loop_filter_luma10_skip_avx: 5.4 ( 4.34x)
hevc_h_loop_filter_luma10_strong_c: 161.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2: 32.2 ( 5.03x)
hevc_h_loop_filter_luma10_strong_ssse3: 30.4 ( 5.33x)
hevc_h_loop_filter_luma10_strong_avx: 30.3 ( 5.33x)
hevc_h_loop_filter_luma10_weak_c: 23.5 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2: 6.6 ( 3.58x)
hevc_h_loop_filter_luma10_weak_ssse3: 6.1 ( 3.85x)
hevc_h_loop_filter_luma10_weak_avx: 5.4 ( 4.35x)
hevc_h_loop_filter_luma12_skip_c: 18.8 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2: 6.6 ( 2.87x)
hevc_h_loop_filter_luma12_skip_ssse3: 6.1 ( 3.08x)
hevc_h_loop_filter_luma12_skip_avx: 6.2 ( 3.06x)
hevc_h_loop_filter_luma12_strong_c: 159.0 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2: 36.3 ( 4.38x)
hevc_h_loop_filter_luma12_strong_ssse3: 36.1 ( 4.40x)
hevc_h_loop_filter_luma12_strong_avx: 33.5 ( 4.75x)
hevc_h_loop_filter_luma12_weak_c: 40.1 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2: 35.5 ( 1.13x)
hevc_h_loop_filter_luma12_weak_ssse3: 36.1 ( 1.11x)
hevc_h_loop_filter_luma12_weak_avx: 6.2 ( 6.52x)
hevc_v_loop_filter_luma8_skip_c: 25.5 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2: 10.6 ( 2.40x)
hevc_v_loop_filter_luma8_skip_ssse3: 11.4 ( 2.24x)
hevc_v_loop_filter_luma8_skip_avx: 8.3 ( 3.07x)
hevc_v_loop_filter_luma8_strong_c: 146.8 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2: 43.9 ( 3.35x)
hevc_v_loop_filter_luma8_strong_ssse3: 43.7 ( 3.36x)
hevc_v_loop_filter_luma8_strong_avx: 42.3 ( 3.47x)
hevc_v_loop_filter_luma8_weak_c: 25.5 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.40x)
hevc_v_loop_filter_luma8_weak_ssse3: 44.0 ( 0.58x)
hevc_v_loop_filter_luma8_weak_avx: 8.3 ( 3.09x)
hevc_v_loop_filter_luma10_skip_c: 20.0 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2: 11.3 ( 1.77x)
hevc_v_loop_filter_luma10_skip_ssse3: 11.0 ( 1.82x)
hevc_v_loop_filter_luma10_skip_avx: 9.3 ( 2.15x)
hevc_v_loop_filter_luma10_strong_c: 193.5 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2: 46.1 ( 4.19x)
hevc_v_loop_filter_luma10_strong_ssse3: 44.2 ( 4.38x)
hevc_v_loop_filter_luma10_strong_avx: 44.4 ( 4.35x)
hevc_v_loop_filter_luma10_weak_c: 90.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2: 46.3 ( 1.95x)
hevc_v_loop_filter_luma10_weak_ssse3: 10.8 ( 8.37x)
hevc_v_loop_filter_luma10_weak_avx: 44.4 ( 2.03x)
hevc_v_loop_filter_luma12_skip_c: 16.8 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2: 11.8 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3: 11.7 ( 1.43x)
hevc_v_loop_filter_luma12_skip_avx: 8.7 ( 1.93x)
hevc_v_loop_filter_luma12_strong_c: 159.3 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2: 45.3 ( 3.52x)
hevc_v_loop_filter_luma12_strong_ssse3: 60.3 ( 2.64x)
hevc_v_loop_filter_luma12_strong_avx: 44.1 ( 3.61x)
hevc_v_loop_filter_luma12_weak_c: 63.6 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2: 45.3 ( 1.40x)
hevc_v_loop_filter_luma12_weak_ssse3: 11.7 ( 5.41x)
hevc_v_loop_filter_luma12_weak_avx: 43.9 ( 1.45x)
New benchmarks:
hevc_h_loop_filter_luma8_skip_c: 24.2 ( 1.00x)
hevc_h_loop_filter_luma8_skip_sse2: 8.6 ( 2.82x)
hevc_h_loop_filter_luma8_skip_ssse3: 7.0 ( 3.46x)
hevc_h_loop_filter_luma8_skip_avx: 6.8 ( 3.54x)
hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x)
hevc_h_loop_filter_luma8_strong_sse2: 33.3 ( 4.52x)
hevc_h_loop_filter_luma8_strong_ssse3: 32.7 ( 4.61x)
hevc_h_loop_filter_luma8_strong_avx: 32.7 ( 4.60x)
hevc_h_loop_filter_luma8_weak_c: 104.0 ( 1.00x)
hevc_h_loop_filter_luma8_weak_sse2: 33.2 ( 3.13x)
hevc_h_loop_filter_luma8_weak_ssse3: 7.0 (14.91x)
hevc_h_loop_filter_luma8_weak_avx: 31.3 ( 3.32x)
hevc_h_loop_filter_luma10_skip_c: 19.2 ( 1.00x)
hevc_h_loop_filter_luma10_skip_sse2: 6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_ssse3: 6.2 ( 3.08x)
hevc_h_loop_filter_luma10_skip_avx: 5.0 ( 3.85x)
hevc_h_loop_filter_luma10_strong_c: 159.8 ( 1.00x)
hevc_h_loop_filter_luma10_strong_sse2: 30.0 ( 5.32x)
hevc_h_loop_filter_luma10_strong_ssse3: 29.2 ( 5.48x)
hevc_h_loop_filter_luma10_strong_avx: 28.6 ( 5.58x)
hevc_h_loop_filter_luma10_weak_c: 19.2 ( 1.00x)
hevc_h_loop_filter_luma10_weak_sse2: 6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_ssse3: 6.2 ( 3.09x)
hevc_h_loop_filter_luma10_weak_avx: 5.0 ( 3.88x)
hevc_h_loop_filter_luma12_skip_c: 18.7 ( 1.00x)
hevc_h_loop_filter_luma12_skip_sse2: 6.2 ( 3.00x)
hevc_h_loop_filter_luma12_skip_ssse3: 5.7 ( 3.27x)
hevc_h_loop_filter_luma12_skip_avx: 5.2 ( 3.61x)
hevc_h_loop_filter_luma12_strong_c: 160.2 ( 1.00x)
hevc_h_loop_filter_luma12_strong_sse2: 34.2 ( 4.68x)
hevc_h_loop_filter_luma12_strong_ssse3: 29.3 ( 5.48x)
hevc_h_loop_filter_luma12_strong_avx: 31.4 ( 5.10x)
hevc_h_loop_filter_luma12_weak_c: 40.2 ( 1.00x)
hevc_h_loop_filter_luma12_weak_sse2: 35.2 ( 1.14x)
hevc_h_loop_filter_luma12_weak_ssse3: 29.3 ( 1.37x)
hevc_h_loop_filter_luma12_weak_avx: 5.0 ( 8.09x)
hevc_v_loop_filter_luma8_skip_c: 25.6 ( 1.00x)
hevc_v_loop_filter_luma8_skip_sse2: 10.2 ( 2.52x)
hevc_v_loop_filter_luma8_skip_ssse3: 10.5 ( 2.45x)
hevc_v_loop_filter_luma8_skip_avx: 8.2 ( 3.11x)
hevc_v_loop_filter_luma8_strong_c: 147.1 ( 1.00x)
hevc_v_loop_filter_luma8_strong_sse2: 42.6 ( 3.45x)
hevc_v_loop_filter_luma8_strong_ssse3: 42.4 ( 3.47x)
hevc_v_loop_filter_luma8_strong_avx: 40.1 ( 3.67x)
hevc_v_loop_filter_luma8_weak_c: 25.6 ( 1.00x)
hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.42x)
hevc_v_loop_filter_luma8_weak_ssse3: 42.7 ( 0.60x)
hevc_v_loop_filter_luma8_weak_avx: 8.2 ( 3.11x)
hevc_v_loop_filter_luma10_skip_c: 16.7 ( 1.00x)
hevc_v_loop_filter_luma10_skip_sse2: 11.0 ( 1.52x)
hevc_v_loop_filter_luma10_skip_ssse3: 10.5 ( 1.59x)
hevc_v_loop_filter_luma10_skip_avx: 9.6 ( 1.74x)
hevc_v_loop_filter_luma10_strong_c: 190.0 ( 1.00x)
hevc_v_loop_filter_luma10_strong_sse2: 44.8 ( 4.24x)
hevc_v_loop_filter_luma10_strong_ssse3: 42.3 ( 4.49x)
hevc_v_loop_filter_luma10_strong_avx: 42.5 ( 4.47x)
hevc_v_loop_filter_luma10_weak_c: 88.3 ( 1.00x)
hevc_v_loop_filter_luma10_weak_sse2: 45.7 ( 1.93x)
hevc_v_loop_filter_luma10_weak_ssse3: 10.5 ( 8.40x)
hevc_v_loop_filter_luma10_weak_avx: 42.4 ( 2.09x)
hevc_v_loop_filter_luma12_skip_c: 16.7 ( 1.00x)
hevc_v_loop_filter_luma12_skip_sse2: 11.7 ( 1.42x)
hevc_v_loop_filter_luma12_skip_ssse3: 10.5 ( 1.59x)
hevc_v_loop_filter_luma12_skip_avx: 8.8 ( 1.90x)
hevc_v_loop_filter_luma12_strong_c: 159.4 ( 1.00x)
hevc_v_loop_filter_luma12_strong_sse2: 45.2 ( 3.53x)
hevc_v_loop_filter_luma12_strong_ssse3: 59.3 ( 2.69x)
hevc_v_loop_filter_luma12_strong_avx: 41.7 ( 3.82x)
hevc_v_loop_filter_luma12_weak_c: 63.3 ( 1.00x)
hevc_v_loop_filter_luma12_weak_sse2: 44.9 ( 1.41x)
hevc_v_loop_filter_luma12_weak_ssse3: 10.5 ( 6.02x)
hevc_v_loop_filter_luma12_weak_avx: 41.7 ( 1.52x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0843252229
avcodec/x86/hevc/deblock: avoid unused GPR
...
r12 is unused, so use it instead of r13 to reduce
the amount of push/pops.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
0aad8b860a
avcodec/x86/hevc/deblock: Avoid vmovdqa
...
(It would even be possible to avoid clobbering m10 in
MASKED_COPY and the mask register (%3) in MASKED_COPY2
when VEX encoding is in use.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c940128fff
avcodec/x86/vp9lpf: Avoid vmovdqa
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 11:54:57 +01:00
Andreas Rheinhardt
c898ddb8fe
avcodec/x86/cfhddsp: Reduce number of xmm registers used
...
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:33:40 +01:00
Andreas Rheinhardt
848c3ca772
avcodec/x86/cfhddsp: Avoid pmaddwd
...
The result of using pmaddwd with the coefficients 1,-1,...,1,-1
is just the negative of using pmaddwd with the coefficients
-1,1,...,-1,1, so avoid one pmaddwd.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:33:37 +01:00
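Illustrative sketch (a scalar model, not the assembly from the commit): pmaddwd multiplies pairs of signed 16-bit words and adds each adjacent pair of products into a 32-bit lane, so flipping the sign of every coefficient flips the sign of every lane. The helper name `pmaddwd_model` is hypothetical, and the model ignores the wraparound that the real instruction exhibits only when all four inputs of a lane are -32768, which cannot happen with coefficients of +1 and -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of pmaddwd on one 128-bit register (8 words -> 4 dwords). */
static void pmaddwd_model(const int16_t a[8], const int16_t b[8],
                          int32_t out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i]     * b[2 * i] +
                 (int32_t)a[2 * i + 1] * b[2 * i + 1];
}
```

With coefficients 1,-1,...,1,-1 each lane is a[2i] - a[2i+1]; with -1,1,...,-1,1 it is a[2i+1] - a[2i], so one result can be obtained from the other by negation (or by swapping the operands of a following subtraction) instead of a second pmaddwd.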
Andreas Rheinhardt
6224445753
avcodec/x86/cfhdencdsp: Avoid += x, -= x
...
Avoid incrementing lowq and highq inside the loop by using
complex addressing modes; this avoids having to undo said modification
at the end of the horizontal loop.
For inputq, modify istrideq outside of the loop so that
it is only modified once at the end of the horizontal loop.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:33:34 +01:00
Andreas Rheinhardt
7dd6487800
avcodec/x86/cfhdencdsp: Don't load twice
...
Sign extend the integer arguments directly from the stack
instead of loading qwords, followed by sign-extending the
lower half.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:33:30 +01:00
Andreas Rheinhardt
91c7710412
avcodec/x86/cfhdencdsp: Avoid unnecessary constants
...
Up until now, cfhdencdsp used constants consisting
of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words
for use as constants in pmaddwd. But one can use
the same constants if one shuffles the words in
a dword into the opposite order. The same applies to some other
constants. This also made it possible to avoid a register in
chfdenc_vert_filter.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:33:23 +01:00
Andreas Rheinhardt
cd3d8116fb
avcodec/x86/cfhdencdsp: Avoid load of -1
...
It can be easily generated at runtime.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-29 01:32:57 +01:00
Kasidis Arunruangsirilert
e9e8a32b29
avcodec/nvenc: add 4-way multi nvenc split frame encoding support
2026-01-27 12:58:46 +00:00
Diego de Souza
499b5f5f92
avcodec/nvenc: add b_adapt option for HEVC encoder
...
The b_adapt option allows users to control adaptive B-frame decision
when lookahead is enabled in HEVC encoding. This feature was already
available for H.264 and AV1 encoders, but was missing from HEVC.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com >
2026-01-27 12:58:08 +00:00
Andreas Rheinhardt
bf4d5037b4
avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names
...
These names are a remnant of dsputil when all the DSP functions
from all codecs were part of DSPcontext.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net >
Reviewed-by: Sean McGovern <gseanmcg@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-25 22:53:25 +01:00
Andreas Rheinhardt
489aaf4e1c
avcodec/x86/h264_deblock: Don't sign-extend stride
...
Unnecessary (and wrong) since d5d699ab6e.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
db66e057eb
avcodec/x86/h264_deblock: Avoid reload
...
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x)
New benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.4 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 62.0 ( 0.97x)
h264_h_loop_filter_luma_8bpp_avx: 61.7 ( 0.98x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
8428a412bc
avcodec/x86/h264_deblock: Avoid MMX in deblock_h_luma_8
...
Old benchmarks:
h264_h_loop_filter_luma_8bpp_c: 59.9 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 67.9 ( 0.88x)
h264_h_loop_filter_luma_8bpp_avx: 67.4 ( 0.89x)
New benchmarks:
h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x)
h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x)
h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
9882973935
avcodec/x86/h264_deblock: Avoid reloading constant
...
No change in benchmarks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
eaaf45fd79
avcodec/x86/h264_deblock_10bit: Simplify r0+4*r1
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2026-01-25 22:53:21 +01:00