FFmpeg

mirror of https://mirror.skon.top/https://github.com/FFmpeg/FFmpeg synced 2026-04-20 21:00:41 +08:00

Author	SHA1	Message	Date
Ramiro Polla	9ee6136ece	avcodec/mjpegdec: remove start_code field from MJpegDecodeContext Instead, pass it as a parameter to the only function that uses it.	2026-02-09 17:52:01 +00:00
Andreas Rheinhardt	1218a8a922	avcodec/rangecoder: Fix indentation Forgotten after `832649986c` and `d147b3d7ec`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-09 17:36:40 +00:00
Hassan Hany	273b161a98	avcodec/exif: skip EXIF entries with invalid TIFF field type 0 EXIF IFD entries with TIFF field type 0 are invalid per the specification. Without a check, exif_read_values() fails to allocate entry->value, causing an out of memory error. This patch skips such entries early during parsing, allowing decoding to continue normally. Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/21623	2026-02-08 19:56:20 +00:00
Michael Niedermayer	5f84a7263e	avcodec/adpcm: Check input buffer size Larger values will lead to integer overflows in intermediates No testcase Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-08 14:46:56 +00:00
Andreas Rheinhardt	0fefecd53f	Revert "avcodec/opus/parse: export the packet and extradata parsing functions" This reverts commit `aa20d7b3e8`. Adding these avpriv functions is absolutely overblown: Muxers can get the desired duration in a few lines themselves. In particular, using the parse functions from this file necessitated parsing the extradata (and entailed exporting the parsing function), although it was only used to know whether the frames are self-delimiting, but everything of interest to a muxer does not depend on this at all. The commit to be reverted also made several structures part of the ABI, which should be avoided in general. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-06 14:09:15 +01:00
Andreas Rheinhardt	12747e6296	avformat/matroskaenc: Parse Opus packet durations ourselves This avoids avpriv functions from lavc/opus/parse.c (which parse way more than we need, necessitating parsing the extradata). It furthermore makes the output of the muxer consistent, i.e. no longer depending upon whether the Opus parser or decoder are enabled (the avpriv functions would just return AVERROR(ENOSYS)). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-06 14:05:14 +01:00
Andreas Rheinhardt	853843d86f	avcodec/opus/parse: Move frame_duration tab into a file of its own This is in preparation for duplicating it into libavformat. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-02-06 14:04:50 +01:00
James Almer	aa20d7b3e8	avcodec/opus/parse: export the packet and extradata parsing functions Needed for the following commit. Signed-off-by: James Almer <jamrial@gmail.com>	2026-02-05 23:21:49 -03:00
Michael Niedermayer	8f57b04fe5	avcodec/hevc/sei: Use get_bits64() in decode_nal_sei_3d_reference_displays_info() Fixes: Assertion n>=0 && n<=32 failed at ./libavcodec/get_bits.h:426 Fixes: 468435217/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4644127078940672 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 20:20:08 +00:00
Michael Niedermayer	af86f0ffcc	avcodec/dca_xll: Clear padding in ff_dca_xll_parse() Fixes: Use of uninitialized memory Fixes: 472020020/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6433045331902464 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 18:12:46 +01:00
Michael Niedermayer	189bc0aaf5	avcodec/dxv: Clear tex_data padding on reallocation dxv assumes that newly reallocated memory in tex_data is not uninitialized thus we have to do that too in case of reallocation in ff_lzf_uncompress() Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:29:08 +01:00
Michael Niedermayer	0f35146e27	avcodec/lzf: Remove size messing from ff_lzf_uncompress() size represents the output size; randomly changing it but not resetting it on errors leaks uninitialized memory. Fixes: 475000819/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXV_DEC_fuzzer-5571269310611456 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:29:08 +01:00
Michael Niedermayer	5db50e8775	avcodec/ffv1enc: refine end condition In the case where the last sorted value was -1u and we were on the first pass of run1, we failed to fill the last few values of bitmap. No real world testcase is known. Fixes: use of uninitialized memory Fixes: 460333808/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_FFV1_fuzzer-6370167888347136 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 16:07:13 +01:00
Michael Niedermayer	11a5afea31	avcodec/dca_xll: Check get_rice_array() Fixes: use of uninitialized memory Fixes: 451655450/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DCA_DEC_fuzzer-6527248623796224 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-05 14:37:59 +01:00
Jun Zhao	27dd2f1c70	lavc/hevc: fix missing # in ldrsw immediate offset The ldrsw instruction requires immediate offset with # prefix. This fixes the syntax error introduced in commit `26752368f0` (aarch64/h26x: Add put_hevc_pel_bi_w_pixels) where the load_bi_w_pixels_param macro was added. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-02-05 09:13:22 +08:00
Zhao Zhili	e250854ecf	aarch64/h264pred: disable inefficient functions These assembly optimizations have been identified as "performance regressions." Due to advancements in modern CPU micro-architectures and compiler optimization the C implementations now consistently outperform these handwritten routines.
Test Name                 A55-clang        M1               A76-gcc-14       A510-clang       A715-clang       X3-clang
--------------------------------------------------------------------------------------------------------------------
pred8x8_dc_8_neon         55.9 ( 0.79x)!   0.2 ( 0.31x)!    35.7 ( 0.63x)!   98.3 ( 0.37x)!   35.9 ( 0.45x)!   33.6 ( 0.38x)!
pred8x8_dc_10_neon        57.0 ( 1.04x)    0.3 ( 0.36x)!    35.9 ( 0.94x)!   98.2 ( 0.53x)!   35.8 ( 0.58x)!   33.2 ( 0.50x)!
pred8x8_dc_128_8_neon     26.0 ( 0.69x)!   0.1 ( 0.43x)!    15.3 ( 0.73x)!   46.4 ( 0.36x)!   10.6 ( 0.48x)!   10.3 ( 1.09x)
pred8x8_dc_128_10_neon    25.3 ( 0.99x)!   0.1 ( 0.42x)!    19.3 ( 0.48x)!   44.5 ( 0.42x)!   10.0 ( 0.61x)!   11.0 ( 1.00x)
pred8x8_left_dc_8_neon    46.9 ( 0.72x)!   0.2 ( 0.26x)!    30.2 ( 0.49x)!   71.4 ( 0.39x)!   29.8 ( 0.35x)!   26.5 ( 0.44x)!
pred8x8_left_dc_10_neon   45.4 ( 0.82x)!   0.2 ( 0.29x)!    28.1 ( 0.67x)!   70.2 ( 0.47x)!   30.0 ( 0.38x)!   26.5 ( 0.43x)!
pred16x16_dc_8_neon       74.4 ( 1.34x)    0.3 ( 0.62x)!    44.7 ( 0.89x)!   128.0 ( 0.79x)!  48.5 ( 0.67x)!   39.4 ( 0.71x)!
pred16x16_dc_128_8_neon   37.9 ( 0.79x)!   0.1 ( 0.60x)!    20.1 ( 0.80x)!   41.8 ( 0.46x)!   16.2 ( 0.81x)!   12.8 ( 0.95x)!
pred16x16_left_dc_8_neon  69.9 ( 1.19x)    0.3 ( 0.46x)!    49.6 ( 0.54x)!   116.8 ( 0.62x)!  52.8 ( 0.45x)!   44.2 ( 0.51x)!
pred8x8_hori_8_neon       30.6 ( 1.39x)    0.1 ( 0.45x)!    19.4 ( 0.81x)!   71.0 ( 0.50x)!   15.9 ( 0.55x)!   12.2 ( 0.94x)!
pred8x8_hori_10_neon*     29.3 ( 1.82x)    0.1 ( 0.59x)!    18.5 ( 1.56x)    68.9 ( 0.64x)!   15.8 ( 0.62x)!   11.8 ( 0.97x)!
pred8x8_top_dc_8_neon     35.8 ( 0.96x)!   0.1 ( 0.59x)!    16.8 ( 0.81x)!   58.9 ( 0.44x)!   11.3 ( 0.89x)!   11.4 ( 0.99x)!
pred8x8_top_dc_10_neon    37.4 ( 1.24x)    0.1 ( 0.92x)!    20.4 ( 0.81x)!   59.5 ( 0.69x)!   10.5 ( 1.48x)    11.8 ( 1.02x)
pred8x8_vertical_8_neon   18.3 ( 1.08x)    0.1 ( 0.54x)!    12.8 ( 0.89x)!   37.2 ( 0.40x)!   8.3 ( 0.77x)!    11.2 ( 1.00x)
pred8x8_vertical_10_neon  19.0 ( 1.24x)    0.1 ( 0.55x)!    15.3 ( 0.62x)!   39.7 ( 0.50x)!   8.2 ( 0.91x)!    11.1 ( 0.99x)!
- pred8x8_horizontal_10 also underperforms on new architectures, but useful on A55 and A76. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-02-04 09:06:37 +00:00
Zhao Zhili	f54841d375	avcodec/aarch64: add pngdsp
Test Name                    A55-gcc-11        M1-clang          A76-gcc-12        A510-clang        X3-clang
-------------------------------------------------------------------------------------------------------------------
add_bytes_l2_4096_neon       1807.2 ( 2.01x)   1.6 ( 1.94x)      333.0 ( 6.35x)    1058.2 ( 2.34x)   214.3 ( 1.99x)
add_paeth_prediction_3_neon  33036.1 ( 2.41x)  145.1 ( 1.66x)    20443.3 ( 1.97x)  35225.1 ( 1.23x)  19420.8 ( 1.05x)
add_paeth_prediction_4_neon  24368.6 ( 3.26x)  106.7 ( 2.01x)    15163.8 ( 2.77x)  26454.7 ( 1.62x)  14319.0 ( 1.35x)
add_paeth_prediction_6_neon  17900.6 ( 4.44x)  72.0 ( 2.70x)     10214.3 ( 4.20x)  18296.9 ( 2.27x)  9693.1 ( 1.97x)
add_paeth_prediction_8_neon  12615.4 ( 6.31x)  54.1 ( 2.58x)     7706.0 ( 5.45x)   13733.3 ( 2.94x)  7272.6 ( 2.63x)
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-02-04 12:05:35 +08:00
Oliver Chang	a795ca89fa	avcodec/qdm2: fix heap-use-after-free in qdm2_decode_frame The `sub_packet` index in `QDM2Context` was not reset to 0 when `qdm2_decode_frame` started processing a new packet. If an error occurred during the decoding of a previous packet, `sub_packet` would retain a non-zero value. In subsequent calls to `qdm2_decode_frame` with a new packet, this non-zero `sub_packet` value caused `qdm2_decode` to skip `qdm2_decode_super_block`. This function is responsible for initializing packet lists with pointers to the current packet's data. Skipping it led to the use of stale pointers from the previous (freed) packet, resulting in a heap-use-after-free vulnerability. This patch explicitly resets `s->sub_packet = 0` at the beginning of `qdm2_decode_frame`, ensuring correct initialization for each new packet. Fixes: OSS-Fuzz issue 476179569 (https://issues.oss-fuzz.com/issues/476179569).	2026-02-03 18:17:32 +00:00
Michael Niedermayer	2df0ef601a	avcodec/jpeg2000dec: allow bpno of -1 Fixes: tickets/4663/levels30.jp2 The file decodes without error messages and no integer overflows The file before the broader M_b check did decode with error messages and integer overflows but also no visual artifacts Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	e1472a4e0c	avcodec/jpeg2000dec: allow M_b == 31 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	8a3c7c9c32	avcodec/jpeg2000dec: Print bpno level when erroring out Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:32 +01:00
Michael Niedermayer	2efffa9ecd	avcodec/jpeg2000dec: Print M_b value when asking for a sample Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-02-03 12:39:31 +01:00
Frank Plowman	364d5dda91	lavc/vvc: Fix unchecked error codes from add_reconstructed_area	2026-01-31 13:46:13 +00:00
Frank Plowman	f9740eb969	lavc/vvc: Fix unchecked error codes from set_qp_y Fixes: clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_VVC_fuzzer-4957602162475008	2026-01-31 13:46:13 +00:00
Martin Storsjö	f74c551eaa	aarch64: Fix indentation of a few instructions This file is exempt from the indent checker script, as there are a few other bits in it that the script wants to reformat into slightly worse form, or which might not warrant being reformatted. But these instructions should indeed be indented this way.	2026-01-30 05:21:27 +00:00
James Almer	041d108958	avcodec/opus/enc: don't remove more samples than needed from the last packet The hardcoded extra 120 samples results in the side data reporting the need to discard the entire packet rather than the padding samples. This is in line with the behavior of the libopus encoder. Signed-off-by: James Almer <jamrial@gmail.com>	2026-01-29 21:09:02 -03:00
James Almer	c3aea7628c	avcodec/opus/enc: set avctx->frame_size to a better guess based on encoder configuration Signed-off-by: James Almer <jamrial@gmail.com>	2026-01-29 21:09:02 -03:00
Andreas Rheinhardt	ca5504fb5c	avcodec/liblc3dec: Simplify sample fmt selection Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 14:08:15 +01:00
Andreas Rheinhardt	ba1aea762b	avcodec/liblc3{dec,enc}: Simplify sample_size, is_planar check Sample size is always sizeof(float), is planar is a simple if given that these codecs only support float and planar float. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 14:08:15 +01:00
Andreas Rheinhardt	436b74b725	avcodec/x86/hevc/dequant: Add SSSE3 dequant ASM function hevc_dequant_4x4_8_c (GCC): 20.2 ( 1.00x) hevc_dequant_4x4_8_c (Clang): 21.7 ( 1.00x) hevc_dequant_4x4_8_ssse3: 5.8 ( 3.51x) hevc_dequant_8x8_8_c (GCC): 32.9 ( 1.00x) hevc_dequant_8x8_8_c (Clang): 78.7 ( 1.00x) hevc_dequant_8x8_8_ssse3: 6.8 ( 4.83x) hevc_dequant_16x16_8_c (GCC): 105.1 ( 1.00x) hevc_dequant_16x16_8_c (Clang): 151.1 ( 1.00x) hevc_dequant_16x16_8_ssse3: 19.3 ( 5.45x) hevc_dequant_32x32_8_c (GCC): 415.7 ( 1.00x) hevc_dequant_32x32_8_c (Clang): 602.3 ( 1.00x) hevc_dequant_32x32_8_ssse3: 78.2 ( 5.32x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	cf359a7907	avcodec/hevc/dsp: Add alignment for dequant Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	0c7f87b136	avcodec/hevc/dsp_template: Optimize impossible branches away Saves 1856B of .text here. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 12:25:33 +01:00
Andreas Rheinhardt	2729c52988	avcodec/x86/hevc/deblock: Reduce usage of GPRs Don't use two GPRs to store two words from xmm registers; shuffle these words so that they are fit into one GPR. This reduces the amount of GPRs used and leads to tiny speedups here. Also avoid rex prefixes whenever possible (for lines that needed to be modified anyway). Old benchmarks: hevc_h_loop_filter_luma8_skip_c: 23.8 ( 1.00x) hevc_h_loop_filter_luma8_skip_sse2: 8.5 ( 2.80x) hevc_h_loop_filter_luma8_skip_ssse3: 7.2 ( 3.29x) hevc_h_loop_filter_luma8_skip_avx: 6.4 ( 3.71x) hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x) hevc_h_loop_filter_luma8_strong_sse2: 34.4 ( 4.37x) hevc_h_loop_filter_luma8_strong_ssse3: 34.5 ( 4.36x) hevc_h_loop_filter_luma8_strong_avx: 32.3 ( 4.65x) hevc_h_loop_filter_luma8_weak_c: 103.2 ( 1.00x) hevc_h_loop_filter_luma8_weak_sse2: 34.5 ( 2.99x) hevc_h_loop_filter_luma8_weak_ssse3: 7.3 (14.22x) hevc_h_loop_filter_luma8_weak_avx: 32.4 ( 3.18x) hevc_h_loop_filter_luma10_skip_c: 23.5 ( 1.00x) hevc_h_loop_filter_luma10_skip_sse2: 6.6 ( 3.58x) hevc_h_loop_filter_luma10_skip_ssse3: 6.1 ( 3.86x) hevc_h_loop_filter_luma10_skip_avx: 5.4 ( 4.34x) hevc_h_loop_filter_luma10_strong_c: 161.8 ( 1.00x) hevc_h_loop_filter_luma10_strong_sse2: 32.2 ( 5.03x) hevc_h_loop_filter_luma10_strong_ssse3: 30.4 ( 5.33x) hevc_h_loop_filter_luma10_strong_avx: 30.3 ( 5.33x) hevc_h_loop_filter_luma10_weak_c: 23.5 ( 1.00x) hevc_h_loop_filter_luma10_weak_sse2: 6.6 ( 3.58x) hevc_h_loop_filter_luma10_weak_ssse3: 6.1 ( 3.85x) hevc_h_loop_filter_luma10_weak_avx: 5.4 ( 4.35x) hevc_h_loop_filter_luma12_skip_c: 18.8 ( 1.00x) hevc_h_loop_filter_luma12_skip_sse2: 6.6 ( 2.87x) hevc_h_loop_filter_luma12_skip_ssse3: 6.1 ( 3.08x) hevc_h_loop_filter_luma12_skip_avx: 6.2 ( 3.06x) hevc_h_loop_filter_luma12_strong_c: 159.0 ( 1.00x) hevc_h_loop_filter_luma12_strong_sse2: 36.3 ( 4.38x) hevc_h_loop_filter_luma12_strong_ssse3: 36.1 ( 4.40x) hevc_h_loop_filter_luma12_strong_avx: 33.5 ( 4.75x) 
hevc_h_loop_filter_luma12_weak_c: 40.1 ( 1.00x) hevc_h_loop_filter_luma12_weak_sse2: 35.5 ( 1.13x) hevc_h_loop_filter_luma12_weak_ssse3: 36.1 ( 1.11x) hevc_h_loop_filter_luma12_weak_avx: 6.2 ( 6.52x) hevc_v_loop_filter_luma8_skip_c: 25.5 ( 1.00x) hevc_v_loop_filter_luma8_skip_sse2: 10.6 ( 2.40x) hevc_v_loop_filter_luma8_skip_ssse3: 11.4 ( 2.24x) hevc_v_loop_filter_luma8_skip_avx: 8.3 ( 3.07x) hevc_v_loop_filter_luma8_strong_c: 146.8 ( 1.00x) hevc_v_loop_filter_luma8_strong_sse2: 43.9 ( 3.35x) hevc_v_loop_filter_luma8_strong_ssse3: 43.7 ( 3.36x) hevc_v_loop_filter_luma8_strong_avx: 42.3 ( 3.47x) hevc_v_loop_filter_luma8_weak_c: 25.5 ( 1.00x) hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.40x) hevc_v_loop_filter_luma8_weak_ssse3: 44.0 ( 0.58x) hevc_v_loop_filter_luma8_weak_avx: 8.3 ( 3.09x) hevc_v_loop_filter_luma10_skip_c: 20.0 ( 1.00x) hevc_v_loop_filter_luma10_skip_sse2: 11.3 ( 1.77x) hevc_v_loop_filter_luma10_skip_ssse3: 11.0 ( 1.82x) hevc_v_loop_filter_luma10_skip_avx: 9.3 ( 2.15x) hevc_v_loop_filter_luma10_strong_c: 193.5 ( 1.00x) hevc_v_loop_filter_luma10_strong_sse2: 46.1 ( 4.19x) hevc_v_loop_filter_luma10_strong_ssse3: 44.2 ( 4.38x) hevc_v_loop_filter_luma10_strong_avx: 44.4 ( 4.35x) hevc_v_loop_filter_luma10_weak_c: 90.3 ( 1.00x) hevc_v_loop_filter_luma10_weak_sse2: 46.3 ( 1.95x) hevc_v_loop_filter_luma10_weak_ssse3: 10.8 ( 8.37x) hevc_v_loop_filter_luma10_weak_avx: 44.4 ( 2.03x) hevc_v_loop_filter_luma12_skip_c: 16.8 ( 1.00x) hevc_v_loop_filter_luma12_skip_sse2: 11.8 ( 1.42x) hevc_v_loop_filter_luma12_skip_ssse3: 11.7 ( 1.43x) hevc_v_loop_filter_luma12_skip_avx: 8.7 ( 1.93x) hevc_v_loop_filter_luma12_strong_c: 159.3 ( 1.00x) hevc_v_loop_filter_luma12_strong_sse2: 45.3 ( 3.52x) hevc_v_loop_filter_luma12_strong_ssse3: 60.3 ( 2.64x) hevc_v_loop_filter_luma12_strong_avx: 44.1 ( 3.61x) hevc_v_loop_filter_luma12_weak_c: 63.6 ( 1.00x) hevc_v_loop_filter_luma12_weak_sse2: 45.3 ( 1.40x) hevc_v_loop_filter_luma12_weak_ssse3: 11.7 ( 5.41x) 
hevc_v_loop_filter_luma12_weak_avx: 43.9 ( 1.45x) New benchmarks: hevc_h_loop_filter_luma8_skip_c: 24.2 ( 1.00x) hevc_h_loop_filter_luma8_skip_sse2: 8.6 ( 2.82x) hevc_h_loop_filter_luma8_skip_ssse3: 7.0 ( 3.46x) hevc_h_loop_filter_luma8_skip_avx: 6.8 ( 3.54x) hevc_h_loop_filter_luma8_strong_c: 150.4 ( 1.00x) hevc_h_loop_filter_luma8_strong_sse2: 33.3 ( 4.52x) hevc_h_loop_filter_luma8_strong_ssse3: 32.7 ( 4.61x) hevc_h_loop_filter_luma8_strong_avx: 32.7 ( 4.60x) hevc_h_loop_filter_luma8_weak_c: 104.0 ( 1.00x) hevc_h_loop_filter_luma8_weak_sse2: 33.2 ( 3.13x) hevc_h_loop_filter_luma8_weak_ssse3: 7.0 (14.91x) hevc_h_loop_filter_luma8_weak_avx: 31.3 ( 3.32x) hevc_h_loop_filter_luma10_skip_c: 19.2 ( 1.00x) hevc_h_loop_filter_luma10_skip_sse2: 6.2 ( 3.08x) hevc_h_loop_filter_luma10_skip_ssse3: 6.2 ( 3.08x) hevc_h_loop_filter_luma10_skip_avx: 5.0 ( 3.85x) hevc_h_loop_filter_luma10_strong_c: 159.8 ( 1.00x) hevc_h_loop_filter_luma10_strong_sse2: 30.0 ( 5.32x) hevc_h_loop_filter_luma10_strong_ssse3: 29.2 ( 5.48x) hevc_h_loop_filter_luma10_strong_avx: 28.6 ( 5.58x) hevc_h_loop_filter_luma10_weak_c: 19.2 ( 1.00x) hevc_h_loop_filter_luma10_weak_sse2: 6.2 ( 3.09x) hevc_h_loop_filter_luma10_weak_ssse3: 6.2 ( 3.09x) hevc_h_loop_filter_luma10_weak_avx: 5.0 ( 3.88x) hevc_h_loop_filter_luma12_skip_c: 18.7 ( 1.00x) hevc_h_loop_filter_luma12_skip_sse2: 6.2 ( 3.00x) hevc_h_loop_filter_luma12_skip_ssse3: 5.7 ( 3.27x) hevc_h_loop_filter_luma12_skip_avx: 5.2 ( 3.61x) hevc_h_loop_filter_luma12_strong_c: 160.2 ( 1.00x) hevc_h_loop_filter_luma12_strong_sse2: 34.2 ( 4.68x) hevc_h_loop_filter_luma12_strong_ssse3: 29.3 ( 5.48x) hevc_h_loop_filter_luma12_strong_avx: 31.4 ( 5.10x) hevc_h_loop_filter_luma12_weak_c: 40.2 ( 1.00x) hevc_h_loop_filter_luma12_weak_sse2: 35.2 ( 1.14x) hevc_h_loop_filter_luma12_weak_ssse3: 29.3 ( 1.37x) hevc_h_loop_filter_luma12_weak_avx: 5.0 ( 8.09x) hevc_v_loop_filter_luma8_skip_c: 25.6 ( 1.00x) hevc_v_loop_filter_luma8_skip_sse2: 10.2 ( 2.52x) 
hevc_v_loop_filter_luma8_skip_ssse3: 10.5 ( 2.45x) hevc_v_loop_filter_luma8_skip_avx: 8.2 ( 3.11x) hevc_v_loop_filter_luma8_strong_c: 147.1 ( 1.00x) hevc_v_loop_filter_luma8_strong_sse2: 42.6 ( 3.45x) hevc_v_loop_filter_luma8_strong_ssse3: 42.4 ( 3.47x) hevc_v_loop_filter_luma8_strong_avx: 40.1 ( 3.67x) hevc_v_loop_filter_luma8_weak_c: 25.6 ( 1.00x) hevc_v_loop_filter_luma8_weak_sse2: 10.6 ( 2.42x) hevc_v_loop_filter_luma8_weak_ssse3: 42.7 ( 0.60x) hevc_v_loop_filter_luma8_weak_avx: 8.2 ( 3.11x) hevc_v_loop_filter_luma10_skip_c: 16.7 ( 1.00x) hevc_v_loop_filter_luma10_skip_sse2: 11.0 ( 1.52x) hevc_v_loop_filter_luma10_skip_ssse3: 10.5 ( 1.59x) hevc_v_loop_filter_luma10_skip_avx: 9.6 ( 1.74x) hevc_v_loop_filter_luma10_strong_c: 190.0 ( 1.00x) hevc_v_loop_filter_luma10_strong_sse2: 44.8 ( 4.24x) hevc_v_loop_filter_luma10_strong_ssse3: 42.3 ( 4.49x) hevc_v_loop_filter_luma10_strong_avx: 42.5 ( 4.47x) hevc_v_loop_filter_luma10_weak_c: 88.3 ( 1.00x) hevc_v_loop_filter_luma10_weak_sse2: 45.7 ( 1.93x) hevc_v_loop_filter_luma10_weak_ssse3: 10.5 ( 8.40x) hevc_v_loop_filter_luma10_weak_avx: 42.4 ( 2.09x) hevc_v_loop_filter_luma12_skip_c: 16.7 ( 1.00x) hevc_v_loop_filter_luma12_skip_sse2: 11.7 ( 1.42x) hevc_v_loop_filter_luma12_skip_ssse3: 10.5 ( 1.59x) hevc_v_loop_filter_luma12_skip_avx: 8.8 ( 1.90x) hevc_v_loop_filter_luma12_strong_c: 159.4 ( 1.00x) hevc_v_loop_filter_luma12_strong_sse2: 45.2 ( 3.53x) hevc_v_loop_filter_luma12_strong_ssse3: 59.3 ( 2.69x) hevc_v_loop_filter_luma12_strong_avx: 41.7 ( 3.82x) hevc_v_loop_filter_luma12_weak_c: 63.3 ( 1.00x) hevc_v_loop_filter_luma12_weak_sse2: 44.9 ( 1.41x) hevc_v_loop_filter_luma12_weak_ssse3: 10.5 ( 6.02x) hevc_v_loop_filter_luma12_weak_avx: 41.7 ( 1.52x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	0843252229	avcodec/x86/hevc/deblock: avoid unused GPR r12 is unused, so use it instead of r13 to reduce the amount of push/pops. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	0aad8b860a	avcodec/x86/hevc/deblock: Avoid vmovdqa (It would even be possible to avoid a clobbering m10 in MASKED_COPY and the mask register (%3) in MASKED_COPY2 when VEX encoding is in use.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	c940128fff	avcodec/x86/vp9lpf: Avoid vmovdqa Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 11:54:57 +01:00
Andreas Rheinhardt	c898ddb8fe	avcodec/x86/cfhddsp: Reduce number of xmm registers used Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:40 +01:00
Andreas Rheinhardt	848c3ca772	avcodec/x86/cfhddsp: Avoid pmaddwd The result of using pmaddwd with the coefficients 1,-1,...,1,-1 is just the negative of using pmaddwd with the coefficients -1,1,...,-1,1, so avoid one pmaddwd. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:37 +01:00
Andreas Rheinhardt	6224445753	avcodec/x86/cfhdencdsp: Avoid += x, -= x Avoid incrementing lowq and highq inside the loop by using complex addressing modes, avoiding to undo said modification at the end of the horizontal loop. For inputq, modify istrideq outside of the loop so that it is only modified once at the end of the horizontal loop. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:34 +01:00
Andreas Rheinhardt	7dd6487800	avcodec/x86/cfhdencdsp: Don't load twice Sign extend the integer arguments directly from the stack instead of loading qwords, followed by sign-extending the lower half. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:30 +01:00
Andreas Rheinhardt	91c7710412	avcodec/x86/cfhdencdsp: Avoid unnecessary constants Up until now, cfhdencdsp used constants consisting of -1, 1, ...,-1,1 words and 1, -1,...,1,-1 words for use as constants in pmaddwd. But one can use the same constants if one shuffles the words in a dword the opposite order. Similarly for some other constants. This also allowed to avoid a register in chfdenc_vert_filter. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:33:23 +01:00
Andreas Rheinhardt	cd3d8116fb	avcodec/x86/cfhdencdsp: Avoid load of -1 It can be easily generated at runtime. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-29 01:32:57 +01:00
Kasidis Arunruangsirilert	e9e8a32b29	avcodec/nvenc: add 4-way multi nvenc split frame encoding support	2026-01-27 12:58:46 +00:00
Diego de Souza	499b5f5f92	avcodec/nvenc: add b_adapt option for HEVC encoder The b_adapt option allows users to control adaptive B-frame decision when lookahead is enabled in HEVC encoding. This feature was already available for H.264 and AV1 encoders, but was missing from HEVC. Signed-off-by: Diego de Souza <ddesouza@nvidia.com>	2026-01-27 12:58:08 +00:00
Andreas Rheinhardt	bf4d5037b4	avcodec/h264dsp: Remove redundant h264 from H264DSPCtx member names These names are a remnant of dsputil when all the DSP functions from all codecs were part of DSPcontext. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Sean McGovern <gseanmcg@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:25 +01:00
Andreas Rheinhardt	489aaf4e1c	avcodec/x86/h264_deblock: Don't sign-extend stride Unnecessary (and wrong) since `d5d699ab6e`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	db66e057eb	avcodec/x86/h264_deblock: Avoid reload Old benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x) h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x) New benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.4 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 62.0 ( 0.97x) h264_h_loop_filter_luma_8bpp_avx: 61.7 ( 0.98x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	8428a412bc	avcodec/x86/h264_deblock: Avoid MMX in deblock_h_luma_8 Old benchmarks: h264_h_loop_filter_luma_8bpp_c: 59.9 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 67.9 ( 0.88x) h264_h_loop_filter_luma_8bpp_avx: 67.4 ( 0.89x) New benchmarks: h264_h_loop_filter_luma_8bpp_c: 60.0 ( 1.00x) h264_h_loop_filter_luma_8bpp_sse2: 65.4 ( 0.92x) h264_h_loop_filter_luma_8bpp_avx: 65.3 ( 0.92x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	9882973935	avcodec/x86/h264_deblock: Avoid reloading constant No change in benchmarks. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
Andreas Rheinhardt	eaaf45fd79	avcodec/x86/h264_deblock_10bit: Simplify r0+4*r1 Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-01-25 22:53:21 +01:00
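
Commit 12747e6296 (avformat/matroskaenc: Parse Opus packet durations ourselves) notes that a muxer can derive an Opus packet's duration in a few lines without parsing the extradata. The following is a minimal sketch of that idea, not the code from the commit: it assumes the TOC byte layout from RFC 6716 (5-bit config, 1 stereo bit, 2-bit frame count code), and the function name opus_packet_duration_48k is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg API): returns the duration of one Opus
 * packet in 48 kHz samples, or -1 if the packet is malformed. */
static int opus_packet_duration_48k(const uint8_t *buf, size_t size)
{
    /* Frame size in 48 kHz samples, indexed by the 5-bit TOC config. */
    static const uint16_t frame_samples[32] = {
        480, 960, 1920, 2880,   /* SILK narrowband: 10, 20, 40, 60 ms */
        480, 960, 1920, 2880,   /* SILK mediumband */
        480, 960, 1920, 2880,   /* SILK wideband */
        480, 960,               /* Hybrid super-wideband: 10, 20 ms */
        480, 960,               /* Hybrid fullband */
        120, 240, 480, 960,     /* CELT narrowband: 2.5, 5, 10, 20 ms */
        120, 240, 480, 960,     /* CELT wideband */
        120, 240, 480, 960,     /* CELT super-wideband */
        120, 240, 480, 960,     /* CELT fullband */
    };
    int frames, duration;

    if (size < 1)
        return -1;

    /* The low two bits of the TOC byte select the frame count code. */
    switch (buf[0] & 3) {
    case 0:
        frames = 1;
        break;
    case 1:
    case 2:
        frames = 2;
        break;
    default:
        /* Code 3: the frame count sits in the low 6 bits of byte 1. */
        if (size < 2)
            return -1;
        frames = buf[1] & 0x3F;
        if (frames == 0)
            return -1;
        break;
    }

    duration = frames * frame_samples[buf[0] >> 3];

    /* RFC 6716 caps a packet at 120 ms of audio (5760 samples). */
    return duration > 5760 ? -1 : duration;
}
```

Nothing here depends on whether the frames are self-delimiting, which matches the commit message's point that the extradata is not needed for this purpose.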


53465 Commits
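
Commit 848c3ca772 (avcodec/x86/cfhddsp: Avoid pmaddwd) relies on the observation that pmaddwd with coefficients 1,-1,...,1,-1 yields the negative of pmaddwd with coefficients -1,1,...,-1,1. A scalar model of a single 32-bit lane illustrates the identity; this is only an illustration of the arithmetic, not the assembly from the commit, and the helper name pmaddwd_lane is hypothetical.

```c
#include <stdint.h>

/* Scalar model of one 32-bit output lane of pmaddwd: two adjacent signed
 * 16-bit inputs are multiplied by two signed 16-bit coefficients and the
 * 32-bit products are added. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t c0, int16_t c1)
{
    return (int32_t)a0 * c0 + (int32_t)a1 * c1;
}

/* With coefficients (1, -1) the lane computes a0 - a1; with (-1, 1) it
 * computes a1 - a0. One result is the negation of the other, so a single
 * multiply-add plus a sign change (or a swapped subtraction later on)
 * can stand in for the second one. */
static int32_t diff_pos(int16_t a0, int16_t a1)
{
    return pmaddwd_lane(a0, a1, 1, -1);
}

static int32_t diff_neg(int16_t a0, int16_t a1)
{
    return pmaddwd_lane(a0, a1, -1, 1);
}
```

Because the coefficients are only 1 and -1, the lane result always fits in 32 bits, so the negation is exact for every pair of 16-bit inputs.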