The US country_code path in parse_itut_t35_metadata() reads
the provider_code with bytestream2_get_be16u(), an unchecked
variant that does not validate the remaining length before
reading. When an AV1 stream contains ITU-T T.35
metadata with country_code set to 0xB5 (which is US) and a
payload shorter than 2 bytes, this results in a heap buffer
overflow: the 2-byte read runs past the end of the allocation.
The UK country_code path already guards against this by
checking the remaining length before the unchecked read.
Apply the same pattern to the US country_code path.
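A minimal sketch of the guarded pattern, using a hypothetical stand-in for the GetByteContext reader rather than the real libavcodec API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for libavcodec's GetByteContext; the struct and
 * helper names below are illustrative, not the real FFmpeg definitions. */
typedef struct ByteReader {
    const uint8_t *ptr, *end;
} ByteReader;

static size_t bytes_left(const ByteReader *g)
{
    return (size_t)(g->end - g->ptr);
}

/* Unchecked read, like bytestream2_get_be16u(): the caller is
 * responsible for validating the remaining length first. */
static unsigned get_be16u(ByteReader *g)
{
    unsigned v = (unsigned)(g->ptr[0] << 8) | g->ptr[1];
    g->ptr += 2;
    return v;
}

/* The guarded pattern: check the remaining length before the
 * unchecked read, as the UK country_code path already does. */
static int read_provider_code(ByteReader *g, unsigned *provider_code)
{
    if (bytes_left(g) < 2)
        return -1; /* AVERROR_INVALIDDATA in the real code */
    *provider_code = get_be16u(g);
    return 0;
}
```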
Pwno crafted an AV1 IVF with a metadata OBU containing ITU-T T.35
with country_code=0xB5 and a 1-byte payload. Decoding with libdav1d
triggers the overflow. ASan says:
ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 2 at 0x5020000003f0 thread T0
#0 bytestream_get_be16 src/libavcodec/bytestream.h:98
#1 bytestream2_get_be16u src/libavcodec/bytestream.h:98
#2 parse_itut_t35_metadata src/libavcodec/libdav1d.c:376
0x5020000003f1 is located 0 bytes after 1-byte region
Found-by: Pwno
Allows the compiler to optimize the aliasing checks away
and saves 5376B here (GCC 15, -O3).
Also, avoid converting the stride to uint16_t units for >8bpp:
stride /= sizeof(pixel) would use an unsigned division
(i.e. a logical right shift*), which is not what is intended here.
*: if size_t is the unsigned type corresponding to ptrdiff_t
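A self-contained sketch of the pitfall (the pixel typedef and helper names are illustrative): sizeof yields size_t, so the ptrdiff_t stride is converted to unsigned before the division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t pixel; /* the >8bpp case */

/* stride /= sizeof(pixel): the (possibly negative) ptrdiff_t stride
 * is converted to size_t and divided with an unsigned division,
 * i.e. a logical right shift for a power-of-two divisor. */
static ptrdiff_t bytes_to_pixels_unsigned(ptrdiff_t stride)
{
    stride /= sizeof(pixel); /* unsigned division */
    return stride;
}

/* The intended behavior: a signed division. */
static ptrdiff_t bytes_to_pixels_signed(ptrdiff_t stride)
{
    stride /= (ptrdiff_t)sizeof(pixel); /* signed division */
    return stride;
}
```

For a positive stride both agree; for a negative stride the unsigned variant wraps to a huge positive value instead of -32.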
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Apple VideoToolbox is the dominant producer of hevc-alpha videos, but
early versions generate non-standard VPS extensions that fail to
parse and return AVERROR_INVALIDDATA. Fix this by returning
AVERROR_PATCHWELCOME instead of AVERROR_INVALIDDATA for unsupported
VPS extension configurations. Also set poc_lsb_not_present for
the alpha layer in the fallback path when it has no direct
dependency on the base layer, so that IDR slices on the alpha
layer won't incorrectly read pic_order_cnt_lsb.
Fix #22384
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.
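A sketch of the contract and the resulting caller-side check, using an illustrative stand-in rather than the real ff_frame_new_side_data() signature:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model of the contract: on success the function may
 * still set *sd to NULL when a preference check decides that the
 * existing side data should be kept. */
typedef struct SideData { unsigned char data[16]; } SideData;

static int new_side_data(int prefer_existing, SideData *storage, SideData **sd)
{
    if (prefer_existing) {
        *sd = NULL; /* success, but nothing to fill in */
        return 0;
    }
    *sd = storage;
    return 0;
}

/* The caller must check the pointer, not just the return value. */
static int attach_metadata(int prefer_existing, SideData *storage)
{
    SideData *sd;
    int ret = new_side_data(prefer_existing, storage, &sd);
    if (ret < 0)
        return ret;
    if (sd) /* the fix: don't dereference a NULL sd on success */
        memset(sd->data, 0xAA, sizeof(sd->data));
    return 0;
}
```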
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
It is only needed in the unlikely codepath. The ordinary one
only uses six xmm registers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add NEON-optimized implementation for HEVC intra Planar prediction at
8-bit depth, supporting all block sizes (4x4 to 32x32).
Planar prediction implements bilinear interpolation using an incremental
base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]), reducing
per-row computation from 4 multiply-adds to 1 subtract + 1 multiply.
Uses rshrn for rounded narrowing shifts, eliminating manual rounding
bias. All left[y] values are broadcast in the NEON domain, avoiding
GP-to-NEON transfers.
4x4 interleaves row computations across 4 rows to break dependencies.
16x16 uses v19-v22 for persistent base/decrement vectors, avoiding
callee-saved register spills. 32x32 processes 8 rows per loop iteration
(4 iterations total) to reduce code size while maintaining full NEON
utilization.
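The incremental scheme can be checked against the direct bilinear formula in scalar C (an 8x8 sketch with illustrative names; the real implementation is NEON assembly):

```c
#include <assert.h>
#include <stdint.h>

#define N 8
#define LOG2N 3

/* Direct form of HEVC planar prediction (bilinear interpolation);
 * top[N] is the top-right sample, left[N] the bottom-left one. */
static void planar_ref(uint8_t dst[N][N], const uint8_t top[N + 1],
                       const uint8_t left[N + 1])
{
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y][x] = (uint8_t)(((N - 1 - x) * left[y] + (x + 1) * top[N] +
                                   (N - 1 - y) * top[x] + (y + 1) * left[N] +
                                   N) >> (LOG2N + 1));
}

/* Incremental form: fold the row-invariant terms into base[], then
 * update it per row with a single subtract:
 * base_{y+1}[x] = base_y[x] - (top[x] - left[N]). */
static void planar_inc(uint8_t dst[N][N], const uint8_t top[N + 1],
                       const uint8_t left[N + 1])
{
    int base[N];
    for (int x = 0; x < N; x++)
        base[x] = (N - 1) * top[x] + left[N] + (x + 1) * top[N] + N;
    for (int y = 0; y < N; y++) {
        for (int x = 0; x < N; x++)
            dst[y][x] = (uint8_t)((base[x] + (N - 1 - x) * left[y])
                                  >> (LOG2N + 1));
        for (int x = 0; x < N; x++)
            base[x] -= top[x] - left[N]; /* 1 subtract per element */
    }
}
```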
Speedup over C on Apple M4 (checkasm --bench):
4x4: 2.25x 8x8: 6.40x 16x16: 9.72x 32x32: 3.21x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
Add NEON-optimized implementation for HEVC intra DC prediction at 8-bit
depth, supporting all block sizes (4x4 to 32x32).
DC prediction computes the average of top and left reference samples
using uaddlv, with urshr for rounded division. For luma blocks smaller
than 32x32, edge smoothing is applied: the first row and column are
blended toward the reference using (ref[i] + 3*dc + 2) >> 2 computed
entirely in the NEON domain. Fill stores use pre-computed address
patterns to break dependency chains.
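A scalar model of the computation the NEON code vectorizes (8x8 luma case, names illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define N 8

/* dc is the rounded average of the 2N reference samples; for luma
 * blocks smaller than 32x32, the first row and column are blended
 * toward the references with (ref + 3*dc + 2) >> 2 and the corner
 * with (left[0] + 2*dc + top[0] + 2) >> 2. */
static void pred_dc_8x8(uint8_t dst[N][N], const uint8_t top[N],
                        const uint8_t left[N])
{
    int sum = 0, dc;
    for (int i = 0; i < N; i++)
        sum += top[i] + left[i];
    dc = (sum + N) >> 4; /* rounded division by 2N; 4 == log2(N) + 1 */

    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y][x] = (uint8_t)dc;

    /* edge smoothing (luma, size < 32) */
    dst[0][0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
    for (int x = 1; x < N; x++)
        dst[0][x] = (uint8_t)((top[x] + 3 * dc + 2) >> 2);
    for (int y = 1; y < N; y++)
        dst[y][0] = (uint8_t)((left[y] + 3 * dc + 2) >> 2);
}
```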
Also adds the aarch64 initialization framework (Makefile, pred.c/pred.h
hooks, hevcpred_init_aarch64.c).
Speedup over C on Apple M4 (checkasm --bench):
4x4: 2.28x 8x8: 3.14x 16x16: 3.29x 32x32: 3.02x
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
In case of >8bpp, there is already a zero register available
(for clipping); in case of Unix64, one can simply use an
unused register. Doing so reduces codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also use a register in the 0-7 range as clobber reg,
as this reduces codesize (by 51B).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The height 8 and 16 cases differ from the second BDOF mini block onwards,
but even the beginning of said mini block is the same and can therefore
be deduplicated. This saves 821B here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
m8 here (corresponding to a mix of sgx2 and sgy2 in derive_bdof_vx_vy
in the C version) is always nonnegative, so the psignd boils down to
a check for m8 being zero. But if an entry of m8 is zero, then
the corresponding entry of m9 is automatically zero, too, as sgx2
being zero implies sgxdi being zero and sgy2 being zero implies
sgxgy and sgydi being zero.* So just remove these redundant
instructions.
*: In other words, one could remove the sgx2,sgy2>0 checks from
the end of derive_bdof_vx_vy() as long as av_log2(0) is defined.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For pre-AVX2, vpbroadcastw is emulated via a load, followed
by two shuffles. Yet given that one always wants to splat
multiple pairs of coefficients which are adjacent in memory,
one can do better than that: Load all of them at once, perform
a punpcklwd with itself and use one pshufd per register.
In case one has to sign-extend the coefficients, too,
one can replace the punpcklwd with one pmovsxbw (instead of one
per register) and use pshufd directly afterwards.
This saved 4816B of .text here.
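The emulation can be modeled word-by-word in scalar C (hypothetical helpers; the real code operates on XMM registers, sharing the punpcklwd result across all splats):

```c
#include <assert.h>
#include <stdint.h>

/* Word-level model of punpcklwd: interleave the low four words. */
static void punpcklwd(uint16_t dst[8], const uint16_t a[8],
                      const uint16_t b[8])
{
    uint16_t t[8];
    for (int i = 0; i < 4; i++) {
        t[2 * i]     = a[i];
        t[2 * i + 1] = b[i];
    }
    for (int i = 0; i < 8; i++)
        dst[i] = t[i];
}

/* Dword-level model of pshufd: each 2-bit field of imm selects the
 * source dword for the corresponding destination dword. */
static void pshufd(uint16_t dst[8], const uint16_t src[8], int imm)
{
    uint16_t t[8];
    for (int i = 0; i < 4; i++) {
        int s = (imm >> (2 * i)) & 3;
        t[2 * i]     = src[2 * s];
        t[2 * i + 1] = src[2 * s + 1];
    }
    for (int i = 0; i < 8; i++)
        dst[i] = t[i];
}

/* Broadcast coefficient i (of the low four): punpcklwd with itself
 * duplicates each word into a dword; one pshufd then splats it. */
static void splat_coeff(uint16_t dst[8], const uint16_t coeffs[8], int i)
{
    uint16_t dup[8];
    punpcklwd(dup, coeffs, coeffs); /* c0 c0 c1 c1 c2 c2 c3 c3 */
    pshufd(dst, dup, i * 0x55);     /* all four imm fields == i */
}
```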
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
8 tap motion compensation functions with both vertical and horizontal
components are under severe register pressure, so that the filter
coefficients have to be put on the stack. Before this commit,
this meant that coefficients for use with pmaddubsw and pmaddwd
were always created. Yet this is completely unnecessary, as
every such register is only used for exactly one purpose and
it is known at compile time which one it is (only 8bit horizontal
filters are used with pmaddubsw), so only prepare that one.
This also allows halving the amount of stack used.
This saves 2432B of .text here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has already been checked before that we are only dealing
with high bitdepth here.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Since ba793127c4,
the x86 mpeg4videodsp code uses ff_emulated_edge_mc_sse2()
instead of ff_emulated_edge_mc_8. This leads to linker errors
when x86asm is disabled. Fix this by also falling back to ff_gmc_c()
in case edge emulation is needed with external SSE2 being unavailable.
An alternative is to go back to ff_emulated_edge_mc_8(), but this
would re-add the ugliness to videodsp for a niche case.
Reported-by: James Almer <jamrial@gmail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>