FFmpeg

mirror of https://mirror.skon.top/https://github.com/FFmpeg/FFmpeg synced 2026-04-20 21:00:41 +08:00

Author	SHA1	Message	Date
Niklas Haas	50793bc9bd	swscale/ops_chain: remove unused helper function Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-02 11:48:15 +00:00
Niklas Haas	c24d67a0ff	swscale/vulkan/ops: use QSTR/QTYPE to print all rationals Now this helper is a bit more useful. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-02 11:48:15 +00:00
Niklas Haas	7a4cffa25d	swscale/vulkan/ops: simplify QTYPE macro There's no reason for this macro to hard-code op->c.q4[i]. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-04-02 11:48:15 +00:00
Zhao Zhili	eedf8f0165	avcodec/hevc: workaround hevc-alpha videos generated by VideoToolbox Apple VideoToolbox is the dominant producer of hevc-alpha videos, but early versions generate non-standard VPS extensions that fail to parse and return AVERROR_INVALIDDATA. Fix this by returning AVERROR_PATCHWELCOME instead of AVERROR_INVALIDDATA for unsupported VPS extension configurations. Also set poc_lsb_not_present for the alpha layer in the fallback path when it has no direct dependency on the base layer, so that IDR slices on the alpha layer won't incorrectly read pic_order_cnt_lsb. Fix #22384 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-04-01 22:54:36 +08:00
Zhao Zhili	28ab24b717	avformat/matroskadec: avoid calling get_bytes_left() three times with the same state Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-04-01 14:19:35 +00:00
Zhao Zhili	bba9bf7e7e	avcodec/libdav1d: fix null pointer dereference in LCEVC side data handling ff_frame_new_side_data() may set sd to NULL and return 0 when side_data_pref() determines that existing side data should be preferred. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-04-01 14:17:27 +00:00
Zhao Zhili	f9d289020d	avcodec/av1dec: fix null pointer dereference in LCEVC side data handling ff_frame_new_side_data() may set sd to NULL and return 0 when side_data_pref() determines that existing side data should be preferred. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-04-01 14:17:27 +00:00
Andreas Rheinhardt	f6bbd63557	avutil/tests/.gitignore: Add recently added test tools Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-04-01 14:04:16 +00:00
Zhao Zhili	316531e61c	avfilter/vidstabtransform: always use in-place transform path libvidstab's vsTransformPrepare() takes different internal code paths for in-place (src == dest) vs. separate-buffer operation. The separate-buffer path stores a shallow copy of the source frame pointer in td->src without allocating internal memory (srcMalloced stays 0). When a subsequent frame takes the in-place path, vsFrameIsNull(&td->src) is false so vsFrameAllocate() is skipped, and vsFrameCopy() writes into the stale pointer left over from the previous frame, corrupting memory that the caller no longer owns. Whether a given frame is writable depends on pipeline scheduling and frame reference management, which can change between FFmpeg versions. Since FFmpeg 8.1, changes in the scheduler caused some frames to arrive as non-writable, leading to alternation between in-place and separate-buffer paths that triggered the bug. Fix this by marking the input pad with AVFILTERPAD_FLAG_NEEDS_WRITABLE. Fix #22595	2026-04-01 21:56:37 +08:00
Zhao Zhili	c695ad1197	avfilter/vidstabtransform: use existing ctx variable for outlink Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2026-04-01 21:56:37 +08:00
Brad Smith	e64a1d2953	libavutil/ppc: Remove mfspr-based AltiVec detection code for Linux The getauxval() and auxv methods cover the last 25+ years of Linux. Signed-off-by: Brad Smith <brad@comstyle.com>	2026-04-01 04:33:44 +00:00
Michael Niedermayer	ddcb9dd3b5	avcodec/aac/aacdec_usac: Implement missing bits of otts_bands_phase and residual_bands computation Fixes: out of array access Fixes: matejsmycka/poc.mp4 Introducing commit: `baad75cafa6bac298b72c177f657a2eb8e31cff1` — "aacdec_usac: add support for parsing Mpsp212 (MPEG surround)", 2025-11-17. Found-by: Matěj Smyčka <matejsmycka@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2026-03-31 22:29:18 +00:00
Lynne	9c04a40136	vulkan/ffv1: implement floating-point decoding Sponsored-by: Sovereign Tech Fund	2026-03-31 23:47:45 +02:00
Lynne	f5054f726d	ffv1enc_vulkan: implement floating-point encoding Sponsored-by: Sovereign Tech Fund	2026-03-31 23:47:45 +02:00
Lynne	29b8614e62	vulkan/ffv1: fix bitstream initialization for Golomb Was broken when we switched to descriptors. Sponsored-by: Sovereign Tech Fund	2026-03-31 23:47:45 +02:00
Lynne	35c6cdb191	hwcontext_vulkan: add support for GBRPF16/GBRAPF16 Sponsored-by: Sovereign Tech Fund	2026-03-31 23:47:39 +02:00
Martin Storsjö	77ff3bcb90	aarch64: Add AARCH64_VALID_JUMP_CALL_TARGET We currently don't have any cases where this is needed, but include it for completeness and clarity. These macros for BTI were added in `08b4716a9e`. A later comment in this file, added in `248986a0db`, referenced the macro AARCH64_VALID_JUMP_CALL_TARGET which never was added here before.	2026-03-31 19:57:46 +00:00
Martin Storsjö	8ed8e221bd	aarch64: Fix a URL typo This was added in `248986a0db`.	2026-03-31 19:57:46 +00:00
marcos ashton	878eabdfef	tests/fate/libavutil: add FATE test for video_enc_params Unit test covering av_video_enc_params_alloc, av_video_enc_params_block, and av_video_enc_params_create_side_data. Tests allocation for all three codec types (VP9, H264, MPEG2) and the NONE type, with 0 and 4 blocks, with and without size output. Verifies block getter indexing by writing and reading back coordinates, dimensions, and delta_qp values. Tests frame-level qp and delta_qp fields, and side data creation with frame attachment. Coverage for libavutil/video_enc_params.c: 0.00% -> 86.21% (remaining uncovered lines are OOM error paths) Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>	2026-03-31 18:05:51 +01:00
marcos ashton	c8ec660d78	tests/fate/libavutil: add FATE test for detection_bbox Unit test covering av_detection_bbox_alloc, av_get_detection_bbox, and av_detection_bbox_create_side_data. Tests allocation with 0, 1, and 4 bounding boxes, with and without size output. Verifies bbox getter indexing by writing and reading back coordinates, labels, and confidence values. Tests classify fields (labels and confidences), the header source field, and side data creation with frame attachment. Coverage for libavutil/detection_bbox.c: 0.00% -> 86.67% (remaining uncovered lines are OOM error paths) Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>	2026-03-31 18:05:51 +01:00
marcos ashton	be2fa77344	tests/fate/libavutil: add FATE test for spherical Unit test covering all 4 public API functions in libavutil/spherical.c: av_spherical_alloc, av_spherical_projection_name, av_spherical_from_name, and av_spherical_tile_bounds. Tests allocation with and without size output, all 7 projection type name lookups, projection name round-trip verification, out-of-range handling, and tile bounds computation for full-frame, quarter-tile, and centered-tile configurations. Coverage for libavutil/spherical.c: 0.00% -> 100.00% Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>	2026-03-31 18:05:51 +01:00
Andreas Rheinhardt	c1aed85491	avcodec/x86/h264_idct: Avoid spilling register unnecessarily It is only needed in the unlikely codepath. The ordinary one only uses six xmm registers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-31 17:31:58 +02:00
Andreas Rheinhardt	9fdd7e23e3	avfilter/x86/vf_atadenoise: Avoid load Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-31 16:49:51 +02:00
Araz Iusubov	3b55818764	avcodec/amfdec: set context dimensions from decoder size	2026-03-31 14:07:31 +00:00
Ramiro Polla	53537f6cf5	swscale/aarch64: mark CPS kernel functions as indirect branch targets Only the process functions are entered via an indirect _call_ from C. The kernel functions and process_return are dispatched to by indirect _branches_ instead (continuation-passing style design). Make use of the recently added "jumpable" parameter to the function macro in libavutil/aarch64/asm.S to fix these functions when BTI is enabled. Sponsored-by: Sovereign Tech Fund Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>	2026-03-31 11:48:52 +00:00
Ramiro Polla	af443abe99	aarch64: Add support for indirect branch targets in the function macro The function macro emits AARCH64_VALID_CALL_TARGET for exported symbols, marking them as valid destinations for indirect _calls_. Functions that are reached by indirect _branches_ (i.e. tail-call dispatch chains where the link register is not set) require AARCH64_VALID_JUMP_TARGET instead. This commit adds a "jumpable" parameter to the function macro that, when set, emits AARCH64_VALID_JUMP_TARGET instead of AARCH64_VALID_CALL_TARGET. Sponsored-by: Sovereign Tech Fund Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>	2026-03-31 11:48:52 +00:00
Dmitrii Gershenkop	8b93c94f47	avutil/hwcontext_amf: Add AMF_IFACE_CALL macro Using AMF interfaces in C can be cumbersome and visually difficult to process in some cases: i.e.: object->function(object, args). To improve code readability, a new macro is added. This commit is instrumental for future AMF integration refactoring.	2026-03-31 11:33:00 +00:00
Dmitrii Gershenkop	6f75e879b6	avfilter/vf_vpp_amf: Minor clean up. -vf_vpp_amf.c: Remove unused variables. -vf_amf_common.c: Fix hdrmeta_buffer memory leak. -hwcontext_amf.c: Fix av_amf_extract_hdr_metadata not picking up light metadata if display mastering metadata is not set. -doc/filters.texi: Remove irrelevant example with HDR metadata for vpp_amf.	2026-03-31 11:17:51 +00:00
Kacper Michajłow	7d57621b83	avutil/x86/x86util: tone down NASM workaround and use info section The use of code section (.text) was forced by the unreleased NASM 3.02rc3 which made the issue worse, by preventing assembling anything without code section, including when only data was present. This works fine for the most part, but using code (.text) section with IMAGE_COMDAT_SELECT_ANY causes issues with lib.exe after stripping such object: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x2 Essentially it makes our workaround not work in all cases, and while stripping could be disabled like it already is for MSVC/ICL builds, it used to work so let's preserve that state. This makes it not compatible with NASM 3.02rc3 when CV debug info is generated, but hopefully the upstream fix will be merged before release, to avoid this regression: https://github.com/netwide-assembler/nasm/pull/221 Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2026-03-30 19:46:53 +02:00
Jun Zhao	89c21b5ab7	lavc/hevc: add aarch64 NEON for Planar prediction Add NEON-optimized implementation for HEVC intra Planar prediction at 8-bit depth, supporting all block sizes (4x4 to 32x32). Planar prediction implements bilinear interpolation using an incremental base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]), reducing per-row computation from 4 multiply-adds to 1 subtract + 1 multiply. Uses rshrn for rounded narrowing shifts, eliminating manual rounding bias. All left[y] values are broadcast in the NEON domain, avoiding GP-to-NEON transfers. 4x4 interleaves row computations across 4 rows to break dependencies. 16x16 uses v19-v22 for persistent base/decrement vectors, avoiding callee-saved register spills. 32x32 processes 8 rows per loop iteration (4 iterations total) to reduce code size while maintaining full NEON utilization. Speedup over C on Apple M4 (checkasm --bench): 4x4: 2.25x 8x8: 6.40x 16x16: 9.72x 32x32: 3.21x Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-03-30 14:32:10 +00:00
Jun Zhao	60b372c934	lavc/hevc: add aarch64 NEON for DC prediction Add NEON-optimized implementation for HEVC intra DC prediction at 8-bit depth, supporting all block sizes (4x4 to 32x32). DC prediction computes the average of top and left reference samples using uaddlv, with urshr for rounded division. For luma blocks smaller than 32x32, edge smoothing is applied: the first row and column are blended toward the reference using (ref[i] + 3*dc + 2) >> 2 computed entirely in the NEON domain. Fill stores use pre-computed address patterns to break dependency chains. Also adds the aarch64 initialization framework (Makefile, pred.c/pred.h hooks, hevcpred_init_aarch64.c). Speedup over C on Apple M4 (checkasm --bench): 4x4: 2.28x 8x8: 3.14x 16x16: 3.29x 32x32: 3.02x Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-03-30 14:32:10 +00:00
Jun Zhao	514f57f85d	tests/checkasm: add HEVC intra prediction test Add checkasm test for HEVC intra prediction covering DC, planar, and angular modes at all block sizes (4x4 to 32x32) for 8-bit and 10-bit depth. Signed-off-by: Jun Zhao <barryjzhao@tencent.com>	2026-03-30 14:32:10 +00:00
nyanmisaka	87b7e578ec	avcodec/amfenc: add encoder average QP stats This allows for real-time monitoring of the encoder's average QP in ffmpeg CLI. Signed-off-by: nyanmisaka <nst799610810@gmail.com>	2026-03-30 13:23:56 +00:00
Andreas Rheinhardt	f56d073d7e	swscale/tests/.gitignore: Add sws_ops_aarch64 Forgotten in `a1bfaa0e78`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 14:31:25 +02:00
Andreas Rheinhardt	3a1e63e007	avcodec/x86/vvc/alf: Avoid zeroing unnecessarily In case of >8bpp, there is already a zero register available (for clipping); in case of Unix64, one can simply use an unused register. Doing so reduces codesize. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	8901f858eb	avcodec/x86/vvc/alf: Hoist creating shift register out of loop Possible now that this function no longer uses unnecessarily many registers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	a3dfc511a5	avcodec/x86/vvc/alf: Don't push+pop unused register This function only uses 14 GPRs. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	5de2c4c89e	avcodec/x86/vvc/alf: Avoid reload Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	d727b7a64e	avcodec/x86/vvc/alf: Avoid modifying nonvolatile registers Avoids push+pop on Win64; in any case, using registers m0-m7 more often saves codesize. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	b1d6f31d65	avcodec/x86/vvc/alf: Use correct shift amount Fixes a bug in `94f9ad8061`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	2cce9a8279	avcodec/x86/vvc/alf: Avoid modifying nonvolatile registers Avoids push+pop on Win64; in any case, using registers m0-m7 more often saves codesize. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	cb1ffc58ca	avcodec/x86/vvc/of: Don't use ymm regs where xmm are sufficient Also use a register in the 0-7 range as clobber reg, as this reduces codesize (by 51B). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	1785542a80	avcodec/x86/vvc/of: Don't add to zero Instead rewrite the code to use assignment. Saves zeroing and additions. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	06fa26d2e8	avcodec/x86/vvc/of: Deduplicate common code The height 8 and 16 cases differ from the second BDOF mini block onwards, but even the beginning of said mini block is the same and can therefore be deduplicated. This saves 821B here. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	002b3bc1b3	avcodec/x86/vvc/of: Avoid punpckldq Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	ada58bd0e2	avcodec/x86/vvc/of: Use xmm registers where sufficient Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	ad34eb2ae6	avcodec/x86/vvc/of: Correct comment Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	2570f5d307	avcodec/x86/vvc/of: Avoid scalar log2 Instead convert the integers to floats and inspect the exponent. Old benchmarks: apply_bdof_8_8x16_c: 3295.2 ( 1.00x) apply_bdof_8_8x16_avx2: 312.7 (10.54x) apply_bdof_8_16x8_c: 3269.1 ( 1.00x) apply_bdof_8_16x8_avx2: 203.6 (16.05x) apply_bdof_8_16x16_c: 6584.8 ( 1.00x) apply_bdof_8_16x16_avx2: 413.6 (15.92x) apply_bdof_10_8x16_c: 3313.9 ( 1.00x) apply_bdof_10_8x16_avx2: 321.5 (10.31x) apply_bdof_10_16x8_c: 3306.5 ( 1.00x) apply_bdof_10_16x8_avx2: 200.4 (16.50x) apply_bdof_10_16x16_c: 6659.7 ( 1.00x) apply_bdof_10_16x16_avx2: 402.4 (16.55x) apply_bdof_12_8x16_c: 3305.7 ( 1.00x) apply_bdof_12_8x16_avx2: 321.8 (10.27x) apply_bdof_12_16x8_c: 3258.1 ( 1.00x) apply_bdof_12_16x8_avx2: 198.6 (16.41x) apply_bdof_12_16x16_c: 6600.2 ( 1.00x) apply_bdof_12_16x16_avx2: 392.6 (16.81x) New benchmarks: apply_bdof_8_8x16_c: 3269.9 ( 1.00x) apply_bdof_8_8x16_avx2: 266.5 (12.27x) apply_bdof_8_16x8_c: 3252.9 ( 1.00x) apply_bdof_8_16x8_avx2: 182.6 (17.81x) apply_bdof_8_16x16_c: 6596.7 ( 1.00x) apply_bdof_8_16x16_avx2: 362.7 (18.19x) apply_bdof_10_8x16_c: 3351.3 ( 1.00x) apply_bdof_10_8x16_avx2: 269.0 (12.46x) apply_bdof_10_16x8_c: 3329.1 ( 1.00x) apply_bdof_10_16x8_avx2: 174.5 (19.08x) apply_bdof_10_16x16_c: 6654.3 ( 1.00x) apply_bdof_10_16x16_avx2: 357.8 (18.60x) apply_bdof_12_8x16_c: 3274.1 ( 1.00x) apply_bdof_12_8x16_avx2: 276.0 (11.86x) apply_bdof_12_16x8_c: 3263.5 ( 1.00x) apply_bdof_12_16x8_avx2: 176.8 (18.46x) apply_bdof_12_16x16_c: 6576.4 ( 1.00x) apply_bdof_12_16x16_avx2: 357.8 (18.38x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Andreas Rheinhardt	03b83f8feb	avcodec/x86/vvc/of: Remove redundant instructions m8 here (corresponding to a mix of sgx2 and sgy2 in derive_bdof_vx_vy in the C version) is always nonnegative, so the psignd boils down to a check for m8 being zero. But if an entry of m8 is zero, then the corresponding entry of m9 is automatically zero, too, as sgx2 being zero implies sgxdi being zero and sgy2 implies sgxgy, sgydi being zero.* So just remove these redundant instructions. *: In other words, one could remove the sgx2,sgy2>0 checks from the end of derive_bdof_vx_vy() as long as av_log2(0) is defined. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2026-03-30 13:51:53 +02:00
Ramiro Polla	2517c328fc	swscale/aarch64: add NEON sws_ops backend This commit pieces together the previous few commits to implement the NEON backend for sws_ops. In essence, a tool which runs on the target (sws_ops_aarch64) is used to enumerate all the functions that the backend needs to implement. The list it generates is stored in the repository (ops_entries.c). The list from above is used at build time by a code generator tool (ops_asmgen) to implement all the sws_ops functions the NEON backend supports, and generate a lookup function in C to retrieve the assembly function pointers. At runtime, the NEON backend fetches the function pointers to the assembly functions and chains them together in a continuation-passing style design, similar to the x86 backend. The following speedup is observed from legacy swscale to NEON: A520: Overall speedup=3.780x faster, min=0.137x max=91.928x A720: Overall speedup=4.129x faster, min=0.234x max=92.424x And the following from the C sws_ops implementation to NEON: A520: Overall speedup=5.513x faster, min=0.927x max=14.169x A720: Overall speedup=4.786x faster, min=0.585x max=20.157x The slowdowns from legacy to NEON are the same for C/x86. Mostly low bit-depth conversions that did not perform dithering in legacy. The 0.585x outlier from C to NEON is gbrpf32le -> gbrapf32le, which is mostly memcpy with the C implementation. All other conversions are better. Sponsored-by: Sovereign Tech Fund Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>	2026-03-30 11:38:35 +00:00


123802 Commits