53896 Commits

Author SHA1 Message Date
Ruikai Peng
e90c2ff4b5 avcodec/libdav1d: fix heap overflow in US ITU-T T.35 metadata parsing
The US country_code path in parse_itut_t35_metadata() reads the
provider_code with bytestream2_get_be16u(), which is an
unchecked version that does not validate the remaining
length before reading. When an AV1 stream contains ITU-T T.35
metadata with country_code set to 0xB5 (which is US) and a
payload shorter than 2 bytes, this results in a heap overflow
reading 2 bytes past the allocation.

The UK country code path already guards against this issue by
checking the remaining length before the unchecked read. Apply the
same pattern to the US country code path.

Pwno crafted an AV1 IVF with a metadata OBU containing ITU-T T.35
with country_code=0xB5 and a 1-byte payload. Decoding with libdav1d
triggers the overflow. ASan says:

ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 2 at 0x5020000003f0 thread T0
  #0 bytestream_get_be16 src/libavcodec/bytestream.h:98
  #1 bytestream2_get_be16u src/libavcodec/bytestream.h:98
  #2 parse_itut_t35_metadata src/libavcodec/libdav1d.c:376

0x5020000003f1 is located 0 bytes after 1-byte region

Found-by: Pwno
2026-04-06 23:39:40 +00:00
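
A minimal sketch of the guard described in e90c2ff4b5, assuming a simplified byte reader. The Reader type and the reader_bytes_left(), reader_get_be16u() and read_provider_code() helpers are hypothetical stand-ins for illustration and are not FFmpeg's bytestream2 API. It shows only the pattern of checking the remaining length before an unchecked 16-bit big-endian read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal reader; not FFmpeg's GetByteContext. */
typedef struct Reader {
    const uint8_t *cur; /* next byte to read */
    const uint8_t *end; /* one past the last valid byte */
} Reader;

static size_t reader_bytes_left(const Reader *r)
{
    return (size_t)(r->end - r->cur);
}

/* Unchecked read: the caller must guarantee that 2 bytes remain. */
static uint16_t reader_get_be16u(Reader *r)
{
    uint16_t v = (uint16_t)((r->cur[0] << 8) | r->cur[1]);
    r->cur += 2;
    return v;
}

/* Returns 0 and stores the provider code on success,
 * or -1 without reading anything if the payload is too short. */
static int read_provider_code(Reader *r, uint16_t *provider_code)
{
    if (reader_bytes_left(r) < 2)
        return -1;
    *provider_code = reader_get_be16u(r);
    return 0;
}
```

With a 1-byte payload, as in the crafted sample, read_provider_code() returns -1 instead of touching memory past the buffer.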
James Almer
757cc97790 avcodec/lcevcdec: support differing base and enhancement bitdepths
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-06 14:07:59 -03:00
James Almer
3a2eae155d avcodec/lcevcdec: add 14bit pixel formats
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-06 14:07:59 -03:00
James Almer
01b0b86225 avcodec/lcevc_parser: move pixel format table to a shared file
Signed-off-by: James Almer <jamrial@gmail.com>
2026-04-06 14:07:59 -03:00
Andreas Rheinhardt
7fd2be97b9 avcodec/x86/h264_chromamc: Avoid mmx in chroma_mc8_ssse3 functions
No impact on performance here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 11:28:49 +02:00
Andreas Rheinhardt
e1297f3080 avcodec/x86/h264_idct: Use tmp reg in SUMSUB_BA if possible
This allows replacing a paddw with a movdqa.

Old benchmarks:
idct8_add4_8bpp_c:                                     664.6 ( 1.00x)
idct8_add4_8bpp_sse2:                                  142.2 ( 4.67x)
idct8_add_8bpp_c:                                      215.5 ( 1.00x)
idct8_add_8bpp_sse2:                                    35.1 ( 6.14x)

New benchmarks:
idct8_add4_8bpp_c:                                     666.9 ( 1.00x)
idct8_add4_8bpp_sse2:                                  135.3 ( 4.93x)
idct8_add_8bpp_c:                                      217.7 ( 1.00x)
idct8_add_8bpp_sse2:                                    34.0 ( 6.41x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 11:28:49 +02:00
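
A scalar model of the trade-off in e1297f3080, assuming SUMSUB_BA computes a butterfly that leaves a+b in one register and b-a in the other. That reading, the function names, and the mapping to instructions in the comments are assumptions made for illustration and are not taken from the macro's source. The sketch only shows that a temporary lets a copy stand in for one of the two additions, with identical results under 16-bit wraparound.

```c
#include <stdint.h>

/* Without a temporary: two additions and one subtraction. */
static void sumsub_no_tmp(uint16_t *a, uint16_t *b)
{
    *a = (uint16_t)(*a + *b); /* a = a + b                  (add) */
    *b = (uint16_t)(*b + *b); /* b = 2b                     (add) */
    *b = (uint16_t)(*b - *a); /* b = 2b - (a + b) = b - a   (sub) */
}

/* With a temporary: one copy, one addition, one subtraction. */
static void sumsub_with_tmp(uint16_t *a, uint16_t *b)
{
    uint16_t tmp = *a;         /* tmp = a                   (copy) */
    *a = (uint16_t)(*a + *b);  /* a = a + b                 (add)  */
    *b = (uint16_t)(*b - tmp); /* b = b - a                 (sub)  */
}
```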
Andreas Rheinhardt
ed116bab02 avcodec/x86/me_cmp: Use tmp reg in SUMSUB_BA if possible
This allows replacing a paddw with a movdqa.

Old benchmarks:
hadamard8_diff_0_c:                                    366.1 ( 1.00x)
hadamard8_diff_0_sse2:                                  56.4 ( 6.49x)
hadamard8_diff_0_ssse3:                                 53.0 ( 6.90x)
hadamard8_diff_1_c:                                    183.0 ( 1.00x)
hadamard8_diff_1_sse2:                                  28.0 ( 6.53x)
hadamard8_diff_1_ssse3:                                 26.0 ( 7.03x)

New benchmarks:
hadamard8_diff_0_c:                                    371.4 ( 1.00x)
hadamard8_diff_0_sse2:                                  55.0 ( 6.76x)
hadamard8_diff_0_ssse3:                                 49.5 ( 7.50x)
hadamard8_diff_1_c:                                    183.4 ( 1.00x)
hadamard8_diff_1_sse2:                                  26.8 ( 6.85x)
hadamard8_diff_1_ssse3:                                 23.1 ( 7.92x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 11:28:49 +02:00
Andreas Rheinhardt
da59f288c6 avcodec/hevc/dsp_template: Add restrict to add_residual functions
Allows the compiler to optimize the aliasing checks away
and saves 5376B here (GCC 15, -O3).
Also, avoid converting the stride to uint16_t for >8bpp:
stride /= sizeof(pixel) will use an unsigned division
(i.e. a logical right shift)*, which is not what is intended here.

*: If size_t is the corresponding unsigned type to ptrdiff_t

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 11:28:49 +02:00
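
The stride remark in da59f288c6 can be illustrated with a small sketch. The helper names are hypothetical; the only point is that dividing a signed ptrdiff_t by sizeof() converts the dividend to size_t first, so a negative stride does not survive the division. It assumes, as the footnote does, that size_t is the unsigned counterpart of ptrdiff_t. Casting the divisor is shown as one possible way to keep the division signed, not necessarily the approach taken in the commit.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t pixel; /* stand-in for a >8bpp sample type */

/* stride is converted to size_t before dividing,
 * so the division is unsigned. */
static ptrdiff_t stride_in_pixels_unsigned(ptrdiff_t stride)
{
    stride /= sizeof(pixel);
    return stride;
}

/* Casting the divisor to a signed type keeps the division signed. */
static ptrdiff_t stride_in_pixels_signed(ptrdiff_t stride)
{
    stride /= (ptrdiff_t)sizeof(pixel);
    return stride;
}
```

For a byte stride of -64, the signed variant yields -32 pixels, while the unsigned variant yields a large positive value.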
Andreas Rheinhardt
759512d36a avcodec/x86/cavsidct: Use tmp reg in SUMSUB_BA if possible
This allows replacing a paddw with a movdqa.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 11:28:49 +02:00
Andreas Rheinhardt
8b700fad94 avcodec/mpegvideoencdsp: Add restrict to shrink
Makes GCC avoid creating the aliasing fallback path
and saves 1280B of .text here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 10:39:17 +02:00
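
A sketch of what adding restrict to a function of this kind looks like, assuming a hypothetical 2x2 box-filter downscaler. The name shrink_2x2() and its signature are illustrative and not taken from mpegvideoencdsp. With both pointers restrict-qualified, the compiler may assume that dst and src do not overlap, which is what allows it to skip generating a runtime aliasing fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical downscaler: each output pixel is the rounded
 * average of a 2x2 block of input pixels. */
static void shrink_2x2(uint8_t *restrict dst, ptrdiff_t dst_stride,
                       const uint8_t *restrict src, ptrdiff_t src_stride,
                       int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *row0 = src;
        const uint8_t *row1 = src + src_stride;
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((row0[2 * x] + row0[2 * x + 1] +
                                row1[2 * x] + row1[2 * x + 1] + 2) >> 2);
        dst += dst_stride;
        src += 2 * src_stride;
    }
}
```

The caller is responsible for honoring the promise: passing overlapping buffers to a restrict-qualified function is undefined behavior.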
Andreas Rheinhardt
6e95052ac2 avcodec/x86/mpegvideoenc_template: Avoid indirect call
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-06 10:39:17 +02:00
Zhao Zhili
eedf8f0165 avcodec/hevc: workaround hevc-alpha videos generated by VideoToolbox
Apple VideoToolbox is the dominant producer of hevc-alpha videos, but
early versions generate non-standard VPS extensions that fail to
parse and return AVERROR_INVALIDDATA. Fix this by returning
AVERROR_PATCHWELCOME instead of AVERROR_INVALIDDATA for unsupported
VPS extension configurations. Also set poc_lsb_not_present for the
alpha layer in the fallback path when it has no direct dependency
on the base layer, so that IDR slices on the alpha layer won't
incorrectly read pic_order_cnt_lsb.

Fix #22384

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-04-01 22:54:36 +08:00
Zhao Zhili
bba9bf7e7e avcodec/libdav1d: fix null pointer dereference in LCEVC side data handling
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-04-01 14:17:27 +00:00
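
A minimal sketch of the pattern behind bba9bf7e7e, assuming a hypothetical allocator that, like the behavior described for ff_frame_new_side_data(), can return 0 while leaving the output pointer NULL. SideData, new_side_data() and attach_payload() are illustrative names and not FFmpeg APIs. The caller has to treat "success with NULL" as "nothing to fill in" rather than dereferencing the pointer.

```c
#include <stddef.h>
#include <string.h>

typedef struct SideData {
    unsigned char data[16];
    size_t size;
} SideData;

/* Hypothetical stand-in: always returns 0, but leaves *psd NULL
 * when existing side data should be preferred. */
static int new_side_data(int prefer_existing, SideData *storage,
                         SideData **psd)
{
    *psd = prefer_existing ? NULL : storage;
    return 0;
}

/* Returns a negative value on error, 0 if nothing was attached,
 * and 1 if the payload was copied. */
static int attach_payload(int prefer_existing, SideData *storage,
                          const unsigned char *payload, size_t size)
{
    SideData *sd;
    int ret = new_side_data(prefer_existing, storage, &sd);
    if (ret < 0)
        return ret;
    if (!sd) /* success, but no new side data was created */
        return 0;
    if (size > sizeof(sd->data))
        return -1;
    memcpy(sd->data, payload, size);
    sd->size = size;
    return 1;
}
```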
Zhao Zhili
f9d289020d avcodec/av1dec: fix null pointer dereference in LCEVC side data handling
ff_frame_new_side_data() may set sd to NULL and return 0 when
side_data_pref() determines that existing side data should be
preferred.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-04-01 14:17:27 +00:00
Michael Niedermayer
ddcb9dd3b5 avcodec/aac/aacdec_usac: Implement missing bits of otts_bands_phase and residual_bands computation
Fixes: out of array access
Fixes: matejsmycka/poc.mp4

Introducing commit: `baad75cafa6bac298b72c177f657a2eb8e31cff1` — "aacdec_usac: add support for parsing Mpsp212 (MPEG surround)", 2025-11-17.

Found-by: Matěj Smyčka <matejsmycka@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-03-31 22:29:18 +00:00
Lynne
9c04a40136 vulkan/ffv1: implement floating-point decoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
f5054f726d ffv1enc_vulkan: implement floating-point encoding
Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Lynne
29b8614e62 vulkan/ffv1: fix bitstream initialization for Golomb
Was broken when we switched to descriptors.

Sponsored-by: Sovereign Tech Fund
2026-03-31 23:47:45 +02:00
Andreas Rheinhardt
c1aed85491 avcodec/x86/h264_idct: Avoid spilling register unnecessarily
It is only needed in the unlikely codepath. The ordinary one
only uses six xmm registers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-31 17:31:58 +02:00
Araz Iusubov
3b55818764 avcodec/amfdec: set context dimensions from decoder size
2026-03-31 14:07:31 +00:00
Jun Zhao
89c21b5ab7 lavc/hevc: add aarch64 NEON for Planar prediction
Add NEON-optimized implementation for HEVC intra Planar prediction at
8-bit depth, supporting all block sizes (4x4 to 32x32).

Planar prediction implements bilinear interpolation using an incremental
base update: base_{y+1}[x] = base_y[x] - (top[x] - left[N]), reducing
per-row computation from 4 multiply-adds to 1 subtract + 1 multiply.
Uses rshrn for rounded narrowing shifts, eliminating manual rounding
bias. All left[y] values are broadcast in the NEON domain, avoiding
GP-to-NEON transfers.

4x4 interleaves row computations across 4 rows to break dependencies.
16x16 uses v19-v22 for persistent base/decrement vectors, avoiding
callee-saved register spills. 32x32 processes 8 rows per loop iteration
(4 iterations total) to reduce code size while maintaining full NEON
utilization.

Speedup over C on Apple M4 (checkasm --bench):

    4x4: 2.25x    8x8: 6.40x    16x16: 9.72x    32x32: 3.21x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-03-30 14:32:10 +00:00
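
The incremental base update described in 89c21b5ab7 can be sketched in scalar C. This is an illustration of the arithmetic only, not the NEON code. It assumes the standard HEVC planar formula, with top[N] as the top-right and left[N] as the bottom-left reference sample, and it folds the row-invariant (x+1)*top[N] term into the base, which is an assumption about how the per-row work is reduced. The function names and the fixed 8x8 size are hypothetical.

```c
#include <stdint.h>

enum { N = 8, LOG2_N = 3 };

/* Direct evaluation: four multiply-adds per sample. */
static void planar_direct(uint8_t dst[N][N], const uint8_t top[N + 1],
                          const uint8_t left[N + 1])
{
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y][x] = (uint8_t)(((N - 1 - x) * left[y] +
                                   (x + 1) * top[N] +
                                   (N - 1 - y) * top[x] +
                                   (y + 1) * left[N] + N) >> (LOG2_N + 1));
}

/* Incremental evaluation:
 * base_{y+1}[x] = base_y[x] - (top[x] - left[N]). */
static void planar_incremental(uint8_t dst[N][N], const uint8_t top[N + 1],
                               const uint8_t left[N + 1])
{
    int base[N], dec[N];

    for (int x = 0; x < N; x++) {
        /* Row 0 value of the terms that do not depend on left[y]. */
        base[x] = (N - 1) * top[x] + left[N] + (x + 1) * top[N];
        dec[x]  = top[x] - left[N];
    }
    for (int y = 0; y < N; y++) {
        for (int x = 0; x < N; x++) {
            /* One multiply per sample, then a rounded shift. */
            dst[y][x] = (uint8_t)((base[x] + (N - 1 - x) * left[y] + N)
                                  >> (LOG2_N + 1));
            base[x] -= dec[x]; /* one subtract to advance to the next row */
        }
    }
}
```

Both functions produce identical output; the second one only reorganizes the computation so that the vertical terms are carried from row to row.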
Jun Zhao
60b372c934 lavc/hevc: add aarch64 NEON for DC prediction
Add NEON-optimized implementation for HEVC intra DC prediction at 8-bit
depth, supporting all block sizes (4x4 to 32x32).

DC prediction computes the average of top and left reference samples
using uaddlv, with urshr for rounded division. For luma blocks smaller
than 32x32, edge smoothing is applied: the first row and column are
blended toward the reference using (ref[i] + 3*dc + 2) >> 2 computed
entirely in the NEON domain. Fill stores use pre-computed address
patterns to break dependency chains.

Also adds the aarch64 initialization framework (Makefile, pred.c/pred.h
hooks, hevcpred_init_aarch64.c).

Speedup over C on Apple M4 (checkasm --bench):

    4x4: 2.28x    8x8: 3.14x    16x16: 3.29x    32x32: 3.02x

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-03-30 14:32:10 +00:00
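
A scalar sketch of the DC prediction and edge smoothing described in 60b372c934, for a 4x4 luma block. It is not the NEON implementation. The rounded average and the (ref[i] + 3*dc + 2) >> 2 blend come from the commit message; the corner sample formula is taken from the HEVC specification and is not mentioned in the message. The function name and the fixed block size are hypothetical.

```c
#include <stdint.h>

enum { N = 4, LOG2_N = 2 };

static void dc_pred_luma(uint8_t dst[N][N], const uint8_t top[N],
                         const uint8_t left[N])
{
    /* Rounded average of the 2*N reference samples. */
    int sum = N;
    for (int i = 0; i < N; i++)
        sum += top[i] + left[i];
    int dc = sum >> (LOG2_N + 1);

    /* Fill the whole block with the DC value. */
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            dst[y][x] = (uint8_t)dc;

    /* Edge smoothing: the corner blends both neighbors,
     * the rest of the first row and column blend one reference each. */
    dst[0][0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
    for (int i = 1; i < N; i++) {
        dst[0][i] = (uint8_t)((top[i]  + 3 * dc + 2) >> 2);
        dst[i][0] = (uint8_t)((left[i] + 3 * dc + 2) >> 2);
    }
}
```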
nyanmisaka
87b7e578ec avcodec/amfenc: add encoder average QP stats
This allows for real-time monitoring of the encoder's average QP in ffmpeg CLI.

Signed-off-by: nyanmisaka <nst799610810@gmail.com>
2026-03-30 13:23:56 +00:00
Andreas Rheinhardt
3a1e63e007 avcodec/x86/vvc/alf: Avoid zeroing unnecessarily
In case of >8bpp, there is already a zero register available
(for clipping); in case of Unix64, one can simply use an
unused register. Doing so reduces codesize.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
8901f858eb avcodec/x86/vvc/alf: Hoist creating shift register out of loop
Possible now that this function no longer uses unnecessarily many
registers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
a3dfc511a5 avcodec/x86/vvc/alf: Don't push+pop unused register
This function only uses 14 GPRs.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
5de2c4c89e avcodec/x86/vvc/alf: Avoid reload
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
d727b7a64e avcodec/x86/vvc/alf: Avoid modifying nonvolatile registers
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
b1d6f31d65 avcodec/x86/vvc/alf: Use correct shift amount
Fixes a bug in 94f9ad8061.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
2cce9a8279 avcodec/x86/vvc/alf: Avoid modifying nonvolatile registers
Avoids push+pop on Win64; in any case, using registers m0-m7
more often saves codesize.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
cb1ffc58ca avcodec/x86/vvc/of: Don't use ymm regs where xmm are sufficient
Also use a register in the 0-7 range as clobber reg,
as this reduces codesize (by 51B).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
1785542a80 avcodec/x86/vvc/of: Don't add to zero
Instead rewrite the code to use assignment. Saves zeroing and
additions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
06fa26d2e8 avcodec/x86/vvc/of: Deduplicate common code
The height 8 and 16 cases differ from the second BDOF mini block onwards,
but even the beginning of said mini block is the same and can therefore
be deduplicated. This saves 821B here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
002b3bc1b3 avcodec/x86/vvc/of: Avoid punpckldq
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
ada58bd0e2 avcodec/x86/vvc/of: Use xmm registers where sufficient
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
ad34eb2ae6 avcodec/x86/vvc/of: Correct comment
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
Andreas Rheinhardt
2570f5d307 avcodec/x86/vvc/of: Avoid scalar log2
Instead convert the integers to floats and inspect the exponent.

Old benchmarks:
apply_bdof_8_8x16_c:                                  3295.2 ( 1.00x)
apply_bdof_8_8x16_avx2:                                312.7 (10.54x)
apply_bdof_8_16x8_c:                                  3269.1 ( 1.00x)
apply_bdof_8_16x8_avx2:                                203.6 (16.05x)
apply_bdof_8_16x16_c:                                 6584.8 ( 1.00x)
apply_bdof_8_16x16_avx2:                               413.6 (15.92x)
apply_bdof_10_8x16_c:                                 3313.9 ( 1.00x)
apply_bdof_10_8x16_avx2:                               321.5 (10.31x)
apply_bdof_10_16x8_c:                                 3306.5 ( 1.00x)
apply_bdof_10_16x8_avx2:                               200.4 (16.50x)
apply_bdof_10_16x16_c:                                6659.7 ( 1.00x)
apply_bdof_10_16x16_avx2:                              402.4 (16.55x)
apply_bdof_12_8x16_c:                                 3305.7 ( 1.00x)
apply_bdof_12_8x16_avx2:                               321.8 (10.27x)
apply_bdof_12_16x8_c:                                 3258.1 ( 1.00x)
apply_bdof_12_16x8_avx2:                               198.6 (16.41x)
apply_bdof_12_16x16_c:                                6600.2 ( 1.00x)
apply_bdof_12_16x16_avx2:                              392.6 (16.81x)

New benchmarks:
apply_bdof_8_8x16_c:                                  3269.9 ( 1.00x)
apply_bdof_8_8x16_avx2:                                266.5 (12.27x)
apply_bdof_8_16x8_c:                                  3252.9 ( 1.00x)
apply_bdof_8_16x8_avx2:                                182.6 (17.81x)
apply_bdof_8_16x16_c:                                 6596.7 ( 1.00x)
apply_bdof_8_16x16_avx2:                               362.7 (18.19x)
apply_bdof_10_8x16_c:                                 3351.3 ( 1.00x)
apply_bdof_10_8x16_avx2:                               269.0 (12.46x)
apply_bdof_10_16x8_c:                                 3329.1 ( 1.00x)
apply_bdof_10_16x8_avx2:                               174.5 (19.08x)
apply_bdof_10_16x16_c:                                6654.3 ( 1.00x)
apply_bdof_10_16x16_avx2:                              357.8 (18.60x)
apply_bdof_12_8x16_c:                                 3274.1 ( 1.00x)
apply_bdof_12_8x16_avx2:                               276.0 (11.86x)
apply_bdof_12_16x8_c:                                 3263.5 ( 1.00x)
apply_bdof_12_16x8_avx2:                               176.8 (18.46x)
apply_bdof_12_16x16_c:                                6576.4 ( 1.00x)
apply_bdof_12_16x16_avx2:                              357.8 (18.38x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
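
The idea in 2570f5d307 of reading log2 from a float exponent can be sketched in scalar C. This is an illustration under stated assumptions, not the AVX2 code: it assumes float is IEEE-754 binary32 and that the input is in the range [1, 2^24), where the integer-to-float conversion is exact. Larger inputs may round up to the next power of two during conversion and yield a result that is one too high. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* floor(log2(v)) read from the biased exponent of (float)v.
 * Valid for 1 <= v < 2^24 under the assumptions above. */
static int log2_via_float(uint32_t v)
{
    float f = (float)v;
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits)); /* well-defined type punning */
    return (int)(bits >> 23) - 127;  /* sign bit is 0 for positive v */
}

/* Reference: count how many times v can be halved. */
static int log2_loop(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}
```

In a SIMD setting the same steps map onto a packed conversion, a shift, and a subtraction, which avoids handling each lane with a scalar instruction.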
Andreas Rheinhardt
03b83f8feb avcodec/x86/vvc/of: Remove redundant instructions
m8 here (corresponding to a mix of sgx2 and sgy2 in derive_bdof_vx_vy
in the C version) is always nonnegative, so the psignd boils down to
a check for m8 being zero. But if an entry of m8 is zero, then
the corresponding entry of m9 is automatically zero, too, as sgx2
being zero implies sgxdi being zero and sgy2 being zero implies sgxgy
and sgydi being zero.* So just remove these redundant instructions.

*: In other words, one could remove the sgx2,sgy2>0 checks from
the end of derive_bdof_vx_vy() as long as av_log2(0) is defined.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-30 13:51:53 +02:00
James Almer
ad7d270935 avcodec/libdav1d: call ff_attach_decode_data() on output frames
This will allow the injection of LCEVC side data.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
James Almer
823c6fc0b8 avcodec/decode: make LCEVC injection available to decoders that don't call ff_get_buffer()
Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
James Almer
8528c697c7 avcodec/av1dec: add support for LCEVC ITU-T35 payloads
Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
James Almer
4c7a8df34d avcodec/av1dec: refactor parsing ITU-T35 metadata
Use a switch case. Will be useful in the following commit.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
James Almer
29d8c2af4d avcodec/libdav1d: add support for LCEVC ITU-T35 payloads
Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
James Almer
fe1ffd63fb avcodec/libdav1d: refactor parsing ITU-T35 metadata
Use a switch case. Will be useful in the following commit.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-03-28 22:07:54 -03:00
Andreas Rheinhardt
1a7979a2f8 avcodec/x86/h26x/h2656_inter: Simplify splatting coefficients
For pre-AVX2, vpbroadcastw is emulated via a load, followed
by two shuffles. Yet given that one always wants to splat
multiple pairs of coefficients which are adjacent in memory,
one can do better than that: Load all of them at once, perform
a punpcklwd with itself and use one pshufd per register.
In case one has to sign-extend the coefficients, too,
one can replace the punpcklwd with one pmovsxbw (instead of one
per register) and use pshufd directly afterwards.

This saved 4816B of .text here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-29 01:05:23 +01:00
Andreas Rheinhardt
a72b00675c avcodec/x86/h26x/h2656_inter: Don't prepare unused coeffs for hv funcs
8 tap motion compensation functions with both vertical and horizontal
components are under severe register pressure, so that the filter
coefficients have to be put on the stack. Before this commit,
this meant that coefficients for use with pmaddubsw and pmaddwd
were always created. Yet this is completely unnecessary, as
every such register is only used for exactly one purpose and
it is known at compile time which one it is (only 8bit horizontal
filters are used with pmaddubsw), so only prepare that one.
This also allows halving the amount of stack used.

This saves 2432B of .text here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-29 01:05:23 +01:00
Andreas Rheinhardt
88870f33ab avcodec/x86/h26x/h2656_inter: Remove always-true checks
It has already been checked before that we are only dealing
with high bitdepth here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-29 01:05:23 +01:00
Andreas Rheinhardt
c00721310f avcodec/x86/hevc/deblock: Avoid vmovdqa
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-29 01:05:23 +01:00
Andreas Rheinhardt
4c179adeaf avcodec/Makefile: Add avformat->h2645_parse.o lcevctab.o dependencies
Fixes static --disable-everything builds.
Forgotten in 053822d9ce
and 49c449b33a.

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-28 23:25:31 +01:00
Andreas Rheinhardt
e91727e7ef avcodec/x86/mpeg4videodsp: Fix build failure without x86asm
Since ba793127c4,
the x86 mpeg4videodsp code uses ff_emulated_edge_mc_sse2()
instead of ff_emulated_edge_mc_8. This leads to linker errors
when x86asm is disabled. Fix this by also falling back to ff_gmc_c()
in case edge emulation is needed with external SSE2 being unavailable.

An alternative is to go back to ff_emulated_edge_mc_8(), but this
would re-add the ugliness to videodsp for a niche case.

Reported-by: James Almer <jamrial@gmail.com>
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-03-28 22:39:05 +01:00