53414 Commits

Author SHA1 Message Date
Andreas Rheinhardt
dbdf514c17 avcodec/x86/h264_deblock_10bit: Remove custom stack allocation code
Allocate it via cglobal as usual. This makes the SSE2/AVX functions
available when HAVE_ALIGNED_STACK is false; it also avoids
modifying rsp unnecessarily in the deblock_h_luma_intra_10 functions
on Win64.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
b1140d3c98 avcodec/x86/h264_deblock: Remove obsolete macro parameters
They are a remnant of the MMX functions (which processed
only eight pixels at a time, so that it was called twice
via a wrapper; the actual MMX function had "v8" in its name
instead of simply v) which have been removed in commit
4618f36a24.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
899475326b avcodec/x86/h264_deblock: Simplify splatting
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
a22149ab3d avcodec/x86/h264_deblock: Remove always-false branches
These functions are always called with alpha and beta > 0.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
982244818b avcodec/x86/h264_deblock: Remove unused macros
Forgotten in 4618f36a24.
Also remove a PASS8ROWS wrapper that seems to have been always
unused.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:53:21 +01:00
Andreas Rheinhardt
6e65d1c945 avcodec/motion_est: Fix left shifts of negative numbers
Fixes ticket #21486.
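
For context, left-shifting a negative signed integer is undefined behavior in C11. The code touched by the ticket is not quoted in this log, so the following is only a generic sketch of the usual remedy; the helper name `scale_mv` is hypothetical and not taken from motion_est.c.

```c
#include <assert.h>

/* Hypothetical helper (not from motion_est.c): scale a possibly negative
 * motion-vector component by 2^shift without a signed left shift. */
static int scale_mv(int mv, int shift)
{
    assert(shift >= 0 && shift < 16);
    /* "mv << shift" would be undefined for mv < 0; the multiplication is
     * well defined as long as the result fits in an int. */
    return mv * (1 << shift);
}
```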

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-25 22:46:39 +01:00
Jun Zhao
8966101fa6 lavc/hevc: add aarch64 neon for 12-bit dequant
Implement NEON optimization for HEVC dequant at 12-bit depth.

For 12-bit: shift = 15 - 12 - log2_size = 3 - log2_size. When shift
is negative, we use shl (shift left) instead of srshr.

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_12_c:                                   9.9 ( 1.00x)
hevc_dequant_4x4_12_neon:                                5.7 ( 1.74x)

hevc_dequant_8x8_12_c:                                   1.7 ( 1.00x)
hevc_dequant_8x8_12_neon:                                1.3 ( 1.30x)

hevc_dequant_16x16_12_c:                               131.1 ( 1.00x)
hevc_dequant_16x16_12_neon:                              7.9 (16.52x)

hevc_dequant_32x32_12_c:                                69.7 ( 1.00x)
hevc_dequant_32x32_12_neon:                             28.4 ( 2.46x)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00
Jun Zhao
ce89d974c8 lavc/hevc: add aarch64 neon for 10-bit dequant
Implement NEON optimization for HEVC dequant at 10-bit depth.

For 10-bit: shift = 15 - 10 - log2_size = 5 - log2_size

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_10_c:                                  16.6 ( 1.00x)
hevc_dequant_4x4_10_neon:                                7.4 ( 2.23x)

hevc_dequant_8x8_10_c:                                  39.7 ( 1.00x)
hevc_dequant_8x8_10_neon:                                7.5 ( 5.28x)

hevc_dequant_16x16_10_c:                               168.7 ( 1.00x)
hevc_dequant_16x16_10_neon:                             10.2 (16.56x)

hevc_dequant_32x32_10_c:                                 1.9 ( 1.00x)
hevc_dequant_32x32_10_neon:                              1.9 ( 1.01x)

Note: 32x32 shift=0 is identity transform (no-op), so NEON has no
advantage over C which is also optimized away by the compiler.

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00
Jun Zhao
24f296c7a1 lavc/hevc: optimize dequant for shift=0 case (identity transform)
The HEVC dequantization uses:
  shift = 15 - bit_depth - log2_size

When shift equals 0, the operation becomes an identity transform:
  - For shift > 0: output = (input + offset) >> shift
  - For shift < 0: output = input << (-shift)
  - For shift = 0: output = input << 0 = input (no change)

This occurs in the following cases:
  - 10-bit, 32x32 block: shift = 15 - 10 - 5 = 0
  - 12-bit, 8x8 block:   shift = 15 - 12 - 3 = 0

Previously, the code would still iterate through all coefficients
and perform redundant read-modify-write operations even when shift=0.

This patch adds an early return for shift=0, avoiding unnecessary
memory operations. checkasm benchmarks on Apple M4 show:
  - 10-bit 32x32: 69.1 -> 1.6 cycles (43x faster)
  - 12-bit 8x8:   30.9 -> 1.7 cycles (18x faster)
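
The three cases above can be modelled in scalar C roughly as follows. This is a sketch rather than the patched source: the function names are hypothetical, and the rounding offset `1 << (shift - 1)` is an assumption, since the message only says "offset".

```c
#include <assert.h>
#include <stdint.h>

/* shift = 15 - bit_depth - log2_size, as stated in the commit message. */
static int dequant_shift(int bit_depth, int log2_size)
{
    return 15 - bit_depth - log2_size;
}

/* Scalar model; coeffs holds (1 << log2_size)^2 values. */
static void dequant_sketch(int16_t *coeffs, int log2_size, int bit_depth)
{
    int shift = dequant_shift(bit_depth, log2_size);
    int count = 1 << (2 * log2_size);

    assert(log2_size >= 2 && log2_size <= 5);

    if (shift == 0)
        return;                         /* identity: no reads or writes */

    if (shift > 0) {
        int offset = 1 << (shift - 1);  /* assumed rounding offset */
        /* Right-shifting a negative value is implementation-defined in C;
         * common compilers perform an arithmetic shift. */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)((coeffs[i] + offset) >> shift);
    } else {
        /* Multiply instead of shifting so negative inputs stay defined. */
        for (int i = 0; i < count; i++)
            coeffs[i] = (int16_t)(coeffs[i] * (1 << -shift));
    }
}
```

With these definitions, `dequant_shift(10, 5)` and `dequant_shift(12, 3)` both evaluate to 0, which are the two identity cases listed above.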

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00
Jun Zhao
0886e50c6b lavc/hevc: add aarch64 neon for 8-bit dequant
Implement NEON optimization for HEVC dequant at 8-bit depth.

The NEON implementation uses srshr (Signed Rounding Shift Right) which
does both the add with offset and right shift in a single instruction.

Optimization details:
- 4x4 (16 coeffs): Single load-process-store sequence
- 8x8 (64 coeffs): Fully unrolled, no loop overhead
- 16x16 (256 coeffs): Pipelined load/compute/store to hide memory latency
- 32x32 (1024 coeffs): Pipelined with all available NEON registers

Performance benchmark on Apple M4:
./tests/checkasm/checkasm --test=hevc_dequant --bench
hevc_dequant_4x4_8_c:                                   11.3 ( 1.00x)
hevc_dequant_4x4_8_neon:                                 6.3 ( 1.78x)

hevc_dequant_8x8_8_c:                                   33.9 ( 1.00x)
hevc_dequant_8x8_8_neon:                                 6.6 ( 5.11x)

hevc_dequant_16x16_8_c:                                153.8 ( 1.00x)
hevc_dequant_16x16_8_neon:                               9.0 (17.02x)

hevc_dequant_32x32_8_c:                                 78.1 ( 1.00x)
hevc_dequant_32x32_8_neon:                              31.9 ( 2.45x)

Note on Performance Anomaly:
The observation that hevc_dequant_32x32_8_c is faster than 16x16 (78.1 vs 153.8)
is due to Clang auto-vectorizing only for sizes >= 32x32.
Compiler: Apple clang version 17.0.0 (clang-1700.6.3.2)

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-25 06:55:26 +00:00
Zhao Zhili
1e1dde8798 avcodec/libx265: map ffmpeg log level to x265 log level
Previously x265 encoder used its default log level regardless of
FFmpeg's log level setting. Note the log level can be overwritten
by x265-params.
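
The mapping itself is not shown in this log. One plausible shape, purely as an illustration, is a threshold function like the one below. The `SK_` constants are local copies that are believed to mirror the values in libavutil/log.h and x265.h, and the thresholds are an assumption, not the patch's actual choice.

```c
/* Local copies so the sketch compiles on its own (assumed to match the
 * public AV_LOG_* and X265_LOG_* values). */
enum { SK_AV_LOG_ERROR = 16, SK_AV_LOG_WARNING = 24, SK_AV_LOG_INFO = 32,
       SK_AV_LOG_DEBUG = 48, SK_AV_LOG_TRACE = 56 };
enum { SK_X265_LOG_NONE = -1, SK_X265_LOG_ERROR = 0, SK_X265_LOG_WARNING = 1,
       SK_X265_LOG_INFO = 2, SK_X265_LOG_DEBUG = 3, SK_X265_LOG_FULL = 4 };

/* Hypothetical mapping from an FFmpeg log level to an x265 log level. */
static int map_log_level(int av_level)
{
    if (av_level < SK_AV_LOG_ERROR)   return SK_X265_LOG_NONE;
    if (av_level < SK_AV_LOG_WARNING) return SK_X265_LOG_ERROR;
    if (av_level < SK_AV_LOG_INFO)    return SK_X265_LOG_WARNING;
    if (av_level < SK_AV_LOG_DEBUG)   return SK_X265_LOG_INFO;  /* info, verbose */
    if (av_level < SK_AV_LOG_TRACE)   return SK_X265_LOG_DEBUG;
    return SK_X265_LOG_FULL;
}
```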

Fix #21462

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-01-25 13:09:30 +08:00
Carl Eugen Hoyos
aab0c23cb8 lavc/j2kdec: Do not ignore colour association for packed formats
Fixes ticket #9468.

Signed-off-by: Carl Eugen Hoyos <ceffmpeg@gmail.com>
2026-01-24 20:25:05 +00:00
Christopher Degawa
a5d4c398b4 avcodec/libsvtav1: rename aq_mode for v4.0.0
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-23 23:07:18 -03:00
Lynne
8349565d52 aacsbr_template: fix SBR USAC coupling
This issue hid under the radar since the codebooks between coupling
modes very often result in identical bit counts regardless of the encoded
data, leading to no frame-level bitstream desyncs except in rare cases.

AAC Mps212 data is parsed immediately after the SBR data, where a loss
of sync in SBR will result in Mps212 being wildly different.
2026-01-23 14:40:52 +01:00
Ling, Edison
a93cb79da2 avcodec/d3d12va_encode: Bug fix and refactor for motion estimation precision initialization
Move motion estimation precision check from standalone
`d3d12va_encode_init_motion_estimation_precision()` function into each
codec's init_sequence_params() to reuse existing feature support
queries.

 - fixes AV1 using wrong support structure (SUPPORT instead of SUPPORT1)
 - eliminates duplicate setup code
 - removes redundant CheckFeatureSupport API call
 - no intended functional changes other than bug fix
2026-01-23 13:25:55 +00:00
James Almer
dd2976b9e1 avcodec/mlp: don't duplicate the AV_CRC_8_EBU table
Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-22 17:44:46 -03:00
Hyunjun Ko
b637624046 avcodec/vulkan_av1: fix mi_col_starts and mi_row_starts units
The spec says:
   pMiColStarts is a pointer to an array of TileCols number
   of unsigned integers that corresponds to MiColStarts
   defined in section 6.8.14 of the [AV1 Specification]

The unit of MiColStarts is MI (mode info).

The same applies to pMiRowStarts.
2026-01-21 10:42:02 +00:00
Ramiro Polla
96d8e19720 avcodec/mjpegdec: fix segfault on extern_huff and no extradata
Regression since 1debadd58e.
2026-01-21 03:26:02 +00:00
Werner Robitza
d25d133df3 avcodec/libx265: add pass and x265-stats option
Add support for standard -pass and -passlogfile options, matching the behavior
of libx264.
Add the -x265-stats option to specify the stats filename.
Update documentation.

Signed-off-by: Werner Robitza <werner.robitza@gmail.com>
2026-01-20 10:10:26 +00:00
Manuel Lauss
d244d438c3 avcodec/sanm: fix BL16 c1/7 source overread
Fix the required size calculation.

Reported-by: Ruikai Peng <ruikai@pwno.io>
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
2026-01-20 09:47:47 +00:00
Andreas Rheinhardt
94b7385592 avcodec/mlpenc: Mark unreachable cases as such
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-20 00:38:35 +00:00
Ramiro Polla
e960f0aa01 avcodec/mjpegdec: remove qscale_table field from MJpegDecodeContext
This field has been unused since 759001c534.
2026-01-19 22:42:09 +00:00
Marton Balint
387a522106 avcodec/libvpxdec: use codec capabilities to determine if external frame buffer can be used
Previously we used the codec or, at decode time, fb_priv for this, but
fb_priv can be nonzero even if an external frame buffer is not set, so it's
cleaner to use the capability flag directly.

Also check the result of vpx_codec_set_frame_buffer_functions.

Signed-off-by: Marton Balint <cus@passwd.hu>
2026-01-19 21:32:00 +00:00
Marton Balint
a2688827f4 avcodec/libvpxdec: cache the decoder interface
This saves us some #ifdefry.

Signed-off-by: Marton Balint <cus@passwd.hu>
2026-01-19 21:32:00 +00:00
Marton Balint
a6069092af avcodec/libvpxenc: log the error message from the correct encoder
It is possible that the error happens with the alpha encoder, not the normal
one, so let's always pass the affected encoder to the logging function.

Signed-off-by: Marton Balint <cus@passwd.hu>
2026-01-19 21:32:00 +00:00
Michael Niedermayer
fc8a614f3d avcodec/omx: Check extradata size and nFilledLen
No testcase; it's unknown if this is a real issue.

Reported-by: Peter Teoh <htmldeveloper@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-01-19 20:47:22 +00:00
Michael Niedermayer
09ec2b397a avcodec/exr: use av_realloc_array()
Related to: #YWH-PGM40646-33
See: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21347
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2026-01-19 20:41:04 +00:00
Lynne
3ccafa5906 ffv1_vulkan: generate a CRC table during runtime
Since the recent CRC changes, get_table returns arch-dependent tables.
2026-01-19 16:37:17 +01:00
Lynne
e3a96a69cb vulkan_dpx: remove host image upload path
The main reason this was written was due to Nvidia. Nvidia always
has a fickle upload path, and seemed to have a shortcut for the
host image upload path. This seems to have been patched out of
recent driver versions.

This upload path relies on the driver keeping the same layout,
down to the stride for the images, which is an assumption that's
not portable.

Rather than relying on this fickle upload path, what we'd like when
we want pure bandwidth is to decouple uploads to a separate queue,
and let the GPU pull the data from RAM via uploads.

It'll be slower with a single-threaded decoder, but currently all
of our compute-based decoders and the decoders that sit underneath
them support frame threading.
2026-01-19 16:37:17 +01:00
Lynne
713e3c4f91 vulkan_decode: do not align single-plane images to subsampling
Unlike multiplane images, single-plane images do not need to be
aligned to chroma width.
Saves a bit of memory.
2026-01-19 16:37:16 +01:00
Lynne
8dcf02ac63 vulkan: remove IS_WITHIN macro
This is the more correct GLSL solution.
2026-01-19 16:37:15 +01:00
Araz Iusubov
850436a517 avcodec/amfenc: fix async_depth deadlock with lookahead
AMF encoders may deadlock when lookahead > async_depth.
Automatically adjust async_depth to lookahead + 1 to prevent hangs.
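
The adjustment described above amounts to a small clamp. A minimal sketch, assuming plain integer options; the helper name `adjust_async_depth` is hypothetical and not from amfenc:

```c
/* Hypothetical helper: return the async_depth to use for a given lookahead
 * depth, following the rule stated in the commit message. */
static int adjust_async_depth(int async_depth, int lookahead)
{
    if (lookahead > async_depth)
        return lookahead + 1;   /* leave room for one frame beyond lookahead */
    return async_depth;         /* already large enough, keep the user value */
}
```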
2026-01-19 15:36:37 +00:00
Gyan Doshi
43dbc011fa avcodec/bsf/setts: add option prescale
When prescale is enabled, time fields are converted to the output
timebase before expression evaluation. This allows option specification
even if the input timebase is unknown.
2026-01-19 16:51:47 +05:30
Gyan Doshi
1ccd2f6243 avcodec/bsf/setts: rescale TS when changing TB
The setts bsf has an option to change TB. However the filter only
changed the TB and did not rescale the ts and duration, so it
effectively and silently stretched or squeezed the stream.

The pts, dts and duration are now rescaled to maintain temporal fidelity.
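
To illustrate why rescaling is needed: a timestamp only keeps its meaning in seconds if it is multiplied by old_tb / new_tb when the timebase changes. The sketch below uses a hypothetical `Rational` stand-in and a simplified helper. libavutil provides av_rescale_q() for this purpose, but whether the patch uses it is not stated in this log.

```c
#include <stdint.h>

typedef struct { int num, den; } Rational;   /* stand-in for AVRational */

/* Convert ts from timebase 'from' to timebase 'to', rounding to nearest.
 * Simplified: non-negative ts only and no overflow protection. */
static int64_t rescale_ts(int64_t ts, Rational from, Rational to)
{
    int64_t num = (int64_t)from.num * to.den;
    int64_t den = (int64_t)from.den * to.num;
    return (ts * num + den / 2) / den;
}
```

For example, one second expressed as 90000 ticks of 1/90000 becomes 1000 ticks of 1/1000. Changing only the timebase would instead reinterpret 90000 as 90 seconds.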
2026-01-19 16:51:31 +05:30
Zhao Zhili
8f9700bff0 avcodec/d3d12va_encode_h264: simplify deblock option to bool type
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2026-01-19 09:14:06 +00:00
Andreas Rheinhardt
063684efec avcodec/mlp: Don't use internals of CRC API
ff_mlp_restart_checksum() used the (undocumented) layout
of the CRC tables and therefore broke on x86 when the
clmul implementation added in dc03cffe9c
is used. This commit fixes this (and issue #21506).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-01-19 00:30:20 +01:00
Timo Rothenpieler
7379539685 avcodec/prores_raw: use av_popcount instead of limited lookup table
The calculation can yield bigger values than the lookup table provides.
So just use actual popcount instead.
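
The failure mode can be sketched as follows. The 16-entry table is a hypothetical example of a "limited" lookup table, not the one removed from prores_raw, and `popcount32` is a portable stand-in for av_popcount().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limited table: only valid for inputs 0..15. */
static const uint8_t popcount_lut4[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

static int popcount_small(unsigned x)
{
    assert(x < 16);             /* larger inputs would index out of bounds */
    return popcount_lut4[x];
}

/* Works for any 32-bit value: clear the lowest set bit until none remain. */
static int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```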

Fixes #21503
2026-01-18 18:52:55 +01:00
James Almer
685ceebd42 avcodec/vc1dec: fix memory leak on error
Regression since 8a1c2779a0.

Fixes CID 732271.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-17 17:56:06 -03:00
averne
3829f4ba6a vulkan/prores: reduce push constants size
The Vulkan spec only mandates 128 bytes, and some platforms
don't actually implement more. This moves the quantization
matrices to the per-frame buffer.
2026-01-17 17:33:31 +00:00
James Almer
f8e39f6c73 avcodec/hevc/ps: add missing check for profile tier level count
Fixes issue #21488.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-17 12:37:47 -03:00
James Almer
f311969c03 avcodec/qdm2: propagate error values in the entire decoder
And add missing error value checks.

Fixes the rest of issue #21476.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-17 12:03:51 -03:00
James Almer
1ffcd07400 avcodec/mimic: check return value of init_get_bits()
Fixes part of issue #21476.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-17 12:02:31 -03:00
James Almer
bb29b51876 avcodec/vc1dec: don't overwrite error codes returned by init_get_bits8()
Done by mistake in 8a1c2779a0.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-17 11:01:21 -03:00
Ling, Edison
c3d3377fe1 avcodec/d3d12va_encode: Add H264 deblock filter parameter support
Add a `deblock` parameter that lets users explicitly enable or disable the
deblocking filter in D3D12 H.264 encoding.

usage:
-deblock enable or -deblock 1
-deblock disable or -deblock 0
-deblock auto or -deblock -1

sample command line:
```
.\ffmpeg.exe -hwaccel d3d12va -hwaccel_output_format d3d12 -i input.mp4 -c:v h264_d3d12va -deblock enable -y output.mp4
```
2026-01-16 07:03:37 +00:00
Ruikai Peng
be82aef7cc lavc/aacdec_usac: fix CPE channel index in ff_aac_usac_reset_state()
Fix a simple index bug in ff_aac_usac_reset_state()
that writes past the end of ChannelElement.ch[2] for CPE.

ff_aac_usac_reset_state() loops over channels with j < ch, but
incorrectly takes &che->ch[ch]. For CPE (ch == 2) this becomes
che->ch[2], which is one past the end of ChannelElement.ch[2], and the
subsequent memset() causes an intra-object out-of-bounds write.

index the channel element with the loop variable (j).
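
A reduced model of the bug and the fix, using hypothetical stand-in types rather than the real ChannelElement definition:

```c
#include <string.h>

#define MAX_CH 2

typedef struct { int state[4]; } Channel;        /* hypothetical stand-in */
typedef struct { Channel ch[MAX_CH]; } Element;  /* hypothetical stand-in */

/* Reset the first nb_ch channels of an element. */
static void reset_channels(Element *e, int nb_ch)
{
    for (int j = 0; j < nb_ch; j++) {
        /* The buggy form took &e->ch[nb_ch], which is one past the end of
         * the array when nb_ch == 2; indexing with j stays in bounds. */
        Channel *c = &e->ch[j];
        memset(c, 0, sizeof(*c));
    }
}
```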
2026-01-15 19:32:52 +00:00
James Almer
8a1c2779a0 avcodec/vc1dec: check return values of all init_get_bits() calls
And replace them with init_get_bits8, to prevent integer overflows on huge
values.
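
The overflow in question comes from converting a byte count to a bit count: init_get_bits() takes its size in bits, so callers multiply by 8, while init_get_bits8() takes bytes and performs the range check itself. The check can be sketched as below; the helper name is hypothetical and this is not the FFmpeg implementation.

```c
#include <limits.h>

/* Hypothetical helper: convert a byte count to a bit count, or return -1
 * when the multiplication by 8 would overflow int. */
static int bytes_to_bits_checked(int byte_size)
{
    if (byte_size < 0 || byte_size > INT_MAX / 8)
        return -1;
    return byte_size * 8;
}
```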

Fixes issue #21463.

Signed-off-by: James Almer <jamrial@gmail.com>
2026-01-15 16:07:46 -03:00
Jun Zhao
b326b3a08d lavc/av1_parser: Extract SAR from render_size
Extract the Sample Aspect Ratio (SAR) from render_width_minus_1 and
render_height_minus_1 in the sequence header.

The AV1 specification defines the render dimensions, which can be used
in conjunction with the coded dimensions to determine the pixel aspect
ratio. This ensures consistent aspect ratio handling for AV1 streams
encapsulated in containers like MP4 or MKV, as observed in the updated
FATE tests where SAR changes from 0/1 to 1/1.
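
The exact computation in the patch is not shown in this log. Assuming the render size describes the intended display shape, the relationship is SAR = (render_w / render_h) / (coded_w / coded_h), which can be sketched as below. All names are hypothetical.

```c
#include <stdint.h>

typedef struct { int64_t num, den; } Ratio;   /* hypothetical stand-in */

static int64_t gcd64(int64_t a, int64_t b)
{
    while (b) {
        int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* SAR = (render_w * coded_h) / (render_h * coded_w), reduced to lowest terms.
 * Assumes all four dimensions are positive. */
static Ratio sar_from_render(int coded_w, int coded_h, int render_w, int render_h)
{
    Ratio r = { (int64_t)render_w * coded_h, (int64_t)render_h * coded_w };
    int64_t g = gcd64(r.num, r.den);
    if (g) {
        r.num /= g;
        r.den /= g;
    }
    return r;
}
```

When the render size equals the coded size, this yields 1/1, which matches the 0/1 to 1/1 change mentioned for the FATE tests.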

Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
2026-01-14 23:56:39 +00:00
Lynne
e51c549f6e vulkan/dpx: drop using the nontemporal extension
It's rarely respected by implementations, it's fairly new (1 year old),
and it has a scuffed define (neither glslc nor glslang enable the
"GL_EXT_nontemporal_keyword" define if it's enabled, unlike all other extensions).
2026-01-14 16:13:22 +01:00
Lynne
f2a55af9a4 vulkan_dpx: switch to compile-time SPIR-V generation
2026-01-12 17:28:43 +01:00
Lynne
0f4667fc11 vulkan_prores_raw: clean up and optimize
2026-01-12 17:28:42 +01:00