This is a bit more forward-facing than a bare allocation, and importantly,
allows the `swscale/utils.c` code to remain agnostic about how to correctly
uninit this struct.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.
Fixes: #22295
See-Also: #22366
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Use PRIu32/PRIX32 format specifiers instead of %d/%u/%X for uint32_t
variables in av_log calls. On some platforms (e.g. NuttX), uint32_t is
typedef'd as unsigned long rather than unsigned int, which triggers
-Wformat warnings despite both types being 4 bytes. Using PRI macros
is the portable way to match the actual underlying type of uint32_t.
Signed-off-by: zengshuang <zengshuang@xiaomi.com>
Add NEON alpha drop/insert using ldp+tbl+stp instead of ld4/st3 and
ld3/st4 structure operations. Both use a 2-register sliding-window
tbl with post-indexed addressing. Instruction scheduling targets
narrow in-order cores (A55) while remaining neutral on wide OoO.
Scalar tails use coalesced loads/stores (ldr+strh+lsr+strb for alpha
drop, ldrh+ldrb+orr+str for alpha insert) to reduce per-pixel
instruction count. Independent instructions placed between loads and
dependent operations to fill load-use latency on in-order cores.
checkasm --bench on Apple M3 Max (decicycles, 1920px):
rgb32tobgr24_c: 114.4 ( 1.00x)
rgb32tobgr24_neon: 64.3 ( 1.78x)
rgb24tobgr32_c: 128.9 ( 1.00x)
rgb24tobgr32_neon: 80.9 ( 1.59x)
C baseline is clang auto-vectorized; speedup is over compiler NEON.
Signed-off-by: David Christle <dev@christle.is>
Add checkasm coverage for rgb32tobgr24 (alpha drop) and rgb24tobgr32
(alpha insert) with test widths exercising all code tiers and
overwrite detection via sentinel bytes.
Signed-off-by: David Christle <dev@christle.is>
Add a NEON rgb24tobgr24 using ld3/st3 to swap R and B channels in
packed 24bpp RGB buffers. Handles all input sizes with a 16-pixel
NEON fast path, 8-pixel NEON cleanup, and scalar tail.
checkasm --bench on Apple M3 Max (1920*3 = 5760 bytes):
rgb24tobgr24_c: 722.0 ( 1.00x)
rgb24tobgr24_neon: 94.9 ( 7.61x)
Signed-off-by: David Christle <dev@christle.is>
Add checkasm coverage for rgb24tobgr24 with test widths exercising
all code tiers (scalar, 8-pixel NEON, 16-pixel NEON).
Signed-off-by: David Christle <dev@christle.is>
This was calling atoi() on `p + offset`, which is nonsense (p may point to
the start of the cache-control string, which does not necessarilly coincide
with the location of the max-age value). Based on the code, the intent
was clearly to parse the value *after* the matched substring.
The retry path restores this offset, but the failure path does not. This
is especially important for the case of the continuation handler in
http_read_stream(), which may result in subsequent loop iterations (after
repeated failures to read additional data) seeking to the wrong offset.
This (arbitrarily) returns -1, which happens to be AVERROR(EPERM) on my
machine. Return the more descriptive AVERORR(EIO) instead.
Also add a log message to explain what's going on.
The code was evidently designed at one point in time to support "direct"
execution (not via a thread pool) for num_threads == 1, but this was never
implemented.
As a side benefit, reduces context creation overhead in single threaded
mode (relevant e.g. inside the libswscale self test), due to not needing to
spawn and destroy several thousand worker threads.
Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
Ensures samples where a missing Frame Header is handled by a subsequent
Redundant one are parsed correctly.
Signed-off-by: James Almer <jamrial@gmail.com>
(This also stops zero-allocating er_temp_buffer for H.264,
reverting back to the behavior from before commit
0a1dc817237b8546189960efd523257b4f62e1ab.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Given we rewrite these NALUs to remove the encoded data blocks to export as extradata,
we need to do the inverse to remove SC, GC and AI blocks to export as filtered data in
packes.
Signed-off-by: James Almer <jamrial@gmail.com>
write_lcevc_nalu() is meant only for IDR and NON_IDR NALUs. For everything else, just
copy it unchanged.
Signed-off-by: James Almer <jamrial@gmail.com>
Rewrite ff_hevc_put_hevc_qpel_h16_8_neon and h32 to use byte-domain
widening multiply (umull/umlal/umlsl via calc_qpelb/calc_qpelb2 macros)
instead of the previous int16-domain approach (uxtl + mul/mla).
The byte-domain approach eliminates the uxtl expansion step and halves
the ext stride (1 byte vs 2 bytes per tap), reducing per-row instruction
count from ~32 to ~23. The functions are also inlined, removing bl/ret
call overhead.
This benefits all HV-path callers (hv/uni_hv/bi_hv/uni_w_hv/bi_w_hv)
at widths 16/32/48/64.
checkasm benchmarks on Apple M4 (5-run average):
H-pass standalone (NEON):
h16: 34.0 -> 24.4 cycles (1.39x speedup)
h32: 132.0 -> 95.0 cycles (1.39x speedup)
h64: 521.8 -> 373.9 cycles (1.40x speedup)
HV compound paths geometric mean speedup (NEON, width >= 16):
qpel_hv: 1.144x (4 functions)
qpel_bi_hv: 1.158x (4 functions)
qpel_uni_hv: 1.188x (4 functions)
qpel_uni_w_hv: 1.158x (3 functions)
Overall: 1.162x (15 functions)
VVC qpel h16/h32 are separated into self-contained functions retaining
the int16-domain approach, as VVC filters have arbitrary coefficients
incompatible with the hardcoded sign pattern in calc_qpelb.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
The VVC qpel h16 and h32 functions had a redundant 'mov mx, x30'
instruction. The first one was placed before vvc_load_filter had
finished using mx (the filter pointer argument), making it a dead
store immediately overwritten by the second 'mov mx, x30'.
Remove the first instance and reorder so that 'sub src, src, #3'
comes before 'mov mx, x30', ensuring the filter pointer in mx is
fully consumed by vvc_load_filter before being overwritten with the
link register.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
AFter this, depth and the root element flag is stored in space
needed for alignment and thus does not increase the AVExpr size
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
we already check the depth of the parser but the AVExpr tree differs
Fixes: stack exhaustion
Fixes: YWH-PGM40646-39
Found-by: jpraveenrao
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -8659510451449931520 - 2205846422852077376 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 486358507/clusterfuzz-testcase-minimized-ffmpeg_dem_WEBM_DASH_MANIFEST_fuzzer-4896911086911488
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access
Fixes: 471509958/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_IMA_MAGIX_DEC_fuzzer-4847227777646592
We ask for a mono sample because the implementation for mono is incomplete
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
scratch[20] doesnt exist in version 0
Fixes: use of uninitialized memory
Fixes: 471664627/clusterfuzz-testcase-minimized-ffmpeg_dem_SEGAFILM_fuzzer-4738726971637760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Fixes: signed integer overflow: 2147483640 + 32 cannot be represented in type 'int'
Fixes: 473569764/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5377306970619904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long long'); cast to an unsigned type to negate this value to itself
Fixes: 473334102/clusterfuzz-testcase-minimized-ffmpeg_dem_MATROSKA_fuzzer-5109540931829760
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes non repeatable checksums
This also avoids allocating the mc only buffer when its not used
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If there is no escape case then reaching that branch is an error
Fixes: shift exponent 32 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
Fixes: 472335543/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DST_fuzzer-6682453243920384
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: runtime error: shift exponent -1 is negative
Fixes: runtime error: shift exponent 32 is too large for 32-bit type 'int'
Fixes: 471846062/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_JPEG2000_DEC_fuzzer-5835290976780288
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
resample_linear can produce overflows with craftet input,
The added casts should have no effect on the binary output or the operations they
just change things to a defined regime
Fixes: signed integer overflow: 2069416960 + 78151680 cannot be represented in type 'int'
Fixes: 472047214/clusterfuzz-testcase-minimized-ffmpeg_SWR_fuzzer-6374046976770048
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036854775807 + 3546086691638400 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 471723681/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-4841032488648704
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 1077919680 + 1077936128 cannot be represented in type 'int'
Fixes: 471686763/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ADPCM_N64_DEC_fuzzer-6493712281829376
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>