This muxer seems to intend to support output that does
not begin at offset zero (instead of e.g. just hardcoding
nb_frames_pos to 16). But then it is possible
that avio_seek() returns values > INT_MAX even
though the part of the file written by us cannot
exceed this value. So the return value of avio_seek()
needs to be checked as a 64-bit integer and not silently
truncated to int.
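The failure mode can be sketched stand-alone (illustration only, not the actual muxer code): an int64_t offset larger than INT_MAX cannot survive a round-trip through int, so storing the seek result in an int corrupts the position.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* avio_seek() returns int64_t. If the result is stored in an int,
 * offsets beyond INT_MAX are silently corrupted. This helper detects
 * exactly that round-trip loss. */
static int offset_truncates(int64_t pos)
{
    return pos != (int64_t)(int)pos;
}
```

For example, offset_truncates((int64_t)INT_MAX + 1) is nonzero, while a small offset such as 16 round-trips safely.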
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The number of streams is always one (namely one video stream
with codec ID AV_CODEC_ID_PDV) due to the MAX_ONE_OF_EACH
and ONLY_DEFAULT_CODECS flags. Also, the generic code (init_muxer()
in mux.c) checks that video streams have proper dimensions set.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This adds a NEON-optimized function for computing 32x32 Sum of Absolute
Differences (SAD) on AArch64, addressing a gap where x86 had SSE2/AVX2
implementations but AArch64 lacked equivalent coverage.
The implementation mirrors the existing sad8 and sad16 NEON functions,
employing a 4-row unrolled loop with UABAL and UABAL2 instructions for
efficient load-compute interleaving, and four 8x16-bit accumulators to
handle the wider 32-byte rows.
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:
sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)
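For reference, the scalar computation the NEON kernel accelerates can be sketched as follows (function name and signature are illustrative, not the actual dsp prototype):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar 32x32 SAD: sum of absolute byte differences over a 32x32
 * block. The NEON version computes the same sum with UABAL/UABAL2 and
 * four 16-bit accumulators instead of this per-pixel loop. */
static unsigned sad_32x32_ref(const uint8_t *a, ptrdiff_t a_stride,
                              const uint8_t *b, ptrdiff_t b_stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            sum += abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sum;
}

/* Self-check: identical blocks give 0; flipping each byte's LSB gives
 * a per-pixel difference of exactly 1, i.e. a total of 32*32 = 1024. */
static int sad_ref_selftest(void)
{
    uint8_t a[32 * 32], b[32 * 32];
    for (int i = 0; i < 32 * 32; i++) {
        a[i] = (uint8_t)i;
        b[i] = a[i] ^ 1;
    }
    return sad_32x32_ref(a, 32, a, 32) == 0 &&
           sad_32x32_ref(a, 32, b, 32) == 32 * 32;
}
```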
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
This incorrectly lists the libavcodec major version as 60 instead of
62. Also fix the date and commit hash while at it.
Fixes: 7faa6ee2aa ("libavformat/matroska: Support smpte 2094-50 metadata")
Signed-off-by: llyyr <llyyr.public@gmail.com>
This change should improve performance on Skylake and later
Intel CPUs (which have only half the ports for saturated adds/subs
for mmx registers compared to xmm registers): llvm-mca predicts
a 25% performance improvement on Skylake.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Due to a discrepancy between SSE2 and the C version, coefficients
for idct_put and idct_add are restricted to a range not causing
overflows.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows using pavgb to reduce the number of instructions used
to calculate the average; processing two rows via movhps reduces
the number of pxor and pavgb instructions even further and turned
out to be beneficial.
This patch also avoids a load as the constant used here can be easily
generated at runtime.
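A scalar model of the trick (sketch only, not the actual asm): pavgb computes the rounding average (a + b + 1) >> 1, and complementing both inputs and the result turns it into the no-rounding average that put_no_rnd_pixels needs. The only constant required is all-ones, which SIMD can generate in a register (e.g. via pcmpeqb) instead of loading it from memory.

```c
#include <assert.h>
#include <stdint.h>

/* What pavgb does per byte: rounding average. */
static uint8_t avg_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* No-rounding average via complementing inputs and output:
 * ~((~a + ~b + 1) >> 1) == (a + b) >> 1 for 8-bit values. */
static uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)~avg_rnd((uint8_t)~a, (uint8_t)~b);
}

/* Exhaustive check of the identity over all byte pairs. */
static int avg_identity_holds(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (avg_no_rnd((uint8_t)a, (uint8_t)b) != ((a + b) >> 1))
                return 0;
    return 1;
}
```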
Old benchmarks:
put_no_rnd_pixels_l2_c: 13.3 ( 1.00x)
put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x)
New benchmarks:
put_no_rnd_pixels_l2_c: 13.4 ( 1.00x)
put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This allows avoiding the stack for the 8-bit simple IDCT;
for the other IDCTs, it avoids storing and restoring two
xmm registers on Win64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This function is exported, so it has to abide by the ABI
and has therefore issued emms since commit
5b85ca5317. Yet this is
expensive, and using SSE2 instead improves performance.
Also avoid the initial zeroing and the last pointer
increment while at it.
This removes the last usage of mmx from libavutil*.
Old benchmarks:
sad_8x8_0_c: 13.2 ( 1.00x)
sad_8x8_0_mmxext: 27.8 ( 0.48x)
sad_8x8_1_c: 13.2 ( 1.00x)
sad_8x8_1_mmxext: 27.6 ( 0.48x)
sad_8x8_2_c: 13.3 ( 1.00x)
sad_8x8_2_mmxext: 27.6 ( 0.48x)
New benchmarks:
sad_8x8_0_c: 13.3 ( 1.00x)
sad_8x8_0_sse2: 11.7 ( 1.13x)
sad_8x8_1_c: 13.8 ( 1.00x)
sad_8x8_1_sse2: 11.6 ( 1.20x)
sad_8x8_2_c: 13.2 ( 1.00x)
sad_8x8_2_sse2: 11.8 ( 1.12x)
Hint: Using two psadbw or one psadbw and movhps made no difference
in the benchmarks, so I chose the latter due to its smaller code size.
*: except if lavu provides avpriv_emms for other libraries
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We reject inputs that are significantly smaller than the smallest frame.
This check raises the minimum amount of input needed before time-consuming
computations are performed; it thus improves the computation per input
byte and reduces the potential DoS impact.
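The shape of such a gate can be sketched as follows (the threshold and names here are hypothetical placeholders, not the actual SVQ1 values):

```c
#include <assert.h>

/* Hypothetical early rejection: a packet smaller than the smallest
 * possible frame cannot decode successfully, so bail out before any
 * expensive parsing. MIN_FRAME_BYTES is an illustrative placeholder. */
#define MIN_FRAME_BYTES 64

static int packet_plausible(int pkt_size)
{
    return pkt_size >= MIN_FRAME_BYTES;
}
```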
Fixes: Timeout
Fixes: 472769364/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SVQ1_DEC_fuzzer-5519737145851904
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Otherwise the buffer for the HDR10+ BlockAdditional would
be clobbered if both are present (the buffers can only be
reused after the ebml_writer_write() call).
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
7faa6ee2aa added support
for writing AV_PKT_DATA_DYNAMIC_HDR_SMPTE_2094_APP5,
yet forgot to update the size of the EBML element buffer.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Precompute the SILK NLSF residual weights from the stage-1 codebooks
and use the table during LPC decode. This removes the mandated
per-coefficient fixed-point weight calculation from silk_decode_lpc()
while preserving the same decoded values.
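The structure of the change can be sketched generically (all names, sizes, and the weight formula below are illustrative stand-ins, not the real SILK code): derive the weight for every codebook entry once at init time, then index the table during decode.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ENTRIES 32  /* illustrative codebook size */
#define ORDER       16  /* illustrative LPC order */

static uint16_t weight_table[NUM_ENTRIES][ORDER];

/* Stand-in for the fixed-point weight derivation done per coefficient. */
static uint16_t compute_weight(uint8_t cb_val)
{
    return (uint16_t)(cb_val * cb_val + 1);
}

/* Run once: precompute every weight from the stage-1 codebook. */
static void init_weight_table(const uint8_t codebook[NUM_ENTRIES][ORDER])
{
    for (int i = 0; i < NUM_ENTRIES; i++)
        for (int k = 0; k < ORDER; k++)
            weight_table[i][k] = compute_weight(codebook[i][k]);
}

/* Decode-time lookup that replaces the per-coefficient computation. */
static uint16_t lookup_weight(int entry, int coeff)
{
    return weight_table[entry][coeff];
}

/* Self-check: the table reproduces the direct computation. */
static int table_selftest(void)
{
    uint8_t cb[NUM_ENTRIES][ORDER];
    for (int i = 0; i < NUM_ENTRIES; i++)
        for (int k = 0; k < ORDER; k++)
            cb[i][k] = (uint8_t)(i + k);
    init_weight_table(cb);
    return lookup_weight(3, 5) == compute_weight((uint8_t)(3 + 5));
}
```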
Instead of implicitly relying on SwsComps.unused, which contains the exact
same information. (cf. ff_sws_op_list_update_comps)
Signed-off-by: Niklas Haas <git@haasn.dev>
The implementation of AARCH64_SWS_OP_LINEAR loops over elements of this mask
to determine which *output* rows to compute. However, it is being set by this
loop to `op->comps.unused`, which is a mask of unused *input* rows. As such,
it should be looking at `next->comps.unused` instead.
This did not result in problems in practice, because none of the linear
matrices happened to trigger this case (more input columns than output rows).
Signed-off-by: Niklas Haas <git@haasn.dev>
Needed to allow us to phase out SwsComps.unused altogether.
It's worth pointing out the change in semantics; while unused tracks the
unused *input* components, the mask is defined as representing the
computed *output* components.
This is 90% the same, except for read/write, pack/unpack, and clear,
which are the only operations that can change the number of components.
Signed-off-by: Niklas Haas <git@haasn.dev>
Makes this logic a lot simpler and less brittle. We can trivially adjust
the list of required linear masks whenever it changes as a result of
future modifications.
Signed-off-by: Niklas Haas <git@haasn.dev>
Using the power of libswscale/tests/sws_ops -summarize lets us see which
kernels are actually needed by real op lists.
Note: I'm working on a separate series which will obsolete this
implementation whack-a-mole altogether, by generating a list of all
possible op kernels at compile time.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is far more commonly used without an offset than with one, so having
it there prevents these special cases from actually doing much good.
Signed-off-by: Niklas Haas <git@haasn.dev>
First vector is %2, not %3. This was never triggered before because none
of the existing masks hit this exact case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Since this now has an explicit mask, we can just check that directly
instead of relying on the unused comps hack/trick.
Additionally, this allows us to distinguish between fixed-value and
arbitrary-value clears by having the SwsOpEntry contain NAN values iff
they support any clear value.
Signed-off-by: Niklas Haas <git@haasn.dev>
This does come with a slight change in behavior, as we now don't print the
range information in the case that the range is only known for *unused*
components. However, in practice, that's already guaranteed by update_comps()
stripping the range info explicitly in this case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly excluding NAN values if ignore_den0 is set. This
gives callers more explicit control over which values to print, and in
doing so, makes sure "unintended" NaN values are properly printed as such.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
This accidentally unconditionally overwrote the entire clear mask, since
Q(n) always set the denominator to 1, resulting in all channels being
cleared instead of just the ones with nonzero denominators.
Signed-off-by: Niklas Haas <git@haasn.dev>
This currently fails completely for images smaller than 12x12, and even in
that case, the limited resolution makes these tests a bit useless.
At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.
Incidentally, 8x8 is the smallest size that passes the SSIM check.