Using the power of libswscale/tests/sws_ops -summarize lets us see which
kernels are actually needed by real op lists.
Note: I'm working on a separate series which will make this implementation
whack-a-mole game obsolete altogether, by generating a list of all possible
op kernels at compile time.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is far more commonly used without an offset than with one, so keeping
the offset in place prevents these special cases from actually doing much good.
Signed-off-by: Niklas Haas <git@haasn.dev>
First vector is %2, not %3. This was never triggered before because all of
the existing masks never hit this exact case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Since this now has an explicit mask, we can just check that directly, instead
of relying on the unused-components hack.
Additionally, this also allows us to distinguish between fixed-value and
arbitrary-value clears by just having the SwsOpEntry contain NaN values iff
they support any clear value.
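A minimal sketch of the idea (the struct and field names here are illustrative stand-ins, not the actual libswscale definitions): a NaN stored in the entry marks "any clear value accepted", which can be tested directly with isnan().

```c
#include <assert.h>
#include <math.h>

/* Illustrative stand-in for a kernel table entry; the real SwsOpEntry
 * in libswscale differs. A NaN in clear_value means "this kernel
 * accepts an arbitrary clear value"; any other number means the kernel
 * is specialized to exactly that constant. */
typedef struct DemoOpEntry {
    float clear_value;
} DemoOpEntry;

static int entry_matches_clear(const DemoOpEntry *e, float value)
{
    /* NaN marker: arbitrary clear values supported */
    if (isnan(e->clear_value))
        return 1;
    /* Fixed-value kernel: only an exact match works */
    return e->clear_value == value;
}
```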
Signed-off-by: Niklas Haas <git@haasn.dev>
This does come with a slight change in behavior, as we now don't print the
range information in the case that the range is only known for *unused*
components. However, in practice, that's already guaranteed by update_comps()
stripping the range info explicitly in this case.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly excluding NaN values if ignore_den0 is set. This
gives callers more explicit control over which values to print, and in
doing so, makes sure "unintended" NaN values are properly printed as such.
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
Signed-off-by: Niklas Haas <git@haasn.dev>
This accidentally unconditionally overwrote the entire clear mask, since
Q(n) always set the denominator to 1, resulting in all channels being
cleared instead of just the ones with nonzero denominators.
Signed-off-by: Niklas Haas <git@haasn.dev>
This currently fails completely for images smaller than 12x12; and even near
that size, the limited resolution makes these tests a bit useless.
At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.
Incidentally, 8x8 is the smallest size that passes the SSIM check.
Not only does this take into account extreme edge cases where the plane
padding can significantly exceed the actual width/stride, but it also
correctly takes into account the filter offsets when scaling, which the
previous code completely ignored.
Simpler, more robust, and more correct. Now valgrind passes for 100% of format
conversions for me, with and without scaling.
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a mostly straightforward internal mechanical change that I wanted
to isolate from the following commit to make bisection easier in the case of
regressions.
While the number of tail blocks could theoretically be different for input
vs output memcpy, the extra complexity of handling that mismatch (and
adjusting all of the tail offsets, strides etc.) seems not worth it.
I tested this commit by manually setting `p->tail_blocks` to higher values
and seeing if that still passed the self-check under valgrind.
Signed-off-by: Niklas Haas <git@haasn.dev>
The x86 kernel e.g. assumes that at least one block is processed; so avoid
calling this with an empty width. This is currently only possible if e.g.
operating on an unpadded, very small image whose total linesize is less than
a single block.
Signed-off-by: Niklas Haas <git@haasn.dev>
This code had two issues:
1. It was over-allocating bytes for the input offset map case, and
2. It was hard-coding the assumption that there is only a single tail block
We can fix both of these issues by rewriting the way the tail size is derived.
In the non-offset case, and assuming only 1 tail block:
aligned_w - safe_width
= num_blocks * block_size - (num_blocks - 1) * block_size
= block_size
Additionally, the FFMAX(tail_size_in/out) is unnecessary, because:
tail_size = pass->width - safe_width <= aligned_w - safe_width
In the input offset case, we instead realize that the input kernel already
never over-reads the input due to the filter size adjustment/clamping, so
the only thing we need to ensure is that we allocate extra bytes for the
input over-read.
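The identity in the derivation above can be checked numerically (variable names mirror the commit text; the concrete figures are made up):

```c
#include <assert.h>

/* Worked instance of the identity from the commit message, assuming a
 * single tail block:
 *     aligned_w - safe_width
 *   = num_blocks * block_size - (num_blocks - 1) * block_size
 *   = block_size
 */
static int tail_size(int width, int block_size)
{
    int num_blocks = (width + block_size - 1) / block_size; /* ceil */
    int aligned_w  = num_blocks * block_size;
    int safe_width = (num_blocks - 1) * block_size;
    return aligned_w - safe_width; /* always exactly block_size */
}
```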
Signed-off-by: Niklas Haas <git@haasn.dev>
The over_read/write fields are not documented as depending on the subsampling
factor. Actually, they are not documented as depending on the plane at all.
If and when we do actually add support for horizontal subsampling to this
code, it will most likely be by turning all of these key variables into
arrays, which will be an upgrade we get basically for free.
Signed-off-by: Niklas Haas <git@haasn.dev>
This makes it far less likely to accidentally add or remove a +7 bias when
repeating this often-used expression.
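Assuming the expression in question is the usual bits-to-bytes rounding (an illustration of the pattern, not the actual helper added by the commit), factoring it into a single inline function removes the chance of dropping the bias:

```c
#include <assert.h>

/* Illustrative helper: round a bit count up to whole bytes. Writing
 * (bits + 7) >> 3 inline at every call site makes it easy to forget
 * the +7 bias; a single named helper cannot get it wrong twice. */
static inline int bits_to_bytes(int bits)
{
    return (bits + 7) >> 3;
}
```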
Signed-off-by: Niklas Haas <git@haasn.dev>
This could trigger if e.g. a backend tries to operate on monow formats with
a block size that is not a multiple of 8. In this case, `block_size_in`
would previously be miscomputed (to e.g. 0), which is obviously wrong.
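A numeric illustration of how the miscomputation can happen (the formula below is a guess at the shape of the code, not a copy of it): with monow's 1 bit per pixel, integer division truncates the per-block byte count to zero for small block sizes.

```c
#include <assert.h>

/* Hypothetical shape of the computation: bytes consumed per block for
 * a format with `bpp` bits per pixel. For monow (bpp == 1) and a block
 * size that is not a multiple of 8, integer division truncates the
 * result, in the worst case all the way to zero. */
static int block_size_in(int block_size, int bpp)
{
    return block_size * bpp / 8;
}
```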
Signed-off-by: Niklas Haas <git@haasn.dev>
As well as weird edge cases like trying to filter `monow` and pixels landing
in the middle of a byte. Realistically, this will never happen - we'd instead
pre-process it into something byte-aligned, and then dispatch a byte-aligned
filter on it.
However, I need to add a check for overflow in any case, so we might as well
add the alignment check at the same time. It's basically free.
Signed-off-by: Niklas Haas <git@haasn.dev>
Prevents valgrind from complaining about operating on uninitialized bytes.
This should be cheap as it's only done once during setup().
Signed-off-by: Niklas Haas <git@haasn.dev>
This code made the input read conditional on the byte count, but not the
output write, leading to a lot of over-write for cases like 15, 5.
Signed-off-by: Niklas Haas <git@haasn.dev>
These align the filter size to a multiple of the internal tap grouping
(either 1/2/4 for vpgatherdd, or the XMM size for the 4x4 transposed kernel).
This may over-read past the natural end of the input buffer, if the aligned
size exceeds the true size.
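The amount of potential over-read is just the difference between the aligned and true filter sizes; a small sketch using the same power-of-two round-up that FFALIGN performs:

```c
#include <assert.h>

/* Round `size` up to a multiple of `align` (a power of two), as
 * FFALIGN does. */
static int align_up(int size, int align)
{
    return (size + align - 1) & ~(align - 1);
}

/* Number of taps past the natural end of the input that an aligned
 * kernel may touch when the aligned size exceeds the true size. */
static int over_read(int filter_size, int group)
{
    return align_up(filter_size, group) - filter_size;
}
```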
Signed-off-by: Niklas Haas <git@haasn.dev>
The V-Nova LCEVC pipeline processes frames on internal background
worker threads. LCEVC_ReceiveDecoderPicture returns LCEVC_Again (-1)
when the worker has not yet completed the frame, which is the
documented "not ready, try again" response. The original code treated
any non-zero return as a fatal error (AVERROR_EXTERNAL), causing decode
to abort mid-stream.
Poll until LCEVC_Success or a genuine error is returned.
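The fix boils down to treating LCEVC_Again as "retry" rather than "fail". A self-contained sketch of that control flow, with the SDK call replaced by a stub (the real LCEVC_ReceiveDecoderPicture takes decoder and picture handles; the names below are illustrative):

```c
#include <assert.h>

/* Return codes mirroring the LCEVC SDK values named in the commit. */
enum { DEMO_SUCCESS = 0, DEMO_AGAIN = -1, DEMO_ERROR = -2 };

/* Stub standing in for LCEVC_ReceiveDecoderPicture: reports "again"
 * a few times before the background worker finishes the frame. */
static int stub_receive(int *calls_left)
{
    return (*calls_left)-- > 0 ? DEMO_AGAIN : DEMO_SUCCESS;
}

/* Poll until success or a genuine error; only non-Again failures are
 * treated as fatal, which is the behavioral change of this commit. */
static int receive_picture(int *calls_left)
{
    int ret;
    do {
        ret = stub_receive(calls_left);
    } while (ret == DEMO_AGAIN);
    return ret == DEMO_SUCCESS ? 0 : DEMO_ERROR;
}
```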
Signed-off-by: Peter von Kaenel <Peter.vonKaenel@harmonicinc.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Avoids the post_process_opaque_free callback; the only user of
this is already a RefStruct reference and presumably other users
would want to use a pool for this, too, so they would use
RefStruct-objects, too.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When use_loop == true and idx < 0, we would incorrectly check
in_stride[idx], which is an out-of-bounds read. Reorder the conditions to
avoid that.
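The fix is the classic short-circuit reordering: validate the index before it is used as a subscript. A reduced model (the condition and array names are illustrative, not the actual code):

```c
#include <assert.h>

/* Reduced model of the bug: the guard on `idx` must come before the
 * subscript, so that in_stride[idx] is never evaluated for idx < 0
 * (C's && short-circuits left to right). */
static int should_loop(int use_loop, int idx, const int *in_stride)
{
    return use_loop && idx >= 0 && in_stride[idx] > 0;
}
```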
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
When the WAV muxer's `-rf64 auto` option is used, the output is intended
to be a normal WAV file if possible, only extended to RF64 format when
the file size grows too large. This was accomplished by reserving space
for the extra RF64-specific data using a standard JUNK chunk (ignored by
readers), then overwriting the reserved space later with a ds64 chunk if
needed.
In the original rf64 auto implementation, the JUNK chunk was placed
right after the RIFF/WAVE file header, before the fmt chunk; this is the
design suggested by the "Achieving compatibility between BWF and RF64"
section of the RF64 spec:
RIFF 'WAVE' <JUNK chunk> <fmt-ck> ...
However, this approach means that the fmt chunk is no longer in its
conventional location at the beginning of the file, and some WAV-reading
tools are confused by this layout. For example, the `file` tool is not
able to show the format information for a file with the extra JUNK chunk
before fmt.
This change shuffles the order of the chunks for `-rf64 auto` mode so
that the reserved space follows fmt instead of preceding it:
RIFF 'WAVE' <fmt-ck> <JUNK chunk> ...
With this small modification, tools expecting the fmt chunk to be the
first chunk in the file work with files produced by `-rf64 auto`.
This means the fmt chunk won't be in the location required by RF64, so
if the automatic RF64 conversion is triggered, the fmt chunk needs to be
relocated by rewriting it following the ds64 chunk during the conversion:
RF64 'WAVE' <ds64 chunk> <fmt-ck> ...
H.264 only uses these functions with height 2 or 4 and
the aarch64, arm and mips versions of them optimize based
on this. Yet this is not true when these functions are used
by the lowres code in mpegvideo_dec.c. So revert back to
the C versions of these functions for mpegvideo_dec so that
the H.264 decoder can still use fully optimized functions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Frame side data unfortunately lacks padding, which CBS needs, so we can't reuse
the existing AVBufferRef.
Signed-off-by: James Almer <jamrial@gmail.com>
These can randomly trigger the alpha/zero fast paths, resulting in spurious
tests or randomly diverging performance if the backend happens to implement
that particular fast path.
Signed-off-by: Niklas Haas <git@haasn.dev>
This was not actually testing the integer path. Additionally, for integer
scales, there is a special fast path for expansion from bits to full range,
which we should separate from the random-value test.
The overhead of the loop and memcpy call is less than the overhead of
possibly spilling into one extra unnecessary cache line. 64 is still a
good rule of thumb for L1 cache line size in 2026.
I leave it to future code archeologists to find and tweak this constant if
it ever becomes unnecessary.
Signed-off-by: Niklas Haas <git@haasn.dev>
Most of these filters don't test anything meaningfully different relative to
each other; the only filters with real special significance are POINT
(for now) and maybe BILINEAR down the line.
Apart from that, SINC, combined with the src size loop, already tests both
extreme cases (large and small filters), with large, oscillating unwindowed
weights.
The other filters are not adding anything of substance to this, while massively
slowing down the runtime of this test. We can, of course, change this if the
backends ever get more nuanced handling.
checkasm: all 855 tests passed (down from 1575)
Signed-off-by: Niklas Haas <git@haasn.dev>