Instead of implicitly testing for NaN values. This is mostly a straightforward
translation, but we need some slight extra boilerplate to ensure the mask
is correctly updated when e.g. commuting past a swizzle.
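A minimal sketch of the extra mask bookkeeping. It assumes a hypothetical 4-component swizzle in which output component i reads input component in[i], and a per-component bitmask where bit i stands for component i. The names Swizzle, mask_after_swizzle() and mask_before_swizzle() are invented for illustration and are not the actual swscale op structures:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical swizzle: output component i reads input component in[i]. */
typedef struct Swizzle {
    uint8_t in[4];
} Swizzle;

/* Translate a component mask expressed in pre-swizzle order into
 * post-swizzle order: output i is selected if its source component was. */
static unsigned mask_after_swizzle(unsigned mask_before, const Swizzle *swz)
{
    unsigned out = 0;
    for (int i = 0; i < 4; i++) {
        assert(swz->in[i] < 4);
        if (mask_before & (1u << swz->in[i]))
            out |= 1u << i;
    }
    return out;
}

/* Inverse direction, e.g. when an op is commuted in front of the swizzle:
 * each selected output component maps back to the input it was read from. */
static unsigned mask_before_swizzle(unsigned mask_after, const Swizzle *swz)
{
    unsigned out = 0;
    for (int i = 0; i < 4; i++) {
        assert(swz->in[i] < 4);
        if (mask_after & (1u << i))
            out |= 1u << swz->in[i];
    }
    return out;
}
```

Note that a broadcasting swizzle such as {0, 0, 0, 3} turns a single-component mask into a multi-component one, which is why the mask cannot simply be carried over unchanged.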
Signed-off-by: Niklas Haas <git@haasn.dev>
This currently fails completely for images smaller than 12x12, and even if it
did not, the limited resolution would make these tests of little use.
At the risk of triggering a lot of spurious SSIM regressions for very
small sizes (due to insufficiently modelling the effects of low resolution on
the expected noise), this patch allows us to at least *run* such tests.
Incidentally, 8x8 is the smallest size that passes the SSIM check.
It was a bit clunky, lacked semantic contextual information, and made it
harder to reason about the effects of extending this struct. There should be
zero runtime overhead, since this is already a big union.
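To illustrate why restructuring the members of a big union should cost nothing at runtime, here is a hypothetical C11 sketch in which a generic byte array inside a large union is replaced by a named struct. The OpOld/OpNew types and their members are invented for this example and do not reflect the real struct; the point is only that a union is as large as its largest member:

```c
#include <assert.h>
#include <stdint.h>

/* Largest member of the (hypothetical) op parameter union. */
typedef struct OpLinear { float m[4][5]; } OpLinear;

/* Before: a small, loosely typed byte array shares the union. */
typedef struct OpOld {
    int type;
    union {
        OpLinear lin;
        uint8_t  c[4];
    };
} OpOld;

/* After: the same bytes are given a named, self-describing struct. */
typedef struct OpSwizzle { uint8_t in[4]; } OpSwizzle;

typedef struct OpNew {
    int type;
    union {
        OpLinear  lin;
        OpSwizzle swizzle;
    };
} OpNew;

/* A union is as large as its largest member, so as long as the new member
 * fits within OpLinear, the enclosing struct keeps its size and layout. */
_Static_assert(sizeof(OpSwizzle) <= sizeof(OpLinear), "member must fit");
_Static_assert(sizeof(OpNew) == sizeof(OpOld), "no size overhead");
```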
I made the changes in this commit by hand, but due to the length and noise
level of the commit, I used Opus 4.6 to verify that I did not accidentally
introduce any bugs or typos.
Signed-off-by: Niklas Haas <git@haasn.dev>
The NEON sws_ops backend will use a build-time code generator for the
various operation functions it needs to implement. This build-time code
generator (ops_asmgen) will need a list of the operations that must be
implemented. This commit adds a tool (sws_ops_aarch64) that generates
such a list (ops_entries.c).
The list is generated by iterating over all possible conversion
combinations and collecting the parameters for each NEON assembly
function that has to be implemented, defined by a unique set of
parameters derived from SwsOp. Whenever swscale evolves, with improved
optimization passes, new pixel formats, or improvements to the backend
itself, this file (ops_entries.c) should be regenerated by running:
$ make sws_ops_entries_aarch64
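A rough sketch of the deduplication step such a generator needs, assuming each NEON function is identified by a small fixed set of integer parameters. The Entry fields, add_unique() and print_entries() are hypothetical and do not describe the actual ops_entries.c format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parameter set identifying one NEON function. */
typedef struct Entry {
    int op;        /* operation kind */
    int elem_bits; /* element size, e.g. 8, 16 or 32 */
    int mask;      /* components touched */
} Entry;

#define MAX_ENTRIES 256

typedef struct EntryList {
    Entry e[MAX_ENTRIES];
    int nb;
} EntryList;

/* Append an entry only if an identical parameter set is not yet present.
 * Returns 1 if added, 0 if it was a duplicate, -1 if the list is full.
 * (memcmp is safe here because Entry contains only ints, no padding.) */
static int add_unique(EntryList *list, Entry ent)
{
    for (int i = 0; i < list->nb; i++)
        if (!memcmp(&list->e[i], &ent, sizeof(ent)))
            return 0;
    if (list->nb == MAX_ENTRIES)
        return -1;
    list->e[list->nb++] = ent;
    return 1;
}

/* Emit the collected list as a C array initializer. */
static void print_entries(FILE *f, const EntryList *list)
{
    fprintf(f, "static const Entry entries[] = {\n");
    for (int i = 0; i < list->nb; i++)
        fprintf(f, "    { %d, %d, 0x%x },\n",
                list->e[i].op, list->e[i].elem_bits, list->e[i].mask);
    fprintf(f, "};\n");
}
```

In a full generator, add_unique() would be called once per operation of every conversion combination, so that functions shared by many conversions end up in the list only once.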
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This is an expected consequence of the fact that the new ops code does not
yet do error diffusion, which only really affects formats like rgb4 and monow.
Specifically, this avoids erroring out with the following error:
loss 0.214988 is WORSE by 0.0111071, ref loss 0.203881
SSIM {Y=0.745148 U=1.000000 V=1.000000 A=1.000000}
This occurs when scaling monow -> monow from 96x96 to 128x96.
We can remove this hack again in the future when error diffusion is implemented,
but for now, this check prevents me from easily testing the scaling code.
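The check being relaxed here can be sketched as follows, assuming the error is raised whenever the new loss exceeds the reference loss by more than some tolerance. The function name, the tolerance parameter and the low-bit-depth flag are illustrative assumptions. With the numbers above, 0.214988 - 0.203881 = 0.011107, which matches the reported difference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical check: report an error if the new loss exceeds the reference
 * loss by more than `tolerance`, unless the destination is a low bit depth
 * format (e.g. monow, rgb4), where the missing error diffusion is expected
 * to make results worse. */
static bool loss_is_worse(double loss, double ref_loss, double tolerance,
                          bool dst_low_bit_depth)
{
    if (dst_low_bit_depth)
        return false; /* tolerated until error diffusion is implemented */
    if (loss - ref_loss > tolerance) {
        fprintf(stderr, "loss %g is WORSE by %g, ref loss %g\n",
                loss, loss - ref_loss, ref_loss);
        return true;
    }
    return false;
}
```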
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This can be used to either manually verify, or perhaps programmatically
generate, the list of operation patterns that need to be supported by a
backend to be feature-complete.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
The legacy scaler is no longer implicitly used to generate a reference
to perform comparisons for every conversion. It is now up to the user
to generate a reference file and use it as input for a separate run to
perform comparisons.
It is now possible to compare against previous runs of the graph-based
scaler, for example to test for newer optimizations.
This reduces the overall time necessary to obtain speedup numbers from
the legacy scaler to the graph-based scaler (or any other comparison,
for that matter), since the reference only needs to be run once.
For example, to check the speedup between the legacy scaler and the
graph-based scaler:
./libswscale/tests/swscale [...] -bench 50 -legacy 1 > legacy_ref.txt
./libswscale/tests/swscale [...] -bench 50 -ref legacy_ref.txt
If no -ref file is specified, we assume that we are generating a
reference file, and therefore all information is printed (including
ssim/loss, and benchmarks if -bench is used).
If a -ref file is specified, the output printed depends on whether we
are testing for correctness (ssim/loss only) or benchmarking (time/
speedup only, along with overall speedup).
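The output selection described above amounts to a small decision table. A hypothetical sketch (PrintFields and select_output() are invented names, not the tool's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which fields to print for each conversion (hypothetical). */
typedef struct PrintFields {
    bool ssim_loss; /* correctness results */
    bool timing;    /* per-conversion time */
    bool speedup;   /* comparison against the reference time */
} PrintFields;

static PrintFields select_output(const char *ref_file, int bench_iters)
{
    PrintFields p = { false, false, false };
    if (!ref_file) {
        /* No -ref: generating a reference, so print everything available. */
        p.ssim_loss = true;
        p.timing    = bench_iters > 0;
    } else if (bench_iters > 0) {
        /* -ref with -bench: benchmarking, time/speedup only. */
        p.timing  = true;
        p.speedup = true;
    } else {
        /* -ref without -bench: correctness testing, ssim/loss only. */
        p.ssim_loss = true;
    }
    return p;
}
```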
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This emphasizes the order of magnitude of the loss, which is what is
important for us.
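One way to make the order of magnitude stand out is to print the loss in scientific notation. This sketch assumes a `%e`-style format; the helper name and the chosen precision are illustrative, not necessarily what the tool uses:

```c
#include <assert.h>
#include <stdio.h>

/* Print the loss in scientific notation so its order of magnitude is
 * visible at a glance (hypothetical helper; precision chosen arbitrarily).
 * Returns the number of characters that snprintf would write. */
static int format_loss(char *buf, size_t size, double loss)
{
    return snprintf(buf, size, "loss=%.3e", loss);
}
```

For example, a loss of 0.000123 is printed as `loss=1.230e-04`, whereas `%f` would print `0.000123`, leaving the reader to count zeros.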
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The format of the reference file is the output which is printed to
stdout from this tool itself.
Malformed reference files cause an error, with a more descriptive error
message. Running a subset of the reference conversions is still
supported through -src and/or -dst.
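A sketch of strict parsing for one conversion line, assuming the layout of the conversion lines quoted in the benchmark results (`yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1`). ConvParams and parse_conversion() are hypothetical names, and the real reference format may contain additional fields:

```c
#include <assert.h>
#include <stdio.h>

typedef struct ConvParams {
    char src_fmt[32], dst_fmt[32];
    int src_w, src_h, dst_w, dst_h;
    unsigned flags;
    int dither;
} ConvParams;

/* Returns 0 on success, -1 if the line does not match the expected layout.
 * Every field must be present; partial matches are rejected. */
static int parse_conversion(const char *line, ConvParams *p)
{
    int n = sscanf(line, "%31s %dx%d -> %31s %dx%d, flags=0x%x dither=%d",
                   p->src_fmt, &p->src_w, &p->src_h,
                   p->dst_fmt, &p->dst_w, &p->dst_h,
                   &p->flags, &p->dither);
    if (n != 8) {
        fprintf(stderr, "Malformed reference line: %s\n", line);
        return -1;
    }
    return 0;
}
```

Checking the exact number of converted fields is what turns a truncated or garbled line into an explicit error instead of silently using uninitialized values.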
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The test results (along with SSIM) are printed to stdout again so that
the output can be parsed by -ref.
Benchmark results have also been added to the output.
We still need to re-run the reference tests to perform benchmarks, but
this will be simplified in the next few commits.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The conversion parameters, ssim/loss, and benchmark results will
eventually be merged into the same output line.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The low bit depth workaround code is duplicated in this commit, but the
other occurrence will be removed in a few commits, so I see no reason
to factor it out.
The legacy scaler still has some conversions that give results much
worse than the expected loss, but we still want them as a reference, so
we don't trigger expected loss errors on conversions with the legacy
scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
We will eventually be able to select between running the new graph-based
scaler or the legacy scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Support for the input and output formats is already checked in run_self_tests().
This reverts commit a22faeb992.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The ref->src conversion only needs to be performed once per source
pixel format.
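The restructuring amounts to hoisting the conversion out of the inner loop over destination formats. A hypothetical sketch, where all function names are invented stand-ins for the test tool's internals:

```c
#include <assert.h>

static int convert_ref_to_src_calls;

/* Stand-in for the (expensive) ref->src conversion. */
static void convert_ref_to_src(int src_fmt)
{
    (void)src_fmt;
    convert_ref_to_src_calls++;
}

/* Stand-in for one src->dst test. */
static void run_test(int src_fmt, int dst_fmt)
{
    (void)src_fmt;
    (void)dst_fmt;
}

static void run_all(int nb_src, int nb_dst)
{
    for (int s = 0; s < nb_src; s++) {
        convert_ref_to_src(s);  /* depends only on the source format */
        for (int d = 0; d < nb_dst; d++)
            run_test(s, d);     /* reuses the converted source */
    }
}
```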
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This prevents the propagation of dither_error across frames, and should
also improve reproducibility across platforms.
Also remove the early setting of flags for sws_src_dst, since they will
inevitably be overwritten during the tests.
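Why reusing a context hurts reproducibility can be seen with a toy error-diffusion quantizer. This is a hypothetical model (Ctx and scale_row() are invented, and real dithering in swscale is more involved): the error carried over in a shared context makes the second call return a different result for the same input, while fresh contexts always agree:

```c
#include <assert.h>

/* Hypothetical context carrying error-diffusion state between calls. */
typedef struct Ctx {
    int dither_error;
} Ctx;

/* Quantize a non-negative value to a multiple of 4, diffusing the
 * remainder into the next call on the same context. */
static int scale_row(Ctx *c, int value)
{
    int v = value + c->dither_error;
    int q = (v / 4) * 4;
    c->dither_error = v - q;
    return q;
}
```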
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Remove dimension checks originally added to please static analysis
tools. There is little reason to have arbitrary limits in this
developer test tool. The reference files are under the user's control.
This reverts f70a651b3f and c0f0bec2f2.
Legacy swscale may overwrite the pixel formats in the context (see
handle_formats() in libswscale/utils.c). This may lead to an issue
where, when sws_frame_start() allocates a new frame, it uses the wrong
pixel format.
Instead of fixing the issue in swscale, just make sure dst is always
allocated prior to calling the legacy scaler.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Otherwise, we always pass frames that already have buffers allocated, which
breaks the no-op refcopy optimizations.
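The interaction with the no-op refcopy path can be modelled roughly as follows. This is a hypothetical stand-in for AVFrame/AVBufferRef handling (the Buffer and Frame types and noop_convert() are invented), showing that a destination with buffers already attached forces a real copy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical refcounted buffer and frame. */
typedef struct Buffer {
    int refcount;
    unsigned char data[16];
} Buffer;

typedef struct Frame {
    Buffer *buf;
} Frame;

/* For a no-op conversion, an empty destination can simply take a new
 * reference to the source buffer. If the caller already allocated dst,
 * the data has to be copied instead. Returns true if a reference was taken. */
static bool noop_convert(Frame *dst, Frame *src)
{
    if (!dst->buf) {
        dst->buf = src->buf;
        dst->buf->refcount++;
        return true;
    }
    memcpy(dst->buf->data, src->buf->data, sizeof(src->buf->data));
    return false;
}
```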
Testing with -p 0.1 -threads 16 -bench 10, on an AMD Ryzen 9 9950X3D:
Before:
Overall speedup=2.776x faster, min=0.133x max=629.496x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=9 us, ref=9 us, speedup=1.043x faster
After:
Overall speedup=2.721x faster, min=0.140x max=574.034x
yuv444p 1920x1080 -> yuv444p 1920x1080, flags=0x100000 dither=1
time=0 us, ref=28 us, speedup=516.504x faster
(The slowdown in the legacy swscale case is from swscale's lack of a no-op
refcopy optimization, plus the fact that it's now actually doing memory
work instead of a no-op / redundant memset.)
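The `time=0 us` line presumably only makes sense if the speedup is computed from unrounded timings and only the printed values are rounded. A minimal sketch under that assumption, using per-iteration times in nanoseconds; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Speedup relative to the reference, from unrounded per-iteration times. */
static double speedup(double time_ns, double ref_ns)
{
    return ref_ns / time_ns;
}

/* Only the printed times are rounded to whole microseconds. */
static void print_bench(double time_ns, double ref_ns)
{
    printf("time=%.0f us, ref=%.0f us, speedup=%.3fx\n",
           time_ns / 1000.0, ref_ns / 1000.0, speedup(time_ns, ref_ns));
}
```

With ref=28 us and speedup=516.504x, the actual per-iteration time works out to roughly 54 ns, which rounds to 0 us when printed.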
Signed-off-by: Niklas Haas <git@haasn.dev>
This was originally intended to also include performance gains/losses
due to complicated setup logic, but in practice it just means that changing
the number of iterations dramatically affects the measured speedup, which
makes it harder to do quick bench runs during development.
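Excluding setup from the measurement makes the per-iteration time independent of the iteration count. A sketch with an injectable clock so the effect can be checked deterministically; bench_per_iter() and the simulated clock are hypothetical (a real benchmark would pass a monotonic clock instead):

```c
#include <assert.h>
#include <stddef.h>

typedef double (*clock_fn)(void);

/* Average time per iteration; the one-off setup runs before the clock
 * starts, so it neither inflates short runs nor is amortized by long ones. */
static double bench_per_iter(void (*setup)(void *), void (*run)(void *),
                             void *opaque, int iters, clock_fn now)
{
    assert(iters > 0);
    setup(opaque);
    double start = now();
    for (int i = 0; i < iters; i++)
        run(opaque);
    return (now() - start) / iters;
}

/* Simulated clock: setup costs 100 ticks, each iteration 2 ticks. */
static double fake_ticks;
static double fake_now(void) { return fake_ticks; }
static void fake_setup(void *opaque) { (void)opaque; fake_ticks += 100; }
static void fake_run(void *opaque) { (void)opaque; fake_ticks += 2; }
```

With these simulated costs, including the setup would give (100 + 20) / 10 = 12 ticks per iteration for 10 iterations but (100 + 2000) / 1000 = 2.1 for 1000, whereas bench_per_iter() reports 2 in both cases.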
This gives more information about each operation and helps catch issues
earlier on.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
We don't actually have an SwsContext yet at this point, so just use
AV_OPT_SEARCH_FAKE_OBJ. For the actual evaluation, the signature only
requires that we pass a "pointer to a struct that contains an AVClass as
its first member", so passing a double pointer to the class itself is
sufficient.
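A self-contained illustration of why a pointer to the class pointer is enough: code that only dereferences the first member of the object sees the same value either way. MiniClass, MiniContext and object_class_name() are hypothetical stand-ins for AVClass, SwsContext and the AVOption helpers, which follow the FFmpeg convention that an options-enabled struct begins with a `const AVClass *` member:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for AVClass (the real struct has many more fields). */
typedef struct MiniClass {
    const char *class_name;
} MiniClass;

/* Stand-in for a context whose first member is the class pointer. */
typedef struct MiniContext {
    const MiniClass *av_class;
    int flags;
} MiniContext;

/* Only reads the first member, so it accepts either a full context or a
 * pointer to a bare class pointer. A pointer to a struct, suitably
 * converted, points to its first member, so both calls are equivalent. */
static const char *object_class_name(void *obj)
{
    const MiniClass *cls = *(const MiniClass **)obj;
    return cls ? cls->class_name : NULL;
}

static const MiniClass mini_class = { "MiniContext" };
```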