If any of the dstStrides is not aligned mod 16, the warning
above this one is triggered and sets stride_unaligned_warned,
so the following check for stride_unaligned_warned will
always be false.
Reviewed-by: Niklas Haas <ffmpeg@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Matches the semantics of sws_frame_begin(), which also cleans up any
allocated buffers on error.
This is an issue introduced by the commit that allowed ff_sws_graph_run()
to fail in the first place.
Fixes: 563cc8216b
The major consequence of this is that we start allocating buffers per plane,
instead of allocating one contiguous buffer. This makes the no-op/refcopy
case slightly slower, but doesn't meaningfully affect the rest:
yuva444p -> yuva444p, time=157/1000 us (ref=78/1000 us), speedup=0.497x slower
Overall speedup=1.016x faster, min=0.983x max=1.092x
However, this is a necessary consequence of the desire to allow partial plane
allocations / single plane refcopies. This slowdown also does not affect
vf_scale, which already uses avfilter/framepool.c (via ff_get_video_buffer).
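A minimal sketch of per-plane allocation with cleanup on error, in plain C. This is not the swscale implementation: PlaneFrame, alloc_planes() and free_planes() are hypothetical names, and malloc() stands in for refcounted buffers. Each plane gets its own allocation, so one plane could later be replaced or shared without touching the others, and a failure part-way through releases everything allocated so far.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_PLANES 4

/* Hypothetical frame holding one separately allocated buffer per plane. */
typedef struct PlaneFrame {
    uint8_t *data[MAX_PLANES];
    int linesize[MAX_PLANES];
} PlaneFrame;

/* Allocate nb_planes (<= MAX_PLANES) planes individually. On failure,
 * free every plane allocated so far and leave the frame empty. */
static int alloc_planes(PlaneFrame *f, int nb_planes,
                        const int linesize[], const int height[])
{
    for (int i = 0; i < MAX_PLANES; i++) {
        f->data[i] = NULL;
        f->linesize[i] = 0;
    }
    for (int i = 0; i < nb_planes; i++) {
        f->data[i] = malloc((size_t)linesize[i] * height[i]);
        if (!f->data[i]) {
            for (int j = 0; j < i; j++) {
                free(f->data[j]);
                f->data[j] = NULL;
                f->linesize[j] = 0;
            }
            return -1;
        }
        f->linesize[i] = linesize[i];
    }
    return 0;
}

/* Release all planes; safe to call on a partially filled frame. */
static void free_planes(PlaneFrame *f)
{
    for (int i = 0; i < MAX_PLANES; i++) {
        free(f->data[i]);
        f->data[i] = NULL;
        f->linesize[i] = 0;
    }
}
```

The extra cost in the no-op case comes from doing several allocations instead of one; the benefit is that each plane's buffer has an independent lifetime.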
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for a handful of reasons, including Vulkan (which depends on external
device resources), but also a change I want to make to the tail handling.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Results in IMHO slightly more readable code flow, and will be useful in an
upcoming commit (that adds logic to ref individual planes).
Signed-off-by: Niklas Haas <git@haasn.dev>
The legacy API is defined by sws_init_context(), sws_scale() etc., whereas
the "modern" API is defined by just using sws_scale_frame() without prior
init call.
This int allows us to cleanly distinguish the type of context, paving the
way for some minor refactoring.
As an immediate benefit, we now gain a bunch of explicit error checks to
ensure the API is used correctly (e.g. that sws_scale() is not called
before sws_init_context()).
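The idea of tagging a context with the API flavour it is used through can be sketched as follows. CtxKind, Ctx and the three functions are hypothetical stand-ins, not the swscale types: the legacy init is only accepted on an unused context, the legacy scale call is rejected until that init has happened, and frame-based scaling on an unused context marks it as modern.

```c
/* Hypothetical tag recording which API a context is used through. */
typedef enum CtxKind {
    CTX_UNINIT = 0, /* freshly allocated, no API chosen yet */
    CTX_LEGACY,     /* initialised via the legacy init call */
    CTX_MODERN,     /* used only through frame-based scaling */
} CtxKind;

typedef struct Ctx {
    CtxKind kind;
} Ctx;

/* Legacy init: only valid on an unused context. */
static int ctx_init_legacy(Ctx *c)
{
    if (c->kind != CTX_UNINIT)
        return -1;
    c->kind = CTX_LEGACY;
    return 0;
}

/* Legacy scaling: refuses to run before the legacy init. */
static int ctx_scale_legacy(const Ctx *c)
{
    if (c->kind != CTX_LEGACY)
        return -1;
    return 0;
}

/* Frame-based scaling: an unused context becomes modern; a legacy
 * context keeps its legacy tag. */
static int ctx_scale_frame(Ctx *c)
{
    if (c->kind == CTX_UNINIT)
        c->kind = CTX_MODERN;
    return 0;
}
```

With such a tag, calling the legacy scale entry point on a fresh context fails cleanly instead of operating on uninitialised state.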
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.
Fixes: #22295
See-Also: #22366
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
And have ff_sws_graph_run() just take a bare AVFrame. This will help with
an upcoming change, aside from being a bit friendlier towards API users
in general.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Prepare for xyz12Torgb48 architecture-specific optimizations in
subsequent patches by:
- Grouping XYZ+RGB gamma LUTs and 3x3 matrices into SwsColorXform
(ctx->xyz2rgb and ctx->rgb2xyz), replacing scattered fields.
- Dropping the unused last matrix column, which keeps SwsInternal the
same size or smaller.
- Renaming ff_xyz12Torgb48 and ff_rgb48Toxyz12 and routing calls via
the new per-context function pointer (ctx->xyz12Torgb48 and
ctx->rgb48Toxyz12) in graph.c and swscale.c.
- Adding ff_sws_init_xyzdsp and invoking it in swscale init paths
(normal and unscaled).
- Making fill_xyztables public to ease its setup later in checkasm.
These modifications do not introduce any functional changes.
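A rough sketch of the grouping and per-context dispatch described above. ColorXform, Ctx, init_xyzdsp() and the function names are hypothetical, and the conversion body is omitted; the real SwsColorXform layout is not reproduced here. The portable C version is installed first so that an architecture-specific init could later overwrite the pointer.

```c
#include <stdint.h>

/* Hypothetical grouping of one direction of the XYZ<->RGB state. */
typedef struct ColorXform {
    const uint16_t *gamma_in;  /* input gamma LUT */
    const uint16_t *gamma_out; /* output gamma LUT */
    int16_t matrix[3][3];      /* 3x3 matrix, no unused fourth column */
} ColorXform;

typedef struct Ctx Ctx;
typedef void (*xyz_to_rgb_fn)(const Ctx *c, uint16_t *dst,
                              const uint16_t *src, int w);

struct Ctx {
    ColorXform xyz2rgb;
    ColorXform rgb2xyz;
    xyz_to_rgb_fn xyz12_to_rgb48; /* selected once at init time */
};

/* Portable reference version (body omitted in this sketch). */
static void xyz12_to_rgb48_c(const Ctx *c, uint16_t *dst,
                             const uint16_t *src, int w)
{
    (void)c; (void)dst; (void)src; (void)w;
}

/* Hypothetical init hook: install the C version; arch-specific init
 * code would run afterwards and may replace the pointer. */
static void init_xyzdsp(Ctx *c)
{
    c->xyz12_to_rgb48 = xyz12_to_rgb48_c;
}
```

Callers then go through c->xyz12_to_rgb48(...) instead of naming a specific implementation, which is what lets optimized versions be swapped in later.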
Signed-off-by: Arpad Panyik <Arpad.Panyik@arm.com>
384fe39623 introduced a regression in the
range conversion offset calculation, resulting in a slight green tint
in full-range RGB to YUV conversions of grayscale values.
The offset being calculated was not taking into consideration a bias
needed for correctly rounding the result from the multiplication stage,
leading to a truncated value.
Fixes issue #11646.
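A minimal sketch of why the missing bias matters, assuming a generic fixed-point kernel rather than the actual swscale code: fixed_dot(), SHIFT and the coefficient values are hypothetical, chosen so that the rounded chroma coefficients sum to -1 instead of 0. For a gray input the ideal chroma result is 128, but without the rounding term the final shift truncates it to 127. If both chroma channels are pulled slightly below neutral in this way, the output drifts toward green.

```c
#include <stdint.h>

#define SHIFT 15

/* (sum(coeff[i] * in[i]) + offset) >> SHIFT. The offset carries the output
 * bias and, optionally, the 1 << (SHIFT - 1) term that turns the final
 * shift into round-to-nearest instead of truncation. */
static int fixed_dot(const int32_t coeff[3], const int in[3], int64_t offset)
{
    int64_t acc = offset;
    for (int i = 0; i < 3; i++)
        acc += (int64_t)coeff[i] * in[i];
    return (int)(acc >> SHIFT);
}

/* Hypothetical chroma coefficients whose rounded sum is -1. */
static const int32_t cb_coeff[3] = { -4857, -9536, 14392 };

/* Offset without rounding bias: truncates. */
static int64_t offset_truncating(void)
{
    return (int64_t)128 << SHIFT;
}

/* Offset including the rounding bias. */
static int64_t offset_rounding(void)
{
    return ((int64_t)128 << SHIFT) + ((int64_t)1 << (SHIFT - 1));
}
```

For the gray input {200, 200, 200}, the first offset yields 127 and the second yields 128.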
Fixes: shift exponent 32 is too large for 32-bit type 'unsigned int'
Fixes: division by zero
Fixes: 391981061/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6691017763389440
Fixes: 392929028/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5142088307507200
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
swscale does not distinguish internally between 16-bit and higher bit
depth output formats when it comes to the choice of intermediate
representation.
Clamping this value both prevents a SIGFPE and also aligns the check
with reality.
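A minimal sketch of such a clamp, assuming a helper that derives a maximum sample value from the output depth; max_value_for_depth() is hypothetical, not the swscale code. In C, shifting a 32-bit unsigned value by 32 is undefined, which is what the first fuzzer report flags. Clamping the depth to 16 keeps the shift valid and reflects that deeper formats share the 16-bit intermediate path.

```c
#include <stdint.h>

/* Hypothetical: maximum sample value for the intermediate representation.
 * Depths above 16 use the same intermediate as 16-bit, so clamp first;
 * without the clamp, depth == 32 would make the shift undefined. */
static uint32_t max_value_for_depth(int depth)
{
    if (depth > 16)
        depth = 16;
    return (1u << depth) - 1;
}
```

Because the result is never zero for positive depths, later divisions by it are also safe.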
Values in csp, prim, trc, etc., are irrelevant if there's no conversion needed.
Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz>
Signed-off-by: James Almer <jamrial@gmail.com>
The current logic uses 12-bit linear light math, which is woefully insufficient
and leads to nasty posterization artifacts. This patch simply switches the
internal logic to 16-bit precision.
This raises the memory requirement of these tables from 32 kB to 272 kB.
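The table sizes above are consistent with, for example, four int16_t lookup tables that all have 4096 entries (12-bit indices) before the change, two of which grow to 65536 entries (16-bit indices) afterwards. That split is an assumption used only to check the arithmetic, and lut_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Total bytes for two int16_t tables indexed by bits_a-bit values plus
 * two int16_t tables indexed by bits_b-bit values (assumed layout). */
static size_t lut_bytes(int bits_a, int bits_b)
{
    size_t a = (size_t)1 << bits_a; /* entries per table, pair A */
    size_t b = (size_t)1 << bits_b; /* entries per table, pair B */
    return 2 * (a + b) * sizeof(int16_t);
}
```

Under this assumption, lut_bytes(12, 12) is 32768 bytes (32 KiB) and lut_bytes(16, 12) is 278528 bytes (272 KiB).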
All relevant FATE tests updated for improved accuracy.
Fixes: #4829
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
Only add the condensed values that we actually care about. Group them into
a new struct to make it easier to discard or replace this metadata.
Define a special comparison function that does not choke on undefined/unknown
metadata.
There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.
With this commit, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.
For bit depths <= 14, the number of fixed-point bits has been set to 14
for all conversions, to simplify the code.
For bit depths > 14, the number of fixed-point bits has been raised to
18, so that the conversion is accurate enough for the mpeg range to be
respected.
The convert functions now take the conversion constants (coeff and offset)
as function arguments.
For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.
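As an illustration of deriving such constants at runtime, the sketch below computes a coefficient and offset for full-range (JPEG) to limited-range (MPEG) luma at a given bit depth, using 14 fractional bits up to 14-bit input and 18 above. It operates on plain sample values; the actual swscale functions work on their own intermediate representation, so these numbers will not match its internal constants. RangeConst, lum_from_jpeg_consts() and convert() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical fixed-point constants: out = (in * coeff + offset) >> shift. */
typedef struct RangeConst {
    uint32_t coeff;
    int64_t offset;
    int shift;
} RangeConst;

/* Full-range luma [0, 2^depth - 1] -> limited range [16, 235] scaled to
 * depth (depth >= 8). */
static RangeConst lum_from_jpeg_consts(int depth)
{
    RangeConst rc;
    rc.shift = depth <= 14 ? 14 : 18;
    int64_t one      = (int64_t)1 << rc.shift;
    int64_t full_max = ((int64_t)1 << depth) - 1;  /* e.g. 255 */
    int64_t lim_span = (int64_t)219 << (depth - 8); /* e.g. 219 */
    int64_t lim_min  = (int64_t)16 << (depth - 8);  /* e.g. 16 */
    /* coeff = lim_span / full_max in Q(shift), rounded to nearest */
    rc.coeff = (uint32_t)((lim_span * one + full_max / 2) / full_max);
    /* offset = lim_min in Q(shift) plus a bias that rounds the final shift */
    rc.offset = lim_min * one + (one >> 1);
    return rc;
}

static int64_t convert(int64_t y, RangeConst rc)
{
    return (y * rc.coeff + rc.offset) >> rc.shift;
}
```

With these constants, 255 maps to 235 at 8 bits, 1023 to 940 at 10 bits, and 65535 to 60160 at 16 bits, i.e. the limited-range upper bound is hit exactly and not exceeded.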
x86_64:
chrRangeFromJpeg8_1920_c: 2127.4 2125.0 (1.00x)
chrRangeFromJpeg16_1920_c: 2325.2 2127.2 (1.09x)
chrRangeToJpeg8_1920_c: 3166.9 3168.7 (1.00x)
chrRangeToJpeg16_1920_c: 2152.4 3164.8 (0.68x)
lumRangeFromJpeg8_1920_c: 1263.0 1302.5 (0.97x)
lumRangeFromJpeg16_1920_c: 1080.5 1299.2 (0.83x)
lumRangeToJpeg8_1920_c: 1886.8 2112.2 (0.89x)
lumRangeToJpeg16_1920_c: 1077.0 1906.5 (0.56x)
aarch64 A55:
chrRangeFromJpeg8_1920_c: 28835.2 28835.6 (1.00x)
chrRangeFromJpeg16_1920_c: 28839.8 32680.8 (0.88x)
chrRangeToJpeg8_1920_c: 23074.7 23075.4 (1.00x)
chrRangeToJpeg16_1920_c: 17318.9 24996.0 (0.69x)
lumRangeFromJpeg8_1920_c: 15389.7 15384.5 (1.00x)
lumRangeFromJpeg16_1920_c: 15388.2 17306.7 (0.89x)
lumRangeToJpeg8_1920_c: 19227.8 19226.6 (1.00x)
lumRangeToJpeg16_1920_c: 15387.0 21146.3 (0.73x)
aarch64 A76:
chrRangeFromJpeg8_1920_c: 6324.4 6268.1 (1.01x)
chrRangeFromJpeg16_1920_c: 6339.9 11521.5 (0.55x)
chrRangeToJpeg8_1920_c: 9656.0 9612.8 (1.00x)
chrRangeToJpeg16_1920_c: 6340.4 11651.8 (0.54x)
lumRangeFromJpeg8_1920_c: 4422.0 4420.8 (1.00x)
lumRangeFromJpeg16_1920_c: 4420.9 5762.0 (0.77x)
lumRangeToJpeg8_1920_c: 5949.1 5977.5 (1.00x)
lumRangeToJpeg16_1920_c: 4446.8 5946.2 (0.75x)
NOTE: all SIMD optimizations for range_convert have been disabled.
They will be re-enabled once they are fixed for each architecture.
NOTE2: the same issue still exists in rgb2yuv conversions, which is not
addressed in this commit.
As part of a larger, ongoing effort to modernize and partially rewrite
libswscale, it was decided and generally agreed upon to introduce a new
public API for libswscale. This API is designed to be less stateful, more
explicitly defined, and considerably easier to use than the existing one.
Most of the API work has already been accomplished in the previous commits;
this commit merely introduces the ability to use sws_scale_frame()
dynamically, without prior sws_init_context() calls. Instead, the new API
takes frame properties from the frames themselves, and the implementation is
based on the new SwsGraph API, which we simply reinitialize as needed.
This high-level wrapper also recreates the logic that used to live inside
vf_scale for scaling interlaced frames, enabling it to be reused more easily
by end users.
Finally, this function is designed to simply copy refs directly when nothing
needs to be done, substantially improving throughput of the noop fast path.
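The "reinitialize as needed" behaviour and the no-op fast path can be sketched as a small state machine. Props, Scaler and scale_frame() are hypothetical and track only width, height and pixel format; a real implementation would have to compare more properties (colorspace, range, and so on) and build an actual processing graph.

```c
#include <stdbool.h>

/* Hypothetical subset of frame properties that determine the graph. */
typedef struct Props { int w, h, fmt; } Props;

typedef struct Scaler {
    Props src, dst;  /* properties the current graph was built for */
    bool have_graph;
    int rebuilds;    /* number of graph (re)initialisations */
    int ref_copies;  /* number of no-op fast-path hits */
} Scaler;

static bool props_equal(Props a, Props b)
{
    return a.w == b.w && a.h == b.h && a.fmt == b.fmt;
}

/* Rebuild internal state only when the frame properties change. */
static int scale_frame(Scaler *s, Props dst, Props src)
{
    if (props_equal(dst, src)) {
        /* Nothing to convert: a real implementation would take a new
         * reference to the source buffers here. */
        s->ref_copies++;
        return 0;
    }
    if (!s->have_graph || !props_equal(s->src, src) ||
        !props_equal(s->dst, dst)) {
        s->src = src;
        s->dst = dst;
        s->have_graph = true;
        s->rebuilds++;
    }
    /* ... run the graph on the frame ... */
    return 0;
}
```

Repeated calls with unchanged properties reuse the existing graph, so the setup cost is paid only when the input or output format actually changes.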
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a purely cosmetic commit aimed at replacing accesses to
SwsInternal.opts by direct access to SwsContext wherever convenient.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Same as is done in lumRangeToJpeg16_c(). Plenty of allowed input values can
overflow here.
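One way to make such an expression overflow-free is to widen the product before subtracting and shifting. The sketch below uses a 64-bit intermediate; scale_wide() is a hypothetical name, and the actual fix may use a different technique (for example an unsigned multiplication). With the values from the report, 475328 * 4663 = 2216454464, which exceeds INT32_MAX (2147483647).

```c
#include <stdint.h>

/* (x * coeff - bias) >> shift, evaluated with a 64-bit intermediate so
 * that large allowed inputs cannot overflow a 32-bit int. */
static int32_t scale_wide(int32_t x, int32_t coeff, int64_t bias, int shift)
{
    int64_t acc = (int64_t)x * coeff - bias;
    return (int32_t)(acc >> shift);
}
```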
Fixes: src/libswscale/swscale.c:198:47: runtime error: signed integer overflow: 475328 * 4663 cannot be represented in type 'int'
Signed-off-by: James Almer <jamrial@gmail.com>
This commit also fixes the issue that the call to ff_sws_init_range_convert()
from sws_init_swscale() was not setting up the arch-specific optimizations.
And preserve the public SwsContext as a separate name. The motivation here
is that I want to turn SwsContext into a public struct, while keeping the
internal implementation hidden. Additionally, I also want to be able to
use multiple internal implementations, e.g. for GPU devices.
This commit does not include any functional changes. For the most part, it is
a simple rename. The only complications arise from the public facing API
functions, which preserve their current type (and hence require an additional
unwrapping step internally), and the checkasm test framework, which directly
accesses SwsInternal.
For consistency, the affected functions that need to maintain a distinction
have generally been changed to refer to the SwsContext as *sws, and the
SwsInternal as *c.
In an upcoming commit, I will provide a backing definition for the public
SwsContext, and update `sws_internal()` to dereference the internal struct
instead of merely casting it.
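The cast-based unwrapping can be illustrated with a common C pattern in which the public struct is the first member of the internal one. PubCtx, InternalCtx and ctx_internal() are hypothetical and do not reproduce the actual swscale definitions; the static assertion checks the layout property the cast relies on. The parameter naming follows the *sws / *c convention described above.

```c
#include <stddef.h>

/* Hypothetical public struct, visible to API users. */
typedef struct PubCtx {
    int flags;
} PubCtx;

/* Hypothetical internal struct embedding the public one first. */
typedef struct InternalCtx {
    PubCtx pub;        /* must stay the first member */
    int private_state;
} InternalCtx;

_Static_assert(offsetof(InternalCtx, pub) == 0,
               "public struct must be at offset 0");

/* Unwrap a public pointer; valid only because pub is at offset 0 and
 * every PubCtx handed out is embedded in an InternalCtx. */
static InternalCtx *ctx_internal(PubCtx *sws)
{
    return (InternalCtx *)sws;
}
```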
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I want to pull options out of SwsInternal, so we need to make this field
a dedicated int that gets updated as appropriate in ff_swscale().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Used as an intermediate entry point for the new swscale context. The extra
constification is a consistency measure, as I want to move the memcpy of
stride and plane pointers to the functions that actually need to mutate them.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Instead of taking an int16_t pointer and a stride in halfwords, follow the
usual convention of treating all planes and strides as byte-addressed.
This does not have any immediate effect but makes these functions more
reusable without unintended "gotchas".
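With byte-addressed strides, the row offset is always computed in bytes, regardless of the sample size. A minimal sketch for a 16-bit plane; get_px16() is a hypothetical helper, and memcpy is used so that no 2-byte alignment of row starts has to be assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read sample (x, y) from a 16-bit plane whose stride is given in bytes.
 * The stride may be negative (bottom-up layout). */
static uint16_t get_px16(const uint8_t *plane, ptrdiff_t stride_bytes,
                         int x, int y)
{
    uint16_t v;
    memcpy(&v, plane + y * stride_bytes + 2 * (ptrdiff_t)x, sizeof(v));
    return v;
}
```

The same call works for 8-bit and 16-bit planes alike once strides are bytes; only the per-sample size changes.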
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This fixes an 11-year-old bug in the rgb2xyz functions when used with a
negative stride: the current loop bounds turned them into a no-op.
Additionally, this increases performance on highly cropped images, whose
stride may be substantially higher than the effective width.
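A sketch of a loop whose bounds depend on the visible width rather than the stride. process_plane() and its placeholder inversion are hypothetical. A bound derived from the stride (for instance x < stride / 2) runs zero iterations when the stride is negative, whereas a bound on w does not, and it also skips any padding between the visible width and the stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply a per-sample operation to w x h 16-bit samples. The stride is in
 * bytes and may be negative; the loop bounds only use w and h. */
static void process_plane(uint8_t *plane, ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++) {
        uint16_t *row = (uint16_t *)(plane + y * stride);
        for (int x = 0; x < w; x++)
            row[x] ^= 0xFFFF; /* placeholder operation: invert */
    }
}
```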
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
I have not checked that the constant is correct; this just fixes the undefined behavior.
Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int'
Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is unlikely to make a difference.
Fixes: CID1591896 Unintentional integer overflow
Fixes: CID1591901 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>