The major consequence of this is that we start allocating buffers per plane,
instead of allocating one contiguous buffer. This makes the no-op/refcopy
case slightly slower, but doesn't meaningfully affect the rest:
yuva444p -> yuva444p, time=157/1000 us (ref=78/1000 us), speedup=0.497x slower
Overall speedup=1.016x faster, min=0.983x max=1.092x
However, this is a necessary consequence of the desire to allow partial plane
allocations / single plane refcopies. This slowdown also does not affect
vf_scale, which already uses avfilter/framepool.c (via ff_get_video_buffer).
Signed-off-by: Niklas Haas <git@haasn.dev>
The NEON sws_ops backend will use a build-time code generator for the
various operation functions it needs to implement. This build time code
generator (ops_asmgen) will need a list of the operations that must be
implemented. This commit adds a tool (sws_ops_aarch64) that generates
such a list (ops_entries.c).
The list is generated by iterating over all possible conversion
combinations and collecting the parameters for each NEON assembly
function that has to be implemented, defined by an unique set of
parameters derived from SwsOp. Whenever swscale evolves, with improved
optimization passes, new pixel formats, or improvements to the backend
itself, this file (ops_entries.c) should be regenerated by running:
$ make sws_ops_entries_aarch64
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
This is a complete rewrite of the math in swscale/utils.c initFilter(), using
floating point math and with a bit more polished UI and internals. I have
also included a substantial number of improvements, including a method to
numerically compute the true filter support size from the parameters, and a
more robust logic for the edge conditions. The upshot of these changes is
that the filter weight computation is now much simpler and faster, and with
fewer edge cases.
I copy/pasted the actual underlying kernel functions from libplacebo, so this
math is already quite battle-tested. I made some adjustments to the defaults
to align with the existing defaults in libswscale, for backwards compatibility.
Note that this commit introduces a lot more filter kernels than what we
actually expose; but they are cheap to carry around, don't take up binary
space, and will probably save some poor soul from incorrectly reimplementing
them in the future. Plus, I have plans to expand the list of functions down
the line, so it makes sense to just define them all, even if we don't
necessarily use them yet.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This code is self-contained and logically distinct from the ops-related
helpers in ops.c, so it belongs in its own file.
Purely cosmetic; no functional change.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
By excluding the Vulkan makefile entirely when --disable-unstable is passed.
This also correctly avoids compiling e.g. unused GLSL compilers.
Fixes: #22295
See-Also: #22366
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is similar to swscale/tests/swscale.c, but significantly cheaper - it
merely prints the generated (optimized) operation list for every format
conversion.
Mostly useful for my own purposes as a regression test when making changes
to the ops optimizer. Note the distinction between this and tests/swscale.c,
the latter of which tests the result of *applying* an operation list for
equality.
There is an argument to be made that the two tests could be merged, but
I think the amount of overlap is small enough to not be worth the amount
of differences.
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.
25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so it practice it's not a substitute for a SIMD
implementation.
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
utils.c is getting quite crowded, and I need a new place to dump a lot of
format handling code for the ongoing rewrite. Rather than bloating this file
even more, start splitting format handling helpers off into a new file.
This also renames the existing utils.h header, which didn't really contain
anything except the SwsFormat definition anyway (the prototypes for what should
have been in utils.h are all still in the legacy swscale_internal.h).
This is a lightweight wrapper around the underlying color management system,
whose job it is merely to manage the 3DLUT state and apply them to the frame
data. This is where we might add platform-specific optimizations in the future.
I also plan on adding support for more pixel formats in the future. In
particular, we could support YUV or XYZ input formats directly using only
negligible additional code in the 3DLUT setup functions. This would eliminate
the major source of slowdown, which is currently the roundtrip to RGBA64.
The underlying color mapping logic was ported as straightforwardly as possible
from libplacebo, although the API and glue code has been very heavily
refactored / rewritten. In particular, the generalization of gamut mapping
methods is replaced by a single ICC intent selection, and constants have been
hard-coded.
To minimize the amount of overall operations, this gamut mapping LUT now embeds
a direct end-to-end transformation to the output color space; something that
libplacebo does in shaders, but which is prohibitively expensive in software.
In order to preserve compatibility with dynamic tone mapping without severely
regressing performance, we add the ability to generate a pair of "split" LUTS,
one for encoding the input and output to the perceptual color space, and a
third to embed the tone mapping operation. Additionally, this intermediate
space could be used for additional subjective effect (e.g. changing
saturation or brightness).
The big downside of the new approach is that generating a static color mapping
LUT is now fairly slow, as the chromaticity lobe peaks have to be recomputed
for every single RGB value, since correlated RGB colors are not necessarily
aligned in ICh space. Generating a split 3DLUT significantly alleviates this
problem because the expensive step is done as part of the IPT input LUT, which
can share the same hue peak calculation at least for all input intensities.
This interface has been designed from the ground up to serve as a new
framework for dispatching various scaling operations at a high level. This
will eventually replace the old ad-hoc system of using cascaded contexts,
as well as allowing us to plug in more dynamic scaling passes requiring
intermediate steps, such as colorspace conversions, etc.
The starter implementation merely piggybacks off the existing sws_init() and
sws_scale(), functions, though it does bring the immediate improvement of
splitting up cascaded functions and pre/post conversion functions into
separate filter passes, which allows them to e.g. be executed in parallel
even when the main scaler is required to be single threaded. Additionally,
a dedicated (multi-threaded) noop memcpy pass substantially improves
throughput of that fast path.
Follow-up commits will eventually expand this to move all of the scaling
decision logic into the graph init function, and also eliminate some of the
current special cases.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This is by no means perfect, since at least ddagrab will return scRGB
data with values outside of 0.0f to 1.0f for HDR values.
Its primary purpose is to be able to work with the format at all.
This avoids having to rebuild big files every time FFMPEG_VERSION
changes (which it does with every commit).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In case of shared builds, some object files containing tables
are currently duplicated into other libraries: log2_tab.c,
golomb.c, reverse.c. The check for whether this is duplicated
is simply whether CONFIG_SHARED is true. Yet this is crude:
E.g. libavdevice includes reverse.c for shared builds, but only
needs it for the decklink input device, which given that decklink
is not enabled by default will be unused in most libavdevice.so.
This commit changes this by making it more explicit about what
to duplicate from other libraries. To do this, two new Makefile
variables were added: SHLIBOBJS and STLIBOBJS. SHLIBOBJS contains
the objects that are duplicated from other libraries in case of
shared builds; STLIBOBJS contains stuff that a library has to
provide for other libraries in case of static builds. These new
variables provide a way to enable/disable with a finer granularity
than just whether shared builds are enabled or not. E.g. lavd's
Makefile now contains: SHLIBOBJS-$(CONFIG_DECKLINK_INDEV) += reverse.o
Another example is provided by the golomb tables. These are provided
by lavc for static builds, even if one uses a build configuration
that makes only lavf use them. Therefore lavc's Makefile contains
STLIBOBJS-$(CONFIG_MXF_MUXER) += golomb.o, whereas lavf's Makefile
has a corresponding SHLIBOBJS-$(CONFIG_MXF_MUXER) += golomb_tab.o.
E.g. in case the MXF muxer is the only component needing these tables
only libavformat.so will contain them for shared builds; currently
libavcodec.so does so, too.
(There is currently a CONFIG_EXTRA group for golomb. But actually
one would need two groups (golomb_avcodec and golomb_avformat) in
order to know when and where to include these tables. Therefore
this commit uses a Makefile-based approach for this and stops
using these groups for the users in libavformat.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
changes since v1:
- made into fate test
- fixed c90 warnings
- tests more intermediate formats
- tested on BE mips too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '92db5083077a8b0f8e1050507671b456fd155125':
build: Generate pkg-config files from Make and not from configure
build: Store library version numbers in .version files
Includes cherry-picked commits 8a34f36593 and
ee164727dd to fix issues.
Changes were also made to retain support for raise_major and build_suffix.
Reviewed-by: ubitux
Merged-by: James Almer <jamrial@gmail.com>
* commit '11a9320de54759340531177c9f2b1e31e6112cc2':
build: Move build-system-related helper files to a separate subdirectory
"ffbuild" directory name is used instead of "avbuild".
Merged-by: Clément Bœsch <u@pkh.me>
This moves work from the configure to the Make stage where it can
be parallelized and ensures that pkgconfig files are updated when
library versions change.
Bug-Id: 449
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
+ split color conversion from scaling
- disabled gamma correction, until it's refactored too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
libswscale uses the table but wasn't duplicating it like the rest of the libs.
This should fix compilation failures on msvc/icl after lavu stopped exporting
internal functions and tables.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Originally written by James Almer <jamrial@gmail.com>
With the following contributions by Timothy Gu <timothygu99@gmail.com>
* Use descriptions of libraries from the pkg-config file generation function
* Use "FFmpeg Project" as CompanyName (suggested by Alexander Strasser)
* Use "FFmpeg" for ProductName as MSDN says "name of the product with which the
file is distributed" [1].
* Use FFmpeg's version (N-xxxxx-gxxxxxxx) for ProductVersion per MSDN [1].
* Only build the .rc files when --enable-small is not enabled.
[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa381058.aspx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master: (29 commits)
lavfi: reclassify showfiltfmts as a TESTPROG
graph2dot: fix printf format specifier
swscale: yuv2planeX 8bit >=sse2 functions need aligned stack on x86-32.
vp8: loopfilter >=sse2 functions need aligned stack on x86-32.
amr: remove shift out of the AMR_BIT() macro.
dsputilenc: group yasm and inline asm function pointer assignment.
mov: use forward declaration of a function instead of a table.
Clarify Doxygen comment for FF_API_* #defines.
configure: simplify get_version()
Create version.h headers for libraries that lack them
gitignore: Use full path instead of relative path to specify patterns
mpegvideo: remove VLAs
Add XTEA encryption support in libavutil
Add Blowfish encryption support in libavutil
eval: Add the isinf() function and tests for it
flacdec: move lpc filter to flacdsp
flacdec: split off channel decorrelation as flacdsp
avplay: Add an option for not limiting the input buffer size
FATE: add a test for WMA cover art.
FATE: add a test for apetag cover art
...
Conflicts:
.gitignore
configure
ffplay.c
libavcodec/Makefile
libavcodec/error_resilience.c
libavcodec/mpegvideo.c
libavcodec/ratecontrol.c
libavdevice/avdevice.h
libavfilter/Makefile
libavfilter/filtfmts.c
libavfilter/version.h
libavformat/mov.c
libavformat/version.h
libavutil/Makefile
libavutil/avutil.h
libavutil/version.h
libswscale/swscale.h
libswscale/x86/swscale_mmx.c
tests/fate/libavutil.mak
tests/lavfi-regression.sh
tools/graph2dot.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
libschroedinger: Switch to function names more in line with Libav style.
Move code shared between libdirac and libschroedinger to libschroedinger.
lavfi: uninline avfilter_copy_buffer_ref_props().
lavf: add missing '*' in a doxy.
h264: Remove a commented-out function pointer typedef.
txd: Remove write-only variable in txd_decode_frame().
mmvideo.c: Remove unused variable in mm_decode_pal().
build: cosmetics: Add missing end-of-line backslashes to item lists.
build: cosmetics: Split HEADERS/OBJS/PROGS lists into one entry per line.
libschroedinger: Move a function to avoid a forward declaration.
pthread: warn on high thread counts
vf_yadif: fix missing error handling for avfilter_poll_frame()
avprobe: allow showing only one container/stream property.
lavfi: support audio in avfilter_copy_frame_props().
lavfi: avfilter_merge_formats: handle case where inputs are same
lavc: add sample rate and channel layout to AVFrame.
zerocodec: check if the previous frame is missing
doc: clarify check for NULL pointer style
Conflicts:
doc/APIchanges
doc/developer.texi
ffprobe.c
libavcodec/Makefile
libavcodec/avcodec.h
libavcodec/libdirac_libschro.c
libavcodec/libdirac_libschro.h
libavcodec/mmvideo.c
libavcodec/txd.c
libavcodec/version.h
libavcodec/zerocodec.c
libavfilter/Makefile
libavfilter/avfilter.c
libavfilter/version.h
libavformat/Makefile
libavutil/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
swscale: K&R formatting cosmetics (part II)
tiffdec: Add a malloc check and refactor another.
faxcompr: Check malloc results and unify return path
configure: escape colons in values written to config.fate
ac3dsp: call femms/emms at the end of float_to_fixed24() for 3DNow and SSE
matroska: Fix leaking memory allocated for laces.
pthread: Fix crash due to fctx->delaying not being cleared.
vp3: Assert on invalid filter_limit values.
h264: fix 10bit biweight functions after recent x86inc.asm fixes.
ffv1: Fix size mismatch in encode_line.
movenc: Remove a dead initialization
git-howto: Explain how to avoid Windows line endings in git checkouts.
build: Move all arch OBJS declarations into arch subdirectory Makefiles.
Conflicts:
configure
libavcodec/vp3.c
libavformat/matroskadec.c
libavutil/Makefile
libswscale/Makefile
libswscale/swscale.c
libswscale/swscale_internal.h
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
build: ppc: drop stray leftover backslash
build: Only clean the architecture subdirectory we build for.
build: drop some unnecessary dependencies from the H.264 parser
build: prettyprinting cosmetics
libavutil: Remove pointless rational test program.
libavutil: Remove broken and pointless lzo test program.
lavf doxy: expand AVStream.codec doxy.
lavf doxy: improve AVStream.time_base doxy.
lavf doxy: add some basic documentation about reading from the demuxer.
lavf doxy: document passing options to demuxers.
lavf doxy: clarify that an AVPacket contains encoded data.
mpegtsenc: allow user triggered PES packet flushing
APIchanges: mark the place where 0.7 was cut.
APIchanges: mark the place where 0.8 was cut.
APIchanges: fill in missing dates and hashes.
smacker: convert palette and header reading to bytestream2.
alac: convert extradata reading to bytestream2.
Conflicts:
doc/APIchanges
libavcodec/smacker.c
libavcodec/x86/Makefile
libavfilter/Makefile
libavutil/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>