10 Commits

Author SHA1 Message Date
Niklas Haas
df4fe85ae3 swscale/ops_chain: replace SwsOpEntry.unused by SwsCompMask
Needed to allow us to phase out SwsComps.unused altogether.

It's worth pointing out the change in semantics: while unused tracks the
unused *input* components, the mask is defined as representing the
computed *output* components.

This is 90% the same, except for read/write, pack/unpack, and clear, which
are the only operations that can change the number of components.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-04-16 23:25:10 +02:00
Kacper Michajłow
1092852406 swscale/ops: remove type from continuation functions
The glue code doesn't care about types, so long as the functions are
chained correctly. Let's not pretend there is any type safety there, as
the function pointers were cast anyway from unrelated types.
In particular, some f32 and u32 functions are shared.

This fixes errors like so:
src/libswscale/ops_tmpl_int.c:471:1: runtime error: call to function linear_diagoff3_f32 through pointer to incorrect function type 'void (*)(struct SwsOpIter *, const struct SwsOpImpl *, unsigned int *, unsigned int *, unsigned int *, unsigned int *)'
libswscale/ops_tmpl_float.c:208: note: linear_diagoff3_f32 defined here

Fixes: #22332
2026-04-13 23:28:30 +00:00
Kacper Michajłow
9a2a0557ad swscale/ops: remove optimize attribute from op functions
It was added to force auto-vectorization on GCC builds. Since then, auto-
vectorization has been enabled for the whole code base in 1464930696.

According to the GCC documentation, the optimize attribute should be used
for debugging purposes only; it is not suitable for production code.

In particular, it is unclear whether the attribute is actually applied, as
it is lost when the function is inlined, so its usage is quite fragile.

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2026-04-13 23:28:30 +00:00
Niklas Haas
e787f75ec8 swscale/ops_backend: add support for SWS_OP_FILTER_V
These could be implemented as a special case of DECL_READ(), but the
amount of extra noise that entails is not worth it, especially given the
extra setup/free code that needs to be used here.

I've decided that, for now, the canonical implementation shall convert the
weights to floating point before doing the actual scaling. This is not a huge
efficiency loss (since the result will be 32-bit anyway, and mulps/addps are
1-cycle ops), so the main downside comes from the single extra float conversion
on the input pixels.

In theory, we may revisit this later if it turns out that using e.g. pmaddwd
is a win even for vertical scaling, but for now, this works and is a simple
starting point. Vertical scaling also tends to happen after horizontal scaling,
at which point the input will be F32 already to begin with.

For smaller types/kernels (e.g. U8 input with a reasonably sized kernel),
the result here is exact either way, since the resulting 8+14 bit sum fits
exactly into float.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas
fce3deaa3b swscale/ops_backend: add SwsOpExec to SwsOpIter
Needed for the scaling kernel, which accesses line strides.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas
00d1f41b2e swscale/ops_backend: avoid UB (null pointer arithmetic)
Just use uintptr_t; it accomplishes the exact same thing while being
defined behavior.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-24 13:20:59 +00:00
Niklas Haas
ef114cedef swscale/ops_chain: refactor setup() signature
This is basically a cosmetic commit that groups all of the parameters to
setup() into a single struct, as well as the return type. This gives the
immediate benefit of freeing up 8 bytes per op table entry, though the
main motivation will come in the following commits.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-18 09:09:44 +00:00
Niklas Haas
e729f49645 swscale/ops_backend: allocate block storage up-front
Instead of in each read() function. Not only is this slightly faster, due
to promoting more tail calls, but it also allows us to have operation chains
that don't start with a read.

Also simplifies the implementations.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-02-19 19:44:46 +00:00
Kacper Michajłow
1294ab5db1 swscale/ops_tmpl_int: remove unused arguments from wrap read decl
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-09-13 19:12:44 +02:00
Niklas Haas
5aef513fb4 swscale/ops_backend: add reference backend based on C templates
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.

In theory, we could dramatically speed it up by using GCC vectors instead
of arrays, but the performance gains from that are too dependent on exact
GCC versions and flags, so in practice it's not a substitute for a SIMD
implementation.
2025-09-01 19:28:36 +02:00