109 Commits

Author SHA1 Message Date
Andreas Rheinhardt
999ccf6495 swresample/x86/{audio_convert,rematrix}: Remove remnants of MMX
Forgotten in 2b94f23b06,
4e51e48ebd and
374b3ab03c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-04-13 01:16:46 +02:00
Andreas Rheinhardt
374b3ab03c swresample/x86/audio_convert: Remove remnants of MMX
Forgotten in 2b94f23b06.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-02-13 09:21:33 +01:00
Andreas Rheinhardt
d5a47bf2b3 swresample/x86/Makefile: Only compile ASM init files when X86ASM is enabled
To do so, simply add these init files to X86ASM-OBJS instead of OBJS
in the Makefile. The former is already used for the actual assembly
files, but using them for the C init files just works, because the build
system uses file extensions to derive whether it is a C or a NASM file.

This avoids compiling unused function stubs and also reduces our
reliance on DCE: We don't add %if checks to the asm files except
for AVX, AVX2, FMA3, FMA4, XOP and AVX512, so all the MMX-SSE4
functions will be available. It also allows to remove HAVE_X86ASM checks
in these init files.

(x86/ops.c has already been put in X86ASM-OBJS.)

Reviewed-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-30 22:20:13 +01:00
Kacper Michajłow
d6cb0d2c2b ALL: move av_unused to conform with standard requirement
This is required placement by standard [[maybe_unused]] attribute, works
the same for __attribute__((unused)).

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-09-26 16:15:46 +00:00
Andreas Rheinhardt
fe06d5533f swresample/x86/rematrix_init: Avoid allocation for native_simd_one
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-09-15 17:52:22 +02:00
Andreas Rheinhardt
853229d140 swresample/rematrix: Avoid allocation for native_one
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-09-15 17:52:22 +02:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
James Almer
223c70cf1d swresample/swresample: add a used channel layout option using the new API
Replaces the "used channel count" option, which is now deprecated.

Signed-off-by: James Almer <jamrial@gmail.com>
2023-02-19 18:28:45 -03:00
Lynne
bbe95f7353 x86: replace explicit REP_RETs with RETs
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.

The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.

In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
2023-02-01 04:23:55 +01:00
Andreas Rheinhardt
dd61d6489b swresample/x86/resample: Remove obsolete MMXEXT functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from the MMXEXT resamplers (which are overridden by SSE2)
are truely ancient 32bit x86s they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-14 01:28:29 +02:00
Andreas Rheinhardt
4e51e48ebd swresample/x86/rematrix: Remove obsolete MMX functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-14 01:28:29 +02:00
Andreas Rheinhardt
2b94f23b06 swresample/x86/audio_convert: Remove obsolete MMX functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-14 01:28:29 +02:00
Andreas Rheinhardt
1ea3650823 Replace all occurences of av_mallocz_array() by av_calloc()
They do the same.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-09-20 01:03:52 +02:00
Andreas Rheinhardt
f3c197b129 Include attributes.h directly
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2021-04-19 14:34:10 +02:00
Marcin Gorzel
8b710ea5e7 swresample: Use channel count in rematrix initialization
Rematrixing supports up to 64 channels. However, there is only a limited number of channel layouts defined. Since the in/out channel count is currently obtained from the channel layout, for undefined layouts (e.g. for 9, 10, 11 channels etc.) the rematrixing fails.

This patch changes rematrix init methods to use in (used) and out channel count directly instead of computing it from channel layout.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-07-26 02:42:42 +02:00
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
Muhammad Faiz
de1308429a swresample/x86/resample: extend resample_double to support avx and fma3
benchmark:
sse2 10.670s
avx   8.763s
fma3  8.380s

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2017-03-19 12:24:41 +07:00
Muhammad Faiz
06f94149c6 swresample/resample: optimize exact_rational=on:linear_interp=on case
separate dsp.resample to dsp.resample_common and dsp.resample_linear
and choose to call faster resample_common even when linear_interp=on
when c->frac and c->dst_incr_mod are both zero

speed up resampling when exact_rational and linear_interp are both
enabled because exact_rational force c->frac and c->dst_incr_mod to
be zero when soft compensation does not happen

benchmark on exact_rational=on:linear_interp=on
        old     new
real    8.432s  5.097s
user    7.679s  4.989s
sys     0.125s  0.107s

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-11-25 03:22:04 +07:00
Muhammad Faiz
6031e5d1af swresample/x86: add support for exact_rational
phase_shift and phase_mask is removed
generally exact_rational=on is faster than exact_rational=off

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-06-21 05:18:21 +07:00
Muhammad Faiz
b8c6e5a661 swresample: add exact_rational option
give high quality resampling
as good as with linear_interp=on
as fast as without linear_interp=on
tested visually with ffplay
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000, showcqt=gamma=5"
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:linear_interp=on, showcqt=gamma=5"
ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:exact_rational=on, showcqt=gamma=5"

slightly speed improvement
for fair comparison with -cpuflags 0
audio.wav is ~ 1 hour 44100 stereo 16bit wav file
ffmpeg -i audio.wav -af aresample=osr=48000 -f null -
        old         new
real    13.498s     13.121s
user    13.364s     12.987s
sys      0.131s      0.129s

linear_interp=on
        old         new
real    23.035s     23.050s
user    22.907s     22.917s
sys      0.119s     0.125s

exact_rational=on
real    12.418s
user    12.298s
sys      0.114s

possibility to decrease memory usage if soft compensation is ignored

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-06-13 12:36:01 +07:00
James Almer
70d685a77f x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
James Almer
acdd672506 x86/audio_convert: fix clobbering of xmm registers
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-01 22:40:50 -03:00
James Almer
5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
James Almer
f37a5dcb55 swresample/x86: add missing colon to labels
Silences warnings with Nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:51:13 -03:00
James Almer
c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
Michael Niedermayer
c0e3b46118 swresample: add av_cold to init functions
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-21 00:33:09 +01:00
James Almer
f7ed997a6d x86/swr: make pack_8ch functions work with compilers without aligned stack
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-15 13:57:37 -03:00
Michael Niedermayer
b74ecb82fa swresample/x86/rematrix_init: Check av_malloc* return codes, forward errors
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-09 10:15:56 +01:00
Michael Niedermayer
48ffaaaaef swresample/x86/rematrix_init: Use av_mallocz_array()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-09 10:15:56 +01:00
James Almer
59ac93f6af x86/swr: add SSE/AVX unpack_6ch functions
int32/float only

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-12 15:40:03 -03:00
James Almer
6abf00d615 x86/swr: load constants outside the loop in pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-01-11 01:11:46 -03:00
James Almer
975ff6a3c6 x86/swr: disable pack_8ch functions on msvc/icl x86_32
Until a proper fix is committed.

Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 16:38:33 -03:00
James Almer
5f14f9e984 x86/swr: add missing alignment check to pack_6ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 13:35:11 -03:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
James Almer
edff061fb0 x86/swr: add ff_float_to_int32_a_avx2
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-07 15:01:35 -03:00
James Almer
b385c4c6a3 x86/swr: replace sse4 instructions in pack_6ch with sse ones
There's no benefit from using blendps here except on CPUs with AVX, where
it's faster than shufps according to Intel's documentation.
As such, rename the sse4 functions to sse/sse2 and use shufps instead.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-11-06 20:54:00 -03:00
James Almer
9937362c54 x86/swr: use lavu helper macros to check CPU extensions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 02:12:16 +02:00
James Almer
8279a15284 x86/swr: split audioconvert and rematrix DSP into separate files
Also rename resample_x86_dsp.c to resample_init.c

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 02:00:11 +02:00
James Almer
857cd1f33b swr: initialize only the necessary resample dsp functions
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-04 01:37:41 +02:00
James Almer
b5f0eac068 swr: rename swresample_dsp init functions to swri_resample_dsp
The swresample_ prefix is not for internal functions

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 13:18:30 +02:00
James Almer
c45b7f0d80 x86/swr: add ff_resample_{common, linear}_int16_xop
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:11:20 +02:00
James Almer
1a69224f44 x86/swr: add ff_resample_{common, linear}_float_fma
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-02 01:09:53 +02:00
James Almer
dd2c9034b1 x86/swr: convert resample_{common, linear}_double_sse2 to yasm
Signed-off-by: James Almer <jamrial@gmail.com>

312531 -> 311528 dezicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 17:57:36 +02:00
Ronald S. Bultje
847bb638c0 swr: convert resample_common/linear_int16_mmx2/sse2 to yasm.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-30 20:11:50 +02:00
Ronald S. Bultje
faa1471ffc swr: rewrite resample_common/linear_float_sse/avx in yasm.
Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm)
cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/
sample on 32bit. Bon-linear goes from 43 (llvm) or 38 (gcc) to
32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to
38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm
5.1 and gcc 4.8/9).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-28 17:06:47 +02:00
Ronald S. Bultje
083cd3d1f7 swr: compile mmx2 s16p functions only on x86-32.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:34:53 +02:00
James Almer
7f4dfbd080 swr: add prototypes for resample dsp functions
Should fix compilation failures with MSVC and any other compiler
without inline asm support.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 01:33:17 +02:00
Ronald S. Bultje
ada8f9c046 swr: remove obsolete function prototypes.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 00:07:25 +02:00
Ronald S. Bultje
7128a35f8c swr: split out DSP functions.
DSP bits of swri_resample go into their own mini-DSP functions; DSP
init goes from a per-call branch in multiple_resample to a proper
DSP init routine; x86 bits go into x86/; swri_resample() moves out of
resample_template.c into resample.c because it's independent of DSP
code or sample type; multiple_resample() is simplified.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-14 20:21:39 +02:00
James Almer
a9bf713d35 swresample: add swri_resample_float_avx
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-16 05:27:03 +02:00