Files
FFmpeg/libavcodec/ac3enc.c
Andreas Rheinhardt d91b1559e0 avcodec/x86/me_cmp: Replace MMXEXT size 16 funcs by unaligned SSE2 funcs
Snow calls some of the me_cmp_funcs with insufficient alignment
for the first pointer (see get_block_rd() in snowenc.c);
therefore SSE2 functions which really need this alignment
don't get set for Snow and 542765ce3e
consequently didn't remove MMXEXT functions which are overridden
by these SSE2 functions for normal codecs.

For reference, here is a command line which would segfault
if one simply used the ordinary SSE2 functions for Snow:

./ffmpeg -i mm-short.mpg -an -vcodec snow -t 0.2 -pix_fmt yuv444p \
-vstrict -2 -qscale 2 -flags +qpel -motion_est iter 444iter.avi

This commit adds unaligned SSE2 versions of these functions
and removes the MMXEXT ones. This in particular implies that
sad 16x16 now never uses MMX which allows to remove an emms_c
from ac3enc.c.

Benchmarks (u means unaligned version):
sad_0_c:                                                 8.2 ( 1.00x)
sad_0_mmxext:                                           10.8 ( 0.76x)
sad_0_sse2:                                              6.2 ( 1.33x)
sad_0_sse2u:                                             6.7 ( 1.23x)

vsad_0_c:                                               44.7 ( 1.00x)
vsad_0_mmxext (approx):                                 12.2 ( 3.68x)
vsad_0_sse2 (approx):                                    7.8 ( 5.75x)

vsad_4_c:                                               88.4 ( 1.00x)
vsad_4_mmxext:                                           7.1 (12.46x)
vsad_4_sse2:                                             4.2 (21.15x)
vsad_4_sse2u:                                            5.5 (15.96x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-17 13:05:07 +02:00

88 KiB