This function is exported, so it has to abide by the ABI
and has therefore issued emms since commit
5b85ca5317. Yet emms is expensive,
and using SSE2 instead improves performance.
While at it, also avoid the initial zeroing and the last
pointer increment.
This removes the last usage of mmx from libavutil*.
Old benchmarks:
sad_8x8_0_c: 13.2 ( 1.00x)
sad_8x8_0_mmxext: 27.8 ( 0.48x)
sad_8x8_1_c: 13.2 ( 1.00x)
sad_8x8_1_mmxext: 27.6 ( 0.48x)
sad_8x8_2_c: 13.3 ( 1.00x)
sad_8x8_2_mmxext: 27.6 ( 0.48x)
New benchmarks:
sad_8x8_0_c: 13.3 ( 1.00x)
sad_8x8_0_sse2: 11.7 ( 1.13x)
sad_8x8_1_c: 13.8 ( 1.00x)
sad_8x8_1_sse2: 11.6 ( 1.20x)
sad_8x8_2_c: 13.2 ( 1.00x)
sad_8x8_2_sse2: 11.8 ( 1.12x)
Note: Using two psadbw, or one psadbw plus movhps, made no difference
in the benchmarks, so I chose the latter due to its smaller code size.
*: except if lavu provides avpriv_emms for other libraries
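For illustration only, the single-psadbw idea from the note above can be sketched with C intrinsics. This is not the assembly from this commit: the names sad_8x8_ref, sad_8x8_sketch, load_two_rows and sad_8x8_selfcheck are hypothetical, the (pointer, stride) signature is an assumption, and whether the compiler emits movhps or punpcklqdq for the two-row load is up to the compiler. Without SSE2 the sketch falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: sum of absolute differences over an 8x8 block. */
static int sad_8x8_ref(const uint8_t *src1, ptrdiff_t stride1,
                       const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

#if defined(__SSE2__)
/* Pack two consecutive 8-byte rows into one 128-bit register, so that a
 * single psadbw covers both rows (hypothetical helper). */
static __m128i load_two_rows(const uint8_t *p, ptrdiff_t stride)
{
    __m128i lo = _mm_loadl_epi64((const __m128i *)p);
    __m128i hi = _mm_loadl_epi64((const __m128i *)(p + stride));
    return _mm_unpacklo_epi64(lo, hi);
}

/* Hypothetical SSE2 sketch; uses no MMX registers, so no emms is needed. */
static int sad_8x8_sketch(const uint8_t *src1, ptrdiff_t stride1,
                          const uint8_t *src2, ptrdiff_t stride2)
{
    /* The first row pair initializes the accumulator: no zeroing. */
    __m128i acc = _mm_sad_epu8(load_two_rows(src1, stride1),
                               load_two_rows(src2, stride2));
    for (int i = 1; i < 4; i++) {
        /* Advance before use, so nothing is incremented after the last pair. */
        src1 += 2 * stride1;
        src2 += 2 * stride2;
        acc = _mm_add_epi32(acc,
                            _mm_sad_epu8(load_two_rows(src1, stride1),
                                         load_two_rows(src2, stride2)));
    }
    /* psadbw leaves one partial sum per 64-bit half; fold high into low. */
    acc = _mm_add_epi32(acc, _mm_srli_si128(acc, 8));
    return _mm_cvtsi128_si32(acc);
}
#else
/* Assumption: without SSE2, simply reuse the scalar reference. */
static int sad_8x8_sketch(const uint8_t *src1, ptrdiff_t stride1,
                          const uint8_t *src2, ptrdiff_t stride2)
{
    return sad_8x8_ref(src1, stride1, src2, stride2);
}
#endif

/* Compare the sketch against the reference on deterministic data. */
static void sad_8x8_selfcheck(void)
{
    uint8_t a[8 * 16], b[8 * 24];
    for (size_t i = 0; i < sizeof(a); i++)
        a[i] = (uint8_t)(i * 37 + 11);
    for (size_t i = 0; i < sizeof(b); i++)
        b[i] = (uint8_t)(i * 101 + 3);
    assert(sad_8x8_sketch(a, 16, b, 24) == sad_8x8_ref(a, 16, b, 24));
    assert(sad_8x8_sketch(a, 16, a, 16) == 0);
}
```

Calling sad_8x8_selfcheck() checks the sketch against the scalar loop with two different strides; it says nothing about performance, which the benchmarks above measure for the actual assembly.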
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>