avcodec/x86/vp3dsp: Port ff_put_vp_no_rnd_pixels8_l2_mmx to SSE2

This allows pavgb to be used, which reduces the number of instructions
needed to calculate the average; processing two rows at once via movhps
reduces the number of pxor and pavgb instructions even further and
turned out to be beneficial.

This patch also avoids a load, as the constant used here can easily be
generated at runtime.

Old benchmarks:
put_no_rnd_pixels_l2_c:       13.3 ( 1.00x)
put_no_rnd_pixels_l2_mmx:     11.6 ( 1.15x)

New benchmarks:
put_no_rnd_pixels_l2_c:       13.4 ( 1.00x)
put_no_rnd_pixels_l2_sse2:     7.5 ( 1.77x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
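
Note on the arithmetic: pavgb averages with rounding up, (a + b + 1) >> 1, while this function needs the truncating average, (a + b) >> 1. A common way to get the latter from pavgb is to complement both inputs with pxor, average, and complement the result. The scalar C model below is only a sketch of that identity for a single byte lane. It assumes the runtime-generated constant is an all-ones mask (for instance from pcmpeqb), which the commit message does not state, and the helper names are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of pavgb: average rounded up. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Truncating ("no rounding") average: complement, pavgb, complement.
 * `ones` models an all-ones register; assuming it is what the patch
 * generates at runtime is a guess, not something the commit states. */
static uint8_t avg_no_rnd_lane(uint8_t a, uint8_t b)
{
    const uint8_t ones = 0xFF;
    return (uint8_t)(pavgb_lane(a ^ ones, b ^ ones) ^ ones);
}

/* Exhaustively compare against the plain truncating average. */
static void check_all_pairs(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            assert(avg_no_rnd_lane((uint8_t)a, (uint8_t)b) == ((a + b) >> 1));
}
```

In this model the complement costs one pxor per input register and one for the result, which is consistent with the remark that handling two rows per register reduces the pxor and pavgb count.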
2026-04-20 21:00:41 +08:00 · 2026-04-12 22:39:04 +02:00
parent 37bc3a237b
commit 84b9de0633
3 changed files with 28 additions and 36 deletions
--- a/tests/checkasm/vp3dsp.c
+++ b/tests/checkasm/vp3dsp.c
@@ -47,7 +47,7 @@ static void vp3_check_put_no_rnd_pixels_l2(const VP3DSPContext *const vp3dsp)
        BUF_SIZE     = MAX_STRIDE * (HEIGHT - 1) + WIDTH,
        SRC_BUF_SIZE = BUF_SIZE + (WIDTH - 1), ///< WIDTH-1 to use misaligned input
    };
-    declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst,
+    declare_func(void, uint8_t *dst,
                 const uint8_t *a, const uint8_t *b,
                 ptrdiff_t stride, int h);