mirror of
https://mirror.skon.top/https://github.com/FFmpeg/FFmpeg
synced 2026-04-20 21:00:41 +08:00
avcodec/x86/vp3dsp: Port ff_put_vp_no_rnd_pixels8_l2_mmx to SSE2
This allows to use pavgb to reduce the amount of instructions used to calculate the average; processing two rows via movhps allows to reduce the amount of pxor and pavgb even further and turned out to be beneficial. This patch also avoids a load as the constant used here can be easily generated at runtime. Old benchmarks: put_no_rnd_pixels_l2_c: 13.3 ( 1.00x) put_no_rnd_pixels_l2_mmx: 11.6 ( 1.15x) New benchmarks: put_no_rnd_pixels_l2_c: 13.4 ( 1.00x) put_no_rnd_pixels_l2_sse2: 7.5 ( 1.77x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This commit is contained in:
@@ -47,7 +47,7 @@ static void vp3_check_put_no_rnd_pixels_l2(const VP3DSPContext *const vp3dsp)
|
||||
BUF_SIZE = MAX_STRIDE * (HEIGHT - 1) + WIDTH,
|
||||
SRC_BUF_SIZE = BUF_SIZE + (WIDTH - 1), ///< WIDTH-1 to use misaligned input
|
||||
};
|
||||
declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst,
|
||||
declare_func(void, uint8_t *dst,
|
||||
const uint8_t *a, const uint8_t *b,
|
||||
ptrdiff_t stride, int h);
|
||||
|
||||
|
||||
Reference in New Issue
Block a user