avcodec/x86/lossless_videodsp: Avoid aligned/unaligned versions

For AVX2, movdqu is as fast as movdqa when used on aligned addresses,
so don't instantiate aligned/unaligned versions.

(The check was btw overly strict: The AVX2 code only uses 16-byte
stores, so it would be enough for dst to be 16-byte aligned.)
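
A minimal sketch of the parenthetical above, not taken from the FFmpeg
sources: it mirrors the "test reg, mmsize - 1" idiom from the diff in plain C.
The helper name addr_is_aligned is hypothetical, and mmsize is assumed to be
32 for the AVX2 instantiation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): returns 1 if addr is a
 * multiple of align, 0 otherwise. align must be a power of two.
 * This is the C equivalent of "test reg, align - 1" followed by a
 * branch on the zero flag. */
static int addr_is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Old dispatch condition, assuming mmsize == 32 under AVX2. */
static int passes_mmsize_check(uintptr_t dst)
{
    return addr_is_aligned(dst, 32);
}

/* Condition that would suffice when every store is 16 bytes wide. */
static int passes_16_byte_check(uintptr_t dst)
{
    return addr_is_aligned(dst, 16);
}
```

An address such as 48 is 16-byte aligned but not 32-byte aligned, so it fails
the mmsize check while satisfying the requirement of a 16-byte aligned store.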

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Andreas Rheinhardt
2025-12-18 23:15:12 +01:00
parent 6368d2baae
commit 9314d5cae8

@@ -151,6 +151,7 @@ cglobal add_left_pred_unaligned, 3,3,7, dst, src, w, left
 VBROADCASTI128 m3, [pb_zz11zz55zz99zzdd]
 movd xm0, leftm
 pslldq xm0, 15
+%if notcpuflag(avx2)
 test srcq, mmsize - 1
 jnz .src_unaligned
 test dstq, mmsize - 1
@@ -159,6 +160,7 @@ cglobal add_left_pred_unaligned, 3,3,7, dst, src, w, left
 .dst_unaligned:
 ADD_LEFT_LOOP u, a
 .src_unaligned:
+%endif
 ADD_LEFT_LOOP u, u
 %endmacro