Adapted from the corresponding me_cmp code.

Only the width 16 function has been adapted, because it seems that
the width 8 function actually reads 16 bytes per line.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>