The major consequence of this is that we start allocating buffers per plane,
instead of allocating one contiguous buffer. This makes the no-op/refcopy
case slightly slower, but doesn't meaningfully affect the rest:
yuva444p -> yuva444p, time=157/1000 us (ref=78/1000 us), speedup=0.497x slower
Overall speedup=1.016x faster, min=0.983x max=1.092x
However, this is a necessary consequence of the desire to allow partial plane
allocations / single plane refcopies. This slowdown also does not affect
vf_scale, which already uses avfilter/framepool.c (via ff_get_video_buffer).
Signed-off-by: Niklas Haas <git@haasn.dev>