Every container signals it as a video stream, and the data media
stream type is so ill-defined and so poorly supported across the
codebase that, even though no standalone decoder can be present for
it, the video type is still preferable.
This is technically an API break, but LCEVC support has been minimal
until now, so it should be safe.
Signed-off-by: James Almer <jamrial@gmail.com>
Some video codecs are not meant to output frames on their own, but to
be applied on top of frames generated by other codecs, as is the case
with LCEVC, Dolby Vision, etc. Add a codec prop to signal this kind of
codec, so that library users know not to expect a standalone decoder
for them.
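A library user could consume such a prop roughly as sketched below;
the flag name and values are illustrative stand-ins, not the actual
libavcodec API added by this commit:

```c
#include <stdint.h>

/* Illustrative stand-ins for AVCodecDescriptor.props bits; the real
 * flag name introduced by this commit may differ. */
#define PROP_INTRA_ONLY  (1 << 0)
#define PROP_ENHANCEMENT (1 << 4) /* hypothetical "no standalone decoder" prop */

/* A caller probing codec descriptors would skip codecs carrying the
 * enhancement prop instead of searching for a decoder. */
static int expects_standalone_decoder(uint64_t props)
{
    return !(props & PROP_ENHANCEMENT);
}
```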
Signed-off-by: James Almer <jamrial@gmail.com>
It is only used by the MPEG-2 encoder, so replace it with a private
option instead. Use a more descriptive name for it:
intra_dc_precision ("dc" could be anything).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This option may only be set for MPEG-2, so ignore it for all other
codecs and move its handling to mpeg12enc.c.
This is in preparation for deprecating the AVCodecContext option.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Without the specification, limiting the index is the best that can be done.
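The defensive pattern is sketched below in plain C; the names are
illustrative, not the actual latmdec code, and the real fix may reject
the value rather than clamp it:

```c
/* Limit an index read from the bitstream to the valid range before it
 * is used for array addressing, since the governing specification is
 * unavailable to validate it properly. */
static unsigned clamp_index(unsigned idx, unsigned max_streams)
{
    return idx < max_streams ? idx : max_streams - 1;
}
```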
Fixes: out of array access
Fixes: 487591441/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_LATM_fuzzer-6205915698364416
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A new Sequence Header or Temporal Delimiter OBU invalidates any
previous frame if it's not yet complete (as is the case with missing
Tile Groups). Similarly, a new Frame Header invalidates any ongoing
Tile Group parsing.
Fixes: out of array access
Fixes: av1dec_tile_desync.mp4
Fixes: av1dec_tile_desync_bypass.mp4
Signed-off-by: James Almer <jamrial@gmail.com>
Section 6.8.1 of the AV1 specification states:
"If obu_type is equal to OBU_REDUNDANT_FRAME_HEADER, it is a requirement of
bitstream conformance that SeenFrameHeader is equal to 1."
Leave the existing behavior in place for reading scenarios, as such a
file may still be readable.
Signed-off-by: James Almer <jamrial@gmail.com>
The value of vb_pos at vb_bottom, vb_above is known
at compile-time, so one can avoid the modifications
to vb_pos and just compare against immediates.
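The idea can be sketched in C; the constant below is illustrative, not
the actual VVC ALF boundary value:

```c
/* When the virtual-boundary position is known at compile time, compare
 * row indices against the constant directly (an immediate in asm)
 * instead of adjusting a runtime vb_pos variable first. */
enum { VB_POS = 4 }; /* illustrative compile-time boundary row */

static int at_or_below_boundary(int row)
{
    return row >= VB_POS; /* immediate compare; no vb_pos modification */
}
```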
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
They are completely unnecessary for the 8bit case (which only
handles 8bit) and overly complicated for the 10 and 12bit cases:
all one needs to do is set up the (1<<bpp)-1 vector register
and jmp from (say) the 12bpp function stub into the 10bpp
function. The way it is done here even allows sharing the
prologue between the two functions.
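The constant each stub has to set up is just the maximum pixel value
for the bit depth; a scalar sketch (the asm broadcasts this into a
vector register):

```c
#include <stdint.h>

/* The clipping maximum is (1 << bpp) - 1; a 12bpp entry point only
 * needs to load its own constant and can then reuse the 10bpp body. */
static uint16_t pixel_max(int bpp)
{
    return (uint16_t)((1 << bpp) - 1);
}
```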
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It can be combined with an earlier lea for the loop
processing 16 pixels at a time; it is unnecessary
for the tail, because the new values will be overwritten
immediately afterwards anyway.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The vvc_alf_filter functions don't use x86inc's stack management
feature at all; they merely push and pop some regs themselves.
So don't tell x86inc to provide stack space (which in this case
entails aligning the stack).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Each luma alf block has 2*12 auxiliary coefficients associated
with it that the alf_filter functions consume; the C version
simply increments the pointers.
The x64 dsp function meanwhile does things differently:
The vvc_alf_filter functions have three levels of loops.
The middle layer uses two counters, one of which is
just the horizontal offset xd in the current line. It is only
used for addressing these auxiliary coefficients, and
yet one needs to perform work to translate it into
the coefficient offset, namely a *3 via lea and a *2 scale.
Furthermore, the base pointers of the coefficients are incremented
in the outer loop; the stride used for this is calculated
in the C wrapper functions. Moreover, due to GPR pressure, xd
is reused as the loop counter for the innermost loop; the
xd from the middle loop is pushed to the stack.
Apart from the translation from horizontal offset to coefficient
offset, all of the above has also been done for chroma, although
the coefficient pointers are not modified for chroma at all.
This commit changes this to just increment the pointers
after reading the relevant coefficients.
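A scalar sketch of the simplification; the type, names and count
below mirror the 2*12 figure from the message but are otherwise
illustrative:

```c
/* Instead of recomputing a coefficient offset from the horizontal
 * position (a *3 via lea plus a *2 scale), simply advance the pointer
 * past the 2*12 auxiliary coefficients each luma ALF block consumes,
 * as the C version does. */
#define LUMA_COEFFS_PER_BLOCK (2 * 12)

static const short *next_block_coeffs(const short *coeffs)
{
    return coeffs + LUMA_COEFFS_PER_BLOCK;
}
```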
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The permutation that was applied before the write macro
is actually only beneficial when one has 16 entries to write,
so move it into the macro that writes 16 entries and optimize
the other macro.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
One always has eight samples when processing the luma remainder,
so xmm registers are sufficient for everything. In fact, this
simplifies loading the luma parameters.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When the width is known to be 8 (i.e. for luma that is not width 16),
the upper lane is unused, so use an xmm-sized packuswb and avoid
the vpermq altogether. For chroma not known to be of width 16 (i.e.
4, 8 or 12), defer extracting from the high lane until it is known to
be needed, and do so via vextracti128 instead of vpermq (also for
bpp>8).
Also use vextracti128 and an xmm-sized packuswb in case of width 16
instead of a ymm-sized packuswb followed by vextracti128.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
At the end of the height==8 codepath, a jump to RET at the end
of the height==16 codepath is performed. Yet the epilogue
is so cheap on Unix64 that this jump is not worthwhile.
For Win64 meanwhile, one can still avoid jumps, because
for width 16 >8bpp and width 8 8bpp content a jump is performed
to the end of the height==8 codepath, immediately followed
by a jump to RET. These two jumps can be combined into one.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Write them in assembly instead; this exchanges a call+ret
with a jmp and also avoids the stack for (1<<bpp)-1.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Both the 8bpp width 16 and >8bpp width 8 cases write
16 contiguous bytes; deduplicate writing them. In fact,
by putting this block of code at the end of the SAVE macro,
one can even save a jmp for the width 16 8bpp case
(without adversely affecting the other cases).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
For 8bpp width 8 content, an unnecessary jump was performed
for every write: First to the end of the SAVE_8BPC macro,
then to the end of the SAVE macro. This commit changes this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>