The NEON sws_ops backend will use a build-time code generator for the
various operation functions it needs to implement. This build time code
generator (ops_asmgen) will need a list of the operations that must be
implemented. This commit adds a tool (sws_ops_aarch64) that generates
such a list (ops_entries.c).
The list is generated by iterating over all possible conversion
combinations and collecting the parameters for each NEON assembly
function that has to be implemented, defined by an unique set of
parameters derived from SwsOp. Whenever swscale evolves, with improved
optimization passes, new pixel formats, or improvements to the backend
itself, this file (ops_entries.c) should be regenerated by running:
$ make sws_ops_entries_aarch64
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
The existing fate-lavf-yuv420p.y4m covers only the default format.
Add four entries that pass -pix_fmt explicitly to the lavf_video
macro: yuv422p, yuv444p, yuv411p, and gray.
These exercise the branches in yuv4mpegpipe_write_header() that write
the "C422", "C444", "C411", and "Cmono" chroma descriptor strings in
the stream header. All four are gated on ENCDEC(RAWVIDEO,YUV4MPEGPIPE)
and added to FATE_LAVF_VIDEO_SCALE so they inherit the requirement for
CONFIG_SCALE_FILTER that lavf_video's -auto_conversion_filters needs.
Reference files were generated from the actual encoder output and
follow the md5+size+CRC format used by the other lavf references.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
Add tests/api/api-enc-parser-test.c, a generic encoder+parser round-trip
test that takes codec_name, width, and height on the command line
(defaults: h261 176 144).
Three cases are tested:
garbage - a single av_parser_parse2() call on 8 bytes with no Picture
Start Code; verifies out_size == 0 so the parser emits no spurious data.
bulk - encodes 2 frames, concatenates the raw packets, feeds the whole
buffer to a fresh parser in one call, then flushes. Verifies that
exactly 2 non-empty frames come out and that the parser found the PSC
boundary between them.
split - the same buffer fed in two halves (chunk boundary falls inside
frame 0). Verifies the parser still emits exactly 2 frames when input
arrives incrementally, and that the collected bytes are identical to
the bulk output (checked with memcmp).
Implementation notes: avcodec_get_supported_config() selects the pixel
format; chroma height uses AV_CEIL_RSHIFT with log2_chroma_h from
AVPixFmtDescriptor; data[1] and data[2] are checked independently so
semi-planar formats work; the encoded buffer is given
AV_INPUT_BUFFER_PADDING_SIZE zero bytes at the end; parse_stream()
skips the fed chunk if consumed==0 to prevent an infinite loop.
Two FATE entries in tests/fate/api.mak: QCIF (176x144) and CIF
(352x288), both standard H.261 resolutions.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
The original test only mapped the source file and printed its content,
exercising none of the error branches in av_file_map().
Replace it with a test that maps a real file (path via argv[1] for
out-of-tree builds) and verifies it is non-empty, then calls
av_file_map() on a nonexistent file twice: once with log_offset=0 to
confirm the error is logged at AV_LOG_ERROR, and once with log_offset=1
to confirm the level is raised by one, covering the
log_level_offset_offset path in av_vlog(). A custom av_log callback
captures the emitted level independently of the global log level.
The two error cases share a single for() loop to avoid duplication.
Add a FATE entry in tests/fate/libavutil.mak with CMP=null since
there is no fixed stdout to compare.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
This is now fully redundant with the previous op's output; because unused
components are always marked as garbage on the input side.
Signed-off-by: Niklas Haas <git@haasn.dev>
May allow more efficient implementations that rely on the value range being
constrained.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
Halfs the amount of pmaddwd and improves performance a lot:
sbc_analyze_4_c: 55.7 ( 1.00x)
sbc_analyze_4_mmx: 7.0 ( 7.94x)
sbc_analyze_4_sse2: 4.3 (12.93x)
sbc_analyze_8_c: 131.1 ( 1.00x)
sbc_analyze_8_mmx: 22.4 ( 5.84x)
sbc_analyze_8_sse2: 10.7 (12.25x)
It also saves 224B of .text and allows to remove the emms_c()
from sbcenc.c (notice that ff_sbc_calc_scalefactors_mmx()
issues emms on its own, so it already abides by the ABI).
Hint: A pshufd could be avoided per function if the constants
were reordered.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The three *_from_name() functions used av_strstart() for prefix matching,
which returns incorrect results when one name is a prefix of another.
av_stereo3d_from_name("side by side (quincunx subsampling)") matched
"side by side" at index 1 and returned AV_STEREO3D_SIDEBYSIDE instead of
AV_STEREO3D_SIDEBYSIDE_QUINCUNX. Similarly,
av_stereo3d_primary_eye_from_name("nonexistent") matched "none" and
returned AV_PRIMARY_EYE_NONE instead of -1.
Switch all three functions from av_strstart() to strcmp() for exact
matching. No in-tree callers rely on prefix matching.
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Demuxers like mov will export packets not meant for presentation (e.g. because
an edit list doesn't include them) by flagging them as discard, but the mov
muxer completely ignored this, resulting in output edit lists considering every
packet.
Fixes issue #22552
Signed-off-by: James Almer <jamrial@gmail.com>
Add enc_dec_pcm roundtrip tests for the pcm_bluray codec covering
mono, stereo, 5.1, 7.0, and 7.1 channel layouts in s16. The 5.1
and 7.0 tests use an explicit pan filter for channel layout
conversion so the PAN_FILTER dependency is declared only where
needed. An additional s32 test uses a FATE sample file with real
>16-bit content (divertimenti_2ch_96kHz_s24.wav) and decodes to
s32le to verify the full 32-bit round-trip.
enc_dec_pcm is used instead of transcode because the MPEGTS muxer
produces different binary output on 32-bit and 64-bit platforms,
causing the intermediate file checksum to fail on 32-bit CI.
Coverage for libavcodec/pcm-bluray.c: 0.00% -> 93.75%
Coverage for libavcodec/pcm-blurayenc.c: 0.00% -> 91.71%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Add a unit test covering av_stereo3d_alloc, av_stereo3d_alloc_size,
av_stereo3d_create_side_data, av_stereo3d_type_name,
av_stereo3d_from_name, av_stereo3d_view_name,
av_stereo3d_view_from_name, and av_stereo3d_primary_eye_name.
The from_name calls are driven by a static name table so each
string appears exactly once. Round-trip inverse checks verify
that type_name/from_name and view_name/view_from_name are
consistent with each other.
Coverage for libavutil/stereo3d.c: 0.00% -> 100.00%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Add a unit test covering alloc, create_side_data, and select
for AV1 and H.274 film grain parameter types (22 cases).
Coverage for libavutil/film_grain_params.c: 0.00% -> 97.73%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
It beats MMX by a lot, because it has to process eight words.
Also notice that the MMX code expects registers to be preserved
between separate inline assembly blocks which is not guaranteed;
the new code meanwhile does not presume this.
Benchmarks:
gmc_c: 817.8 ( 1.00x)
gmc_mmx: 210.7 ( 3.88x)
gmc_ssse3: 80.7 (10.14x)
The MMX version has been removed.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The expression ((8*(MAX_ARGS - 8) + 15) & ~15 + 16)
evaluates to zero on Apple platforms due to assembler operator
precedence differences. LLVM's integrated assembler uses different
precedence rules depending on the target:
unsigned AsmParser::getBinOpPrecedence(AsmToken::TokenKind K,
MCBinaryExpr::Opcode &Kind) {
bool ShouldUseLogicalShr = MAI.shouldUseLogicalShr();
return IsDarwin ? getDarwinBinOpPrecedence(K, Kind, ShouldUseLogicalShr)
: getGNUBinOpPrecedence(MAI, K, Kind, ShouldUseLogicalShr);
}
In Darwin mode (Apple targets), arithmetic operators (+, -) have
higher precedence than bitwise operators (&, |, ^), similar to C.
In GNU mode (ELF targets), bitwise operators have higher precedence
than arithmetic operators.
The specification for H.26{4,5,6} states that start codes may be three or four
bytes long long except for the first NALU in an AU, and for NALUs of parameter
set types, which must be four bytes long.
This is checked by ff_cbs_h2645_unit_requires_zero_byte(), which is made
available outside of CBS for this change.
Signed-off-by: James Almer <jamrial@gmail.com>
Based on the behaviour from cbs_h2645, which removes actual
trailing_zero_8bits bytes and possibly also work arounds issues in
ff_h2645_extract_rbsp(). In this case, the same issue could be
present in ff_nal_find_startcode().
Signed-off-by: James Almer <jamrial@gmail.com>
This should match the number of lines. As an aside, align these declarations.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
This change has no measurable impact on performance here;
it is intended to avoid unpredictable behavior with floating
point operation like the one that led to commit
57a29f2e7d.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is only used by the MPEG-2 encoder, so replace it
by a private option instead. Use a more elaborate term
for it: intra_dc_precision ("dc" could be anything).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
refs does not belong to AVCodecParameters, so require a decoder:
it should only be showed when frames are actually processed by ffprobe.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
Accept up to 15 ULP difference.
This fixes running "checkasm --test=ac3dsp <seed>" for the seeds
2043066705, 24168 and 111972 on ARM, and the seeds 40552 and
209754 on aarch64.
This is the same change as 8e4c904c8e,
increasing the tolerance further.
With this change, checkasm passes for over 500 000 seeds on both
ARM and aarch64.