This fixes an ABI violation, as mul_thrmat did not issue emms.
It seems that this ABI violation could reach the user, namely
if ff_get_video_buffer() fails. Notice that ff_get_video_buffer()
itself could fail because of this, namely if the allocator uses
floating point registers.
On x64 (where GCC already used SSE2 in the C version)
mul_thrmat_c: 4.4 ( 1.00x)
mul_thrmat_mmx: 8.6 ( 0.52x)
mul_thrmat_sse2: 4.4 ( 1.00x)
On 32bit (where SSE2 is not known to be available):
mul_thrmat_c: 56.0 ( 1.00x)
mul_thrmat_sse2: 6.0 ( 9.40x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This reverts commit 7b18eafabd.
That commit added tests that don't work on Windows, and which
also fail in setups with cross/remote testing (with --target-exec
and --target-path).
See https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20876 for more
discussions about issues with that commit.
The frei0r API expects the time in seconds, but was given it in
milliseconds. The bug might exist since 41f1d3a (~14 years ago),
but plugins depending on the time are unwatchable without this
patch. For example:
ffmpeg -filter_complex "testsrc2=d=5,frei0r=distort0r" out.mp4
Signed-off-by: Stefan Breunig <stefan-ffmpeg-devel@breunig.xyz>
384fe39623 introduced a regression in the
range conversion offset calculation, resulting in a slight green tint
in full-range RGB to YUV conversions of grayscale values.
The offset being calculated was not taking into consideration a bias
needed for correctly rounding the result from the multiplication stage,
leading to a truncated value.
Fixes issue #11646.
This one takes about 2.93s on my machine, but ensures that every pixel
format conversion roundtrips correctly. Note that due to existing bugs in
libswscale, this one only passes when using the new format conversion code.
Restrict the test to -v 16 (AV_LOG_ERROR) to avoid excess amounts of output.
This makes the final file truly hybrid: Externally the file
is a regular, non-fragmented file, but internally, the fragmented
form also exists un-overwritten.
To make any use of that, first, the fragments need to be muxed in
a position independent form, i.e. with empty_moov+default_base_moof
(or the dash or cmaf meta-flags).
Making use of the fragmented form when the file is finalized is
not entirely obvious though. One can dump the contents of the
single mdat box, and get the fragmented form. (This is a neat
trick, but not something that anybody really is expected to
want to do.)
The main expected use case is accessing fragments in the form of
byte range segments, for e.g. HLS.
Previously, the start of the file would look like this:
- ftyp
- free
- moov
- (moov contents)
After finalizing the file, it would look like this:
- ftyp
- free
- mdat (previously moov)
- (moov contents)
In this form, the size and type of the original moov box were
overwritten, and the original moov contents is just leftover
as unused data in the mdat box.
To avoid this issue, the start of the file now looks like this:
- ftyp
- free
- free
- ftyp
- moov
- (moov contents)
The second, hidden ftyp box inside mdat, would normally never be
seen.
After finalizing, the difference is that the mdat box now is
extended to cover the ftyp and the whole moov including its header
(and all the following fragments).
I.e., the start of the file looks like this:
- ftyp
- free
- mdat
- ftyp
- moov
- (moov contents)
This allows accessing the "ftyp+moov" pair sequentially as such,
with a byte range - this range is untouched when finalizing,
producing the same ftyp+moov pair both while writing, when the
file is fragmented, and after finalizing, when the file is
transformed to non-fragmented externally.
Note; the sequential two "free+free" boxes may look slightly
silly; it could be tempting to make the second one an mdat
from the get-go. However, some players of fragmented mp4 (in
particular, Apple's HLS player) bail out if the initialization
segment contains an mdat box - therefore, use a free box.
It could also be possible to use just one single free box with
8 bytes of padding at the start - but that would require more
changes to the finalization logic.
For a segmenting user of the muxer, the only unclarity is how
to determine the right byte range for the internal ftyp+moov
pair. Currently, this requires parsing the muxer output and skip
past anything up to the start of the non-empty free box.
It is only used by mpegvideo decoders (for lowres). It is also only used
for bitdepth == 8, so don't build the bitdepth == 16 function at all any
more.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The drawvg filter can draw vector graphics on top of a video, using libcairo. It
is enabled if FFmpeg is configured with `--enable-cairo`.
The language for drawvg scripts is documented in `doc/drawvg-reference.texi`.
There are two new tests:
- `fate-filter-drawvg-interpreter` launch a script with most commands, and
verify which libcairo functions are executed.
- `fate-filter-drawvg-video` render a very simple image, just to verify that
libcairo is working as expected.
Signed-off-by: Ayose <ayosec@gmail.com>
fate-lavf-flv is dependency of fate-api-seek, but the fate-api-seek is
setting CMP = null, which affects fate-lavf-flv. Reset CMP value to
expected by the flv test and use default diff compare.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This reverts commit 301141b576.
cluster[0].dts, pts and frag_info[0].time are already in presentation
timeline, so they shouldn't be shift by start_pts.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Improves performance and no longer breaks the ABI (by forgetting
to call emms).
Old benchmarks:
add_8x8basis_c: 43.6 ( 1.00x)
add_8x8basis_ssse3: 12.3 ( 3.55x)
New benchmarks:
add_8x8basis_c: 43.0 ( 1.00x)
add_8x8basis_ssse3: 6.3 ( 6.79x)
Notice that the output of try_8x8basis_ssse3 changes a bit:
Before this commit, it computes certain values and adds the values
for i,i+1,i+4 and i+5 before right shifting them; now it adds
the values for i,i+1,i+8,i+9. The second pair in these lists
could be avoided (by shifting xmm0 and xmm1 before adding both together
instead of only shifting xmm0 after adding them), but the former
i,i+1 is inherent in using pmaddwd. This is the reason that this
function is not bitexact.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Check only on arches that need said check.
(Btw: I do not see how h_loop_filter benefits from alignment
at all and why h_loop_filter_unaligned exists.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old code operated on bytes and did lots of tricks
due to their limited range; it did not completely succeed,
which is why the old versions were not used when bitexact
output was requested.
In contrast, the new version is much simpler: It operates
on signed 16 bit words whose range is more than sufficient.
This means that these functions don't need a check for bitexactness
(and can be used in FATE).
Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been
removed from checkasm):
h_loop_filter_c: 29.8 ( 1.00x)
h_loop_filter_mmxext: 32.2 ( 0.93x)
h_loop_filter_unaligned_c: 29.9 ( 1.00x)
h_loop_filter_unaligned_mmxext: 31.4 ( 0.95x)
v_loop_filter_c: 39.3 ( 1.00x)
v_loop_filter_mmxext: 14.2 ( 2.78x)
v_loop_filter_unaligned_c: 38.9 ( 1.00x)
v_loop_filter_unaligned_mmxext: 14.3 ( 2.72x)
New benchmarks:
h_loop_filter_c: 29.2 ( 1.00x)
h_loop_filter_sse2: 28.6 ( 1.02x)
h_loop_filter_unaligned_c: 29.0 ( 1.00x)
h_loop_filter_unaligned_sse2: 26.9 ( 1.08x)
v_loop_filter_c: 38.3 ( 1.00x)
v_loop_filter_sse2: 11.0 ( 3.47x)
v_loop_filter_unaligned_c: 35.5 ( 1.00x)
v_loop_filter_unaligned_sse2: 11.2 ( 3.18x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Blocksize 2 is Snow-only, so move all the code pertaining
to it to snow.c. Also make the put array in H264QpelContext
smaller -- it only needs three sets of 16 function pointers.
This continues 6eb8bc4217
and b0c91c2fba.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The muxed subtitle is created by libavformat, and as such that's what should be
reported. This is in line with the string we write for every other muxer.
After this change, the muxer will no longer be recompiled every time a commit
is made.
Signed-off-by: James Almer <jamrial@gmail.com>
If the information is coded at the container level, then that's what should be
exported. The user will still have access to values coded at the bitstream
level by firing a decoder.
Fixes issue #20121
Signed-off-by: James Almer <jamrial@gmail.com>
The 2x2 put functions are only used by Snow and Snow uses
only the eight bit versions. The rest is dead code. Disabling
it saved 41277B here.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>