This adds a NEON-optimized function for computing 32x32 Sum of Absolute
Differences (SAD) on AArch64, addressing a gap where x86 had SSE2/AVX2
implementations but AArch64 lacked equivalent coverage.
The implementation mirrors the existing sad8 and sad16 NEON functions,
employing a 4-row unrolled loop with UABAL and UABAL2 instructions for
efficient load-compute interleaving, and four 8x16-bit accumulators to
handle the wider 32-byte rows.
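For reference, the quantity being computed can be sketched in scalar C. This is only an illustration, not the checkasm reference or the NEON code itself: the name `sad_32x32_ref` and its signature are assumptions, and the four partial sums only loosely mirror the four accumulators described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of |a - b| over a 32x32 block.
 * The name and signature are illustrative, not the actual FFmpeg API. */
int sad_32x32_ref(const uint8_t *a, ptrdiff_t a_stride,
                  const uint8_t *b, ptrdiff_t b_stride)
{
    /* Four partial sums, one per 8-byte column group of a 32-byte row. */
    unsigned acc[4] = { 0, 0, 0, 0 };

    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 32; x++)
            acc[x / 8] += (unsigned)abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    /* The maximum is 32 * 32 * 255 = 261120, which fits in an int. */
    return (int)(acc[0] + acc[1] + acc[2] + acc[3]);
}
```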
Benchmarks on AWS Graviton3 (Neoverse V1, c7g.xlarge) using checkasm:
sad_32x32_0: C 146.4 cycles -> NEON 98.1 cycles (1.49x speedup)
sad_32x32_1: C 141.4 cycles -> NEON 98.9 cycles (1.43x speedup)
sad_32x32_2: C 140.7 cycles -> NEON 95.0 cycles (1.48x speedup)
Signed-off-by: Jeongkeun Kim <variety0724@gmail.com>
This function is exported, so has to abide by the ABI
and therefore issues emms since commit
5b85ca5317. Yet this is
expensive and using SSE2 instead improves performance.
Also avoid the initial zeroing and the last pointer
increment while just at it.
This removes the last usage of mmx from libavutil*.
Old benchmarks:
sad_8x8_0_c: 13.2 ( 1.00x)
sad_8x8_0_mmxext: 27.8 ( 0.48x)
sad_8x8_1_c: 13.2 ( 1.00x)
sad_8x8_1_mmxext: 27.6 ( 0.48x)
sad_8x8_2_c: 13.3 ( 1.00x)
sad_8x8_2_mmxext: 27.6 ( 0.48x)
New benchmarks:
sad_8x8_0_c: 13.3 ( 1.00x)
sad_8x8_0_sse2: 11.7 ( 1.13x)
sad_8x8_1_c: 13.8 ( 1.00x)
sad_8x8_1_sse2: 11.6 ( 1.20x)
sad_8x8_2_c: 13.2 ( 1.00x)
sad_8x8_2_sse2: 11.8 ( 1.12x)
Hint: Using two psadbw or one psadbw and movhps made no difference
in the benchmarks, so I chose the latter due to smaller codesize.
*: except if lavu provides avpriv_emms for other libraries
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The multiplanar image with storage_bit enabled fails to be exported
to DMA-BUF on the QCOM turnip driver, thus triggering this double-free issue.
```
[Parsed_hwmap_2 @ 0xffff5c002a70] Configure hwmap vulkan -> drm_prime.
[hwmap @ 0xffff5c001180] Filter input: vulkan, 1920x1080 (0).
[AVHWFramesContext @ 0xffff5c004e00] Unable to export the image as a FD!
free(): double free detected in tcache 2
Aborted
```
Additionally, add back an av_unused attribute. Otherwise, the compiler
will complain about unused variables when CUDA is not enabled.
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
ff_vk_find_struct returns const void *, so storing it in const void *drm_create_pnext
fixes the initialization warning but then dpb_hwfc->create_pnext = drm_create_pnext
assigns const void * to void *, triggering the same warning at that line. The right
fix is a (void *) cast at the call site, same as done for buf_pnext.
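The qualifier issue described above can be reproduced in a few lines. This is a minimal sketch with stand-in names (`ExampleFramesContext`, `example_find_struct`, `example_set_pnext` are all hypothetical); it does not use the real Vulkan or FFmpeg types.

```c
#include <stddef.h>

/* Hypothetical stand-in; this is not the real frames context. */
typedef struct ExampleFramesContext {
    void *create_pnext;   /* non-const field, like the one being assigned */
} ExampleFramesContext;

/* Stand-in for a lookup helper that returns a const-qualified pointer. */
const void *example_find_struct(const void *chain)
{
    return chain;
}

void example_set_pnext(ExampleFramesContext *ctx, const void *chain)
{
    /* Assigning the const void * result directly would discard the
     * qualifier and trigger a warning; cast once at the call site. */
    ctx->create_pnext = (void *)example_find_struct(chain);
}
```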
Also restrict the GetPhysicalDeviceImageFormatProperties2 verbose log in
try_export_flags to the DRM modifier path only: when has_mods is false the log
always printed mod[0]=0x0, which is misleading since no DRM modifier is involved.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
When mapping Vulkan Video frames to DMA-BUF, synchronize using an exportable
binary semaphore and sync_fd where supported. Submit a lightweight exec that
waits on each plane's timeline semaphore at the current value, signals a
SYNC_FD-exportable binary semaphore, then export with vkGetSemaphoreFdKHR.
Store that binary semaphore in AVVkFrameInternal and reuse it across maps
instead of creating and destroying each time: for
VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT, copy transference means a
successful vkGetSemaphoreFdKHR unsignals the semaphore like a wait, so it can
be signaled again on the next map submit. If export is unavailable, fall back
to vkWaitSemaphores.
Move the drm_sync_sem destruction to vulkan_free_internal.
Export dma-buf fds with GetMemoryFdKHR for each populated f->mem[i], iterating
up to the sw_format plane count instead of stopping at the image count, so
multi-memory bindings are not skipped. Describe DRM layers using
max(sw planes, image count) and query subresource layout with the correct
aspect and image index when one VkImage backs multiple planes. Reference the
source hw_frames_ctx on the mapped frame and close dma-buf fds on failure paths.
For DMA-BUF-capable pools, honor VK_EXTERNAL_MEMORY_FEATURE_DEDICATED_ONLY_BIT
from format export queries when binding memory. With DRM modifiers and a
video profile in create_pnext, preserve caller usage and image flags instead of
overwriting them from generic supported_usage probing; use the modifier list
create info when probing export flags for modifier tiling.
Include VK_IMAGE_USAGE_VIDEO_DECODE_DPB_BIT_KHR from the output frames
context's usage together with DST (fixes
VUID-VkVideoBeginCodingInfoKHR-slotIndex-07245) instead of adding DPB usage
only when !is_current.
In ff_vk_decode_add_slice, pass VkVideoProfileListInfoKHR (from the output
frames context's create_pnext) as the pNext argument to
ff_vk_get_pooled_buffer instead of the full create_pnext chain. In
ff_vk_frame_params, set tiling to OPTIMAL only when it is not already
DRM_FORMAT_MODIFIER_EXT. In ff_vk_decode_init, when the output pool's
create_pnext includes VkImageDrmFormatModifierListCreateInfoEXT, initialize the
DPB pool with that modifier-list pNext and DRM_FORMAT_MODIFIER_EXT tiling;
otherwise use VkVideoProfileListInfoKHR and OPTIMAL as before. When
VK_VIDEO_DECODE_CAPABILITY_DPB_AND_OUTPUT_DISTINCT_BIT_KHR is unset, the output
and DPB pools cannot use different layouts or tiling, so the DPB pool must
match the output pool.
Also fix av_hwframe_map ioctl sync_fd export, multi-planar semaphore handling,
and related failure-path cleanup.
Signed-off-by: Tymur Boiko <tboiko@nvidia.com>
SMPTE-2094-50 is an upcoming standard that is close to being
finalized.
Define a side data type for carrying this metadata, and add
functions for parsing and writing it. This is very similar to
the handling of HDR10+ metadata.
The spec is available here: https://github.com/SMPTE/st2094-50
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Test the integer math utility functions: av_gcd, av_rescale,
av_rescale_rnd (all rounding modes including PASS_MINMAX),
av_rescale_q, av_compare_ts, av_compare_mod, av_rescale_delta,
and av_add_stable. Includes large-value tests that exercise the
128-bit multiply path in av_rescale_rnd.
av_bessel_i0 is not tested since it uses floating point math
that is not bitexact across platforms.
Coverage for libavutil/mathematics.c: 0.00% -> 82.03%
Remaining uncovered lines are av_bessel_i0 (float, 23 lines)
and one edge case fallback in av_rescale_delta.
Test all public API functions: name/format round-trip lookups,
bytes_per_sample, is_planar, packed/planar conversions,
alt_sample_fmt, get_sample_fmt_string, samples_get_buffer_size,
samples_alloc, samples_alloc_array_and_samples, samples_copy,
and samples_set_silence. OOM error paths are exercised via
av_max_alloc().
Coverage for libavutil/samplefmt.c: 0.00% -> 95.28%
Remaining uncovered lines are the fill_arrays failure path
and the overlapping memmove branch in samples_copy.
Test the three public API functions: av_rc4_alloc, av_rc4_init,
and av_rc4_crypt. Verifies keystream output against RFC 6229
test vectors for 40, 56, 64, and 128-bit keys, encrypt/decrypt
round-trip, inplace operation, and the invalid key_bits error path.
Coverage for libavutil/rc4.c: 0.00% -> 100.00%
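The round-trip and in-place properties checked by the test follow from RC4 being a plain XOR with a keystream. A minimal stand-alone sketch of the algorithm is shown below; it is not the libavutil implementation, the `ExampleRC4` type and function names are hypothetical, and unlike av_rc4_init it takes the key length in bytes rather than bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone RC4 state; not the libavutil AVRC4 struct. */
typedef struct ExampleRC4 {
    uint8_t s[256];
    uint8_t i, j;
} ExampleRC4;

/* Key-scheduling algorithm (KSA); key_len is in bytes, 1..256. */
void example_rc4_init(ExampleRC4 *r, const uint8_t *key, size_t key_len)
{
    uint8_t j = 0;

    assert(key_len >= 1 && key_len <= 256);
    for (int n = 0; n < 256; n++)
        r->s[n] = (uint8_t)n;
    for (int n = 0; n < 256; n++) {
        uint8_t t;
        j = (uint8_t)(j + r->s[n] + key[(size_t)n % key_len]);
        t = r->s[n]; r->s[n] = r->s[j]; r->s[j] = t;
    }
    r->i = r->j = 0;
}

/* XOR src with the keystream into dst; dst may equal src (in-place). */
void example_rc4_crypt(ExampleRC4 *r, uint8_t *dst, const uint8_t *src,
                       size_t len)
{
    for (size_t n = 0; n < len; n++) {
        uint8_t t;
        r->i = (uint8_t)(r->i + 1);
        r->j = (uint8_t)(r->j + r->s[r->i]);
        t = r->s[r->i]; r->s[r->i] = r->s[r->j]; r->s[r->j] = t;
        dst[n] = (uint8_t)(src[n] ^ r->s[(uint8_t)(r->s[r->i] + r->s[r->j])]);
    }
}
```

Encrypting a buffer and then running the same call again after re-initializing with the same key should restore the original bytes, which is the round-trip property the test relies on.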
The main issue is that BGR formats only semi-exist in Vulkan. Unlike all
other formats, they require the user to manually remap the pixel order, and
are also forbidden from being written to without a format in shaders. The main
reason for this is conservatism: Vulkan is supposed to work everywhere, including
platforms where there is no write-time remapping/swizzling support.
Sponsored-by: Sovereign Tech Fund
We currently don't have any cases where this is needed, but include
it for completeness and clarity.
These macros for BTI were added in
08b4716a9e.
A later comment in this file, added in
248986a0db, referenced the macro
AARCH64_VALID_JUMP_CALL_TARGET, which had never been added here.
Unit test covering av_video_enc_params_alloc,
av_video_enc_params_block, and
av_video_enc_params_create_side_data.
Tests allocation for all three codec types (VP9, H264, MPEG2) and
the NONE type, with 0 and 4 blocks, with and without size output.
Verifies block getter indexing by writing and reading back
coordinates, dimensions, and delta_qp values. Tests frame-level qp
and delta_qp fields, and side data creation with frame attachment.
Coverage for libavutil/video_enc_params.c: 0.00% -> 86.21%
(remaining uncovered lines are OOM error paths)
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Unit test covering av_detection_bbox_alloc, av_get_detection_bbox,
and av_detection_bbox_create_side_data.
Tests allocation with 0, 1, and 4 bounding boxes, with and without
size output. Verifies bbox getter indexing by writing and reading
back coordinates, labels, and confidence values. Tests classify
fields (labels and confidences), the header source field, and
side data creation with frame attachment.
Coverage for libavutil/detection_bbox.c: 0.00% -> 86.67%
(remaining uncovered lines are OOM error paths)
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Unit test covering all 4 public API functions in libavutil/spherical.c:
av_spherical_alloc, av_spherical_projection_name, av_spherical_from_name,
and av_spherical_tile_bounds.
Tests allocation with and without size output, all 7 projection type
name lookups, projection name round-trip verification, out-of-range
handling, and tile bounds computation for full-frame, quarter-tile,
and centered-tile configurations.
Coverage for libavutil/spherical.c: 0.00% -> 100.00%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
The function macro emits AARCH64_VALID_CALL_TARGET for exported symbols,
marking them as valid destinations for indirect _calls_. Functions that
are reached by indirect _branches_ (i.e. tail-call dispatch chains
where the link register is not set) require AARCH64_VALID_JUMP_TARGET
instead.
This commit adds a "jumpable" parameter to the function macro that, when
set, emits AARCH64_VALID_JUMP_TARGET instead of AARCH64_VALID_CALL_TARGET.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
Using AMF interfaces in C can be cumbersome and visually difficult to process in some cases, e.g. object->function(object, args). To improve code readability, a new macro is added. This commit is instrumental for future AMF integration refactoring.
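The calling pattern and the kind of macro that can hide it might look roughly like this. The commit message does not name the macro, so `EXAMPLE_CALL`, `EXAMPLE_CALL0` and the `ExampleCounter` interface are all hypothetical; this is a sketch of the idea, not the AMF headers.

```c
/* Hypothetical C-style interface: each method takes the object itself
 * as its first argument, as in object->function(object, args). */
typedef struct ExampleCounter ExampleCounter;
struct ExampleCounter {
    int value;
    int (*add)(ExampleCounter *self, int amount);
    int (*get)(ExampleCounter *self);
};

/* Hypothetical helper macros. Note that obj is evaluated twice, so it
 * should not have side effects. */
#define EXAMPLE_CALL0(obj, method)      ((obj)->method((obj)))
#define EXAMPLE_CALL(obj, method, ...)  ((obj)->method((obj), __VA_ARGS__))

int example_counter_add(ExampleCounter *self, int amount)
{
    self->value += amount;
    return self->value;
}

int example_counter_get(ExampleCounter *self)
{
    return self->value;
}

int example_use_counter(void)
{
    ExampleCounter c = { 0, example_counter_add, example_counter_get };

    EXAMPLE_CALL(&c, add, 5);      /* same as c.add(&c, 5) */
    EXAMPLE_CALL(&c, add, 7);      /* same as c.add(&c, 7) */
    return EXAMPLE_CALL0(&c, get); /* same as c.get(&c) */
}
```

Two macros are used here only because standard C before C23 requires at least one argument for `__VA_ARGS__`; a real implementation may well solve this differently.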
-vf_vpp_amf.c: Remove unused variables.
-vf_amf_common.c: Fix hdrmeta_buffer memory leak.
-hwcontext_amf.c: Fix av_amf_extract_hdr_metadata not picking up light metadata if display mastering metadata is not set.
-doc/filters.texi: Remove irrelevant example with HDR metadata for vpp_amf.
The use of the code section (.text) was forced by the unreleased NASM
3.02rc3, which made the issue worse by preventing assembling anything
without a code section, including when only data was present.
This works fine for the most part, but using code (.text) section with
IMAGE_COMDAT_SELECT_ANY causes issues with lib.exe after stripping such
object:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x2
Essentially it makes our workaround not work in all cases, and while
stripping could be disabled like it already is for MSVC/ICL builds, it used
to work, so let's preserve that state.
This makes it incompatible with NASM 3.02rc3 when CV debug info is
generated, but hopefully the upstream fix will be merged before release,
to avoid this regression:
https://github.com/netwide-assembler/nasm/pull/221
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This is needed to cover the case when the assembled source doesn't have
a .text section. The NASM documentation suggests adding a $ suffix to the
section name for COMDAT in .text, but this actually requires the main
.text section to exist as well. Also use a less generic suffix for our
dummy sub-section.
Third time's the charm.
Fixes: 80cd067715
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
The original test only mapped the source file and printed its content,
exercising none of the error branches in av_file_map().
Replace it with a test that maps a real file (path via argv[1] for
out-of-tree builds) and verifies it is non-empty, then calls
av_file_map() on a nonexistent file twice: once with log_offset=0 to
confirm the error is logged at AV_LOG_ERROR, and once with log_offset=1
to confirm the level is raised by one, covering the
log_level_offset_offset path in av_vlog(). A custom av_log callback
captures the emitted level independently of the global log level.
The two error cases share a single for() loop to avoid duplication.
Add a FATE entry in tests/fate/libavutil.mak with CMP=null since
there is no fixed stdout to compare.
Signed-off-by: Soham Kute <officialsohamkute@gmail.com>
The three *_from_name() functions used av_strstart() for prefix matching,
which returns incorrect results when one name is a prefix of another.
av_stereo3d_from_name("side by side (quincunx subsampling)") matched
"side by side" at index 1 and returned AV_STEREO3D_SIDEBYSIDE instead of
AV_STEREO3D_SIDEBYSIDE_QUINCUNX. Similarly,
av_stereo3d_primary_eye_from_name("nonexistent") matched "none" and
returned AV_PRIMARY_EYE_NONE instead of -1.
Switch all three functions from av_strstart() to strcmp() for exact
matching. No in-tree callers rely on prefix matching.
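The difference between the two matching strategies can be seen in a small stand-alone sketch. The name table below is a shortened, hypothetical one (not the actual libavutil table), and `strncmp` stands in for av_strstart().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened name table; not the actual libavutil table. */
static const char *const example_names[] = {
    "2D",
    "side by side",
    "side by side (quincunx subsampling)",
};
#define EXAMPLE_NB_NAMES (sizeof(example_names) / sizeof(example_names[0]))

/* Prefix matching: returns the first entry that is a prefix of name. */
int example_from_name_prefix(const char *name)
{
    for (size_t i = 0; i < EXAMPLE_NB_NAMES; i++)
        if (!strncmp(name, example_names[i], strlen(example_names[i])))
            return (int)i;
    return -1;
}

/* Exact matching: returns an entry only if the whole string is equal. */
int example_from_name_exact(const char *name)
{
    for (size_t i = 0; i < EXAMPLE_NB_NAMES; i++)
        if (!strcmp(name, example_names[i]))
            return (int)i;
    return -1;
}
```

With this table, looking up the quincunx name through the prefix variant stops at the shorter "side by side" entry, while the exact variant reaches the intended one.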
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
This enables av_flatten on Clang in particular.
It was disabled because at the time this attribute was not supported.
It was implemented in Clang/LLVM 3.5 [1].
Use `__has_attribute` to check for availability. This has been added in
Clang 2.9 [2].
This reverts change 5858a67f13.
[1] 41af7c2fdc
[2] 274a70ed7f
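A feature check of this kind could look roughly as follows. This is a sketch only: `example_flatten` is a hypothetical name rather than the real av_flatten definition, and the fallback definition of `__has_attribute` is an assumption about how compilers without it might be handled.

```c
/* Fallback so the check below also compiles where __has_attribute is
 * not available; such compilers simply get the empty definition. */
#ifndef __has_attribute
#    define __has_attribute(x) 0
#endif

/* example_flatten is a hypothetical name, not the real av_flatten macro. */
#if __has_attribute(flatten)
#    define example_flatten __attribute__((flatten))
#else
#    define example_flatten
#endif

static inline int example_square(int x)
{
    return x * x;
}

/* With flatten, the compiler is asked to inline calls made in this body. */
example_flatten int example_sum_of_squares(int a, int b)
{
    return example_square(a) + example_square(b);
}
```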
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Add a unit test covering av_stereo3d_alloc, av_stereo3d_alloc_size,
av_stereo3d_create_side_data, av_stereo3d_type_name,
av_stereo3d_from_name, av_stereo3d_view_name,
av_stereo3d_view_from_name, and av_stereo3d_primary_eye_name.
The from_name calls are driven by a static name table so each
string appears exactly once. Round-trip inverse checks verify
that type_name/from_name and view_name/view_from_name are
consistent with each other.
Coverage for libavutil/stereo3d.c: 0.00% -> 100.00%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Add a unit test covering alloc, create_side_data, and select
for AV1 and H.274 film grain parameter types (22 cases).
Coverage for libavutil/film_grain_params.c: 0.00% -> 97.73%
Signed-off-by: marcos ashton <marcosashiglesias@gmail.com>
Whenever the link register is stored on the stack, sign it
before storing it and validate at a symmetrical point (with the
stack at the same level as when it was signed).
These macros only have an effect if built with PAC enabled (e.g.
through -mbranch-protection=standard), otherwise they don't
generate any extra instructions.
None of these cases were present when PAC support was added
in 248986a0db in 2022.
Without these changes, PAC still had an effect in the compiler
generated code and in the existing cases where these macros were
used; this change makes it apply to the remaining cases of the link
register on the stack.
Signal that our assembly is compliant with the GCS feature, if
the GCS feature is enabled in the compiler (available since Clang
18 and GCC 15) - this is enabled by -mbranch-protection=standard
with a new enough compiler.
GCS doesn't require any specific modifications to the assembly
code, but requires that all functions return to the expected call
address (checked through a shadow stack).
Attributes with the language-supported [[attr]] style are only supported
since C++11 and C23 respectively, so this needs to be accounted for in
these checks.
This solves a huge amount of warning spam of:
warning: [[]] attributes are a C23 extension [-Wc23-extensions]
when using --enable-extra-warnings.
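One way such a check might be structured is sketched below. The macro names `EXAMPLE_HAS_STD_ATTRS` and `example_unused` are hypothetical, and the choice of the GNU `unused` attribute is only for illustration; the message does not say which attributes or macros are affected.

```c
/* EXAMPLE_HAS_STD_ATTRS and example_unused are hypothetical names. */
#if defined(__cplusplus)
#    define EXAMPLE_HAS_STD_ATTRS (__cplusplus >= 201103L)
#elif defined(__STDC_VERSION__)
#    define EXAMPLE_HAS_STD_ATTRS (__STDC_VERSION__ >= 202311L)
#else
#    define EXAMPLE_HAS_STD_ATTRS 0
#endif

/* Only use the [[attr]] spelling when the language version supports it;
 * otherwise fall back to the GNU spelling or to nothing at all. */
#if EXAMPLE_HAS_STD_ATTRS && defined(__GNUC__)
#    define example_unused [[gnu::unused]]
#elif defined(__GNUC__)
#    define example_unused __attribute__((unused))
#else
#    define example_unused
#endif

int example_identity(int x)
{
    example_unused int scratch = 0; /* intentionally unused */
    return x;
}
```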
Fixes: integer overflow
Fixes: testcase that calls av_timecode_init_from_components() with hh set explicitly to INT_MAX
Found-by: Youngjae Choi, Mingyoung Ban, Seunghoon Woo
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes compilation errors on newer Clang/GCC, which error out on
incompatible pointer types.
error: incompatible pointer types passing 'unsigned long long *' to
parameter of type 'amf_uint64 *' (aka 'unsigned long *')
[-Wincompatible-pointer-types]
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>