FFmpeg

mirror of https://mirror.skon.top/https://github.com/FFmpeg/FFmpeg synced 2026-04-23 10:20:54 +08:00

Author	SHA1	Message	Date
Andreas Rheinhardt	697da64c8e	avcodec/x86/h264_qpel: Port pixel8_l2_shift5 from MMXEXT to SSE2 This abides by the ABI (no missing emms) and yields a tiny performance improvement here. Old benchmarks: avg_h264_qpel_8_mc12_8_c: 419.9 ( 1.00x) avg_h264_qpel_8_mc12_8_sse2: 78.9 ( 5.32x) avg_h264_qpel_8_mc12_8_ssse3: 71.7 ( 5.86x) avg_h264_qpel_8_mc32_8_c: 429.1 ( 1.00x) avg_h264_qpel_8_mc32_8_sse2: 76.9 ( 5.58x) avg_h264_qpel_8_mc32_8_ssse3: 73.4 ( 5.84x) put_h264_qpel_8_mc12_8_c: 424.0 ( 1.00x) put_h264_qpel_8_mc12_8_sse2: 78.6 ( 5.40x) put_h264_qpel_8_mc12_8_ssse3: 70.6 ( 6.00x) put_h264_qpel_8_mc32_8_c: 425.7 ( 1.00x) put_h264_qpel_8_mc32_8_sse2: 75.2 ( 5.66x) put_h264_qpel_8_mc32_8_ssse3: 70.4 ( 6.05x) New benchmarks: avg_h264_qpel_8_mc12_8_c: 425.7 ( 1.00x) avg_h264_qpel_8_mc12_8_sse2: 77.5 ( 5.49x) avg_h264_qpel_8_mc12_8_ssse3: 69.8 ( 6.10x) avg_h264_qpel_8_mc32_8_c: 423.7 ( 1.00x) avg_h264_qpel_8_mc32_8_sse2: 74.6 ( 5.68x) avg_h264_qpel_8_mc32_8_ssse3: 71.9 ( 5.89x) put_h264_qpel_8_mc12_8_c: 422.2 ( 1.00x) put_h264_qpel_8_mc12_8_sse2: 75.8 ( 5.57x) put_h264_qpel_8_mc12_8_ssse3: 67.9 ( 6.22x) put_h264_qpel_8_mc32_8_c: 421.8 ( 1.00x) put_h264_qpel_8_mc32_8_sse2: 72.6 ( 5.81x) put_h264_qpel_8_mc32_8_ssse3: 67.7 ( 6.23x) Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	4ac9162beb	avcodec/x86/h264_qpel: Don't use ff_ prefix for static functions Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	cd077e88d1	avcodec/x86/h264_qpel: Add ff_{avg,put}_h264_qpel16_h_lowpass_l2_sse2() These functions are currently emulated via four calls to the versions for 8x8 blocks. In fact, the size savings from the simplified calls in h264_qpel.c (GCC 1344B, Clang 1280B) more than outweigh the size of the added functions (512B) here. It is also beneficial performance-wise. Old benchmarks: avg_h264_qpel_16_mc11_8_c: 1414.1 ( 1.00x) avg_h264_qpel_16_mc11_8_sse2: 206.2 ( 6.86x) avg_h264_qpel_16_mc11_8_ssse3: 177.7 ( 7.96x) avg_h264_qpel_16_mc13_8_c: 1417.0 ( 1.00x) avg_h264_qpel_16_mc13_8_sse2: 207.4 ( 6.83x) avg_h264_qpel_16_mc13_8_ssse3: 178.2 ( 7.95x) avg_h264_qpel_16_mc21_8_c: 1632.8 ( 1.00x) avg_h264_qpel_16_mc21_8_sse2: 349.3 ( 4.67x) avg_h264_qpel_16_mc21_8_ssse3: 291.3 ( 5.60x) avg_h264_qpel_16_mc23_8_c: 1640.2 ( 1.00x) avg_h264_qpel_16_mc23_8_sse2: 351.3 ( 4.67x) avg_h264_qpel_16_mc23_8_ssse3: 290.8 ( 5.64x) avg_h264_qpel_16_mc31_8_c: 1411.7 ( 1.00x) avg_h264_qpel_16_mc31_8_sse2: 203.4 ( 6.94x) avg_h264_qpel_16_mc31_8_ssse3: 178.9 ( 7.89x) avg_h264_qpel_16_mc33_8_c: 1409.7 ( 1.00x) avg_h264_qpel_16_mc33_8_sse2: 204.6 ( 6.89x) avg_h264_qpel_16_mc33_8_ssse3: 178.1 ( 7.92x) put_h264_qpel_16_mc11_8_c: 1391.0 ( 1.00x) put_h264_qpel_16_mc11_8_sse2: 197.4 ( 7.05x) put_h264_qpel_16_mc11_8_ssse3: 176.1 ( 7.90x) put_h264_qpel_16_mc13_8_c: 1395.9 ( 1.00x) put_h264_qpel_16_mc13_8_sse2: 196.7 ( 7.10x) put_h264_qpel_16_mc13_8_ssse3: 177.7 ( 7.85x) put_h264_qpel_16_mc21_8_c: 1609.5 ( 1.00x) put_h264_qpel_16_mc21_8_sse2: 341.1 ( 4.72x) put_h264_qpel_16_mc21_8_ssse3: 289.2 ( 5.57x) put_h264_qpel_16_mc23_8_c: 1604.0 ( 1.00x) put_h264_qpel_16_mc23_8_sse2: 340.9 ( 4.71x) put_h264_qpel_16_mc23_8_ssse3: 289.6 ( 5.54x) put_h264_qpel_16_mc31_8_c: 1390.2 ( 1.00x) put_h264_qpel_16_mc31_8_sse2: 194.6 ( 7.14x) put_h264_qpel_16_mc31_8_ssse3: 176.4 ( 7.88x) put_h264_qpel_16_mc33_8_c: 1400.4 ( 1.00x) put_h264_qpel_16_mc33_8_sse2: 198.5 ( 7.06x) put_h264_qpel_16_mc33_8_ssse3: 176.2 ( 7.95x) New benchmarks: avg_h264_qpel_16_mc11_8_c: 1413.3 ( 1.00x) avg_h264_qpel_16_mc11_8_sse2: 171.8 ( 8.23x) avg_h264_qpel_16_mc11_8_ssse3: 173.0 ( 8.17x) avg_h264_qpel_16_mc13_8_c: 1423.2 ( 1.00x) avg_h264_qpel_16_mc13_8_sse2: 172.0 ( 8.27x) avg_h264_qpel_16_mc13_8_ssse3: 173.4 ( 8.21x) avg_h264_qpel_16_mc21_8_c: 1641.3 ( 1.00x) avg_h264_qpel_16_mc21_8_sse2: 322.1 ( 5.10x) avg_h264_qpel_16_mc21_8_ssse3: 291.3 ( 5.63x) avg_h264_qpel_16_mc23_8_c: 1629.1 ( 1.00x) avg_h264_qpel_16_mc23_8_sse2: 323.0 ( 5.04x) avg_h264_qpel_16_mc23_8_ssse3: 293.3 ( 5.55x) avg_h264_qpel_16_mc31_8_c: 1409.2 ( 1.00x) avg_h264_qpel_16_mc31_8_sse2: 172.0 ( 8.19x) avg_h264_qpel_16_mc31_8_ssse3: 173.7 ( 8.11x) avg_h264_qpel_16_mc33_8_c: 1402.5 ( 1.00x) avg_h264_qpel_16_mc33_8_sse2: 172.5 ( 8.13x) avg_h264_qpel_16_mc33_8_ssse3: 173.6 ( 8.08x) put_h264_qpel_16_mc11_8_c: 1393.7 ( 1.00x) put_h264_qpel_16_mc11_8_sse2: 170.4 ( 8.18x) put_h264_qpel_16_mc11_8_ssse3: 178.2 ( 7.82x) put_h264_qpel_16_mc13_8_c: 1398.0 ( 1.00x) put_h264_qpel_16_mc13_8_sse2: 170.2 ( 8.21x) put_h264_qpel_16_mc13_8_ssse3: 178.6 ( 7.83x) put_h264_qpel_16_mc21_8_c: 1619.6 ( 1.00x) put_h264_qpel_16_mc21_8_sse2: 320.6 ( 5.05x) put_h264_qpel_16_mc21_8_ssse3: 297.2 ( 5.45x) put_h264_qpel_16_mc23_8_c: 1617.4 ( 1.00x) put_h264_qpel_16_mc23_8_sse2: 320.0 ( 5.05x) put_h264_qpel_16_mc23_8_ssse3: 297.4 ( 5.44x) put_h264_qpel_16_mc31_8_c: 1389.7 ( 1.00x) put_h264_qpel_16_mc31_8_sse2: 169.9 ( 8.18x) put_h264_qpel_16_mc31_8_ssse3: 178.1 ( 7.80x) put_h264_qpel_16_mc33_8_c: 1394.0 ( 1.00x) put_h264_qpel_16_mc33_8_sse2: 170.9 ( 8.16x) put_h264_qpel_16_mc33_8_ssse3: 176.9 ( 7.88x) Notice that the SSSE3 versions of mc21 and mc23 benefit from an optimized version of hv2_lowpass. Also notice that there is no SSE2 version of the purely horizontal motion compensation. This means that src2 is currently always aligned when calling the SSE2 functions (and that srcStride is always equal to the block width). Yet this has not been exploited (yet). Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	4880fa4dca	avcodec/x86/h264_qpel_8bit: Remove dead macro Forgotten in `4011a76494`. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	35aaf697e9	avcodec/x86/h264_qpel_8bit: Replace qpel8_h_lowpass_l2 MMXEXT by SSE2 Using xmm registers here is very natural, as it allows operating on eight words at a time. It also saves 48B here and does not clobber the MMX state. Old benchmarks (only tests affected by the modified function are shown): avg_h264_qpel_8_mc11_8_c: 352.2 ( 1.00x) avg_h264_qpel_8_mc11_8_sse2: 70.4 ( 5.00x) avg_h264_qpel_8_mc11_8_ssse3: 53.9 ( 6.53x) avg_h264_qpel_8_mc13_8_c: 353.3 ( 1.00x) avg_h264_qpel_8_mc13_8_sse2: 72.8 ( 4.86x) avg_h264_qpel_8_mc13_8_ssse3: 53.8 ( 6.57x) avg_h264_qpel_8_mc21_8_c: 404.0 ( 1.00x) avg_h264_qpel_8_mc21_8_sse2: 116.1 ( 3.48x) avg_h264_qpel_8_mc21_8_ssse3: 94.3 ( 4.28x) avg_h264_qpel_8_mc23_8_c: 398.9 ( 1.00x) avg_h264_qpel_8_mc23_8_sse2: 118.6 ( 3.36x) avg_h264_qpel_8_mc23_8_ssse3: 94.8 ( 4.21x) avg_h264_qpel_8_mc31_8_c: 352.7 ( 1.00x) avg_h264_qpel_8_mc31_8_sse2: 71.4 ( 4.94x) avg_h264_qpel_8_mc31_8_ssse3: 53.8 ( 6.56x) avg_h264_qpel_8_mc33_8_c: 354.0 ( 1.00x) avg_h264_qpel_8_mc33_8_sse2: 70.6 ( 5.01x) avg_h264_qpel_8_mc33_8_ssse3: 53.7 ( 6.59x) avg_h264_qpel_16_mc11_8_c: 1417.0 ( 1.00x) avg_h264_qpel_16_mc11_8_sse2: 276.9 ( 5.12x) avg_h264_qpel_16_mc11_8_ssse3: 178.8 ( 7.92x) avg_h264_qpel_16_mc13_8_c: 1427.3 ( 1.00x) avg_h264_qpel_16_mc13_8_sse2: 277.4 ( 5.14x) avg_h264_qpel_16_mc13_8_ssse3: 179.7 ( 7.94x) avg_h264_qpel_16_mc21_8_c: 1634.1 ( 1.00x) avg_h264_qpel_16_mc21_8_sse2: 421.3 ( 3.88x) avg_h264_qpel_16_mc21_8_ssse3: 291.2 ( 5.61x) avg_h264_qpel_16_mc23_8_c: 1627.0 ( 1.00x) avg_h264_qpel_16_mc23_8_sse2: 420.8 ( 3.87x) avg_h264_qpel_16_mc23_8_ssse3: 291.0 ( 5.59x) avg_h264_qpel_16_mc31_8_c: 1418.4 ( 1.00x) avg_h264_qpel_16_mc31_8_sse2: 278.5 ( 5.09x) avg_h264_qpel_16_mc31_8_ssse3: 178.6 ( 7.94x) avg_h264_qpel_16_mc33_8_c: 1407.3 ( 1.00x) avg_h264_qpel_16_mc33_8_sse2: 277.6 ( 5.07x) avg_h264_qpel_16_mc33_8_ssse3: 179.9 ( 7.82x) put_h264_qpel_8_mc11_8_c: 348.1 ( 1.00x) put_h264_qpel_8_mc11_8_sse2: 69.1 ( 5.04x) put_h264_qpel_8_mc11_8_ssse3: 53.8 ( 6.47x) put_h264_qpel_8_mc13_8_c: 349.3 ( 1.00x) put_h264_qpel_8_mc13_8_sse2: 69.7 ( 5.01x) put_h264_qpel_8_mc13_8_ssse3: 53.7 ( 6.51x) put_h264_qpel_8_mc21_8_c: 398.5 ( 1.00x) put_h264_qpel_8_mc21_8_sse2: 115.0 ( 3.46x) put_h264_qpel_8_mc21_8_ssse3: 95.3 ( 4.18x) put_h264_qpel_8_mc23_8_c: 399.9 ( 1.00x) put_h264_qpel_8_mc23_8_sse2: 120.8 ( 3.31x) put_h264_qpel_8_mc23_8_ssse3: 95.4 ( 4.19x) put_h264_qpel_8_mc31_8_c: 350.4 ( 1.00x) put_h264_qpel_8_mc31_8_sse2: 69.6 ( 5.03x) put_h264_qpel_8_mc31_8_ssse3: 54.2 ( 6.47x) put_h264_qpel_8_mc33_8_c: 353.1 ( 1.00x) put_h264_qpel_8_mc33_8_sse2: 71.0 ( 4.97x) put_h264_qpel_8_mc33_8_ssse3: 54.2 ( 6.51x) put_h264_qpel_16_mc11_8_c: 1384.2 ( 1.00x) put_h264_qpel_16_mc11_8_sse2: 272.9 ( 5.07x) put_h264_qpel_16_mc11_8_ssse3: 178.3 ( 7.76x) put_h264_qpel_16_mc13_8_c: 1393.6 ( 1.00x) put_h264_qpel_16_mc13_8_sse2: 271.1 ( 5.14x) put_h264_qpel_16_mc13_8_ssse3: 178.3 ( 7.82x) put_h264_qpel_16_mc21_8_c: 1612.6 ( 1.00x) put_h264_qpel_16_mc21_8_sse2: 416.5 ( 3.87x) put_h264_qpel_16_mc21_8_ssse3: 289.1 ( 5.58x) put_h264_qpel_16_mc23_8_c: 1621.3 ( 1.00x) put_h264_qpel_16_mc23_8_sse2: 416.9 ( 3.89x) put_h264_qpel_16_mc23_8_ssse3: 289.4 ( 5.60x) put_h264_qpel_16_mc31_8_c: 1408.4 ( 1.00x) put_h264_qpel_16_mc31_8_sse2: 273.5 ( 5.15x) put_h264_qpel_16_mc31_8_ssse3: 176.9 ( 7.96x) put_h264_qpel_16_mc33_8_c: 1396.4 ( 1.00x) put_h264_qpel_16_mc33_8_sse2: 276.3 ( 5.05x) put_h264_qpel_16_mc33_8_ssse3: 176.4 ( 7.92x) New benchmarks: avg_h264_qpel_8_mc11_8_c: 352.1 ( 1.00x) avg_h264_qpel_8_mc11_8_sse2: 52.5 ( 6.71x) avg_h264_qpel_8_mc11_8_ssse3: 53.9 ( 6.54x) avg_h264_qpel_8_mc13_8_c: 350.8 ( 1.00x) avg_h264_qpel_8_mc13_8_sse2: 54.7 ( 6.42x) avg_h264_qpel_8_mc13_8_ssse3: 54.3 ( 6.46x) avg_h264_qpel_8_mc21_8_c: 400.1 ( 1.00x) avg_h264_qpel_8_mc21_8_sse2: 98.6 ( 4.06x) avg_h264_qpel_8_mc21_8_ssse3: 95.5 ( 4.19x) avg_h264_qpel_8_mc23_8_c: 400.4 ( 1.00x) avg_h264_qpel_8_mc23_8_sse2: 101.4 ( 3.95x) avg_h264_qpel_8_mc23_8_ssse3: 95.9 ( 4.18x) avg_h264_qpel_8_mc31_8_c: 352.4 ( 1.00x) avg_h264_qpel_8_mc31_8_sse2: 52.9 ( 6.67x) avg_h264_qpel_8_mc31_8_ssse3: 54.4 ( 6.48x) avg_h264_qpel_8_mc33_8_c: 354.5 ( 1.00x) avg_h264_qpel_8_mc33_8_sse2: 52.9 ( 6.70x) avg_h264_qpel_8_mc33_8_ssse3: 54.4 ( 6.52x) avg_h264_qpel_16_mc11_8_c: 1420.4 ( 1.00x) avg_h264_qpel_16_mc11_8_sse2: 204.8 ( 6.93x) avg_h264_qpel_16_mc11_8_ssse3: 177.9 ( 7.98x) avg_h264_qpel_16_mc13_8_c: 1409.8 ( 1.00x) avg_h264_qpel_16_mc13_8_sse2: 206.4 ( 6.83x) avg_h264_qpel_16_mc13_8_ssse3: 178.0 ( 7.92x) avg_h264_qpel_16_mc21_8_c: 1634.1 ( 1.00x) avg_h264_qpel_16_mc21_8_sse2: 349.6 ( 4.67x) avg_h264_qpel_16_mc21_8_ssse3: 290.0 ( 5.63x) avg_h264_qpel_16_mc23_8_c: 1624.1 ( 1.00x) avg_h264_qpel_16_mc23_8_sse2: 350.0 ( 4.64x) avg_h264_qpel_16_mc23_8_ssse3: 291.9 ( 5.56x) avg_h264_qpel_16_mc31_8_c: 1407.2 ( 1.00x) avg_h264_qpel_16_mc31_8_sse2: 205.8 ( 6.84x) avg_h264_qpel_16_mc31_8_ssse3: 178.2 ( 7.90x) avg_h264_qpel_16_mc33_8_c: 1400.5 ( 1.00x) avg_h264_qpel_16_mc33_8_sse2: 206.3 ( 6.79x) avg_h264_qpel_16_mc33_8_ssse3: 179.4 ( 7.81x) put_h264_qpel_8_mc11_8_c: 349.7 ( 1.00x) put_h264_qpel_8_mc11_8_sse2: 50.2 ( 6.96x) put_h264_qpel_8_mc11_8_ssse3: 51.3 ( 6.82x) put_h264_qpel_8_mc13_8_c: 349.8 ( 1.00x) put_h264_qpel_8_mc13_8_sse2: 50.7 ( 6.90x) put_h264_qpel_8_mc13_8_ssse3: 51.7 ( 6.76x) put_h264_qpel_8_mc21_8_c: 398.0 ( 1.00x) put_h264_qpel_8_mc21_8_sse2: 96.5 ( 4.13x) put_h264_qpel_8_mc21_8_ssse3: 92.3 ( 4.31x) put_h264_qpel_8_mc23_8_c: 401.4 ( 1.00x) put_h264_qpel_8_mc23_8_sse2: 102.3 ( 3.92x) put_h264_qpel_8_mc23_8_ssse3: 92.8 ( 4.32x) put_h264_qpel_8_mc31_8_c: 349.4 ( 1.00x) put_h264_qpel_8_mc31_8_sse2: 50.8 ( 6.88x) put_h264_qpel_8_mc31_8_ssse3: 51.8 ( 6.75x) put_h264_qpel_8_mc33_8_c: 351.1 ( 1.00x) put_h264_qpel_8_mc33_8_sse2: 52.2 ( 6.73x) put_h264_qpel_8_mc33_8_ssse3: 51.7 ( 6.79x) put_h264_qpel_16_mc11_8_c: 1391.1 ( 1.00x) put_h264_qpel_16_mc11_8_sse2: 196.6 ( 7.07x) put_h264_qpel_16_mc11_8_ssse3: 178.2 ( 7.81x) put_h264_qpel_16_mc13_8_c: 1385.2 ( 1.00x) put_h264_qpel_16_mc13_8_sse2: 195.6 ( 7.08x) put_h264_qpel_16_mc13_8_ssse3: 176.6 ( 7.84x) put_h264_qpel_16_mc21_8_c: 1607.5 ( 1.00x) put_h264_qpel_16_mc21_8_sse2: 341.0 ( 4.71x) put_h264_qpel_16_mc21_8_ssse3: 289.1 ( 5.56x) put_h264_qpel_16_mc23_8_c: 1616.7 ( 1.00x) put_h264_qpel_16_mc23_8_sse2: 340.8 ( 4.74x) put_h264_qpel_16_mc23_8_ssse3: 288.6 ( 5.60x) put_h264_qpel_16_mc31_8_c: 1397.6 ( 1.00x) put_h264_qpel_16_mc31_8_sse2: 197.3 ( 7.08x) put_h264_qpel_16_mc31_8_ssse3: 175.4 ( 7.97x) put_h264_qpel_16_mc33_8_c: 1394.3 ( 1.00x) put_h264_qpel_16_mc33_8_sse2: 197.7 ( 7.05x) put_h264_qpel_16_mc33_8_ssse3: 175.2 ( 7.96x) As can be seen, the SSE2 version is often neck and neck with the SSSE3 version (which also benefits from a better hv2_lowpass SSSE3 implementation for mc21 and mc23) for eight byte block sizes. Unsurprisingly, SSSE3 beats SSE2 for 16x16 blocks: For SSE2, these blocks are processed by calling the 8x8 function four times whereas SSSE3 has a dedicated function (on x64). This implementation should also be extendable to an AVX version for 16x16 blocks. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	fa9ea5113b	avcodec/x86/h264_qpel_8bit: Optimize branch away ff_{avg,put}_h264_qpel8or16_hv2_lowpass_ssse3() currently is almost the disjoint union of the codepaths for sizes 8 and 16. This size is a compile-time constant at every callsite. So split the function and avoid the runtime branch. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	400203c00c	avcodec/x86/h264_qpel: Remove unused parameter from hv2_lowpass funcs tmpstride is unused. This also allows removing said parameter from lots of functions in h264_qpel.c. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	b84c818c83	avcodec/x86/h264_qpel: Remove constant parameters from shift5 funcs They are constant since the size 16 version is no longer emulated via the size 8 version. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	810bd3e62a	avcodec/x86/h264_qpel: Add ff_{avg,put}_pixels16_l2_shift5_sse2 Up until now this function was emulated via two calls to ff_{avg,put}_pixels8_l2_shift5_mmxext(). Adding a dedicated function proved beneficial both size-wise and performance-wise: The new functions take 192B, yet the simplified calls save 256B with GCC and 320B with Clang here. This change will also allow further optimizations. Old benchmarks: avg_h264_qpel_16_mc12_8_c: 1735.8 ( 1.00x) avg_h264_qpel_16_mc12_8_sse2: 300.8 ( 5.77x) avg_h264_qpel_16_mc12_8_ssse3: 233.3 ( 7.44x) avg_h264_qpel_16_mc32_8_c: 1777.9 ( 1.00x) avg_h264_qpel_16_mc32_8_sse2: 275.6 ( 6.45x) avg_h264_qpel_16_mc32_8_ssse3: 235.7 ( 7.54x) put_h264_qpel_16_mc12_8_c: 1808.2 ( 1.00x) put_h264_qpel_16_mc12_8_sse2: 267.2 ( 6.77x) put_h264_qpel_16_mc12_8_ssse3: 231.9 ( 7.80x) put_h264_qpel_16_mc32_8_c: 1766.9 ( 1.00x) put_h264_qpel_16_mc32_8_sse2: 272.9 ( 6.47x) put_h264_qpel_16_mc32_8_ssse3: 229.5 ( 7.70x) New benchmarks: avg_h264_qpel_16_mc12_8_c: 1742.3 ( 1.00x) avg_h264_qpel_16_mc12_8_sse2: 240.3 ( 7.25x) avg_h264_qpel_16_mc12_8_ssse3: 214.8 ( 8.11x) avg_h264_qpel_16_mc32_8_c: 1748.0 ( 1.00x) avg_h264_qpel_16_mc32_8_sse2: 238.0 ( 7.35x) avg_h264_qpel_16_mc32_8_ssse3: 209.2 ( 8.35x) put_h264_qpel_16_mc12_8_c: 2014.4 ( 1.00x) put_h264_qpel_16_mc12_8_sse2: 243.7 ( 8.27x) put_h264_qpel_16_mc12_8_ssse3: 211.5 ( 9.52x) put_h264_qpel_16_mc32_8_c: 1800.0 ( 1.00x) put_h264_qpel_16_mc32_8_sse2: 238.8 ( 7.54x) put_h264_qpel_16_mc32_8_ssse3: 206.7 ( 8.71x) Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	279b6f3cf5	avcodec/fpel: Avoid loop in ff_avg_pixels4_mmxext() It is only used by h264_qpel.c and only with height four (which is unrolled) and uses a loop in order to handle multiples of four as height. Remove the loop and the height parameter and move the function to h264_qpel_8bit.asm. This leads to a bit of code duplication, but this is simpler than all the %if checks necessary to achieve the same outcome in fpel.asm. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	e340f31b89	avcodec/x86/fpel: Remove redundant repetition The repetition count is always one since `2cf9e733c6`. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	b0c91c2fba	avcodec/h264qpel: Make avg_h264_qpel_pixels_tab smaller avg_h264_qpel only supports 16x16, 8x8 and 4x4 blocksizes, so it is currently unnecessarily large. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	6eb8bc4217	avcodec/h264qpel: Don't build unused 2x2 size funcs for bitdepths > 8 The 2x2 put functions are only used by Snow and Snow uses only the eight bit versions. The rest is dead code. Disabling it saved 41277B here. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:33 +02:00
Andreas Rheinhardt	92ae9d1ffc	configure: Remove vc1dsp->qpeldsp dependency It only needs it for some x86 fpel functions; instead add a direct dependency for that. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	16d5e074dc	avcodec/mips/Makefile: Fix VC1DSP build rules Affected standalone builds of the VC-1 parser. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	d09f4f3c78	configure: Remove h263_decoder->h263_parser,qpeldsp dependency The former is unnecessary since `3ceffe7839`. The latter is since ff_mpeg4_workaround_bugs() (and thereby setting the "old" qpeldsp functions) has been moved inside #if CONFIG_MPEG4_DECODER. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	0035d99c61	configure: Avoid mpeg4video_parser->{h263,qpel}dsp dependency This can be easily achieved by moving code only used by the MPEG-4 decoder behind #if CONFIG_MPEG4_DECODER. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	770f78b24a	configure: Remove mss2->qpeldsp dependency Forgotten in `9cc38cc636`. (mss2 still has an implicit dependency on qpeldsp via the VC-1 decoder.) Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	c4c616db53	avcodec/x86/qpel: Move ff_{put,avg}_pixels4_l2_mmxext to h264_qpel Only used there. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	1e11fdff52	avcodec/x86/qpel{,dsp_init}: Remove constant function parameters ff_avg_pixels{4,8,16}_l2_mmxext() are always called with height equal to their blocksize. And ff_{put,avg}_pixels4_l2_mmxext() are furthermore always called with both strides being equal. So remove these redundant function parameters. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	52a77128fd	avcodec/x86/qpel{dsp,dsp_init}: Use ptrdiff_t for stride This is more correct given that qpel_mc_func already uses ptrdiff_t; it also allows avoiding movsxdifnidn. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	cacf854fe7	avcodec/x86/qpel: Remove always-false branches The ff_avg_pixels{4,8,16}_l2_mmxext() functions are only ever used in the last step (the one that actually writes to the dst buffer) where the number of lines to process is always equal to the dimensions of the block, whereas ff_put_pixels{8,16}_mmxext() are also used in intermediate calculations where the number of lines can be 9 or 17. The code in qpel.asm uses common macros for both and processes more than one line per loop iteration; it therefore checks for whether the number of lines is odd and treats this line separately; yet this special handling is only needed for the put functions, not the avg functions. It has therefore been %if'ed away for these. The check is also not needed for ff_put_pixels4_l2_mmxext() which is only used by H.264 which always processes four lines. Because ff_{avg,put}_pixels4_l2_mmxext() processes four lines in a single loop iteration, not only the odd-height handling, but the whole loop could be removed. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	8820e2205c	tests/checkasm/hpeldsp: Use instruction-set independent height Otherwise the benchmark numbers are incomparable. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	9a0581fca0	tests/checkasm: Add qpeldsp checkasm Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 07:06:32 +02:00
Andreas Rheinhardt	15a9c8dea3	avcodec/liblc3enc: Avoid allocating buffer to send a zero frame liblc3 supports arbitrary strides, so one can simply use a stride of zero to make it read the same zero value again and again. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-04 06:07:37 +02:00
Andreas Rheinhardt	ab7d1c64c9	avcodec/x86/h263_loopfilter: Port loop filter to SSE2 Old benchmarks: h263dsp.h_loop_filter_c: 41.2 ( 1.00x) h263dsp.h_loop_filter_mmx: 39.5 ( 1.04x) h263dsp.v_loop_filter_c: 43.5 ( 1.00x) h263dsp.v_loop_filter_mmx: 16.9 ( 2.57x) New benchmarks: h263dsp.h_loop_filter_c: 41.6 ( 1.00x) h263dsp.h_loop_filter_sse2: 28.2 ( 1.48x) h263dsp.v_loop_filter_c: 42.4 ( 1.00x) h263dsp.v_loop_filter_sse2: 15.1 ( 2.81x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-03 17:05:46 +00:00
Andreas Rheinhardt	a8a16c15c8	tests/checkasm/llviddsp: Use the same width for each cpuflag Otherwise the benchmark numbers would be incomparable nonsense. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-03 17:05:46 +00:00
Cameron Gutman	df4587789f	avcodec/amfenc: avoid unnecessary output delay in low delay mode The code optimizes throughput by letting the encoder work on frame N until frame N+1 is ready for submission, but this hurts low-delay uses by delaying output by one frame. Don't delay output beyond what is necessary when AV_CODEC_FLAG_LOW_DELAY is used. Signed-off-by: Cameron Gutman <aicommander@gmail.com>	2025-10-03 11:05:03 +00:00
Marton Balint	f1d5114103	avformat/tls_openssl: do not cleanup tls after a successful dtls_start() Regression since `8e11e2cdb8`. Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-02 18:41:47 +02:00
Michael Niedermayer	61b6877637	avcodec/mjpegdec: Explain buf_size/width/height check Suggested-by: Ramiro Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-10-02 12:52:43 +00:00
Zhao Zhili	1a02412170	avformat/movenc_ttml: fix memleaks Memory leaks can happen in the normal case when breaking out of the while loop early, and on the error path with goto cleanup. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2025-10-01 22:31:03 +08:00
Romain Beauxis	cb4052beae	libavformat/oggparseopus.c: Parse comments from secondary chained streams header packet.	2025-10-01 14:20:55 +00:00
Romain Beauxis	45d7d5d3e2	libavformat/oggparseflac.c: Parse ogg/flac comments in new ogg packets, add them to ogg stream new_metadata.	2025-10-01 14:20:55 +00:00
Romain Beauxis	7dbf7d2a45	libavformat/oggdec.c: Use AV_PKT_DATA_STRINGS_METADATA to pass metadata updates.	2025-10-01 14:20:55 +00:00
Romain Beauxis	cebbb6ae8a	libavformat/oggdec.h, libavformat/oggparsevorbis.c: Factor out vorbis metadata update mechanism.	2025-10-01 14:20:55 +00:00
Romain Beauxis	de8d57e4c5	ogg/vorbis: implement header packet skip in chained ogg bitstreams.	2025-10-01 14:20:55 +00:00
James Almer	5511641365	avcodec/atrac9dec: use av_zero_extend() Signed-off-by: James Almer <jamrial@gmail.com>	2025-10-01 01:26:19 +00:00
James Almer	7ce3a14496	avcodec/apv_entropy: use av_zero_extend() Signed-off-by: James Almer <jamrial@gmail.com>	2025-10-01 01:26:19 +00:00
James Almer	776ee07990	avcodec/aom_film_grain: use av_zero_extend() Signed-off-by: James Almer <jamrial@gmail.com>	2025-10-01 01:26:19 +00:00
Marton Balint	8e11e2cdb8	avformat/tls_openssl: initialize underlying protocol early for dtls_start() The same way we do with TLS, so all tls URL options will be properly supported. Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	2762ae74c5	avformat/tls: use ff_parse_opts_from_query_string() to set URL parameters Note that this changes the code to work the same way as other protocols where a URL parameter can override an AVOption. Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	3166e3b539	avformat/rtpproto: use ff_parse_opts_from_query_string() to set URL parameters Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	f231439ee7	avformat/sctp: use ff_parse_opts_from_query_string() to set URL parameters Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	49c6e6cc44	avformat/tcp: use ff_parse_opts_from_query_string() to set URL parameters Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	7e58fff9d0	avformat/udp: use ff_parse_opts_from_query_string() to set URL parameters Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	2d06ed9308	avformat/libsrt: use ff_parse_opts_from_query_string() to set URL parameters Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:19 +02:00
Marton Balint	70e0e3e257	avformat/utils: add helper function to set opts from query string Signed-off-by: Marton Balint <cus@passwd.hu>	2025-10-01 00:34:18 +02:00
Marton Balint	c5be4b7075	avformat: compile urldecode unconditionally It will be used by the generic helper function to set options from URLs. Signed-off-by: Marton Balint <cus@passwd.hu>	2025-09-30 23:48:14 +02:00
Marton Balint	6f17053e6c	avformat/urldecode: add ff_urldecode_len function This will be used later to decode partial strings. Signed-off-by: Marton Balint <cus@passwd.hu>	2025-09-30 23:48:14 +02:00
Michael Niedermayer	8cb1ff78ac	avformat/dhav: Factorize some code in get_duration() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-09-30 21:13:56 +00:00

121325 Commits