Opened 9 years ago
Last modified 3 years ago
#5570 open enhancement
POWER8 VSX vectorization libswscale/input.c
Reported by: | David Edelsohn | Owned by: | |
---|---|---|---|
Priority: | wish | Component: | swscale |
Version: | git-master | Keywords: | bounty vsx |
Cc: | linuxer@xakep.ru, daniel@pocock.pro | Blocked By: | |
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Optimize approximately 50 functions in libswscale/input.c for POWER8 VSX SIMD instructions on PPC64 Linux.
Change History (15)
comment:1 by , 9 years ago
Keywords: | bounty added |
---|---|
Version: | unspecified → git-master |
comment:2 by , 9 years ago
Priority: | normal → wish |
---|---|
Status: | new → open |
comment:3 by , 8 years ago
commit 1df908f33f658979b32599489ca6f1a39821013c breaks build on not POWER8 VSX SIMD
make -C ffmpeg libswscale/libswscale.a
make[1]: Entering directory '/home/mike/software/mplayer/ffmpeg'
CC libswscale/swscale.o
libswscale/swscale.c:569:9: error: use of undeclared identifier 'HAVE_VSX'
if (HAVE_VSX && (!HAVE_BIGENDIAN)) {
common.mak:60: recipe for target 'libswscale/swscale.o' failed
make[1]: Leaving directory '/home/mike/software/mplayer/ffmpeg'
Makefile:740: recipe for target 'ffmpeg/libswscale/libswscale.a' failed
comment:4 by , 8 years ago
Keywords: | vsx added |
---|
comment:5 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
comment:6 by , 5 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:7 by , 5 years ago
Cc: | added |
---|
PPC input.c coverage now exceeds no vsx cpu. Speedups 2-12x, usually in line with no vsx cpu. Some functions got speed less than cpu, because it is impossible to vectorize them correctly(For example palToUV, palToY)I commented on them. Some functions got speed equivalent no vsx cpu(For example planar_rgb16_to_a)
Sorry for accidentally closing the ticket. As the reporter, you should close it.
comment:8 by , 5 years ago
Status: | reopened → open |
---|
You should remove the pal functions from your patch: They were not needed, even more so if they show no significant speed improvement.
On this bug tracker, tickets get closed once the patch gets committed.
comment:10 by , 5 years ago
Please, check my patch. I have been waiting for its commiting more than a year. I want to get my bounty from bountysource platform.
comment:11 by , 4 years ago
Please, check my patch. I have been waiting for its commiting more than a year. I want to get my bounty.
comment:12 by , 4 years ago
Cc: | added |
---|
I tried to add the patch to a build of 4.3.1 on Debian buster. It fails with some minor errors.
a) the timer.h includes and STOP_TIMER macros need to be removed, like this
https://github.com/FFmpeg/FFmpeg/commit/cc2a9509ce79793fcd00f91bb6ca3f4c53721e9e
b) some functions are commented out and attempts to use them cause an error
There are also some compiler warnings, it would be nice to tidy them up.
gcc -I. -Isrc/ -Wdate-time -D_FORTIFY_SOURCE=2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_swscale -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-overflow -fstack-protector-all -fPIE -std=c11 -fomit-frame-pointer -maltivec -mabi=altivec -mvsx -pthread -I/usr/include/p11-kit-1 -I/usr/include/lilv-0 -I/usr/include/sratom-0 -I/usr/include/sord-0 -I/usr/include/serd-0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/uuid -I/usr/include/fribidi -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libxml2 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/bs2b -I/usr/include/libdrm -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/fribidi -I/usr/include/openjpeg-2.3 -I/usr/include/opus -I/usr/include/opus -D_REENTRANT -pthread -I/usr/include/librsvg-2.0 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -isystem /usr/include/mit-krb5 -I/usr/include/pgm-5.2 -I/usr/include/libxml2 -I/usr/include/sphinxbase -I/usr/include/pocketsphinx -I/usr/include/powerpc64le-linux-gnu -I/usr/include/powerpc64le-linux-gnu/sphinxbase -I/usr/include/alsa -g -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wno-pointer-to-int-cast -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -fdiagnostics-color=auto -Wno-maybe-uninitialized -D_REENTRANT -I/usr/include/SDL2 -MMD -MF libswscale/ppc/swscale_altivec.d -MT libswscale/ppc/swscale_altivec.o -c -o libswscale/ppc/swscale_altivec.o src/libswscale/ppc/swscale_altivec.c src/libswscale/ppc/input_vsx.c: In function ‘rgb64ToUV_c_template_vsx’: src/libswscale/ppc/input_vsx.c:192:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] int i, width_adj, is_BE ; ^~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_c_template_vsx’: src/libswscale/ppc/input_vsx.c:483:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] int i, width_adj, is_BE ; ^~~ In file included from src/libavutil/internal.h:42, from src/libavutil/common.h:533, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_half_c_template’: src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’? tend = AV_READ_TIME(); \ ^~~~ src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("3.1") ^~~~~~~~~~ src/libavutil/timer.h:134:5: note: each undeclared identifier is reported only once for each function it appears in tend = AV_READ_TIME(); \ ^~~~ src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("3.1") ^~~~~~~~~~ In file included from src/libavutil/common.h:106, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’? TIMER_REPORT(id, tend - tstart) ^~~~~~ src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’ # define ff_log2(x) (31 - __builtin_clz((x)|1)) ^ src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’ TIMER_REPORT(id, tend - tstart) ^~~~~~~~~~~~ src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("3.1") ^~~~~~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToY_c_template_vsx’: src/libswscale/ppc/input_vsx.c:710:51: warning: unused variable ‘v_val’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_px,v_sign,v_val; ^~~~~ src/libswscale/ppc/input_vsx.c:710:44: warning: unused variable ‘v_sign’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_px,v_sign,v_val; ^~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_c_template_vsx’: src/libswscale/ppc/input_vsx.c:859:53: warning: unused variable ‘v_val’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_px, v_sign, v_val; ^~~~~ src/libswscale/ppc/input_vsx.c:859:45: warning: unused variable ‘v_sign’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_px, v_sign, v_val; ^~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_half_c_template_vsx’: src/libswscale/ppc/input_vsx.c:1035:47: warning: unused variable ‘v_val’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_sign, v_val; ^~~~~ src/libswscale/ppc/input_vsx.c:1035:39: warning: unused variable ‘v_sign’ [-Wunused-variable] vector signed short v_rd0, v_rd1, v_sign, v_val; ^~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘uyvyToY_c_vsx’: src/libswscale/ppc/input_vsx.c:2349:35: warning: unused variable ‘sample2’ [-Wunused-variable] vector unsigned char sample1, sample2; ^~~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘p010LEToY_c_vsx’: src/libswscale/ppc/input_vsx.c:2486:26: warning: unused variable ‘sample’ [-Wunused-variable] vector unsigned char sample; ^~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_c_vsx’: src/libswscale/ppc/input_vsx.c:3131:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] int i, width_adj; ^~~ src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_half_c_vsx’: src/libswscale/ppc/input_vsx.c:3241:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] int i, width_adj; ^~~ src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_y_vsx’: src/libswscale/ppc/input_vsx.c:3624:15: warning: unused variable ‘src_addr’ [-Wunused-variable] uintptr_t src_addr = (uintptr_t)src; ^~~~~~~~ src/libswscale/ppc/input_vsx.c:3614:41: warning: unused variable ‘v_rd2’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ src/libswscale/ppc/input_vsx.c:3614:34: warning: unused variable ‘v_rd1’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ src/libswscale/ppc/input_vsx.c:3614:27: warning: unused variable ‘v_rd0’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_a_vsx’: src/libswscale/ppc/input_vsx.c:3704:34: warning: unused variable ‘v_a’ [-Wunused-variable] vector unsigned short v_rd0, v_a, v_dst, shift; ^~~ src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_uv_vsx’: src/libswscale/ppc/input_vsx.c:3742:41: warning: unused variable ‘v_rd2’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ src/libswscale/ppc/input_vsx.c:3742:34: warning: unused variable ‘v_rd1’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ src/libswscale/ppc/input_vsx.c:3742:27: warning: unused variable ‘v_rd0’ [-Wunused-variable] vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1; ^~~~~ In file included from src/libavutil/internal.h:42, from src/libavutil/common.h:533, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_c_vsx’: src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’? tend = AV_READ_TIME(); \ ^~~~ src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("47") ^~~~~~~~~~ In file included from src/libavutil/common.h:106, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’? TIMER_REPORT(id, tend - tstart) ^~~~~~ src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’ # define ff_log2(x) (31 - __builtin_clz((x)|1)) ^ src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’ TIMER_REPORT(id, tend - tstart) ^~~~~~~~~~~~ src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("47") ^~~~~~~~~~ In file included from src/libavutil/internal.h:42, from src/libavutil/common.h:533, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_bswap_c_vsx’: src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’? tend = AV_READ_TIME(); \ ^~~~ src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("48") ^~~~~~~~~~ In file included from src/libavutil/common.h:106, from src/libavutil/avutil.h:296, from src/libswscale/ppc/input_vsx.c:33: src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’? TIMER_REPORT(id, tend - tstart) ^~~~~~ src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’ # define ff_log2(x) (31 - __builtin_clz((x)|1)) ^ src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’ TIMER_REPORT(id, tend - tstart) ^~~~~~~~~~~~ src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’ STOP_TIMER("48") ^~~~~~~~~~ src/libswscale/ppc/input_vsx.c: In function ‘ff_sws_init_input_funcs_vsx’: src/libswscale/ppc/input_vsx.c:4029:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] enum AVPixelFormat srcFormat = c->srcFormat; ^~~~ src/libswscale/ppc/input_vsx.c:4055:24: error: ‘palToUV_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’? c->chrToYV12 = palToUV_c_vsx; ^~~~~~~~~~~~~ palToA_c_vsx src/libswscale/ppc/input_vsx.c:4468:24: error: ‘palToY_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’? c->lumToYV12 = palToY_c_vsx; ^~~~~~~~~~~~ palToA_c_vsx make[2]: *** [/<<PKGBUILDDIR>>/ffbuild/common.mak:59: libswscale/ppc/input_vsx.o] Error 1
comment:13 by , 4 years ago
Hi! Functions that were commented out did not well speed. The actual patch with little fixes and without START_TIME/STOP_TIME macroses can be taken from https://patchwork.ffmpeg.org/project/ffmpeg/patch/20201230120822.8891-1-pestov.vyach@yandex.ru/
comment:15 by , 3 years ago
I copied the patches from this bug #5570 and also #9077 into the build tree for the Debian package version 7:4.3.2-0+deb11u2. I put everything in this branch based on the Debian packaging:
https://gitlab.com/debify/ffmpeg/-/tree/pocock/bullseye-ppc64el
I ran the build on a Debian 11 (bullseye) system
dpkg-buildpackage -rfakeroot -i.git -j65 -b --no-sign
Building with both patches some of the unit tests fail.
Building without the patch here for #5077 the unit tests pass.
I've copied the test output below
--- BUILD_DIR/tests/ref/fate/filter-pixfmts-scale 2021-09-19 17:23:40.129078035 +0200 +++ tests/data/fate/filter-pixfmts-scale 2021-09-21 07:43:08.567738778 +0200 @@ -9,12 +9,12 @@ bgr444le 01be36a28ebca1a11eb4d192986cd4e9 bgr48be 3ae02769c69d2512eaa26fff65763acb bgr48le a6ce2344f07b77438258b6787fe5c24c -bgr4_byte 01efea74088e5e3343c19ee053b95f31 +bgr4_byte 191a304d3ab99e7f3e7c0e00b353d928 bgr555be ab353278d103d379e1ec86e5cabb645f bgr555le 16ccbf59297e4b9ab25fd8af5a84a95d bgr565be 3477e19fc11f95285836f30fdff26c1d bgr565le 82a81e7c9d4e0431fa22f4df9694afdc -bgr8 2c57e76ccf04d51de6acafcf35d6fa70 +bgr8 2b8447aa9ec0a1c19efb03d5f98fe8d7 bgra d8316272bc3a360ef9dff3ecc84520a3 bgra64be 4e6a1b9f9c18b881c27d76611d45f737 bgra64le efeee0abcc658ebcff049d5e74d74943 @@ -63,19 +63,19 @@ p010le 4b316f2b9e18972299beb73511278fa8 p016be 31e204018cbb53f8988c4e1174ea8ce9 p016le d5afe557f492a09317e525d7cb782f5b -pal8 29e10892009b2cfe431815ec3052ed3b +pal8 0c9a2544b27b62e3d1fd0809bae67bbd rgb0 fbd27e98154efb7535826afed41e9bb0 rgb24 e022e741451e81f2ecce1c7240b93e87 rgb444be db52b9ecdf98479b693e3f4bd9e77bac rgb444le 63288425c05f146cde5c82b85bb126e0 rgb48be 45b25016f10d54cf36eef3479afd8249 rgb48le 40577b147620ecfb115717473d000697 -rgb4_byte 9e540a2e7193ebcbf1c7f85d192a0c4e +rgb4_byte 03192eea530b12d078edba6446a4d233 rgb555be cb5407a0d40f3d0120155daeaaa9a222 rgb555le c15540d1fc887882c35860634009c439 rgb565be c69fa7d6e458509de65e911d147629a8 rgb565le a4a6ef89cdc10282b428cb1392f2a353 -rgb8 bcdc033b4ef0979d060dbc8893d4db58 +rgb8 092fb66393af3d56713d49f652405de4 rgba 85bb5d03cea1c6e8002ced3373904336 rgba64be ee73e57923af984b31cc7795d13929da rgba64le 783d2779adfafe3548bdb671ec0de69e BUILD_DIR/debian/standard/ffmpeg -nostdin -nostats -cpuflags all -flags +bitexact -idct simple -sws_flags +accurate_rnd+bitexact -fflags +bitexact -threads 1 -f image2 -vcodec pgmyuv -hwaccel none -threads 1 -thread_type frame+slice -i BUILD_DIR/debian/standard/tests/vsynth1/%02d.pgm -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact -flags +bitexact -idct simple -sws_flags +accurate_rnd+bitexact -fflags +bitexact -threads 1 -dct fastint -vf format=yvyu422,vflip= -vcodec rawvideo -frames:v 5 -pix_fmt yvyu422 -frames:v 1 -f nut md5: Test filter-pixfmts-scale failed. Look at tests/data/fate/filter-pixfmts-scale.err for details.
https://www.bountysource.com/issues/34315232-power8-vsx-vectorization-libswscale-input-c