Enable NEON optimization for cvRound on newer devices by mtamburrano · Pull Request #4103 · opencv/opencv

mtamburrano · 2015-06-08T14:25:12Z

cvRound seems was not optimized with NEON instructions on newer devices.
Tested on an NVIDIA SHIELD TABLET and with the flag CV_VFP on, the speed up is greater than 4x.

Still I'm not sure why TEGRA_OPTIMIZATION is not enabled, if forced with a define, the header tries to include a mysterious file "tegra_round.hpp", which is not found

bhack · 2015-06-08T19:03:24Z

The optimization are the same vfp instruction but now enabled in more flags cases. I really suggest and put in evidence in documentation to enable this flags in arm and android apps on this targets because cvround it is inline so depend on app flags and not on what flags was used to build opencv in manager. This little thing could be one of the performance blocker on arm/android for third party developers.

vpisarev · 2015-06-09T00:27:51Z

@bhack, in the case of iOS it happens automatically. in the case of Android there is tutorial:
http://docs.opencv.org/master/d9/d3f/tutorial_android_dev_intro.html, with a dedicated section about composing user's Android.mk and Application.mk. Can you, please, submit PR with the words that you think should be added?

bhack · 2015-06-09T17:32:04Z

@vpisarev I don't know if we have the same problem also on SSE. Do you have an Intel android phone or tablet? We have other inline that depend on flags?

vpisarev · 2015-06-10T09:49:12Z

@bhack, for SSE there is completely different branch, which should use SSE2 intrinsics

mshabunin · 2015-06-10T10:51:01Z

@mtamburrano, what compiler do you use? Doesn't it have the __ARM_PCS_VFP predefined macro?

mtamburrano · 2015-06-10T11:06:56Z

hi @mshabunin,
I'm using arm-linux-androideabi-gcc, version 4.8, and not, it doesn't seems to have that predefined macro.
This is the list of predefined macros regarding ARM or NEON instructions that this compiler uses:
#define ARM_SIZEOF_WCHAR_T 32
#define __ARM_ARCH_ISA_ARM 1
#define __ARMEL 1
#define ARM_ARCH_5TE 1
#define ARM_FP 12
#define __ARM_NEON_FP 4
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_PCS 1
#define __ARM_FEATURE_QBIT 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_ARCH_ISA_THUMB 1
#define __ARM_ARCH 5
#define __ARM_EABI 1
#define ARM_FEATURE_DSP 1
#define __VFP_FP 1

mshabunin · 2015-06-10T13:30:30Z

Does ARMv5TE architecture support NEON? This is strange.
I'd suggest to check for CV_NEON instead of __ARM_NEON_FP.

bhack · 2015-06-10T13:51:46Z

@mshabunin I don't know if generally will be safe for a target android application to use runtime detection on android: https://developer.android.com/ndk/guides/cpu-arm-neon.html
Generally I think that on NDK it is enough with armeabi-v7a to have neon. So CV_NEON could be checked.

bhack · 2015-06-10T14:27:39Z

But the variable to be checked is always also ARM_NEON_FP.
__ARM_NEON is not defined.

mshabunin · 2015-06-10T15:33:23Z

@bhack, I've checked several toolchains and now I think we do not need NEON checks in VFP detection instruction:

older toolchain has __ARM_NEON__ defined, but does not support vcvtr instruction:

$ ./android-ndk-r8b/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-g++ -mfloat-abi=hard -mfpu=neon -c test.cpp
test.cpp:7:2: warning: #warning "__ARM_NEON__ defined"
/tmp/ccEwu06a.s: Assembler messages:
/tmp/ccEwu06a.s:37: Error: bad instruction `vcvtr.s32.f64 s15,d16'

modern toolchain has __ARM_NEON_FP defined, but can not compile this instruction

$ ./android-ndk-r10d/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ -mfloat-abi=soft  -c test.cpp
test.cpp:3:2: warning: #warning "__ARM_NEON_FP defined" [-Wcpp]
test.cpp: In function 'int cvRound(double)':
test.cpp:37:129: error: inconsistent operand constraints in an 'asm'
     asm("vcvtr.s32.f64 %[temp], %P[value] \n vmov %[res], %[temp]" : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (value));

Thus, I don't think NEON-related macros imply required VFPv3 instruction/register compiler support. Currently I'm sure __ARM_PCS_VFP indicates necessary support for gcc.

Also, I tried to build android pack with suggested patch and it had failed: http://pullrequest.opencv.org/buildbot/builders/master_pack-android/builds/114/steps/shell/logs/stdio

bhack · 2015-06-10T16:46:54Z

@mshabunin Do you have __ARM_PCS_VFP without -mfloat-abi=hard?

mshabunin · 2015-06-10T20:11:11Z

Do you have __ARM_PCS_VFP without -mfloat-abi=hard?

@bhack, no, it is not defined.

I checked some variants of VFP check with this script and test file and it seems that the best version is:

#if defined __GNUC__ && defined __arm__ && (defined __ARM_PCS_VFP || defined __ARM_VFPV3__ || defined __ARM_NEON__) && !defined __SOFTFP__

It will compile for all tested configurations (NDKr8d / NDKr10d / clang3.5-gnueabi / clang3.5-android * soft / softfp / hard) without errors.

However, these tests do not cover AArch64 and I'm still not sure if any runtime check is it needed to run compiled code.

bhack · 2015-06-11T00:02:51Z

@mshabunin I.e. Is your test covering -mfpu=neon-vfpv4 or vfpv4?

bhack · 2015-06-11T00:17:29Z

Generally i think that on Android a developer want that its distributed application is optimized dynamically at runtime based on instruction availability. The problem is that things like cvround, that make a really big difference not optimized on this platforms, are implemented inline in the header. When a developer deploys an opencv app on Play marketplace he doesn't really know on what target optimized instruction set will be installed. But things like cvround are hereditarily given to the app by opencv headers. So it is generally out of control at runtime with the android mechanisms cited in the link of my previous message.

mshabunin · 2015-06-11T09:49:01Z

Is your test covering -mfpu=neon-vfpv4 or vfpv4?

No, only -mfpu=neon.

bhack · 2015-06-11T10:01:29Z

I think that with mfpu=neon-vfpv4 or vfpv4 (and others) the optimized instruction are supported. But i'dont know if you condition cover this ones.

bhack · 2015-06-11T10:24:41Z

See also https://wiki.debian.org/ArmHardFloatPort/VfpComparison

vpisarev · 2015-06-17T11:51:48Z

so, what's the conclusion? should we put this patch in?

bhack · 2015-06-17T11:58:00Z

My opinion is that runtime detection is the best solution but i don't know if we want/could introduce that in opencv. I think that selecting some good default flags could be a good compromise.

mshabunin · 2015-06-17T14:26:22Z

@vpisarev, current proposed condition does not compile for android-armeabi, so it should not be merged yet. There is needed at least && !defined __SOFTFP__ part to fix it.

@bhack, for NEON there is checkHardwareSupport(CV_CPU_NEON) runtime check, however it does not use Android routines and there are still many places where only compile-time check exists. I'm not sure if we need to use runtime check everywhere, because the most of processors with armv7-a architecture have NEON support and Android applications built with corresponding option should work on the most of user devices.
As for discussed cvRound and CV_VFP check, this function is too small to make runtime checks at every call. I agree that current compilation condition is quite strict and can be relaxed like proposed in this PR, but predefined compiler macros used for this check should be selected carefully to avoid compilation process breakage. Finally, those users who understand what they do can manually define CV_VFP when building thier code with OpenCV to enable this optimization.

@mtamburrano, I do not see any direct way to ensure that compiler (GCC) supports VFPv3, probably your proposed method will be enough if you exclude legacy platforms with soft float abi.
You can also try to use Tegra Android Development Pack from NVIDIA if you are targeting some specific device. It contains OpenCV 2.4 and uses mentioned TEGRA_OPTIMIZATION flags.

bhack · 2015-06-17T14:41:12Z

@mshabunin I think SOFTFP is still valid. Probably -mfloat-abi=soft is not valid because disable neon and vfp (but probably we already check that with other ifs)

mshabunin · 2015-06-17T14:50:29Z

@bhack, you are right, but it is some kind of misleading macro naming: compiler with -mfloat-abi=soft flag has __SOFTFP__ macro defined and with -mfloat-abi=softfp/hard has not.

bhack · 2015-06-17T14:52:04Z

@mshabunin I've not tested extensively but ok.. I don't know if there are some compilers bugs :)

mtamburrano · 2015-06-17T15:43:13Z

@mshabunin I updated the patch with your suggestions, I don't know why __ARM_NEON_FP is always defined with all -mfloat-abi types (soft/softfp/hard) and even without -mfpu=neon. Instead __ARM_NEON seems enabled only when -mfpu=neon is enabled and the mfloat-abi type is not soft.

By the way I've already tried opencv 2.4.8 included in the Nvidia dev pack, but it seems that the tegra optimization flag is also not raised there, and anyway if it is defined the tegra_round.hpp inclusion fails, I can't find this file anywhere

mshabunin · 2015-06-18T08:47:31Z

@mtamburrano, it's OK. Can you, please, squash all commits into one?

mshabunin · 2015-06-18T09:21:27Z

👍

vpisarev assigned mshabunin Jun 10, 2015

mtamburrano force-pushed the master branch from bb8fa63 to 36bd840 Compare June 18, 2015 09:13

Enable NEON optimization for cvRound on newer devices

a55a8c9

mtamburrano force-pushed the master branch from 36bd840 to a55a8c9 Compare June 18, 2015 09:20

opencv-pushbot merged commit a55a8c9 into opencv:master Jun 18, 2015

mshabunin added a commit that referenced this pull request Jun 18, 2015

Merge pull request #4103 from mtamburrano:master

00fb8a3

bhack mentioned this pull request Jun 20, 2015

Make HDF5 components optional for configurable build BVLC/caffe#2619

Closed

mshabunin removed their assignment Oct 12, 2015

Uh oh!

Conversation

mtamburrano commented Jun 8, 2015

Uh oh!

bhack commented Jun 8, 2015

Uh oh!

vpisarev commented Jun 9, 2015

Uh oh!

bhack commented Jun 9, 2015

Uh oh!

vpisarev commented Jun 10, 2015

Uh oh!

mshabunin commented Jun 10, 2015

Uh oh!

mtamburrano commented Jun 10, 2015

Uh oh!

mshabunin commented Jun 10, 2015

Uh oh!

bhack commented Jun 10, 2015

Uh oh!

bhack commented Jun 10, 2015

Uh oh!

mshabunin commented Jun 10, 2015

Uh oh!

bhack commented Jun 10, 2015

Uh oh!

mshabunin commented Jun 10, 2015

Uh oh!

bhack commented Jun 11, 2015

Uh oh!

bhack commented Jun 11, 2015

Uh oh!

mshabunin commented Jun 11, 2015

Uh oh!

bhack commented Jun 11, 2015

Uh oh!

bhack commented Jun 11, 2015

Uh oh!

vpisarev commented Jun 17, 2015

Uh oh!

bhack commented Jun 17, 2015

Uh oh!

mshabunin commented Jun 17, 2015

Uh oh!

bhack commented Jun 17, 2015

Uh oh!

mshabunin commented Jun 17, 2015

Uh oh!

bhack commented Jun 17, 2015

Uh oh!

mtamburrano commented Jun 17, 2015

Uh oh!

mshabunin commented Jun 18, 2015

Uh oh!

mshabunin commented Jun 18, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants