Skip to content

Enable NEON optimization for cvRound on newer devices#4103

Merged
opencv-pushbot merged 1 commit intoopencv:masterfrom
mtamburrano:master
Jun 18, 2015
Merged

Enable NEON optimization for cvRound on newer devices#4103
opencv-pushbot merged 1 commit intoopencv:masterfrom
mtamburrano:master

Conversation

@mtamburrano
Copy link
Copy Markdown
Contributor

cvRound seems was not optimized with NEON instructions on newer devices.
Tested on an NVIDIA SHIELD TABLET and with the flag CV_VFP on, the speed up is greater than 4x.

Still I'm not sure why TEGRA_OPTIMIZATION is not enabled, if forced with a define, the header tries to include a mysterious file "tegra_round.hpp", which is not found

@bhack
Copy link
Copy Markdown

bhack commented Jun 8, 2015

The optimization are the same vfp instruction but now enabled in more flags cases. I really suggest and put in evidence in documentation to enable this flags in arm and android apps on this targets because cvround it is inline so depend on app flags and not on what flags was used to build opencv in manager. This little thing could be one of the performance blocker on arm/android for third party developers.

@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Jun 9, 2015

@bhack, in the case of iOS it happens automatically. in the case of Android there is tutorial:
http://docs.opencv.org/master/d9/d3f/tutorial_android_dev_intro.html, with a dedicated section about composing user's Android.mk and Application.mk. Can you, please, submit PR with the words that you think should be added?

@bhack
Copy link
Copy Markdown

bhack commented Jun 9, 2015

@vpisarev I don't know if we have the same problem also on SSE. Do you have an Intel android phone or tablet? We have other inline that depend on flags?

@vpisarev
Copy link
Copy Markdown
Contributor

@bhack, for SSE there is completely different branch, which should use SSE2 intrinsics

@mshabunin
Copy link
Copy Markdown
Contributor

@mtamburrano, what compiler do you use? Doesn't it have the __ARM_PCS_VFP predefined macro?

@mtamburrano
Copy link
Copy Markdown
Contributor Author

hi @mshabunin,
I'm using arm-linux-androideabi-gcc, version 4.8, and not, it doesn't seems to have that predefined macro.
This is the list of predefined macros regarding ARM or NEON instructions that this compiler uses:
#define ARM_SIZEOF_WCHAR_T 32
#define __ARM_ARCH_ISA_ARM 1
#define __ARMEL
1
#define ARM_ARCH_5TE 1
#define ARM_FP 12
#define __ARM_NEON_FP 4
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_PCS 1
#define __ARM_FEATURE_QBIT 1
#define __ARM_FEATURE_CLZ 1
#define __ARM_ARCH_ISA_THUMB 1
#define __ARM_ARCH 5
#define __ARM_EABI
1
#define ARM_FEATURE_DSP 1
#define __VFP_FP
1

@mshabunin
Copy link
Copy Markdown
Contributor

Does ARMv5TE architecture support NEON? This is strange.
I'd suggest to check for CV_NEON instead of __ARM_NEON_FP.

@bhack
Copy link
Copy Markdown

bhack commented Jun 10, 2015

@mshabunin I don't know if generally will be safe for a target android application to use runtime detection on android: https://developer.android.com/ndk/guides/cpu-arm-neon.html
Generally I think that on NDK it is enough with armeabi-v7a to have neon. So CV_NEON could be checked.

@bhack
Copy link
Copy Markdown

bhack commented Jun 10, 2015

But the variable to be checked is always also ARM_NEON_FP.
__ARM_NEON
is not defined.

@mshabunin
Copy link
Copy Markdown
Contributor

@bhack, I've checked several toolchains and now I think we do not need NEON checks in VFP detection instruction:

  • older toolchain has __ARM_NEON__ defined, but does not support vcvtr instruction:
$ ./android-ndk-r8b/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-g++ -mfloat-abi=hard -mfpu=neon -c test.cpp
test.cpp:7:2: warning: #warning "__ARM_NEON__ defined"
/tmp/ccEwu06a.s: Assembler messages:
/tmp/ccEwu06a.s:37: Error: bad instruction `vcvtr.s32.f64 s15,d16'
  • modern toolchain has __ARM_NEON_FP defined, but can not compile this instruction
$ ./android-ndk-r10d/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-g++ -mfloat-abi=soft  -c test.cpp
test.cpp:3:2: warning: #warning "__ARM_NEON_FP defined" [-Wcpp]
test.cpp: In function 'int cvRound(double)':
test.cpp:37:129: error: inconsistent operand constraints in an 'asm'
     asm("vcvtr.s32.f64 %[temp], %P[value] \n vmov %[res], %[temp]" : [res] "=r" (res), [temp] "=w" (temp) : [value] "w" (value));

Thus, I don't think NEON-related macros imply required VFPv3 instruction/register compiler support. Currently I'm sure __ARM_PCS_VFP indicates necessary support for gcc.

Also, I tried to build android pack with suggested patch and it had failed: http://pullrequest.opencv.org/buildbot/builders/master_pack-android/builds/114/steps/shell/logs/stdio

@bhack
Copy link
Copy Markdown

bhack commented Jun 10, 2015

@mshabunin Do you have __ARM_PCS_VFP without -mfloat-abi=hard?

@mshabunin
Copy link
Copy Markdown
Contributor

Do you have __ARM_PCS_VFP without -mfloat-abi=hard?

@bhack, no, it is not defined.

I checked some variants of VFP check with this script and test file and it seems that the best version is:

#if defined __GNUC__ && defined __arm__ && (defined __ARM_PCS_VFP || defined __ARM_VFPV3__ || defined __ARM_NEON__) && !defined __SOFTFP__

It will compile for all tested configurations (NDKr8d / NDKr10d / clang3.5-gnueabi / clang3.5-android * soft / softfp / hard) without errors.

However, these tests do not cover AArch64 and I'm still not sure if any runtime check is it needed to run compiled code.

@bhack
Copy link
Copy Markdown

bhack commented Jun 11, 2015

@mshabunin I.e. Is your test covering -mfpu=neon-vfpv4 or vfpv4?

@bhack
Copy link
Copy Markdown

bhack commented Jun 11, 2015

Generally i think that on Android a developer want that its distributed application is optimized dynamically at runtime based on instruction availability. The problem is that things like cvround, that make a really big difference not optimized on this platforms, are implemented inline in the header. When a developer deploys an opencv app on Play marketplace he doesn't really know on what target optimized instruction set will be installed. But things like cvround are hereditarily given to the app by opencv headers. So it is generally out of control at runtime with the android mechanisms cited in the link of my previous message.

@mshabunin
Copy link
Copy Markdown
Contributor

Is your test covering -mfpu=neon-vfpv4 or vfpv4?

No, only -mfpu=neon.

@bhack
Copy link
Copy Markdown

bhack commented Jun 11, 2015

I think that with mfpu=neon-vfpv4 or vfpv4 (and others) the optimized instruction are supported. But i'dont know if you condition cover this ones.

@bhack
Copy link
Copy Markdown

bhack commented Jun 11, 2015

@vpisarev
Copy link
Copy Markdown
Contributor

so, what's the conclusion? should we put this patch in?

@bhack
Copy link
Copy Markdown

bhack commented Jun 17, 2015

My opinion is that runtime detection is the best solution but i don't know if we want/could introduce that in opencv. I think that selecting some good default flags could be a good compromise.

@mshabunin
Copy link
Copy Markdown
Contributor

@vpisarev, current proposed condition does not compile for android-armeabi, so it should not be merged yet. There is needed at least && !defined __SOFTFP__ part to fix it.

@bhack, for NEON there is checkHardwareSupport(CV_CPU_NEON) runtime check, however it does not use Android routines and there are still many places where only compile-time check exists. I'm not sure if we need to use runtime check everywhere, because the most of processors with armv7-a architecture have NEON support and Android applications built with corresponding option should work on the most of user devices.
As for discussed cvRound and CV_VFP check, this function is too small to make runtime checks at every call. I agree that current compilation condition is quite strict and can be relaxed like proposed in this PR, but predefined compiler macros used for this check should be selected carefully to avoid compilation process breakage. Finally, those users who understand what they do can manually define CV_VFP when building thier code with OpenCV to enable this optimization.

@mtamburrano, I do not see any direct way to ensure that compiler (GCC) supports VFPv3, probably your proposed method will be enough if you exclude legacy platforms with soft float abi.
You can also try to use Tegra Android Development Pack from NVIDIA if you are targeting some specific device. It contains OpenCV 2.4 and uses mentioned TEGRA_OPTIMIZATION flags.

@bhack
Copy link
Copy Markdown

bhack commented Jun 17, 2015

@mshabunin I think SOFTFP is still valid. Probably -mfloat-abi=soft is not valid because disable neon and vfp (but probably we already check that with other ifs)

@mshabunin
Copy link
Copy Markdown
Contributor

@bhack, you are right, but it is some kind of misleading macro naming: compiler with -mfloat-abi=soft flag has __SOFTFP__ macro defined and with -mfloat-abi=softfp/hard has not.

@bhack
Copy link
Copy Markdown

bhack commented Jun 17, 2015

@mshabunin I've not tested extensively but ok.. I don't know if there are some compilers bugs :)

@mtamburrano
Copy link
Copy Markdown
Contributor Author

@mshabunin I updated the patch with your suggestions, I don't know why __ARM_NEON_FP is always defined with all -mfloat-abi types (soft/softfp/hard) and even without -mfpu=neon. Instead __ARM_NEON seems enabled only when -mfpu=neon is enabled and the mfloat-abi type is not soft.

By the way I've already tried opencv 2.4.8 included in the Nvidia dev pack, but it seems that the tegra optimization flag is also not raised there, and anyway if it is defined the tegra_round.hpp inclusion fails, I can't find this file anywhere

@mshabunin
Copy link
Copy Markdown
Contributor

@mtamburrano, it's OK. Can you, please, squash all commits into one?

@mshabunin
Copy link
Copy Markdown
Contributor

👍

@opencv-pushbot opencv-pushbot merged commit a55a8c9 into opencv:master Jun 18, 2015
mshabunin added a commit that referenced this pull request Jun 18, 2015
@mshabunin mshabunin removed their assignment Oct 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants