
Conversation

@zrphercule
Contributor

@zrphercule zrphercule commented Sep 19, 2018

Migrate all tests in ATen to gtest, except basic.cpp.
Since gtest's features differ from Catch's, some of the tests have been rewritten with equivalent meaning.

According to CI, the basic test has a version conflict with valgrind, so this test case still uses Catch.
It will be resolved in a separate PR.
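gtest has no direct counterpart to Catch's `SECTION`, so test cases built from sections usually have to be restructured into separate `TEST`s or a fixture; individual assertions, by contrast, map almost one-to-one. The helper below is a hypothetical sketch (not part of this PR) of that mechanical part of the port; `port_assertion` and its mapping table are illustrative assumptions. It preserves Catch's fatal/non-fatal distinction: `REQUIRE*` stops the test case, like gtest's `ASSERT_*`, while `CHECK*` records the failure and continues, like `EXPECT_*`.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: rewrites a Catch assertion macro at the start of a
// line into its closest gtest counterpart. Lines that do not start with a
// known Catch assertion are returned unchanged.
std::string port_assertion(const std::string& line) {
  // Catch REQUIRE* (fatal)  -> gtest ASSERT_* (fatal)
  // Catch CHECK*   (soft)   -> gtest EXPECT_* (non-fatal)
  static const std::array<std::pair<const char*, const char*>, 8> kMap = {{
      {"REQUIRE_NOTHROW", "ASSERT_NO_THROW"},
      {"REQUIRE_THROWS", "ASSERT_ANY_THROW"},
      {"REQUIRE_FALSE", "ASSERT_FALSE"},
      {"REQUIRE", "ASSERT_TRUE"},
      {"CHECK_NOTHROW", "EXPECT_NO_THROW"},
      {"CHECK_THROWS", "EXPECT_ANY_THROW"},
      {"CHECK_FALSE", "EXPECT_FALSE"},
      {"CHECK", "EXPECT_TRUE"},
  }};
  const std::size_t start = line.find_first_not_of(" \t");
  if (start == std::string::npos) return line;  // blank line
  for (const auto& entry : kMap) {
    const std::string name = entry.first;
    const std::size_t after = start + name.size();
    // Require '(' right after the macro name so that, e.g., REQUIRE does
    // not match the beginning of REQUIRE_FALSE.
    if (line.compare(start, name.size(), name) == 0 && after < line.size() &&
        line[after] == '(') {
      return line.substr(0, start) + entry.second + line.substr(after);
    }
  }
  return line;  // not a recognized Catch assertion; leave untouched
}
```

A mechanical port like this turns `REQUIRE(a == b)` into `ASSERT_TRUE(a == b)`; rewriting it by hand as `ASSERT_EQ(a, b)` is usually better, because gtest then prints both operands when the check fails.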

@zrphercule zrphercule changed the title [Unfinished][Dont Merge]Aten: catch2gtest Aten: catch2gtest Sep 21, 2018
Contributor

@goldsborough goldsborough left a comment


Good start, but please fix the naming and use of EXPECT_*


@zrphercule
Contributor Author

Seeking help:

There seem to be some CUDA memory leak problems during testing, and I have not yet figured out why.

This error only happens in the linux-xenial-cuda environment, and only in one test case: basic.

The CPU test of basic is fine. After commenting out all test cases in the CUDA test of basic (see commit "basic test 2"), it works well. But if even a single line is left in the CUDA test (e.g. setting a random seed with manual_seed(123, at::kCUDA)), the error occurs.

Thanks!


@ezyang
Contributor

ezyang commented Sep 26, 2018

@zrphercule When you say leak problems, do you mean this error?

03:27:12 ==6370== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
03:27:12 ==6370==    This could cause spurious value errors to appear.
03:27:12 ==6370==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
03:27:12 ==6370== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
03:27:12 ==6370==    This could cause spurious value errors to appear.
03:27:12 ==6370==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
03:27:12 ==6370== Warning: noted but unhandled ioctl 0x7ff with no size/direction hints.
03:27:12 ==6370==    This could cause spurious value errors to appear.
03:27:12 ==6370==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
03:27:12 ==6370== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
03:27:12 ==6370==    This could cause spurious value errors to appear.
03:27:12 ==6370==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
03:27:12 ==6370== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
03:27:12 ==6370==    This could cause spurious value errors to appear.
03:27:12 ==6370==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
03:27:12 ==6370== Warning: set address range perms: large range [0x1000000000, 0x4e00000000) (noaccess)
03:27:14 ==6370== Warning: set address range perms: large range [0x200000000, 0x700000000) (noaccess)
03:27:14 ==6370== Warning: set address range perms: large range [0x3c14d000, 0x5c14c000) (noaccess)
03:27:14 vex amd64->IR: unhandled instruction bytes: 0xF 0xC7 0xF0 0x89 0x44 0x24 0xC 0xF
03:27:14 vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
03:27:14 vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
03:27:14 vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
03:27:14 ==6370== valgrind: Unrecognised instruction at address 0x40e6a41.
03:27:14 ==6370==    at 0x40E6A41: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0xFFEFFDF0F: ???
03:27:14 ==6370==    by 0x73: ???
03:27:14 ==6370==    by 0x337B641F: ???
03:27:14 ==6370==    by 0x40E6B2A: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0xFFEFFDF0F: ???
03:27:14 ==6370==    by 0x40E6A83: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0x337B57EF: ???
03:27:14 ==6370==    by 0x337C067F: ???
03:27:14 ==6370== Your program just tried to execute an instruction that Valgrind
03:27:14 ==6370== did not recognise.  There are two possible reasons for this.
03:27:14 ==6370== 1. Your program has a bug and erroneously jumped to a non-code
03:27:14 ==6370==    location.  If you are running Memcheck and you just saw a
03:27:14 ==6370==    warning about a bad jump, it's probably your program's fault.
03:27:14 ==6370== 2. The instruction is legitimate but Valgrind doesn't handle it,
03:27:14 ==6370==    i.e. it's Valgrind's fault.  If you think this is the case or
03:27:14 ==6370==    you are not sure, please let us know and we'll try to fix it.
03:27:14 ==6370== Either way, Valgrind will now raise a SIGILL signal which will
03:27:14 ==6370== probably kill your program.
03:27:14 ==6370== 
03:27:14 ==6370== Process terminating with default action of signal 4 (SIGILL): dumping core
03:27:14 ==6370==  Illegal opcode at address 0x40E6A41
03:27:14 ==6370==    at 0x40E6A41: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0xFFEFFDF0F: ???
03:27:14 ==6370==    by 0x73: ???
03:27:14 ==6370==    by 0x337B641F: ???
03:27:14 ==6370==    by 0x40E6B2A: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0xFFEFFDF0F: ???
03:27:14 ==6370==    by 0x40E6A83: ??? (in /opt/conda/lib/libstdc++.so.6.0.25)
03:27:14 ==6370==    by 0x337B57EF: ???
03:27:14 ==6370==    by 0x337C067F: ???
03:27:16 ==6370== 
03:27:16 ==6370== HEAP SUMMARY:
03:27:16 ==6370==     in use at exit: 7,066,633 bytes in 72,797 blocks
03:27:16 ==6370==   total heap usage: 1,287,426 allocs, 1,214,629 frees, 490,560,597 bytes allocated
03:27:16 ==6370== 
03:27:16 ==6370== LEAK SUMMARY:
03:27:16 ==6370==    definitely lost: 0 bytes in 0 blocks
03:27:16 ==6370==    indirectly lost: 0 bytes in 0 blocks
03:27:16 ==6370==      possibly lost: 4,400 bytes in 32 blocks
03:27:16 ==6370==    still reachable: 7,062,233 bytes in 72,765 blocks
03:27:16 ==6370==         suppressed: 0 bytes in 0 blocks
03:27:16 ==6370== Rerun with --leak-check=full to see details of leaked memory
03:27:16 ==6370== 
03:27:16 ==6370== For counts of detected and suppressed errors, rerun with: -v
03:27:16 ==6370== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
03:27:16 aten/tools/run_tests.sh: line 40:  6370 Killed                  valgrind --suppressions="$VALGRIND_SUP" --error-exitcode=1 ./basic "[cpu]"
03:27:16 + cleanup
03:27:16 + retcode=137
03:27:16 + set +x
03:27:16 =================== sccache compilation log ===================
03:27:16 =========== If your build fails, please take a look at the log above for possible reasons ===========
03:27:16 Compile requests               150
03:27:16 Compile requests executed       73
03:27:16 Cache hits                      23
03:27:16 Cache misses                    50
03:27:16 Cache timeouts                   0
03:27:16 Cache read errors                0
03:27:16 Forced recaches                  0
03:27:16 Cache write errors               0
03:27:16 Compilation failures             0
03:27:16 Cache errors                     0
03:27:16 Non-cacheable compilations       0
03:27:16 Non-cacheable calls             30
03:27:16 Non-compilation calls           47
03:27:16 Unsupported compiler calls       0
03:27:16 Average cache write          0.000 s
03:27:16 Average cache read miss      1.663 s
03:27:16 Average cache read hit       0.010 s
03:27:16 Cache location             Local disk: "/var/lib/jenkins/.cache/sccache"
03:27:16 Cache size                       7 MiB
03:27:16 Max cache size                  10 GiB
03:27:16 Stopping sccache server...
03:27:16 Compile requests               150
03:27:16 Compile requests executed       73
03:27:16 Cache hits                      23
03:27:16 Cache misses                    50
03:27:16 Cache timeouts                   0
03:27:16 Cache read errors                0
03:27:16 Forced recaches                  0
03:27:16 Cache write errors               0
03:27:16 Compilation failures             0
03:27:16 Cache errors                     0
03:27:16 Non-cacheable compilations       0
03:27:16 Non-cacheable calls             30
03:27:16 Non-compilation calls           47
03:27:16 Unsupported compiler calls       0
03:27:16 Average cache write          0.000 s
03:27:16 Average cache read miss      1.663 s
03:27:16 Average cache read hit       0.010 s
03:27:16 Cache location             Local disk: "/var/lib/jenkins/.cache/sccache"
03:27:16 Cache size                       7 MiB
03:27:16 Max cache size                  10 GiB
03:27:16 + echo 'Stopping container...'
03:27:16 Stopping container...

Valgrind is choking for a very different reason: it's hitting an unrecognized instruction. 0xF 0xC7 (with ModR/M byte 0xF0) is RDRAND https://en.wikipedia.org/wiki/RdRand, which some versions of valgrind are known not to support. One way to fix this problem is to install a newer version of valgrind on these servers.
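The diagnosis can be checked by hand from the logged bytes `0xF 0xC7 0xF0`. In the x86 encoding, `0F C7` is a group opcode whose ModR/M `reg` field selects the instruction: `/6` with a register operand is RDRAND, and `/7` with a register operand is RDSEED. The sketch below decodes only those two forms and ignores prefixes, which the log reports as absent (`REX=0`, `PFX.66=0`, `PFX.F3=0`); the function name `decode_0fc7` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal sketch: recognizes only the register forms of 0F C7 /6 (RDRAND)
// and 0F C7 /7 (RDSEED) and names the 32-bit destination register.
// Prefixes (66h, F3h, REX) are not handled.
std::string decode_0fc7(const std::uint8_t* bytes, std::size_t len) {
  static const char* const kReg32[8] = {"eax", "ecx", "edx", "ebx",
                                        "esp", "ebp", "esi", "edi"};
  if (len < 3 || bytes[0] != 0x0F || bytes[1] != 0xC7) return "other";
  const std::uint8_t modrm = bytes[2];
  const unsigned mod = modrm >> 6;          // bits 7..6: 3 means register operand
  const unsigned reg = (modrm >> 3) & 0x7;  // bits 5..3: opcode extension
  const unsigned rm = modrm & 0x7;          // bits 2..0: destination register
  if (mod != 3) return "other";  // memory forms (e.g. CMPXCHG8B) are out of scope
  if (reg == 6) return std::string("rdrand ") + kReg32[rm];
  if (reg == 7) return std::string("rdseed ") + kReg32[rm];
  return "other";
}
```

For `0xF0`, mod = 3, reg = 6 and rm = 0, so the faulting instruction is `rdrand eax`. The bytes after it, `89 44 24 0C`, decode to `mov dword ptr [rsp+0xc], eax`, which stores the random value on the stack.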

@zrphercule
Contributor Author

Aha, I thought it was caused by a memory leak. > <

Contributor

@ezyang ezyang left a comment


Don't forget to update the commit message so that it accurately describes the current changes.

Contributor

@facebook-github-bot facebook-github-bot left a comment


zrphercule is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zrphercule zrphercule deleted the migrant_aten branch September 27, 2018 04:10
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 27, 2018
Summary:
Migrate all tests in ATen to gtest, except basic.cpp.
Since gtest's features differ from Catch's, some of the tests have been rewritten with equivalent meaning.

According to CI, the basic test has a version conflict with valgrind, so this test case still uses Catch.
It will be resolved in a separate PR.

Pull Request resolved: pytorch/pytorch#11846

Differential Revision: D10080860

Pulled By: zrphercule

fbshipit-source-id: 439d4cf33fb6ccbe79b797860342853c63e59081
goldsborough added a commit to goldsborough/pytorch that referenced this pull request Oct 6, 2018
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After pytorch#11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests), so please do scrutinize whether you want to write tests the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp
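The four steps above can be sketched in a single file. All test names (`testFuser`, `testCustomOperators`) are hypothetical, and the real `JIT_TEST` macro presumably expands to a gtest `TEST(...)` that calls `test<Name>()`; that expansion is an assumption. Here a small registry stands in for gtest so the sketch compiles and runs without linking it.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// --- test.h (step 1): one declaration per test case (hypothetical names) ---
void testFuser();
void testCustomOperators();

// --- test.cpp (step 2): one definition per test case ---
int g_fuser_runs = 0;  // counters exist only to make the sketch observable
int g_custom_op_runs = 0;
void testFuser() { ++g_fuser_runs; }
void testCustomOperators() { ++g_custom_op_runs; }

// Step 3: entry point that the Python test suite would call.
void runJitTests() {
  testFuser();
  testCustomOperators();
}

// --- gtest.cpp (step 4) ---
// Stand-in for gtest's test registry: JIT_TEST(Name) registers test<Name>
// under "Name". In the real tree this line would define a gtest TEST instead.
std::vector<std::pair<std::string, std::function<void()>>>& registry() {
  static std::vector<std::pair<std::string, std::function<void()>>> tests;
  return tests;
}
struct Registrar {
  Registrar(const char* name, std::function<void()> fn) {
    registry().emplace_back(name, std::move(fn));
  }
};
#define JIT_TEST(name) static Registrar registrar_##name(#name, test##name)

JIT_TEST(Fuser);
JIT_TEST(CustomOperators);
```

The cost of this layout is that every test is listed twice, once in `runJitTests()` and once as a `JIT_TEST` line; the benefit is that the same function bodies can run under both the Python test suite and a plain C++ test binary.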

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: pytorch#12030

Differential Revision: D10207745

fbshipit-source-id: c1e7d55d05840943bb2f92aa207082e836009416
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2018
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests), so please do scrutinize whether you want to write tests the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: #12030

Differential Revision: D10207745

Pulled By: goldsborough

fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
facebook-github-bot pushed a commit that referenced this pull request Nov 6, 2018
Summary:
In #11846, we migrated all Catch tests in Aten/test/ to gtest except basic.cpp, because of a GPU bug (valgrind related).
In this PR, we find out what the bug is and migrate the last piece of ATen's Catch tests to gtest.
Pull Request resolved: #12142

Differential Revision: D12946980

Pulled By: zrphercule

fbshipit-source-id: cf3b21f23ddec3e363ac8ec4bdeb4bc4fe35f83b
@ezyang ezyang added the merged label Jun 26, 2019