Presumed race condition in caching #449

@jedbrown

In trying to address remaining concerns in CEED/libCEED#688, I've been switching between occa main and 1.1.0 (the version that was working previously). I clean, remove ~/.occa, and rebuild. Sometimes I get errors like the one below. This seems to be a race condition related to caching. We typically run make -j64 test.

If this issue is hard to fix (or similar ones may persist), how would you recommend isolating our build environments so that there is no chance of colliding with stale caches?
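One possible way to isolate builds, sketched here as an assumption rather than a confirmed recommendation from the OCCA maintainers: point each build at its own cache directory through the OCCA_CACHE_DIR environment variable (assumed to be honored by both occa main and 1.1.0, defaulting to ~/.occa). The helper name new_occa_cache and the BUILD_ROOT variable are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: give each build its own OCCA cache directory.
# Assumption: OCCA reads OCCA_CACHE_DIR (default ~/.occa).
set -eu

# Hypothetical helper: create a fresh, private cache directory under
# the given root and print its path.
new_occa_cache() {
    build_root="$1"
    mkdir -p "$build_root"
    mktemp -d "$build_root/occa-cache.XXXXXX"
}

# BUILD_ROOT is a hypothetical variable; fall back to a temp location.
OCCA_CACHE_DIR="$(new_occa_cache "${BUILD_ROOT:-${TMPDIR:-/tmp}/occa-build}")"
export OCCA_CACHE_DIR
echo "Using OCCA cache: $OCCA_CACHE_DIR"

# Then run the tests from the same shell, e.g.:
#   make -j64 test
```

This only keeps different builds (for example occa main and 1.1.0) from sharing cache entries. Tests launched in parallel by a single make -j64 test would still share one directory, so it would not by itself remove a race between concurrent tests. Setting a separate directory per test process would, at the cost of recompiling kernels in every test.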

not ok 33 t521-operator /gpu/hip/occa stderr
# +terminate called after throwing an instance of 'occa::exception'
# +what():
# +---[ Error ]--------------------------------------------------------------------
# +File     : /home/gitlab-runner/builds/N8GCsqus/0/libceed/occa-1.1.0/src/tools/sys.cpp
# +Line     : 855
# +Function : dlopen
# +Message  : Error loading binary [5f0237e1aff76bcc/launcher_binary] with dlopen: /home/gitlab-runner/.occa/cache/5f0237e1aff76bcc/launcher_binary: file too short
# +Stack
# +19 occa::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# +18 occa::sys::dlopen(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::io::lock_t const&)
# +17 occa::serial::device::buildKernelFromBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::properties const&, occa::lang::kernelMetadata_t&)
# +16 occa::serial::device::buildKernelFromBinary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::properties const&)
# +15 occa::serial::device::buildKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::hash_t, occa::properties const&, bool)
# +14 occa::serial::device::buildLauncherKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::hash_t)
# +13 occa::launchedModeDevice_t::buildLauncherKernel(occa::hash_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::lang::sourceMetadata_t)
# +12 occa::launchedModeDevice_t::buildKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::hash_t, bool, occa::properties const&)
# +11 occa::launchedModeDevice_t::buildKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::hash_t, occa::properties const&)
# +10 occa::device::buildKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::properties const&) const
# +9 occa::device::buildKernelFromString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, occa::properties const&) const
# +8 ceed::occa::CpuOperator::buildApplyAddKernel()
# +7 ceed::occa::Operator::applyAdd(ceed::occa::Vector*, ceed::occa::Vector*, CeedRequest_private**)
# +6 ceed::occa::Operator::ceedApplyAdd(CeedOperator_private*, CeedVector_private*, CeedVector_private*, CeedRequest_private**)
# +5 /home/gitlab-runner/libceed/lib/libceed_test.so(CeedOperatorApplyAdd+0x42)
# +4 /home/gitlab-runner/libceed/lib/libceed_test.so(CeedOperatorApply+0x20d)
# +3 build/t521-operator(+0x1daa)
# +2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)
# +1 build/t521-operator(+0x11fa)

Labels: bug
