Conversation

@MMathisLab
Member

Updates to the apple package allow for a single conda-file install, with no separate download step needed. It installs fine and reports the Metal M1 GPU in use!

However, training crashes:

*** Received signal 11 ***
*** BEGIN MANGLED STACK TRACE ***
0   libtensorflow_framework.2.dylib     0x000000010ac902fc _ZN10tensorflow7testingL17StacktraceHandlerEiP9__siginfoPv + 160
1   libsystem_platform.dylib            0x000000018ef874a4 _sigtramp + 56
2   MPSNDArray                          0x000000019969dd38 MPSSetResourcesOnCommandEncoder + 9684
3   MPSNDArray                          0x000000019969fbbc MPSSetResourcesOnCommandEncoder + 17496
4   MPSNDArray                          0x00000001996a01e8 MPSSetResourcesOnCommandEncoder + 19076
5   libmetal_plugin.dylib               0x000000017fc31024 ___Z17dispatchOneKernelI18MPSNDArrayIdentityEdP11MetalStreamPT_P7NSArrayP10MPSNDArrayPKcP18MPSKernelDAGObject_block_invoke + 120
6   libdispatch.dylib                   0x000000018edac1b4 _dispatch_client_callout + 20
7   libdispatch.dylib                   0x000000018edbb414 _dispatch_lane_barrier_sync_invoke_and_complete + 56
8   libmetal_plugin.dylib               0x000000017fc349c4 _ZN12metal_plugin18MPSApplyMomentumOpIfE7ComputeEPNS_15OpKernelContextE + 3564
9   libmetal_plugin.dylib               0x000000017fc33990 _ZN12metal_pluginL15ComputeOpKernelINS_18MPSApplyMomentumOpIfEEEEvPvP18TF_OpKernelContext + 44
10  _pywrap_tensorflow_internal.so      0x000000029bf7aaa4 _ZN10tensorflow15PluggableDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE + 148
11  libtensorflow_framework.2.dylib     0x000000010a4daf90 _ZN10tensorflow12_GLOBAL__N_113ExecutorStateINS_15PropagatorStateEE7ProcessENS2_10TaggedNodeEx + 3148
12  libtensorflow_framework.2.dylib     0x000000010a4dc530 _ZNSt3__110__function6__funcIZN10tensorflow12_GLOBAL__N_113ExecutorStateINS2_15PropagatorStateEE7RunTaskIZNS6_13ScheduleReadyEPN4absl12lts_2021110213InlinedVectorINS5_10TaggedNodeELm8ENS_9allocatorISB_EEEEPNS5_20TaggedNodeReadyQueueEEUlvE0_EEvOT_EUlvE_NSC_ISL_EEFvvEEclEv + 56
13  _pywrap_tensorflow_internal.so      0x000000029cdfc2a8 _ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi + 1496
14  _pywrap_tensorflow_internal.so      0x000000029cdfbba4 _ZZN10tensorflow6thread16EigenEnvironment12CreateThreadENSt3__18functionIFvvEEEENKUlvE_clEv + 80
15  libtensorflow_framework.2.dylib     0x000000010ac9419c _ZN10tensorflow12_GLOBAL__N_17PThread8ThreadFnEPv + 120
16  libsystem_pthread.dylib             0x000000018ef7026c _pthread_start + 148
17  libsystem_pthread.dylib             0x000000018ef6b08c thread_start + 8
*** END MANGLED STACK TRACE ***

*** Begin stack trace ***
	tensorflow::CurrentStackTrace()
	tensorflow::testing::StacktraceHandler(int, __siginfo*, void*)
	_sigtramp
	MPSSetResourcesOnCommandEncoder
	MPSSetResourcesOnCommandEncoder
	MPSSetResourcesOnCommandEncoder
	invocation function for block in double dispatchOneKernel<MPSNDArrayIdentity>(MetalStream*, MPSNDArrayIdentity*, NSArray*, MPSNDArray*, char const*, MPSKernelDAGObject*)
	_dispatch_client_callout
	_dispatch_lane_barrier_sync_invoke_and_complete
	metal_plugin::MPSApplyMomentumOp<float>::Compute(metal_plugin::OpKernelContext*)
	void metal_plugin::ComputeOpKernel<metal_plugin::MPSApplyMomentumOp<float> >(void*, TF_OpKernelContext*)
	tensorflow::PluggableDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*)
	tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::Process(tensorflow::PropagatorState::TaggedNode, long long)
	std::__1::__function::__func<void tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::RunTask<tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::lts_20211102::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::__1::allocator<tensorflow::PropagatorState::TaggedNode> >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::'lambda0'()>(tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::lts_20211102::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::__1::allocator<tensorflow::PropagatorState::TaggedNode> >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::'lambda0'()&&)::'lambda'(), std::__1::allocator<void tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::RunTask<tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::lts_20211102::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::__1::allocator<tensorflow::PropagatorState::TaggedNode> >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::'lambda0'()>(tensorflow::(anonymous namespace)::ExecutorState<tensorflow::PropagatorState>::ScheduleReady(absl::lts_20211102::InlinedVector<tensorflow::PropagatorState::TaggedNode, 8ul, std::__1::allocator<tensorflow::PropagatorState::TaggedNode> >*, tensorflow::PropagatorState::TaggedNodeReadyQueue*)::'lambda0'()&&)::'lambda'()>, void ()>::operator()()
	Eigen::ThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int)
	tensorflow::thread::EigenEnvironment::CreateThread(std::__1::function<void ()>)::'lambda'()::operator()() const
	tensorflow::(anonymous namespace)::PThread::ThreadFn(void*)
	_pthread_start
	thread_start
*** End stack trace ***

systemMemory: 32.00 GB
maxCacheSize: 10.67 GB
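For anyone triaging similar reports without Xcode tooling, the mangled trace above can be parsed mechanically to pull out the module and symbol of each frame (frame 8, `metal_plugin::MPSApplyMomentumOp<float>::Compute`, is the faulting op kernel here). A minimal stdlib-only sketch; the regex assumes the `index  module  address  symbol + offset` layout shown above:

```python
import re

# One frame per line: "8   libmetal_plugin.dylib   0x...9c4 _ZN12... + 3564"
FRAME_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.+?)\s\+\s(\d+)\s*$"
)

def parse_frames(trace: str):
    """Extract (index, module, mangled_symbol) tuples from a mangled
    macOS stack trace in the layout used by the TensorFlow handler."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(4)))
    return frames

if __name__ == "__main__":
    sample = (
        "2   MPSNDArray                          0x000000019969dd38 "
        "MPSSetResourcesOnCommandEncoder + 9684\n"
        "8   libmetal_plugin.dylib               0x000000017fc349c4 "
        "_ZN12metal_plugin18MPSApplyMomentumOpIfE7ComputeEPNS_15OpKernelContextE + 3564\n"
    )
    for idx, module, symbol in parse_frames(sample):
        print(idx, module, symbol)
```

The mangled names (`_ZN...`) can then be demangled with `c++filt` or an equivalent tool to recover the readable trace shown below the mangled one.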

@MMathisLab MMathisLab marked this pull request as draft August 30, 2022 15:04
@MMathisLab MMathisLab requested a review from jeylau September 13, 2022 21:34
@MMathisLab MMathisLab marked this pull request as ready for review October 6, 2022 09:57
@MMathisLab
Member Author

Closed via #2022, which is a better solution.

@MMathisLab MMathisLab closed this Nov 4, 2022
@MMathisLab MMathisLab deleted the mwm/apple-m1 branch November 4, 2022 14:54