matmulTN fails when the two parameters are the same array (Intel/OpenCL) #1711
Closed
Description
Original issue: arrayfire/arrayfire-python#114 (comment)
clinfo:
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0
Platform Name: Experimental OpenCL 2.0 CPU Only Platform
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_fp64 cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
Platform Name: Intel(R) OpenCL
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 20
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 512
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Max clock frequency: 1200Mhz
Address bits: 14757395255531667520
Max memory allocation: 390280806
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 128
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 1561123226
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Error correction support: 0
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 01020620
Name: Intel(R) HD Graphics 4600
Vendor: Intel(R) Corporation
Driver version: 20.19.15.4531
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 8192
Max work items[1]: 8192
Max work items[2]: 8192
Max work group size: 8192
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 3600Mhz
Address bits: 14757395255531667488
Max memory allocation: 536838144
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 2147352576
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 285
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 01020620
Name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Vendor: Intel(R) Corporation
Driver version: 5.2.0.10094
Profile: FULL_PROFILE
Version: OpenCL 1.2 (Build 10094)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64
Platform Name: Experimental OpenCL 2.0 CPU Only Platform
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 8192
Max work items[1]: 8192
Max work items[2]: 8192
Max work group size: 8192
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 3600Mhz
Address bits: 14757395255531667488
Max memory allocation: 536838144
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 2147352576
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 285
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 01058E78
Name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Vendor: Intel(R) Corporation
Driver version: 6.0.0.1049
Profile: FULL_PROFILE
Version: OpenCL 2.0 (Build 162)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_fp64 cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
ArrayFire Python was using its default backend and device.
Error:
ERROR:root:Traceback (most recent call last):
File "C:\git\arrayfire-python\arrayfire\tests\simple\_util.py", line 32, in run
test(verbose)
File "C:\git\arrayfire-python\arrayfire\tests\simple\lapack.py", line 42, in simple_lapack
a = af.matmulTN(a, a) + 10 * af.identity(5,5)
File "C:\git\arrayfire-python\arrayfire\blas.py", line 88, in matmulTN
MATPROP.TRANS.value, MATPROP.NONE.value))
File "C:\git\arrayfire-python\arrayfire\util.py", line 79, in safe_call
raise RuntimeError(to_str(err_str))
RuntimeError: In function class std::shared_ptr<float> __cdecl opencl::Array<float>::getMappedPtr(void) const
In file src\backend\opencl\Array.hpp:260
OpenCL Error (-59): Invalid Operation when calling clEnqueueMapBuffer
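For reference, the failing line in the traceback asks for `transpose(a) * a + 10 * I`. Below is a minimal pure-Python sketch of that expected result, which may be useful for checking the output on a backend where the call succeeds. It does not use ArrayFire at all; the helper names `matmul_tn` and `shifted_gram` are hypothetical and only mirror what `af.matmulTN(a, a) + 10 * af.identity(n, n)` is meant to compute.

```python
def matmul_tn(lhs, rhs):
    """Return transpose(lhs) * rhs for matrices stored as lists of rows.

    Hypothetical stand-in for af.matmulTN, not the ArrayFire implementation.
    """
    rows = len(lhs)
    if len(rhs) != rows:
        raise ValueError("lhs and rhs must have the same number of rows")
    lhs_cols = len(lhs[0])
    rhs_cols = len(rhs[0])
    # Entry (i, j) is the dot product of column i of lhs and column j of rhs.
    return [
        [sum(lhs[k][i] * rhs[k][j] for k in range(rows)) for j in range(rhs_cols)]
        for i in range(lhs_cols)
    ]


def shifted_gram(a, shift=10.0):
    """Return transpose(a) * a + shift * identity, passing the same matrix twice."""
    gram = matmul_tn(a, a)  # same object for both operands, as in the report
    n = len(gram)
    return [
        [gram[i][j] + (shift if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    sample = [[1.0, 2.0], [3.0, 4.0]]
    for row in shifted_gram(sample):
        print(row)
```

Passing the same matrix as both operands is mathematically unremarkable here, so the failure reported above looks specific to how the backend handles the shared buffer rather than to the operation itself.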