matmulTN fails when the two parameters are the same array (Intel/OpenCL) #1711

@unbornchikken

Original issue: arrayfire/arrayfire-python#114 (comment)

clinfo:

Number of platforms:				 2
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 1.2 
  Platform Name:				 Intel(R) OpenCL
  Platform Vendor:				 Intel(R) Corporation
  Platform Extensions:				 cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 
  Platform Name:				 Experimental OpenCL 2.0 CPU Only Platform
  Platform Vendor:				 Intel(R) Corporation
  Platform Extensions:				 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_fp64 cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing


  Platform Name:				 Intel(R) OpenCL
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Device ID:					 32902
  Max compute units:				 20
  Max work items dimensions:			 3
    Max work items[0]:				 512
    Max work items[1]:				 512
    Max work items[2]:				 512
  Max work group size:				 512
  Preferred vector width char:			 1
  Preferred vector width short:			 1
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 0
  Max clock frequency:				 1200Mhz
  Address bits:					 14757395255531667520
  Max memory allocation:			 390280806
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 128
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 262144
  Global memory size:				 1561123226
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Error correction support:			 0
  Profiling timer resolution:			 80
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 01020620
  Name:						 Intel(R) HD Graphics 4600
  Vendor:					 Intel(R) Corporation
  Driver version:				 20.19.15.4531
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 
  Extensions:					 cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_ctz cl_intel_d3d11_nv12_media_sharing cl_intel_dx9_media_sharing cl_intel_motion_estimation cl_intel_simultaneous_sharing cl_intel_subgroups cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_gl_sharing cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir 


  Device Type:					 CL_DEVICE_TYPE_CPU
  Device ID:					 32902
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 8192
    Max work items[1]:				 8192
    Max work items[2]:				 8192
  Max work group size:				 8192
  Preferred vector width char:			 1
  Preferred vector width short:			 1
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Max clock frequency:				 3600Mhz
  Address bits:					 14757395255531667488
  Max memory allocation:			 536838144
  Image support:				 Yes
  Max number of images read arguments:		 480
  Max number of images write arguments:		 480
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 480
  Max size of kernel argument:			 3840
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 No
    Round to +ve and infinity:			 No
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 262144
  Global memory size:				 2147352576
  Constant buffer size:				 131072
  Max number of constant args:			 480
  Local memory type:				 Global
  Local memory size:				 32768
  Error correction support:			 0
  Profiling timer resolution:			 285
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 01020620
  Name:						 Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Vendor:					 Intel(R) Corporation
  Driver version:				 5.2.0.10094
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 (Build 10094)
  Extensions:					 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 


  Platform Name:				 Experimental OpenCL 2.0 CPU Only Platform
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_CPU
  Device ID:					 32902
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 8192
    Max work items[1]:				 8192
    Max work items[2]:				 8192
  Max work group size:				 8192
  Preferred vector width char:			 1
  Preferred vector width short:			 1
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Max clock frequency:				 3600Mhz
  Address bits:					 14757395255531667488
  Max memory allocation:			 536838144
  Image support:				 Yes
  Max number of images read arguments:		 480
  Max number of images write arguments:		 480
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 480
  Max size of kernel argument:			 3840
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 No
    Round to +ve and infinity:			 No
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 262144
  Global memory size:				 2147352576
  Constant buffer size:				 131072
  Max number of constant args:			 480
  Local memory type:				 Global
  Local memory size:				 32768
  Error correction support:			 0
  Profiling timer resolution:			 285
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 01058E78
  Name:						 Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
  Vendor:					 Intel(R) Corporation
  Driver version:				 6.0.0.1049
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 (Build 162)
  Extensions:					 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_fp64 cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing 

ArrayFire Python is using the default backend and device.

Error:

ERROR:root:Traceback (most recent call last):
  File "C:\git\arrayfire-python\arrayfire\tests\simple\_util.py", line 32, in run
    test(verbose)
  File "C:\git\arrayfire-python\arrayfire\tests\simple\lapack.py", line 42, in simple_lapack
    a = af.matmulTN(a, a) + 10 * af.identity(5,5)
  File "C:\git\arrayfire-python\arrayfire\blas.py", line 88, in matmulTN
    MATPROP.TRANS.value, MATPROP.NONE.value))
  File "C:\git\arrayfire-python\arrayfire\util.py", line 79, in safe_call
    raise RuntimeError(to_str(err_str))
RuntimeError: In function class std::shared_ptr<float> __cdecl opencl::Array<float>::getMappedPtr(void) const
In file src\backend\opencl\Array.hpp:260
OpenCL Error (-59): Invalid Operation when calling clEnqueueMapBuffer
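
The failing line from the traceback can be isolated into a small script. The sketch below is not taken from the ArrayFire test suite: `build_test_matrix` and its `copy_operand` flag are hypothetical names introduced here for illustration. It only uses calls that appear in the traceback (`af.matmulTN`, `af.identity`) plus `af.randu`, `af.info`, and `Array.copy()`. Passing `a.copy()` as the second operand is an untested assumption about a possible workaround; the report does not confirm that the error comes from both operands sharing one buffer.

```python
def build_test_matrix(af, n=5, copy_operand=True):
    """Return A^T A + 10 I for a random n x n matrix A.

    `af` is the imported arrayfire module, passed in as a parameter.
    With copy_operand=False both arguments of matmulTN are the same array
    object, as in the failing line of lapack.py.
    """
    a = af.randu(n, n)
    # Assumption: a distinct copy gives matmulTN two separate buffers.
    rhs = a.copy() if copy_operand else a
    return af.matmulTN(a, rhs) + 10 * af.identity(n, n)


if __name__ == "__main__":
    try:
        import arrayfire as af
    except Exception as err:  # missing package or missing native libraries
        af = None
        print("arrayfire could not be loaded: %s" % err)

    if af is not None:
        af.info()  # prints the active backend and device
        for copy_operand in (False, True):
            try:
                build_test_matrix(af, copy_operand=copy_operand)
                print("copy_operand=%s: ok" % copy_operand)
            except RuntimeError as err:
                print("copy_operand=%s: failed: %s" % (copy_operand, err))
```

If the `copy_operand=False` run fails with the `clEnqueueMapBuffer` error while the `copy_operand=True` run succeeds, that would point at the aliased operands rather than at `matmulTN` in general.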
