ValueError: buffer source array is read-only in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__ with ray #21685

@joshua-cogliati-inl

Describe the bug

When we upgraded from scikit-learn 0.24.2 to 1.0.1, we started getting this error when using scikit-learn with ray:

(pid=18644) ValueError: buffer source array is read-only
Traceback (most recent call last):
  File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 313, in <module>
    raven()
  File "/Users/fred/raven/opensource/raven/framework/Driver.py", line 266, in raven
    simulation.run()
  File "/Users/fred/raven/opensource/raven/framework/Simulation.py", line 764, in run
    stepInstance.takeAstep(stepInputDict)
  File "/Users/fred/raven/opensource/raven/framework/Steps/Step.py", line 326, in takeAstep
    self._localTakeAstepRun(inDictionary)
  File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 179, in _localTakeAstepRun
    myLambda([finishedJob,outputs[outIndex]])
  File "/Users/fred/raven/opensource/raven/framework/Steps/MultiRun.py", line 109, in <lambda>
    self._outputCollectionLambda.append( (lambda x: inDictionary['Model'].collectOutput(x[0],x[1]), outIndex) )
  File "/Users/fred/raven/opensource/raven/framework/Models/Dummy.py", line 219, in collectOutput
    result = finishedJob.getEvaluation()
  File "/Users/fred/raven/opensource/raven/framework/Runners/InternalRunner.py", line 97, in getEvaluation
    self._collectRunnerResponse()
  File "/Users/fred/raven/opensource/raven/framework/Runners/DistributedMemoryRunner.py", line 83, in _collectRunnerResponse
    self.runReturn = ray.get(self.thread) if im.isLibAvail("ray") else self.thread()
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/worker.py", line 1564, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError: ray::evaluateSample() (pid=18644, ip=192.168.0.102)
  File "python/ray/_raylet.pyx", line 493, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 514, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 384, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: buffer source array is read-only
traceback: Traceback (most recent call last):
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 251, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 189, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 167, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "/Users/fred/miniconda3/envs/raven_libraries_tf26set/lib/python3.9/site-packages/ray/serialization.py", line 155, in _deserialize_pickle5_data
    obj = pickle.loads(in_band, buffers=buffers)
  File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only

The relevant code in scikit-learn is:

    def __setstate__(self, state):
        """
        set state for pickling
        """
        self.p = state[0]
        self.vec = state[1]  # line 223, where the traceback points
        self.mat = state[2]
        if self.__class__.__name__ == "PyFuncDistance":
            self.func = state[3]
            self.kwargs = state[4]
        self.size = self.vec.shape[0]
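
The traceback shows ray calling `pickle.loads(in_band, buffers=buffers)`, i.e. pickle protocol 5 with out-of-band buffers. As a minimal sketch of how such a load can hand `__setstate__` a read-only array, the snippet below round-trips a plain NumPy array and replaces the out-of-band buffers with immutable `bytes`. That replacement is only an assumption standing in for ray's object store, and the names `vec`, `payload`, and `readonly_buffers` are illustrative, not taken from ray or scikit-learn.

```python
import pickle

import numpy as np

# A small writeable array, standing in for DistanceMetric's `vec` state.
vec = np.arange(3, dtype=np.float64)

# Protocol 5 lets the array data travel out-of-band as PickleBuffer objects.
buffers = []
payload = pickle.dumps(vec, protocol=5, buffer_callback=buffers.append)

# Assumption: model a shared, immutable store by copying each buffer to bytes.
readonly_buffers = [buf.raw().tobytes() for buf in buffers]

# The array is rebuilt on top of the immutable bytes, so it is not writeable.
restored = pickle.loads(payload, buffers=readonly_buffers)
print("original writeable:", vec.flags.writeable)
print("restored writeable:", restored.flags.writeable)

# Writing to the restored array fails, much like a non-const memoryview would.
try:
    restored[0] = 1.0
except ValueError as exc:
    print("write failed:", exc)
```

If this matches what happens under ray, the assignment on line 223 would fail because the Cython memoryview for `self.vec` requests a writeable buffer from an array that is read-only.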

Steps/Code to Reproduce

I do not have a reduced case. If I have time, I will try to create one.
The failure occurs on this branch:
https://github.com/joshua-cogliati-inl/raven/tree/cogljj/update_libraries at commit
joshua-cogliati-inl/raven@8de1e24
when running tests/framework/InternalParallelTests/ROMscikit, which is designed to test the class:
https://github.com/joshua-cogliati-inl/raven/blob/cogljj/update_libraries/framework/SupervisedLearning/ScikitLearn/Neighbors/KNeighborsRegressor.py

In short, we use ray to distribute a sklearn.neighbors.KNeighborsRegressor, and deserializing it on the ray side raises the error above.

Expected Results

A scikit-learn DistanceMetric can be distributed to remote workers with ray and unpickled there without error.

Actual Results

  File "sklearn/neighbors/_dist_metrics.pyx", line 223, in sklearn.neighbors._dist_metrics.DistanceMetric.__setstate__
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
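
One way to sidestep a read-only buffer is to copy the array before handing it to code that needs write access. The helper below is a hypothetical sketch of that idea in plain NumPy; `ensure_writeable` is not a scikit-learn or ray function, and nothing here is claimed to be how scikit-learn addresses the problem.

```python
import numpy as np


def ensure_writeable(arr):
    """Return `arr` unchanged if it is writeable, otherwise a writeable copy."""
    arr = np.asarray(arr)
    return arr if arr.flags.writeable else arr.copy()


# np.frombuffer over immutable bytes yields a read-only array of three zeros.
frozen = np.frombuffer(bytes(24), dtype=np.float64)

# The copy owns its own memory, so writing to it is allowed.
thawed = ensure_writeable(frozen)
thawed[0] = 1.0

print("frozen writeable:", frozen.flags.writeable)
print("thawed writeable:", thawed.flags.writeable)
```

The copy costs extra memory, so it only makes sense for small state such as the vectors restored in `__setstate__`.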

Versions

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 20:33:18)  [Clang 11.1.0 ]
executable: /Users/fred/miniconda3/envs/raven_libraries_tf26set/bin/python
   machine: macOS-10.14.6-x86_64-i386-64bit

Python dependencies:
          pip: 21.3.1
   setuptools: 58.5.3
      sklearn: 1.0.1
        numpy: 1.19.5
        scipy: 1.7.1
       Cython: None
       pandas: 1.3.4
   matplotlib: 3.4.3
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True
