@dongjoon-hyun dongjoon-hyun commented Jul 29, 2024

What changes were proposed in this pull request?

This is a follow-up of #47083 to recover PySpark RDD tests.

Why are the changes needed?

PySpark Core tests should not fail when optional dependencies are missing.

BEFORE

$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
  File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/core/rdd.py", line 5376, in _test
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

AFTER

$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
Tests passed in 189 seconds

Skipped tests in pyspark.tests.test_memory_profiler with python3:
    test_assert_vanilla_mode (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_assert_vanilla_mode) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_aggregate_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_aggregate_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_clear) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_map_in_pandas_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_map_in_pandas_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_iterator_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_iterator_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_window (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_window) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_multiple_actions (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_multiple_actions) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_registered (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_registered) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_with_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_with_arrow) ... skipped 'Must have memory-profiler installed.'
    test_profilers_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_profilers_clear) ... skipped 'Must have memory-profiler installed.'
    test_code_map (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_code_map) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_memory_profiler) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_function_api (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_function_api) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_udf_line_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_udf_line_profiler) ... skipped 'Must have memory-profiler installed.'

Skipped tests in pyspark.tests.test_rdd with python3:
    test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock) ... skipped 'NumPy or Pandas not installed'

Skipped tests in pyspark.tests.test_serializers with python3:
    test_statcounter_array (pyspark.tests.test_serializers.NumPyTests.test_statcounter_array) ... skipped 'NumPy not installed'
    test_serialize (pyspark.tests.test_serializers.SciPyTests.test_serialize) ... skipped 'SciPy not installed'

Skipped tests in pyspark.tests.test_worker with python3:
    test_memory_limit (pyspark.tests.test_worker.WorkerMemoryTest.test_memory_limit) ... skipped "Memory limit feature in Python worker is dependent on Python's 'resource' module on Linux; however, not found or not on Linux."
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'

Does this PR introduce any user-facing change?

No. The failure only happens during testing.

How was this patch tested?

Pass the CIs and run the tests manually without the optional dependencies installed.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

Thank you, @HyukjinKwon !

@HyukjinKwon (Member)

Merged to master.

@dongjoon-hyun (Member Author)

Thank you!

@dongjoon-hyun dongjoon-hyun deleted the SPARK-48710 branch July 30, 2024 01:26
@dongjoon-hyun (Member Author)

Oh, we are in a difficult situation.

Since this PR is a follow-up of SPARK-48710, we need to land at branch-3.5.
However, the original patch was merged about a month ago, and Apache Spark 3.5.2 RC4 is already finished. So we cannot land this as a follow-up; we need a new JIRA issue for this minor fix because its Fixed Version will be 3.5.3 instead of 3.5.2.

WDYT, @HyukjinKwon and @yaooqinn ?

@codesorcery (Contributor)

@dongjoon-hyun on branch-3.5 we only set an upper bound for numpy (see #47175). So this follow-up doesn't need to be (and can't be) applied there.

@dongjoon-hyun (Member Author)

> @dongjoon-hyun on branch-3.5 we only set an upper bound for numpy (see #47175). So this follow-up doesn't need to be (and can't be) applied there.

Oh, that's great. Thank you for the confirmation, @codesorcery .
