@dongjoon-hyun dongjoon-hyun commented Jul 29, 2024

What changes were proposed in this pull request?

This is a follow-up of #47083 to recover PySpark RDD tests.

Why are the changes needed?

PySpark Core tests should not fail when optional dependencies are missing.

BEFORE

$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
  File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/core/rdd.py", line 5376, in _test
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

AFTER

$ python/run-tests.py --python-executables python3 --modules pyspark-core
...
Tests passed in 189 seconds

Skipped tests in pyspark.tests.test_memory_profiler with python3:
    test_assert_vanilla_mode (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_assert_vanilla_mode) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_aggregate_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_aggregate_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_clear) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_cogroup_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_arrow) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_group_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_pandas) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_map_in_pandas_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_map_in_pandas_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_iterator_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_iterator_not_supported) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_pandas_udf_window (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_window) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_multiple_actions (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_multiple_actions) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_registered (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_registered) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler_udf_with_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_with_arrow) ... skipped 'Must have memory-profiler installed.'
    test_profilers_clear (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_profilers_clear) ... skipped 'Must have memory-profiler installed.'
    test_code_map (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_code_map) ... skipped 'Must have memory-profiler installed.'
    test_memory_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_memory_profiler) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_function_api (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_function_api) ... skipped 'Must have memory-profiler installed.'
    test_profile_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_udf) ... skipped 'Must have memory-profiler installed.'
    test_udf_line_profiler (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_udf_line_profiler) ... skipped 'Must have memory-profiler installed.'

Skipped tests in pyspark.tests.test_rdd with python3:
    test_take_on_jrdd_with_large_rows_should_not_cause_deadlock (pyspark.tests.test_rdd.RDDTests.test_take_on_jrdd_with_large_rows_should_not_cause_deadlock) ... skipped 'NumPy or Pandas not installed'

Skipped tests in pyspark.tests.test_serializers with python3:
    test_statcounter_array (pyspark.tests.test_serializers.NumPyTests.test_statcounter_array) ... skipped 'NumPy not installed'
    test_serialize (pyspark.tests.test_serializers.SciPyTests.test_serialize) ... skipped 'SciPy not installed'

Skipped tests in pyspark.tests.test_worker with python3:
    test_memory_limit (pyspark.tests.test_worker.WorkerMemoryTest.test_memory_limit) ... skipped "Memory limit feature in Python worker is dependent on Python's 'resource' module on Linux; however, not found or not on Linux."
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultNonDaemonTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'
    test_python_segfault (pyspark.tests.test_worker.WorkerSegfaultTest.test_python_segfault) ... skipped 'SPARK-46130: Flaky with Python 3.12'

Does this PR introduce any user-facing change?

No. The failure only happens during testing.

How was this patch tested?

Pass the CIs and run the tests manually without the optional dependencies installed.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

Thank you, @HyukjinKwon !

@HyukjinKwon (Member)

Merged to master.

@dongjoon-hyun (Member Author)

Thank you!

@dongjoon-hyun dongjoon-hyun deleted the SPARK-48710 branch July 30, 2024 01:26
@dongjoon-hyun (Member Author)

Oh, we are in a difficult situation.

Since this PR is a follow-up of SPARK-48710, we need to land at branch-3.5.
However, the original patch was merged about a month ago, and Apache Spark 3.5.2 RC4 is already finished. So we cannot land this as a follow-up; we need a new JIRA issue for this minor fix because its Fixed Version will be 3.5.3 instead of 3.5.2.

WDYT, @HyukjinKwon and @yaooqinn ?

@codesorcery (Contributor)

@dongjoon-hyun on branch-3.5 we only set an upper bound for numpy (see #47175). So this follow-up doesn't need to be (and can't be) applied there.

@dongjoon-hyun (Member Author)

> @dongjoon-hyun on branch-3.5 we only set an upper bound for numpy (see #47175). So this follow-up doesn't need to be (and can't be) applied there.

Oh, that's great. Thank you for the confirmation, @codesorcery .
