Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
Description
If you attempt to run
df = df.replace(float('nan'), somethingToReplaceWith)
it will replace all 0s in columns of integer type.
Example code snippet to repro this:
from pyspark.sql import SQLContext

spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
Here's the output I get when I run this code:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
Using Python version 3.7.5 (default, Nov 1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    0|
|    2|    3|
|    3|    0|
+-----+-----+
>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|    1|    5|
|    2|    3|
|    3|    5|
+-----+-----+
>>>
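
A possible workaround, not part of the original report, is to restrict the replacement to floating-point columns via the subset parameter of DataFrame.replace, so that integer columns are never considered; the behavior above presumably comes from float('nan') being cast to the integer column type, where it becomes 0. A minimal sketch under those assumptions:

from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType, FloatType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))

# Collect only the floating-point columns; in this example DataFrame both
# columns are integral, so the list is empty and no replacement is applied.
float_cols = [f.name for f in df.schema.fields
              if isinstance(f.dataType, (DoubleType, FloatType))]

# Apply the NaN replacement only to those columns (assumed intent), leaving
# integer columns untouched instead of having their 0s rewritten.
if float_cols:
    df = df.replace(float('nan'), 5, subset=float_cols)

df.show()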