Spark / SPARK-21100

Add summary method as alternative to describe that gives quartiles similar to Pandas

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The DataFrame describe method should also include quartiles (25th, 50th, and 75th percentiles) like Pandas.

      Example pandas output:

      In [4]: df.describe()
      Out[4]:
             Unnamed: 0       displ         year         cyl         cty         hwy
      count  234.000000  234.000000   234.000000  234.000000  234.000000  234.000000
      mean   117.500000    3.471795  2003.500000    5.888889   16.858974   23.440171
      std     67.694165    1.291959     4.509646    1.611534    4.255946    5.954643
      min      1.000000    1.600000  1999.000000    4.000000    9.000000   12.000000
      25%     59.250000    2.400000  1999.000000    4.000000   14.000000   18.000000
      50%    117.500000    3.300000  2003.500000    6.000000   17.000000   24.000000
      75%    175.750000    4.600000  2008.000000    8.000000   19.000000   27.000000
      max    234.000000    7.000000  2008.000000    8.000000   35.000000   44.000000
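
      To make the requested rows concrete, here is a minimal standard-library sketch of how describe-style statistics plus quartiles could be computed for one numeric column. It is not Spark's or pandas' implementation. `percentile` and `describe_with_quartiles` are hypothetical helper names, and the linear interpolation between neighbouring ranks is an assumption chosen because it reproduces the quartiles shown above. The input also assumes that the "Unnamed: 0" column is simply the row index 1..234, which is consistent with its count, min, max, and mean.

      ```python
      import math
      import statistics


      def percentile(sorted_values, q):
          """Linearly interpolated percentile for q in [0, 1] over pre-sorted data."""
          position = (len(sorted_values) - 1) * q
          lower = math.floor(position)
          upper = math.ceil(position)
          fraction = position - lower
          return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


      def describe_with_quartiles(values):
          """Hypothetical helper: describe-style statistics plus quartiles for one column.

          Requires at least two values, because the sample standard deviation is undefined otherwise.
          """
          data = sorted(values)
          return {
              "count": len(data),
              "mean": statistics.mean(data),
              "std": statistics.stdev(data),  # sample standard deviation (n - 1)
              "min": data[0],
              "25%": percentile(data, 0.25),
              "50%": percentile(data, 0.50),
              "75%": percentile(data, 0.75),
              "max": data[-1],
          }


      # Assumption: the "Unnamed: 0" column in the example is the row index 1..234.
      row_index = range(1, 235)
      for name, value in describe_with_quartiles(row_index).items():
          print(f"{name:<6}{value:>12.6f}")
      ```

      Under that assumption the sketch yields 59.25, 117.5, and 175.75 for the 25%, 50%, and 75% rows, matching the "Unnamed: 0" column above. An engine working on distributed data may prefer approximate percentiles, so exact agreement with pandas is not guaranteed in general.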
      

            People

              Assignee: a1ray Andrew Ray
              Reporter: a1ray Andrew Ray
              Votes: 0
              Watchers: 3
