{"id":205319,"date":"2026-05-27T10:00:09","date_gmt":"2026-05-27T14:00:09","guid":{"rendered":"https:\/\/www.kdnuggets.com\/?p=205319"},"modified":"2026-05-25T19:21:47","modified_gmt":"2026-05-25T23:21:47","slug":"pandas-groupby-explained-with-examples","status":"publish","type":"post","link":"https:\/\/www.kdnuggets.com\/pandas-groupby-explained-with-examples","title":{"rendered":"Pandas GroupBy Explained With Examples"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/kdn-pandas-groupby-explained-with-examples.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Introduction<\/h2>\n<p>&nbsp;<br \/>\n<strong><a href=\"https:\/\/pandas.pydata.org\/\" target=\"_blank\">Pandas<\/a><\/strong> is one of the most popular Python libraries for data analysis. It gives you simple tools for cleaning, reshaping, summarizing, and exploring structured data. One of the most useful features in pandas is <strong>GroupBy<\/strong>. 
It helps you answer questions that require grouping rows by one or more categories.<\/p>\n<p>For example, if you are working with sales data, you may want to calculate total revenue by region, average order value by product category, or the number of orders handled by each sales representative. Instead of manually filtering each category one by one, GroupBy lets you perform these calculations in a clean and efficient way.<\/p>\n<p>In this tutorial, we will walk through practical examples of using Pandas GroupBy with a small sales dataset. 
I am using <strong><a href=\"https:\/\/deepnote.com\/\" target=\"_blank\">Deepnote<\/a><\/strong> as the coding environment, so some outputs are shown as notebook screenshots directly under the code blocks.<\/p>\n<p>&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Creating a Sample Dataset<\/h2>\n<p>&nbsp;<br \/>\nBefore using GroupBy, we first create a small retail sales dataset with columns such as <code style=\"background: #F5F5F5;\">order_id<\/code>, <code style=\"background: #F5F5F5;\">region<\/code>, <code style=\"background: #F5F5F5;\">category<\/code>, <code style=\"background: #F5F5F5;\">sales_rep<\/code>, <code style=\"background: #F5F5F5;\">units<\/code>, <code style=\"background: #F5F5F5;\">unit_price<\/code>, <code style=\"background: #F5F5F5;\">discount<\/code>, and <code style=\"background: #F5F5F5;\">order_date<\/code>. We then convert the dictionary into a pandas <code style=\"background: #F5F5F5;\">DataFrame<\/code> and create two new columns: <code style=\"background: #F5F5F5;\">gross_sales<\/code> and <code style=\"background: #F5F5F5;\">net_sales<\/code>.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>import pandas as pd\r\n\r\ndata = {\r\n    \"order_id\": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],\r\n    \"region\": [\"North\", \"South\", \"North\", \"West\", \"South\", \"West\", \"North\", \"South\", \"West\", \"North\", \"South\", \"West\"],\r\n    \"category\": [\"Electronics\", \"Furniture\", \"Electronics\", \"Furniture\", \"Clothing\", \"Electronics\",\r\n                 \"Clothing\", \"Furniture\", \"Clothing\", \"Furniture\", \"Electronics\", \"Clothing\"],\r\n    \"sales_rep\": [\"Ayesha\", \"Bilal\", \"Ayesha\", \"Chen\", \"Bilal\", \"Chen\",\r\n                  \"Ayesha\", \"Bilal\", \"Chen\", \"Ayesha\", \"Bilal\", \"Chen\"],\r\n    \"units\": [2, 1, 3, 2, 5, 4, 6, 2, 7, 1, 2, 8],\r\n    \"unit_price\": [500, 800, 450, 700, 60, 550, 
55, 850, 65, 750, 520, 70],\r\n    \"discount\": [0.05, 0.10, 0.00, 0.08, 0.00, 0.12, 0.05, 0.10, 0.00, 0.07, 0.03, 0.00],\r\n    \"order_date\": pd.to_datetime([\r\n        \"2026-01-05\", \"2026-01-06\", \"2026-01-08\", \"2026-01-10\",\r\n        \"2026-01-12\", \"2026-01-15\", \"2026-02-02\", \"2026-02-05\",\r\n        \"2026-02-08\", \"2026-02-12\", \"2026-02-15\", \"2026-02-20\"\r\n    ])\r\n}\r\n\r\ndf = pd.DataFrame(data)\r\n\r\ndf[\"gross_sales\"] = df[\"units\"] * df[\"unit_price\"]\r\ndf[\"net_sales\"] = df[\"gross_sales\"] * (1 - df[\"discount\"])\r\n\r\ndf<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The <code style=\"background: #F5F5F5;\">gross_sales<\/code> column is calculated by multiplying <code style=\"background: #F5F5F5;\">units<\/code> by <code style=\"background: #F5F5F5;\">unit_price<\/code>, while <code style=\"background: #F5F5F5;\">net_sales<\/code> adjusts that value after applying the discount. This gives us a clean dataset that we can use for all GroupBy examples.<\/p>\n<p>&nbsp;<br \/>\n<img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_1.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Using the Basic GroupBy Syntax<\/h2>\n<p>&nbsp;<br \/>\nThe most basic GroupBy operation follows a simple pattern: select a grouping column, select the value column, and apply an aggregation function. In this example, we group the data by <code style=\"background: #F5F5F5;\">region<\/code> and calculate the total <code style=\"background: #F5F5F5;\">net_sales<\/code> for each region.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>df.groupby(\"region\")[\"net_sales\"].sum()<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The result shows that North, South, and West each have their own total sales value. 
This is the simplest and most common use case for GroupBy when summarizing data.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>region\r\nNorth    3311.0\r\nSouth    3558.8\r\nWest     4239.0\r\nName: net_sales, dtype: float64<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Using GroupBy With <code>as_index=False<\/code><\/h2>\n<p>&nbsp;<br \/>\nBy default, pandas uses the grouped column as the index in the output. While this is useful in some cases, it is often easier to work with a normal <code style=\"background: #F5F5F5;\">DataFrame<\/code> where the grouped column remains a regular column. That is where <code style=\"background: #F5F5F5;\">as_index=False<\/code> is useful.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>df.groupby(\"region\", as_index=False)[\"net_sales\"].sum()<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In this example, we again calculate total net sales by region, but the result is returned as a clean <code style=\"background: #F5F5F5;\">DataFrame<\/code>, which is easier to export, merge, or use in reports.<\/p>\n<p>&nbsp;<br \/>\n<img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_12.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Applying Multiple Aggregations on One Column<\/h2>\n<p>&nbsp;<br \/>\nGroupBy is not limited to a single calculation. 
You can apply multiple aggregation functions to the same column using <code style=\"background: #F5F5F5;\">agg()<\/code>.<\/p>\n<p>In this example, we calculate the sum, mean, minimum, maximum, and count of <code style=\"background: #F5F5F5;\">net_sales<\/code> for each region.<\/p>\n<p>This gives us a quick statistical summary of regional sales performance and helps us compare not only total revenue but also average order size and order volume.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>df.groupby(\"region\")[\"net_sales\"].agg([\"sum\", \"mean\", \"min\", \"max\", \"count\"])<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_2.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Using Named Aggregations<\/h2>\n<p>&nbsp;<br \/>\nNamed aggregations make GroupBy outputs easier to read and use. 
Instead of returning generic column names like <code style=\"background: #F5F5F5;\">sum<\/code> or <code style=\"background: #F5F5F5;\">mean<\/code>, we define our own names such as <code style=\"background: #F5F5F5;\">total_sales<\/code>, <code style=\"background: #F5F5F5;\">average_order_value<\/code>, <code style=\"background: #F5F5F5;\">total_units<\/code>, and <code style=\"background: #F5F5F5;\">number_of_orders<\/code>.<\/p>\n<p>This is especially helpful when preparing analysis for dashboards, reports, or tutorials because the output column names clearly explain what each metric represents.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>region_summary = (\r\n    df.groupby(\"region\", as_index=False)\r\n      .agg(\r\n          total_sales=(\"net_sales\", \"sum\"),\r\n          average_order_value=(\"net_sales\", \"mean\"),\r\n          total_units=(\"units\", \"sum\"),\r\n          number_of_orders=(\"order_id\", \"count\")\r\n      )\r\n)\r\n\r\nregion_summary<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_7.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Grouping by Multiple Columns<\/h2>\n<p>&nbsp;<br \/>\nYou can also group data by more than one column. In this example, we group by both <code style=\"background: #F5F5F5;\">region<\/code> and <code style=\"background: #F5F5F5;\">category<\/code> to calculate total net sales for each product category within each region.<\/p>\n<p>This gives us a more detailed view of the data compared to grouping by region alone. 
Multi-column grouping is useful when you want to analyze performance across different dimensions, such as region and product, department and employee, or month and customer segment.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>df.groupby([\"region\", \"category\"], as_index=False)[\"net_sales\"].sum()<\/code><\/pre>\n<\/div>\n<p>&nbsp;<br \/>\n<img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_11.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Sorting GroupBy Results<\/h2>\n<p>&nbsp;<br \/>\nAfter grouping and aggregating data, you often want to sort the results to find the highest or lowest values.<\/p>\n<p>In this example, we calculate total sales by product category and then sort the results in descending order.<\/p>\n<p>This makes it easy to identify which category generated the most revenue. 
Sorting grouped results is a simple but powerful step when turning raw summaries into useful insights.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>category_sales = (\r\n    df.groupby(\"category\", as_index=False)\r\n      .agg(total_sales=(\"net_sales\", \"sum\"))\r\n      .sort_values(\"total_sales\", ascending=False)\r\n)\r\n\r\ncategory_sales<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_6.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Understanding Count vs Size<\/h2>\n<p>&nbsp;<br \/>\nPandas provides both <code style=\"background: #F5F5F5;\">count()<\/code> and <code style=\"background: #F5F5F5;\">size()<\/code>, but they are not exactly the same. The <code style=\"background: #F5F5F5;\">size()<\/code> method counts the total number of rows in each group, including rows with missing values. The <code style=\"background: #F5F5F5;\">count()<\/code> method counts only non-missing values in a selected column.<\/p>\n<p>In this example, we intentionally add a missing value to the <code style=\"background: #F5F5F5;\">sales_rep<\/code> column. 
The output shows that <code style=\"background: #F5F5F5;\">size()<\/code> still counts four rows for each region, while <code style=\"background: #F5F5F5;\">count()<\/code> returns three for North because one <code style=\"background: #F5F5F5;\">sales_rep<\/code> value is missing.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>import numpy as np\r\n\r\ndf_missing = df.copy()\r\ndf_missing.loc[2, \"sales_rep\"] = np.nan\r\n\r\nprint(\"Using size():\")\r\ndisplay(df_missing.groupby(\"region\").size())\r\n\r\nprint(\"Using count() on sales_rep:\")\r\ndisplay(df_missing.groupby(\"region\")[\"sales_rep\"].count())<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Output:<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>Using size():\r\nregion\r\nNorth    4\r\nSouth    4\r\nWest     4\r\ndtype: int64\r\n\r\nUsing count() on sales_rep:\r\nregion\r\nNorth    3\r\nSouth    4\r\nWest     4\r\nName: sales_rep, dtype: int64<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Using <code>transform()<\/code> for Group-Level Features<\/h2>\n<p>&nbsp;<br \/>\nThe <code style=\"background: #F5F5F5;\">transform()<\/code> method is useful when you want to calculate a group-level value and add it back to the original <code style=\"background: #F5F5F5;\">DataFrame<\/code>.<\/p>\n<p>In this example, we calculate total sales for each region and store it in a new column called <code style=\"background: #F5F5F5;\">region_total_sales<\/code>.<\/p>\n<p>We then calculate each order's share of its region's total sales. 
Unlike <code style=\"background: #F5F5F5;\">agg()<\/code>, which reduces the data to one row per group, <code style=\"background: #F5F5F5;\">transform()<\/code> returns values aligned with the original rows, making it very useful for feature engineering.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>df[\"region_total_sales\"] = df.groupby(\"region\")[\"net_sales\"].transform(\"sum\")\r\ndf[\"order_share_of_region\"] = df[\"net_sales\"] \/ df[\"region_total_sales\"]\r\n\r\ndf[[\"order_id\", \"region\", \"net_sales\", \"region_total_sales\", \"order_share_of_region\"]]<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_5.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Filtering Groups With <code>filter()<\/code><\/h2>\n<p>&nbsp;<br \/>\nThe <code style=\"background: #F5F5F5;\">filter()<\/code> method lets you keep or remove entire groups based on a condition. In this example, we keep only the regions where total net sales are greater than 3,000. With this dataset, all three regions clear that threshold (their totals are 3,311.0, 3,558.8, and 4,239.0), so every row is kept; a threshold of 3,500 would drop the North rows.<\/p>\n<p>Instead of returning one summary row per group, <code style=\"background: #F5F5F5;\">filter()<\/code> returns the original rows from the groups that meet the condition. 
This is useful when you want to remove low-performing groups or keep only groups that satisfy a business rule.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>high_sales_regions = df.groupby(\"region\").filter(lambda group: group[\"net_sales\"].sum() > 3000)\r\n\r\nhigh_sales_regions<\/code><\/pre>\n<\/div>\n<p>&nbsp;<br \/>\n<img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_8.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Applying Custom Logic With <code>apply()<\/code><\/h2>\n<p>&nbsp;<br \/>\nThe <code style=\"background: #F5F5F5;\">apply()<\/code> method gives you more flexibility because it allows you to run custom logic on each group.<\/p>\n<p>In this example, we use <code style=\"background: #F5F5F5;\">apply()<\/code> with <code style=\"background: #F5F5F5;\">nlargest()<\/code> to find the top order by net sales in each region. 
This is useful when built-in aggregation functions are not enough for your analysis.<\/p>\n<p>However, <code style=\"background: #F5F5F5;\">apply()<\/code> can be slower than built-in methods like <code style=\"background: #F5F5F5;\">sum()<\/code>, <code style=\"background: #F5F5F5;\">mean()<\/code>, <code style=\"background: #F5F5F5;\">agg()<\/code>, and <code style=\"background: #F5F5F5;\">transform()<\/code>, so it is best to use it only when you need custom group-wise operations.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>top_order_by_region = (\r\n    df.groupby(\"region\", group_keys=False)\r\n      .apply(lambda group: group.nlargest(1, \"net_sales\"))\r\n)\r\n\r\ntop_order_by_region<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_4.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Grouping by Dates<\/h2>\n<p>&nbsp;<br \/>\nGroupBy is also very useful for time-based analysis.<\/p>\n<p>In this example, we extract the month from the <code style=\"background: #F5F5F5;\">order_date<\/code> column and group the data by month.<\/p>\n<p>We then calculate total sales and total orders for each month. 
This approach is helpful when analyzing trends over time, such as monthly sales, weekly user activity, or yearly revenue growth.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code># Convert each order date to a year-month label such as \"2026-01\"\r\ndf[\"month\"] = df[\"order_date\"].dt.to_period(\"M\").astype(str)\r\n\r\nmonthly_sales = (\r\n    df.groupby(\"month\", as_index=False)\r\n      .agg(total_sales=(\"net_sales\", \"sum\"), total_orders=(\"order_id\", \"count\"))\r\n)\r\n\r\nmonthly_sales<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_10.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Grouping by Dates With <code>pd.Grouper<\/code><\/h2>\n<p>&nbsp;<br \/>\n<code style=\"background: #F5F5F5;\">pd.Grouper<\/code> provides a cleaner way to group time series data without manually creating a separate month column.<\/p>\n<p>In this example, we group the <code style=\"background: #F5F5F5;\">DataFrame<\/code> by <code style=\"background: #F5F5F5;\">order_date<\/code> using a monthly frequency and calculate total sales and total orders.<\/p>\n<p>This is especially useful when working with real-world datasets that contain timestamps and you want to summarize data by day, week, month, quarter, or year.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code># freq=\"M\" groups by calendar month, labelled by the month-end date.\r\n# pandas 2.2+ deprecates the \"M\" alias for this use; prefer freq=\"ME\" there.\r\nmonthly_sales_grouper = (\r\n    df.groupby(pd.Grouper(key=\"order_date\", freq=\"M\"))\r\n      .agg(total_sales=(\"net_sales\", \"sum\"), total_orders=(\"order_id\", \"count\"))\r\n      .reset_index()\r\n)\r\n\r\nmonthly_sales_grouper<\/code><\/pre>\n<\/div>\n<p>&nbsp;<br \/>\n<img decoding=\"async\" 
src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_13.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Creating a Pivot-Style Summary With GroupBy<\/h2>\n<p>&nbsp;<br \/>\nYou can combine <code style=\"background: #F5F5F5;\">groupby()<\/code> with <code style=\"background: #F5F5F5;\">unstack()<\/code> to create a pivot-style summary table.<\/p>\n<p>In this example, we group the data by <code style=\"background: #F5F5F5;\">region<\/code> and <code style=\"background: #F5F5F5;\">category<\/code>, calculate total net sales, and then reshape the result so that categories become columns. This makes the output easier to compare across regions and categories. It is a great technique when you want a compact table for reporting or quick analysis.<\/p>\n<div style=\"width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;\">\n<pre><code>region_category_table = (\r\n    df.groupby([\"region\", \"category\"])[\"net_sales\"]\r\n      .sum()\r\n      .unstack(fill_value=0)\r\n)\r\n\r\nregion_category_table<\/code><\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.kdnuggets.com\/wp-content\/uploads\/awan_pandas_groupby_explained_examples_3.png\" alt=\"Pandas GroupBy Explained With Examples\" width=\"100%\" \/><br \/>\n&nbsp;<\/p>\n<h2><font color=\"#f3ac35\">#&nbsp;<\/font>Conclusion<\/h2>\n<p>&nbsp;<br \/>\nPandas GroupBy is one of the most powerful tools for data analysis in Python. It helps you summarize data, compare groups, create new features, filter results, and apply custom calculations without writing unnecessary manual logic.<\/p>\n<p>While working on this tutorial, I realized how much depth there is in GroupBy. Even after working with data for years, I learned new and better ways to solve common problems. 
Features like <code style=\"background: #F5F5F5;\">pd.Grouper<\/code>, custom aggregation functions, and <code style=\"background: #F5F5F5;\">transform()<\/code> stood out because they make many tasks faster, cleaner, and easier to maintain.<\/p>\n<p>This is also why understanding the native tools matters. It is tempting to rely on vibe coding or quick custom solutions, but those can often produce slower, more complicated code. When you know what pandas already provides, you can write solutions that are more efficient, reusable, and practical for real-world data analysis.<\/p>\n<p>In this tutorial, we covered the most useful GroupBy operations, including basic aggregation, named aggregation, multi-column grouping, sorting, <code style=\"background: #F5F5F5;\">count()<\/code> vs <code style=\"background: #F5F5F5;\">size()<\/code>, <code style=\"background: #F5F5F5;\">transform()<\/code>, <code style=\"background: #F5F5F5;\">filter()<\/code>, <code style=\"background: #F5F5F5;\">apply()<\/code>, date grouping, and pivot-style summaries. 
Once you understand these patterns, you can use GroupBy to answer many real-world data analysis questions quickly and confidently.<br \/>\n&nbsp;<br \/>\n&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"Learn how to use Pandas GroupBy to summarize, compare, and analyze grouped data with simple, practical examples.\n","protected":false},"author":207,"featured_media":205400,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","_seopress_robots_follow":"","_seopress_robots_imageindex":"","_seopress_robots_snippet":"","_seopress_robots_primary_cat":"","_seopress_robots_breadcrumbs":"","_seopress_robots_freeze_modified_date":"","_seopress_robots_custom_modified_date":"","_seopress_robots_canonical":"","_seopress_social_fb_title":"","_seopress_social_fb_desc":"","_seopress_social_fb_img":"","_seopress_social_fb_img_attachment_id":0,"_seopress_social_fb_img_width":0,"_seopress_social_fb_img_height":0,"_seopress_social_twitter_title":"","_seopress_social_twitter_desc":"","_seopress_social_twitter_img":"","_seopress_social_twitter_img_attachment_id":0,"_seopress_social_twitter_img_width":0,"_seopress_social_twitter_img_height":0,"_seopress_redirections_value":"","_seopress_redirections_enabled":"","_seopress_redirections_enabled_regex":"","_seopress_redirections_logged_status":"","_seopress_redirections_param":"","_seopress_redirections_type":0,"_seopress_analysis_target_kw":"","inline_featured_image":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"mc4wp_mailchimp_campaign":[],"footnotes":"","_links_to":"","_links_to_target":""},"categories":[5286],"tags":[166],"class_list":["post-205319","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kdnuggets-originals"
,"tag-data-science"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts\/205319","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/users\/207"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/comments?post=205319"}],"version-history":[{"count":3,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts\/205319\/revisions"}],"predecessor-version":[{"id":205398,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts\/205319\/revisions\/205398"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/media\/205400"}],"wp:attachment":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/media?parent=205319"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/categories?post=205319"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/tags?post=205319"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}