[{"content":"This is Part 6 of the Deep Dive into Spark SQL Metrics series:\nPart 1: metric types, the complete reference, and what each metric means\nPart 2: internals, and how AQE uses metrics to make runtime decisions\nPart 3: the extension API, UI rendering, and the REST API\nPart 4: how Gluten extends the metrics system\nPart 5: Gluten metrics internals (node mapping, pipeline aggregation, MetricsUpdaterTree)\nPart 6 (this post): in action, reading Gluten\/Velox metrics operator by operator on TPC-DS q99\nThe first five parts built up the full body of knowledge: metric types, how metrics flow internally, the extension API, Gluten's architecture, and native-side aggregation. Now it is time to put that knowledge to work. We will open the Spark UI for a real query, walk through every metric operator by operator, and show how to read the whole story of a query's execution from its metrics.\nThe query: TPC-DS q99\nTPC-DS q99 is a 5-table join that analyzes shipping delays for catalog sales, grouped by warehouse, shipping mode, and call center. It buckets the shipping delay into ranges (31\u201360 days, 61\u201390 days, 91\u2013120 days, more than 120 days) and then sorts by delay.\nWe ran this query on a cluster at SF10000 (10 TB of raw data), with Gluten\/Velox as the native execution backend and Delta Lake tables on cloud object storage underneath.\nThe query plan follows the classic star-schema pattern:\ncatalog_sales (fact table, 3.3 billion rows) \u2192 BroadcastHashJoin with date_dim \u2192 BroadcastHashJoin with ship_mode \u2192 BroadcastHashJoin with call_center \u2192 BroadcastHashJoin with warehouse \u2192 partial HashAggregate \u2192 Shuffle (hash partitioning) \u2192 AQE coalesce \u2192 final HashAggregate \u2192 TakeOrderedAndProject (top 100)\nEvery number in this post is real. Let's walk through the metrics operator by operator and see what they tell us.\nSection 1: The fact table scan (3.3 billion rows)\nThe first operator in the plan is ScanTransformer catalog_sales. Everything starts here, and the metrics here tell a compelling story.\nData volume\nnumber of raw input rows: 3,321,160,461 (3.3 billion rows)\nnumber of output rows: 2,837,474,310 (2.8 billion rows left after predicate pushdown eliminated some row groups)\nsize of files read: 910.9 GiB, across 2,605 files and 1,837 partitions\n
This is a massive scan. But look at how efficiently it was served.\nI\/O tiers: the caching story\nThis is where it gets interesting:\nstorage read bytes: 0.0 B (zero bytes came from remote storage)\nlocal ssd read bytes: 2.1 GiB (all data came from the local SSD cache)\nram read bytes: 0.0 B (no in-memory cache hits)\nnumber of cache read bytes: 2.1 GiB (exactly matches the local SSD figure)\nThink about what this means: the table occupies 910.9 GiB on disk, yet we only had to read 2.1 GiB of actual data. That is a 99.8% reduction (2.1 \/ 910.9 is about 0.2%). Two factors work together:\nColumnar storage efficiency: we read only the columns the query references, a small fraction of the table's columns\nRow-group-level predicate pushdown: Parquet\/Delta statistics let Velox skip entire row groups whose min\/max range does not match\nAnd all 2.1 GiB came from the local SSD cache; not a single byte had to cross the network to remote storage.\nRow group pruning\nnumber of skipped row groups: 2,757\nVelox checked the row group statistics and skipped 2,757 row groups entirely. The data in those row groups falls outside the predicate range (the date_dim filter restricts d_month_seq to a range, which translates into a range filter on cs_sold_date_sk). This is the main reason we read only 2.1 GiB out of 910.9 GiB of files.\nDynamic filters\nnumber of dynamic filters accepted: 9,380\nThis is a powerful optimization. During execution, the build side of each broadcast hash join produces a filter (a set of join key values or a Bloom filter), and these filters are pushed down to the scan operator at runtime. By applying 9,380 dynamic filters, the scan eliminated non-matching rows before the data ever reached the join operators.\nIf you read Part 4, you know that Gluten records dynamic filter acceptance at the scan level. This metric tells us that runtime filters were indeed generated and applied, and that they helped.\nTiming breakdown\ntime of scan and filter: 4.5 minutes (total across all tasks)\ntime of scan IO: 2.1 minutes (roughly half of the scan time went to I\/O)\ncpu wall time count: 969,724 (batches processed)\nThe scan processed nearly a million batches. Comparing I\/O time (2.1 minutes) with total scan time (4.5 minutes) tells us that about half the time went to I\/O and half to CPU work (decompression, predicate evaluation, column extraction).\nTask distribution\nLooking at the min\/median\/max distribution:\nBytes read per task: median 32 KiB, max 17.3 MiB (moderate skew in the data distribution)\nPeak memory per task: median 32 KiB, max 29.9 MiB\nMost tasks process very little data (the table is spread over 1,837 partitions), but some partitions are noticeably larger. This is common with real data; a perfectly uniform partition distribution is rare.\nSection 2: Four broadcast hash joins (probing 2.8 billion rows)\nAfter the scan, the plan chains four broadcast hash joins. Each one joins the fact table with a dimension table: date_dim, ship_mode, call_center, and warehouse. We walk through the date_dim join (node [18]) in detail first, then summarize the pattern shared by all four.\nBuild side (date_dim)\nnumber of hash build input rows: 855,925 (date_dim rows across all partitions after the broadcast)\ntime of hash build: 819 ms (fast, because the dimension table is small)\ntime of building hash table: 533 ms (the actual hash table construction)\nhash build peak memory bytes: 155.7 MiB total (68 KiB per task, which is tiny)\nThe build side is negligible. The filtered date_dim is small, and after the broadcast every task gets its own copy, so the 855,925 figure is a sum over tasks (365 rows per task, if the 2,345 filter-producing tasks reported below are the full task count). Building the hash table takes well under a second.\nProbe side\nNow the probe side, which is where the real work happens:\nnumber of hash probe input rows: 2,837,474,310 (all 2.8 billion rows from the scan)\ntime of hash probe: 31.4 seconds total\ntime of probing hash table: 4.1 seconds (the actual hash lookups)\ntime of preparing hash table probe: 3.7 seconds (deserializing the broadcast data)\ntime of converting rows to columns: 8.3 seconds (columnar format conversion)\nNote the breakdown: of the 31.4 seconds of total probe time, only 4.1 seconds went to actual hash lookups. The rest is overhead: deserialization, format conversion, and pipeline coordination. This is typical for a broadcast join: the lookup itself is fast (the hash table fits in L2\/L3 cache), but streaming 2.8 billion rows through the pipeline takes time.\nDynamic filters produced\nnumber of hash probe dynamic filters produced: 2,345\nEvery task processing this join generated one dynamic filter from the build-side key values. These 2,345 filters were pushed back to the scan operator and contribute to the 9,380 dynamic filters we saw earlier; that total is consistent with all four joins contributing (4 \u00d7 2,345 = 9,380).\nZero spill\nbytes written for spilling of hash build: 0.0 B\nbytes written for spilling of hash probe: 0.0 B\nPerfect. The dimension tables are small enough to fit entirely in memory, and no data was spilled to disk in any of the joins.\nOutput\nnumber of hash probe output rows: 2,837,474,310 (all 2.8 billion rows pass through)\nOutput data size: 65.7 GiB (the output grows because dimension table columns are added)\n
All rows pass because predicate pushdown and the dynamic filters already eliminated non-matching rows during the scan. The join itself only appends dimension attributes to each row.\nComparing the four joins\nHere is a side-by-side summary of the four broadcast hash joins:\nJoin | Build table | Build rows | Probe time | Post-project time\ndate_dim | date_dim | 855,925 | 31.4 s | 8.1 s\nship_mode | ship_mode | 46,900 | 1.6 min | 8.0 s\ncall_center | call_center | 126,630 | 2.4 min | 8.1 s\nwarehouse | warehouse | 58,625 | 2.8 min | 8.2 s\nNotice that probe time grows with each successive join. That is not because the later joins are slower; it is because wallNanos (the timer behind the probe time metric) includes time spent waiting on child operators. As explained in Part 4, each operator's wall clock time includes the time it waits for its children to supply data. The warehouse join (the outermost) includes all the time of the three joins below it and of the scan.\nPost-project time is very consistent (about 8 seconds), which makes sense: each join appends a few columns, project work is proportional to output size, and the output size is roughly the same at every stage (2.8 billion rows).\nSection 3: Aggregation (2.8 billion \u2192 2.37 million rows)\nAfter the four joins, the plan applies FlushableHashAggregateExecTransformer (node [10]) for partial aggregation. This is where the data volume drops sharply.\nnumber of output rows: 2,369,250 (about a 1,200x reduction from 2.8 billion input rows)\ntime of aggregation: 6.6 minutes\ntime of aggregate functions: 52.9 seconds (the actual SUM() computation)\ntime of preparing hash table probe: 5.4 minutes (mostly time waiting on child operators, the joins described above)\npeak memory bytes: 3.8 GiB total (max 3.6 MiB per task)\nnumber of spilled bytes: 0.0 B (no spill needed)\nnumber of output vectors: 585 (2.8 billion input rows produced only 585 output batches)\nThe 1,200x reduction tells us that the group-by keys (warehouse, shipping mode, call center, date bucket) have some cardinality but produce far fewer groups than there are input rows. The actual aggregate computation (SUM) took only 52.9 seconds; most of the 6.6 minutes of aggregation time is child-operator wait time propagated up from the scan and the joins.\nWholeStageCodegenTransformer\nThe entire native pipeline, from the scan through the four joins to the partial aggregation, runs inside a single WholeStageCodegenTransformer:\nduration: 26.2 minutes total (max 6.6 seconds per task)\nThis is the end-to-end time of the native execution pipeline. It covers everything discussed so far: scanning 3.3 billion rows, four broadcast hash joins over 2.8 billion rows, and aggregating down to 2.37 million rows, all executed in Velox's vectorized native engine.\nSection 4: Shuffle (hash-based and compact)\nAfter partial aggregation, the plan performs a shuffle through ColumnarExchange (node [7]), redistributing data by the group-by keys for the final aggregation.\nWrite side\nshuffle bytes written: 243.6 MiB (very small)\nshuffle write time: 327 ms\ntime to split: 33.7 seconds (hash partitioning into 512 partitions)\nshuffle wall time: 21.7 seconds\nshuffle bytes spilled: 0.0 B (fits entirely in memory)\npeak bytes allocated: 255.1 GiB total (max 446.5 MiB per task)\nAggregation reduced 2.8 billion rows to 2.37 million, and those rows serialize to just 243.6 MiB. Compared with the 910.9 GiB fact table we started from, aggregation makes the shuffle almost trivial.\nThe split time (33.7 seconds) is the time spent hash-partitioning the output into 512 buckets. The actual write time (327 ms) is short because there is so little data.\nRead side\nremote bytes read: 228.3 MiB\nlocal bytes read: 15.3 MiB\nremote reqs duration: 41.9 seconds (cross-node shuffle fetches)\ntime to deserialize: 2.2 seconds\nMost of the shuffle data (228.3 MiB) was read from remote executors and a small part (15.3 MiB) from local tasks; together they account for the 243.6 MiB written. The 41.9 seconds of remote request time includes network latency and scheduling overhead, which is typical for a cross-node shuffle on a distributed cluster.\nThe shuffle writer type is hash (visible in the plan), meaning the data is hash-partitioned by the group-by keys for the final aggregation.\nSection 5: AQE in action (512 \u2192 149 partitions)\nAfter the shuffle, AQEShuffleRead (node [6]) steps in:\nnumber of coalesced partitions: 149 (AQE coalesced 512 partitions into 149)\npartition data size: 252.8 MiB total, targeting about 1.7 MiB per partition\nAQE inspected the shuffle output statistics and decided that 512 partitions were too many for 252.8 MiB of data. At 512 partitions, each would average only about 500 KiB (252.8 MiB \/ 512), not worth the scheduling overhead of 512 tasks. After coalescing to 149 partitions, each task handles a more reasonable 1.7 MiB or so (252.8 MiB \/ 149).\nNo skewed partitions were detected (the numSkewedPartitions metric is absent), so AQE applied only coalescing, with no skew handling.\nThis is a perfect example of AQE using metrics for runtime decisions, as discussed in Part 2. The shuffle write statistics from Stage 1 directly determined the partition count of Stage 2.\nSection 6: Final aggregation and result (4,050 \u2192 100 rows)\nFinal aggregation\nRegularHashAggregateExecTransformer (node [4]) performs the final merge of the partial aggregation results:\nInput: 2,369,250 rows, spread across 149 tasks\nnumber of output rows: 4,050 (the final number of groups)\ntime of aggregation: 3.2 seconds\npeak memory bytes: 194.4 MiB (1.3 MiB per task)\nNo spill\nThe partial aggregation already did the heavy lifting; the final aggregation only has to merge pre-aggregated groups. 2.37 million partial rows collapse into 4,050 final groups in 3.2 seconds. Easy.\nTop-100 sort and result delivery\nTakeOrderedAndProjectExecTransformer receives 4,050 rows, sorts them, and returns the top 100. VeloxColumnarToRow then converts the columnar result to row format for delivery to the driver:\nnumber of output rows: 100\nFrom 3.3 billion rows down to 100. That is the whole query.\nThe big picture: what the metrics told us\nLet's step back and look at the full execution flow through its key metrics:\nScan catalog_sales: 3.3 billion raw rows \u2192 2.8 billion rows (4.5 minutes, 2.1 GiB from SSD cache)\n\u2193 9,380 dynamic filters applied, 2,757 row groups skipped\nJoin \u00d7 4 (broadcast): 2.8 billion rows through each join, zero spill\n\u2193\nPartial aggregation: 2.8 billion \u2192 2.37 million rows (1,200x reduction)\n\u2193 WholeStageCodegenTransformer: 26.2 minutes total native pipeline time\nShuffle: 243.6 MiB written (hash, 512 partitions)\n\u2193\nAQE: 512 \u2192 149 partitions (coalesce)\n\u2193\nFinal aggregation: 2.37 million \u2192 4,050 rows (3.2 seconds)\n\u2193\nTop-100 sort \u2192 100 rows returned\nKey takeaways from the metrics\n1. Cache is king. Zero bytes came from remote storage; everything came from local SSD. The 910.9 GiB table required only 2.1 GiB of actual reads, a 99.8% reduction achieved through columnar storage efficiency and predicate pushdown.\n2. 
Row group pruning pays off. 2,757 row groups were skipped. Velox used the Parquet\/Delta min-max statistics to eliminate entire row groups before reading any of their data. This is the main driver of the I\/O reduction.\n3. Dynamic filters did their job. 9,380 filters were applied during the scan, generated at runtime by the join build sides. They eliminated non-matching rows before the data reached the join operators, so the joins never processed rows that could not match.\n4. Zero spill throughout. The joins, the aggregations, and the shuffle all completed in memory; not a single byte was spilled to disk in the whole query. The dimension tables are small enough to broadcast, and partial aggregation shrank the data before the shuffle.\n5. AQE coalescing was effective. 512 \u2192 149 partitions for the final aggregation. Without AQE we would have had 512 tasks, each processing only about 500 KiB, which is not worth the scheduling overhead.\n6. The bottleneck is the native pipeline. The WholeStageCodegenTransformer accounts for 26.2 minutes, covering the scan of 3.3 billion rows, four joins over 2.8 billion rows, and the partial aggregation. This is where the real work happens, all of it inside Velox's vectorized native engine.\n7. 
wallNanos accumulates up the tree. Each parent operator's wall clock time includes the time it waits on its children. The outermost join shows 2.8 minutes not because it is slow itself, but because that figure includes the scan, the three inner joins, and all the I\/O. This confirms the caveat from Part 4: when reading wall clock time metrics, always take the structure of the operator tree into account.\nSummary\nThis walkthrough shows that metrics are not just numbers on a screen; they form a narrative. Each metric answers a specific question:\nWhere did the data come from? \u2192 The I\/O tier metrics (all from SSD cache)\nHow much work was avoided? \u2192 Row group pruning and dynamic filters\nWhere did the data volume collapse? \u2192 Aggregation (1,200x reduction)\nWas there resource pressure? \u2192 The spill metrics (zero throughout)\nDid the optimizer help? \u2192 AQE coalescing (512 \u2192 149 partitions)\nThe next time you open the Spark UI to look at a slow query, you will be able to read every metric, understand what it means, and pinpoint where the bottleneck is.\nIn Part 1 we learned the five metric types and built a complete reference. In Part 2 we traced how metrics flow inside Spark and how AQE uses them. In Part 3 we explored the extension API, UI rendering, and the REST endpoints. In Part 4 we saw how Gluten extends the metrics system to support native execution. In Part 5 we dug into the metrics machinery on Gluten's native side. In Part 6 (this post) we brought it all together by walking through a real TPC-DS query and showing how each metric tells part of the execution story. This concludes the series.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-metrics-part6-in-action\/","summary":"Part 6 of the SQL Metrics series. Using TPC-DS q99 (SF10000, Gluten\/Velox) as the example, we read each metric operator by operator and show how the metrics reveal the full picture of query execution.","title":"Deep Dive into Spark SQL Metrics (Part 6): In Action, Reading Every Gluten Metric of TPC-DS q99"},{"content":"This is Part 5 of the Deep Dive into Spark SQL Metrics series:\nPart 1: metric types, the complete reference, and what each metric means\nPart 2: internals, and how AQE uses metrics to make runtime decisions\nPart 3: the extension API, UI rendering, and the REST API\nPart 4: how Gluten extends the metrics system\nPart 5 (this post): Gluten metrics internals (node mapping, pipeline aggregation, MetricsUpdaterTree, aggregation sub-phases, and shuffle metrics)\nIn Part 4 we looked at Gluten's metrics from the outside: the 60-plus counters that show up in the Spark UI. Now we go inside: how exactly do those numbers travel from the native Velox operators back to the JVM? If you have ever stared at a puzzling Gluten metric value trying to work out where it came from, or you are contributing to Gluten and need to add a new metric, this post is for you.\nSubstrait node ID \u2192 Velox operator mapping\nWhen Gluten converts a Spark plan to a Velox plan through Substrait, every operator in the plan gets a plan node ID. This ID is the bridge between the JVM world (where Spark lives) and the C++ world (where Velox executes). The C++ side needs to map these IDs back to the metrics array so that every native metric value is associated with the correct Spark operator.\nHow getOrderedNodeIds() works\nThe key function is getOrderedNodeIds(). It performs a post-order traversal of the Velox plan tree and builds the orderedNodeIds_ vector. The traversal order is critical: it determines which index in the flat metrics array corresponds to which operator.\nSpark Plan \u2192 Substrait Plan \u2192 Velox PlanNode tree\n\u2193 getOrderedNodeIds() (post-order traversal)\n\u2193\norderedNodeIds_[0] = leaf operator\norderedNodeIds_[1] = next operator\n...\norderedNodeIds_[N] = root operator\nWhy post-order? Because the MetricsUpdaterTree on the JVM side also visits children before parents. Using the same traversal order on both sides keeps the indices in sync without an explicit lookup table.\nSpecial case: Filter\u2013Project fusion\nVelox fuses FilterNode \u2192 ProjectNode into a single FilterProject operator for performance. When that happens, the Filter node has no runtime metrics of its own; it has been absorbed into the fused operator. Gluten handles this by adding the Filter's plan node ID to omittedNodeIds_ and filling its metrics slot with zeros. The JVM side sees the zeroed slot and knows to skip it.\nThis matters for debugging: if you see a FilterExecTransformer whose metrics are all zero, it does not mean the filter was not executed; it means Velox fused it with the adjacent Project.\nSpecial case: Union\nVelox represents Union as LocalPartitionNode + LocalExchangeNode + a virtual ProjectNode. This internal representation does not map cleanly onto a single Spark UnionExec. Gluten unwraps the structure to find the real children, so that the metrics array lines up with the structure of the Spark plan rather than with Velox's internal representation.\nThe Velox pipeline model and metric aggregation\nHere is a subtlety that catches many developers off guard: Velox does not execute the plan as a single pipeline. It splits the plan into multiple pipelines at exchange boundaries (and sometimes at other points, such as the build side of a hash join). A single logical operator can have instances running in different pipelines, each collecting its own metrics.\nHow toPlanStats() aggregates\ntoPlanStats(taskStats) collects metrics from all pipeline instances and returns a Map[PlanNodeId \u2192 PlanStats]. Each PlanStats 
\u5305\u542b\uff1a\noperatorStats\uff1a\u4e00\u4e2a Map[SequenceId \u2192 OperatorStats]\uff0c\u5176\u4e2d\u6bcf\u4e2a\u6761\u76ee\u4ee3\u8868\u8be5\u7b97\u5b50\u7684\u4e00\u4e2a\u7ba1\u9053\u5b9e\u4f8b \u5f53 Gluten \u7684 collectMetrics() \u904d\u5386\u8fd9\u4e9b\u6761\u76ee\u65f6\uff0c\u5b83\u5c06\u6bcf\u4e2a\u7ba1\u9053\u5b9e\u4f8b\u5199\u5165\u5355\u72ec\u7684\u6307\u6807\u7d22\u5f15\uff1a\nfor (const auto&amp; entry : stats.operatorStats) { \/\/ \u6bcf\u4e2a entry \u662f\u8be5\u7b97\u5b50\u7684\u4e00\u4e2a\u7ba1\u9053\u5b9e\u4f8b metrics_-&gt;get(Metrics::kWallNanos)[metricIndex] = entry.second-&gt;cpuWallTiming.wallNanos; metricIndex++; } \u8fd9\u610f\u5473\u7740\u5355\u4e2a Spark \u7b97\u5b50\u53ef\u80fd\u6620\u5c04\u5230\u6307\u6807\u6570\u7ec4\u4e2d\u7684\u591a\u4e2a\u6307\u6807\u7d22\u5f15\u3002\u4f8b\u5982\uff0c\u4e00\u4e2a HashAggregateExec \u5982\u679c\u540c\u65f6\u51fa\u73b0\u5728 Shuffle \u524d\u7684\u5c40\u90e8\u805a\u5408\u7ba1\u9053\u548c Shuffle \u540e\u7684\u6700\u7ec8\u805a\u5408\u7ba1\u9053\u4e2d\uff0c\u5c31\u4f1a\u6709\u4e24\u4e2a\u72ec\u7acb\u7684\u6307\u6807\u6761\u76ee\u3002\nJVM \u4fa7\u7684\u5408\u5e76 JVM \u4fa7\u7684 MetricsUpdater \u901a\u8fc7\u4ece relMap \u83b7\u53d6\u7ed9\u5b9a\u7b97\u5b50\u7684\u6240\u6709\u6761\u76ee\u5e76\u8c03\u7528 mergeMetrics() \u6765\u5904\u7406\u591a\u7ba1\u9053\u6761\u76ee\u3002\u5bf9\u4e8e\u65f6\u95f4\u6307\u6807\uff0c\u901a\u5e38\u662f\u6c42\u548c\uff1b\u5bf9\u4e8e\u5cf0\u503c\u5185\u5b58\uff0c\u53d6\u6700\u5927\u503c\u3002\u5408\u5e76\u540e\u7684\u7ed3\u679c\u5c31\u662f\u4f60\u5728 Spark UI \u4e2d\u770b\u5230\u7684 \u2014 \u4e00\u7ec4\u4ee3\u8868\u7b97\u5b50\u5728\u6240\u6709\u7ba1\u9053\u4e2d\u603b\u5de5\u4f5c\u91cf\u7684\u6570\u5b57\u3002\nMetricsUpdaterTree \u904d\u5386 \u5728 JVM \u4fa7\uff0cMetricsUtil.scala \u901a\u8fc7\u4e24\u4e2a\u5173\u952e\u65b9\u6cd5\u7f16\u6392\u6574\u4e2a\u6307\u6807\u5206\u53d1\u8fc7\u7a0b\u3002\n\u6784\u5efa\u6811\uff1atreeifyMetricsUpdaters() 
treeifyMetricsUpdaters(plan) \u4ece SparkPlan \u6784\u5efa MetricsUpdaterTree\u3002\u8fd9\u4e0d\u662f\u7b80\u5355\u7684\u9012\u5f52\u590d\u5236 \u2014 \u6709\u51e0\u4e2a\u8c03\u6574\uff1a\nHashJoin \u5904\u7406\uff1a\u6811\u5c06 Build \u4fa7\u548c\u6d41\u4fa7\u5b50\u8282\u70b9\u5206\u5f00\uff0c\u56e0\u4e3a Velox \u5728\u4e0d\u540c\u7ba1\u9053\u4e2d\u6267\u884c\u5b83\u4eec\u5e76\u4f7f\u7528\u4e0d\u540c\u7684\u6307\u6807 SortMergeJoin \u5904\u7406\uff1a\u7c7b\u4f3c\u5730\u5c06\u7f13\u51b2\u4fa7\u548c\u6d41\u4fa7\u5b50\u8282\u70b9\u5206\u5f00 MetricsUpdater.None \u7b97\u5b50\uff1a\u8fd9\u4e9b\u88ab\u5b8c\u5168\u8df3\u8fc7 \u2014 \u5b83\u4eec\u7684\u5b50\u8282\u70b9\u76f4\u63a5\u94fe\u63a5\u5230\u7236\u8282\u70b9\u3002\u8fd9\u53d1\u751f\u5728 Gluten \u7528\u7a7a\u64cd\u4f5c\u66ff\u6362\u7684\u7b97\u5b50\u4e0a\uff08\u4f8b\u5982\uff0c\u67d0\u4e9b\u9002\u914d\u5668\u8282\u70b9\uff09 \u5b50\u8282\u70b9\u88ab\u53cd\u8f6c\uff1a\u8fd9\u4e00\u70b9\u81f3\u5173\u91cd\u8981\u3002\u5b50\u8282\u70b9\u5217\u8868\u88ab\u53cd\u8f6c\u4ee5\u5339\u914d C++ \u4fa7 getOrderedNodeIds() \u4f7f\u7528\u7684\u540e\u5e8f\u904d\u5386 \u904d\u5386\u6811\uff1aupdateTransformerMetricsInternal() updateTransformerMetricsInternal() \u904d\u5386 MetricsUpdaterTree \u5e76\u5c06\u6307\u6807\u5206\u53d1\u5230\u7279\u5b9a\u7c7b\u578b\u7684\u66f4\u65b0\u5668\uff1a\n\u66f4\u65b0\u5668 \u7b97\u5b50 \u7279\u6b8a\u5904\u7406 HashAggregateMetricsUpdater HashAggregate \u4e09\u9636\u6bb5\u5b50\u6307\u6807\uff08\u89c1\u4e0b\u4e00\u8282\uff09 JoinMetricsUpdaterBase HashJoin \u4e3a Build \u9636\u6bb5\u63d0\u53d6\u989d\u5916\u6307\u6807\u6761\u76ee SortMergeJoinMetricsUpdater SortMergeJoin \u7f13\u51b2\/\u6d41\u9636\u6bb5\u5206\u79bb LimitMetricsUpdater Limit over Sort \u8df3\u8fc7 Limit \u81ea\u8eab\u7684\u6307\u6807\uff08Velox TopN \u540c\u65f6\u5904\u7406\u4e24\u8005\uff09 \u9ed8\u8ba4 \u5176\u4ed6\u6240\u6709\u7b97\u5b50 mergeMetrics() \u2192 updateNativeMetrics() \u5bf9\u4e8e 
Join\uff0c\u6709\u4e00\u4e2a\u91cd\u8981\u7ec6\u8282\uff1aVelox \u5c06 Build \u9636\u6bb5\u7684\u6307\u6807\u62a5\u544a\u4e3a relMap \u76f4\u63a5\u63d0\u4f9b\u4e4b\u5916\u7684\u989d\u5916\u6761\u76ee\u3002Join \u66f4\u65b0\u5668\u77e5\u9053\u8981\u63d0\u53d6\u8fd9\u4e2a\u989d\u5916\u6761\u76ee\u5e76\u5c06\u5176\u9644\u52a0\u5230 Build \u4fa7\u6307\u6807\u4e0a\uff0c\u8fd9\u5c31\u662f\u4e3a\u4ec0\u4e48\u5373\u4f7f\u6784\u5efa\u548c\u63a2\u6d4b\u53d1\u751f\u5728\u4e0d\u540c\u7ba1\u9053\u4e2d\uff0c\u4f60\u4ecd\u7136\u80fd\u770b\u5230\u51c6\u786e\u7684 Build \u4fa7\u65f6\u95f4\u3002\n\u5bf9\u4e8e Limit over Sort\uff0cGluten \u5b8c\u5168\u8df3\u8fc7 Limit \u81ea\u8eab\u7684\u6307\u6807\u3002Velox \u5c06\u5176\u5b9e\u73b0\u4e3a TopN \u7b97\u5b50\uff0c\u5728\u4e00\u4e2a\u878d\u5408\u64cd\u4f5c\u4e2d\u540c\u65f6\u5904\u7406\u6392\u5e8f\u548c\u9650\u5236\uff0c\u56e0\u6b64\u53ea\u6709\u4e00\u7ec4\u6307\u6807\u9700\u8981\u62a5\u544a\u3002\n\u5206\u53d1\u5b8c\u6210\u540e\uff0c\u904d\u5386\u5668\u9012\u5f52\u5904\u7406\u5b50\u8282\u70b9\uff0c\u4f7f\u7528\u66f4\u65b0\u540e\u7684\u7b97\u5b50\u548c\u6307\u6807\u7d22\u5f15\uff0c\u786e\u4fdd\u6bcf\u4e2a\u5b50\u8282\u70b9\u4ece\u7236\u8282\u70b9\u5728\u6241\u5e73\u6307\u6807\u6570\u7ec4\u4e2d\u7ed3\u675f\u7684\u4f4d\u7f6e\u7ee7\u7eed\u3002\n\u805a\u5408\u5b50\u9636\u6bb5\u6307\u6807 Velox \u4e2d\u7684 Hash \u805a\u5408\u6bd4\u539f\u751f Spark \u66f4\u52a0\u7cbe\u7ec6\u3002\u5b83\u6700\u591a\u53ef\u4ee5\u6267\u884c\u4e09\u4e2a\u9636\u6bb5\uff0c\u7531 AggregationParams \u63a7\u5236\u3002\u7406\u89e3\u8fd9\u79cd\u62c6\u5206\u5bf9\u8bca\u65ad\u805a\u5408\u6027\u80fd\u81f3\u5173\u91cd\u8981\u3002\n\u9636\u6bb5\u4e00\uff1a\u62bd\u53d6\uff08extractionNeeded = true\uff09 \u805a\u5408\u524d\u7684\u5217\u62bd\u53d6 \u2014 \u4f8b\u5982\uff0c\u5728\u5206\u7ec4\u548c\u805a\u5408\u4e4b\u524d\u4ece\u5d4c\u5957\u7ed3\u6784\u4f53\u4e2d\u63d0\u53d6\u5b57\u6bb5\u3002\n\u6307\u6807\uff1a\nextractionCpuCount \u2014 \u62bd\u53d6\u7684 CPU \u65f6\u95f4 
extractionWallNanos \u2014 \u62bd\u53d6\u7684\u6302\u949f\u65f6\u95f4 \u5982\u679c\u62bd\u53d6\u65f6\u95f4\u76f8\u5bf9\u4e8e\u603b\u805a\u5408\u65f6\u95f4\u8f83\u9ad8\uff0c\u4f60\u7684 Schema \u53ef\u80fd\u9700\u8981\u5728\u805a\u5408\u524d\u6241\u5e73\u5316\u5d4c\u5957\u5217\u3002\n\u9636\u6bb5\u4e8c\uff1a\u805a\u5408\uff08\u59cb\u7ec8\u5b58\u5728\uff09 \u4e3b\u8981\u7684 Hash \u805a\u5408\u5de5\u4f5c \u2014 \u5bf9\u5206\u7ec4\u952e\u8fdb\u884c\u54c8\u5e0c\u3001\u67e5\u627e\u6216\u521b\u5efa\u5206\u7ec4\u3001\u7d2f\u79ef\u503c\u3002\n\u6307\u6807\uff1a\naggOutputRows \u2014 \u8f93\u51fa\u884c\u6570\uff08\u5373\u4e0d\u540c\u5206\u7ec4\u6570\uff09 aggWallNanos \u2014 \u805a\u5408\u7684\u6302\u949f\u65f6\u95f4 aggPeakMemoryBytes \u2014 Hash Table \u4f7f\u7528\u7684\u5cf0\u503c\u5185\u5b58 aggSpilledBytes \u2014 \u5185\u5b58\u538b\u529b\u89e6\u53d1\u6ea2\u5199\u65f6 Spill \u7684\u5b57\u8282\u6570 flushRowCount \u2014 Hash Table \u8fc7\u5927\u65f6\u5237\u65b0\u7684\u4e2d\u95f4\u884c\u6570 loadedToValueHook \u2014 \u4e0b\u63a8\u805a\u5408\u8ba1\u6570\uff08\u4e00\u79cd\u5c06\u805a\u5408\u4e0b\u63a8\u5230 Scan \u7b97\u5b50\u7684\u4f18\u5316\uff09 flushRowCount \u5bf9\u8c03\u8bd5\u7279\u522b\u6709\u7528\uff1a\u9ad8\u5237\u65b0\u8ba1\u6570\u610f\u5473\u7740 Hash Table \u4e0d\u65ad\u8d85\u51fa\u5185\u5b58\u9884\u7b97\uff0c\u5bfc\u81f4\u4e2d\u95f4\u7ed3\u679c\u88ab\u5237\u65b0\u5e76\u91cd\u65b0\u805a\u5408\u3002\u8fd9\u4f1a\u5e26\u6765\u989d\u5916\u7684\u5de5\u4f5c\u91cf\u5e76\u964d\u4f4e\u67e5\u8be2\u901f\u5ea6\u3002\n\u9636\u6bb5\u4e09\uff1a\u884c\u6784\u9020\uff08rowConstructionNeeded = true\uff09 \u805a\u5408\u540e\u7684\u884c\u7ec4\u88c5 \u2014 \u4f8b\u5982\uff0c\u4ece\u805a\u5408\u7ed3\u679c\u6784\u9020\u8f93\u51fa\u7ed3\u6784\u4f53\u5217\u3002\n\u6307\u6807\uff1a\nrowConstructionCpuCount \u2014 \u884c\u6784\u9020\u7684 CPU \u65f6\u95f4 rowConstructionWallNanos \u2014 \u884c\u6784\u9020\u7684\u6302\u949f\u65f6\u95f4 
How phases map to metric entries\nThe updater walks the aggregationMetrics list in order, and each phase consumes one entry:\naggregationMetrics[0] \u2192 extraction phase (if needed)\naggregationMetrics[1] \u2192 aggregation phase\naggregationMetrics[2] \u2192 row construction phase (if needed)\nThis three-phase split is unique to Gluten\/Velox. Vanilla Spark's HashAggregateExec reports a single aggTime that lumps everything together. With Gluten, you can pinpoint where the time goes inside the aggregation pipeline.\nShuffle metrics\nGluten's columnar shuffle has its own metrics layer, and the available metrics depend on the shuffle writer type. Knowing which writer is in use tells you which metrics to look at and which tuning knobs are relevant.\nBase metrics (all writers)\nMetric | Display name | What it measures\ndataSize | data size | Total shuffle data size\nbytesSpilled | shuffle bytes spilled | Bytes spilled during the shuffle\nspillTime | time to spill | Time spent spilling\ncompressTime | time to compress | Compression time\ndecompressTime | time to decompress | Decompression time\ndeserializeTime | time to deserialize | Deserialization time\nshuffleWallTime | shuffle wall time | Total shuffle wall-clock time\npeakBytes | peak bytes allocated | Peak memory of the shuffle buffers\nAdditional metrics for the hash shuffle writer\nMetric | Display name | What it measures\nsplitTime | time to split | Time to split rows into partitions\ndictionarySize | dictionary size | Size of dictionary-encoded columns\nAdditional metrics for the sort shuffle writer\nMetric | Display name | What it measures\nsortTime | time to shuffle sort | Time to sort rows by partition\nc2rTime | time to shuffle c2r | Time to convert columnar \u2192 row format for sorting\nAdditional metrics for the RSS (remote shuffle service) writer\nMetric | Display name | What it measures\nsortTime | time to shuffle sort | Time to sort rows by partition\nDiagnosing shuffle bottlenecks\nThe c2rTime metric deserves special attention. It represents the cost of converting columnar batches to row format inside the sort-based shuffle writer. In a columnar engine like Velox, data naturally lives in columnar form, so converting it to rows for sorting is pure overhead.\nIf c2rTime dominates shuffleWallTime, the columnar-to-row conversion is your bottleneck. In that case, switching to hash-based shuffle (which operates directly on columnar batches) may give a significant speedup. This is one of the key decisions Gluten users face: hash shuffle is faster for wide tables with many columns, while sort shuffle uses less memory for high-cardinality partition keys.\nSummary\nGluten's metrics machinery is complex because it bridges two very different execution models: Spark's row-at-a-time Volcano iterator model and Velox's pipeline-parallel vectorized model. The key concepts to remember:\nNode ID mapping keeps the C++ and JVM sides in sync through post-order traversal\nMulti-pipeline aggregation means a single Spark operator's metrics may come from several Velox pipeline instances\nMetricsUpdaterTree dispatches metrics to type-specific updaters that understand each operator's internal structure\nAggregation sub-phases let you observe extraction, aggregation, and row construction separately\nThe shuffle writer type determines which metrics are available and which tuning strategies apply\nIn Part 1, we covered the five metric types and the complete reference. In Part 2, we traced the internal lifecycle and AQE's use of shuffle statistics. In Part 3, we explored the extension API, UI rendering, and the REST API. In Part 4, we examined how Gluten extends the metrics system. In Part 5 (this post), we went inside: from Substrait-to-Velox node mapping and pipeline aggregation to MetricsUpdaterTree traversal, aggregation sub-phases, and shuffle metrics. Part 6 puts all of this into practice with an operator-by-operator reading of the Gluten\/Velox metrics for TPC-DS q99.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-metrics-part5-gluten-internals\/","summary":"Part 5 of the SQL Metrics series. How Gluten maps Substrait plan nodes to Velox operators, aggregates metrics across pipelines, walks the MetricsUpdaterTree, and how aggregation sub-phase and shuffle metrics work internally.","title":"Deep Dive into Spark SQL Metrics (Part 5): Internals of Gluten Metrics Collection"},{"content":"This is Part 2 of the Spark SQL Metrics deep-dive series:\nPart 1: Metric types, complete reference, and what they mean\nPart 2 (this post): Internals, and how AQE uses metrics to make runtime decisions\nPart 3: Extension API, UI rendering, and REST API\nPart 4: How Gluten extends the metrics system\nAccumulatorV2 lifecycle\nIn Part 1, we looked at SQL metrics from the outside: what they measure and how to read the numbers. Now let's trace how those numbers travel from tasks on the executors to the Spark UI.\nFrom task to driver\nEvery SQL metric is a SQLMetric, which extends AccumulatorV2[Long, Long]. When a physical operator defines a metric such as numOutputRows, Spark creates an accumulator on the driver and registers it with the SparkContext.\nWhile a task runs on an executor, it works with a local copy of the accumulator. Operator code updates the metric via metric += value or metric.add(value). These updates are entirely local: no network communication happens during execution.\nThe interesting part is what happens when the task completes:\nTask (Executor) Driver \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 \u2500\u2500\u2500\u2500\u2500\u2500 metric.add(value) \u2193 Task completes \u2192\u2500\u2500\u2500\u2500\u2192 onTaskEnd() \u2193 Store in LiveStageMetrics \u2193 aggregateMetrics() \u2193 MetricUtils.stringValue() \u2193 Map[accId \u2192 &#34;512.0 MiB (min, med, max)&#34;] \u2193 Persist to KVStore (SQLExecutionUIData)\nAfter a task completes, the driver receives the accumulator updates through SparkListener events. SQLAppStatusListener handles the onTaskEnd() event: it extracts the metric values from the completed task and stores them in LiveStageMetrics, an in-memory data structure that tracks per-task metric values for each stage.\nAggregation and storage\nFor completed executions, metrics go through aggregateMetrics(), which computes the total (min, med, max) distribution you see in the UI. These aggregates are formatted into human-readable strings by MetricUtils.stringValue() and then persisted to the KVStore as part of SQLExecutionUIData. Once stored, the raw per-task values are discarded.\nFor running executions, aggregation is computed on the fly. Every time you refresh the SQL tab, the listener recomputes the distribution from the task values currently available in memory. That is why metrics update in near real time while a query is running.\nDriver metrics\nNot all metrics come from tasks. Some are produced directly on the driver:\nSubquery execution time: when a scalar subquery runs, the driver times it and reports the result\nBroadcast time: the time the driver spends broadcasting a table to the executors\nThese driver metrics use SQLMetrics.postDriverMetricUpdates(), which updates the accumulators directly on the driver without going through the task lifecycle, bypassing the onTaskEnd() path entirely.\nHow AQE uses statistics to make runtime decisions\nThis part is crucial, and it is where many people get confused. Adaptive Query Execution (AQE) makes optimization decisions at runtime based on actual data sizes. But it does not use SQL metrics; it uses an entirely separate data source: MapOutputStatistics.\nData flow\nWhen AQE is enabled, Spark does not execute the whole query plan at once; it executes it stage by stage:\nShuffleExchangeExec submits the shuffle map stage via sparkContext.submitMapStage()\nThe map stage runs: each task writes its shuffle data to local disk\nOnce all map tasks finish, the MapOutputTracker knows exactly how many bytes each reducer partition will receive\nThis information is packaged as MapOutputStatistics, which contains bytesByPartitionId: Array[Long], the exact byte size of every shuffle partition\nShuffleQueryStageExec exposes these statistics through its mapStats property\nAdaptiveSparkPlanExec runs its optimization rules after the stage has materialized, using real statistics instead of estimates\nKey takeaway: AQE waits for a shuffle stage to finish and then uses the actual output sizes to decide what to do next.\nCoalesceShufflePartitions: merging small partitions\nThis is the most common AQE optimization. After a shuffle you may have 200 partitions (the default for spark.sql.shuffle.partitions), most of which contain very little data.\nCoalesceShufflePartitions reads bytesByPartitionId and merges adjacent small partitions until each merged partition reaches roughly spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64 MB).\nKey configuration:\nSetting | Default | Purpose\nspark.sql.adaptive.advisoryPartitionSizeInBytes | 64 MB | Target size of a coalesced partition\nspark.sql.adaptive.coalescePartitions.minPartitionNum | (none) | Minimum number of partitions to keep\nspark.sql.adaptive.coalescePartitions.minPartitionSize | 1 MB | No partition smaller than this is created\nExample: if you have 200 partitions averaging 1 MB each (about 200 MB in total), AQE may coalesce them into about 3 or 4 partitions of up to 64 MB each. Instead of 200 tasks each reading a sliver of data, a handful of tasks each process a meaningful amount of work.\nOptimizeSkewedJoin: splitting skewed partitions\nData skew is one of the most common performance problems in Spark. One partition holds 10 GB while the others hold only 100 MB, and the skewed partition becomes the bottleneck of the whole query.\nOptimizeSkewedJoin reads bytesByPartitionId for both sides of a shuffle join, computes the median partition size, and marks as &quot;skewed&quot; any partition that satisfies:\nsize &gt; max(skewThreshold, median \u00d7 skewFactor)\nKey configuration:\nSetting | Default | Purpose\nspark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes | 256 MB | Absolute minimum size to be considered skewed\nspark.sql.adaptive.skewJoin.skewedPartitionFactor | 5.0 | Must be at least this multiple of the median\n
\u4e24\u4e2a\u6761\u4ef6\u5fc5\u987b\u540c\u65f6\u6ee1\u8db3\uff1a\u5206\u533a\u5927\u5c0f\u81f3\u5c11\u8fbe\u5230 256 MB \u4e14\u81f3\u5c11\u662f\u4e2d\u4f4d\u6570\u7684 5 \u500d\u3002\n\u4e00\u65e6\u8bc6\u522b\u51fa\u503e\u659c\u5206\u533a\uff0cAQE \u4f1a\u5c06\u5176\u62c6\u5206\u4e3a\u8f83\u5c0f\u7684\u5b50\u5206\u533a\uff0c\u6bcf\u4e2a\u5b50\u5206\u533a\u76ee\u6807\u5927\u5c0f\u4e3a advisoryPartitionSizeInBytes\uff0864 MB\uff09\u3002Join \u7684\u975e\u503e\u659c\u4fa7\u4f1a\u88ab\u590d\u5236\u4ee5\u5339\u914d\u2014\u2014\u503e\u659c\u4fa7\u7684\u6bcf\u4e2a\u5b50\u5206\u533a\u90fd\u4f1a\u83b7\u5f97\u6765\u81ea\u53e6\u4e00\u4fa7\u5bf9\u5e94\u5206\u533a\u7684\u5b8c\u6574\u526f\u672c\u3002\nOptimizeShuffleWithLocalRead\u2014\u2014\u6d88\u9664 Shuffle \u7f51\u7edc I\/O \u5f53 AQE \u5224\u65ad Shuffle \u6570\u636e\u53ef\u4ee5\u5728\u672c\u5730\u8bfb\u53d6\uff08\u4f4d\u4e8e\u540c\u4e00 Executor \u4e0a\uff09\u65f6\uff0c\u5b83\u4f1a\u7528\u914d\u7f6e\u4e3a\u672c\u5730\u8bfb\u53d6\u7684 AQEShuffleReadExec \u66ff\u6362\u6807\u51c6\u7684 Shuffle \u8bfb\u53d6\u3002\u8fd9\u5b8c\u5168\u6d88\u9664\u4e86\u7f51\u7edc\u4f20\u8f93\u2014\u2014Reducer \u76f4\u63a5\u4ece\u672c\u5730\u78c1\u76d8\u8bfb\u53d6 Shuffle \u6587\u4ef6\u3002\n\u8fd9\u79cd\u4f18\u5316\u6700\u5e38\u89c1\u4e8e Broadcast Hash Join \u4e4b\u540e\uff08\u6b64\u65f6\u6240\u6709\u6570\u636e\u5df2\u5728\u672c\u5730\uff09\uff0c\u4f46\u4e5f\u53ef\u4ee5\u5e94\u7528\u4e8e\u5176\u4ed6\u5206\u533a\u65b9\u5f0f\u5141\u8bb8\u672c\u5730\u8bfb\u53d6\u7684 Shuffle\u3002\n\u5f53 AQE \u5224\u65ad Shuffle \u6570\u636e\u5df2\u7ecf\u5728\u5c06\u8981\u8bfb\u53d6\u5b83\u7684\u540c\u4e00\u4e2a Executor \u4e0a\uff08\u5171\u7f6e\uff09\u65f6\uff0c\u53ef\u4ee5\u5c06\u6807\u51c6\u7684 ShuffleExchangeExec \u66ff\u6362\u4e3a\u914d\u7f6e\u4e86\u672c\u5730\u8bfb\u53d6\u7684 AQEShuffleReadExec\u3002\u8fd9\u5b8c\u5168\u6d88\u9664\u4e86\u7f51\u7edc\u4f20\u8f93\u2014\u2014Reducer \u76f4\u63a5\u4ece\u672c\u5730\u78c1\u76d8\u8bfb\u53d6 Shuffle 
\u6587\u4ef6\u3002\n\u6b64\u4f18\u5316\u901a\u5e38\u53d1\u751f\u5728 Broadcast Hash Join \u4e4b\u540e\uff0c\u56e0\u4e3a\u6b64\u65f6\u6240\u6709\u6570\u636e\u5df2\u7ecf\u5728\u672c\u5730\u3002\n\u6838\u5fc3\u533a\u522b\uff1aSQL Metrics \u4e0e AQE \u7edf\u8ba1\u4fe1\u606f \u8fd9\u662f\u672c\u6587\u6700\u91cd\u8981\u7684\u6982\u5ff5\u533a\u5206\uff1a\nSQL Metrics AQE \u7edf\u8ba1\u4fe1\u606f \u662f\u4ec0\u4e48 SQLMetric Accumulator MapOutputStatistics \u76ee\u7684 \u53ef\u89c2\u6d4b\u6027\uff08UI \u4e2d\u663e\u793a\u7684\u5185\u5bb9\uff09 \u8fd0\u884c\u65f6\u8ba1\u5212\u4f18\u5316 \u6570\u636e\u683c\u5f0f \u683c\u5f0f\u5316\u5b57\u7b26\u4e32\uff08&quot;512.0 MiB&quot;\uff09 \u539f\u59cb Long[] \u6570\u7ec4\uff08\u5b57\u8282\u6570\uff09 \u4ee3\u7801\u8def\u5f84 AccumulatorV2 \u2192 SparkListener \u2192 KVStore MapOutputTracker \u2192 ShuffleQueryStageExec.mapStats \u8ba1\u7b97\u65f6\u673a \u6bcf\u4e2a\u4efb\u52a1\u5b8c\u6210\u540e Stage \u4e2d\u6240\u6709 Map \u4efb\u52a1\u5b8c\u6210\u540e \u6d88\u8d39\u8005 Spark UI\u3001REST API\u3001\u7528\u6237 AQE \u4f18\u5316\u5668\u89c4\u5219 \u5b83\u4eec\u7ecf\u5e38\u6d4b\u91cf\u76f8\u4f3c\u7684\u5185\u5bb9\u2014\u2014\u90fd\u5173\u6ce8\u6570\u636e\u5927\u5c0f\u2014\u2014\u4f46\u901a\u8fc7\u5b8c\u5168\u4e0d\u540c\u7684\u4ee3\u7801\u8def\u5f84\u3002SQL Metrics \u544a\u8bc9\u4f60\u53d1\u751f\u4e86\u4ec0\u4e48\uff0cAQE \u7edf\u8ba1\u4fe1\u606f\u51b3\u5b9a\u63a5\u4e0b\u6765\u4f1a\u53d1\u751f\u4ec0\u4e48\u3002\n\u9700\u8981\u6ce8\u610f\u7684\u662f\uff0cAQE \u7684\u64cd\u4f5c\u786e\u5b9e\u4f1a\u53cd\u6620\u5728 SQL Metrics \u4e2d\u3002\u5f53 AQE \u5408\u5e76\u6216\u62c6\u5206\u5206\u533a\u65f6\uff0c\u4ea7\u751f\u7684 AQEShuffleReadExec \u7b97\u5b50\u4f1a\u4e0a\u62a5\u81ea\u5df1\u7684\u6307\u6807\uff0c\u544a\u8bc9\u4f60 AQE \u505a\u4e86\u4ec0\u4e48\u51b3\u7b56\u3002\n\u4ece\u6307\u6807\u4e2d\u8bfb\u53d6 AQE \u7684\u51b3\u7b56 AQEShuffleReadExec 
\u7b97\u5b50\uff08\u7b2c\u4e00\u90e8\u5206\u4e2d\u6709\u4ecb\u7ecd\uff09\u662f\u4f60\u4e86\u89e3 AQE \u51b3\u7b56\u7684\u7a97\u53e3\u3002\u6bcf\u4e2a\u6307\u6807\u7684\u542b\u4e49\u5982\u4e0b\uff1a\n\u6307\u6807 \u542b\u4e49 numCoalescedPartitions &gt; 0 AQE \u5408\u5e76\u4e86\u5c0f\u5206\u533a numSkewedPartitions &gt; 0 AQE \u68c0\u6d4b\u5230\u4e86\u503e\u659c\u5206\u533a numSkewedSplits \u4ece\u503e\u659c\u5206\u533a\u521b\u5efa\u4e86\u591a\u5c11\u4e2a\u5b50\u5206\u533a numEmptyPartitions \u68c0\u6d4b\u5230\u7684\u7a7a\u5206\u533a\u6570 partitionDataSize AQE \u4f18\u5316\u540e\u7684\u5b9e\u9645\u6570\u636e\u5927\u5c0f \u5b9e\u9645\u793a\u4f8b\uff1a \u5982\u679c\u4f60\u770b\u5230 numSkewedPartitions: 3 \u548c numSkewedSplits: 12\uff0c\u8fd9\u610f\u5473\u7740 AQE \u53d1\u73b0\u4e86 3 \u4e2a\u8d85\u8fc7\u503e\u659c\u9608\u503c\u7684\u5206\u533a\uff0c\u5e76\u5c06\u5b83\u4eec\u62c6\u5206\u4e3a 12 \u4e2a\u5b50\u5206\u533a\u3002\u539f\u6765\u7684 3 \u4e2a\u74f6\u9888\u4efb\u52a1\u53d8\u6210\u4e86 12 \u4e2a\u5e76\u884c\u4efb\u52a1\uff0c\u663e\u8457\u51cf\u5c11\u4e86\u603b\u6267\u884c\u65f6\u95f4\u3002\n\u5982\u679c\u4f60\u770b\u5230 numCoalescedPartitions: 180\uff0c\u539f\u59cb numPartitions: 200\uff0c\u8bf4\u660e AQE \u5c06 180 \u4e2a\u5fae\u5c0f\u5206\u533a\u5408\u5e76\u5728\u4e00\u8d77\u2014\u2014\u4f60\u7684 200 \u4e2a Reducer \u4efb\u52a1\u53ef\u80fd\u53d8\u6210\u4e86\u5927\u7ea6 20 \u4e2a\u3002\n\u8fd9\u4e9b\u6307\u6807\u662f\u786e\u8ba4 AQE \u662f\u5426\u771f\u6b63\u5e2e\u52a9\u4e86\u4f60\u7684\u67e5\u8be2\u7684\u552f\u4e00\u65b9\u5f0f\u3002\u5982\u679c numCoalescedPartitions \u548c numSkewedPartitions \u90fd\u4e3a\u96f6\uff0c\u8bf4\u660e AQE \u867d\u7136\u542f\u7528\u4e86\uff0c\u4f46\u6ca1\u6709\u627e\u5230\u9700\u8981\u4f18\u5316\u7684\u5185\u5bb9\u3002\n\u901a\u8fc7 SQL \u6267\u884c\u8ba1\u5212\u7406\u89e3 AQE SQL \u6267\u884c\u8ba1\u5212\u662f\u7406\u89e3 AQE \u884c\u4e3a\u7684\u53e6\u4e00\u4e2a\u5f3a\u5927\u5de5\u5177\u3002\u5f53 AQE 
is enabled, AdaptiveSparkPlan appears at the top of the plan, and a completed execution is marked isFinalPlan=true.\nYou can compare the initial plan (what the optimizer originally planned) with the final plan (what actually ran after AQE modified it):\n# View the initial plan (before AQE) spark-history-cli -a &lt;app-id&gt; sql-plan &lt;execution-id&gt; --view initial # View the final plan (after AQE) spark-history-cli -a &lt;app-id&gt; sql-plan &lt;execution-id&gt; --view final Comparing the two plans shows exactly where AQE stepped in:\nShuffleExchangeExec nodes replaced by AQEShuffleReadExec: a Shuffle optimization was applied A changed Join strategy: for example, Sort Merge Join converted to Broadcast Hash Join because one side turned out to be small A different partition count in the final plan: coalescing or splitting happened This comparison is very valuable when debugging performance problems: you can see whether AQE decisions helped or whether further tuning is needed.\nPart 1 introduced the five metric types and the complete reference. Part 3 will cover the DataSource V2 CustomMetric extension API, how the UI renders metrics, and how, through the REST API, 
to query metrics programmatically.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-metrics-part2-internals\/","summary":"Part 2 of the SQL Metrics trilogy. How metrics flow from tasks to the Driver, and how Adaptive Query Execution (AQE) uses Shuffle statistics to rewrite the query plan at runtime.","title":"Deep Dive into Spark SQL Metrics (Part 2): Internals and AQE Runtime Decisions"},{"content":"This is a three-part deep dive into Spark SQL Metrics:\nPart 1: metric types, complete reference, and meaning Part 2: internal implementation, and how AQE uses metrics to make runtime decisions Part 3 (this article): extension API, UI rendering, and REST API Part 4: how Gluten extends the metrics system DataSource V2 CustomMetric API In Part 1 and Part 2 we explored the built-in Spark metrics and their internals. But what if you are building a custom connector and need to expose connector-specific data, such as bytes read from a proprietary format, cache hit rates, or throttling counts? Starting with Spark 3.2, the DataSource V2 API provides a clean extension point for exactly this.\nInterface hierarchy At the core are two interfaces in the 
org.apache.spark.sql.connector.metric package:\nCustomMetric: defined once in the connector, describes what the metric is:\nname(): the metric name (must match the CustomTaskMetric) description(): a human-readable description, shown in the UI aggregateTaskMetrics(long[] taskMetrics): you decide how per-task values are combined into one display string. The connector has full control here: you can compute a sum, an average, percentiles, or any other aggregation. A no-arg constructor is required: Spark instantiates the class via reflection when aggregating on the Driver. CustomTaskMetric: reported by each PartitionReader on the Executor:\nname(): must match the corresponding CustomMetric.name() value(): returns a long, the current metric value for this task Spark ships two convenience base classes so you do not have to implement aggregateTaskMetrics from scratch:\nClass Aggregation logic Output format CustomSumMetric Sums the values of all tasks String.valueOf(sum) CustomAvgMetric Averages the task values DecimalFormat(&quot;#0.000&quot;).format(avg) Implementing a custom metric Step 1: define the metric class.\nExtend a built-in base class or implement the CustomMetric interface directly:\npublic class 
MyBytesReadMetric extends CustomSumMetric { @Override public String name() { return &#34;myBytesRead&#34;; } @Override public String description() { return &#34;bytes read from my source&#34;; } } Step 2: register the metric in your Scan.\nYour Scan implementation tells Spark which custom metrics the connector supports:\n@Override public CustomMetric[] supportedCustomMetrics() { return new CustomMetric[] { new MyBytesReadMetric() }; } Step 3: report values from the PartitionReader.\nEach PartitionReader reports its current metric values when Spark calls currentMetricsValues(). Spark calls this method once every 100 rows processed (controlled by CustomMetrics.NUM_ROWS_PER_UPDATE) and again when the task completes:\n@Override public CustomTaskMetric[] currentMetricsValues() { return new CustomTaskMetric[] { new CustomTaskMetric() { @Override public String name() { return &#34;myBytesRead&#34;; } @Override public long value() { return bytesReadSoFar; } } }; } That is all: your custom metric now shows up in the Spark UI alongside the built-in ones.\nWrite-side custom metrics Custom metrics are not limited to reads. A write connector can also define custom metrics through Write.supportedCustomMetrics(), with values reported through DataWriter.currentMetricsValues(). This is useful for metrics such as compression ratio, flush counts, or batching statistics on the write path.\nHow Spark handles custom metrics internally 
Behind the scenes, several components cooperate so that custom metrics flow through the same pipeline as built-in ones:\nRegistration: DataSourceV2ScanExecBase calls scan.supportedCustomMetrics() during planning and creates SQLMetric wrappers through SQLMetrics.createV2CustomMetric(). Each wrapper carries a special type string.\nType encoding: the metric type is stored as &quot;v2Custom_&lt;fully qualified class name&gt;&quot;, for example &quot;v2Custom_com.mycompany.MyBytesReadMetric&quot;. This encoding is built by CustomMetrics.buildV2CustomMetricTypeName().\nAggregation: when SQLAppStatusListener receives task metrics to aggregate, it parses the v2Custom_ prefix, extracts the class name, loads the class via reflection, and calls aggregateTaskMetrics(long[]). This is why the no-arg constructor is required.\nSpecial metric names: if your CustomTaskMetric is named &quot;bytesWritten&quot; or &quot;recordsWritten&quot;, Spark also propagates those values to the internal TaskOutputMetrics. They then appear not only on the SQL tab but also on the Executors tab and in the stage-level I\/O summary.\nDriver-side custom metrics Custom metrics are not limited to Executor-side reporting. Your Scan can also report metrics from the Driver:\nScan.reportDriverMetrics() returns, from the Driver, a 
CustomTaskMetric[] array DataSourceV2ScanExecBase.postDriverMetrics() sends them to the metrics system through SQLMetrics.postDriverMetricUpdates() This is useful for metrics such as &quot;files listed&quot;, &quot;partitions pruned&quot;, or &quot;metadata cache hits&quot;: these happen during planning on the Driver, not during data reading on the Executors.\nMetric rendering in the UI Once metrics are collected and aggregated on the Driver, they need to be rendered. The SQL tab of the Spark UI has evolved significantly, and understanding the rendering pipeline helps you interpret what you see.\nPlan visualization pipeline The path from stored metrics to visual rendering is:\nSQLAppStatusStore.executionMetrics(id) \u2192 Map[accumulatorId \u2192 formatted String] ExecutionPage.planVisualization() \u2192 graph.makeDotFile(metrics) # compact DOT labels \u2192 graph.makeNodeDetailsJson(metrics) # full metrics JSON spark-sql-viz.js \u2192 renderPlanViz() # dagre-d3 graph \u2192 getNodeDetails() # parse JSON \u2192 updateDetailsPanel() # side panel on click \u2192 rerenderWithDetailedLabels() # optional inline mode The server prepares two representations: a DOT file for graph layout (with compact node labels) and JSON data with full metric details. The JavaScript frontend renders the DAG with dagre-d3 
and provides interactive metric exploration.\nCompact mode vs. detailed mode The SQL plan visualization supports two display modes:\nCompact mode (the default since SPARK-55785): node labels show only the operator name. Clicking a node opens a side panel with the full metrics table. The graph stays readable even for complex plans with dozens of operators.\nDetailed mode (toggled with a checkbox): metrics are rendered inline inside the graph nodes at a 10px font size. This is useful when you need a printable snapshot of the whole plan with every number, but operators with many metrics can make the graph very wide.\nStage\/task toggle: when enabled, a (stage X: task Y) annotation is added next to the maximum value, helping you locate the exact task that produced the extreme value, which is very valuable when debugging data skew.\nSide panel When you click a node in compact mode, the side panel shows:\nMetric names and formatted values in a clean table layout WholeStageCodegen clusters: clicking a cluster node shows the metrics of all child operators grouped together, so you can see everything that happened inside a single codegen unit 
Search filter: for plans with many metrics, a text filter helps you quickly find the ones you care about Description tooltip: hover over the operator name in the panel title to see a tooltip, which helps tell apart multiple operators of the same type in a plan REST API The Spark UI is great for visual exploration, but for automation (monitoring dashboards, performance regression tests, or post-mortem analysis scripts) you need programmatic access.\nEndpoint The main endpoint for SQL execution metrics is:\nGET \/api\/v1\/applications\/{appId}\/sql\/{executionId} Query parameters:\nParameter Default Description details true Include node-level details and metrics planDescription true Include the physical plan text Response structure A typical response looks like this:\n{ &#34;id&#34;: 0, &#34;status&#34;: &#34;COMPLETED&#34;, &#34;description&#34;: &#34;count at ...&#34;, &#34;planDescription&#34;: &#34;*(1) HashAggregate ...&#34;, &#34;submissionTime&#34;: &#34;2026-04-01T12:00:00Z&#34;, &#34;duration&#34;: 5432, &#34;runningJobIds&#34;: [], &#34;successJobIds&#34;: [0, 1], &#34;failedJobIds&#34;: [], &#34;nodes&#34;: [ { &#34;nodeId&#34;: 0, &#34;nodeName&#34;: &#34;HashAggregate&#34;, &#34;wholeStageCodegenId&#34;: 1, &#34;metrics&#34;: [ {&#34;name&#34;: &#34;number of output rows&#34;, &#34;value&#34;: &#34;5,000&#34;}, {&#34;name&#34;: &#34;peak memory&#34;, &#34;value&#34;: &#34;total (min, med, max)\\n512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB)&#34;}, {&#34;name&#34;: &#34;spill 
size&#34;, &#34;value&#34;: &#34;0.0 B&#34;} ] } ], &#34;edges&#34;: [ {&#34;fromId&#34;: 1, &#34;toId&#34;: 0} ] } Key points A Metric is just {name: String, value: String}: the REST API returns formatted display strings, not raw numbers or metric types. If you need to do arithmetic on metric values, you have to parse the formatted strings yourself.\nwholeStageCodegenId identifies the codegen cluster an operator belongs to. Operators with the same ID are fused into the same generated Java class.\nedges define the DAG structure as links between parent and child operators. Combined with the nodeId values, they let you rebuild the full plan tree programmatically.\nList endpoint: fetch all SQL executions of an application:\nGET \/api\/v1\/applications\/{appId}\/sql?offset=0&amp;length=100 Pagination is supported through the offset and length parameters.\nAccess through spark-history-cli For interactive exploration, spark-history-cli wraps the REST API with convenient commands:\n# Structured JSON output spark-history-cli --json -a &lt;app&gt; sql # list all SQL executions spark-history-cli --json -a &lt;app&gt; sql &lt;id&gt; # a single execution with its metrics # Plan text spark-history-cli -a &lt;app&gt; sql-plan &lt;id&gt; # full plan spark-history-cli -a &lt;app&gt; sql-plan &lt;id&gt; --view final # final plan after AQE Practical examples Capturing metrics programmatically with a Spark Listener 
If you want to react to metrics in real time, for example to log slow queries or trigger alerts, you can register a QueryExecutionListener:\nspark.listenerManager.register(new QueryExecutionListener { override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = { val metrics = qe.executedPlan.collectLeaves().flatMap(_.metrics) metrics.foreach { case (name, metric) =&gt; println(s&#34;$name: ${metric.value}&#34;) } } override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {} }) This listener fires after every successful query execution and gives you access to the executed physical plan, so you can walk its operators (the leaf operators, in this example) and read metrics directly as raw Long values, free of the display formatting and rounding applied elsewhere.\nAccessing metrics from a DataFrame execution For ad hoc debugging or interactive exploration in a REPL, you can access metrics through the status store after running a query:\nval df = spark.sql(&#34;SELECT count(*) FROM my_table&#34;) df.collect() \/\/ metrics of the most recent execution val lastExec = spark.sharedState.statusStore.executionsList().last val metrics = spark.sharedState.statusStore.executionMetrics(lastExec.executionId) metrics.foreach { case (accId, value) =&gt; println(s&#34;$accId: $value&#34;) } This approach is handy in integration tests, to assert that a specific optimization took effect (for example &quot;files pruned&quot; &gt; 0), and in a notebook, where without switching to the UI 
you can still inspect performance.\nNote: spark.sharedState.statusStore is an internal API and is only available on the Driver. In Spark Connect mode the client cannot reach the status store, so use the REST API instead.\nSeries summary This article is the final installment of the Spark SQL Metrics deep-dive trilogy:\nIn **Part 1** we laid the foundation: the five metric types (sum, size, timing, nsTiming, average), the total (min, med, max) aggregation format, and a complete reference of 100+ metrics covering every operator.\nIn **Part 2** we traced the internal lifecycle: how AccumulatorV2 values flow from Executor tasks to the Driver, how SQLAppStatusListener aggregates them, and how Adaptive Query Execution (AQE) uses Shuffle statistics (not SQL metrics) to make runtime decisions, including partition coalescing, skew Join optimization, and local Shuffle reads.\nIn **Part 3 (this article)** we covered the extension points: how connector developers define custom metrics through the DataSource V2 API, how the UI renders plans and metrics through the DOT\/JSON\/dagre-d3 pipeline, and how, via the REST API and a Spark Listener, 
to query metrics programmatically.\nThese three perspectives (what metrics measure, how they work internally, and how to extend and access them) together form the complete body of knowledge needed to use SQL metrics effectively for performance debugging, monitoring, and connector development.\nPart 1 introduced the five metric types and the complete reference. Part 2 traced the internal lifecycle and how AQE uses Shuffle statistics. This article concludes the series.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-metrics-part3-extension-api\/","summary":"Part 3 of the SQL Metrics trilogy. How to add custom metrics through the DataSource V2 API, how the UI renders metrics, and how to query metrics programmatically through the REST API.","title":"Deep Dive into Spark SQL Metrics (Part 3): Extension API, UI Rendering, and REST API"},{"content":"This is the bonus installment of the Spark SQL Metrics series:\nPart 1: metric types, complete reference, and meaning Part 2: internal implementation, and how AQE uses metrics to make runtime decisions Part 3: extension API, UI rendering, and REST API Part 4 (this article): how Gluten extends the metrics system How the Gluten native engine produces metrics 
Apache Gluten replaces the JVM execution engine with a native C++ engine (Velox or ClickHouse). Because native operators execute independently (they are not fused by JVM code generation), each C++ operator is a separate function call with its own timing infrastructure. As a natural result, Gluten exposes 60+ metrics, including per-operator Wall Clock Time, per-phase Join metrics, native Spill tracking, dynamic filter statistics, and I\/O metrics broken down by storage tier.\nThree-layer architecture The Gluten metrics system bridges two worlds: the JVM-side SQLMetric framework of Spark and the native C++ execution engine. The architecture has three layers:\nSpark SQLMetric (JVM) \u2190\u2500\u2500 MetricsUpdater (bridge layer) \u2190\u2500\u2500 Velox\/CH (C++) Map[String, SQLMetric] updateNativeMetrics() long[] arrays, passed over JNI Layer 1: Spark SQLMetric (unchanged) Every *ExecTransformer (the Gluten replacement for a vanilla Spark *Exec operator) overrides lazy val metrics, following the same pattern as vanilla Spark. Instead of hard-coding the metric set, however, it delegates to the backend:\nBackendsApiManager.getMetricsApiInstance .genFilterTransformerMetrics(sparkContext) This means the Velox backend and the ClickHouse 
backend can define entirely different metrics for the same logical operator. A FilterExecTransformer running on Velox might expose wallNanos and peakMemoryBytes, while the same operator running on ClickHouse might expose different internal counters. Metric definitions are backend-specific, but they all end up as standard SQLMetric objects that the Spark UI and REST API display as usual.\nLayer 2: MetricsUpdater (the Gluten bridge abstraction) The MetricsUpdater trait is the core bridge abstraction in Gluten. It defines a single method:\ntrait MetricsUpdater extends Serializable { def updateNativeMetrics(opMetrics: IOperatorMetrics): Unit } Every operator has a corresponding MetricsUpdater implementation. These updaters are organized into a MetricsUpdaterTree that mirrors the plan DAG: one updater per operator, connected in the same parent-child structure as the physical plan.\nWhy a separate tree? Because MetricsUpdaterTree is Serializable: it can be shipped to the Executors without serializing the full SparkPlan (which holds non-serializable objects such as SparkContext). On the Executor, after native execution finishes, the tree walks the native metrics and updates the SQLMetric Accumulators.\nThree special sentinel instances handle edge cases:\nMetricsUpdater.None 
(the operator has no metrics to update) MetricsUpdater.Todo (metrics support for this operator is not implemented yet) MetricsUpdater.Terminate (the branch ends here; there are no children to recurse into) Here is a concrete example, FilterMetricsUpdater:\nclass FilterMetricsUpdater(val metrics: Map[String, SQLMetric]) extends MetricsUpdater { override def updateNativeMetrics(opMetrics: IOperatorMetrics): Unit = { val m = opMetrics.asInstanceOf[OperatorMetrics] metrics(&#34;numOutputRows&#34;) += m.outputRows metrics(&#34;outputVectors&#34;) += m.outputVectors metrics(&#34;outputBytes&#34;) += m.outputBytes metrics(&#34;cpuCount&#34;) += m.cpuCount metrics(&#34;wallNanos&#34;) += m.wallNanos metrics(&#34;peakMemoryBytes&#34;) += m.peakMemoryBytes metrics(&#34;numMemoryAllocations&#34;) += m.numMemoryAllocations } } Notice how each native metric field (such as m.wallNanos) maps directly onto a SQLMetric key. The updater is the translation layer between native C++ naming and the Spark metric namespace.\nLayer 3: native metrics passed over JNI On the C++ side, the Velox engine collects metrics as arrays during execution, with one entry per operator index. When a task finishes, Gluten passes these metrics across the JNI boundary as a Metrics object holding long[] arrays:\ninputRows[]: rows consumed by each operator outputRows[]: rows produced by each operator wallNanos[]: 
wall-clock nanoseconds per operator cpuCount[]: number of getOutput() calls (batches) per operator, not CPU time peakMemoryBytes[]: peak memory per operator ... (plus 20+ more arrays) MetricsUpdatingFunction walks the MetricsUpdaterTree and pulls each operator's values out of the arrays by operator index. This is a bulk transfer (one JNI call per task, not one per row), which keeps overhead to a minimum.\nWhat Gluten adds: 60+ metrics Let us go through the specific metrics Gluten introduces, by category.\nPer-operator execution metrics In vanilla Spark, most operators report only numOutputRows. In Gluten, every operator has the following metrics:\nMetric Display name Type What it measures wallNanos time of {operator} nsTiming Per-operator Wall Clock Time cpuCount cpu wall time count sum Number of getOutput() calls (batches) peakMemoryBytes peak memory bytes size Peak memory usage numMemoryAllocations number of memory allocations sum Number of memory allocations outputRows number of output rows sum Output row count outputVectors number of output vectors sum Output vector (batch) count outputBytes number of output bytes size Output data volume in columnar format loadLazyVectorTime time to load lazy vectors timing Time spent loading lazily evaluated vectors Note: wallNanos uses an operator-specific display name: &ldquo;time of 
filter&rdquo;, &ldquo;time of sort&rdquo;, &ldquo;time of scan and filter&rdquo;, &ldquo;time of project&rdquo;, and so on.\nHaving wallNanos on every operator makes it straightforward to spot the bottleneck operator in native execution.\nUnderstanding wallNanos and cpuCount in depth These two metrics deserve special attention because they matter most for performance analysis.\nBoth come from the Velox CpuWallTiming struct, collected by wrapping each operator's getOutput() call in an RAII timer (DeltaCpuWallTimer):\nstruct CpuWallTiming { uint64_t count; \/\/ number of getOutput() calls (batches) uint64_t wallNanos; \/\/ total Wall Clock Time (steady_clock, nanoseconds) uint64_t cpuNanos; \/\/ total CPU time (CLOCK_THREAD_CPUTIME_ID, nanoseconds) }; wallNanos: measured with std::chrono::steady_clock. It captures total elapsed real time, including time the operator spends waiting for child operators to produce data, waiting on I\/O, or delayed by thread scheduling.\ncpuCount: despite the name, this is actually an invocation count (the number of times getOutput() was called = the number of batches processed), not CPU time. The Gluten JNI bridge maps CpuWallTiming.count to the cpuCount metric.\nHow to read them:\nScenario wallNanos cpuCount Meaning Large data volume, even work High High Many batches processed, as expected 
Few batches, each slow High Low Possible data skew or expensive per-batch logic Leaf operator (scan) High n\/a Mostly I\/O time (also check ioWaitTime) Intermediate operator (filter) High n\/a Includes time waiting on the child; compare with the child&rsquo;s wallNanos Important caveat: wallNanos includes time spent waiting on children:\nBecause wallNanos wraps the entire getOutput() call, a parent operator&rsquo;s wallNanos includes the time it blocks waiting for its child to produce data. Therefore:\nLeaf operator (scan): wallNanos \u2248 I\/O + compute time Intermediate operator (filter): wallNanos = own compute + child scan time You cannot simply sum wallNanos across all operators, since that double-counts To isolate an operator&rsquo;s own contribution, subtract its child&rsquo;s wallNanos from its own. Velox also tracks I\/O metrics separately (ioWaitTime, dataSourceReadTime) to help separate pure I\/O from compute.\nScan-specific metrics Vanilla Spark&rsquo;s Scan operator has scanTime and numFiles. Gluten goes much deeper:\nMetric Display name What it measures skippedSplits \/ processedSplits number of skipped\/processed splits Effectiveness of file-split pruning skippedStrides \/ processedStrides number of skipped\/processed row groups
In-file row-group\/stripe-level pruning ioWaitTime io wait time Time spent waiting on I\/O operations storageReadBytes storage read bytes Bytes read from remote storage localReadBytes local ssd read bytes Bytes read from the local SSD cache ramReadBytes ram read bytes Bytes read from the in-memory cache preloadSplits number of preloaded splits Number of preloaded (prefetched) splits dataSourceAddSplitTime data source add split time Time spent managing split assignment dataSourceReadTime data source read time Time spent reading data from the data source The storageReadBytes \/ localReadBytes \/ ramReadBytes breakdown is especially valuable in cloud environments. If most reads come from storageReadBytes, the cache has not warmed up yet. If ioWaitTime dominates wallNanos, the bottleneck is network I\/O rather than CPU.\nSpill metrics Vanilla Spark tracks spill largely at the stage level. Gluten tracks it per operator and per phase:\nMetric Display name What it measures spilledBytes bytes written for spilling Amount of data spilled to disk spilledRows total rows written for spilling Number of rows spilled spilledPartitions total spilled partitions Number of partitions involved in spilling spilledFiles total spilled files Number of spill files created For Join operators, spill is tracked separately for the Build and Probe
phases (see the next section), so you can pinpoint which phase is under memory pressure.\nDynamic filter metrics Dynamic filters (also called runtime filters) are produced by Join operators and used to prune scan results at runtime. Vanilla Spark has no metrics for them. Gluten tracks the full lifecycle:\nMetric Display name What it measures numDynamicFiltersProduced number of dynamic filters produced Runtime filters produced by the Join build side numDynamicFiltersAccepted number of dynamic filters accepted Runtime filters applied by the Scan operator numReplacedWithDynamicFilterRows number of replaced with dynamic filter rows Rows eliminated by runtime filters If numDynamicFiltersProduced &gt; 0 but numDynamicFiltersAccepted = 0, filters were produced but never applied, which suggests the scan and the Join are not wired together the way the optimizer expected. If numReplacedWithDynamicFilterRows is large, the runtime filters saved a lot of work.\nJoin phase separation: 20+ metrics per Join This is arguably Gluten&rsquo;s most powerful metrics enhancement. Vanilla Spark&rsquo;s Join operators report only a single buildTime and numOutputRows. Gluten breaks each Join
down into its constituent phases, each with its own metrics:\nBuild phase:\nMetric Display name What it measures hashBuildInputRows number of hash build input rows Rows consumed by the build side hashBuildOutputRows number of hash build output rows Rows in the hash table hashBuildWallNanos time of hash build Wall-clock time of the build phase hashBuildPeakMemoryBytes hash build peak memory bytes Peak memory of the build phase hashBuildSpilledBytes hash build spilled bytes Data spilled during the build phase hashBuildSpilledRows hash build spilled rows Rows spilled during the build phase hashBuildSpilledPartitions hash build spilled partitions Partitions spilled during the build phase hashBuildSpilledFiles hash build spilled files Spill files created during the build phase Probe phase:\nMetric Display name What it measures hashProbeInputRows number of hash probe input rows Rows consumed by the probe side hashProbeOutputRows number of hash probe output rows Rows output after probing hashProbeWallNanos time of hash probe Wall-clock time of the probe phase hashProbePeakMemoryBytes hash probe peak memory bytes Peak memory of the probe phase hashProbeSpilledBytes hash probe spilled bytes Data spilled during the probe phase hashProbeSpilledRows hash probe spilled rows Rows spilled during the probe phase hashProbeSpilledPartitions hash probe spilled partitions Partitions spilled during the probe phase hashProbeSpilledFiles hash probe spilled files Spill files created during the probe phase Pre\/post projection:\nMetric Display name What it measures streamPreProjectionWallNanos time of
stream preProjection Time evaluating pre-join expressions on the stream (probe) side streamPreProjectionCpuCount stream preProject cpu wall time count Batch count of the stream-side pre-projection buildPreProjectionWallNanos time to build preProjection Time evaluating pre-join expressions on the build side buildPreProjectionCpuCount preProject cpu wall time count Batch count of the build-side pre-projection postProjectionWallNanos time of postProjection Time evaluating post-join expressions postProjectionCpuCount postProject cpu wall time count Batch count of the post-projection In vanilla Spark, a slow Join gives you almost nothing to work with: you know it is slow, but not why. With Gluten you can see immediately whether the build phase is slow (perhaps the build side is too large), whether the probe phase is slow (perhaps hash collisions are causing excessive probing), and whether the build phase is spilling (memory pressure). This level of detail fundamentally changes how you diagnose Join performance.\nWrite metrics Metric Display name What it measures physicalWrittenBytes number of written bytes Bytes actually written to storage writeIOTime \/ writeIONanos time of write IO I\/O time during the write numWrittenFiles number of written files Number of files produced Reading Gluten metrics in the Spark UI Gluten metrics appear in the same Spark SQL
tab, because they use the same SQLMetric framework. Operator names change (for example, HashAggregateExecTransformer replaces HashAggregateExec), but the metrics still appear in the same side panel when you click an operator node.\nKey things to look for These are the key patterns to watch for when reading Gluten metrics:\nFinding the bottleneck operator:\nLook at wallNanos on each operator. In a healthy query, scans and Joins usually account for most of the time. If a FilterExecTransformer or ProjectExecTransformer shows a high wallNanos even after subtracting its child&rsquo;s, the filter or projection expression itself is expensive; consider simplifying it.\nDiagnosing a slow Join:\nCompare hashBuildWallNanos with hashProbeWallNanos. If the build side dominates, the build input is too large; consider changing the Join order or adding filters to shrink the build side. If the probe side dominates, look at hashProbeInputRows; too many probe rows or hash collisions may be the cause.\nChecking native predicate pushdown:\nIf skippedSplits &gt; 0, native file-level pruning is working. If skippedStrides &gt;
0, in-file row-group or stripe-level pruning is working. If both are zero, the predicate was probably not pushed down into the native scan; check whether the column type supports pushdown.\nVerifying runtime filter effectiveness:\nIf numDynamicFiltersAccepted &gt; 0, runtime filters from the Join build side are being applied to the scan. Check numReplacedWithDynamicFilterRows to see how many rows were eliminated; the larger the number, the more I\/O was saved.\nDetecting memory pressure in the native engine:\nIf spilledBytes &gt; 0 on any operator, the native engine is spilling to disk. For Joins, check whether the build or the probe phase is spilling. For aggregations, spilling usually means high grouping cardinality. Consider increasing the native memory allocation or reducing the data volume.\nI\/O tier analysis:\nCompare storageReadBytes, localReadBytes, and ramReadBytes on the Scan operator. With a well-warmed cache, most reads should come from ramReadBytes or localReadBytes. A high storageReadBytes means you are reading from remote storage (S3, HDFS); check that the cache tier is configured correctly.\nAccess via spark-history-cli Gluten metrics are also available through the REST API and spark-history-cli,
because they are stored as standard SQLMetric values:\nspark-history-cli --json -a &lt;app&gt; sql &lt;id&gt; # includes Gluten metrics The JSON output contains all Gluten-specific metrics alongside the vanilla Spark metrics, in the same {name, value} format described in Part 3.\nArchitectural takeaways Gluten&rsquo;s metrics system offers several important lessons for extending Spark observability:\nReplacing the engine naturally brings comprehensive metrics. When the engine controls the execution of every operator, it can measure at every boundary. Each C++ operator is a separate function call with its own start and end timestamps.\nThe MetricsUpdater pattern is reusable. Any native backend can adopt it: define a lightweight, serializable tree of updater objects that mirrors the plan, ship bulk metric arrays over JNI, then walk the tree to update the SQLMetric accumulators.\nArray-based JNI transfer keeps overhead to a minimum. Instead of calling back into the JVM on every metric update, Gluten packs all metrics into long[] arrays, one bulk JNI transfer per task. Even with 60+
metrics per operator, the metrics overhead is negligible.\nBackend-agnostic design via MetricsApi. The MetricsApi abstraction means the Velox and ClickHouse backends can define entirely different metrics for the same operator type. Adding a new backend (say, DataFusion) only requires implementing the MetricsApi interface, with no changes to the core bridging code.\nIn Part 1 we covered the five metric types and the complete reference. In Part 2 we traced the internal lifecycle and how AQE uses shuffle statistics. In Part 3 we explored the extension API, UI rendering, and the REST API. This bonus installment examined how Apache Gluten extends the metrics system by bridging native engine metrics back into the Spark framework.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-metrics-part4-gluten\/","summary":"Part 4 of the SQL Metrics series. How Apache Gluten bridges native Velox\/ClickHouse metrics back into the Spark SQL Metrics framework, adding 60+ metrics that vanilla Spark does not have.","title":"Deep Dive into Spark SQL Metrics (Part 4): How Gluten Extends the Metrics System"},{"content":"This is a deep-dive series on Spark SQL Metrics:\nPart 1 (this post): metric types, complete reference, and what they mean
Part 2: internals, and how AQE uses metrics to make runtime decisions Part 3: extension API, UI rendering, and REST API Part 4: how Gluten extends the metrics system What are SQL Metrics? Every physical operator in Spark SQL can define metrics, counters that are tracked while the query executes. When you click a query in the SQL tab and see &ldquo;number of output rows: 5,000&rdquo; or &ldquo;peak memory: 512.0 MiB&rdquo;, those are SQL Metrics.\nThey are built on Spark&rsquo;s AccumulatorV2 framework: each task updates its own local copy, and the Driver aggregates them after the task completes.\nThe five metric types 1. Sum (createMetric) The simplest type. Values from all tasks are summed into a single total.\nDisplay format: 1,234,567\nTypical uses: row counts, file counts, partition counts.\n2.
Size (createSizeMetric) For byte quantities. Shows the total plus the per-task distribution.\nDisplay format: total (min, med, max): 512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB)\nTypical uses: peak memory, spill size, data size, shuffle bytes.\nThe (min, med, max) distribution is essential for detecting data skew: if max is 10x the median, there are straggler tasks.\n3. Timing (createTimingMetric) For durations in milliseconds. Shows the total plus the per-task distribution.\nDisplay format: total (min, med, max): 5.0 s (100 ms, 1.2 s, 2.0 s)\nTypical uses: aggregation time, sort time, broadcast time.\n4. NsTiming (createNanoTimingMetric) Same as Timing but takes nanosecond values, converted to milliseconds automatically for display.\nTypical uses: shuffle write time.\n5.
Average (createAverageMetric) For per-task averages; shows how the average is distributed across tasks.\nDisplay format: avg (min, med, max): (1.2, 2.5, 6.3)\nTypical uses: hash probe efficiency.\nHow to read the &ldquo;total (min, med, max)&rdquo; format peak memory total (min, med, max) 512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB (stage 3.0: task 36)) Field Meaning total Sum across all tasks min Smallest per-task value med Median (50th percentile) max Largest per-task value, annotated with (stage X: task Y) When load is balanced: min \u2248 med \u2248 max\nWhen data is skewed: max &raquo; med; inspect the annotated task\nComplete SQL Metrics reference Scan operators Metric Display name Type Operator numOutputRows number of output rows sum All Scan operators numFiles number of files read sum DataSourceScanExec filesSize size of files read size DataSourceScanExec scanTime scan time timing DataSourceScanExec metadataTime metadata time timing DataSourceScanExec pruningTime dynamic partition pruning time timing DataSourceScanExec Aggregate operators Metric Display name Type Operator numOutputRows number of output rows sum All aggregate operators aggTime time in aggregation build timing HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec peakMemory peak memory size HashAggregateExec spillSize spill size size HashAggregateExec, ObjectHashAggregateExec avgHashProbe avg hash probes per key average HashAggregateExec Join operators Metric Display name Type Operator numOutputRows number of output rows sum All Join operators
buildDataSize data size of build side size ShuffledHashJoinExec buildTime time to build hash map timing ShuffledHashJoinExec spillSize spill size size SortMergeJoinExec Sort operators Metric Display name Type Operator sortTime sort time timing SortExec peakMemory peak memory size SortExec spillSize spill size size SortExec Shuffle write Metric Display name Type dataSize data size size shuffleBytesWritten shuffle bytes written size shuffleRecordsWritten shuffle records written sum shuffleWriteTime shuffle write time nsTiming Shuffle read (AQEShuffleReadExec) Metric Display name Type partitionDataSize partition data size size numCoalescedPartitions number of coalesced partitions sum numSkewedPartitions number of skewed partitions sum numSkewedSplits number of skewed partition splits sum fetchWaitTime fetch wait time timing remoteBytesRead remote bytes read size localBytesRead local bytes read size Broadcast Exchange Metric Display name Type dataSize data size size collectTime time to collect timing buildTime time to build timing broadcastTime time to broadcast timing Python UDF operators Metric Display name Type pythonDataSent data sent to Python workers size pythonDataReceived data returned from Python workers size pythonBootTime time to start Python workers timing pythonInitTime time to initialize Python workers timing pythonTotalTime time to run Python workers timing pythonProcessingTime time to execute Python code timing Write operators Metric Display name Type numFiles number of written files sum numOutputBytes written output size taskCommitTime task commit time timing jobCommitTime job commit time timing MERGE INTO operators Metric Display name Type numTargetRowsInserted target rows inserted sum numTargetRowsUpdated target rows updated
sum numTargetRowsDeleted target rows deleted sum numTargetRowsCopied target rows copied unmodified sum Stateful streaming operators Metric Display name Type numTotalStateRows number of total state rows sum stateMemory memory used by state size allUpdatesTimeMs time to update timing allRemovalsTimeMs time to remove timing commitTimeMs time to commit changes timing WholeStageCodegen and metric scope Most operators are fused by WholeStageCodegen into a single JVM method. Their row-count metrics (numOutputRows) are each accurate, but they have no individual timings, because they execute as one compiled function.\nOperators that have a distinct execution phase outside the codegen pipeline, and therefore their own timing:\nSortExec (sort time) Aggregate operators (aggregation build time) ShuffledHashJoinExec (hash table build time) BroadcastExchangeExec (collect\/build\/broadcast time) ShuffleExchangeExec (shuffle write time) Python UDF operators (Python worker time) Stateful streaming operators (update\/remove\/commit time) Part 2 dives into the internals of SQL Metrics (the AccumulatorV2 lifecycle) and how AQE uses shuffle statistics to rewrite the query plan at runtime. Part 3 covers the DataSource V2 CustomMetric extension API, UI rendering, and the REST
API.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/understanding-sql-metrics\/","summary":"Part 1 of the Spark SQL Metrics series. Covers the 5 metric types, a complete reference of 100+ metrics, and how to correctly read the metric numbers in the Spark UI.","title":"Deep Dive into Spark SQL Metrics (Part 1): Types, Complete Reference, and Meaning"},{"content":"From DAGs to declarative: a paradigm shift What is the workflow data engineers know best? Writing a DAG. Define task A, task B, and task C, then draw the dependencies between them. Airflow works this way, Dagster works this way, and so do most orchestration tools.\nBut here is the question: do you actually care about the data itself, or about executing a DAG?\nApache Spark 4.1 offers a new answer: Spark Declarative Pipelines (SDP). You only declare &quot;which tables I want and where their contents come from&quot;, and everything else (dependency inference, execution order, parallelization, error handling, incremental updates) is left to the framework.\nThis is not a small feature. It is a fundamental rethinking of how data pipelines are developed in the Spark ecosystem.\nThree-minute quick start Installation takes one line:\npip install pyspark[pipelines]
Write the simplest possible pipeline:\nfrom pyspark import pipelines as dp @dp.materialized_view def daily_sales(): return spark.table(&#34;orders&#34;).groupBy(&#34;date&#34;).agg({&#34;amount&#34;: &#34;sum&#34;}) Run it:\nspark-pipelines run No saveAsTable(). No start(). No awaitTermination(). You only described &quot;I want a sales table summarized by date&quot;, and SDP takes care of making it exist.\nCore concepts Flow: the smallest unit of data movement A Flow is the basic building block of SDP. Each Flow describes one complete data movement: where to read from, how to transform, and where to write.\nSDP has two Flow semantics:\nStreaming Flow \u2192 outputs to a Streaming Table (incremental processing) Batch Flow \u2192 outputs to a Materialized View or Temporary View Dataset: the thing you actually care about A Dataset is the output of a Flow and a queryable object in the pipeline. SDP provides three Dataset types:\nStreaming Table: a continuously, incrementally updated table, suited to ingesting data from messaging systems such as Kafka:\n@dp.table def raw_events(): return ( spark.readStream.format(&#34;kafka&#34;) .option(&#34;kafka.bootstrap.servers&#34;, &#34;localhost:9092&#34;) .option(&#34;subscribe&#34;, &#34;events&#34;) .load() ) Materialized View: a precomputed batch table, fully refreshed:\n@dp.materialized_view def hourly_metrics(): return ( spark.table(&#34;raw_events&#34;) .groupBy(window(&#34;timestamp&#34;, &#34;1 hour&#34;))
.agg(count(&#34;*&#34;).alias(&#34;event_count&#34;)) ) Temporary View: an intermediate result during pipeline execution, not persisted, but it makes the dependency graph clearer:\n@dp.temporary_view def cleaned_events(): return spark.table(&#34;raw_events&#34;).filter(&#34;event_type IS NOT NULL&#34;) Automatic dependency inference This is one of the most elegant parts of SDP&rsquo;s design. You do not need to explicitly declare &ldquo;hourly_metrics depends on raw_events&rdquo;: SDP analyzes your query logic, finds the spark.table(&quot;raw_events&quot;) call, and builds the dependency graph automatically.\nraw_events (Streaming Table) \u2193 cleaned_events (Temporary View) \u2193 hourly_metrics (Materialized View) Native SQL support SDP supports not only Python but also SQL natively. The same pipeline can be written like this:\nCREATE STREAMING TABLE raw_events AS SELECT * FROM STREAM kafka_source; CREATE TEMPORARY VIEW cleaned_events AS SELECT * FROM raw_events WHERE event_type IS NOT NULL; CREATE MATERIALIZED VIEW hourly_metrics AS SELECT window(timestamp, &#39;1 hour&#39;), count(*) AS event_count FROM cleaned_events GROUP BY 1; For SQL-first teams, this means next to no learning curve.\nMixing batch and streaming in one graph Traditionally, batch and streaming are two separate sets of pipelines. SDP lets you mix them in a single dependency graph:\n# streaming ingestion @dp.table def orders(): return spark.readStream.format(&#34;kafka&#34;)...
# batch aggregation (reads the streaming table above) @dp.materialized_view def daily_summary(): return spark.table(&#34;orders&#34;).groupBy(&#34;date&#34;).count() SDP manages triggers, scheduling, and checkpoints automatically, so you do not need to deal with the underlying Structured Streaming machinery.\nMultiple Flows writing to one target A common scenario: several data sources need to be written into the same table. SDP handles this with append_flow:\ndp.create_streaming_table(&#34;all_orders&#34;) @dp.append_flow(target=&#34;all_orders&#34;) def us_orders(): return spark.readStream.table(&#34;orders_us&#34;) @dp.append_flow(target=&#34;all_orders&#34;) def eu_orders(): return spark.readStream.table(&#34;orders_eu&#34;) Engineering: project structure and CLI SDP provides a complete project structure and command-line tooling:\n# initialize a project spark-pipelines init --name my_pipeline # validate the pipeline (no data is read or written) spark-pipelines dry-run # run the pipeline spark-pipelines run The project is configured through spark-pipeline.yml:\nname: my_pipeline libraries: - glob: include: transformations\/** catalog: my_catalog database: my_db configuration: spark.sql.shuffle.partitions: &#34;1000&#34; dry-run deserves a special mention: it catches syntax errors, analysis errors, and cyclic dependencies without reading or writing any data, which makes it a good fit for CI\/CD integration.\nRelationship with orchestration tools SDP is not meant to replace Airflow or Dagster. It focuses on the Spark
\u5c42\u9762\u7684\u6570\u636e\u8f6c\u6362\u548c\u4f9d\u8d56\u7ba1\u7406\u3002\u5728\u5b9e\u9645\u751f\u4ea7\u4e2d\uff0c\u4e00\u4e2a\u5178\u578b\u7684\u67b6\u6784\u662f\uff1a\nAirflow\/Dagster\uff08\u9876\u5c42\u7f16\u6392\uff09 \u251c\u2500\u2500 \u89e6\u53d1 SDP \u7ba1\u9053\uff08\u6570\u636e\u8f6c\u6362\uff09 \u251c\u2500\u2500 \u8c03\u7528\u5916\u90e8 API \u251c\u2500\u2500 \u53d1\u9001\u901a\u77e5 \u2514\u2500\u2500 \u5176\u4ed6\u975e Spark \u4efb\u52a1 SDP \u5904\u7406\u6570\u636e\u8f6c\u6362\u7684\u91cd\u6d3b\uff0c\u7f16\u6392\u5de5\u5177\u5904\u7406\u7aef\u5230\u7aef\u7684\u5de5\u4f5c\u6d41\u3002\n\u6211\u7684\u770b\u6cd5 \u4f5c\u4e3a Spark PMC \u6210\u5458\uff0c\u6211\u8ba4\u4e3a SDP \u89e3\u51b3\u4e86\u51e0\u4e2a\u957f\u671f\u5b58\u5728\u7684\u75db\u70b9\uff1a\n\u964d\u4f4e\u5165\u95e8\u95e8\u69db\u3002\u65b0\u624b\u4e0d\u9700\u8981\u7406\u89e3 Structured Streaming \u7684 checkpoint\u3001trigger\u3001outputMode \u7b49\u6982\u5ff5\u5c31\u80fd\u5199\u51fa\u53ef\u9760\u7684\u6d41\u5f0f\u7ba1\u9053\u3002\n\u51cf\u5c11\u6837\u677f\u4ee3\u7801\u3002\u4e0d\u518d\u9700\u8981 writeStream.format().option().start().awaitTermination() \u8fd9\u6837\u7684\u4eea\u5f0f\u6027\u4ee3\u7801\u3002\n\u7edf\u4e00\u6279\u6d41\u3002\u540c\u4e00\u5957\u58f0\u660e\u5f0f API\uff0c\u540c\u4e00\u4e2a\u4f9d\u8d56\u56fe\uff0cbatch \u548c streaming \u4e0d\u518d\u662f\u4e24\u4e2a\u4e16\u754c\u3002\nAI \u53cb\u597d\u3002\u58f0\u660e\u5f0f\u7684 Flow \u672c\u8d28\u4e0a\u662f\u51fd\u6570\uff0c\u53ef\u4ee5\u88ab\u6d4b\u8bd5\u3001\u88ab\u8c03\u7528\u3001\u88ab AI \u7f16\u7a0b\u52a9\u624b\u7406\u89e3\u548c\u751f\u6210\u3002\u8fd9\u5bf9 AI \u8f85\u52a9\u6570\u636e\u5de5\u7a0b\u610f\u4e49\u91cd\u5927\u3002\nSDP \u7684\u8bbe\u8ba1\u601d\u8def\u6e90\u81ea Databricks \u5728\u751f\u4ea7\u73af\u5883\u4e2d\u9a8c\u8bc1\u8fc7\u7684 Delta Live Tables\uff08DLT\uff09\u6a21\u5f0f\uff0c\u73b0\u5728\u88ab\u5e26\u5165\u4e86\u5f00\u6e90 
Spark\u3002\u8fd9\u610f\u5473\u7740\u6574\u4e2a\u793e\u533a\u90fd\u80fd\u53d7\u76ca\u4e8e\u8fd9\u4e9b\u7ecf\u8fc7\u5927\u89c4\u6a21\u9a8c\u8bc1\u7684\u6700\u4f73\u5b9e\u8df5\u3002\n\u4e0a\u624b\u8bd5\u8bd5 pip install pyspark[pipelines] spark-pipelines init --name hello_sdp cd hello_sdp spark-pipelines run \u66f4\u591a\u5185\u5bb9\u8bf7\u53c2\u9605\u5b98\u65b9\u7f16\u7a0b\u6307\u5357\uff1aSpark Declarative Pipelines Programming Guide\nSpark Declarative Pipelines \u5728 Apache Spark 4.1 \u4e2d\u5f15\u5165\uff0c\u76f8\u5173\u8bbe\u8ba1\u6587\u6863\u89c1 SPARK-51727\u3002\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/spark-declarative-pipelines\/","summary":"Apache Spark 4.1 \u5f15\u5165\u4e86 Spark Declarative Pipelines\uff08SDP\uff09\uff0c\u4e00\u4e2a\u5168\u65b0\u7684\u58f0\u660e\u5f0f\u6570\u636e\u7ba1\u9053\u6846\u67b6\u3002\u4f5c\u4e3a Spark PMC \u6210\u5458\uff0c\u6211\u6765\u89e3\u8bfb\u8fd9\u4e2a\u6846\u67b6\u7684\u8bbe\u8ba1\u54f2\u5b66\u3001\u6838\u5fc3\u6982\u5ff5\uff0c\u4ee5\u53ca\u5b83\u5982\u4f55\u6539\u53d8\u6570\u636e\u5de5\u7a0b\u7684\u5f00\u53d1\u65b9\u5f0f\u3002","title":"Spark Declarative Pipelines\uff1a\u6570\u636e\u7ba1\u9053\u7684\u58f0\u660e\u5f0f\u9769\u547d"},{"content":"\u6bcf\u4e2a Spark \u5de5\u7a0b\u5e08\u90fd\u7ecf\u5386\u8fc7\u8fd9\u79cd\u573a\u666f\uff1a\u6628\u5929\u8fd8\u597d\u597d\u8fd0\u884c\u7684\u4f5c\u4e1a\uff0c\u4eca\u5929\u7a81\u7136\u6162\u4e86 3 \u500d\u3002\u6709\u4eba\u95ee&quot;\u8fd9\u4e2a\u67e5\u8be2\u4e3a\u4ec0\u4e48\u6162\uff1f&quot;\uff0c\u7136\u540e\u4f60\u82b1\u4e86\u4e00\u4e2a\u5c0f\u65f6\u5728 Spark History Server UI \u4e0a\u70b9\u6765\u70b9\u53bb\uff0c\u7528\u8089\u773c\u5bf9\u6bd4\u5404\u4e2a\u6807\u7b7e\u9875\u7684\u6570\u5b57\uff0c\u5728\u8111\u4e2d diff \u914d\u7f6e\u3002\n\u5982\u679c\u4f60\u7684 AI \u52a9\u624b\u80fd\u5e2e\u4f60\u505a\u8fd9\u4e9b\u5462\uff1f\n\u4ec0\u4e48\u662f spark-advisor\uff1f spark-advisor \u662f\u4e00\u4e2a Agent Skill\uff0c\u5c06\u4f60\u7684 AI 
\u7f16\u7a0b\u52a9\u624b\u53d8\u6210 Spark \u6027\u80fd\u5de5\u7a0b\u5e08\u3002\u5b83\u652f\u6301 GitHub Copilot\u3001Claude Code\u3001Cursor \u7b49 30 \u591a\u79cd Agent\u3002\n\u5f53\u4f60\u8bf4 &ldquo;\u4e3a\u4ec0\u4e48\u6211\u7684 Spark \u5e94\u7528\u5f88\u6162\uff1f&rdquo; \u6216 &ldquo;\u5bf9\u6bd4\u8fd9\u4e24\u6b21 TPC-DS \u8dd1\u6d4b&rdquo;\uff0cAgent \u4f1a\uff1a\n\u901a\u8fc7 spark-history-cli \u8fde\u63a5\u4f60\u7684 Spark History Server \u6536\u96c6\u7ed3\u6784\u5316 JSON \u6570\u636e\uff08\u6982\u89c8\u3001Stage\u3001Executor\u3001SQL \u8ba1\u5212\uff09 \u5e94\u7528\u8bca\u65ad\u89c4\u5219\u627e\u5230\u74f6\u9888 \u751f\u6210\u4f18\u5148\u7ea7\u6392\u5e8f\u7684\u62a5\u544a\u548c\u53ef\u64cd\u4f5c\u7684\u5efa\u8bae \u65e0\u9700\u624b\u52a8\u70b9\u51fb\uff0c\u65e0\u9700\u5207\u6362\u4e0a\u4e0b\u6587\u3002\u53ea\u9700\u5f00\u53e3\u95ee\u3002\n\u5feb\u901f\u5f00\u59cb \u5b89\u88c5 CLI \u548c Skill\uff1a\npip install spark-history-cli npx skills add yaooqinn\/spark-history-cli \u7136\u540e\u544a\u8bc9\u4f60\u7684 Agent\uff1a\n&ldquo;\u8bca\u65ad\u4e00\u4e0b History Server \u4e0a\u6700\u65b0\u7684 Spark \u5e94\u7528&rdquo;\n\u5c31\u8fd9\u4e48\u7b80\u5355\u3002Agent \u4f1a\u5217\u51fa\u5e94\u7528\u3001\u9009\u62e9\u6700\u65b0\u7684\u3001\u6536\u96c6\u6307\u6807\u3001\u5206\u6790\u5b83\u4eec\uff0c\u7136\u540e\u544a\u8bc9\u4f60\u95ee\u9898\u5728\u54ea\u3002\n\u5b83\u80fd\u8bca\u65ad\u4ec0\u4e48 \u8fd9\u4e2a Skill \u5c06\u7ecf\u9a8c\u4e30\u5bcc\u7684 Spark \u5de5\u7a0b\u5e08\u7684\u8bca\u65ad\u76f4\u89c9\u7f16\u7801\u4e3a\u7ed3\u6784\u5316\u89c4\u5219\u3002\u4ee5\u4e0b\u662f\u5b83\u68c0\u67e5\u7684\u5185\u5bb9\uff1a\n\u4efb\u52a1\u503e\u659c Spark \u6027\u80fd\u7684\u5934\u53f7\u6740\u624b\u3002spark-advisor \u83b7\u53d6\u4efb\u52a1\u6307\u6807\u5206\u4f4d\u6570\u5e76\u6bd4\u8f83 p50 \u548c p95\uff1a\np95\/p50 &gt; 3x \u2192 \u4e2d\u5ea6\u503e\u659c p95\/p50 &gt; 10x \u2192 \u4e25\u91cd\u503e\u659c \u7136\u540e\u6839\u636e\u6839\u56e0\u63a8\u8350\uff1aAQE 
\u503e\u659c Join\u3001\u5206\u533a\u6570\u8c03\u4f18\u6216 Key \u52a0\u76d0\u3002\nGC \u538b\u529b \u5f53 GC \u65f6\u95f4\u8d85\u8fc7 Executor \u603b\u8fd0\u884c\u65f6\u95f4\u7684 10% \u65f6\u6807\u8bb0\u8b66\u544a\uff0c20% \u4ee5\u4e0a\u6807\u8bb0\u4e25\u91cd\u3002\u5efa\u8bae\u4ece\u589e\u52a0 Executor \u5185\u5b58\u5230\u51cf\u5c11\u5355 Executor \u5e76\u53d1\u5ea6\u4e0d\u7b49\u3002\nShuffle \u5f00\u9500 \u5f53 Shuffle \u5b57\u8282\u6570\u8d85\u8fc7\u8f93\u5165\u5927\u5c0f\u7684 2 \u500d\u65f6\u68c0\u6d4b\u5230 Shuffle \u5bc6\u96c6\u578b Stage\u3002Skill \u4f1a\u68c0\u67e5\u8ba1\u5212\u4e2d\u7684\u5197\u4f59 Exchange\u3001\u9519\u8bef\u7684 Join \u7b56\u7565\uff08\u672c\u5e94\u7528 BroadcastHashJoin \u5374\u7528\u4e86 SortMergeJoin\uff09\uff0c\u4ee5\u53ca\u4e0d\u8db3\u7684\u5206\u533a\u6570\u3002\n\u5185\u5b58 Spill \u4efb\u4f55\u975e\u96f6\u7684 memoryBytesSpilled \u6216 diskBytesSpilled \u90fd\u4f1a\u89e6\u53d1\u544a\u8b66\u3002Spill \u610f\u5473\u7740 Executor \u7684\u805a\u5408\u6216\u6392\u5e8f\u7f13\u51b2\u533a\u5185\u5b58\u4e0d\u8db3\u2014\u2014\u8fd9\u4e2a\u95ee\u9898\u5728 UI \u4e2d\u51e0\u4e4e\u4e0d\u53ef\u89c1\uff0c\u9664\u975e\u4f60\u77e5\u9053\u5728\u54ea\u91cc\u770b\u3002\n\u6389\u961f\u4efb\u52a1 Stage \u4e2d\u8017\u65f6\u8d85\u8fc7\u4e2d\u4f4d\u6570 5 \u500d\u4ee5\u4e0a\u7684\u4efb\u52a1\u3002spark-advisor \u4f1a\u68c0\u67e5\u6389\u961f\u8005\u662f\u5426\u96c6\u4e2d\u5728\u7279\u5b9a Executor\uff08\u786c\u4ef6\u95ee\u9898\uff09\u8fd8\u662f\u5904\u7406\u4e86\u66f4\u591a\u6570\u636e\uff08\u503e\u659c\u53d8\u4f53\uff09\u3002\nGluten\/Velox \u611f\u77e5 \u5bf9\u4e8e Gluten \u52a0\u901f\u7684 Spark \u5e94\u7528\uff0cSkill \u80fd\u68c0\u6d4b\uff1a\n\u56de\u9000\u7b97\u5b50\uff1a\u6700\u7ec8\u8ba1\u5212\u4e2d\u7684\u975e Transformer \u8282\u70b9\uff08\u5982 SortMergeJoin \u800c\u975e ShuffledHashJoinExecTransformer\uff09 \u5217\u5f0f\u8f6c\u884c\u5f0f\u8fb9\u754c\uff1aVeloxColumnarToRow \u8f6c\u6362\u8868\u793a\u56de\u9000 
\u539f\u751f\u6307\u6807\u6a21\u5f0f\uff1aGluten Stage \u4e0e\u539f\u751f Spark Stage \u4e0d\u540c\u7684 GC \u548c\u5185\u5b58\u7279\u5f81 TPC-DS \u57fa\u51c6\u5bf9\u6bd4 spark-advisor \u6700\u5f3a\u5927\u7684\u529f\u80fd\u4e4b\u4e00\u662f\u7ed3\u6784\u5316\u57fa\u51c6\u5bf9\u6bd4\u3002\u6bd4\u5982\u8bf4\uff1a\n&ldquo;\u5bf9\u6bd4\u8fd9\u4e24\u6b21 TPC-DS \u8dd1\u6d4b\uff1aapp-20260315120000-0001 \u548c app-20260320120000-0001&rdquo;\nAgent \u4f1a\uff1a\n\u8de8\u4e24\u6b21\u8fd0\u884c\u5339\u914d\u67e5\u8be2\uff08q1\u2013q99\uff0c\u5904\u7406 q14a\/b\u3001q23a\/b \u7b49\u62c6\u5206\u67e5\u8be2\uff09 \u8ba1\u7b97\u6bcf\u4e2a\u67e5\u8be2\u7684\u52a0\u901f\u6bd4\u548c\u56de\u9000\u6bd4 \u751f\u6210\u5bf9\u6bd4\u8868\uff1a | \u67e5\u8be2 | \u57fa\u7ebf | \u5019\u9009 | \u5dee\u503c | \u52a0\u901f\u6bd4 | \u72b6\u6001 | |------|--------|--------|-------|--------|-------------| | q67 | 72s | 85s | +13s | 0.85x | \u26a0 \u56de\u9000 | | q1 | 61s | 45s | -16s | 1.36x | \u2713 \u63d0\u5347 | | q50 | 34s | 33s | -1s | 1.03x | \u2248 \u6301\u5e73 | \u6df1\u5165\u5206\u6790 Top-3 \u56de\u9000\u67e5\u8be2\u2014\u2014\u5bf9\u6bd4\u6700\u7ec8\u8ba1\u5212\u3001Stage \u6307\u6807\u548c\u914d\u7f6e\u5dee\u5f02 \u62a5\u544a\u603b\u4f53\u52a0\u901f\u6bd4\uff08\u6240\u6709\u67e5\u8be2\u7684\u51e0\u4f55\u5e73\u5747\u503c\uff09 \u8fd9\u66ff\u4ee3\u4e86\u6bcf\u6b21\u8dd1\u5b8c\u57fa\u51c6\u540e\u6570\u5c0f\u65f6\u7684\u624b\u5de5\u8868\u683c\u5de5\u4f5c\u3002\n\u8bca\u65ad\u62a5\u544a spark-advisor \u751f\u6210\u7ed3\u6784\u5316\u7684 Markdown \u62a5\u544a\uff1a\n# Spark \u6027\u80fd\u62a5\u544a ## \u6458\u8981 \u5e94\u7528 65% \u7684\u65f6\u95f4\u82b1\u5728 3 \u4e2a Shuffle \u5bc6\u96c6\u578b Stage \u4e0a\u3002 GC \u538b\u529b\u4e2d\u7b49\uff08\u5360 Executor \u65f6\u95f4\u7684 12%\uff09\u3002 Stage 14 \u68c0\u6d4b\u5230\u4efb\u52a1\u503e\u659c\uff08p95\/p50 = 8.2x\uff09\u3002 ## \u53d1\u73b0 ### \u53d1\u73b0 1\uff1aStage 14 \u4e25\u91cd\u4efb\u52a1\u503e\u659c - 
**\u4e25\u91cd\u7a0b\u5ea6**\uff1a\u9ad8 - **\u8bc1\u636e**\uff1ap95 \u8017\u65f6 45s vs p50 5.5s\uff088.2x \u6bd4\u7387\uff09 - **\u5efa\u8bae**\uff1a\u542f\u7528 AQE \u503e\u659c Join \u4f18\u5316 ### \u53d1\u73b0 2\uff1aExecutor 3\u30017 GC \u538b\u529b - **\u4e25\u91cd\u7a0b\u5ea6**\uff1a\u4e2d - **\u8bc1\u636e**\uff1a18% GC \u65f6\u95f4\uff08\u9608\u503c\uff1a10%\uff09 - **\u5efa\u8bae**\uff1a\u5c06 spark.executor.memory \u4ece 4g \u589e\u52a0\u5230 8g ## \u5efa\u8bae 1. \u542f\u7528 spark.sql.adaptive.skewJoin.enabled=true 2. \u589e\u52a0 Executor \u5185\u5b58\u5230 8g 3. \u5ba1\u67e5 Stage 14 \u7684 Join \u7b56\u7565\u2014\u2014\u8003\u8651 Broadcast Join \u5e95\u5c42\u539f\u7406 spark-advisor \u662f\u4e00\u4e2a\u7eaf SKILL.md\u2014\u2014\u6ca1\u6709\u4ee3\u7801\uff0c\u53ea\u6709\u7ed3\u6784\u5316\u7684\u6307\u4ee4\uff0c\u6559\u4f1a Agent \u5982\u4f55\u6210\u4e3a Spark \u6027\u80fd\u5de5\u7a0b\u5e08\u3002\u5b83\u4f7f\u7528\uff1a\nspark-history-cli \u6536\u96c6\u6240\u6709\u6570\u636e\uff08--json \u6a21\u5f0f\u83b7\u53d6\u7ed3\u6784\u5316\u8f93\u51fa\uff09 references\/diagnostics.md \u4e2d\u7684\u8bca\u65ad\u89c4\u5219\uff0c\u5305\u542b\u5177\u4f53\u9608\u503c\u548c\u542f\u53d1\u5f0f\u65b9\u6cd5 references\/comparison.md \u4e2d\u7684\u5bf9\u6bd4\u65b9\u6cd5\u8bba\uff0c\u7528\u4e8e TPC-DS \u57fa\u51c6\u6d4b\u8bd5 sample_codes\/ \u4e2d\u7684\u793a\u4f8b\u811a\u672c\uff0c\u7528\u4e8e\u5e38\u89c1\u6a21\u5f0f Agent \u8bfb\u53d6\u8fd9\u4e9b\u53c2\u8003\u6587\u4ef6\uff0c\u5c06\u89c4\u5219\u5e94\u7528\u5230\u4f60\u7684\u6570\u636e\u4e0a\uff0c\u5e76\u5bf9\u7ed3\u679c\u8fdb\u884c\u63a8\u7406\u3002\u65e0\u9700\u6a21\u578b\u5fae\u8c03\uff0c\u65e0\u9700\u8bad\u7ec3\u6570\u636e\u2014\u2014\u53ea\u9700\u8981\u7ed3\u6784\u826f\u597d\u7684\u9886\u57df\u77e5\u8bc6\u3002\n\u5b89\u88c5 # \u5b89\u88c5 CLI pip install spark-history-cli # \u4e3a\u4f60\u7684 Agent \u5b89\u88c5 Skill npx skills add yaooqinn\/spark-history-cli \u8fd9\u4f1a\u5b89\u88c5\u4e24\u4e2a 
Skill\uff1a\nspark-history-cli \u2014 \u67e5\u8be2 Spark History Server spark-advisor \u2014 \u8bca\u65ad\u3001\u5bf9\u6bd4\u548c\u4f18\u5316 \u672a\u6765\u8ba1\u5212 \u81ea\u52a8\u4fee\u590d\uff1a\u5efa\u8bae\u53ef\u76f4\u63a5\u5e94\u7528\u7684\u914d\u7f6e\u8865\u4e01 \u5386\u53f2\u8d8b\u52bf\uff1a\u8de8\u7248\u672c\u8ffd\u8e2a\u6027\u80fd\u53d8\u5316 \u53ef\u89c6\u5316\uff1a\u751f\u6210 SVG \u706b\u7130\u56fe\u548c DAG \u6807\u6ce8 spark-advisor \u5728 github.com\/yaooqinn\/spark-history-cli \u5f00\u6e90\u3002\u5982\u679c\u89c9\u5f97\u6709\u7528\uff0c\u6b22\u8fce Star\u3002\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/spark-advisor\/","summary":"spark-advisor \u662f\u4e00\u4e2a Agent Skill\uff0c\u5c06\u4f60\u7684 AI \u7f16\u7a0b\u52a9\u624b\u53d8\u6210 Spark \u6027\u80fd\u5de5\u7a0b\u5e08\u2014\u2014\u8bca\u65ad\u6162\u4f5c\u4e1a\u3001\u68c0\u6d4b\u6570\u636e\u503e\u659c\u3001\u5bf9\u6bd4\u57fa\u51c6\u6d4b\u8bd5\u3001\u751f\u6210\u53ef\u64cd\u4f5c\u7684\u8c03\u4f18\u5efa\u8bae\u3002","title":"spark-advisor\uff1aAI \u9a71\u52a8\u7684 Spark \u6027\u80fd\u5de5\u7a0b\u5e08"},{"content":"Spark History Server \u6709\u4e0d\u9519\u7684 Web UI \u548c\u5b8c\u5584\u7684 REST API\u3002\u4f46\u5982\u679c\u4f60\u5df2\u7ecf\u5728\u7ec8\u7aef\u91cc\u2014\u2014SSH \u5230\u7f51\u5173\u8282\u70b9\u3001\u5728 CI \u91cc\u8c03\u8bd5\u7ba1\u9053\u3001\u6216\u8005\u7f16\u5199\u4e8b\u540e\u5206\u6790\u811a\u672c\u2014\u2014\u5207\u5230\u6d4f\u89c8\u5668\u603b\u611f\u89c9\u662f\u4e00\u6b21\u4e0d\u5fc5\u8981\u7684\u4e0a\u4e0b\u6587\u5207\u6362\u3002\nspark-history-cli \u628a\u6574\u4e2a Spark History Server \u653e\u5230\u4f60\u6307\u5c16\u3002\u5b83\u662f\u4e00\u4e2a Python CLI \u5de5\u5177\uff0c\u5c06\u5168\u90e8 20 \u4e2a REST API \u7aef\u70b9\u5c01\u88c5\u4e3a\u4ea4\u4e92\u5f0f REPL \u548c\u4e00\u6b21\u6027\u547d\u4ee4\u3002\u5217\u51fa\u5e94\u7528\u3001\u6df1\u5165\u4f5c\u4e1a\u548c Stage\u3001\u68c0\u67e5 SQL \u6267\u884c\u3001\u67e5\u770b Executor 
\u72b6\u6001\u3001\u4e0b\u8f7d\u4e8b\u4ef6\u65e5\u5fd7\u2014\u2014\u5168\u5728\u7ec8\u7aef\u91cc\u5b8c\u6210\u3002\n\u5b89\u88c5 pip install spark-history-cli \u5c31\u8fd9\u6837\u3002\u9700\u8981 Python 3.10+ \u548c\u4e00\u4e2a\u8fd0\u884c\u4e2d\u7684 Spark History Server\u3002\n\u4e24\u79cd\u4f7f\u7528\u65b9\u5f0f \u4ea4\u4e92\u5f0f REPL \u76f4\u63a5\u8fd0\u884c spark-history-cli \u8fdb\u5165 REPL\uff1a\n$ spark-history-cli --server http:\/\/my-shs:18080 spark-history&gt; apps --status completed --limit 5 ID Name Status Start Time Duration app-20260318091500-0003 ETL Pipeline COMPLETED 2026-03-18 09:15:00 4m 32s app-20260318080000-0002 Daily Report COMPLETED 2026-03-18 08:00:00 12m 15s ... spark-history&gt; use app-20260318091500-0003 Current app: app-20260318091500-0003 (ETL Pipeline) spark-history&gt; jobs Job ID Status Stages Duration Description 0 SUCCEEDED 3\/3 1m 02s save at ETLPipeline.scala:45 1 SUCCEEDED 2\/2 2m 18s save at ETLPipeline.scala:78 2 SUCCEEDED 1\/1 1m 12s save at ETLPipeline.scala:112 spark-history&gt; stages spark-history&gt; sql spark-history&gt; executors spark-history&gt; env use \u547d\u4ee4\u8bbe\u7f6e&quot;\u5f53\u524d\u5e94\u7528&quot;\u4e0a\u4e0b\u6587\uff0c\u8fd9\u6837\u4f60\u4e0d\u7528\u5728\u6bcf\u4e2a\u547d\u4ee4\u91cc\u91cd\u590d app ID\u3002\u5c31\u50cf SQL \u91cc\u7684 USE database \u4e00\u6837\u3002\n\u4e00\u6b21\u6027\u547d\u4ee4 \u9002\u7528\u4e8e\u811a\u672c\u3001CI \u7ba1\u9053\u6216\u5feb\u901f\u67e5\u8be2\uff1a\n# \u5217\u51fa\u5df2\u5b8c\u6210\u7684\u5e94\u7528 spark-history-cli apps --status completed --limit 10 # \u67e5\u770b\u6307\u5b9a\u5e94\u7528\u7684\u4f5c\u4e1a spark-history-cli --app-id app-20260318091500-0003 jobs # \u4e0b\u8f7d\u4e8b\u4ef6\u65e5\u5fd7\u7528\u4e8e\u79bb\u7ebf\u5206\u6790 spark-history-cli --app-id app-20260318091500-0003 logs .\/events.zip # JSON \u8f93\u51fa\uff0c\u65b9\u4fbf\u7ba1\u9053\u5904\u7406 spark-history-cli --json --app-id app-20260318091500-0003 stages --json 
\u6807\u5fd7\u8f93\u51fa\u539f\u59cb JSON\u2014\u2014\u53ef\u4ee5\u76f4\u63a5\u7ba1\u9053\u5230 jq\uff0c\u5582\u7ed9\u76d1\u63a7\u811a\u672c\uff0c\u6216\u4e0e\u5176\u4ed6\u5de5\u5177\u96c6\u6210\u3002\n\u529f\u80fd\u6982\u89c8 CLI \u8986\u76d6 Spark History Server REST API \u7684 \u5168\u90e8 20 \u4e2a\u7aef\u70b9\uff1a\n\u547d\u4ee4 \u529f\u80fd apps \u5217\u51fa\u6240\u6709\u5e94\u7528\uff0c\u542b\u72b6\u6001\u3001\u65f6\u95f4\u3001\u8017\u65f6 app &lt;id&gt; \u67e5\u770b\u5e94\u7528\u8be6\u60c5\u5e76\u8bbe\u4e3a\u5f53\u524d\u5e94\u7528 jobs \u5217\u51fa\u4f5c\u4e1a\uff0c\u542b\u72b6\u6001\u3001Stage \u6570\u548c\u8017\u65f6 job &lt;id&gt; \u67e5\u770b\u4f5c\u4e1a\u8be6\u60c5 stages \u5217\u51fa\u6240\u6709 Stage stage &lt;id&gt; \u67e5\u770b Stage \u8be6\u60c5\u548c\u4efb\u52a1\u6c47\u603b executors \u5217\u51fa\u6d3b\u8dc3 Executor executors --all \u5305\u542b\u5df2\u5931\u6548\u7684 Executor sql \u5217\u51fa SQL \u6267\u884c\u8bb0\u5f55 sql &lt;id&gt; \u67e5\u770b SQL \u6267\u884c\u8be6\u60c5\u548c\u8ba1\u5212\u56fe rdds \u5217\u51fa\u7f13\u5b58\u7684 RDD env \u67e5\u770b Spark \u914d\u7f6e\u548c\u73af\u5883\u53d8\u91cf logs [path] \u4e0b\u8f7d\u4e8b\u4ef6\u65e5\u5fd7\uff08ZIP \u683c\u5f0f\uff09 version \u67e5\u770b History Server \u7684 Spark \u7248\u672c \u4e3a\u4ec0\u4e48\u4e0d\u76f4\u63a5\u7528 Web UI\uff1f Web UI \u5728\u6709\u6d4f\u89c8\u5668\u7684\u65f6\u5019\u5f88\u597d\u7528\u3002\u4f46\u6709\u4e9b\u573a\u666f\u4e0b CLI \u66f4\u5408\u9002\uff1a\nSSH \u8c03\u8bd5\u3002 \u4f60\u5728\u8df3\u677f\u673a\u6216\u7f51\u5173\u8282\u70b9\u4e0a\u6392\u67e5\u751f\u4ea7\u96c6\u7fa4\u95ee\u9898\uff0c\u6ca1\u6709\u6d4f\u89c8\u5668\uff0c\u4e0d\u60f3\u505a\u7aef\u53e3\u8f6c\u53d1\u2014\u2014\u53ea\u6709\u7ec8\u7aef\u3002spark-history-cli --server http:\/\/shs:18080 apps \u7acb\u5373\u5f00\u59cb\u5de5\u4f5c\u3002\n\u811a\u672c\u548c\u81ea\u52a8\u5316\u3002 \u60f3\u68c0\u67e5\u6628\u5929\u7684 ETL \u4f5c\u4e1a\u662f\u5426\u5168\u90e8\u6210\u529f\uff1f\u5199\u4e00\u4e2a 
cron \u4efb\u52a1\u8fd0\u884c spark-history-cli --json apps --status failed\uff0c\u8f93\u51fa\u975e\u7a7a\u5c31\u544a\u8b66\u3002--json \u6807\u5fd7\u8ba9\u8fd9\u4e00\u5207\u53d8\u5f97\u7b80\u5355\u3002\n\u4e8b\u540e\u5206\u6790\u5de5\u4f5c\u6d41\u3002 \u7528 logs \u4e0b\u8f7d\u4e8b\u4ef6\u65e5\u5fd7\uff0c\u7528 jobs \u5bf9\u6bd4\u4f5c\u4e1a\u8017\u65f6\uff0c\u7528 executors --all \u68c0\u67e5 Executor \u5185\u5b58\u2014\u2014\u5168\u5728\u4e00\u4e2a\u7ec8\u7aef\u4f1a\u8bdd\u91cc\u5b8c\u6210\uff0c\u4e0d\u7528\u5728\u591a\u4e2a\u6d4f\u89c8\u5668\u6807\u7b7e\u9875\u95f4\u6765\u56de\u5207\u6362\u3002\nCI\/CD \u96c6\u6210\u3002 \u5728\u7ba1\u9053\u4e2d\u63d0\u4ea4 Spark \u5e94\u7528\u540e\uff0c\u67e5\u8be2 History Server \u9a8c\u8bc1\u4f5c\u4e1a\u662f\u5426\u6210\u529f\u5b8c\u6210\u3001\u68c0\u67e5 Stage \u6307\u6807\u3001\u6216\u5c06\u4e8b\u4ef6\u65e5\u5fd7\u5f52\u6863\u4e3a\u6784\u5efa\u4ea7\u7269\u3002\n\u4e3a\u4ec0\u4e48\u662f CLI\uff0c\u800c\u4e0d\u53ea\u662f Web UI\uff1fAgent \u89c6\u89d2 spark-history-cli \u6700\u91cd\u8981\u7684\u610f\u4e49\u4e0d\u662f\u4eba\u7c7b\u4f7f\u7528\u7684\u4fbf\u5229\u6027\u2014\u2014\u800c\u662f AI Agent \u6839\u672c\u65e0\u6cd5\u4f7f\u7528 Web UI\u3002\n\u6211\u4eec\u6b63\u8fdb\u5165\u4e00\u4e2a LLM \u9a71\u52a8\u7684 Agent\u2014\u2014GitHub Copilot\u3001\u7f16\u7a0b\u52a9\u624b\u3001\u503c\u73ed\u673a\u5668\u4eba\u3001\u81ea\u52a8\u6839\u56e0\u5206\u6790\u5668\u2014\u2014\u6210\u4e3a\u5de5\u7a0b\u5de5\u4f5c\u6d41\u4e00\u7b49\u516c\u6c11\u7684\u65f6\u4ee3\u3002\u8fd9\u4e9b Agent \u901a\u8fc7\u6587\u672c\u63a5\u53e3\u4e0e\u4e16\u754c\u4ea4\u4e92\uff1aShell \u547d\u4ee4\u3001API \u548c\u7ed3\u6784\u5316\u8f93\u51fa\u3002Web UI \u5bf9\u5b83\u4eec\u6765\u8bf4\u662f\u6b7b\u8def\u4e00\u6761\u3002\u65e0\u8bba Spark History Server \u7684\u7f51\u9875\u591a\u4e48\u7cbe\u7f8e\uff0cAgent \u65e0\u6cd5\u70b9\u51fb\u94fe\u63a5\u3001\u6eda\u52a8\u8868\u683c\u6216\u9605\u8bfb DAG \u53ef\u89c6\u5316\u56fe\u3002\nCLI 
\u6539\u53d8\u4e86\u4e00\u5207\uff1a\nAgent \u53ef\u4ee5\u5c06\u5b83\u4f5c\u4e3a\u5de5\u5177\u8c03\u7528\u3002 \u5f53 Agent \u9700\u8981\u56de\u7b54&quot;\u6628\u665a\u7684 ETL \u4e3a\u4ec0\u4e48\u5931\u8d25\u4e86\uff1f&quot;\u65f6\uff0c\u5b83\u53ef\u4ee5\u8fd0\u884c spark-history-cli --json apps --status failed\uff0c\u89e3\u6790 JSON\uff0c\u9009\u51fa\u76f8\u5173\u5e94\u7528\uff0c\u518d\u8fd0\u884c spark-history-cli --json --app-id &lt;id&gt; jobs \u627e\u5230\u5931\u8d25\u7684\u4f5c\u4e1a\uff0c\u7136\u540e\u7528 stages \u5b9a\u4f4d\u5931\u8d25\u7684 Stage\u2014\u2014\u5168\u90e8\u81ea\u4e3b\u5b8c\u6210\uff0c\u5728\u601d\u7ef4\u94fe\u5faa\u73af\u4e2d\u3002Web UI \u6ca1\u6709\u63d0\u4f9b\u4efb\u4f55\u7b49\u6548\u7684\u7a0b\u5e8f\u5316\u63a8\u7406\u5165\u53e3\u3002\n\u7ed3\u6784\u5316\u8f93\u51fa\u652f\u6491\u63a8\u7406\u3002 --json \u6807\u5fd7\u4e0d\u53ea\u662f\u7ed9 jq \u7528\u7684\u2014\u2014\u5b83\u8ba9\u5de5\u5177\u5bf9 LLM \u53ef\u8bfb\u3002Agent \u53ef\u4ee5\u8bfb\u5165\u4e00\u4e2a JSON \u4f5c\u4e1a\u6570\u7ec4\uff0c\u6bd4\u8f83\u8017\u65f6\uff0c\u53d1\u73b0\u5f02\u5e38\uff0c\u7136\u540e\u7efc\u5408\u51fa\u4e00\u4efd\u4eba\u7c7b\u53ef\u8bfb\u7684\u8bca\u65ad\u62a5\u544a\u3002\u8bd5\u8bd5\u7528\u6d4f\u89c8\u5668\u91cc\u6e32\u67d3\u7684 HTML \u8868\u683c\u505a\u540c\u6837\u7684\u4e8b\u3002\nREPL \u6620\u5c04\u4e86 Agent \u7684\u601d\u8003\u65b9\u5f0f\u3002 Agent \u63a2\u7d22 Spark \u5e94\u7528\u65f6\u9075\u5faa\u4e0e\u4eba\u7c7b\u76f8\u540c\u7684\u4e0b\u94bb\u6a21\u5f0f\uff1a\u5217\u51fa\u5e94\u7528 \u2192 \u9009\u4e00\u4e2a \u2192 \u67e5\u770b\u4f5c\u4e1a \u2192 \u6df1\u5165\u6162\u7684 Stage \u2192 \u67e5\u770b\u4efb\u52a1\u6307\u6807\u3002REPL \u7684 use \u547d\u4ee4\u548c\u5c42\u7ea7\u5bfc\u822a\u81ea\u7136\u5730\u6620\u5c04\u4e86\u8fd9\u79cd\u63a8\u7406\u6a21\u5f0f\u3002\u6bcf\u4e2a\u547d\u4ee4\u90fd\u662f\u4e00\u4e2a\u79bb\u6563\u7684\u3001\u53ef\u7ec4\u5408\u7684\u6b65\u9aa4\uff0cAgent
\u53ef\u4ee5\u89c4\u5212\u548c\u6267\u884c\u3002\n\u5b83\u95ed\u5408\u4e86\u53cd\u9988\u5faa\u73af\u3002 \u8003\u8651\u4e00\u4e2a\u63d0\u4ea4 Spark \u5e94\u7528\u7684 CI \u7ba1\u9053\u3002\u4eca\u5929\uff0c\u9a8c\u8bc1\u7ed3\u679c\u610f\u5473\u7740\u8981\u4e48\u7528\u81ea\u5b9a\u4e49\u811a\u672c\u89e3\u6790\u539f\u59cb REST API \u54cd\u5e94\uff0c\u8981\u4e48\u8ba9\u4eba\u53bb\u67e5 Web UI\u3002\u6709\u4e86 spark-history-cli\uff0cAgent\uff08\u6216\u7b80\u5355\u7684 Shell \u811a\u672c\uff09\u53ef\u4ee5\u67e5\u8be2 History Server\uff0c\u9a8c\u8bc1\u6210\u529f\uff0c\u63d0\u53d6\u6307\u6807\u5e76\u62a5\u544a\u2014\u2014\u5b8c\u5168\u95ed\u5408\u81ea\u52a8\u5316\u5faa\u73af\u3002\n\u8fd9\u624d\u662f\u771f\u6b63\u7684\u8bba\u70b9\uff1aSpark History Server \u5b58\u50a8\u4e86\u4e30\u5bcc\u7684\u8bca\u65ad\u6570\u636e\uff0c\u4f46\u5b83\u88ab\u9501\u5728\u4e00\u4e2a\u4ec5\u4f9b\u4eba\u7c7b\u4f7f\u7528\u7684\u754c\u9762\u540e\u9762\u3002 spark-history-cli \u5c06\u8fd9\u4e9b\u6570\u636e\u8f6c\u5316\u4e3a\u4eba\u7c7b\u548c Agent \u90fd\u80fd\u6d88\u8d39\u7684\u5f62\u5f0f\u3002\u5728\u4f60\u7684\u503c\u73ed\u52a9\u624b\u662f LLM \u7684\u4e16\u754c\u91cc\uff0c\u8fd9\u4e2a\u533a\u522b\u81f3\u5173\u91cd\u8981\u3002\nGitHub Copilot CLI \u6280\u80fd spark-history-cli \u4f5c\u4e3a GitHub Copilot CLI \u6280\u80fd \u53d1\u5e03\u2014\u2014\u8fd9\u662f Agent \u96c6\u6210\u7684\u5b9e\u8df5\u3002\u5b89\u88c5\u65b9\u5f0f\uff1a\nspark-history-cli install-skill \u8fd9\u4f1a\u5c06\u5185\u7f6e\u7684\u6280\u80fd\u5b9a\u4e49\u590d\u5236\u5230 ~\/.copilot\/skills\/spark-history-cli\u3002\u91cd\u65b0\u52a0\u8f7d\u6280\u80fd\uff08\/skills reload\uff09\u540e\uff0c\u4f60\u53ef\u4ee5\u7528\u81ea\u7136\u8bed\u8a00\u63d0\u793a\uff1a\nUse \/spark-history-cli to inspect the latest completed SHS application. 
Copilot CLI \u4f1a\u8c03\u7528\u5de5\u5177\u3001\u89e3\u8bfb\u8f93\u51fa\uff0c\u5e76\u7528\u5bf9\u8bdd\u7684\u65b9\u5f0f\u56de\u7b54\u4f60\u5173\u4e8e Spark \u5e94\u7528\u5386\u53f2\u7684\u95ee\u9898\u3002\u4f60\u63cf\u8ff0\u610f\u56fe\uff0cAgent \u51b3\u5b9a\u8fd0\u884c\u54ea\u4e9b\u547d\u4ee4\uff0c\u5c06\u5b83\u4eec\u4e32\u8054\u8d77\u6765\uff0c\u5e76\u7efc\u5408\u51fa\u57fa\u4e8e\u771f\u5b9e History Server \u6570\u636e\u7684\u7b54\u6848\u3002\u65e0\u9700\u8bb0\u4f4f\u547d\u4ee4\u8bed\u6cd5\uff0c\u65e0\u9700\u624b\u52a8\u89e3\u6790 JSON\u2014\u2014\u53ea\u9700\u4e00\u4e2a\u95ee\u9898\u548c\u4e00\u4e2a\u7b54\u6848\u3002\n\u914d\u7f6e \u670d\u52a1\u5668 URL \u9ed8\u8ba4\u4e3a http:\/\/localhost:18080\u3002\u8986\u76d6\u65b9\u5f0f\uff1a\n# \u547d\u4ee4\u884c\u53c2\u6570 spark-history-cli --server http:\/\/my-shs:18080 # \u73af\u5883\u53d8\u91cf export SPARK_HISTORY_SERVER=http:\/\/my-shs:18080 spark-history-cli # REPL \u4e2d\u52a8\u6001\u5207\u6362 spark-history&gt; server http:\/\/another-shs:18080 \u5f00\u59cb\u4f7f\u7528 pip install spark-history-cli spark-history-cli \u6e90\u7801\u5728 GitHub\uff1ayaooqinn\/spark-history-cli\uff0c\u57fa\u4e8e Apache 2.0 \u8bb8\u53ef\u8bc1\u3002\u6b22\u8fce\u63d0 Issue\u3001PR \u548c\u53cd\u9988\u3002\nspark-history-cli v1.0.1 \u5df2\u53d1\u5e03\u5728 PyPI\u3002\u6e90\u7801\uff1agithub.com\/yaooqinn\/spark-history-cli\u3002\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/spark-history-cli\/","summary":"spark-history-cli \u5c06 Spark History Server \u5e26\u5230\u4f60\u7684\u7ec8\u7aef\u2014\u2014\u4e00\u4e2a\u4ea4\u4e92\u5f0f REPL \u548c\u4e00\u6b21\u6027\u547d\u4ee4\u884c\u5de5\u5177\uff0c\u8986\u76d6\u5168\u90e8 20 \u4e2a REST API \u7aef\u70b9\u3002\u5217\u51fa\u5e94\u7528\u3001\u68c0\u67e5\u4f5c\u4e1a\u3001\u6df1\u5165 Stage\u3001\u67e5\u770b SQL \u6267\u884c\u3001\u4e0b\u8f7d\u4e8b\u4ef6\u65e5\u5fd7\uff0c\u65e0\u9700\u6253\u5f00\u6d4f\u89c8\u5668\u3002\u8fd8\u53ef\u4ee5\u4f5c\u4e3a GitHub Copilot CLI 
\u6280\u80fd\u4f7f\u7528\u3002","title":"spark-history-cli\uff1a\u8ba9 Spark History Server \u5bf9 AI Agent \u53cb\u597d"},{"content":"\u4f60\u5728 Spark Web UI \u4e2d\u70b9\u51fb\u67d0\u6761 SQL \u6267\u884c\u8bb0\u5f55\uff0c\u60f3\u77e5\u9053\uff1a\u8fd9\u6761\u67e5\u8be2\u542f\u52a8\u4e86\u54ea\u4e9b\u4f5c\u4e1a\uff1f\u8fdb\u5c55\u5982\u4f55\uff1f\u6709\u6ca1\u6709\u5931\u8d25\uff1f\n\u4ee5\u524d\u9875\u9762\u53ea\u7ed9\u4f60\u770b\u8fd9\u4e9b\uff1a\nRunning Jobs: 0, 1, 2 Succeeded Jobs: 3, 4 \u5c31\u8fd9\u6837\u3002\u5149\u79c3\u79c3\u7684 ID\uff0c\u6ca1\u6709\u72b6\u6001\u3001\u6ca1\u6709\u8017\u65f6\u3001\u6ca1\u6709 Stage \u6570\u91cf\u3001\u6ca1\u6709\u8fdb\u5ea6\u6761\u3002\u8981\u5f04\u6e05\u695a\u5230\u5e95\u53d1\u751f\u4e86\u4ec0\u4e48\uff0c\u4f60\u5f97\u9010\u4e2a\u70b9\u51fb Job ID\uff0c\u67e5\u770b\u4f5c\u4e1a\u8be6\u60c5\u9875\uff0c\u518d\u8fd4\u56de\u6765\uff0c\u70b9\u4e0b\u4e00\u4e2a\uff0c\u7136\u540e\u5728\u8111\u5b50\u91cc\u628a\u4fe1\u606f\u62fc\u8d77\u6765\u3002\u5bf9\u4e8e\u4e00\u4e2a\u4f1a\u542f\u52a8\u5341\u51e0\u4e2a\u4f5c\u4e1a\u7684\u590d\u6742\u67e5\u8be2\u6765\u8bf4\uff0c\u8fd9\u771f\u7684\u5f88\u75db\u82e6\u3002\nSPARK-55971 \u89e3\u51b3\u4e86\u8fd9\u4e2a\u95ee\u9898\u3002 SQL \u6267\u884c\u8be6\u60c5\u9875\u73b0\u5728\u6709\u4e86\u5b8c\u6574\u7684 Associated Jobs \u8868\u683c\uff0c\u6240\u6709\u4fe1\u606f\u4e00\u76ee\u4e86\u7136\u3002\n\u754c\u9762\u5c55\u793a \u4e0b\u9762\u662f\u4e00\u4e2a\u6210\u529f\u6267\u884c\u7684\u67e5\u8be2\uff0c\u5c55\u793a\u65b0\u7684\u4f5c\u4e1a\u8868\u683c\uff1a\n\u4e0b\u9762\u662f\u4e00\u4e2a\u5931\u8d25\u7684\u6267\u884c\uff0c\u5305\u542b\u88ab\u7ec8\u6b62\u7684\u4efb\u52a1\u2014\u2014\u6ce8\u610f\u8fdb\u5ea6\u6761\u4e0a\u7b80\u6d01\u7684\u6807\u7b7e\uff1a\n\u5b8c\u6574\u9875\u9762\u5c55\u793a\uff0c\u4f5c\u4e1a\u8868\u683c\u4f4d\u4e8e Plan Details \u4e0b\u65b9\uff1a\n\u4f5c\u4e3a\u5bf9\u6bd4\uff0c\u8fd9\u662f Jobs 
\u9875\u9762\u2014\u2014\u65b0\u8868\u683c\u4e0e\u5176\u4fdd\u6301\u4e00\u81f4\u7684\u89c6\u89c9\u98ce\u683c\uff1a\n\u8868\u683c\u5185\u5bb9 \u5217 \u5c55\u793a\u5185\u5bb9 Job ID \u8df3\u8f6c\u5230\u4f5c\u4e1a\u8be6\u60c5\u9875\u7684\u94fe\u63a5 Description Stage \u540d\u79f0\u548c\u63cf\u8ff0 Submitted \u63d0\u4ea4\u65f6\u95f4\uff08\u53ef\u6392\u5e8f\uff09 Duration \u53ef\u8bfb\u7684\u8017\u65f6\uff08\u53ef\u6392\u5e8f\uff09 Stages: Succeeded\/Total Stage \u5b8c\u6210\u8fdb\u5ea6\uff0c\u542b\u5931\u8d25\/\u8df3\u8fc7\u8ba1\u6570 Tasks: Succeeded\/Total \u5e26 Task \u7ea7\u522b\u660e\u7ec6\u7684\u8fdb\u5ea6\u6761 \u8868\u683c\u6807\u9898\u663e\u793a &ldquo;Associated Jobs (N)&rdquo;\uff0c\u8ba9\u4f60\u7acb\u523b\u77e5\u9053\u8fd9\u6761\u67e5\u8be2\u542f\u52a8\u4e86\u591a\u5c11\u4e2a\u4f5c\u4e1a\u3002\u70b9\u51fb\u6807\u9898\u53ef\u4ee5\u6298\u53e0\u6216\u5c55\u5f00\u8be5\u533a\u5757\u2014\u2014\u72b6\u6001\u901a\u8fc7 localStorage \u6301\u4e45\u5316\uff0c\u5237\u65b0\u9875\u9762\u540e\u4f9d\u7136\u4fdd\u6301\u3002\n\u5217\u652f\u6301\u6392\u5e8f\u3002\u70b9\u51fb Duration \u627e\u5230\u6700\u6162\u7684\u4f5c\u4e1a\uff0c\u70b9\u51fb Submitted \u67e5\u770b\u6267\u884c\u987a\u5e8f\u3002Stages \u548c Tasks \u5217\u4f7f\u7528\u4e0e Jobs \u4e3b\u9875\u9762\u76f8\u540c\u7684\u8fdb\u5ea6\u6761\u6837\u5f0f\uff0c\u4fdd\u6301\u89c6\u89c9\u8bed\u8a00\u7684\u4e00\u81f4\u6027\u3002\n\u4e3a\u4ec0\u4e48\u8fd9\u5f88\u91cd\u8981 SQL \u6267\u884c\u8be6\u60c5\u9875\u662f\u5de5\u7a0b\u5e08\u5728\u67e5\u8be2\u7f13\u6162\u6216\u5931\u8d25\u65f6\u7b2c\u4e00\u4e2a\u53bb\u7684\u5730\u65b9\u3002\u95ee\u9898\u603b\u662f\u4e00\u6837\u7684\uff1a\n\u8fd9\u6761\u67e5\u8be2\u521b\u5efa\u4e86\u591a\u5c11\u4e2a\u4f5c\u4e1a\uff1f \u2014 \u73b0\u5728\u76f4\u63a5\u663e\u793a\u5728\u533a\u5757\u6807\u9898\u4e2d\u3002 \u54ea\u4e2a\u4f5c\u4e1a\u662f\u74f6\u9888\uff1f \u2014 \u6309 Duration \u6392\u5e8f\u5373\u53ef\u3002 \u4f5c\u4e1a\u8fd8\u5728\u8fd0\u884c\u5417\uff1f\u8fdb\u5c55\u5982\u4f55\uff1f \u2014 
Task \u8fdb\u5ea6\u6761\u5373\u65f6\u5448\u73b0\u3002 \u6709 Stage \u5931\u8d25\u5417\uff1f \u2014 Stages \u5217\u5185\u8054\u663e\u793a\u5931\u8d25\/\u8df3\u8fc7\u8ba1\u6570\u3002 \u6bcf\u4e2a\u4f5c\u4e1a\u5728\u505a\u4ec0\u4e48\uff1f \u2014 Description \u5217\u5c55\u793a Stage \u540d\u79f0\u3002 \u4ee5\u524d\uff0c\u56de\u7b54\u4efb\u4f55\u4e00\u4e2a\u95ee\u9898\u90fd\u9700\u8981\u79bb\u5f00\u6267\u884c\u8be6\u60c5\u9875\u3002\u73b0\u5728\u5b83\u4eec\u90fd\u5728\u540c\u4e00\u5f20\u8868\u683c\u3001\u540c\u4e00\u4e2a\u9875\u9762\u4e0a\u5f97\u5230\u89e3\u7b54\uff0c\u65e0\u9700\u4efb\u4f55\u989d\u5916\u70b9\u51fb\u3002\n\u989d\u5916\u6536\u83b7\uff1a\u7b80\u6d01\u7684\u8fdb\u5ea6\u6761\u6807\u7b7e \u540c\u4e00\u4e2a PR \u8fd8\u4fee\u590d\u4e86\u4e00\u4e2a\u957f\u671f\u5b58\u5728\u7684\u8fdb\u5ea6\u6761\u53ef\u8bfb\u6027\u95ee\u9898\uff0c\u5f71\u54cd\u6574\u4e2a Web UI\u3002\u5f53\u4efb\u52a1\u88ab\u7ec8\u6b62\u65f6\uff0cSpark \u4ee5\u524d\u4f1a\u5728\u8fdb\u5ea6\u6761\u6807\u7b7e\u4e2d\u663e\u793a\u5b8c\u6574\u7684\u7ec8\u6b62\u539f\u56e0\u2014\u2014\u5305\u62ec\u5806\u6808\u8ddf\u8e2a\uff1a\n[====&gt; ] 45\/100 (5 killed: org.apache.spark.SparkException: Job 3 cancelled because SparkContext was shut down at org.apache.spark.scheduler.DAGScheduler...) 
\u73b0\u5728\u663e\u793a\u7b80\u6d01\u7684\u6807\u7b7e\uff0c\u8be6\u7ec6\u539f\u56e0\u901a\u8fc7\u60ac\u505c\u67e5\u770b\uff1a\n[====&gt; ] 45\/100 (5 killed) \u2191 \u60ac\u505c\u67e5\u770b\u5b8c\u6574\u539f\u56e0 \u8fd9\u9002\u7528\u4e8e Jobs \u9875\u9762\u3001Stages \u9875\u9762\u548c\u65b0\u7684 SQL \u6267\u884c\u4f5c\u4e1a\u8868\u683c\u7684\u8fdb\u5ea6\u6761\u3002\u622a\u65ad\u540e\u7684\u539f\u56e0\uff08\u6700\u591a 120 \u4e2a\u5b57\u7b26\uff09\u4fdd\u7559\u5728\u5de5\u5177\u63d0\u793a\u4e2d\u3002\n\u66f4\u5927\u7684\u73b0\u4ee3\u5316\u8ba1\u5212 \u8fd9\u662f SPARK-55760 Web UI \u73b0\u4ee3\u5316\u5de5\u4f5c\u7684\u4e00\u90e8\u5206\u3002\u5176\u4ed6\u8fd1\u671f\u6539\u8fdb\u5305\u62ec\uff1a\n\u6697\u9ed1\u6a21\u5f0f \u2014 \u4e00\u952e\u5207\u6362\uff0c\u7cfb\u7edf\u504f\u597d\u68c0\u6d4b\uff08SPARK-55766\uff09 \u7d27\u51d1\u578b SQL \u8ba1\u5212\u53ef\u89c6\u5316 \u2014 \u8fb9\u4e0a\u7684\u884c\u6570\u6807\u7b7e\uff0c\u53ef\u70b9\u51fb\u7684\u6307\u6807\u9762\u677f\uff08SPARK-55785\uff09 Offcanvas \u8be6\u60c5\u9762\u677f \u2014 \u6ed1\u51fa\u5f0f Executor \u89c6\u56fe\uff08SPARK-55767\uff09 Bootstrap 5 \u6298\u53e0 API \u2014 \u66ff\u6362\u6240\u6709\u9875\u9762\u7684\u81ea\u5b9a\u4e49 JS \u6298\u53e0\u903b\u8f91\uff08SPARK-55773\uff09 Bootstrap 5 \u5347\u7ea7 \u2014 \u4ece 4.6.2 \u5347\u7ea7\u5230 5.3.8\uff08SPARK-55761\uff09 \u76ee\u6807\u5f88\u7b80\u5355\uff1aSpark Web UI \u5728\u5e2e\u52a9\u4f60\u7406\u89e3\u67e5\u8be2\u65b9\u9762\uff0c\u5e94\u8be5\u548c Spark \u6267\u884c\u67e5\u8be2\u4e00\u6837\u51fa\u8272\u3002\n\u8bd5\u8bd5\u770b \u8be5\u529f\u80fd\u5df2\u5408\u5165 master \u5206\u652f\uff0c\u5c06\u5728\u4e0b\u4e00\u4e2a Apache Spark \u7248\u672c\u4e2d\u53d1\u5e03\u3002\u5982\u679c\u4f60\u4ece\u6e90\u7801\u6784\u5efa\uff0c\u4eca\u5929\u5c31\u53ef\u4ee5\u4f53\u9a8c\u3002\n\u8be5\u529f\u80fd\u4f5c\u4e3a SPARK-55971\uff08PR #54768\uff09\u8d21\u732e\u3002\u6b22\u8fce\u5728 SPARK-55760 \u63d0\u4f9b\u53cd\u9988\u548c\u53c2\u4e0e Spark Web UI 
modernization work.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-execution-page-modernization\/","summary":"The SQL execution detail page in the Spark Web UI used to list associated jobs only as comma-separated IDs. It now has a full Associated Jobs table with status, duration, Stage progress, and Task progress bars, so you can debug SQL queries without clicking through each job.","title":"The SQL Execution Detail Page Finally Shows Job Status at a Glance"},{"content":"If you have ever stared at the Spark Web UI at 2 a.m. debugging a failed job, you know the feeling: a glaring white screen in your eyes while you dig through Stages, Executors, and SQL execution plans. That era is over.\nThe Apache Spark Web UI now supports dark mode, merged into the master branch as part of SPARK-55766. One click toggles it; it remembers your preference and respects your system setting. It covers every page: Jobs, Stages, SQL, Executors, Environment, and more.\nWhy dark mode? Dark mode is not a visual fad; it is a developer productivity tool. Here is what it means for Spark users:\n1. 
Less eye strain during long debugging sessions Spark jobs can run for hours. When something goes wrong, engineers spend a lot of time in the Web UI: inspecting DAG visualizations, reading Executor logs, tracing SQL query plans. A dark interface reduces the contrast between the screen and a dim room, and late-night debugging is the most common scenario.\n2. Respecting developer preference Modern development tools have broadly adopted dark mode: VS Code, GitHub, IntelliJ IDEA, terminal emulators, even the operating system itself. When a developer's whole environment is dark and they open the Spark UI in a browser, the glaring white page is jarring. The Spark UI should be a natural part of the developer workflow, not an exception.\n3. 
The community has been asking for it Dark mode has long been one of the most requested UI features in the Spark community. As the Web UI modernizes onto Bootstrap 5 (SPARK-55760), the infrastructure can finally support it properly: no hacks, no custom CSS theme, no extra maintenance burden.\nWhat it looks like Here is the Spark Web UI compared in both modes:\nJobs page Light Dark SQL query plan visualization Light Dark Executors page Light Dark Environment page Light Dark Implementation philosophy We deliberately chose the simplest, most maintainable approach:\nBuilt on Bootstrap 5 color modes. There is no custom CSS color system and no separate stylesheet. Bootstrap's data-bs-theme attribute handles 95% of the work automatically: buttons, cards, tables, navbars, and text all adapt on their own.\nRespects the system preference. On first visit, the UI checks prefers-color-scheme; if your operating system is in dark mode, Spark follows automatically. No configuration needed.\nRemembers your choice. Click the toggle once, and localStorage keeps your preference across all pages and sessions.\nNo flash of unstyled content (FOUC). 
An inline script runs before the page renders, so there is no white flash when a page loads in dark mode.\nThis philosophy matches how the best developer tools handle theming:\nGitHub introduced dark mode in December 2020 in a similar way: respecting the system preference, persisting the user's choice, and using CSS custom properties rather than parallel stylesheets. VS Code has supported dark mode since day one, treating it as a core feature rather than an afterthought. Grafana, another tool engineers stare at for long stretches, even defaults to dark mode. The experience of these tools tells us that dark mode is not optional for developer-facing tools; it is a necessity.\nPart of a larger modernization Dark mode is part of the Spark Web UI modernization work under SPARK-55760. Other improvements landing in the same period include:\nCompact SQL execution plan visualization: clickable detail side panel Offcanvas panel: used for the Executor detail view Table hover effect: improves row readability Bootstrap 5 utility classes: replace legacy CSS Global footer 
: shows the version, uptime, and user on every page The goal is simple: the Spark Web UI should be as modern as the engine it represents. Spark processes petabytes of data with cutting-edge distributed computing; its UI should reflect that quality.\nGive it a try Dark mode will ship in the next Apache Spark release. If you build from source, it is already on the master branch.\nClick the \u25d1 button in the navbar to toggle. It is that simple.\nThis feature was contributed as part of SPARK-55766. Feedback and contributions are welcome on SPARK-55760.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/dark-mode-spark-ui\/","summary":"The Apache Spark Web UI now supports dark mode, a long-awaited improvement for developers, especially engineers who spend long hours debugging jobs.","title":"Dark Mode Comes to the Apache Spark Web UI"},{"content":"The SQL tab of the Spark Web UI has always had a query plan visualization. It shows the physical plan as a DAG, with operators as nodes and data flow as edges. In theory, this gives Spark 
one of its most powerful debugging tools. In practice, for complex queries it was almost unusable.\nNow everything has changed. SPARK-55785 redesigns the SQL execution plan visualization with a compact layout, an interactive metrics panel, and edge labels you can read at a glance.\nThe problem The old visualization crammed every metric into each node label:\nHashAggregate number of output rows: total (min, med, max) 5,000,000 (1,250,000, 1,250,000, 1,250,000) time in aggregation build: total (min, med, max) 2.3s (500ms, 575ms, 650ms) peak memory: total (min, med, max) 512.0 MiB (128.0 MiB, 128.0 MiB, 128.0 MiB) avg hash probe bucket list iters: ... 1.2 (1.1, 1.2, 1.3) In a real query with more than 30 operators, you get a wall of text with nothing standing out. The most important information (which operator is slow, where the data explodes, where it gets filtered) is buried in visual noise.\nThe solution Compact mode: see the forest, not the trees The new default view shows only operator names. No metrics get in the way. The plan structure is clear at a glance:\nEach node is just the operator name. Each cluster shows the WholeStageCodegen 
stage number and total duration. The plan fits on one screen instead of requiring endless scrolling.\nEdge labels: data flow at a glance The most valuable improvement is not on the nodes; it is on the edges. Row counts now appear on every edge, showing exactly how much data flows between operators:\nThis makes several classes of performance problems immediately visible:\nJoin explosion: 1M \u00d7 500K inputs producing 5B rows? You will see it on the edge label right away. Filter effectiveness: your filter cuts 5B rows down to 400? The edge label tells you directly, with nothing to click. Aggregation impact: see directly how much your GROUP BY shrinks the dataset. These are the first questions engineers ask when debugging a slow query. Previously, you had to trace the metrics tables in your head or fall back on EXPLAIN output. Now the answer is on the graph.\nClick for details: the metrics side panel Need the full picture? Click any node and a side panel slides in with a structured metrics table:\nThe panel shows:\nA Total \/ Min \/ Med \/ Max breakdown for each metric 
The operator description on hover, which distinguishes operators that share a name (very useful when you have several HashAggregate nodes) Clicking a cluster shows the metrics of all its child operators, grouped together This design draws on Databricks' query profiler, adapted for the open source Spark UI.\nMode toggle: your choice Prefer the old detailed view? A checkbox lets you switch between compact mode and detailed mode. In detailed mode, metrics are rendered inside the graph nodes in a 10px font:\nYour preference is saved in localStorage, so the UI remembers your choice.\nDesign decisions A few choices are worth explaining:\nWhy edge labels instead of node metrics? Because the most important performance signal is the volume of data between operators, not the internals of any single operator. Edge labels answer &quot;What happened to my data?&quot; at the plan level. Node metrics answer &quot;Why is this particular operator slow?&quot;, which belongs in the detail panel rather than the overview.\nWhy plain-text labels instead of HTML? The old visualization used dagre-d3's labelType: &quot;html&quot; 
for rich-text formatting inside nodes. That led to inconsistent rendering, made dark mode support harder, and produced oversized nodes. Plain-text labels are lighter and more predictable, and they let dagre-d3 size nodes correctly on its own.\nWhy a side panel instead of a tooltip? Metrics tables can be large: six or more metrics, each with Total\/Min\/Med\/Max columns. A tooltip would be clipped or would overlap. A fixed side panel provides stable, scrollable space and does not cover the graph.\nFuture plans This is part of the broader Spark Web UI modernization work. Future improvements under discussion include:\nShowing the data flow path for a single selected operator Highlighting bottleneck operators by color (time\/row-count heatmap) Labeling edges with data size (bytes), not just row counts This feature was contributed as SPARK-55785 (PR #54565). Thanks to @sarutak and @gengliangwang for their reviews and for suggesting the edge labels.\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/posts\/spark\/sql-plan-visualization\/","summary":"The Spark SQL execution plan visualization gets a major upgrade: compact node labels, a clickable metrics panel, and edge labels that make Join 
explosions obvious at a glance.","title":"Redesigning Apache Spark's SQL Execution Plan Visualization"},{"content":"\ud83d\udc4b Hi, I'm Kent Yao I am an open source enthusiast and an avid contributor to the Apache Software Foundation ecosystem. I focus on big data, distributed SQL engines, and open source community building.\nRoles \ud83e\uddd1\u200d\ud83e\udd1d\u200d\ud83e\uddd1 ASF Member \ud83c\udf7c Apache Incubator PMC Member \ud83e\udd8a Apache Kyuubi PMC Chair, Vice President \u2728 Apache Spark PMC Member \ud83d\udea2 Apache Submarine Committer \ud83e\uddf1 Databricks Beacons Program Member Open source journey Date Organization Event 2024\/10 Apache Software Foundation Apache Cloudberry Mentor and PPMC Member 2024\/08 Apache Software Foundation Apache Spark PMC Member 2024\/07 Apache Software Foundation Apache Polaris PMC Member 2024\/06 Databricks 2024 Databricks Beacons 2024\/03 Apache Software Foundation Apache Amoro (Incubating) Mentor and PPMC Member 2024\/03 Apache Software Foundation ASF Member (news) 2024\/01 SegmentFault \/ KAIYUANSHE 2023 China Open Source Pioneers 33 2024\/01 Apache Software Foundation Apache Gluten PMC Member 2023\/12 OpenAtom Foundation 2023 Ecosystem Open Source Project: Apache Kyuubi 2023\/11 Apache Software Foundation Apache Incubator PMC Member 2023\/10 NetEase NetEase Technology Award 2023\/09 China Academy of Information and Communications Technology 2023 OSCAR 
Peak Open Source Person 2023\/05 Informatization Development Bureau, Cyberspace Administration of China 2022 China Open Source Innovation Competition - Second Prize 2022\/12 Apache Software Foundation Apache Kyuubi PMC Chair, Vice President 2022\/12 Apache Software Foundation Apache Kyuubi becomes an ASF top-level project 2022\/10 NetEase NetEase Technology Award 2022\/09 China Academy of Information and Communications Technology \/ China Communications Standards Association 2022 OSCAR Open Source Industry Conference Peak Open Source Project: Apache Kyuubi (Incubating) 2022\/06 ACM SIGMOD 2022 ACM SIGMOD Systems Award 2022\/05 China Academy of Information and Communications Technology Trusted Open Source Community: Apache Kyuubi (Incubating) 2022\/02 China Association for Science and Technology 2021 &quot;Sci-Tech Innovation China&quot; Open Source Innovation List: Apache Kyuubi (Incubating) 2021\/10 NetEase NetEase Technology Award 2021\/08 Databricks Databricks Beacons Program Member 2021\/06 Apache Software Foundation Apache Kyuubi (Incubating) PPMC Member 2021\/06 Apache Software Foundation Donated NetEase\/Kyuubi to the Apache Incubator 2021\/02 Apache Software Foundation Apache Spark Committer 2020\/12 Apache Software Foundation Apache Submarine Committer ","permalink":"https:\/\/yaooqinn.github.io\/zh\/about\/","summary":"About Kent Yao","title":"About"},{"content":"\ud83e\udd8a Apache Kyuubi Role: PMC Chair, Vice President\nA distributed, multi-tenant gateway that provides Serverless SQL on data warehouses and lakehouses. In 
2021, I donated the project from NetEase to the Apache Incubator, and it graduated as an ASF top-level project in 2022.\n\ud83d\udd17 GitHub \u00b7 Website\n\u2728 Apache Spark Role: PMC Member\nA unified analytics engine for large-scale data processing. I have been contributing to Spark SQL, focusing on SQL compatibility, data types, and configuration improvements.\n\ud83d\udd17 GitHub \u00b7 Website\n\ud83c\udf1f Apache Gluten Role: PMC Member\nA plugin that offloads the SparkSQL execution engine to native engines and can double performance. Gluten has graduated as an ASF top-level project.\n\ud83d\udd17 GitHub \u00b7 Website\n\ud83e\uddca Apache Polaris Role: PMC Member\nAn open source catalog for Apache Iceberg. Polaris has graduated as an ASF top-level project.\n\ud83d\udd17 GitHub \u00b7 Website\n\ud83e\uded0 Apache Cloudberry Role: Mentor, PPMC Member\nA next-generation unified database for analytics and AI, forked from Greenplum Database.\n\ud83d\udd17 GitHub \u00b7 Website\n\ud83d\udc3b\u200d\u2744\ufe0f Apache Amoro Role: Mentor, PPMC Member\nA lakehouse management system built on open data lake formats.\n\ud83d\udd17 GitHub \u00b7 Website\n\ud83d\udea2 Apache Submarine Role: Committer\nA unified AI platform that lets engineers and data scientists run machine learning and deep learning workloads.\n\ud83d\udd17 GitHub \u00b7 
Website\n","permalink":"https:\/\/yaooqinn.github.io\/zh\/projects\/","summary":"Open Source Projects","title":"Projects"}]