Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
I am trying to replace HiveServer2 with Kyuubi in my production environment, which has more than 3500 NodeManagers running.
During the busiest periods, HiveServer2 has to handle roughly 400 new connections per minute:
$ ls hadoop-cmf-hive_on_tez-HIVESERVER2-<hostname>.log.out* | xargs -I{} bash -c "grep 'Session opened, SessionHandle' {} | cut -b-16 | uniq -c" | sort -n | tail -n10
169 2022-10-26 16:38
174 2022-10-26 13:52
215 2022-10-26 13:49
218 2022-10-26 13:48
227 2022-10-26 13:47
227 2022-10-26 13:51
228 2022-10-26 13:50
283 2022-10-26 16:37
301 2022-10-26 13:46
379 2022-10-26 13:45
While running a scalability test against the Kyuubi server, I found that the CPU load average is quite high even at much lower concurrency.
For example, below is the load average with only 20 connections/min. For comparison, HiveServer2 on exactly the same hardware can handle 300+ connections/min with no significant CPU usage. By that math (4 HiveServer2 instances x 300+ connections/min, vs. roughly 20 connections/min per Kyuubi server), we would need 60+ Kyuubi servers to handle a workload that 4 HiveServer2 instances handle easily.
top - 16:41:41 up 23 days, 59 min,  3 users,  load average: 27.49, 26.43, 22.72
Tasks: 178 total,   1 running, 177 sleeping,   0 stopped,   0 zombie
%Cpu(s): 75.1 us,  4.7 sy,  0.0 ni, 19.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 32779588 total, 18658312 free, 11246896 used,  2874380 buff/cache
KiB Swap:  4194300 total,  4088680 free,   105620 used. 21308144 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
46732 kyuubi    20   0   12.3g 624740  53860 S  59.5  1.9   0:13.21 java
46762 kyuubi    20   0   12.1g 472876  53864 S  54.8  1.4   0:13.00 java
46886 kyuubi    20   0   12.0g 545456  53872 S  48.8  1.7   0:13.56 java
47177 kyuubi    20   0   12.3g 492448  53872 S  45.2  1.5   0:12.88 java
47077 kyuubi    20   0   12.2g 498132  53868 S  40.5  1.5   0:12.49 java
47637 kyuubi    20   0   11.4g 541136  53860 S  38.9  1.7   0:10.22 java
46979 kyuubi    20   0   12.3g 621928  53864 S  38.5  1.9   0:11.84 java
47513 kyuubi    20   0   12.1g 467256  53864 S  37.9  1.4   0:11.56 java
46918 kyuubi    20   0   12.0g 480620  53864 S  37.5  1.5   0:12.18 java
47766 kyuubi    20   0   12.0g 495616  53844 S  31.6  1.5   0:10.80 java
47606 kyuubi    20   0   12.0g 441112  53860 S  30.9  1.3   0:10.10 java
46843 kyuubi    20   0   12.2g 491684  53860 S  21.3  1.5   0:11.38 java
47244 kyuubi    20   0   11.9g 410096  53864 S  21.3  1.3   0:10.75 java
27799 root      20   0  115860   3904   1104 S  18.9  0.0 403:14.08 bash
47359 kyuubi    20   0   12.1g 414828  53864 S  16.6  1.3   0:10.50 java
47317 kyuubi    20   0   12.1g 446280  53860 S  16.3  1.4   0:10.62 java
47432 kyuubi    20   0   12.0g 432704  53856 S  15.6  1.3   0:10.86 java
47131 kyuubi    20   0   11.8g 510692  53860 S  14.6  1.6   0:10.84 java
47280 kyuubi    20   0   11.8g 445524  53856 S  13.3  1.4   0:10.23 java
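For reference, a connection rate like the 20 connections/min above can be driven with a trivial JDBC loop. This is only a minimal sketch to make the traffic pattern concrete, not the exact harness we used; it assumes the Hive JDBC driver (hive-jdbc) is on the classpath and uses a placeholder host/port for the Kyuubi frontend.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal sketch of a connection-rate driver (placeholder URL, not the actual test tool).
public class ConnectionRateDriver {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    String url = "jdbc:hive2://kyuubi-host:10009/default"; // placeholder host/port
    int connectionsPerMinute = 20;
    long intervalMs = 60_000L / connectionsPerMinute;

    while (true) {
      new Thread(() -> {
        // Each iteration opens a brand-new session, which triggers an engine
        // bootstrap on the server side when no shared engine is available.
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement()) {
          stmt.execute("SELECT 1");
        } catch (Exception e) {
          e.printStackTrace();
        }
      }).start();
      Thread.sleep(intervalMs);
    }
  }
}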
One of the biggest reasons is that Kyuubi needs to use ProcessBuilder to fork a spark-submit process for each new engine, and this procedure is very time- and CPU-consuming. Just my two cents: we may be able to use the YARN REST API, Spark Connect, or some other mechanism to reduce the CPU usage.
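To illustrate the cost, a per-session engine launch essentially boils down to forking a full spark-submit JVM, roughly like the sketch below. The paths, class name, and options are placeholders for illustration only; the real logic lives in Kyuubi's Spark engine process builder.

import java.io.File;
import java.util.Arrays;
import java.util.List;

// Simplified sketch: every incoming session that needs a new engine forks a
// brand-new spark-submit JVM. All paths and names below are placeholders.
public class EngineLaunchSketch {
  public static Process launchEngine(String sessionUser) throws Exception {
    List<String> command = Arrays.asList(
        "/opt/spark/bin/spark-submit",        // placeholder SPARK_HOME
        "--master", "yarn",
        "--proxy-user", sessionUser,
        "--class", "some.engine.MainClass",   // placeholder engine entry point
        "/path/to/engine.jar");               // placeholder engine jar

    // Each call forks spark-submit, which starts its own JVM, parses configs,
    // uploads jars, and negotiates with YARN -- the CPU cost observed above.
    return new ProcessBuilder(command)
        .redirectErrorStream(true)
        .redirectOutput(new File("/tmp/engine-launch-" + sessionUser + ".log"))
        .start();
  }
}

At 20 new sessions/min this means forking roughly 20 spark-submit JVMs per minute on the same box, which is consistent with the many java processes in the top output above.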
I am wondering if there is anything we can change to improve this situation.
Or do you have any ideas on how to configure the Kyuubi server to support this amount of traffic?
Affects Version(s)
master/1.7.0/1.7.1
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- No. I cannot submit a PR at this time.