[Bug] High CPU Usage When Concurrency is High #4825

@risyomei

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

I am trying to replace HiveServer2 with Kyuubi in my production environment, which runs more than 3,500 NodeManagers.

During the busiest periods, HiveServer2 has to handle roughly 400 new connections per minute:

$ ls hadoop-cmf-hive_on_tez-HIVESERVER2-<hostname>.log.out* | xargs -I{}  bash -c "grep 'Session opened, SessionHandle' {} |  cut -b-16 | uniq -c" |  sort -n | tail -n10
    169 2022-10-26 16:38
    174 2022-10-26 13:52
    215 2022-10-26 13:49
    218 2022-10-26 13:48
    227 2022-10-26 13:47
    227 2022-10-26 13:51
    228 2022-10-26 13:50
    283 2022-10-26 16:37
    301 2022-10-26 13:46
    379 2022-10-26 13:45
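
The shell pipeline above counts each rotated log file separately, so a minute that spans two files would show up as two entries. A minimal Python sketch of the same count, merged across all files, could look like the following. It assumes (as the output above suggests) that every log line starts with a 16-character `YYYY-MM-DD HH:MM` prefix; the function names are illustrative only.

```python
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Same marker string the grep command above searches for.
MARKER = "Session opened, SessionHandle"


def sessions_per_minute(lines: Iterable[str]) -> Counter:
    """Count marker lines per minute, keyed by the first 16 characters."""
    counts: Counter = Counter()
    for line in lines:
        if MARKER in line:
            counts[line[:16]] += 1  # assumed prefix: "YYYY-MM-DD HH:MM"
    return counts


def busiest(counts: Counter, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n busiest minutes in ascending order, like `sort -n | tail`."""
    return sorted(counts.items(), key=lambda kv: kv[1])[-n:]


if __name__ == "__main__":
    total: Counter = Counter()
    # Pass the rotated log files as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            total.update(sessions_per_minute(fh))
    for minute, count in busiest(total):
        print(f"{count:7d} {minute}")
```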

While running a scalability test against the Kyuubi server, I found that the CPU load average is quite high even at much lower concurrency.

For example, the top output below shows the load average at 20 connections/min. By comparison, HiveServer2 on exactly the same hardware can handle 300+ connections/min with no significant CPU usage. According to our calculation, we would need 60+ Kyuubi servers to handle a workload that 4 HiveServer2 instances handle easily.

top - 16:41:41 up 23 days, 59 min,  3 users,  load average: 27.49, 26.43, 22.72
Tasks: 178 total,   1 running, 177 sleeping,   0 stopped,   0 zombie
%Cpu(s): 75.1 us,  4.7 sy,  0.0 ni, 19.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 32779588 total, 18658312 free, 11246896 used,  2874380 buff/cache
KiB Swap:  4194300 total,  4088680 free,   105620 used. 21308144 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 46732 kyuubi    20   0   12.3g 624740  53860 S  59.5  1.9   0:13.21 java
 46762 kyuubi    20   0   12.1g 472876  53864 S  54.8  1.4   0:13.00 java
 46886 kyuubi    20   0   12.0g 545456  53872 S  48.8  1.7   0:13.56 java
 47177 kyuubi    20   0   12.3g 492448  53872 S  45.2  1.5   0:12.88 java
 47077 kyuubi    20   0   12.2g 498132  53868 S  40.5  1.5   0:12.49 java
 47637 kyuubi    20   0   11.4g 541136  53860 S  38.9  1.7   0:10.22 java
 46979 kyuubi    20   0   12.3g 621928  53864 S  38.5  1.9   0:11.84 java
 47513 kyuubi    20   0   12.1g 467256  53864 S  37.9  1.4   0:11.56 java
 46918 kyuubi    20   0   12.0g 480620  53864 S  37.5  1.5   0:12.18 java
 47766 kyuubi    20   0   12.0g 495616  53844 S  31.6  1.5   0:10.80 java
 47606 kyuubi    20   0   12.0g 441112  53860 S  30.9  1.3   0:10.10 java
 46843 kyuubi    20   0   12.2g 491684  53860 S  21.3  1.5   0:11.38 java
 47244 kyuubi    20   0   11.9g 410096  53864 S  21.3  1.3   0:10.75 java
 27799 root      20   0  115860   3904   1104 S  18.9  0.0 403:14.08 bash
 47359 kyuubi    20   0   12.1g 414828  53864 S  16.6  1.3   0:10.50 java
 47317 kyuubi    20   0   12.1g 446280  53860 S  16.3  1.4   0:10.62 java
 47432 kyuubi    20   0   12.0g 432704  53856 S  15.6  1.3   0:10.86 java
 47131 kyuubi    20   0   11.8g 510692  53860 S  14.6  1.6   0:10.84 java
 47280 kyuubi    20   0   11.8g 445524  53856 S  13.3  1.4   0:10.23 java
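
For reference, a figure like the 60+ servers mentioned above can be reproduced with a back-of-the-envelope calculation. The sketch below is only one plausible reading of the numbers: it assumes the peak workload equals 4 HiveServer2 instances at 300 connections/min each, and that a single Kyuubi server saturates at the 20 connections/min used in this test. The function name is illustrative.

```python
import math


def servers_needed(peak_conn_per_min: float, per_server_conn_per_min: float) -> int:
    """Smallest number of servers whose combined rate covers the peak."""
    return math.ceil(peak_conn_per_min / per_server_conn_per_min)


# Assumption: 4 HiveServer2 instances, each handling 300 connections/min.
peak = 4 * 300
# Assumption: one Kyuubi server tops out near the 20 connections/min tested.
print(servers_needed(peak, 20))  # 60
```

Since HiveServer2 handles "300+" rather than exactly 300 connections/min, the result is a lower bound, which matches the "60+" wording.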

One of the biggest reasons is that Kyuubi uses ProcessBuilder to launch the spark-submit command, and this step is very time- and CPU-consuming. Just my two cents: we may be able to use the YARN REST API, Spark Connect, or some other approach to reduce the CPU usage.
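
The per-launch cost of spawning a child process can be measured directly. The sketch below is not Kyuubi code: it uses Python's `subprocess` as a stand-in for ProcessBuilder and a trivial `python -c pass` child as a stand-in for spark-submit, which is far heavier because it starts a JVM. The function name and the returned keys are illustrative, and the `resource` module is only available on Unix-like systems.

```python
import resource
import subprocess
import sys
import time
from typing import Dict, List


def measure_spawn_cost(n: int, cmd: List[str]) -> Dict[str, float]:
    """Launch cmd n times sequentially; report wall and child CPU time per launch."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    for _ in range(n):
        # run() waits for the child, so its CPU time is charged to RUSAGE_CHILDREN.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time consumed by the children launched in the loop.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "launches": n,
        "wall_s_per_launch": wall / n,
        "child_cpu_s_per_launch": cpu / n,
    }


if __name__ == "__main__":
    # Stand-in command; substitute the real launch command to measure it.
    print(measure_spawn_cost(5, [sys.executable, "-c", "pass"]))
```

Multiplying the measured child CPU seconds per launch by the target connection rate gives a rough estimate of how many cores the launches alone would occupy, assuming each connection triggers one launch.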

I am wondering if there is anything we can change to improve this situation.
Or do you have any idea how to configure the Kyuubi server to support this amount of traffic?

Affects Version(s)

master/1.7.0/1.7.1

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
