Skip to content

Conversation

@richardstartin
Copy link
Member

@richardstartin richardstartin commented Jan 13, 2022

This is a fix for some low hanging fruit found by a user with a heavy RealtimeToOfflineTask (see #8014) - a good amount of time is spent sanitising strings.

image

This change makes this method more efficient (less than half the allocation when truncation is required, and no allocation when it isn't, ~2x faster) by making better use of the String API:

@State(Scope.Benchmark)
public class BenchmarkSanitizeStringValue {

  @Param({"10", "100", "1000"})
  int _length;

  @Param("512")
  int _maxLength;

  String _string;

  @Setup(Level.Trial)
  public void setup() {
    byte[] bytes = new byte[_length];
    for (int i = 0; i < _length; i++) {
      bytes[i] = (byte) ('a' + ThreadLocalRandom.current().nextInt(26));
    }
    _string = new String(bytes, StandardCharsets.UTF_8);
  }

  @Benchmark
  public String sanitize() {
    return StringUtil.sanitizeStringValue(_string, _maxLength);
  }

  @Benchmark
  public String sanitizeNew() {
    return StringUtil.sanitizeStringValueNew(_string, _maxLength);
  }
}
Benchmark                                                              (_length)  (_maxLength)  Mode  Cnt     Score     Error   Units
BenchmarkSanitizeStringValue.sanitize                                         10           512  avgt    5    10.444 ±   0.213   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                     10           512  avgt    5    40.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitize                                        100           512  avgt    5    29.909 ±   1.637   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                    100           512  avgt    5   216.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitize                                       1000           512  avgt    5   203.566 ±  13.111   ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm                   1000           512  avgt    5  2568.000 ±   0.001    B/op
BenchmarkSanitizeStringValue.sanitizeNew                                      10           512  avgt    5     6.107 ±   0.369   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                  10           512  avgt    5    ≈ 10⁻⁶              B/op
BenchmarkSanitizeStringValue.sanitizeNew                                     100           512  avgt    5    14.349 ±   0.261   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                 100           512  avgt    5    ≈ 10⁻⁵              B/op
BenchmarkSanitizeStringValue.sanitizeNew                                    1000           512  avgt    5   102.201 ±   0.703   ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm                1000           512  avgt    5   552.000 ±   0.001    B/op

@richardstartin richardstartin force-pushed the string-sanitize-perf-fix branch from dc6a607 to 824d7aa Compare January 13, 2022 12:44
@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2022

Codecov Report

Merging #8013 (824d7aa) into master (bf4bb9a) will increase coverage by 0.98%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #8013      +/-   ##
============================================
+ Coverage     70.34%   71.32%   +0.98%     
  Complexity     4214     4214              
============================================
  Files          1596     1596              
  Lines         82778    82773       -5     
  Branches      12348    12347       -1     
============================================
+ Hits          58227    59036     +809     
+ Misses        20564    19742     -822     
- Partials       3987     3995       +8     
Flag Coverage Δ
integration1 29.05% <25.00%> (+0.05%) ⬆️
integration2 27.59% <75.00%> (?)
unittests1 68.10% <100.00%> (-0.01%) ⬇️
unittests2 14.31% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...java/org/apache/pinot/common/utils/StringUtil.java 100.00% <100.00%> (ø)
...he/pinot/segment/local/segment/store/IndexKey.java 70.00% <0.00%> (-10.00%) ⬇️
...ntroller/helix/core/minion/TaskMetricsEmitter.java 86.36% <0.00%> (-4.55%) ⬇️
.../org/apache/pinot/core/startree/StarTreeUtils.java 69.89% <0.00%> (-2.16%) ⬇️
.../aggregation/function/ModeAggregationFunction.java 88.37% <0.00%> (-0.28%) ⬇️
...va/org/apache/pinot/controller/ControllerConf.java 57.31% <0.00%> (ø)
...lix/core/realtime/PinotRealtimeSegmentManager.java 81.67% <0.00%> (ø)
...nction/DistinctCountBitmapAggregationFunction.java 47.15% <0.00%> (ø)
...ntroller/helix/core/PinotHelixResourceManager.java 64.73% <0.00%> (+0.07%) ⬆️
...server/starter/helix/HelixInstanceDataManager.java 81.50% <0.00%> (+0.57%) ⬆️
... and 87 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf4bb9a...824d7aa. Read the comment docs.

@npawar npawar merged commit 61608a4 into apache:master Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants