Reducing memory allocation from client-side stats#10705
Reducing memory allocation from client-side stats#10705
Conversation
Introduced DDCache-s around UTF8BytesString construction UTF8BytesString are advantageous for serialization, but that only applies to the key instance that is actually serialized Most key instances here are being created to do a look-up into the map. so the UTF8BytesString creation was extra work. This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR. As is this changer reduces the impact on application throughput by 5-20% depending on the heap size.
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 66 metrics, 5 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1058393
Total [baseline] (11.001 s) : 0, 11000773
Agent [candidate] (1.058 s) : 0, 1058485
Total [candidate] (11.007 s) : 0, 11007454
section appsec
Agent [baseline] (1.246 s) : 0, 1246132
Total [baseline] (11.172 s) : 0, 11171522
Agent [candidate] (1.244 s) : 0, 1244422
Total [candidate] (11.154 s) : 0, 11154275
section iast
Agent [baseline] (1.225 s) : 0, 1225146
Total [baseline] (11.29 s) : 0, 11289807
Agent [candidate] (1.233 s) : 0, 1232797
Total [candidate] (11.343 s) : 0, 11343473
section profiling
Agent [baseline] (1.182 s) : 0, 1182434
Total [baseline] (10.97 s) : 0, 10969563
Agent [candidate] (1.181 s) : 0, 1181073
Total [candidate] (11.042 s) : 0, 11041774
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.182 ms) : 0, 1182
BytebuddyAgent [baseline] (628.345 ms) : 0, 628345
BytebuddyAgent [candidate] (627.002 ms) : 0, 627002
AgentMeter [baseline] (29.198 ms) : 0, 29198
AgentMeter [candidate] (29.102 ms) : 0, 29102
GlobalTracer [baseline] (257.145 ms) : 0, 257145
GlobalTracer [candidate] (256.654 ms) : 0, 256654
AppSec [baseline] (31.577 ms) : 0, 31577
AppSec [candidate] (31.402 ms) : 0, 31402
Debugger [baseline] (59.777 ms) : 0, 59777
Debugger [candidate] (59.445 ms) : 0, 59445
Remote Config [baseline] (596.963 µs) : 0, 597
Remote Config [candidate] (581.85 µs) : 0, 582
Telemetry [baseline] (8.713 ms) : 0, 8713
Telemetry [candidate] (8.596 ms) : 0, 8596
Flare Poller [baseline] (5.801 ms) : 0, 5801
Flare Poller [candidate] (8.65 ms) : 0, 8650
section appsec
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.185 ms) : 0, 1185
BytebuddyAgent [baseline] (658.215 ms) : 0, 658215
BytebuddyAgent [candidate] (657.656 ms) : 0, 657656
AgentMeter [baseline] (11.994 ms) : 0, 11994
AgentMeter [candidate] (12.025 ms) : 0, 12025
GlobalTracer [baseline] (257.906 ms) : 0, 257906
GlobalTracer [candidate] (257.686 ms) : 0, 257686
AppSec [baseline] (177.519 ms) : 0, 177519
AppSec [candidate] (177.137 ms) : 0, 177137
Debugger [baseline] (65.154 ms) : 0, 65154
Debugger [candidate] (65.51 ms) : 0, 65510
Remote Config [baseline] (580.65 µs) : 0, 581
Remote Config [candidate] (570.64 µs) : 0, 571
Telemetry [baseline] (9.786 ms) : 0, 9786
Telemetry [candidate] (8.943 ms) : 0, 8943
Flare Poller [baseline] (3.6 ms) : 0, 3600
Flare Poller [candidate] (3.585 ms) : 0, 3585
IAST [baseline] (23.938 ms) : 0, 23938
IAST [candidate] (23.94 ms) : 0, 23940
section iast
crashtracking [baseline] (1.184 ms) : 0, 1184
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (794.809 ms) : 0, 794809
BytebuddyAgent [candidate] (800.519 ms) : 0, 800519
AgentMeter [baseline] (11.315 ms) : 0, 11315
AgentMeter [candidate] (11.539 ms) : 0, 11539
GlobalTracer [baseline] (246.82 ms) : 0, 246820
GlobalTracer [candidate] (248.102 ms) : 0, 248102
AppSec [baseline] (26.449 ms) : 0, 26449
AppSec [candidate] (26.515 ms) : 0, 26515
Debugger [baseline] (63.217 ms) : 0, 63217
Debugger [candidate] (63.348 ms) : 0, 63348
Remote Config [baseline] (541.977 µs) : 0, 542
Remote Config [candidate] (538.707 µs) : 0, 539
Telemetry [baseline] (14.681 ms) : 0, 14681
Telemetry [candidate] (14.898 ms) : 0, 14898
Flare Poller [baseline] (4.915 ms) : 0, 4915
Flare Poller [candidate] (4.893 ms) : 0, 4893
IAST [baseline] (25.164 ms) : 0, 25164
IAST [candidate] (25.278 ms) : 0, 25278
section profiling
ProfilingAgent [baseline] (93.85 ms) : 0, 93850
ProfilingAgent [candidate] (93.576 ms) : 0, 93576
crashtracking [baseline] (1.168 ms) : 0, 1168
crashtracking [candidate] (1.169 ms) : 0, 1169
BytebuddyAgent [baseline] (683.21 ms) : 0, 683210
BytebuddyAgent [candidate] (682.336 ms) : 0, 682336
AgentMeter [baseline] (8.604 ms) : 0, 8604
AgentMeter [candidate] (8.586 ms) : 0, 8586
GlobalTracer [baseline] (215.309 ms) : 0, 215309
GlobalTracer [candidate] (215.376 ms) : 0, 215376
AppSec [baseline] (32.032 ms) : 0, 32032
AppSec [candidate] (31.918 ms) : 0, 31918
Debugger [baseline] (62.887 ms) : 0, 62887
Debugger [candidate] (62.873 ms) : 0, 62873
Remote Config [baseline] (586.981 µs) : 0, 587
Remote Config [candidate] (579.334 µs) : 0, 579
Telemetry [baseline] (9.746 ms) : 0, 9746
Telemetry [candidate] (10.411 ms) : 0, 10411
Flare Poller [baseline] (4.306 ms) : 0, 4306
Flare Poller [candidate] (3.554 ms) : 0, 3554
Profiling [baseline] (94.411 ms) : 0, 94411
Profiling [candidate] (94.136 ms) : 0, 94136
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.066 s) : 0, 1066102
Total [baseline] (8.895 s) : 0, 8894684
Agent [candidate] (1.063 s) : 0, 1063281
Total [candidate] (8.803 s) : 0, 8802710
section iast
Agent [baseline] (1.225 s) : 0, 1224996
Total [baseline] (9.552 s) : 0, 9552138
Agent [candidate] (1.224 s) : 0, 1224428
Total [candidate] (9.571 s) : 0, 9571258
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.213 ms) : 0, 1213
BytebuddyAgent [baseline] (633.259 ms) : 0, 633259
BytebuddyAgent [candidate] (631.018 ms) : 0, 631018
AgentMeter [baseline] (29.278 ms) : 0, 29278
AgentMeter [candidate] (29.262 ms) : 0, 29262
GlobalTracer [baseline] (258.51 ms) : 0, 258510
GlobalTracer [candidate] (257.756 ms) : 0, 257756
AppSec [baseline] (31.86 ms) : 0, 31860
AppSec [candidate] (31.761 ms) : 0, 31761
Debugger [baseline] (59.268 ms) : 0, 59268
Debugger [candidate] (59.076 ms) : 0, 59076
Remote Config [baseline] (590.816 µs) : 0, 591
Remote Config [candidate] (598.962 µs) : 0, 599
Telemetry [baseline] (8.735 ms) : 0, 8735
Telemetry [candidate] (8.684 ms) : 0, 8684
Flare Poller [baseline] (7.26 ms) : 0, 7260
Flare Poller [candidate] (7.838 ms) : 0, 7838
section iast
crashtracking [baseline] (1.191 ms) : 0, 1191
crashtracking [candidate] (1.2 ms) : 0, 1200
BytebuddyAgent [baseline] (795.605 ms) : 0, 795605
BytebuddyAgent [candidate] (794.212 ms) : 0, 794212
AgentMeter [baseline] (11.278 ms) : 0, 11278
AgentMeter [candidate] (11.321 ms) : 0, 11321
GlobalTracer [baseline] (246.708 ms) : 0, 246708
GlobalTracer [candidate] (246.802 ms) : 0, 246802
IAST [baseline] (25.175 ms) : 0, 25175
IAST [candidate] (25.132 ms) : 0, 25132
AppSec [baseline] (26.42 ms) : 0, 26420
AppSec [candidate] (26.391 ms) : 0, 26391
Debugger [baseline] (62.48 ms) : 0, 62480
Debugger [candidate] (63.016 ms) : 0, 63016
Remote Config [baseline] (520.959 µs) : 0, 521
Remote Config [candidate] (516.654 µs) : 0, 517
Telemetry [baseline] (14.717 ms) : 0, 14717
Telemetry [candidate] (14.935 ms) : 0, 14935
Flare Poller [baseline] (4.896 ms) : 0, 4896
Flare Poller [candidate] (4.919 ms) : 0, 4919
LoadParameters
See matching parameters
SummaryFound 3 performance improvements and 2 performance regressions! Performance is the same for 14 metrics, 17 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section baseline
no_agent (1.232 ms) : 1220, 1244
. : milestone, 1232,
iast (3.204 ms) : 3160, 3248
. : milestone, 3204,
iast_FULL (5.878 ms) : 5819, 5937
. : milestone, 5878,
iast_GLOBAL (3.557 ms) : 3504, 3610
. : milestone, 3557,
profiling (2.051 ms) : 2031, 2072
. : milestone, 2051,
tracing (1.801 ms) : 1786, 1816
. : milestone, 1801,
section candidate
no_agent (1.213 ms) : 1200, 1225
. : milestone, 1213,
iast (3.108 ms) : 3068, 3148
. : milestone, 3108,
iast_FULL (5.625 ms) : 5569, 5681
. : milestone, 5625,
iast_GLOBAL (3.392 ms) : 3337, 3447
. : milestone, 3392,
profiling (2.146 ms) : 2125, 2166
. : milestone, 2146,
tracing (1.766 ms) : 1750, 1781
. : milestone, 1766,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section baseline
no_agent (18.06 ms) : 17875, 18245
. : milestone, 18060,
appsec (18.688 ms) : 18499, 18878
. : milestone, 18688,
code_origins (17.738 ms) : 17563, 17914
. : milestone, 17738,
iast (17.826 ms) : 17645, 18006
. : milestone, 17826,
profiling (18.809 ms) : 18622, 18997
. : milestone, 18809,
tracing (17.581 ms) : 17411, 17752
. : milestone, 17581,
section candidate
no_agent (18.077 ms) : 17893, 18261
. : milestone, 18077,
appsec (18.703 ms) : 18515, 18891
. : milestone, 18703,
code_origins (17.827 ms) : 17648, 18007
. : milestone, 17827,
iast (17.814 ms) : 17637, 17992
. : milestone, 17814,
profiling (18.614 ms) : 18430, 18799
. : milestone, 18614,
tracing (18.813 ms) : 18619, 19006
. : milestone, 18813,
DacapoParameters
See matching parameters
SummaryFound 1 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 0 unstable metrics.
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section baseline
no_agent (1.47 ms) : 1458, 1482
. : milestone, 1470,
appsec (3.774 ms) : 3553, 3995
. : milestone, 3774,
iast (2.243 ms) : 2174, 2312
. : milestone, 2243,
iast_GLOBAL (2.289 ms) : 2219, 2358
. : milestone, 2289,
profiling (2.102 ms) : 2046, 2158
. : milestone, 2102,
tracing (2.069 ms) : 2016, 2123
. : milestone, 2069,
section candidate
no_agent (1.474 ms) : 1463, 1486
. : milestone, 1474,
appsec (2.51 ms) : 2455, 2565
. : milestone, 2510,
iast (2.256 ms) : 2186, 2325
. : milestone, 2256,
iast_GLOBAL (2.295 ms) : 2226, 2365
. : milestone, 2295,
profiling (2.071 ms) : 2016, 2125
. : milestone, 2071,
tracing (2.071 ms) : 2018, 2125
. : milestone, 2071,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~419a462a59, baseline=1.61.0-SNAPSHOT~c04d61b318
dateFormat X
axisFormat %s
section baseline
no_agent (14.952 s) : 14952000, 14952000
. : milestone, 14952000,
appsec (15.074 s) : 15074000, 15074000
. : milestone, 15074000,
iast (17.92 s) : 17920000, 17920000
. : milestone, 17920000,
iast_GLOBAL (17.888 s) : 17888000, 17888000
. : milestone, 17888000,
profiling (15.456 s) : 15456000, 15456000
. : milestone, 15456000,
tracing (15.333 s) : 15333000, 15333000
. : milestone, 15333000,
section candidate
no_agent (14.713 s) : 14713000, 14713000
. : milestone, 14713000,
appsec (14.895 s) : 14895000, 14895000
. : milestone, 14895000,
iast (18.327 s) : 18327000, 18327000
. : milestone, 18327000,
iast_GLOBAL (17.8 s) : 17800000, 17800000
. : milestone, 17800000,
profiling (15.202 s) : 15202000, 15202000
. : milestone, 15202000,
tracing (15.03 s) : 15030000, 15030000
. : milestone, 15030000,
|
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = DDCaches.newFixedSizeCache(4); |
There was a problem hiding this comment.
This should be already filled with UTF8BytesString. It can take values of the instrumentation name (that's should also be UTF8BytesString) so I suggest removing caching for this specific one. 4, is also too small
There was a problem hiding this comment.
The lookup logic accounts for receiving a UTF8BytesString and in that case bypasses the cache entirely.
Also, I erred on the side of keeping the caches small. I just want to eliminate most of the allocation - not all of the allocation.
That said, I may have misunderstood what "serviceSource" is so maybe that one should be larger. If it is instrumentations, then maybe we should increase it to 16. (The thinking being that only a few instrumentations are typically active in a given system.)
There was a problem hiding this comment.
If it can only be a UTF8BytesString, we could also tighten the type in the constructor. Although in that case, the JIT would dead code eliminate using the cache anyway (except for the initial allocation).
| static final DDCache<String, UTF8BytesString> TYPE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> KIND_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_METHOD_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> HTTP_ENDPOINT_CACHE = DDCaches.newFixedSizeCache(32); |
There was a problem hiding this comment.
Would it be possible to widen those a bit? Also, for the http endpoint, I'm wondering if we should just do that caching earlier to make other serialisation benefitting of this (i.e. in EndpointResolver)
There was a problem hiding this comment.
I erred on the side of making the caches small.
The thinking is any object reuse is an improvement over what we were doing, but consuming too much memory could be harmful.
| } | ||
|
|
||
| static UTF8BytesString utf8(DDCache<String, UTF8BytesString> cache, CharSequence charSeq) { | ||
| if ( charSeq == null ) { |
There was a problem hiding this comment.
There are things that are supposing to check nullity of this. Now if we replace with empty is not more the same semantic and this will break some code. I suggest to let the caller return the default value so it won't break the existing logic
There was a problem hiding this comment.
Nevermind - I see that I did accidentally change the semantics for method & endpoint. I'll fix that.
I kept the semantics that already existed. If you look at the prior code, null was replaced with EMPTY, so I did the same.
There was a problem hiding this comment.
well there are things that are OK to return empty but others needs to be literally null (i.e. service source, httpMethod, httpEndpoint)
There was a problem hiding this comment.
Yes, I realized after my initial reply. I'll double check all of them.
There was a problem hiding this comment.
Since the split of EMPTY vs null was about 50 / 50, I decided to just restore the null checks in the constructor.
- fix null handling for http method & endpoint - increase service source cache size
Since it about a 50/50 split between cases that use EMPTY vs null, when a null is passed I decided to just put the null handling back into the constructor. That makes the choice more explicit, and makes the PR easier to review / compare to the prior logic
| public final class MetricKey { | ||
| static final DDCache<String, UTF8BytesString> RESOURCE_CACHE = DDCaches.newFixedSizeCache(32); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_CACHE = DDCaches.newFixedSizeCache(8); | ||
| static final DDCache<String, UTF8BytesString> SERVICE_SOURCE_CACHE = |
There was a problem hiding this comment.
That field does not need to be cached, for two main reasons:
- It is already either a
UTF8BytesStringornull. In fact, it comes from thecomponent()method, which in helpers typically returns aUTF8BytesString. - Its cardinality matches that of the instrumentations, the size of the cache is too small.
The constructor calling UTF8BytesString.create() may look confusing at first glance, but in this case it does not allocate a new object — it simply returns the existing instance. It was kept for consistency with the usual pattern.
Unless there are strong counterarguments, I would suggest removing the cache for this field.
There was a problem hiding this comment.
I think the key part here is "typically returns a UTF8BytesString".
The static typing doesn't currently guarantee it.
And the caching code accounts for the case of receiving a UTF8BytesString by bypassing the cache. In the same manner as UTF8BytesString.create did previously.
When a UTF8BytesString is passed, the cache incurs no overhead other than the static overhead of the cache array itself. But in the case where something other than UTF8BytesString is passed, the cache can significantly reduce memory consumption.
I think that's important because we need to curtail the worst case outcomes.
As for the size, the cardinality doesn't need to match total instrumentations. That would be too large. The size just needs to accommodate the active span producing instrumentations. As I said, I erred on the small size, since I want to avoid the worst case of consuming a lot of static memory.
And assuming a UTF8BytesString is introducing a subtle form of coupling that could easily be compromised by someone later on.
amarziali
left a comment
There was a problem hiding this comment.
Thanks for that improvement. wrt service name source, as per our chat, we might simplify trying changing the signature of that constructor if possible and shortcut the cache if we ensure that every value is already an UTF8BytesString itself.
What Does This Do
Introduces DDCache-s around UTF8BytesString construction
Motivation
UTF8BytesString are advantageous for serialization, but that only applies to the MetricKey instance that is actually serialized
Most MetricKey instances are only created to do a look-up into the map. so the UTF8BytesString creation was just extra work
This change reduces allocation by 6% and GC time by 7% in span creation stress test.
This change reduces impact on application throughput by 5-20% depending on heap size.
Additional Notes
This change is intended as a quick fix. I think there's still a lot of room for improvement by restructuring the code further, but that is left for a future PR.
Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueJira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.