Skip to content

Proposed changes to Append CT PR.#1

Merged
ArthurSens merged 3 commits intoArthurSens:scrape-synthetic-zerofrom
bwplotka:arct2
Dec 8, 2023
Merged

Proposed changes to Append CT PR.#1
ArthurSens merged 3 commits intoArthurSens:scrape-synthetic-zerofrom
bwplotka:arct2

Conversation

@bwplotka
Copy link
Copy Markdown

@bwplotka bwplotka commented Dec 7, 2023

Changes:

  • Changed textparse Parser interface for consistency and robustness.
  • Changed CT interface to be more explicit and handle validation.
  • Simplified test, change scrapeManager to allow testability.
  • Added TODOs - we need appender tests and check how hard is to get agent to use it too (and if we need that)

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.


Signed-off-by: bwplotka <[email protected]>
Signed-off-by: bwplotka <[email protected]>
Copy link
Copy Markdown
Owner

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this and challenging many decisions I've made!

I liked how you changed many places, e.g. feature-flag and method names, to make a point that we're ingesting zeros.

One thing that I couldn't understand was the metadata test, what are we trying to accomplish over there? 🤔

Comment on lines +248 to 252
// CreatedTimestamp returns nil as it's not implemented yet.
// TODO(bwplotka): https://github.com/prometheus/prometheus/issues/12980
func (p *PromParser) CreatedTimestamp() *int64 {
return nil
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the impression we didn't want to implement CT for Prometheus text. The issue mentioned here also doesn't mention prometheus text parsing 🤔

func (p *ProtobufParser) CreatedTimestamp(ct *types.Timestamp) bool {
var foundCT *types.Timestamp
// CreatedTimestamp returns CT or nil if CT is not present or
// invalid (as timestamp e.g. negative value) on counters, summaries or histograms.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// invalid (as timestamp e.g. negative value) on counters, summaries or histograms.
// invalid (e.g. a timestamp with negative value) on counters, summaries or histograms.

Not sure if that's what you meant here, but it reads weird 🤔

Comment on lines +377 to +379
if err != nil {
// Errors means ct == nil or invalid timestamp, which we silently ignore.
return nil
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, ideally we would want to at least log client-side errors. Would it make sense to return an error as well?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also change the proto definition to return int64 like callum suggested and not worry with conversion errors

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return where? So by making int64 we don't magically remove error surface. We simply not check things. So why not doing the same here and simply not check things? (:

Also how do you propose changing field to int64 without braking changes? Do you propose adding another field (as we should in proto)? What would you do with other timestamp fields that have google Timestamp type? (:

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that adding a new message is the right way of doing backward/forward compatibility, but since the proto change was released only in Prometheus 2.48 and seeing that proto changes weren't announced in the changelog, I don't expect that replacing the field will break anyone (even though this is a breaking change)

tsdb/agent/db.go Outdated
func (a *appender) AppendCreatedTimestamp(ref storage.SeriesRef, l labels.Labels, t int64) (storage.SeriesRef, error) {
func (a *appender) AppendCTZeroSample(ref storage.SeriesRef, l labels.Labels, t int64, ct int64) (storage.SeriesRef, error) {
// TODO(bwplotka): Implement
panic("to implement")
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic? 😱

Don't Prometheus server and agent share the same scrape codebase? Won't this cause agents to panic on every scrape? 🤔

Copy link
Copy Markdown
Author

@bwplotka bwplotka Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will, but only for this PR. I planned to do extra work in this PR, but now I think we should postpone, so fixed in next commit to this PR (pushing in a 10m)

// Disable regular scrapes.
ScrapeInterval: model.Duration(9999 * time.Minute),
ScrapeTimeout: model.Duration(5 * time.Second),
// Ensure proto is chosen.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point adding a comment here! Maybe we could extend by saying that we're choosing proto because at the time it is the only protocol that parses CT? We can remove this option once we add more protocols

}))

// Start fake HTTP target to scrape returning a single metric.
once := sync.Once{}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about once

scrape/scrape.go Outdated
Comment on lines +1572 to +1577
if ctMs := p.CreatedTimestamp(); sl.enableCTZeroIngestion && ctMs != nil {
ref, err = app.AppendCTZeroSample(ref, lset, t, *ctMs)
if err != nil && !errors.Is(err, storage.ErrOutOfOrderCT) { // OOO is a common case, ignoring completely for now.
// CT is an experimental feature. For now, we don't need to fail the
// scrape on errors updating the created timestamp, log debug.
level.Debug(sl.l).Log("msg", "Error when updating metadata in scrape loop", "series", string(met), "ct", *ctMs, "t", t, "err", err)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks so much simpler now!

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the log message looks like a copy-paste mistake? We're not updating metadata yet, are we?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whooops, great catch!

Signed-off-by: bwplotka <[email protected]>
@ArthurSens ArthurSens merged commit 385c860 into ArthurSens:scrape-synthetic-zero Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants