avilevy18 eb220862e5 discovery, scrape: Use backoff interval for throttling discovery updates; add DiscoveryReloadOnStartup option for short-lived environments (#18187)
* Adding scrape on shutdown

Signed-off-by: avilevy <avilevy@google.com>

* scrape: replace skipOffsetting to make the test offset deterministic instead of skipping it entirely

Signed-off-by: avilevy <avilevy@google.com>

* renamed calculateScrapeOffset to getScrapeOffset

Signed-off-by: avilevy <avilevy@google.com>

* discovery: Add skipStartupWait to bypass initial discovery delay

In short-lived environments like agent mode or serverless, the
Prometheus process may only execute for a few seconds. Waiting for
the default 5-second `updatert` ticker before sending the first
target groups means the process could terminate before collecting
any metrics at all.

This commit adds a `skipStartupWait` option to the Discovery Manager
to bypass this initial delay. When enabled, the sender uses an
unthrottled startup loop that instantly forwards all triggers. This
ensures both the initial empty update from `ApplyConfig` and the
first real targets from discoverers are passed downstream immediately.

After the first ticker interval elapses, the sender cleanly breaks out
of the startup phase, resets the ticker, and resumes standard
operations.

Signed-off-by: avilevy <avilevy@google.com>

* scrape: Bypass initial reload delay for ScrapeOnShutdown

In short-lived environments like agent mode or serverless, the default
5-second `DiscoveryReloadInterval` can cause the process to terminate
before the scrape manager has a chance to process targets and collect
any metrics.

Because the discovery manager sends an initial empty update upon
configuration followed rapidly by the actual targets, simply waiting
for a single reload trigger is insufficient—the real targets would
still get trapped behind the ticker delay.

This commit introduces an unthrottled startup loop in the `reloader`
when `ScrapeOnShutdown` is enabled. It processes all incoming
`triggerReload` signals immediately during the first interval. Once
the initial tick fires, the `reloader` resets the ticker and falls
back into its standard throttled loop, ensuring short-lived processes
can discover and scrape targets instantly.

Signed-off-by: avilevy <avilevy@google.com>

* test(scrape): refactor time-based manager tests to use synctest

Addresses PR feedback to remove flaky, time-based sleeping in the scrape manager tests.

Add TestManager_InitialScrapeOffset and TestManager_ScrapeOnShutdown to use the testing/synctest package, completely eliminating real-world time.Sleep delays and making the assertions 100% deterministic.

- Replaced httptest.Server with net.Pipe and a custom startFakeHTTPServer helper to ensure all network I/O remains durably blocked inside the synctest bubble.
- Leveraged the skipOffsetting option to eliminate random scrape jitter, making the time-travel math exact and predictable.
- Using skipOffsetting also safely bypasses the global singleflight DNS lookup in setOffsetSeed, which previously caused cross-bubble panics in synctest.
- Extracted shared boilerplate into a setupSynctestManager helper to keep the test cases highly readable and data-driven.

Signed-off-by: avilevy <avilevy@google.com>

* Clarify use cases in InitialScrapeOffset comment

Signed-off-by: avilevy <avilevy@google.com>

* test(scrape): use httptest for mock server to respect context cancellation

- Replaced manual HTTP string formatting over `net.Pipe` with `httptest.NewUnstartedServer`.
- Implemented an in-memory `pipeListener` to allow the server to handle `net.Pipe` connections directly. This preserves `synctest` time isolation without opening real OS ports.
- Added explicit `r.Context().Done()` handling in the mock HTTP handler to properly simulate aborted requests and scrape timeouts.
- Validates that the request context remains active and is not prematurely cancelled during `ScrapeOnShutdown` scenarios.
- Renamed `skipOffsetting` to `skipJitterOffsetting`.
- Addressed other PR comments.

Signed-off-by: avilevy <avilevy@google.com>

* tmp

Signed-off-by: bwplotka <bwplotka@gmail.com>

* exp2

Signed-off-by: bwplotka <bwplotka@gmail.com>

* fix

Signed-off-by: bwplotka <bwplotka@gmail.com>

* scrape: fix scrapeOnShutdown context bug and refactor test helpers
The scrapeOnShutdown feature was failing during manager shutdown because
the scrape pool context was being cancelled before the final shutdown
scrapes could execute. Fix this by delaying context cancellation
in scrapePool.stop() until after all scrape loops have stopped.
In addition:
- Added test cases to verify scrapeOnShutdown works with InitialScrapeOffset.
- Refactored network test helper functions from manager_test.go to
  helpers_test.go.
- Addressed other comments.

Signed-off-by: avilevy <avilevy@google.com>

* Update scrape/scrape.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>

* feat(discovery): add SkipInitialWait to bypass initial startup delay

This adds a SkipInitialWait option to the discovery Manager, allowing consumers sensitive to startup latency to receive the first batch of discovered targets immediately instead of waiting for the updatert ticker.

To support this without breaking the immediate dropped target notifications introduced in #13147, ApplyConfig now uses a keep flag to only trigger immediate downstream syncs for obsolete or updated providers. This prevents sending premature empty target groups for brand-new providers on initial startup.

Additionally, the scrape manager's reloader loop is updated to process the initial triggerReload immediately, ensuring the end-to-end pipeline processes initial targets without artificial delays.

Signed-off-by: avilevy <avilevy@google.com>

* scrape: Add TestManagerReloader and refactor discovery triggerSync

Adds a new TestManagerReloader test suite using synctest to assert
behavior of target updates, discovery reload ticker intervals, and
ScrapeOnShutdown flags.

Updates setupSynctestManager to allow skipping initial config setup by
passing an interval of 0.

Also renames the 'keep' variable to 'triggerSync' in ApplyConfig inside
discovery/manager.go for clarity, and adds a descriptive comment.

Signed-off-by: avilevy <avilevy@google.com>

* feat(discovery,scrape): rename startup wait options and add DiscoveryReloadOnStartup

- discovery: Rename `SkipInitialWait` to `SkipStartupWait` for clarity.
- discovery: Pass `context.Context` to `flushUpdates` to handle cancellation and avoid leaks.
- scrape: Add `DiscoveryReloadOnStartup` to `Options` to decouple startup discovery from `ScrapeOnShutdown`.
- tests: Refactor `TestTargetSetTargetGroupsPresentOnStartup` and `TestManagerReloader` to use table-driven tests and `synctest` for better stability and coverage.

Signed-off-by: avilevy <avilevy@google.com>

* feat(discovery,scrape): importing changes proposed in 043d710

- Refactor sender to use exponential backoff
- Replaces `time.NewTicker` in `sender()` with an exponential backoff
  to prevent panics on non-positive intervals and better throttle updates.
- Removes obsolete `skipStartupWait` logic.
- Refactors `setupSynctestManager` to use an explicit `initConfig` argument

Signed-off-by: avilevy <avilevy@google.com>

* fix: updating go mod

Signed-off-by: avilevy <avilevy@google.com>

* fixing merge

Signed-off-by: avilevy <avilevy@google.com>

* fixing issue: 2 variables but NewTestMetrics returns 1 value

Signed-off-by: avilevy <avilevy@google.com>

* Update discovery/manager.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>

* Refactor setupSynctestManager initConfig into a separate function

Signed-off-by: avilevy <avilevy@google.com>

---------

Signed-off-by: avilevy <avilevy@google.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2026-04-03 11:01:49 +01:00

Prometheus

Visit prometheus.io for the full documentation, examples and guides.

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

The features that distinguish Prometheus from other metrics and monitoring systems are:

  • A multi-dimensional data model (time series defined by metric name and set of key/value dimensions)
  • PromQL, a powerful and flexible query language to leverage this dimensionality
  • No dependency on distributed storage; single server nodes are autonomous
  • An HTTP pull model for time series collection
  • Pushing time series is supported via an intermediary gateway for batch jobs
  • Targets are discovered via service discovery or static configuration
  • Multiple modes of graphing and dashboarding support
  • Support for hierarchical and horizontal federation

Architecture overview

Install

There are various ways to install Prometheus.

Precompiled binaries

Precompiled binaries for released versions are available in the download section on prometheus.io. Using the latest production release binary is the recommended way to install Prometheus. See the Installing chapter in the documentation for all the details.

Docker images

Docker images are available on Quay.io or Docker Hub.

You can launch a Prometheus container to try it out with:

docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus

Prometheus will now be reachable at http://localhost:9090/.

Building from source

To build Prometheus from source code, you need:

  • Go: Version specified in go.mod or greater.
  • NodeJS: Version specified in .nvmrc or greater.
  • npm: Version 10 or greater (check with npm --version).

Start by cloning the repository:

git clone https://github.com/prometheus/prometheus.git
cd prometheus

You can use the go tool to build and install the prometheus and promtool binaries into your GOPATH:

go install github.com/prometheus/prometheus/cmd/...
prometheus --config.file=your_config.yml

However, when using go install to build Prometheus, Prometheus will expect to be able to read its web assets from local filesystem directories under web/ui/static. In order for these assets to be found, you will have to run Prometheus from the root of the cloned repository. Note also that this directory does not include the React UI unless it has been built explicitly using make assets or make build.

An example of the above configuration file can be found at documentation/examples/prometheus.yml in the repository.

You can also build using make build, which will compile in the web assets so that Prometheus can be run from anywhere:

make build
./prometheus --config.file=your_config.yml

The Makefile provides several targets:

  • build: build the prometheus and promtool binaries (includes building and compiling in web assets)
  • test: run the tests
  • test-short: run the short tests
  • format: format the source code
  • vet: check the source code for common errors
  • assets: build the React UI

Service discovery plugins

Prometheus is bundled with many service discovery plugins. You can customize which service discoveries are included in your build using Go build tags.

To exclude service discoveries when building with make build, add the desired tags to the .promu.yml file under build.tags.all:

build:
    tags:
        all:
            - netgo
            - builtinassets
            - remove_all_sd           # Exclude all optional SDs
            - enable_kubernetes_sd    # Re-enable only kubernetes

Then run make build as usual. Alternatively, when using go build directly:

go build -tags "remove_all_sd,enable_kubernetes_sd" ./cmd/prometheus

Available build tags:

  • remove_all_sd - Exclude all optional service discoveries (keeps file_sd, static_sd, and http_sd)
  • enable_<name>_sd - Re-enable a specific SD when using remove_all_sd

If you add out-of-tree plugins, which we do not endorse at the moment, additional steps might be needed to adjust the go.mod and go.sum files. As always, be extra careful when loading third party code.

Building the Docker image

You can build a docker image locally with the following commands:

make promu
promu crossbuild -p linux/amd64
make npm_licenses
make common-docker-amd64

The make docker target is intended only for use in our CI system and will not produce a fully working image when run locally.

Using Prometheus as a Go Library

Within the Prometheus project, repositories such as prometheus/common and prometheus/client_golang are designed as re-usable libraries.

The prometheus/prometheus repository builds a stand-alone program and is not designed for use as a library. We are aware that people do use parts as such, and we do not put any deliberate inconvenience in the way, but we want you to be aware that no care has been taken to make it work well as a library. For instance, you may encounter errors that only surface when used as a library.

Remote Write

We are publishing our Remote Write protobuf independently at buf.build.

You can use that as a library:

go get buf.build/gen/go/prometheus/prometheus/protocolbuffers/go@latest

This is experimental.

Prometheus code base

In order to comply with go mod rules, Prometheus release numbers do not exactly match Go module releases.

For the Prometheus v3.y.z releases, we are publishing equivalent v0.3y.z tags. The y in v0.3y.z is always padded to two digits, with a leading zero if needed.

Therefore, a user who wants to use Prometheus v3.0.0 as a library could do:

go get github.com/prometheus/prometheus@v0.300.0

For the Prometheus v2.y.z releases, we published the equivalent v0.y.z tags.

Therefore, a user who wants to use Prometheus v2.35.0 as a library could do:

go get github.com/prometheus/prometheus@v0.35.0

This solution makes it clear that we might break our internal Go APIs between minor user-facing releases, as breaking changes are allowed in major version zero.

React UI Development

For more information on building, running, and developing on the React-based UI, see the React app's README.md.

More information

  • Godoc documentation is available via pkg.go.dev. Due to peculiarities of Go Modules, v3.y.z will be displayed as v0.3y.z (the y in v0.3y.z is always padded to two digits, with a leading zero if needed), while v2.y.z will be displayed as v0.y.z.
  • See the Community page for how to reach the Prometheus developers and users on various communication channels.

Contributing

Refer to CONTRIBUTING.md

License

Apache License 2.0, see LICENSE.
