* Adding scape on shutdown
Signed-off-by: avilevy <avilevy@google.com>
* scrape: replace skipOffsetting to make the test offset deterministic instead of skipping it entirely
Signed-off-by: avilevy <avilevy@google.com>
* renamed calculateScrapeOffset to getScrapeOffset
Signed-off-by: avilevy <avilevy@google.com>
* discovery: Add skipStartupWait to bypass initial discovery delay
In short-lived environments like agent mode or serverless, the
Prometheus process may only execute for a few seconds. Waiting for
the default 5-second `updatert` ticker before sending the first
target groups means the process could terminate before collecting
any metrics at all.
This commit adds a `skipStartupWait` option to the Discovery Manager
to bypass this initial delay. When enabled, the sender uses an
unthrottled startup loop that instantly forwards all triggers. This
ensures both the initial empty update from `ApplyConfig` and the
first real targets from discoverers are passed downstream immediately.
After the first ticker interval elapses, the sender cleanly breaks out
of the startup phase, resets the ticker, and resumes standard
operations.
Signed-off-by: avilevy <avilevy@google.com>
* scrape: Bypass initial reload delay for ScrapeOnShutdown
In short-lived environments like agent mode or serverless, the default
5-second `DiscoveryReloadInterval` can cause the process to terminate
before the scrape manager has a chance to process targets and collect
any metrics.
Because the discovery manager sends an initial empty update upon
configuration followed rapidly by the actual targets, simply waiting
for a single reload trigger is insufficient—the real targets would
still get trapped behind the ticker delay.
This commit introduces an unthrottled startup loop in the `reloader`
when `ScrapeOnShutdown` is enabled. It processes all incoming
`triggerReload` signals immediately during the first interval. Once
the initial tick fires, the `reloader` resets the ticker and falls
back into its standard throttled loop, ensuring short-lived processes
can discover and scrape targets instantly.
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): refactor time-based manager tests to use synctest
Addresses PR feedback to remove flaky, time-based sleeping in the scrape manager tests.
Add TestManager_InitialScrapeOffset and TestManager_ScrapeOnShutdown to use the testing/synctest package, completely eliminating real-world time.Sleep delays and making the assertions 100% deterministic.
- Replaced httptest.Server with net.Pipe and a custom startFakeHTTPServer helper to ensure all network I/O remains durably blocked inside the synctest bubble.
- Leveraged the skipOffsetting option to eliminate random scrape jitter, making the time-travel math exact and predictable.
- Using skipOffsetting also safely bypasses the global singleflight DNS lookup in setOffsetSeed, which previously caused cross-bubble panics in synctest.
- Extracted shared boilerplate into a setupSynctestManager helper to keep the test cases highly readable and data-driven.
Signed-off-by: avilevy <avilevy@google.com>
* Clarify use cases in InitialScrapeOffset comment
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): use httptest for mock server to respect context cancellation
- Replaced manual HTTP string formatting over `net.Pipe` with `httptest.NewUnstartedServer`.
- Implemented an in-memory `pipeListener` to allow the server to handle `net.Pipe` connections directly. This preserves `synctest` time isolation without opening real OS ports.
- Added explicit `r.Context().Done()` handling in the mock HTTP handler to properly simulate aborted requests and scrape timeouts.
- Validates that the request context remains active and is not prematurely cancelled during `ScrapeOnShutdown` scenarios.
- Renamed `skipOffsetting` to `skipJitterOffsetting`.
- Addressed other PR comments.
Signed-off-by: avilevy <avilevy@google.com>
* tmp
Signed-off-by: bwplotka <bwplotka@gmail.com>
* exp2
Signed-off-by: bwplotka <bwplotka@gmail.com>
* fix
Signed-off-by: bwplotka <bwplotka@gmail.com>
* scrape: fix scrapeOnShutdown context bug and refactor test helpers
The scrapeOnShutdown feature was failing during manager shutdown because
the scrape pool context was being cancelled before the final shutdown
scrapes could execute. Fix this by delaying context cancellation
in scrapePool.stop() until after all scrape loops have stopped.
In addition:
- Added test cases to verify scrapeOnShutdown works with InitialScrapeOffset.
- Refactored network test helper functions from manager_test.go to
helpers_test.go.
- Addressed other comments.
Signed-off-by: avilevy <avilevy@google.com>
* Update scrape/scrape.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
* feat(discovery): add SkipInitialWait to bypass initial startup delay
This adds a SkipInitialWait option to the discovery Manager, allowing consumers sensitive to startup latency to receive the first batch of discovered targets immediately instead of waiting for the updatert ticker.
To support this without breaking the immediate dropped target notifications introduced in #13147, ApplyConfig now uses a keep flag to only trigger immediate downstream syncs for obsolete or updated providers. This prevents sending premature empty target groups for brand-new providers on initial startup.
Additionally, the scrape manager's reloader loop is updated to process the initial triggerReload immediately, ensuring the end-to-end pipeline processes initial targets without artificial delays.
Signed-off-by: avilevy <avilevy@google.com>
* scrape: Add TestManagerReloader and refactor discovery triggerSync
Adds a new TestManagerReloader test suite using synctest to assert
behavior of target updates, discovery reload ticker intervals, and
ScrapeOnShutdown flags.
Updates setupSynctestManager to allow skipping initial config setup by
passing an interval of 0.
Also renames the 'keep' variable to 'triggerSync' in ApplyConfig inside
discovery/manager.go for clarity, and adds a descriptive comment.
Signed-off-by: avilevy <avilevy@google.com>
* feat(discovery,scrape): rename startup wait options and add DiscoveryReloadOnStartup
- discovery: Rename `SkipInitialWait` to `SkipStartupWait` for clarity.
- discovery: Pass `context.Context` to `flushUpdates` to handle cancellation and avoid leaks.
- scrape: Add `DiscoveryReloadOnStartup` to `Options` to decouple startup discovery from `ScrapeOnShutdown`.
- tests: Refactor `TestTargetSetTargetGroupsPresentOnStartup` and `TestManagerReloader` to use table-driven tests and `synctest` for better stability and coverage.
Signed-off-by: avilevy <avilevy@google.com>
* feat(discovery,scrape): importing changes proposed in 043d710
- Refactor sender to use exponential backoff
- Replaces `time.NewTicker` in `sender()` with an exponential backoff
to prevent panics on non-positive intervals and better throttle updates.
- Removes obsolete `skipStartupWait` logic.
- Refactors `setupSynctestManager` to use an explicit `initConfig` argument
Signed-off-by: avilevy <avilevy@google.com>
* fix: updating go mod
Signed-off-by: avilevy <avilevy@google.com>
* fixing merge
Signed-off-by: avilevy <avilevy@google.com>
* fixing issue: 2 variables but NewTestMetrics returns 1 value
Signed-off-by: avilevy <avilevy@google.com>
* Update discovery/manager.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
* Refactor setupSynctestManager initConfig into a separate function
Signed-off-by: avilevy <avilevy@google.com>
---------
Signed-off-by: avilevy <avilevy@google.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
The package makes vulnerability scanners unhappy, and the functionality is available in the smaller moby/moby packages.
Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>
Add Outscale VM service discovery using osc-sdk-go, including optional secret_key_file support, metrics, docs, and configuration examples. Document the default region (eu-west-2).
Signed-off-by: Aurelien Duboc <aurelienduboc96@gmail.com>
Bump github.com/pb33f/libopenapi to v0.34.4, which upgrades github.com/buger/jsonparser
to v1.1.2 to address CVE-2026-32285.
Signed-off-by: Tomer Gabay <10978711+sTomerG@users.noreply.github.com>
* discovery/vultr: upgrade govultr from v2 to v3
The govultr/v2 library is no longer actively maintained. Upgrade to
govultr/v3 (v3.28.1) which receives regular updates and security
patches.
The v3 library is API-compatible with v2 for the Instance.List
method used by the Vultr SD, with the only change being an
additional *http.Response return value.
Signed-off-by: Pierluigi Lenoci <pierluigi.lenoci@gmail.com>
* discovery/vultr: check HTTP response status code
Validate that the Vultr API returns a 2xx status code after listing
instances, as the *http.Response from govultr v3 is now available.
Signed-off-by: Pierluigi Lenoci <pierluigi.lenoci@gmail.com>
* discovery/vultr: fix linter error in error string capitalization
Error strings should not be capitalized per Go conventions (ST1005).
Signed-off-by: Pierluigi Lenoci <pierluigi.lenoci@gmail.com>
---------
Signed-off-by: Pierluigi Lenoci <pierluigi.lenoci@gmail.com>
Replace gopkg.in/yaml.v2 and gopkg.in/yaml.v3 imports with
go.yaml.in/yaml/v2 and go.yaml.in/yaml/v3 respectively.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Add OpenAPI 3.2 specification generation for Prometheus HTTP API
This commit introduces an OpenAPI specification for the Prometheus API.
After testing multiple code-generation servers with built-in APIs, this
implementation uses an independent spec file outside of the critical path.
This spec file is tested with a framework present in this pull request.
The specification helps clients know which parameters they can use and is
served at /api/v1/openapi.yaml. The spec file will evolve with the
Prometheus API and has the same version number.
Downstream projects can tune the APIs presented in the spec file with
configuration options using the IncludePaths setting for path filtering.
In the future, there is room to generate a server from this spec file
(e.g. with interfaces), but this is out of scope for this pull request.
Architecture:
- Core OpenAPI infrastructure (openapi.go): Dynamic spec building,
caching, and thread-safe spec generation
- Schema definitions (openapi_schemas.go): Complete type definitions
for all API request and response types
- Path specifications (openapi_paths.go): Endpoint definitions with
parameters, request bodies, and response schemas
- Examples (openapi_examples.go): Realistic request/response examples
- Helper functions (openapi_helpers.go): Reusable builders for common
OpenAPI structures
Testing:
- Comprehensive test suite with golden file validation
- Test helpers package for API testing infrastructure
- OpenAPI compliance validation utilities
The golden file captures the complete specification for snapshot testing.
Update with: go test -run TestOpenAPIGolden -update-openapi-spec
REVIEWERS: The most important thing to check would be the OpenAPI golden
file (web/api/v1/testdata/openapi_golden.yaml). Test scenarios are important
as they test the actual OpenAPI spec validity.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Add OpenAPI 3.1 support with version selection
Add support for both OpenAPI 3.1 and 3.2 specifications with version
selection via openapi_version query parameter. Defaults to 3.1 for
broader compatibility
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Enhance OpenAPI examples and add helper functions
- Add timestampExamples helper for consistent time formatting
- Add exampleMap helper to simplify example creation
- Improve example summaries with query details
- Add matrix result example for range vector queries
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* web/api: Add AtST method to test helper iterators
Implement the AtST() method required by chunkenc.Iterator interface
for FakeSeriesIterator and FakeHistogramSeriesIterator test helpers.
The method returns 0 as these test helpers don't use start timestamps
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* OpenAPI: Add minimum coverage test
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* OpenAPI: Improve examples handling
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
---------
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* otlptranslator: add label caching for OTLP-to-Prometheus conversion
Add per-request caching to reduce redundant computation and allocations
during OTLP metric conversion:
1. Per-request label sanitization cache: Cache sanitized label names
within a request to avoid repeated string allocations for commonly
repeated labels like __name__, job, instance.
2. Resource-level label caching: Precompute and cache job, instance,
promoted resource attributes, and external labels once per
ResourceMetrics boundary instead of for each datapoint.
3. Scope-level label caching: Precompute and cache scope metadata labels
(otel_scope_name, otel_scope_version, etc.) once per ScopeMetrics
boundary.
4. LabelNamer instance caching: Reuse the LabelNamer struct across
datapoints within the same resource context.
These optimizations significantly reduce allocations and improve latency
for OTLP ingestion workloads with many datapoints per resource/scope.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>