
Upgrade to OTel v0.116.0 #2314

Open

wants to merge 5 commits into main
Conversation

@ptodev (Contributor) commented Dec 24, 2024

PR Description

Upgrading to the latest version of OTel.

Community components

Pinging community component owners:

Please feel free to open a PR to update the community components :) There have been a few minor changes to the upstream code. You can either merge into my PR, or wait for my PR to be merged and then merge yours into main. It isn't necessary to update the community components for v1.6, since the upstream changes aren't that big. If you decide not to update them at all with the latest changes, that's OK too.

Which issue(s) this PR fixes

Fixes #2255
Fixes #2243

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@ptodev requested review from clayton-cornell and a team as code owners December 24, 2024 15:28
github-actions bot (Contributor) commented Dec 24, 2024

The forked Beyla version contains an updated OTel dependency.
Comment on lines +962 to +963
//TODO: Do not merge this. Wait for upstream to upgrade the main branch, or to release a new version.
replace github.com/grafana/beyla => github.com/grafana/beyla v1.9.1-0.20241230130037-7083b65bf473
@ptodev (Author) commented:

Unfortunately, we need to upgrade to a new version of Beyla as part of this OTel upgrade. The new Beyla version itself has to contain an upgrade to OTel 0.116.

@ptodev (Author) commented:

There are also lint and build issues due to Beyla dependencies.

ptodev added 2 commits January 1, 2025 22:35
The non-test package needs to have access to the gate,
so that when it's called from non-test code it can reliably register the feature gate.
@clayton-cornell added the type/docs label (Docs Squad label across all Grafana Labs repos) Jan 2, 2025
Comment on lines +80 to 83

> **EXPERIMENTAL**: Metrics support in `otelcol.exporter.loadbalancing` is an [experimental][] feature.
> Experimental features are subject to frequent breaking changes, and may be removed with no equivalent replacement.
> The `stability.level` flag must be set to `experimental` to use the feature.

Suggested change
> **EXPERIMENTAL**: Metrics support in `otelcol.exporter.loadbalancing` is an [experimental][] feature.
> Experimental features are subject to frequent breaking changes, and may be removed with no equivalent replacement.
> The `stability.level` flag must be set to `experimental` to use the feature.
[blocks]: #blocks
> **EXPERIMENTAL**: Metrics support in `otelcol.exporter.loadbalancing` is an [experimental][] feature.
> Experimental features are subject to frequent breaking changes, and may be removed with no equivalent replacement.
> The `stability.level` flag must be set to `experimental` to use the feature.
[experimental]: https://grafana.com/docs/release-life-cycle/

The `[blocks]` link definition is missing, and when we included the custom experimental text we forgot to add the link definition for the release life cycle.

Comment on lines +112 to +113
* The ones under `protocol > otlp`. This is useful for temporary problems with a specific backend, like transient network issues.
* The ones top-level ones for `otelcol.exporter.loadbalancing` itself.

Suggested change
* The ones under `protocol > otlp`. This is useful for temporary problems with a specific backend, like transient network issues.
* The ones top-level ones for `otelcol.exporter.loadbalancing` itself.
* The queue and retry blocks under `protocol > otlp`. This is useful for temporary problems with a specific backend, like transient network issues.
* The top-level queue and retry blocks for `otelcol.exporter.loadbalancing`.

* The ones under `protocol > otlp`. This is useful for temporary problems with a specific backend, like transient network issues.
* The ones top-level ones for `otelcol.exporter.loadbalancing` itself.
Those configuration options provide capability to re-route data into a new set of healthy backends.
This are useful for highly elastic environments like Kubernetes,

Suggested change
This are useful for highly elastic environments like Kubernetes,
This is useful for highly elastic environments like Kubernetes,
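
To make the two levels concrete, here is a rough sketch of an `otelcol.exporter.loadbalancing` configuration with retry configured at both levels. It assumes the per-backend and top-level blocks are both named `retry`, as the doc text above implies; the hostnames and durations are made up for illustration.

```alloy
otelcol.exporter.loadbalancing "default" {
  resolver {
    static {
      // Hypothetical backends, for illustration only.
      hostnames = ["backend-1:4317", "backend-2:4317"]
    }
  }

  protocol {
    otlp {
      client {}

      // Per-backend retry under protocol > otlp: useful for transient
      // problems with one specific backend.
      retry {
        max_elapsed_time = "1m"
      }
    }
  }

  // Top-level retry for the load balancer itself: gives the component a
  // chance to re-route data to a new set of healthy backends.
  retry {
    max_elapsed_time = "5m"
  }
}
```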

@@ -56,6 +58,17 @@ data without any of the well-known IP attributes. If the Deployment {{< param "P
{{< param "PRODUCT_NAME" >}}s deployed as DaemonSet, then some of those attributes might be missing. As a workaround,
you can configure the DaemonSet {{< param "PRODUCT_NAME" >}}s with `passthrough` set to `true`.

By default, `otelcol.processor.k8sattributes` will be ready as soon as it starts, even if no metadata has been fetched yet.

Suggested change
By default, `otelcol.processor.k8sattributes` will be ready as soon as it starts, even if no metadata has been fetched yet.
By default, `otelcol.processor.k8sattributes` is ready as soon as it starts, even if no metadata has been fetched yet.

If telemetry is sent to this processor before the metadata is synced, there will be no metadata to enrich the telemetry with.

To wait for the metadata to be synced before `otelcol.processor.k8sattributes` is ready, set the `wait_for_metadata` option to `true`.
Then the processor will not be ready until the metadata is fully synced. As a result, the start-up of {{< param "PRODUCT_NAME" >}} will be blocked.

Suggested change
Then the processor will not be ready until the metadata is fully synced. As a result, the start-up of {{< param "PRODUCT_NAME" >}} will be blocked.
Then, the processor will not be ready until the metadata is fully synced. As a result, the start-up of {{< param "PRODUCT_NAME" >}} will be blocked.

To wait for the metadata to be synced before `otelcol.processor.k8sattributes` is ready, set the `wait_for_metadata` option to `true`.
Then the processor will not be ready until the metadata is fully synced. As a result, the start-up of {{< param "PRODUCT_NAME" >}} will be blocked.
If the metadata cannot be synced by the time the `metadata_sync_timeout` duration is reached,
`otelcol.processor.k8sattributes` will become unhealthy and will fail to start.

Suggested change
`otelcol.processor.k8sattributes` will become unhealthy and will fail to start.
`otelcol.processor.k8sattributes` will become unhealthy and fail to start.

`otelcol.processor.k8sattributes` will become unhealthy and will fail to start.

If `otelcol.processor.k8sattributes` is unhealthy, other {{< param "PRODUCT_NAME" >}} components will still be able to start.
However, they may not be able to send telemetry to `otelcol.processor.k8sattributes`.

Suggested change
However, they may not be able to send telemetry to `otelcol.processor.k8sattributes`.
However, they may be unable to send telemetry to `otelcol.processor.k8sattributes`.
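
Putting the quoted behavior together, a minimal sketch of the new readiness options. Only `wait_for_metadata` and `metadata_sync_timeout` come from the doc text above; the timeout value and the downstream exporter name are placeholders.

```alloy
otelcol.processor.k8sattributes "default" {
  // Don't report ready until the Kubernetes metadata is synced...
  wait_for_metadata = true

  // ...and become unhealthy (and fail to start) if the sync doesn't
  // finish within this duration. The value here is illustrative.
  metadata_sync_timeout = "10s"

  output {
    // Placeholder downstream component.
    traces = [otelcol.exporter.otlp.default.input]
  }
}
```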

@@ -143,12 +156,30 @@ The `annotation` block configures how to extract Kubernetes annotations.

{{< docs/shared lookup="reference/components/extract-field-block.md" source="alloy" version="<ALLOY_VERSION>" >}}

{{< admonition type="warning" >}}

Suggested change
{{< admonition type="warning" >}}
{{< admonition type="caution" >}}

Comment on lines +322 to +323
This example will add the same new `"documentId"="12345678"` attribute as the previous example.
However, it will now result in an unchanged span name (/api/v1/document/12345678/update).

Suggested change
This example will add the same new `"documentId"="12345678"` attribute as the previous example.
However, it will now result in an unchanged span name (/api/v1/document/12345678/update).
This example adds the same new `"documentId"="12345678"` attribute as the previous example.
However, the span name is unchanged (/api/v1/document/12345678/update).


`decision_wait` determines the number of batches to maintain on a channel. Its value must convert to a number of seconds greater than zero.

`num_traces` determines the buffer size of the trace delete channel which is composed of trace ids. Increasing the number will increase the memory usage of the component while decreasing the number will lower the maximum amount of traces kept in memory.

`expected_new_traces_per_sec` determines the initial slice sizing of the current batch. A larger number will use more memory but be more efficient when adding traces to the batch.

`decision_cache` requires a key `sampled_cache_size` with a value that indicates the number of trace IDs to keep in the cache. When `sampled_cache_size` is set to `0`, the cache is inactive. When you use `decision_cache`, make sure you set `sampled_cache_size` to a value much higher than `num_traces` so that decisions for trace IDs are kept longer than the span data for the trace.
`decision_cache` can contain two keys:
- `sampled_cache_size`: Configures the amount of trace IDs to be kept in an LRU cache,

Suggested change
- `sampled_cache_size`: Configures the amount of trace IDs to be kept in an LRU cache,
- `sampled_cache_size`: Configures the number of trace IDs to be kept in an LRU cache,

- `sampled_cache_size`: Configures the amount of trace IDs to be kept in an LRU cache,
persisting the "keep" decisions for traces that may have already been released from memory.
By default, the size is 0 and the cache is inactive.
- `non_sampled_cache_size`: Configures amount of trace IDs to be kept in an LRU cache,

Suggested change
- `non_sampled_cache_size`: Configures amount of trace IDs to be kept in an LRU cache,
- `non_sampled_cache_size`: Configures number of trace IDs to be kept in an LRU cache,

By default, the size is 0 and the cache is inactive.

You may want to vary the size of the `decision_cache` depending on how many "keep" vs "drop" decisions you expect from your policies.
For example, you may allocate a larger `non_sampled_cache_size` if you expect most traces to be dropped.

Suggested change
For example, you may allocate a larger `non_sampled_cache_size` if you expect most traces to be dropped.
For example, you can allocate a larger `non_sampled_cache_size` if you expect most traces to be dropped.


You may want to vary the size of the `decision_cache` depending on how many "keep" vs "drop" decisions you expect from your policies.
For example, you may allocate a larger `non_sampled_cache_size` if you expect most traces to be dropped.
Additionally, when using `decision_cache`, configure it with a much higher value than `num_traces` so decisions for trace IDs are kept longer than the span data for the trace.

Suggested change
Additionally, when using `decision_cache`, configure it with a much higher value than `num_traces` so decisions for trace IDs are kept longer than the span data for the trace.
Additionally, when you use `decision_cache`, configure it with a much higher value than `num_traces` so decisions for trace IDs are kept longer than the span data for the trace.
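
As a rough sketch of how these options fit together in `otelcol.processor.tail_sampling`: the option names come from the quoted doc text, `decision_cache` is written as a block holding the two keys described above, and the policy, cache sizes, and downstream exporter are illustrative.

```alloy
otelcol.processor.tail_sampling "default" {
  decision_wait               = "10s"
  num_traces                  = 50000
  expected_new_traces_per_sec = 10

  decision_cache {
    // Both sizes are much larger than num_traces, so sampling decisions
    // outlive the span data they refer to.
    sampled_cache_size     = 500000
    // Expecting most traces to be dropped, so the "drop" cache is larger.
    non_sampled_cache_size = 1000000
  }

  // Illustrative policy: keep traces that contain an error.
  policy {
    name = "keep-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  output {
    // Placeholder downstream component.
    traces = [otelcol.exporter.otlp.default.input]
  }
}
```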

Labels: type/docs Docs Squad label across all Grafana Labs repos

3 participants