Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quick subsequent reconciliations cause errors caused by using cached client #733

Open
1 task
pmalek opened this issue Oct 10, 2024 · 3 comments
Open
1 task
Milestone

Comments

@pmalek
Copy link
Member

pmalek commented Oct 10, 2024

Problem statement

controller-runtime (use by the operator) uses a cached client.

This causes issues when quick, successive reconciliations of a resource happen and the subsequent iterations do not get an up to date view on an updated/patched object. For example:

  • calls an API (either external or kubeapi-server's) to create a resource
  • the request was successfull, object's status is patched with ID, controller returns from reconciliation loop (update/patch will cause a requeue)
  • another reconciliation happens
  • controller (using controller-runtime's client) gets the object and sees that there is no ID in status, attempts to call create again
  • this can result in
    • an error if the call specificies unique createria (like ID, or name which has to be unique)
    • or duplicated resources

Up until now, this has been taken care of by calling one of the Reduce*() helper functions which are aware of the objects' schema and their priorities so that surplus objects can be removed without consequences.

With integration with external APIs (e.g. Konnect API in #370) this does not work as the resource is created in external API and more importantly the create API calls will result in HTTP 409 Conflicts.

Exemplar logs

request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
response:
HTTP/2.0 201 Created
Connection: close
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:14 GMT
Etag: W/"2a5-ebpzw2GSE0sps7ves89bnmx3ey4"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4920
Ratelimit-Reset: 47
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5c13d5952748e800
X-B3-Traceid: 000000000000000039ca0f476805b600
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6634881512632804896
X-Datadog-Trace-Id: 4164157604819744060
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 940
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 941
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4920
X-Xss-Protection: 0

{"id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc","name":"test1","description":"","labels":{"app":"test1","key1":"test1","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","session":"eng-demo","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-group":"konnect.konghq.com","k8s-version":"v1alpha1","k8s-namespace":"default","k8s-generation":"1"},"config":{"control_plane_endpoint":"https://25d04d0909.eu.cp0.konghq.tech","telemetry_endpoint":"https://25d04d0909.eu.tp0.konghq.tech","cluster_type":"CLUSTER_TYPE_CONTROL_PLANE","auth_type":"pinned_client_certs","cloud_gateway":false,"proxy_urls":[]},"created_at":"2024-10-10T12:22:13.692Z","updated_at":"2024-10-10T12:22:13.692Z"}
{"level":"info","ts":"2024-10-10T14:22:14.528+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API complete","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"8fbd8dcf-53a9-47b8-ac2f-790815e089ed","op":"create","duration":"990.068708ms","konnect_id":"28a018d8-44d2-4f1d-97e7-077e4f2d8fbc"}
{"level":"debug","ts":"2024-10-10T14:22:14.558+0200","logger":"KonnectGatewayControlPlane","msg":"reconciling","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265"}
request:
POST /v2/control-planes HTTP/1.1
Host: eu.api.konghq.tech
Content-Length: 290
Accept: application/json
Authorization: Bearer kpat_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Content-Type: application/json
Accept-Encoding: gzip

{"name":"test1","labels":{"app":"test1","k8s-generation":"1","k8s-group":"konnect.konghq.com","k8s-kind":"KonnectGatewayControlPlane","k8s-name":"test1","k8s-namespace":"default","k8s-uid":"0c2b1653-2839-4a4e-9c42-15ec8bcacc01","k8s-version":"v1alpha1","key1":"test1","session":"eng-demo"}}
Error: <nil>
response:
HTTP/2.0 409 Conflict
Content-Length: 169
Access-Control-Allow-Credentials: true
Access-Control-Expose-Headers: Date,x-datadog-trace-id,Konnect-Acting-As,Konnect-Feature-Set
Content-Security-Policy: default-src 'self';base-uri 'self';block-all-mixed-content;font-src 'self' https: data:;form-action 'self';frame-ancestors 'self';img-src 'self' data:;object-src 'none';script-src 'self';script-src-attr 'none';style-src 'self' https: 'unsafe-inline';upgrade-insecure-requests
Content-Type: application/problem+json; charset=utf-8
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Resource-Policy: same-origin
Date: Thu, 10 Oct 2024 12:22:15 GMT
Etag: W/"a9-+hkYkS5+BfEhKzaetYcn4sNF9XU"
Expect-Ct: max-age=0
Origin-Agent-Cluster: ?1
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4919
Ratelimit-Reset: 46
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=15552000; includeSubDomains
Vary: Origin
Via: kong-enterprise-edition
X-B3-Sampled: 1
X-B3-Spanid: 5fe637d7c66f5c00
X-B3-Traceid: 00000000000000004793b70701333800
X-Content-Type-Options: nosniff
X-Datadog-Parent-Id: 6910272078133288161
X-Datadog-Trace-Id: 5157667238982137886
X-Dns-Prefetch-Control: off
X-Download-Options: noopen
X-Envoy-Upstream-Service-Time: 360
X-Frame-Options: SAMEORIGIN
X-Kong-Proxy-Latency: 1
X-Kong-Upstream-Latency: 361
X-Permitted-Cross-Domain-Policies: none
X-Ratelimit-Limit-Minute: 5000
X-Ratelimit-Remaining-Minute: 4919
X-Xss-Protection: 0

{"status":409,"title":"Conflict","instance":"kong:trace:5157667238982137886","detail":"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists."}
{"level":"error","ts":"2024-10-10T14:22:14.962+0200","logger":"KonnectGatewayControlPlane","msg":"operation in Konnect API failed","controller":"KonnectGatewayControlPlane","controllerGroup":"konnect.konghq.com","controllerKind":"KonnectGatewayControlPlane","KonnectGatewayControlPlane":{"name":"test1","namespace":"default"},"namespace":"default","name":"test1","reconcileID":"4761a3b1-5543-4635-82a7-e4405645e265","op":"create","duration":"403.763875ms","error":"failed to create KonnectGatewayControlPlane default/test1: {\"status\":409,\"title\":\"Conflict\",\"instance\":\"kong:trace:5157667238982137886\",\"detail\":\"Key (org_id, name)=(5ca26716-02f7-4430-9117-1d1a7a2695e7, test1) already exists.\"}"}

Proposed solution

Make use of expectations https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go

Related material

https://www.youtube.com/watch?v=wMqzAOp15wo&t=748s

Acceptance criteria

  • Subsequent reconciliations of a resource should be idempotent (e.g. do not attempt to create a resource more than once)
@pmalek pmalek added this to the KGO v1.5.x milestone Oct 10, 2024
@pmalek pmalek changed the title Quick subsequent reconciliations cause errors Quick subsequent reconciliations cause errors caused by using cached client Oct 11, 2024
@pmalek
Copy link
Member Author

pmalek commented Oct 11, 2024

As a follow up to the conversation on Kubernetes slack: https://kubernetes.slack.com/archives/C02MRBMN00Z/p1728554873783249.

A possible way forward (at least for the resources that are created against an external API like Konnect) we can implement a fallback lookup when 409 Conflict is returned. We could add filtering based on Kubernetes UID so that we're not "adopting" (separate issue #460) resources mapped from elsewhere. This could look like this (example for KonnectGatewayControlPlane):

	var sdkConflictError *sdkkonnecterrs.ConflictError
		if errors.As(err, &sdkConflictError) {
			reqList := operations.ListControlPlanesRequest{
				FilterNameEq: lo.ToPtr(cp.Spec.Name),
				Labels:       lo.ToPtr(KubernetesUIDLabelKey + ":" + string(cp.GetUID())),
			}
			if cp.Spec.ClusterType != nil {
				reqList.FilterClusterTypeEq = lo.ToPtr(string(*cp.Spec.ClusterType))
			}

			respList, err := sdk.ListControlPlanes(ctx, reqList)
			if err != nil {
				return err
			}
			for _, listCP := range respList.ListControlPlanesResponse.Data {
				if listCP.Name != nil && *listCP.Name == req.Name {
					cp.Status.SetKonnectID(*listCP.ID)
					break
				}
			}

@lahabana
Copy link
Contributor

@pmalek this should be part of: #827 no?

@pmalek
Copy link
Member Author

pmalek commented Dec 10, 2024

This issue has dual nature behind it:

I hope that clears things up a bit w.r.t. where this issue belongs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants