
[BUG] Spring Boot application sometimes does not shutdown after closing client and Spring context #43345

Open
nateha1984 opened this issue Dec 11, 2024 · 5 comments
Labels: azure-spring, azure-spring-servicebus, customer-reported, question

@nateha1984

Describe the bug
Occasionally the Service Bus connection prevents a Spring Boot application from shutting down after the client is closed. We have an application that is started by a KEDA event to consume one message from a Service Bus queue. Most of the time, the application/job starts in AKS, consumes the message, and then we close the client and the Spring context, which shuts down the app and thus stops the container. Sometimes, however, there appears to be a failure in the client close code that prevents the Spring app, and thus the container, from stopping, so the job keeps running until we manually kill the pod.

When this happens, we see the messages below in the logs, but no exception or stack trace is logged (just the Azure SDK JSON messages).

Exception or Stack Trace

2024-12-11T17:14:14.985Z  WARN 1 --- [           main] c.a.c.a.i.AmqpChannelProcessor           : {"az.sdk.message":"Cannot subscribe. Processor is already terminated.","exception":"Cannot subscribe. Processor is already terminated.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs"}
2024-12-11T17:14:14.987Z  INFO 1 --- [           main] c.a.c.a.i.ReactorConnectionCache         : {"az.sdk.message":"connection close finished with error.","exception":"Cannot subscribe. Processor is already terminated.","entityPath":"N/A","connectionId":"MF_b743c7_1733937252434"}
2024-12-11T17:19:14.927Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.ConnectionHandler      : {"az.sdk.message":"onConnectionRemoteClose","connectionId":"MF_b743c7_1733937252434","errorCondition":"amqp:connection:forced","errorDescription":"The connection was closed by container '7e9fe2d10e6941f592878859b2be6e61_G6' because it did not have any active links in the past 300000 milliseconds. TrackingId:7e9fe2d10e6941f592878859b2be6e61_G6, SystemTracker:gateway5, Timestamp:2024-12-11T17:19:14","hostName":"xxx-xxx-xxx.servicebus.windows.net"}

When the application DOES shutdown successfully, we see these messages at the end:

2024-12-11T16:30:16.364Z  INFO 1 --- [ctor-executor-1] c.a.c.a.implementation.ReactorExecutor   : {"az.sdk.message":"Processing all pending tasks and closing old reactor.","connectionId":"MF_31f861_1733934549999"}
2024-12-11T16:30:16.364Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.ReactorDispatcher              : {"az.sdk.message":"Reactor selectable is being disposed.","connectionId":"MF_31f861_1733934549999"}
2024-12-11T16:30:16.364Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.ReactorConnection              : {"az.sdk.message":"onConnectionShutdown. Shutting down.","connectionId":"MF_31f861_1733934549999","isTransient":false,"isInitiatedByClient":false,"shutdownMessage":"connectionId[MF_31f861_1733934549999] Reactor selectable is disposed.","namespace":"xxx-xxx-xxx.servicebus.windows.net"}
2024-12-11T16:30:16.366Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.ReactorConnectionCache         : {"az.sdk.message":"connection close finished.","entityPath":"N/A","connectionId":"MF_31f861_1733934549999"}
2024-12-11T16:30:16.366Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.ReactorConnection              : {"az.sdk.message":"onConnectionShutdown. Shutting down.","connectionId":"MF_31f861_1733934549999","isTransient":false,"isInitiatedByClient":false,"shutdownMessage":"Finished processing pending tasks.","namespace":"xxxx-xxxx-xxxx.servicebus.windows.net"}

To Reproduce
Steps to reproduce the behavior:

Use a Spring Boot CommandLineRunner application to receive a message from an Azure Service Bus queue, close the ServiceBusReceiverClient once the message has been received, then close the Spring context. The error does NOT happen every time.

Code Snippet

    public static void main(String[] args) {
        ConfigurableApplicationContext context = SpringApplication.run(Application.class, args);
        context.close();
    }

    @Override
    public void run(String... args) {
        receiveMessage();
    }

    void receiveMessage() {

        log.info("Starting the processor");
        processorClient.receiveMessages(1).forEach(m -> {
            boolean success = processMessage(m);

            if (success) {
                log.info("Completing the message");
                processorClient.complete(m);
            } else {
                log.warn("Abandoning the message");
                processorClient.abandon(m);
            }
        });

        log.info("Stopping and closing the processor");
        processorClient.close();
        log.info("Processor stopped and closed");
    }

Expected behavior
Application shuts down cleanly

Setup (please complete the following information):

  • OS: Chainguard container image for JDK 23
  • Library/Libraries: com.azure.spring:spring-cloud-azure-starter-servicebus:5.16.0
  • Java version: 23
  • App Server/Environment: Azure K8s containers, Spring Boot embedded Netty
  • Frameworks: Spring Boot

Additional context
This looks to be the recommended way to shut down a Spring Boot app in our situation, but we are open to other suggestions that might allow things to close more cleanly and consistently.
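For comparison, here is a minimal sketch of an alternative shutdown pattern using SpringApplication.exit(...), which closes the context and propagates an exit code to the container (the zero exit-code generator here is just an illustration, not our actual code):

    public static void main(String[] args) {
        ConfigurableApplicationContext context = SpringApplication.run(Application.class, args);
        // SpringApplication.exit closes the context and derives an exit code from any
        // ExitCodeGenerator beans; passing it to System.exit makes the container's
        // success/failure status explicit instead of relying on context.close() alone.
        int exitCode = SpringApplication.exit(context, () -> 0);
        System.exit(exitCode);
    }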

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@github-actions bot added the customer-reported, needs-triage, and question labels on Dec 11, 2024
@joshfree added the azure-spring and azure-spring-servicebus labels on Dec 11, 2024
@github-actions bot removed the needs-triage label on Dec 11, 2024
@joshfree (Member)

@yiliuTo @saragluna could you please follow up?

/cc @anuchandy as fyi

@saragluna (Member)

@anuchandy, have you seen similar error logs before?

@saragluna saragluna assigned saragluna and unassigned yiliuTo Dec 16, 2024
@saragluna saragluna added this to the 2025-01 milestone Dec 16, 2024
@anuchandy (Member)

Hi @saragluna, the first log from AmqpChannelProcessor appears when the connection times out due to no activity on the connection. The source of that log, AmqpChannelProcessor, is deprecated and has been replaced with the more reliable RequestResponseChannelCache in later versions. Has the Spring starter that uses azure-messaging-servicebus version 7.17.6 or 7.17.7 been released yet? Version 7.17.6+ uses RequestResponseChannelCache by default.

Also, in the usage code there is no try/catch around complete()/abandon(), so if either of those APIs throws, control never reaches close(). Additionally, close() may take up to a minute to shut down the internal state machine.
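A minimal sketch of that suggestion, reusing the processorClient, processMessage, and log names from the snippet above (illustrative only, not necessarily the reporter's exact code):

    void receiveMessage() {
        log.info("Starting the processor");
        try {
            processorClient.receiveMessages(1).forEach(m -> {
                try {
                    if (processMessage(m)) {
                        log.info("Completing the message");
                        processorClient.complete(m);
                    } else {
                        log.warn("Abandoning the message");
                        processorClient.abandon(m);
                    }
                } catch (RuntimeException e) {
                    // Log and swallow settlement failures so shutdown still proceeds.
                    log.warn("Failed to settle the message", e);
                }
            });
        } finally {
            // Always close the client, even if receiving or settling throws.
            // Note that close() can take up to a minute to wind down the internal state machine.
            log.info("Stopping and closing the processor");
            processorClient.close();
            log.info("Processor stopped and closed");
        }
    }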

@nateha1984 (Author)

Hi @anuchandy, are you suggesting that the error is coming from the complete()/abandon() calls? The rest of the logs seem to show the close() method getting called and no errors being thrown from anywhere else (I didn't include all of them because they're so verbose, so I only included the lines that are different). We also don't see any stack traces getting logged to indicate an exception was thrown. We will add the catch you mentioned to see if that makes a difference.

Here are the rest of the logs between the abandon call and the differing logs that show up:

2024-12-11T17:14:14.889Z  WARN 1 --- [           main] c.a.c.p.s.f.x.l.QueueMessageListener     : Abandoning the message
2024-12-11T17:14:14.915Z  INFO 1 --- [           main] c.a.c.p.s.f.x.l.QueueMessageListener     : Stopping and closing the processor
2024-12-11T17:14:14.915Z  WARN 1 --- [           main] c.a.c.amqp.implementation.MessageFlux    : {"az.sdk.message":"Downstream cancellation signal arrived at MessageFlux.","messageFlux":"mf_a84048_1733937252404","connectionId":"MF_b743c7_1733937252434","linkName":"xxxx_6677fd_1733937252398","entityPath":"xxxx"}
2024-12-11T17:14:14.919Z  INFO 1 --- [           main] c.a.m.s.ServiceBusReceiverAsyncClient    : Removing receiver links.
2024-12-11T17:14:14.920Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.ReceiveLinkHandler2    : {"az.sdk.message":"onLinkRemoteClose","connectionId":"MF_b743c7_1733937252434","linkName":"xxxx_6677fd_1733937252398","entityPath":"xxxx"}
2024-12-11T17:14:14.920Z  INFO 1 --- [           main] c.a.m.s.ServiceBusClientBuilder          : {"az.sdk.message":"Closing a client using shared connection cache.","numberOfOpenClients":1}
2024-12-11T17:14:14.920Z  INFO 1 --- [           main] c.a.c.p.s.f.x.l.QueueMessageListener     : Processor stopped and closed
2024-12-11T17:14:14.923Z  INFO 1 --- [           main] c.a.m.s.ServiceBusReceiverAsyncClient    : Removing receiver links.
2024-12-11T17:14:14.923Z  INFO 1 --- [           main] c.a.m.s.ServiceBusClientBuilder          : {"az.sdk.message":"Closing a client using shared connection cache.","numberOfOpenClients":0}
2024-12-11T17:14:14.923Z  INFO 1 --- [           main] c.a.m.s.ServiceBusClientBuilder          : No more open clients, closing shared connection cache.
2024-12-11T17:14:14.923Z  INFO 1 --- [           main] c.a.c.a.i.ReactorConnectionCache         : {"az.sdk.message":"closing connection (Terminating the connection recovery support.).","entityPath":"N/A","connectionId":"MF_b743c7_1733937252434"}
2024-12-11T17:14:14.923Z  INFO 1 --- [           main] c.a.c.a.i.ReactorConnection              : {"az.sdk.message":"Disposing of ReactorConnection.","connectionId":"MF_b743c7_1733937252434","isTransient":false,"isInitiatedByClient":true,"shutdownMessage":"Disposed by client."}
2024-12-11T17:14:14.924Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.ReceiveLinkHandler2    : {"az.sdk.message":"onLinkFinal","connectionId":"MF_b743c7_1733937252434","linkName":"xxxx_6677fd_1733937252398","entityPath":"xxxx"}
2024-12-11T17:14:14.928Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.SendLinkHandler        : {"az.sdk.message":"onLinkRemoteClose","connectionId":"MF_b743c7_1733937252434","linkName":"cbs:sender","entityPath":"$cbs"}
2024-12-11T17:14:14.929Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.ReceiveLinkHandler2    : {"az.sdk.message":"onLinkRemoteClose","connectionId":"MF_b743c7_1733937252434","linkName":"cbs:receiver","entityPath":"$cbs"}
2024-12-11T17:14:14.929Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.AmqpChannelProcessor           : {"az.sdk.message":"Channel is closed. Requesting upstream.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs"}
2024-12-11T17:14:14.929Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.AmqpChannelProcessor           : {"az.sdk.message":"Connection not requested, yet. Requesting one.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs"}
2024-12-11T17:14:14.930Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.ReactorConnection              : {"az.sdk.message":"Emitting new response channel.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs","linkName":"cbs"}
2024-12-11T17:14:14.930Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.AmqpChannelProcessor           : {"az.sdk.message":"Setting next AMQP channel.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs"}
2024-12-11T17:14:14.930Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.AmqpChannelProcessor           : {"az.sdk.message":"Upstream connection publisher was completed. Terminating processor.","connectionId":"MF_b743c7_1733937252434","entityPath":"$cbs"}
2024-12-11T17:14:14.985Z  INFO 1 --- [ctor-executor-1] c.a.c.a.i.handler.SendLinkHandler        : {"az.sdk.message":"Sender link was never active. Closing endpoint states.","connectionId":"MF_b743c7_1733937252434","linkName":"cbs","entityPath":"$cbs"}

@anuchandy (Member)

Hello @nateha1984, my main note was that we should ensure close() is always called when the app decides to tear down.

For a run where the container keeps running without exiting, could you share 15 minutes of debug logs before the app calls close() and 5 minutes following the close() call, i.e., -15/+5 around the close call of an "impacted" container? Also, 10 minutes after calling close() on the same impacted container, take a heap dump and please share the names of all active threads.
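For the thread-name part, an illustrative Java snippet along these lines could be used (assuming the same SLF4J-style log field as in the earlier snippet; an equivalent jcmd or jstack thread dump works just as well):

        // Log the name, daemon flag, and state of every live thread; useful for spotting
        // what is keeping the JVM alive long after close() has returned.
        Thread.getAllStackTraces().keySet().forEach(t ->
                log.info("Active thread: {} (daemon={}, state={})",
                        t.getName(), t.isDaemon(), t.getState()));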

@saragluna saragluna modified the milestones: 2025-01, 2025-02 Dec 27, 2024