
Service Bus Messages Held in AMQP Buffer After Being Dequeued and Processed #47684

Open
2 tasks done
RaviBell opened this issue Dec 27, 2024 · 8 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-author-feedback Workflow: More information is needed from author to address the issue. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus

Comments

@RaviBell

RaviBell commented Dec 27, 2024

I am experiencing an issue with Azure Service Bus where messages are dequeued from the subscription and processed successfully by the backend application. However, for some messages, we are missing the status update. Ideally, if a message misses its status update, it should move to the Dead Letter Queue (DLQ). Instead, these messages are not moving to the DLQ and are remaining indefinitely in the AMQP buffer, causing a gradual memory increase in the application. Additionally, these messages persist beyond their Time-to-Live (TTL).

Why is this not a bug or a feature request?
We are not categorizing this as a bug at this point. Instead, we are seeking suggestions or guidance based on the current issue we are experiencing.

Setup (please complete the following information if applicable):

  • OS: Windows

  • IDE: .NET application targeting net462

  • Library/Libraries:
    Azure.Core.Amqp = 1.3.1
    Azure.Messaging.ServiceBus = 7.18.2
    Project Type: Service Fabric Application
    Azure Service Bus Namespace Type: Standard

    Configuration:
    Topic-Level Settings:
    Default Message TTL: 1 hour
    Subscription-Level Settings:
    Subscription Name: gatewayworker
    Default Message TTL: 5 minutes
    Max Delivery Count: 10
    Lock Duration: 2 minutes
    Auto-Delete on Idle: Effectively disabled (106,751 days)
    Dead-Lettering on TTL Expiration: Enabled
    Dead-Lettering on Filter Evaluation Error: Enabled

Message details from the AMQP library (debug session):
Creation Time: 12/11/2024 5:57:33 PM
Absolute Expiry Time: 12/11/2024 6:57:33 PM (matches topic TTL of 1 hour)
Delivery Count: 0
Current Status:
Active Message Count: 0 (after being dequeued).
Dead-Letter Message Count: 0.

Some messages are not visible as active in the subscription after being dequeued but are remaining held in the AMQP buffer, causing memory usage to increase.
Dump screenshots: These were taken on 12/12/2024 12:10 AM PST. The message in the buffer was
dequeued at: 12/11/2024 5:57:33 PM
expires at: 12/11/2024 6:57:33 PM

Header:
[screenshot]

AbsoluteExpiryTime:
[screenshot]

Additional Observations:

Memory Increase: Application memory usage grows gradually.

Buffer Stream: From the dump, the Microsoft.Azure.Amqp.AmqpMessage and its buffer stream are holding the message data.

No Prefetch: Prefetch is explicitly disabled.
No Lock Renewal: Lock renewal is not enabled or performed for messages.

Message Behavior After Being Dequeued:

Messages are processed successfully by the backend application.
For some messages, the status update fails. These messages are not moving to the DLQ, even though they are missing status updates.
Instead, they remain in the AMQP buffer, and the TTL and DLQ behavior are not triggered as expected.

Questions:

1) Why are the messages that fail status updates not moving to the DLQ?

2) Why do these messages remain held in the AMQP buffer indefinitely instead of being redelivered or discarded?

3) Why are TTL and DLQ behavior not triggered for messages that are neither completed nor abandoned?

4) Could the AMQP library's buffer stream retaining data be related to the memory increase? If so, how can it be resolved?

5) What is the best approach to clean the AMQP buffer of messages that are older than 1 hour?

I am trying to attach the dump, but since it is 1.30 GB the upload is failing. Please suggest where I can upload the dump if it is required for analysis.

Information Checklist
Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Query Added
  • Setup information Added
@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus labels Dec 27, 2024

@anuchandy @conniey @lmolkova


Thank you for your feedback. Tagging and routing to the team member best able to assist.

@lmolkova lmolkova transferred this issue from Azure/azure-sdk-for-java Dec 27, 2024
@lmolkova lmolkova added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Dec 27, 2024
@jsquire jsquire self-assigned this Dec 28, 2024
@jsquire
Member

jsquire commented Dec 28, 2024

Hi @RaviBell. Thanks for reaching out and we regret that you're experiencing difficulties. There's not enough context about how you're using the Service Bus clients to provide thoughts around what you may be seeing or how to mitigate. Generally, unless you explicitly enable prefetch in a client or your application holds a long-lived reference to an instance, messages are not buffered internally and exist only until they are completed. At that time, the unmanaged transport objects are disposed, and the .NET instance is eligible for garbage collection.

Since you've included a debugger screenshot, I'll also mention that it is important to note that debugger inspection will hold a reference to the objects, triggering lazy allocations that may otherwise not be executed and extending the lifetime.

For assistance, we'll need to understand the end-to-end scenario for how the application is using the Service Bus library. Either a small, stand-alone app that reproduces what you're seeing or snippets from your application showing which Service Bus client types you're using, how you initialize them, and how you interact with them to read/process/complete messages would be helpful. In addition, seeing a capture of verbose Azure SDK logs for a +/- 5-minute period around the behavior would allow us to understand the client's perspective. Guidance for capturing logs can be found in the article Logging with the Azure SDK for .NET.
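For reference, capturing those verbose client logs can be as simple as registering an `AzureEventSourceListener` from the `Azure.Core` package for the window of interest. A minimal sketch (the placement inside your application is up to you):

```csharp
using System.Diagnostics.Tracing;
using Azure.Core.Diagnostics;

// Writes all Azure SDK events, including Service Bus AMQP transport
// activity, to the console at Verbose level. Keep the listener alive
// for the +/- 5-minute window around the behavior being investigated.
using var listener = AzureEventSourceListener.CreateConsoleLogger(EventLevel.Verbose);

// ... run the Service Bus receive/process/complete workflow here ...
```

Redirecting console output to a file (or using the `AzureEventSourceListener` constructor that takes a callback) makes it easy to attach the capture to an issue.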

@jsquire jsquire added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Dec 28, 2024

Hi @RaviBell. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

@RaviBell
Author

> Hi @RaviBell. Thanks for reaching out and we regret that you're experiencing difficulties. There's not enough context about how you're using the Service Bus clients to provide thoughts around what you may be seeing or how to mitigate. Generally, unless you explicitly enable prefetch in a client or your application holds a long-lived reference to an instance, messages are not buffered internally and exist only until they are completed. At that time, the unmanaged transport objects are disposed, and the .NET instance is eligible for garbage collection.
>
> Since you've included a debugger screenshot, I'll also mention that it is important to note that debugger inspection will hold a reference to the objects, triggering lazy allocations that may otherwise not be executed and extending the lifetime.
>
> For assistance, we'll need to understand the end-to-end scenario for how the application is using the Service Bus library. Either a small, stand-alone app that reproduces what you're seeing or snippets from your application showing which Service Bus client types you're using, how you initialize them, and how you interact with them to read/process/complete messages would be helpful. In addition, seeing a capture of verbose Azure SDK logs for a +/- 5-minute period around the behavior would allow us to understand the client's perspective. Guidance for capturing logs can be found in the article Logging with the Azure SDK for .NET.

Hello @jsquire,
Thanks a lot for the quick response. Please find the details below as suggested. If any requested details are still missing, kindly let me know. If you need a memory dump for your analysis, do let me know.

Application Overview and Workflow

  1. Service Fabric-Based Application:
    o The application is designed to run on-premises across a set of specific role-type nodes (e.g., 3 or 4 nodes).
    o The architecture leverages Service Fabric for scalability and reliability.
  2. Service Bus Setup:
    o The Service Bus configuration includes a Topic with associated Subscriptions, organized into 9 distinct workflows.
  3. Purpose:
    o This application’s sole purpose is to dequeue messages from respective workflows as triggered by API calls.
  4. Transport Type:
    o The application uses AmqpWebSockets as the ServiceBusTransportType.

Service Bus Client Initialization
  5. Client Setup:
    o Service Bus clients are initialized using a shared service called ServiceBusManager.
    o ServiceBusClient is created using certificate-based authentication for secure access.
    o For each queue or subscription, a ServiceBusReceiver is instantiated using CreateReceiver.
    o The ReceiveMode is explicitly set to PeekLock to allow message processing with visibility control.
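The setup above corresponds roughly to the following sketch. The namespace, topic, and credential are placeholders; the real application uses certificate-based authentication inside ServiceBusManager, which is simplified here:

```csharp
using Azure.Core;
using Azure.Identity;
using Azure.Messaging.ServiceBus;

// Placeholder credential; the actual application authenticates with a
// certificate (e.g., ClientCertificateCredential from Azure.Identity).
TokenCredential credential = new DefaultAzureCredential();

var client = new ServiceBusClient(
    "<namespace>.servicebus.windows.net",
    credential,
    new ServiceBusClientOptions
    {
        TransportType = ServiceBusTransportType.AmqpWebSockets
    });

// One receiver per subscription, PeekLock mode, prefetch disabled.
ServiceBusReceiver receiver = client.CreateReceiver(
    "<topic-name>",
    "gatewayworker",
    new ServiceBusReceiverOptions
    {
        ReceiveMode = ServiceBusReceiveMode.PeekLock,
        PrefetchCount = 0 // prefetch explicitly not enabled
    });
```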

Message Processing Workflow
  6. How Messages Are Processed:
    o Dequeue: Messages are fetched from Service Bus using an API call that invokes the ReceiveMessagesAsync method.
    o Processing: Each message is processed according to its associated workflow type (e.g., signing, scanning, or other business logic).
    o Completion/Abandonment: After processing, messages are either:
      Completed: Using CompleteMessageAsync.
      Abandoned: Using AbandonMessageAsync.
      Moved to Dead-Letter Queue: Using DeadLetterMessageAsync.
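The receive/settle flow above can be sketched as follows. ProcessAsync and TransientFailureException are placeholders for the workflow-specific logic and its failure classification, not types from the library:

```csharp
using Azure.Messaging.ServiceBus;

// Fetch a batch; each message must later be settled explicitly.
IReadOnlyList<ServiceBusReceivedMessage> messages =
    await receiver.ReceiveMessagesAsync(maxMessages: 10);

foreach (ServiceBusReceivedMessage message in messages)
{
    try
    {
        await ProcessAsync(message);                  // workflow-specific logic (placeholder)
        await receiver.CompleteMessageAsync(message); // settle: remove from subscription
    }
    catch (TransientFailureException)                 // hypothetical app exception
    {
        await receiver.AbandonMessageAsync(message);  // settle: release for redelivery
    }
    catch (Exception ex)
    {
        await receiver.DeadLetterMessageAsync(        // settle: move to DLQ
            message,
            deadLetterReason: ex.GetType().Name);
    }
}
```

Until one of the three settlement calls succeeds (or all references are dropped), the received message and its underlying AMQP buffers remain alive on the client.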

API-Based Interaction
  7. Interaction via APIs:
    o All interactions with Service Bus (dequeue, complete, abandon) are performed through API endpoints.
    o These APIs internally rely on ServiceBusManager to handle interactions with the Service Bus.
    o API Endpoints:
      Dequeue API: Fetches messages using ReceiveMessagesAsync.
      Completion/Abandonment API: Marks messages as completed or abandoned using their MessageId.

Memory Dump Analysis:
• The observations were made on an offline memory dump captured during runtime, not through live debugging. This ensures that the debugger did not interfere with object lifetimes or trigger lazy allocations.
• The Microsoft.Azure.Amqp.BufferListStream objects, along with their associated properties (e.g., CreationAt and ExpiresAt timestamps), were inspected to determine their lifecycle behavior.
• Despite the expiration timestamps, these objects were still present in memory, indicating that they might not have been cleaned up as expected.

AMQP Buffer Behavior:
• As mentioned above, my application uses the Azure Service Bus library with the AmqpWebSockets transport and PeekLock mode for receiving messages.
• The Service Bus operations (e.g., dequeue, complete, abandon) are handled through API calls, and prefetch is explicitly not enabled.
• While analyzing the dump, it appears that some messages remained in the AMQP buffer despite their locks being expired and no status updates (e.g., complete or abandon) being issued for them. This is the primary concern.

• Is it expected behavior for Microsoft.Azure.Amqp.BufferListStream objects (and associated AMQP message streams) to persist in memory after their locks have expired, especially if no further operations (e.g., CompleteMessageAsync, AbandonMessageAsync, etc.) are performed?
• Should these objects automatically move to the dead-letter queue once their TTL or delivery count is exceeded, even if no status update is explicitly sent?

The memory dump screenshot below was taken on 12/31/2024, sorted by the size of "Microsoft.Azure.Amqp.BufferListStream":
[screenshot]

Drilling into one "Microsoft.Azure.Amqp.BufferListStream" instance:
[screenshot]

creation at: 12/25/2024 14:06:33
expires at: 12/25/2024 14:11:33
[screenshot]

@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed needs-author-feedback Workflow: More information is needed from author to address the issue. labels Dec 31, 2024
@jsquire
Copy link
Member

jsquire commented Dec 31, 2024

Hi @RaviBell. Thanks for the additional context. As mentioned previously, we'll need to see code and log details to assist. For convenience, copied here:

> For assistance, we'll need to understand the end-to-end scenario for how the application is using the Service Bus library. Either a small, stand-alone app that reproduces what you're seeing or snippets from your application showing which Service Bus client types you're using, how you initialize them, and how you interact with them to read/process/complete messages would be helpful. In addition, seeing a capture of verbose Azure SDK logs for a +/- 5-minute period around the behavior would allow us to understand the client's perspective. Guidance for capturing logs can be found in the article Logging with the Azure SDK for .NET.

@jsquire
Copy link
Member

jsquire commented Dec 31, 2024

> it appears that some messages remained in the AMQP buffer despite their locks being expired and no status updates (e.g., complete or abandon) being issued for them. This is the primary concern.

This is expected. Locking is a service concept that the client has no insight into nor direct influence over. Unless you're using a processor type with auto renew set, when you receive a batch of messages, those messages exist on the client until you explicitly consume them and complete/abandon them. Lock expiration does not expire the message client-side.

The service, however, will increase the delivery count and consider the message abandoned. If your resource is configured such that the delivery count has a threshold, once that is reached, the message will be moved to the dead-letter queue.
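One client-side consequence of this, sketched below under the assumption of a `receiver` and `message` as in the earlier snippets: once the lock has expired, settlement attempts fail with a lock-lost error, and the message object is only freed when the application stops referencing it.

```csharp
using Azure.Messaging.ServiceBus;

try
{
    await receiver.CompleteMessageAsync(message);
}
catch (ServiceBusException ex)
    when (ex.Reason == ServiceBusFailureReason.MessageLockLost)
{
    // The service has already made the message available for redelivery
    // and incremented its delivery count. Do not retain the message
    // object after this point; dropping the reference allows the
    // underlying AMQP buffers to be garbage collected.
}
```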

@jsquire jsquire added needs-author-feedback Workflow: More information is needed from author to address the issue. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels Dec 31, 2024
