fix: iOS app crash caused by the request operation canceling #48350
base: main
@zhouzh1 hi! Out of curiosity, why did you tag me in this PR? I keep getting tagged in React Native PRs for some reason, and I'm not sure why :)
I'm not sure this is the proper fix for the issue.
Could you share more about the crash?
Is there an error message?
Can you share the full crashlog?
Is it happening in production only or also in debug?
Is the app in background or in foreground?
if (_fileQueue) {
  for (NSOperation *operation in _fileQueue.operations) {
    if ([operation isKindOfClass:[NSOperation class]] && !operation.isCancelled && !operation.isFinished) {
      [operation cancel];
    }
  }
  _fileQueue = nil;
}
This is not necessary. You can only add NSOperation objects to an NSOperationQueue, and cancelAllOperations runs the same code you are manually writing.
@cipolleschi It seems that cancelAllOperations does not check each operation's status, and in the crash report from my iOS app the stack trace shows that the crash point is inside cancelAllOperations itself.
These are the official Apple docs for cancelAllOperations:
This method calls the cancel method on all operations currently in the queue.
Canceling the operations does not automatically remove them from the queue or stop those that are currently executing. For operations that are queued and waiting execution, the queue must still attempt to execute the operation before recognizing that it is canceled and moving it to the finished state. For operations that are already executing, the operation object itself must check for cancellation and stop what it is doing so that it can move to the finished state. In both cases, a finished (or canceled) operation is still given a chance to execute its completion block before it is removed from the queue.
And these are the docs for NSOperation cancel.
In any case, calling cancel on an already cancelled or finished operation does not crash the app.
I believe that the crash is happening inside one of the operations that are being cancelled and that's why the crash reporter reports the crash there.
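To make the point about cancel concrete, here is a minimal standalone sketch. It assumes a plain Foundation command-line program, not React Native's RCTFileRequestHandler, and the names queue, pending, and finished are made up for illustration. It cancels a pending operation twice and cancels an operation that has already finished, which the Apple docs describe as having no effect.

```objective-c
#import <Foundation/Foundation.h>

int main(void) {
  @autoreleasepool {
    // Hypothetical queue standing in for a file-request queue.
    NSOperationQueue *queue = [NSOperationQueue new];
    // Suspend the queue so the queued operation stays pending during the demo.
    queue.suspended = YES;

    NSBlockOperation *pending = [NSBlockOperation blockOperationWithBlock:^{
      NSLog(@"pending operation ran");
    }];
    [queue addOperation:pending];

    // An operation that is run directly and is finished before it is cancelled.
    NSBlockOperation *finished = [NSBlockOperation blockOperationWithBlock:^{}];
    [finished start]; // a non-queued block operation runs synchronously here

    // Calls cancel on every operation currently in the queue.
    [queue cancelAllOperations];

    // Redundant cancels: neither call raises an exception.
    [pending cancel];
    [finished cancel];

    NSLog(@"pending: cancelled=%d finished=%d", pending.isCancelled, pending.isFinished);
    NSLog(@"finished: cancelled=%d finished=%d", finished.isCancelled, finished.isFinished);
  }
  return 0;
}
```

If this sketch behaves the same way in the app, the manual isCancelled and isFinished checks in the diff add no extra safety over cancelAllOperations, which is consistent with the crash originating inside one of the operations instead.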
Is it possible that, even though we already ensure the corresponding code runs only on the main thread or on another single thread (e.g. the JS thread), the operation object is still shared across multiple thread contexts because it is passed around by pointer, and so the situation you described above can still happen?
if (_didInvalidate) {
  return;
}
RCTUnsafeExecuteOnMainQueueSync(^{
This could potentially deadlock. We should not run the unsafe variant of this method. Can you change it to RCTExecuteOnMainQueue?
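As a rough illustration of why the non-blocking variant is safer, here is a minimal sketch of the general pattern. It assumes a standalone Foundation program, and ExampleExecuteOnMainQueue is a hypothetical helper written for this example, not React Native's actual implementation. The block runs inline when the caller is already on the main thread and is otherwise scheduled asynchronously, so a background caller never blocks waiting for the main queue.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, named for illustration only.
static void ExampleExecuteOnMainQueue(dispatch_block_t block) {
  if ([NSThread isMainThread]) {
    // Already on the main thread: run immediately.
    block();
  } else {
    // Schedule and return right away. A synchronous dispatch here would block
    // this thread, and could deadlock if the main thread is waiting on it.
    dispatch_async(dispatch_get_main_queue(), block);
  }
}

int main(void) {
  @autoreleasepool {
    // Simulate an invalidation request that starts on a background thread.
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
      ExampleExecuteOnMainQueue(^{
        NSLog(@"running on main thread: %@", [NSThread isMainThread] ? @"YES" : @"NO");
        CFRunLoopStop(CFRunLoopGetMain());
      });
    });

    // Run the main run loop so blocks queued on the main queue get executed.
    CFRunLoopRun();
  }
  return 0;
}
```

The trade-off is that the caller can no longer assume the block has finished when the helper returns, so any code that depends on the invalidation being complete has to move inside the block.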
Changed.
RCTAssertMainQueue();
RCTLogInfo(@"Invalidating %@ (parent: %@, executor: %@)", self, _parentBridge, [self executorClass]);
RCTAssertMainQueue();
One thing that confuses me is that, in your stack trace, the crash is happening in Thread 26, but this assert should force the code to be on the main thread, which is not Thread 26. How is this possible?
@cipolleschi Good question. That's because RCTAssertMainQueue only takes effect in dev builds; in release builds it does nothing.
That's why I added the RCTUnsafeExecuteOnMainQueueSync wrapper, to ensure the code runs on the main thread.
That's a good explanation, but then we should see crashes in development happening because of the assertion. And IIUC, the app does not crash in development, right?
Looking at the crash log, the JS thread is triggering the invalidation. I think that this is the root of the problem: after the JS thread detects the invalidation, we should jump to the UI thread to invalidate everything...
I didn't encounter this crash in development, but I can't be sure it never happens there; it is an intermittent issue in itself.
Hey @augustl, sorry, I thought you were a member of the react-native dev team, because your id was displayed in the hint list when I was typing the
@cipolleschi I've put more information here for your reference.
@zhouzh1, this diff assumes that the crash occurs because file reader invalidation should occur on the main thread. Have we validated that assumption? If not, we should! If so, then why not just surgically modify the file reader as opposed to the bridge? As it stands, your changes will cause a lot of code to execute on the main thread, which could have significant perf implications.
@RSNara As you can see from the code and the above conversation with @cipolleschi, there is already a
I just took a look! RCTCxxBridge invalidate should only ever be called from RCTBridge. And RCTBridge invalidate and reload (but not dealloc) schedule RCTCxxBridge invalidate on the main thread. So, it's very curious that you're running into this issue. Could it be that in your code, you're relying on the dealloc method of RCTBridge? And that just synchronously deallocates the RCTCxxBridge on the current (i.e. potentially non-main) thread?
I share your curiosity. Before I submitted this PR, I also suspected that somewhere in my code or in third-party library code the RCTBridge or RCTCxxBridge deallocation is invoked explicitly, but I didn't manage to find it. In any case, we always need to ensure that RCTCxxBridge invalidation runs on the main thread, right? If so, it makes sense to wrap it with the
@zhouzh1 is your app using Expo? I wonder if Expo does something under the hood to try and manage the lifecycle of the Bridge. Similarly, there might be libraries that attempt to do the same. If they connect some private API like the reload one to some JS function, it might happen that the invalidation process starts from the JS thread instead of from the main one. 🤔 This is a hypothesis that we need to validate, though. It would be helpful to know which dependencies you are using. Also, are you using any crash reporting solution like Sentry? Those products usually allow you to leave breadcrumbs that can be used to investigate crashes. What was the user doing in the app when the crash occurred? What was the last action issued?
Yes, our app uses Expo and many of its associated libraries (e.g. expo-updates, expo-camera, and so on). I think what you said above makes sense.
@cipolleschi Any ideas about the above information I provided?
It partially makes sense. Moving an app to the background triggers the events on the main thread, so the whole invalidation should already happen on the right thread. Also, the purpose of the invalidation and requestCancelling is exactly to avoid that kind of crash... As soon as the app goes into the background, the operations are cancelled and they should not be executed... :/ Any chance that you can try and prepare a repro using this template, so that we can investigate this problem more easily?
Summary:
We are currently seeing many iOS app crashes caused by the [RCTFileRequestHandler invalidate] method, as shown in the screenshot below.
Changelog:
[IOS] [FIXED] - app crash caused by the [RCTFileRequestHandler invalidate] method
Test Plan:
I am not able to reproduce this issue locally either, so the changes in this PR are based entirely on inference. I am not sure they really make sense, so please help take a deeper look. Thanks.