Occasionally ANRs on Huawei devices #6294

Closed
stetro opened this issue Aug 13, 2019 · 22 comments
@stetro

stetro commented Aug 13, 2019

Issue description

We occasionally experience ANRs on various Huawei devices when clearing the video surface on a SimpleExoPlayer instance. My colleague previously reported something similar in #3724.

Reproduction steps

Although we have the same or similar devices to test on, we cannot reproduce this issue in our environment.

A full bug report captured from the device

Unfortunately we only have the obfuscated ANR log available, but we were able to deobfuscate the following ANR report from the Play Console:

"main" prio=5 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x751288f0 self=0x7a54615c00
  | sysTid=12663 nice=-10 cgrp=default sched=0/0 handle=0x7adaeea548
  | state=S schedstat=( 32648761976 3586515706 36908 ) utm=2876 stm=388 core=5 HZ=100
  | stack=0x7ff5661000-0x7ff5663000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait (Native method)
- waiting on <0x0f3f840c> (a com.google.android.exoplayer2.O)
  at com.google.android.exoplayer2.PlayerMessage.boolean blockUntilDelivered() (SourceFile:8)
- locked <0x0f3f840c> (a com.google.android.exoplayer2.O)
  at com.google.android.exoplayer2.SimpleExoPlayer.void setVideoSurfaceInternal(android.view.Surface,boolean) (SourceFile:107)
  at com.google.android.exoplayer2.SimpleExoPlayer.void setVideoSurface(android.view.Surface) (SourceFile:11)
  at com.google.android.exoplayer2.SimpleExoPlayer.void clearVideoSurface(android.view.Surface) (SourceFile:6)
...

Version of ExoPlayer being used

We are using 2.10.3 in our current release.

Device(s) and version(s) of Android being used

Happens on Android 9 on the following devices:

Mate 10 Pro (HWBLA)	103	27,3 %
P20 Pro (HWCLT)	69	18,3 %
P20 (HWEML)	66	17,5 %
Mate 20 lite (HWSNE)	43	11,4 %
HUAWEI P30 lite (HWMAR)	20	5,3 %
P30 Pro (HWVOG)	18	4,8 %
HUAWEI P30 (HWELE)	13	3,4 %
Mate 20 Pro (HWLYA)	10	2,7 %
Mate 20 (HWHMA)	9	2,4 %
HUAWEI P smart 2019 (HWPOT-H)	5	1,3 %
Honor 10 (HWCOL)	5	1,3 %
Honor 8X (HWJSN-H)	5	1,3 %
@google-oss-bot
Collaborator

This issue does not seem to follow the issue template. Make sure you provide all the required information.

@tonihei tonihei self-assigned this Aug 19, 2019
@tonihei
Collaborator

tonihei commented Aug 19, 2019

Could you try the same workaround as used in #3724? That is, set codecNeedsSetOutputSurfaceWorkaround to true?

@stetro
Author

stetro commented Aug 20, 2019

Unfortunately this would mean enabling the workaround in our production environment, since we cannot reproduce this issue on our local devices. And I think we should avoid this 🙈.
But we could try patching the Renderer to add the mentioned devices and see whether this reduces the failure rate.
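
A minimal sketch of what such a device check could look like, under stated assumptions: the device codes are copied from the Play Console table above, the class and method names are hypothetical, and matching them against `android.os.Build.DEVICE` is an assumption about how those codes map to a running device. In an app, the result would presumably feed an override of `codecNeedsSetOutputSurfaceWorkaround` in a `MediaCodecVideoRenderer` subclass; that wiring is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of ExoPlayer.
class SurfaceWorkaroundDevices {

  // Device codes taken from the Play Console table in this issue.
  private static final Set<String> AFFECTED_DEVICES = new HashSet<>(Arrays.asList(
      "HWBLA", "HWCLT", "HWEML", "HWSNE", "HWMAR", "HWVOG",
      "HWELE", "HWLYA", "HWHMA", "HWPOT-H", "HWCOL", "HWJSN-H"));

  /**
   * Returns true if the given device code is on the list above.
   * On Android the argument would typically be android.os.Build.DEVICE (assumption).
   */
  static boolean needsWorkaround(String deviceCode) {
    return deviceCode != null
        && AFFECTED_DEVICES.contains(deviceCode.toUpperCase(Locale.ROOT));
  }
}
```

Keeping the list in one place makes it easy to remove devices again if the ANR statistics show no improvement.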

@tonihei
Collaborator

tonihei commented Aug 20, 2019

To reproduce locally, can you try to switch surfaces

  1. from one surface to another (while playback is already running),
  2. from a surface to null (i.e. no surface),
  3. from null to an actual surface.

If that workaround helps, at least one of these cases should fail consistently on these devices.
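
The three switches can be sketched as a small driver. This is only an illustration: `VideoOutput` is a hypothetical stand-in for the player call that accepts a surface or null (in an app this would presumably be `SimpleExoPlayer.setVideoSurface`), and the surface type is left generic so the sketch runs without Android.

```java
// Hypothetical test driver, not part of ExoPlayer.
class SurfaceSwitchSequence {

  /** Stand-in for the player method that takes a surface, or null for "no surface". */
  interface VideoOutput<S> {
    void setVideoSurface(S surface);
  }

  /** Performs the three switches from the list above, starting from {@code first}. */
  static <S> void runSwitches(VideoOutput<S> output, S first, S second) {
    output.setVideoSurface(first);   // starting point; playback is assumed to be running
    output.setVideoSurface(second);  // 1. from one surface to another
    output.setVideoSurface(null);    // 2. from a surface to null
    output.setVideoSurface(first);   // 3. from null to an actual surface
  }
}
```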

@stetro
Author

stetro commented Aug 20, 2019

Great! Thanks for the fast reply! I will try to reproduce this on one of these devices 👍

@stetro
Author

stetro commented Aug 20, 2019

I built a demo application to try these switches on a Pixel 3a and an Honor 10. It turns out the app does not crash on the Honor 10, but ExoPlayer cycles buffers from a previously set surface into the video rendering. When setting codecNeedsSetOutputSurfaceWorkaround to true, this issue is gone.

@stetro
Author

stetro commented Aug 20, 2019

On the Pixel 3a, SM-T813 and LG-H850 the app behaves as expected.
The P20 had the same issue as the Honor 10.

@tonihei
Collaborator

tonihei commented Aug 21, 2019

I tested the surface switches on a Mate 10 Pro (because that seems to get the most errors in your table above), but couldn't reproduce the problem or any cycling buffers. There also shouldn't be any problems in theory, because the Android device specification now ensures the correct behavior on SDK versions 27+.

The cycling buffer issue you mentioned above is probably different from the setVideoSurface ANR one. So maybe file a new issue and provide more detailed reproduction steps so that we can have a closer look.

For the ANR issue, this might also just be a case of the Android platform MediaCodec code being too slow to respond to the setSurface command. See #5887 and #5078 for other examples of this. Unfortunately, we need to block on these methods to ensure we don't accidentally leak decoder instances for other apps to use. And we are also trying to get stricter guarantees for MediaCodec calls in future Android releases to prevent this from happening.

Besides that, it would still be good to see if setting the workaround flag helps to eliminate the issues you are having with these devices, because we could then add them to the workaround list.

@stetro
Author

stetro commented Aug 22, 2019

Thanks for your efforts and feedback!
I will upload my demonstration code and write another issue about the buffer cycling behaviour.
I will add the mentioned devices to the workaround list in our build and follow up with results on whether this resolves our issue.

@ojw28
Contributor

ojw28 commented Aug 29, 2019

Can we duplicate this onto #6331? The symptoms may be different but it seems like the solution is going to be the same.

@stetro
Author

stetro commented Aug 30, 2019

Maybe let us wait until we have verified the ANR statistics for our upcoming release, to make sure that this helps with the ANR issues. (~1.5 weeks)

@google-oss-bot
Collaborator

Hey @stetro. We need more information to resolve this issue but there hasn't been an update in 14 days. I'm marking the issue as stale and if there are no new updates in the next 7 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

@stetro
Author

stetro commented Sep 18, 2019

Hi, our release has now been distributed to clients, and it turns out that these ANRs still happen during app start on the device list shown above, at a similar frequency. From this I conclude that the workaround has no effect. 😞

We are now investigating whether this ANR happens through a combination of ExoPlayer and an unusual view lifecycle on devices running EMUI 9. But this will again take a couple of days, until we see evidence after our sprint release.

@tonihei
Collaborator

tonihei commented Sep 27, 2019

Thanks for the update!

@krok55

krok55 commented Feb 9, 2020

Hello,
I have a similar issue on ALL devices (not just Huawei).
Some of them report an ANR at blockUntilDelivered. Here is a typical ANR report:

"main" prio=5 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x75794770 self=0x3073021c00
  | sysTid=23917 nice=0 cgrp=default sched=0/0 handle=0x2fed5cd548
  | state=S schedstat=( 10073527995 6894125827 42689 ) utm=789 stm=217 core=2 HZ=100
  | stack=0x7fd1b1d000-0x7fd1b1f000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait (Native method)
- waiting on <0x0c5fefc8> (a com.google.android.exoplayer2.PlayerMessage)
  at com.google.android.exoplayer2.PlayerMessage.c (PlayerMessage.java:283)
- locked <0x0c5fefc8> (a com.google.android.exoplayer2.PlayerMessage)
  at com.google.android.exoplayer2.SimpleExoPlayer.a (SimpleExoPlayer.java:1471)
  at com.google.android.exoplayer2.SimpleExoPlayer.a (SimpleExoPlayer.java:71)
  at com.google.android.exoplayer2.SimpleExoPlayer$ComponentListener.surfaceDestroyed (SimpleExoPlayer.java:1719)
  at android.view.SurfaceView.updateSurface (SurfaceView.java:641)
  at android.view.SurfaceView.onWindowVisibilityChanged (SurfaceView.java:252)
  at android.view.View.dispatchWindowVisibilityChanged (View.java:12868)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewGroup.dispatchWindowVisibilityChanged (ViewGroup.java:1553)
  at android.view.ViewRootImpl.performTraversals (ViewRootImpl.java:1854)
  at android.view.ViewRootImpl.doTraversal (ViewRootImpl.java:1536)
  at android.view.ViewRootImpl$TraversalRunnable.run (ViewRootImpl.java:7502)
  at android.view.Choreographer$CallbackRecord.run (Choreographer.java:949)
  at android.view.Choreographer.doCallbacks (Choreographer.java:761)
  at android.view.Choreographer.doFrame (Choreographer.java:696)
  at android.view.Choreographer$FrameDisplayEventReceiver.run (Choreographer.java:935)
  at android.os.Handler.handleCallback (Handler.java:873)
  at android.os.Handler.dispatchMessage (Handler.java:99)
  at android.os.Looper.loop (Looper.java:193)
  at android.app.ActivityThread.main (ActivityThread.java:6720)
  at java.lang.reflect.Method.invoke (Native method)
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:493)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:858)

I investigated a little bit and I think that the cause is a possible deadlock in the following scenario:
I have 3 threads:

  1. The ExoPlayer thread (the one that I used to create ExoPlayer and send control messages)
  2. Another async thread for application-specific logic
  3. The UI thread

In my application I need to perform some logic on the ExoPlayer thread, since this is the only way I can access various states of the player. While doing so, I may request a lock on a shared resource in my application.
Now, let's assume that this lock is being held by my other async thread and that, at the same time, I perform the following command on the UI thread:
setVisibility(INVISIBLE), to make the video container (my custom VideoView) disappear.
This triggers the surfaceDestroyed callback on the UI thread, and the following code in SimpleExoPlayer.java then blocks the UI thread:

 private void setVideoSurfaceInternal(@Nullable Surface surface, boolean ownsSurface) {
    // Note: We don't turn this method into a no-op if the surface is being replaced with itself
    // so as to ensure onRenderedFirstFrame callbacks are still called in this case.
    List<PlayerMessage> messages = new ArrayList<>();
    for (Renderer renderer : renderers) {
      if (renderer.getTrackType() == C.TRACK_TYPE_VIDEO) {
        messages.add(
            player.createMessage(renderer).setType(C.MSG_SET_SURFACE).setPayload(surface).send());
      }
    }
    if (this.surface != null && this.surface != surface) {
      // We're replacing a surface. Block to ensure that it's not accessed after the method returns.
      try {
        for (PlayerMessage message : messages) {
          message.blockUntilDelivered();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      // ... (rest of the method omitted from this excerpt)
    }
  }

I assume the deadlock occurs because the ExoPlayer thread needs to be running in order to deliver the message and release the block, but that thread is itself blocked.
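
The scenario can be modelled in plain Java. This is a simplified sketch, not ExoPlayer code: `Message` is a hypothetical stand-in for `PlayerMessage`, the timeout on `blockUntilDelivered` is added only so that the demo terminates, and `simulate` plays the role of the UI thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BlockedDeliveryDemo {

  /** Hypothetical stand-in for a player message; not ExoPlayer's PlayerMessage. */
  static final class Message {
    private boolean delivered;

    synchronized void markDelivered() {
      delivered = true;
      notifyAll();
    }

    /** Waits until delivered or until timeoutMs elapses; returns whether it was delivered. */
    synchronized boolean blockUntilDelivered(long timeoutMs) throws InterruptedException {
      long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (!delivered) {
        long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNs - System.nanoTime());
        if (remainingMs <= 0) {
          return false;
        }
        wait(remainingMs);
      }
      return true;
    }
  }

  /** Returns whether the "UI thread" (the caller) saw the message delivered in time. */
  static boolean simulate(boolean lockHeldElsewhere, long timeoutMs) {
    ReentrantLock appLock = new ReentrantLock();
    CountDownLatch asyncReady = new CountDownLatch(1);
    CountDownLatch releaseAsync = new CountDownLatch(1);
    Message message = new Message();

    // The async thread optionally holds the shared application lock.
    Thread asyncThread = new Thread(() -> {
      if (lockHeldElsewhere) {
        appLock.lock();
      }
      asyncReady.countDown();
      try {
        releaseAsync.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        if (lockHeldElsewhere) {
          appLock.unlock();
        }
      }
    });

    // The playback thread needs the lock before it can deliver the message.
    Thread playbackThread = new Thread(() -> {
      appLock.lock();
      appLock.unlock();
      message.markDelivered();
    });

    try {
      asyncThread.start();
      asyncReady.await();
      playbackThread.start();
      boolean delivered = message.blockUntilDelivered(timeoutMs);
      releaseAsync.countDown(); // let the demo threads finish
      asyncThread.join();
      playbackThread.join();
      return delivered;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted", e);
    }
  }
}
```

With `simulate(true, 200)` the caller gives up after the timeout because the playback thread never gets the lock; with `simulate(false, 2000)` the message is delivered. Without a timeout, the first case would keep the caller waiting for as long as the lock is held.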

I would like to suggest adding a special case for when the surface parameter is null (which is the case in my scenario), so that no messages are sent or waited on. You don't need to send any messages if the surface is being destroyed, right?

Thanks

@tonihei
Collaborator

tonihei commented Jul 6, 2020

@stetro, do you have any updates on the investigations mentioned above?

@stetro
Author

stetro commented Jul 6, 2020

Hi @tonihei, thanks for the reminder. Unfortunately we could not find any evidence of different view behavior on EMUI devices.

@tonihei
Collaborator

tonihei commented Jul 6, 2020

Thanks! In this case, I'll close this issue because there is nothing we can do about it. If anything new comes up, please feel free to reopen (or file a new issue).

@tonihei tonihei closed this as completed Jul 6, 2020
@stetro
Author

stetro commented Jul 6, 2020

But maybe for completeness: since we constantly keep updating to the latest ExoPlayer release, we have seen a decline in ANRs in our releases over the last couple of months. Unfortunately I cannot pinpoint a specific release that might have fixed an issue here.

@waseefakhtar

But maybe for completeness: since we constantly keep updating to the latest ExoPlayer release, we have seen a decline in ANRs in our releases over the last couple of months. Unfortunately I cannot pinpoint a specific release that might have fixed an issue here.

Did a new release solve the issue for you, @stetro? We have a similar ANR that occurs only on the HUAWEI P smart+ 2019 (HWPOT-H), using version 2.10.3. I'd like to know if a newer release solved it for you.

@stetro
Author

stetro commented Jul 6, 2020

We are running 2.11.5 and have seen only a couple of ANRs on those devices over the last three months.
Before, it was about 100 cases per day per device type.

@waseefakhtar

We are running 2.11.5 and have seen only a couple of ANRs on those devices over the last three months.
Before, it was about 100 cases per day per device type.

Looks promising! Will try updating it as well and report back how things go. Thanks!

@google google locked and limited conversation to collaborators Sep 5, 2020