Update j9jni_deleteGlobalRef assertion check to allow exclusive access#23595
Update j9jni_deleteGlobalRef assertion check to allow exclusive access#23595JasonFengJ9 wants to merge 1 commit intoeclipse-openj9:masterfrom
Conversation
|
Please read the comments in the issue again - I believe my suggestion is incorrect, and there's an ifdef to remove. |
If you meant following Lines 1699 to 1701 in 14edc02 |
This doesn't tigger an assertion failure for @dmitripivkine does this work for |
|
The problem with metronome is that one thread acqiures exclusive and then hands off to another thread to do the work, so the thread doing the class unloading does not have the count set. |
In the core |
Ensure j9jni_deleteGlobalRef() caller has exclusive or VM access. Signed-off-by: Jason Feng <fengj@ca.ibm.com>
cecfd64 to
a0b7de2
Compare
Changed to check |
|
@gacholio this is ready for another look. |
|
jenkins test functional.extended xlinux jdk25 |
|
jenkins test extended.functional xlinux jdk25 |
1 similar comment
|
jenkins test extended.functional xlinux jdk25 |
|
Testing has aborted two tries in a row. |
|
Attached GC thread stacktrace The thread with VM access @dmitripivkine does this sound a problem because GC didn't acquire VM access first? |
|
Also not clear to me how this change could cause any problems in metronome - the assertion is more permissive than it was (and clearly isn't the issue) and the ifdef change also clearly has no effect on metronome. |
|
I suggest grinding the test with an without this change to prove it's an existing issue. |
I ruled out the hang was caused by this change via restore the original code and just removed the assertion. |
|
OK, once @dmitripivkine weighs in, I'll merge this. |
Without this change, assertion failure occurred consistently (that's why it was a blocker issue) |
So the assertion failure was hiding the VM access issue. We should probably open a new issue for the underlying problem. |
|
May be worth looking into when this failure started to try an isolate the change that caused it. |
I feel an assertion failure is better than the hang to save machine resources. |
There was a recent GC change that affects |
As per #23581 (comment) Actually this was introduced the previous night where the xlinux builds failed due to disk space issues. eclipse-openj9/openj9-omr@db4e137...1207e46 |
In theory this is expected behaviour. There is a transfer of control between mutator thread triggered GC to Main GC dedicated thread. So, VM Access flag still be set for mutator thread requested exclusive and GC uses internal synchronization mechanisms after transfer of control. And there is complication of possible race between System GC request and scheduled GC increment. This logic is complicated (may be overcomplicated) and always was a point of attention. However there should not be any connection with current issue. I think this updated assertion code should go in.
@JasonFengJ9 Would you please refer to it? |
|
Thank you! #15173 removes Ownable Synchronizers support. This change is spreads widely across GC code but it is straight forward and should not have impact in this case. |
Recent changes looks unrelated but I am going to review them again. |
|
@JasonFengJ9 May I have access to the system core mentioned in #23595 (comment)? |
|
Ah, I was blind: the hang is related with #15173 indeed. It occurs at attempt to perform System GC in order to rebuild Ownable Synchronizers list. This is new code path. |
|
New code rebuilds Ownable Synchronizers list from scratch. In order to perform this it initiates System GC. And this synchronous System GC request caused hangs (or may be even incorrect VM Access transfer by some reason). We are looking to this issue. So it is possible this assertion change is not going to be needed after all. |
Here it is https://ibm.box.com/s/21va3gsnyjvrvkn9d42qr4gapxjhh70p |
Thank you! |
|
Unfortunately looks like this file is broken, I can not unzip it anywhere. |
Initially I attempted to upload here which requires one of recognized suffix such as |
|
C-stack for main GC thread shows exclusive request from jit code. There is the same problem as we got assertion at the first place. Now, with assertion loosen up, there is another manifestation of the problem in JIT code. Both components (VM and JIT) can not recognize that exclusive access has been taken already, particularly VM Access flag is set. |
|
The fix is supplied by GC team. |
Update
j9jni_deleteGlobalRefassertion check to allowexclusiveaccessEnsure
j9jni_deleteGlobalRef()caller has exclusive or VM access.closes #23570
Signed-off-by: Jason Feng fengj@ca.ibm.com