[otbn] Add an error bit for incoming lifecycle escalations #7625
Comments
Sounds like a good plan to me. Does the system require that we don't send an alert when we get an incoming lifecycle escalation? If not, maybe we should do that as well, which would keep everything looking a bit more consistent. |
@msfschaffner Is there a requirement one way or the other? I.e. do you get into a strange loop if we send a fatal alert as reaction to a lifecycle escalation? (Probably not?) |
Thanks for bringing this up! From a response point of view I think it does not matter much, since the system will be rendered totally non-functional upon assertion of @tjaychen @cdgori One thing we may want to think about however is the cause register "crashdump" of the alert handler. In a scenario where the system goes through all escalation phases, many alerts are going to trigger as a result of Note that this should not be of concern when the firmware reads out the alert cause registers as a result of an NMI escalation, since at that point, |
Oh, that's a good point. Yes, I think you're right: we want to capture the snapshot as we start the escalation process, and not capture fallout from it. |
CC @cindychip |
Ok I made a separate issue for this. |
OK, I see the issue and agree with @tjaychen: we should capture the snapshot early on and not retake it. I also think I agree with @imphil's original proposal to unify the handling of fatal alerts from an OTBN perspective, unless there is something magical about the LC escalation I don't understand. (As @msfschaffner noted, the system will be very quickly rendered nonfunctional in the event of this kind of escalation.) |
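To make the "capture early, don't retake" idea concrete, here is a minimal Python sketch. It is not the alert handler's implementation; the class name, method name, and alert names are hypothetical and only illustrate latching the cause snapshot on the first escalation so that later fallout does not overwrite it.

```python
class CrashdumpLatch:
    """Hypothetical model: latch alert causes once, at the start of escalation."""

    def __init__(self):
        self.snapshot = None  # nothing captured yet

    def on_escalation(self, alert_causes):
        # Only the first call captures; later calls (fallout from the
        # escalation itself) leave the snapshot untouched.
        if self.snapshot is None:
            self.snapshot = frozenset(alert_causes)
        return self.snapshot


latch = CrashdumpLatch()
first = latch.on_escalation({"otbn_fatal"})                     # initial cause (made-up name)
later = latch.on_escalation({"otbn_fatal", "keymgr_fatal"})     # fallout, ignored
```

In this sketch `later` is the same set as `first`, which is the property discussed above: the dump reflects what started the escalation, not what it triggered.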
One thing to note is this proposal would have us setting the
Indicating that we want to avoid setting So do we want to revise that restriction or not set I guess the race condition is around the software thinking @rswarbrick wrote that line 9 months ago; can you remember a specific scenario we were worried about? @imphil any recollections on your side? |
Yep, the point is that you don't want weird race conditions for software where |
btw @imphil @rswarbrick @GregAC Those blocks do not run software, but similarly they can detect errors when there is an operation running vs. when there is not. The scheme in keymgr is very similar to what you guys are describing here, i.e. ERR_CODE (my version of err bits) is set when there's an operation, and FAULT_STATUS (my version of FATAL_ERR_CAUSE) is set asynchronously. I feel this is worth standardizing across the project (if not naming, then at least the behavior), since this question will undoubtedly come up again and again. Let me know what you think. We don't have to go back and change every block now, but it would be good to set up guidance for future blocks. |
adding @weicaiyang since we are discussing a very similar thing #7772 |
Agree. If we also unify the naming (both alert name and CSR status name), it will really help DV. |
Before, the incoming life cycle escalation handling was specified as "like a fatal alert, except for XYZ." This commit simplifies the specification to handle an incoming life cycle escalation in the same way as a fatal error that was detected within OTBN. Fixes lowRISC#7625 Signed-off-by: Philipp Wagner <[email protected]>
I've now opened PR #7897 to make the incoming life cycle escalation signal a "normal" fatal error/alert. The normal fatal error behavior should already address the point @GregAC made: If a fatal error/alert (including an incoming life cycle escalation) is observed during an operation, ERR_BITS is set in addition to FATAL_ALERT_CAUSE and the operation is terminated in the usual way (which includes sending a "done" interrupt to unblock software if it wants to do so). If a fatal error is observed outside of an operation, only FATAL_ALERT_CAUSE is set. On the two other things which were discussed as part of this issue:
|
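The fatal error behavior described in the comment above can be sketched as a small behavioral model. This is an illustration only, not the OTBN RTL or its register layout: the class, the `start` method, and the `LIFECYCLE_ESCALATION` bit position are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical bit position; the real register layout is defined in the OTBN spec.
LIFECYCLE_ESCALATION = 1 << 0


@dataclass
class OtbnErrorModel:
    """Hypothetical model of ERR_BITS / FATAL_ALERT_CAUSE reporting."""
    busy: bool = False
    err_bits: int = 0
    fatal_alert_cause: int = 0
    done_irq: bool = False
    fatal_alert: bool = False

    def start(self):
        # Software starts an operation.
        self.busy = True
        self.done_irq = False

    def fatal_error(self, cause_bit: int):
        # Always recorded, whether or not an operation is running.
        self.fatal_alert_cause |= cause_bit
        self.fatal_alert = True
        if self.busy:
            # During an operation: also report through ERR_BITS, terminate
            # the operation, and raise "done" to unblock software.
            self.err_bits |= cause_bit
            self.busy = False
            self.done_irq = True


idle = OtbnErrorModel()
idle.fatal_error(LIFECYCLE_ESCALATION)      # outside an operation

running = OtbnErrorModel()
running.start()
running.fatal_error(LIFECYCLE_ESCALATION)   # during an operation
```

With this model, `idle` ends up with only `fatal_alert_cause` set, while `running` has both `err_bits` and `fatal_alert_cause` set and `done_irq` raised, so an incoming life cycle escalation is treated like any other fatal error.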
Currently, we have two registers to indicate what error happened: ERR_BITS, and FATAL_ALERT_CAUSE.
We have one interesting case: The reaction to an incoming lifecycle escalation signal is specified as "An escalation request signaled through the lc_escalate_en_i signal results in the same action as a fatal error but does not raise a fatal alert." This wording would also imply that we don't set the FATAL_ALERT_CAUSE register (which is only specified as part of the fatal alert, not the fatal error).
I'm suggesting to also set FATAL_ALERT_CAUSE for incoming lifecycle escalations, and make the lifecycle escalation just another fatal error.
So here's my proposal:
Motivation:
Estimate for spec change and RTL implementation.