Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error status registers during an operation, and otherwise #7901

Open
Tracked by #20680
imphil opened this issue Aug 24, 2021 · 12 comments
Open
Tracked by #20680

Error status registers during an operation, and otherwise #7901

imphil opened this issue Aug 24, 2021 · 12 comments
Labels
Component:DV DV issue: testbench, test case, etc. Component:RTL Earlgrey-PROD Triaged Temporary label to triage issues into Earlgrey-PROD Milestones Hotlist:Silicon to be resolved in Silicon Committee Priority:P3 Priority: low Triaging:MultipleBlocks Issue is relevant for the triage of multiple HW blocks Type:FutureRelease Not relevant to currently planned releases/milestones Type:Task Tasks, to-do list.
Milestone

Comments

@imphil
Copy link
Contributor

imphil commented Aug 24, 2021

Multiple IP blocks in OpenTitan have two main operational states: performing an operation ("busy"), or not ("idle").

An operation is a chunk of work that is performed by the IP block. Most times (always?) an operation is triggered by the host CPU, performed by the IP block, and then a completion signal is sent back to the host CPU, e.g. in the form of an interrupt or a change in the IP block's STATUS register which software on Ibex is polling.

IP blocks in OpenTitan are able to detect errors in both operational states and have CSRs which indicate the error that was detected. Typically, two CSRs are provided:

  • one CSR which is used for errors which are observed during an operation ("synchronous errors"). This register is also read by host software to check if the operation was successful.
  • another CSR for errors which are detected outside of an operation or "asynchronous errors". This register is typically used for debugging only and might end up in the crash dump that the alert handler records.

It would be nice if we could come up with shared semantics and ideally even naming for these error status registers.

We put a fair amount of thought into this topic for OTBN and have written it down at https://docs.opentitan.org/hw/ip/otbn/doc/index.html#design-details-error-handling-and-reporting.

@tjaychen mentioned in #7625 (comment) that keymgr has similar functionality with slightly different naming (ERR_CODE vs ERR_BITS and FAULT_STATUS vs FATAL_ALERT_CAUSE).

To push this discussion forward, I'd be interested in

I'll then come up with a consolidated proposal.

@tjaychen
Copy link

@imphil one thing to get your opinion on...some blocks (i believe kmac as an example) have an explicit priority in terms of their equivalent to ERR_CODE. So instead of having individual bits for different conditions, there is a code that represents "the most egregious error encountered".

Do you see this as fitting into this scheme here? It feels like to me that the ERR_CODE is still representative of a "synchronous" error even if it does not break out all bits. Or do you feel it's better to have a particular bit indicate there is a code to read, and have ERR_CODE separate from ERR_BITS?

@tjaychen tjaychen added the Priority:P1 Priority: high label Aug 27, 2021
@tjaychen
Copy link

@imphil i'm experimenting with some priority flags, so don't take it too seriously for now.
I'm going to discuss with a few people and see what makes sense in terms of what they really mean.

@martin-lueker
Copy link
Contributor

FYI: @dsandrs, @igk8

@imphil, this sounds like a completely reasonable thing to do. However, please forgive me if my next response is predictable.

At this point, with the silver netlist already delivered, this sounds like a very-nice-to-have. Changing register names is going to cascade to verification test-benches. The schedule impact might be medium to small, but I don't think there is much spare capacity. I'll leave it to @dsandrs to make a call from the logistical standpoint.

Regarding the technical viewpoint: SPI_HOST does have a similar error register. They are all synchronous errors.

@igk8 would have to make an analysis on i2c, which being a target device, has the potential for async errors.

entropy_src definitely has the potential for async error conditions, but I'm not sure that all of it's error signals are even condensed into a single register. @mwbranstad any thoughts?

@mwbranstad
Copy link
Contributor

@imphil this would have been a good thing to do about 6-12 months ago. So far, I have followed what people (mostly @msfschaffner ) have suggested in the PR feedback. Changing now would impact DV most importantly. I advocate no change at this point in time.

@tjaychen
Copy link

hey all,
i think for now, let's try to put aside whether this change will apply now or in the future, and just try to arrive at some conclusions on what makes the most sense. At least that's how i'm approaching this :)

For example, it would be good to think about what this scheme "doesn't" cover with your particular blocks. For example...are asynchronous errors always fatal faults? (they are for keymgr) Are synchronous errors always the recoverable variety or can they be fatal also etc.

@tjaychen
Copy link

tjaychen commented Feb 7, 2022

This issue somewhat duplicates the idea behind #531
There is also a general proposal for how to unify errors that should be considered at a later date.

@tjaychen tjaychen added Priority:P3 Priority: low Component:DV DV issue: testbench, test case, etc. Component:RTL and removed Priority:P1 Priority: high labels Feb 7, 2022
@andreaskurth
Copy link
Contributor

I came across this issue while triaging spi_host. IIUC, this change is not necessary for M2.5 nor the other currently planned milestones. I'm thus tagging it Type:FutureRelease Not relevant to currently planned releases/milestones .

@andreaskurth andreaskurth added Type:FutureRelease Not relevant to currently planned releases/milestones Triaging:MultipleBlocks Issue is relevant for the triage of multiple HW blocks labels Feb 22, 2023
@vogelpi
Copy link
Contributor

vogelpi commented Feb 24, 2023

Sounds good to me @andreaskurth , I've triaged this for entropy_src.

@GregAC
Copy link
Contributor

GregAC commented Feb 24, 2023

Triaged for otbn I agree with the above

@msfschaffner msfschaffner added the Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones label Oct 6, 2023
@msfschaffner msfschaffner added the Hotlist:Silicon to be resolved in Silicon Committee label Nov 6, 2023
@msfschaffner msfschaffner added this to the Earlgrey-PROD.M2 milestone Nov 7, 2023
@msfschaffner
Copy link
Contributor

I don't think we will be able to perform this alignment for PROD, since it will be pretty invasive.
I hence triaged this into the backlog.

Let me know if you think otherwise.

CC @moidx

@msfschaffner msfschaffner modified the milestones: Earlgrey-PROD.M2, Backlog Dec 4, 2023
@cdgori
Copy link

cdgori commented Dec 5, 2023

Agreed @msfschaffner - this is a "would have been nice" (and would be good if any clean-sheet work on new blocks is done in the future to align error signaling) but seems unlikely for PROD.

@msfschaffner msfschaffner added Earlgrey-PROD Triaged Temporary label to triage issues into Earlgrey-PROD Milestones and removed Earlgrey-PROD Candidate Temporary label to triage issues into Earlgrey-PROD Milestones labels Dec 6, 2023
@vogelpi
Copy link
Contributor

vogelpi commented Dec 19, 2023

I agree that we won't be able to align the behavior of our blocks for production. But it's probably worth to formulate a recommendation e.g. as part of the comportability spec for future clean-sheet work. I think there are other, similarly suitable issues and I will create an new issue to collect those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:DV DV issue: testbench, test case, etc. Component:RTL Earlgrey-PROD Triaged Temporary label to triage issues into Earlgrey-PROD Milestones Hotlist:Silicon to be resolved in Silicon Committee Priority:P3 Priority: low Triaging:MultipleBlocks Issue is relevant for the triage of multiple HW blocks Type:FutureRelease Not relevant to currently planned releases/milestones Type:Task Tasks, to-do list.
Projects
None yet
Development

No branches or pull requests