Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: eth_enc424j600: explicitly disable INTIE after reset #35439

Merged
merged 2 commits into from
May 21, 2021

Conversation

jfischer-no
Copy link
Collaborator

@jfischer-no jfischer-no commented May 19, 2021

After system reset (SETETHRST) interrupt enable register (EIE)
has the default value 0x8010 and global interrupt enable flag (INTIE)
is set. This is not desired and the INTIE flag should be set only at
the end of the initialization.
Disable INTIE flag and set desired interrupts sources in
a single write command just right after system reset.

Also clarify how unknown interrupt source should be handled.

Alternative fix for #35091
Fixes #35091

@jfischer-no jfischer-no added bug The issue is a bug, or the PR is fixing a bug area: Drivers area: Ethernet labels May 19, 2021
@jfischer-no jfischer-no added this to the v2.6.0 milestone May 19, 2021
@jfischer-no jfischer-no requested a review from pfalcon as a code owner May 19, 2021 08:59
@carlescufi carlescufi requested a review from nordicjm May 19, 2021 09:39
* Terminate interrupt handling thread because
* the integrity of the driver is no longer given.
*/
k_oops();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not accept this, the linux kernel driver ignores any unknown interrupts, no other zephyr ethernet driver does this. This k_oops and the continue needs to go entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of k_oops(), the device stop running after that which might not be desirable in embedded device.
Is it so that getting into this branch should not be possible after the changes in second commit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it so that getting into this branch should not be possible after the changes in second commit?

It should never be the case. The second commit fixes a bug that was found thanks to this brunch. Faulty driver should not just keep running. AFAIK we do not have any possibilities to tell the stack or application about an error from driver thread. I might be exaggerating, I could probably live with hiding it behind LOG_LEVEL_DBG, and it would also fit the style of the driver. But logging is for developer, in the field such an error would remains undetected.

diff --git a/drivers/ethernet/eth_enc424j600.c b/drivers/ethernet/eth_enc424j600.c
index 417b66ae07..f86718f319 100644
--- a/drivers/ethernet/eth_enc424j600.c
+++ b/drivers/ethernet/eth_enc424j600.c
@@ -484,7 +484,9 @@ static void enc424j600_rx_thread(struct enc424j600_runtime *context)
                        }
                } else {
                        LOG_ERR("Unknown Interrupt, EIR: 0x%04x", eir);
-                       continue;
+                       if (CONFIG_ETHERNET_LOG_LEVEL == LOG_LEVEL_DBG) {
+                               k_oops();
+                       }
                }
 
                enc424j600_set_sfru(context->dev, ENC424J600_SFR3_EIEL,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that this bug fixed wouldn't have been an issue at all had the continue/k_oops not been there, the driver would have ignore the first spurious interrupt and then continued to process the second valid interrupt and functioned properly. Having just a log output with no continue/k_oops is fine with me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could probably live with hiding it behind LOG_LEVEL_DBG

Your proposal looks like a good tradeoff 👍

About the error happening in the field, that is certainly possible, but in that case perhaps it is better to ignore/skip the issue than to halt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue that this bug fixed wouldn't have been an issue at all had the continue/k_oops not been there, the driver would have ignore the first spurious interrupt and then continued to process the second valid interrupt and functioned properly. Having just a log output with no continue/k_oops is fine with me.

So you mean that there is no other bug except the one suspected in line 487? So is it okay for your products that there is an interrupt in the Ethernet driver firing for no reason?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drivers and applications are designed to cope with simplistic errors, e.g. a false interrupt, which could be caused by many things including a error in the silicon (of which Microchip note there is a bug in the enc424j600 silicon which causes a spurious interrupt), an error in the MCU silicon or firmware or even from an external source e.g. by a motor. You show me where in the linux kernel (which is used in thousands of embedded devices) whereby a network or modem driver will kill the system if it receives an unexpected interrupt or command:

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just wondering about the use of k_oops

* Terminate interrupt handling thread because
* the integrity of the driver is no longer given.
*/
k_oops();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of k_oops(), the device stop running after that which might not be desirable in embedded device.
Is it so that getting into this branch should not be possible after the changes in second commit?

The controller has several interrupt sources which are signaled
via a single INT pin. Only the interrupt sources that are explicitly
switched on during controller initialization may generate an interrupt
signal. Currently there are only PHY Link Status Change Interrupt
and RX Packet Pending Interrupt enabled. So there is no other reason why
an interrupt can be triggered.

Terminate interrupt handling thread on unknown interrupt
only when debugging, as there are concerns that stopping
thread in the field is going too far.

Signed-off-by: Johann Fischer <[email protected]>
After system reset (SETETHRST) interrupt enable register (EIE)
has the default value 0x8010 and global interrupt enable flag (INTIE)
is set. This is not desired and the INTIE flag should be set only at
the end of the initialization.

Disable INTIE flag and set desired interrupts sources in
a single write command just right after system reset.

Resolves: zephyrproject-rtos#35091

Reported-by: Jamie McCrae <[email protected]>
Signed-off-by: Johann Fischer <[email protected]>
Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Johann 👍 LGTM

@galak galak requested a review from nordicjm May 20, 2021 18:57
Copy link
Collaborator

@nordicjm nordicjm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not happy with the k_oops, no way should that be this code at all, but it's excluded by default so I'm done trying to explain driver development

@nashif nashif merged commit de974ef into zephyrproject-rtos:main May 21, 2021
@jfischer-no jfischer-no deleted the pr-fix-35091 branch May 21, 2021 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Drivers area: Ethernet area: Networking bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enc424j600 does not work
5 participants