Skip to content

Commit

Permalink
mlxsw: core: Add validation of transceiver temperature thresholds
Browse files Browse the repository at this point in the history
Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.

This is a rare scenario, but it was observed at a customer site.

Fixes: 6a79507 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
  • Loading branch information
vadimp-nvidia authored and kuba-moo committed Jan 10, 2021
1 parent b774134 commit 57726eb
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
if (err)
return err;

if (crit_temp > emerg_temp) {
dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n",
tz->tzdev->type, crit_temp, emerg_temp);
return 0;
}

/* According to the system thermal requirements, the thermal zones are
* defined with four trip points. The critical and emergency
* temperature thresholds, provided by QSFP module are set as "active"
Expand All @@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp;
if (emerg_temp > crit_temp)
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
MLXSW_THERMAL_MODULE_TEMP_SHIFT;
else
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp;

return 0;
}
Expand Down

0 comments on commit 57726eb

Please sign in to comment.