Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: nvs: Add configurable CRC-24 support for ATEs #74432

Conversation

RICCIARDI-Adrien
Copy link
Contributor

Our current application requires a more robust CRC than CRC-8.
This pull requests allows to select the current ATE CRC-8 (this is the default choice), or to select a CRC-24 for improved robustness.
This 3-byte CRC requires 2 more flash memory bytes per ATE than the 1-byte CRC.

Allow to choose CRC-8 or CRC-24 to protect an ATE integrity.

CRC-8 takes less space in NVS memory but is likely to fail detecting
all ATE corruptions.
CRC-24 takes 2 more bytes per ATE than CRC-8, but is able to detect
almost all ATE corruptions in a reliable way.

Note that the ATE struct is no more a multiple of 4 bytes when CRC-24
is enabled.

Signed-off-by: Adrien Ricciardi <[email protected]>
Select the CRC-24 algorithm instead of the CRC-8 one to protect the ATE
integrity and run all existing tests.

Signed-off-by: Adrien Ricciardi <[email protected]>
The CONFIG_NVS_ATE_CRC8 configuration item keeps the current behavior
of the ATE CRC (1-byte CRC), while the newly introduced
CONFIG_NVS_ATE_CRC24 configuration item allows to use a more robust
3-byte CRC.

Signed-off-by: Adrien Ricciardi <[email protected]>
@Laczen
Copy link
Collaborator

Laczen commented Jun 18, 2024

Hi @RICCIARDI-Adrien, I don't think this proposal is a good idea.

The crc8 that is added to the ate is added mainly to ensure that a write was completed. The crc8 protects 7 byte data and as such should be more than sufficient.

When the crc8 is used to ensure that the data in the ate is correct (data retention) there are 2 possible failures:

  1. one of the 7 bytes is wrong and thus the data should not be used,
  2. the crc8 is wrong,

As the chance that the error occurs is equal for all bytes there is a possibility of 1/8 that an ate is rejected even when the data is correct,

When a crc24 is used the possibility of rejecting an ate becomes 3/10, so the use of the crc24 increases the possibility of unneedingly rejecting the ate.

@de-nordic de-nordic assigned Laczen and unassigned de-nordic Jun 18, 2024
@RICCIARDI-Adrien
Copy link
Contributor Author

Hi @Laczen,

Thank you for your quick reply.

In our case, we would prefer a lot discarding good data because of a corrupted CRC than taking for good a corrupted data, because the later would be far more harmful for us.

One of our concern is a corrupted length field in the last ATE. There is a 1/256 chance that a 8-bit CRC can't detect such corruption, while there is 1/16777216 chance that the same issue occurs with a 24-bit CRC.
If the length of the data part is lowered due to a bit flip that goes undetected, then the next element write to the NVS will try to write to a non-erased flash area, leading to a write error in most flash-based microcontrollers. All subsequent writes to the NVS would then fail, kind of "deadlocking" the NVS with no possiblity of detecting such issue and no way to fix it.

The changes in the NVS code are quite small, and do not impact the default behavior of using a 8-bit CRC.

@Laczen
Copy link
Collaborator

Laczen commented Jun 18, 2024

Hi @Laczen,

Thank you for your quick reply.

In our case, we would prefer a lot discarding good data because of a corrupted CRC than taking for good a corrupted data, because the later would be far more harmful for us.

One of our concern is a corrupted length field in the last ATE. There is a 1/256 chance that a 8-bit CRC can't detect such corruption, while there is 1/16777216 chance that the same issue occurs with a 24-bit CRC. If the length of the data part is lowered due to a bit flip that goes undetected, then the next element write to the NVS will try to write to a non-erased flash area, leading to a write error in most flash-based microcontrollers. All subsequent writes to the NVS would then fail, kind of "deadlocking" the NVS with no possiblity of detecting such issue and no way to fix it.

This can't happen. NVS already checks if there is any data behind the area pointed to by the last ate. This error could happen because of unfinished writes or as a result of a corrupted (smaller) length field, both are handled.

@RICCIARDI-Adrien
Copy link
Contributor Author

It's nice that these issues are already addressed. The paid attention to the data integrity was one of the key criteria for us to choose the NVS as our storage backend.

Speaking of data integrity, we are still concerned by the effect of an undetected bit flipping. I think that the code that detects an aborted data write after the last ATE successful write is this one :

while (fs->ate_wra > fs->data_wra) {

What if an ATE ID, data offset or length is corrupted, for instance, during runtime and the CRC-8 fails to detect it ?

A lot of logic relies on the fact that nvs_ate_valid() can't be wrong. If an ATE CRC-8 goes bad during the garbage collection process or a nvs_read_history() call, something unpredictable can happen.

The data part of an NVS item is now protected by a 32-bit CRC, so the 8-bit CRC granting the metadata integrity seems quite weak now.

Don't hesitate to correct me if I'm wrong !

@Laczen
Copy link
Collaborator

Laczen commented Jun 18, 2024

It's nice that these issues are already addressed. The paid attention to the data integrity was one of the key criteria for us to choose the NVS as our storage backend.

Speaking of data integrity, we are still concerned by the effect of an undetected bit flipping. I think that the code that detects an aborted data write after the last ATE successful write is this one :

while (fs->ate_wra > fs->data_wra) {

What if an ATE ID, data offset or length is corrupted, for instance, during runtime and the CRC-8 fails to detect it ?
A lot of logic relies on the fact that nvs_ate_valid() can't be wrong. If an ATE CRC-8 goes bad during the garbage collection process or a nvs_read_history() call, something unpredictable can happen.

The data part of an NVS item is now protected by a 32-bit CRC, so the 8-bit CRC granting the metadata integrity seems quite weak now.

Don't hesitate to correct me if I'm wrong !

Bit flipping in the ate should be detected in the crc-8. When the crc-8 goes bad it means that the data can no longer be trusted and the data will not be found. If older data is still present the older data will be used. For bit flipping using a larger crc could even make things worse as the bit flipping would also happen in the crc data.

For the data a bigger crc needs to be used as the data is allowed to be 64k (not really, but in theory). The metadata is only 7 bytes (56bit) and thus the crc-8 should be fine.

An improvement could be made not to use a crc8 but instead a 8bit ecc that would allow detecting and correcting bit flips.

To conclude, the use of the crc8 in the ate is primarily done to determine that a write of data has been completed succesfully, the use of a 8-bit crc for 7 byte data is sufficient and allows detection of bit flipping in the ate rendering the stored data invalid. Increasing the 8-bit crc to a 24 bit crc is not needed and might even result in worse reliability of nvs.

Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants