New non-volatile storage system #77929
Comments
Zephyr platforms have a maximum write size of up to 512 bytes.
@butok I saw this RFC #77576
Architecture WG:
I was thinking about one thing when learning about how NVS works - is separating the data ATE from the actual data worth it? We could make the data more dense by storing ATE-data pairs from the start of the sector.
The advantages would be that the ATE and data could be placed right next to each other, so we waste less space in the case of a larger write block size. On the other hand, the disadvantage is that we would need to do some address calculation to find every data ATE except the first one. But I do not think it would cause a noticeable slowdown - just calculate the ATE start address + ATE size + data length and align it to the next start of a write block.
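The address calculation proposed above can be sketched as a small helper. This is illustrative only; `next_pair_offset` and its arguments are not part of NVS/ZMS, and it assumes the write block size is a power of two.

```c
#include <stdint.h>

/* Illustrative helper (not part of NVS/ZMS): given the flash offset of
 * one ATE, the ATE size and the length of its data, compute where the
 * next interleaved ATE-data pair would start. wbs must be a power of 2. */
static uint32_t next_pair_offset(uint32_t ate_off, uint32_t ate_size,
                                 uint32_t data_len, uint32_t wbs)
{
    uint32_t end = ate_off + ate_size + data_len;

    /* align up to the next write-block boundary */
    return (end + wbs - 1U) & ~(wbs - 1U);
}
```

For example, a 16-byte ATE with 10 bytes of data on a 32-byte write block means the next pair starts one block later.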
Yes it is. It is easier to recover if something happens; otherwise you may write something to the data area that looks like an ATE and glitch the device into attempting to read the storage as that data mandates, or into a loop. Also, if you write in a loop, you may basically wrap the ATE/data storage without a way to figure out where it really ends. It is much easier to keep things working if you keep users out of the area where the metadata of your storage is stored. The same happens with any block-device-oriented FS, where metadata is separated from data streams. Of course there is also a way around that, for example introducing different alphabets for metadata and data, but this means that you end up with some 8-to-N encodings (N > 8) and have to make sure that user data will not get encoded to look like metadata.
OK, I understand the reason now. For the purpose of saving space I really like the small-data-inside-an-ATE feature. For data that is a little larger than 4 bytes, would it be acceptable to write it right after the ATE if the block is large enough? So maybe there could be another rule: if there is enough space in a block right after the ATE for the data, then it would be stored there. Technically this is also mixing ATE and data, but we would search for ATEs only at the start of the blocks anyway. What do you think about such a feature?
This could be done once the multiple format entries feature is added, which means that you could have a different, larger format that holds N bytes of data.
The original design of NVS, and of ZMS here, intends to work with devices with relatively small write block sizes (wbs) that can be appended to without altering other data (unless the area is overwritten); this allows placing metadata in small chunks of constant size and data at variable size, with no mandated boundaries (except the write block size) between data. Because ATEs have the same size, if the wbs becomes large it should be possible to start placing some data in it: for example, if you have a 32-byte wbs and 16 bytes of ATE, then any data of size <= 16 bytes can go into the ATE's write block, and it would not be a problem, as the wbs and the size of the ATE set the boundaries, which means that the ATE and the data in the ATE's write block are still separated. Eventually you may have to erase some part of the storage, but that happens because the device, for example flash, requires it before it can be written. Using a magnetic tape analogy: the erase head has to erase data before the r/w head can write to an area previously used.
I understand that what you are trying to solve in your case, @andrisk-dev, is a problem of a relatively big write block size that equals the erase block size - so you basically have a block device. You can see the difference here: you cannot really append data directly on storage, you basically have to replace entire block contents, unless you are willing to append data at a wbs of sector size. In your case, the scheme you have presented in comment #77929 (comment) could work, if you decide to divide your sector into an ATE part and a data part, assuming that you always write both as a single sector; every sector, even if it carries a continuation of data from the previous sector, has that ATE part reserved and not available for users. Still, you will probably have some unused space wasted.
What I understand is that you are trying to provide your users with small, reliable storage for basic data or settings, but I do not think that this PR will effectively solve your problem, at least not without significant complexity being introduced, as it is basically based on the ability to freely append data at the small granularity of xRAM and small-wbs flash devices, something your device does not provide. We can try to bend it your way, but I would rather focus first on making it a solid solution for the devices it has been originally designed for.
Thanks for your replies @rghaddab @de-nordic, I understand that the first version is to be as simple as possible. I think one solution that would enable us at NXP to make the most of the 512-byte write block size is to have an ATE in a different format - maybe we can call it a long ATE here - which could store information about multiple data records in one place. The format would include the number of data items stored in that ATE, followed by a list of metadata about all of them. That way, even if individually stored data would still be sparse in flash, when relocating the data from an erased sector to a new one we could pack the data much more densely. As this is more of a future release thing, I think the main question for now is how the filesystem would distinguish between the normal entry format and an entry in a different format. I think that should be decided now to make sure the "Support for entries in multiple formats" is possible in the future.
This change is planned as follows:
@rghaddab, this needs to be a requirement on ZMS to not artificially limit the size. Optimize later if needed. Add warnings to make sure the users are aware of the impacts.
Introduction
In recent years, advances in process nodes for embedded hardware have made it necessary to support non-volatile technologies different from the classical on-chip NOR flash, which is written in words but erased in pages. These new technologies do not require a separate erase operation at all, and data can be overwritten directly at any time.
On top of that, the complexity of firmware has not stopped growing, making it necessary to ensure that a solid, scalable storage mechanism is available to all applications. This storage needs to support millions of entries with solid CRC protection and multiple advanced features.
Problem description
In Zephyr, there are currently a few alternatives for non-volatile memory storage:
None of them are optimal for the current new wave of solid-state non-volatile memory technologies, including resistive (RRAM) and magnetic (MRAM) random-access, non-volatile memory, because they rely on the "page erase" abstraction whereas these devices do not require an erase operation at all, and data can be overwritten directly.
Additionally, none of the storage systems above is a good match for the widely used settings subsystem, given that they were never designed to operate as a backend for it.
The closest one is NVS, and an analysis of why it is not suitable can be found in the Alternatives section of this issue.
Proposed change
Create a new storage mechanism that fulfills the following requirements:
no-erase-required flash drivers (i.e. RRAM, MRAM, etc.)
Potential names
Detailed RFC
Proposed change (Detailed)
General behavior:
ZMS divides the memory space into sectors (minimum 2). Each sector is filled with key/value pairs until it is full; the sector is then closed and the storage system moves on to the next sector. When it reaches the end, it wraps back to the first sector after garbage collecting it and erasing its content.
Mounting the FS:
Mounting the filesystem starts by getting the flash parameters and checking that the file system properties are correct (sector_size, sector_count, ...), and then initializes the file system.
Initialization of ZMS:
As ZMS has a fast-forward write mechanism, we must find the last sector and the pointer to the last entry where writing stopped the last time.
It looks for a closed sector followed by an open one; then, within the open sector, it finds (recovers) the last written ATE (Allocation Table Entry).
After that, it checks that the sector following this one is empty; if not, it erases it.
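The recovery scan in the steps above can be sketched over an abstract array of sector states. The names are illustrative, not the real ZMS API; real code would derive each sector's state from its close and empty ATEs rather than from an in-memory array.

```c
#include <stddef.h>

enum sector_state { SECTOR_EMPTY, SECTOR_OPEN, SECTOR_CLOSED };

/* Minimal sketch (illustrative, not the real ZMS API): find the
 * currently open sector by looking for a closed sector followed by a
 * non-closed one, scanning circularly. Returns the index of the open
 * sector, or 0 for fresh storage where no sector is closed yet. */
static size_t find_open_sector(const enum sector_state *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == SECTOR_CLOSED && s[(i + 1) % n] != SECTOR_CLOSED) {
            return (i + 1) % n;
        }
    }
    return 0; /* fresh storage: start writing at the first sector */
}
```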
Composition of a sector:
A sector is organized in this form:
The close ATE is used to close a sector when it is full
The empty ATE is used to erase a sector
ATEn entries describe where the data is stored, its size and its crc32
Data is the written value
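Given the layout above, the space left for user records in a fresh sector can be sketched as follows. This assumes the empty ATE and the close ATE each consume one ATE-sized slot (16 bytes each, per the ATE size this RFC states later); the exact placement inside the sector is not specified here.

```c
#include <stdint.h>

#define ATE_SIZE 16U /* per this RFC, an ATE occupies 16 bytes */

/* Sketch: payload bytes available in a fresh sector, assuming the empty
 * ATE and the close ATE each consume one ATE-sized slot. */
static uint32_t sector_capacity(uint32_t sector_size)
{
    return sector_size - 2U * ATE_SIZE;
}
```

Each record written then additionally consumes one ATE plus its data.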
ZMS key/value write:
To avoid rewriting the same data with the same ID, ZMS first looks through all the sectors for an entry with the same ID and compares its data; if the data is identical, no write is performed.
If a write must be performed, an ATE and the data (unless it is a delete) are written in a sector.
If the sector is full (cannot hold the current data + ATE), we move to the next sector, garbage collect the sector after the newly opened one, and then erase it.
Data whose size is smaller than or equal to 4 bytes is written within the ATE.
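The placement rules above can be condensed into a minimal sketch. The helpers `write_cost` and `write_fits` are illustrative, not the real ZMS API, and the assumption that out-of-ATE data is rounded up to the write block size is mine.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATE_SIZE   16U /* per this RFC */
#define INLINE_MAX 4U  /* data <= 4 bytes is stored inside the ATE */

static uint32_t align_up(uint32_t v, uint32_t wbs)
{
    return (v + wbs - 1U) & ~(wbs - 1U);
}

/* Bytes a write consumes in the open sector: one ATE, plus the data
 * rounded up to the write block size when it cannot ride inline. */
static uint32_t write_cost(uint32_t data_len, uint32_t wbs)
{
    uint32_t cost = ATE_SIZE;

    if (data_len > INLINE_MAX) {
        cost += align_up(data_len, wbs);
    }
    return cost;
}

/* A write fits only if room also remains for the closing ATE. */
static bool write_fits(uint32_t free_space, uint32_t data_len, uint32_t wbs)
{
    return write_cost(data_len, wbs) + ATE_SIZE <= free_space;
}
```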
ZMS read (with history):
By default, it looks for the last entry with the given ID and retrieves its data.
If a history count different from 0 is provided, older data with the same ID is retrieved.
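The history lookup can be sketched over a flat, oldest-to-newest array of ATE IDs (illustrative only; real ZMS walks ATEs across sectors on flash):

```c
#include <stdint.h>

/* Sketch: scan newest-to-oldest and return the index of the
 * (hist_count + 1)-th most recent entry with this ID, or -1 if there is
 * no such entry. hist_count == 0 yields the latest value. */
static int find_with_history(const uint32_t *ids, int n_ates,
                             uint32_t id, unsigned int hist_count)
{
    for (int i = n_ates - 1; i >= 0; i--) {
        if (ids[i] == id) {
            if (hist_count == 0) {
                return i;
            }
            hist_count--;
        }
    }
    return -1;
}
```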
ZMS: how does the cycle counter work?
Each sector has a lead cycle counter, a uint8_t that is used to validate all the other ATEs.
The lead cycle counter is stored in the empty ATE.
To become valid, an ATE must have the same cycle counter as the one stored in the empty ATE.
Each time an ATE is moved from one sector to another, it must get the cycle counter of the destination sector.
To erase a sector, the cycle counter of the empty ATE is incremented; all the ATEs in that sector then become invalid.
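The cycle-counter rules above reduce to two tiny helpers (illustrative names, not the real ZMS API; note that the uint8_t counter wraps naturally from 255 back to 0):

```c
#include <stdbool.h>
#include <stdint.h>

/* An ATE is valid only if its cycle counter matches the sector's lead
 * cycle counter (stored in the empty ATE). */
static bool ate_valid(uint8_t ate_cycle, uint8_t lead_cycle)
{
    return ate_cycle == lead_cycle;
}

/* Logical erase: bump the lead counter, invalidating every ATE in the
 * sector with a single small write. Wraps at 255 -> 0. */
static uint8_t next_cycle(uint8_t lead_cycle)
{
    return (uint8_t)(lead_cycle + 1U);
}
```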
ZMS: how to close a sector?
To close a sector, a close ATE is added at the end of the sector; it must have the same cycle counter as the empty ATE.
When closing a sector, all the remaining space that has not been used is filled with garbage data to avoid having old ATEs with a valid cycle counter.
ZMS: triggering the garbage collector
Some applications need to make sure that storage writes have a maximum defined latency.
When calling a ZMS write, the current sector could be almost full, requiring the GC to be triggered in order to switch to the next sector.
This operation is time-consuming and could cause some applications to miss their real-time constraints.
ZMS adds an API for the application to get the current remaining free space in a sector. The application can then decide, when needed, to switch to the next sector if the current one is almost full; this will of course trigger the garbage collection on the next sector. This guarantees the application that the next write won't trigger the garbage collection.
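The intended application-side policy can be sketched like this. The free-space query API is proposed in this RFC but not named, so the helper below is hypothetical: outside its time-critical path, the application pre-triggers the sector switch (and hence garbage collection) whenever the open sector can no longer absorb the worst-case write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy helper: decide to switch sectors early when the
 * remaining free space cannot absorb the worst-case upcoming write. */
static bool should_switch_sector(uint32_t free_space,
                                 uint32_t worst_case_write_cost)
{
    return free_space < worst_case_write_cost;
}
```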
ZMS structure of ATE (Allocation Table Entries)
An entry has 16 bytes divided among these variables:
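The variable list did not survive in this copy of the issue. Purely as an illustration of how 16 bytes could be divided, one plausible packing is shown below; the field names and ordering are assumptions, not the actual ZMS layout (the RFC only fixes the total size, the uint8_t cycle counter, the crc32 on the data, and the <=4-byte inline-data feature).

```c
#include <stdint.h>

/* Illustrative 16-byte packing only - not the actual ZMS ATE layout. */
struct zms_ate_sketch {
    uint32_t id;            /* record identifier */
    union {
        uint32_t offset;    /* data location when stored outside the ATE */
        uint8_t  data[4];   /* data <= 4 bytes stored inline */
    } loc;
    uint16_t len;           /* data length in bytes */
    uint8_t  cycle_cnt;     /* must match the sector's lead cycle counter */
    uint8_t  crc8;          /* integrity check over the ATE itself */
    uint32_t data_crc;      /* crc32 of the data */
};
```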
ZMS wear leveling feature
This storage system is optimized for devices that do not require an erase.
Storage systems that rely on an erase value (NVS, for example) need to emulate the erase with write operations. This significantly decreases the life expectancy of these devices and adds delays to write operations and to initialization.
ZMS introduces a cycle count mechanism that avoids emulating the erase operation for these devices.
It also guarantees that every memory location is written only once for each cycle of sector write.
Dependencies
Only on flash drivers.
Concerns and Unresolved Questions
The first draft of this new storage system will not include all the features listed in the proposed change section.
This is intended to minimize the effort of reviewing this new storage system for developers that are familiar with the NVS filesystem.
More changes will come in future patches.
Alternatives
The one alternative we considered was to expand the existing NVS codebase in order to remove its described shortcomings. This is in fact how this new proposal was born, once expanding NVS was identified as suboptimal.
Among other issues, we identified the following:
More info in these Pull Requests: