Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS #304

Merged
merged 12 commits into from
Dec 7, 2022

Conversation

longhuan-cisco
Copy link
Contributor

@longhuan-cisco longhuan-cisco commented Oct 18, 2022

Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS

Description

  1. Add all the fields that are supported by c_cmis get_transceiver_bulk_status API but not currently present to dom_sensor table.
  2. Add new fields (provided by cmis.py:get_transceiver_status() API and c_cmis.py:get_transceiver_status() API ) to status table(TRANSCEIVER_STATUS).
    following SW fields(non-eeprom-fetched) are already present in today's status table:
status                       = 1*255VCHAR                       ; code of the module status (plug in, plug out)
error                        = 1*255VCHAR                       ; module error (N/A or a string consisting of error descriptions joined by "|", like "error1 | error2" )

        following HW fields (eeprom-fetched) are added per this HLD (this is just snippet, refer to HLD for complete list):

module_state                 = 1*255VCHAR                       ; current module state (ModuleLowPwr, ModulePwrUp, ModuleReady, ModulePwrDn, Fault)
module_fault_cause           = 1*255VCHAR                       ; reason of entering the module fault state
datapath_firmware_fault      = BOOLEAN                          ; datapath (DSP) firmware fault
module_firmware_fault        = BOOLEAN                          ; module firmware fault
module_state_changed         = BOOLEAN                          ; module state changed
datapath_hostlane1           = 1*255VCHAR                       ; data path state indicator on host lane 1
......
rxsigpowerhighwarning_flag   = BOOLEAN                          ; rxsigpower high warning flag
rxsigpowerlowwarning_flag    = BOOLEAN                          ; rxsigpower low warning flag
  1. Add pm related fields to a newly created table TRANSCEIVER_PM in STATE_DB. (DB schema is defined in this HLD
  2. Add corresponding DB update/delete handling along the xcvrd flow.
  3. Add/modify test cases on test_xcvrd.py to cover newly added code.

Here's the entire DB schema of status tbl and dom_sensor tbl, according to the new HLD from PR sonic-net/SONiC#1076

Motivation and Context

How Has This Been Tested?

Verified on CMIS/C-CMIS optics, interfaces come up fine, and new fields show up in DB

UT output for dom_sensor table:
dom_sensor_tbl_ut_output.txt

UT output for status table:
status_tbl_ut_output.txt

UT output for status table in xcvrd restart case:
xcvrd_restart_test.txt

UT output for pm table:
pm_table_output.txt

UT output for pm table in xcvrd restart case:
xcvrd_restart_test_pm_tbl.txt

UT result for test_xcvrd.py:
ut_w_coverage.txt

Additional Information (Optional)

Note
Please don't merge this PR until the dependency PR sonic-net/sonic-platform-common#315 gets merged.

* Push new fields to DOM_SENSOR table in STATE_DB

* Remove empty line

Signed-off-by: Longyin Huang [email protected]
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 18, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@longhuan-cisco
Copy link
Contributor Author

longhuan-cisco commented Oct 19, 2022

@prgeor please review (couldn't add to reviewers list directly, thus add here)

Copy link
Contributor

@mihirpat1 mihirpat1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also test restarting xcvrd to ensure that the table gets deleted and updated accordingly

return SFP_EEPROM_NOT_READY

except NotImplementedError:
helper_logger.log_error("This functionality is currently not implemented for this platform")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be better to add some unique info here so that we can distinguish this error message from other error messages (as the same string is being used in multiple places in the file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added unique log.

@longhuan-cisco
Copy link
Contributor Author

Can you please also test restarting xcvrd to ensure that the table gets deleted and updated accordingly

Yea Attached restart test log in description.

@longhuan-cisco
Copy link
Contributor Author

@prgeor @mihirpat1 I took care of the existing comments, can you please review?

@longhuan-cisco longhuan-cisco changed the title Add new fields to status table and dom_sensor table in STATE_DB for CMIS/C-CMIS Add new fields to status/dom_sensor/pm tables in STATE_DB for CMIS/C-CMIS Nov 8, 2022
@longhuan-cisco
Copy link
Contributor Author

longhuan-cisco commented Nov 8, 2022

Added pm table related change, to use this PR to track changes for all three tables: status/dom_sensor/pm. (Since they all touch the similar flow in xcvrd, and some is dependent on another, it would be good to combine them together.)
@prgeor @mihirpat1 Could you please take another look?

@prgeor prgeor added the CMIS label Nov 22, 2022
@prgeor prgeor self-requested a review November 22, 2022 02:27
@@ -2124,6 +2212,8 @@ def retry_eeprom_reading(self):
if rc != SFP_EEPROM_NOT_READY:
post_port_dom_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_dom_tbl(asic_index))
post_port_dom_threshold_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_dom_tbl(asic_index))
update_port_transceiver_status_table_hw(logical_port, self.port_mapping, self.xcvr_table_helper.get_status_tbl(asic_index))
post_port_pm_info_to_db(logical_port, self.port_mapping, self.xcvr_table_helper.get_pm_tbl(asic_index))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we making sure this is done only for ZR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside post_port_pm_info_to_db(), the following statement will make sure to skip for other non-ZR cases (where platform API get_transceiver_pm() throws NotImplementedError and then _wrapper_get_transceiver_pm() returns empty dict):

            if not pm_info_dict:
                continue

@prgeor prgeor requested a review from keboliu November 22, 2022 02:36
mihirpat1
mihirpat1 previously approved these changes Nov 22, 2022
pm_info_dict = _wrapper_get_transceiver_pm(physical_port)
if pm_info_cache is not None:
# If cache is enabled, put dom information to cache
pm_info_cache[physical_port] = pm_info_dict
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we going to use this pm_info_cache[physica_port] ? I dont see it being used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pm_info_cache is passed to post_port_pm_info_to_db() from task_worker() of class DomInfoUpdateTask (in task_worker, we already have similar cache logic for dom_info dom_th_info in today's code)
pm_info_cache is used here:

        if pm_info_cache is not None and physical_port in pm_info_cache:
            # If cache is enabled and pm info is in cache, just read from cache, no need read from EEPROM
            pm_info_dict = pm_info_cache[physical_port]

One possible situation I can think of: task_worker() later walks through a different logical port which belongs to a previously visited physical port, then cache is hit.

sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved

def delete_port_from_status_table_hw(logical_port_name, port_mapping, status_tbl):
for physical_port_name in get_physical_port_name_dict(logical_port_name, port_mapping).values():
found, fvs = status_tbl.get(physical_port_name)
Copy link
Collaborator

@keboliu keboliu Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use physical_port_name as the index instead of logical_port_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this function is supposed to only delete the HW fields of status tbl, which are fetched from eeprom (more precisely VDM) and per physical port.

Copy link
Collaborator

@keboliu keboliu Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most of the current xcvrd relevant fields in the state DB are fetched from the hw(EEPROM), but all of them are using 'logical_port_name' as the index, there will be duplicated information in the DB(multiple logical_port can be created from the same physical port), but with this design, CLI/SNMP can easily query them with the logical port name. I suggest following the same convention here. @prgeor please also share your view here.

Copy link
Contributor Author

@longhuan-cisco longhuan-cisco Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, SW part (fields 'status' and 'error') of status table are not from EEPROM, and all the other fields on status table (introduced by this PR) and other tables are from EEPROM.
And currently, only SW part of status table is using logical_port_name as key, all the other existing tables (dom_sensor tbl/info tbl, those are eeprom-feteched) are using physical_port_name as index. I'm just making the HW part of status table aligned with other existing tables.

e.g. in today’s post_port_sfp_info_to_db(), post_port_dom_threshold_info_to_db() and post_port_dom_info_to_db(), they all have the following code to use physical port to set DB:

    for physical_port in physical_port_list:
        …..
        port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port)
        …..
        table.set(port_name, fvs)

We discussed with @prgeor about this before, and we agreed to better use logical_port_name as key, and sonic-net/sonic-buildimage#10144 can be used to track this pending work item to clean up all the existing code to use logical_port_name as key for DB tables. But the plan was to keep this item as lower priority, since it may only cause issue on DPB case for now. And also the history/necessarity of ganged port needs to be figured out first (ganged port can have 1:n mapping from logical port to physical port), so that we can decide wether the ganged port related logic can be cleaned up around this area.
Also, to clean this area up, more consideration/discussion would be needed later on whether we need to divide the corresponding eeprom-fetched fields belonging to the same physical port into multiple groups based on logical ports.
e.g. for the following fields in TRANSCEIVER_INFO table, if we are in breakout mode, each logical port will only have part of the 8 lanes, we might need to divide them based on logical port.

        active_apsel_hostlane1       = INTEGER                  ; active application selected code assigned to host lane 1
        active_apsel_hostlane2       = INTEGER                  ; active application selected code assigned to host lane 2
        active_apsel_hostlane3       = INTEGER                  ; active application selected code assigned to host lane 3
        active_apsel_hostlane4       = INTEGER                  ; active application selected code assigned to host lane 4
        active_apsel_hostlane5       = INTEGER                  ; active application selected code assigned to host lane 5
        active_apsel_hostlane6       = INTEGER                  ; active application selected code assigned to host lane 6
        active_apsel_hostlane7       = INTEGER                  ; active application selected code assigned to host lane 7
        active_apsel_hostlane8       = INTEGER                  ; active application selected code assigned to host lane 8

Copy link
Collaborator

@keboliu keboliu Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, SW part (fields 'status' and 'error') of status table are not from EEPROM, and all the other fields on status table (introduced by this PR) and other tables are from EEPROM. And currently, only SW part of status table is using logical_port_name as key, all the other existing tables (dom_sensor tbl/info tbl, those are eeprom-feteched) are using physical_port_name as index. I'm just making the HW part of status table aligned with other existing tables.

e.g. in today’s post_port_sfp_info_to_db(), post_port_dom_threshold_info_to_db() and post_port_dom_info_to_db(), they all have the following code to use physical port to set DB:

    for physical_port in physical_port_list:
        …..
        port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port)
        …..
        table.set(port_name, fvs)

If I am not wrong, from the below code seems that for all none-ganged ports it will always use logical port name including table DOM and Transceiver info table. By the way, I haven't seen any vendor claiming that they have ganged port on any of their platforms.

def get_physical_port_name(logical_port, physical_port, ganged):
    if ganged:
        return logical_port + ":{} (ganged)".format(physical_port)
    else:
        return logical_port

We discussed with @prgeor about this before, and we agreed to better use logical_port_name as key, and sonic-net/sonic-buildimage#10144 can be used to track this pending work item to clean up all the existing code to use logical_port_name as key for DB tables. But the plan was to keep this item as lower priority, since it may only cause issue on DPB case for now. And also the history/necessarity of ganged port needs to be figured out first (ganged port can have 1:n mapping from logical port to physical port), so that we can decide wether the ganged port related logic can be cleaned up around this area. Also, to clean this area up, more consideration/discussion would be needed later on whether we need to divide the corresponding eeprom-fetched fields belonging to the same physical port into multiple groups based on logical ports. e.g. for the following fields in TRANSCEIVER_INFO table, if we are in breakout mode, each logical port will only have part of the 8 lanes, we might need to divide them based on logical port.

If you change the index of the tables, I supposed the CLIs (e.g 'show interface transceiver eeprom') will be broken, and also SNMP and other telemetry applications assuming the index shall be logical port index will also be broken, please double confirm.

        active_apsel_hostlane1       = INTEGER                  ; active application selected code assigned to host lane 1
        active_apsel_hostlane2       = INTEGER                  ; active application selected code assigned to host lane 2
        active_apsel_hostlane3       = INTEGER                  ; active application selected code assigned to host lane 3
        active_apsel_hostlane4       = INTEGER                  ; active application selected code assigned to host lane 4
        active_apsel_hostlane5       = INTEGER                  ; active application selected code assigned to host lane 5
        active_apsel_hostlane6       = INTEGER                  ; active application selected code assigned to host lane 6
        active_apsel_hostlane7       = INTEGER                  ; active application selected code assigned to host lane 7
        active_apsel_hostlane8       = INTEGER                  ; active application selected code assigned to host lane 8

continue
beautify_pm_info_dict(pm_info_dict, physical_port)
fvs = swsscommon.FieldValuePairs([(k, v) for k, v in pm_info_dict.items()])
table.set(physical_port_name, fvs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the key as logical port just like done for DOM and other table?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the CLI and streaming telemetry will need logical port as key in future when we do breakout of physical port to many logical ports.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keboliu if you check the usage of port_name in previous change:-

table.set(port_name, fvs)

port_name is nothing but physical port name at line 496 in previous code :-

port_name = get_physical_port_name(logical_port_name, ganged_member_num, ganged_port)

so, i don't see this PR causing any issue. We need to revisit once we support breakout in Xcvrd.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prgeor I think we already support breakout in XCVRD, please check this PR: #212

mihirpat1
mihirpat1 previously approved these changes Dec 5, 2022
prgeor
prgeor previously approved these changes Dec 5, 2022
@longhuan-cisco longhuan-cisco dismissed stale reviews from prgeor and mihirpat1 via 4e30850 December 6, 2022 22:20
@lgtm-com
Copy link

lgtm-com bot commented Dec 6, 2022

This pull request introduces 1 alert when merging 4e30850 into adcd69b - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog.

@prgeor prgeor merged commit 56046dc into sonic-net:master Dec 7, 2022
@prgeor
Copy link
Collaborator

prgeor commented Jan 18, 2023

@yxieca please cherry-pick to 202205

@yxieca
Copy link

yxieca commented Jan 19, 2023

@longhuan-cisco this change cannot be cherry-picked to 202205 cleanly. Please raise separate PR. Thanks!

@prgeor
Copy link
Collaborator

prgeor commented Jan 19, 2023

@StormLiangMS please help cherry-pick to 202211

@StormLiangMS
Copy link

@prgeor I suppose you want to have this one in 202211 other than 202111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants