Attributes flag does not have expected function #6619

Closed
ryan-peck opened this issue Nov 6, 2019 · 2 comments · Fixed by #6638
Labels: area/smart, bug (unexpected problem or unintended behavior)

@ryan-peck
Contributor

Relevant telegraf.conf:

[[inputs.smart]]
  use_sudo = true

System info:

Telegraf version 1.12.4
CentOS 7
Smartctl 7.0

Steps to reproduce:

  1. Run telegraf with attributes flag set to false
  2. Observe that no temperature is recorded for NVMe devices, while one is recorded for non-NVMe devices, even though non-NVMe devices hold this information in the vendor-specific attributes section
  3. Run telegraf with attributes flag set to true
  4. Observe that NVMe and non-NVMe devices now record temperature

Expected behavior:

smartctl --info --health --attributes --tolerance=verypermissive --nocheck standby --format=brief /dev/nvme0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZQLB3T8HALS-00007
Serial Number:                      S438NF0M304843
Firmware Version:                   EDA5202Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization:            60,272,201,728 [60.2 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Nov  5 17:01:43 2019 PST

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        27 Celsius   

^The above line should be recorded with attributes set to false

Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    7,994,886 [4.09 TB]
Data Units Written:                 333,054 [170 GB]
Host Read Commands:                 17,607,817
Host Write Commands:                1,411,082
Controller Busy Time:               44
Power Cycles:                       52
Power On Hours:                     506
Unsafe Shutdowns:                   34
Media and Data Integrity Errors:    0
Error Information Log Entries:      5
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               27 Celsius
Temperature Sensor 2:               31 Celsius
Temperature Sensor 3:               36 Celsius

smartctl --info --health --attributes --tolerance=verypermissive --nocheck standby --format=brief /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG MZ7LM3T8HMLP-00005
Serial Number:    S2TYNX0J702931
LU WWN Device Id: 5 002538 c406fe884
Firmware Version: GXT5404Q
User Capacity:    3,840,755,982,336 bytes [3.84 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov  5 16:49:04 2019 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Power mode was:   IDLE

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   098   098   000    -    5758
 12 Power_Cycle_Count       -O--CK   098   098   000    -    1487
177 Wear_Leveling_Count     PO--C-   099   099   005    -    62
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-   100   100   010    -    13078
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
184 End-to-End_Error        PO--CK   100   100   097    -    0
187 Uncorrectable_Error_Cnt -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   073   046   000    -    27
194 Temperature_Celsius     -O---K   073   046   000    -    27 (Min/Max 20/54) 

^The above line should not record temperature with attributes set to false

195 ECC_Error_Rate          -O-RC-   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    0
199 CRC_Error_Count         -OSRCK   099   099   000    -    4
202 Exception_Mode_Status   PO--CK   100   100   010    -    0
235 POR_Recovery_Count      -O--C-   099   099   000    -    1474
241 Total_LBAs_Written      -O--CK   099   099   000    -    264081103488
242 Total_LBAs_Read         -O--CK   099   099   000    -    210403669236
243 SATA_Downshift_Ct       -O--CK   100   100   000    -    0
244 Thermal_Throttle_St     -O--CK   100   100   000    -    0
245 Timed_Workld_Media_Wear -O--CK   100   100   000    -    65535
246 Timed_Workld_RdWr_Ratio -O--CK   100   100   000    -    65535
247 Timed_Workld_Timer      -O--CK   100   100   000    -    65535
251 NAND_Writes             -O--CK   100   100   000    -    531474149184
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Actual behavior:

The opposite. With attributes set to false, the temperature of non-NVMe devices is recorded from the attributes section, while the NVMe temperature is not recorded. With attributes set to true, all temperatures are recorded.
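The expected gating can be sketched in Go as follows. This is a minimal illustration of the behavior the report asks for, not the code in smart.go: the regular expressions, the deviceFields function, and the temp_c field name are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Hypothetical patterns for illustration; they are not copied from smart.go.
var (
	// Matches the NVMe health-log line, e.g. "Temperature:   27 Celsius".
	nvmeTemp = regexp.MustCompile(`^Temperature:\s+(\d+) Celsius`)
	// Matches an ATA attribute row in --format=brief output and captures
	// the attribute ID, the attribute name, and the leading raw value.
	ataAttr = regexp.MustCompile(`^\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\S+\s+\S+\s+(\d+)`)
)

// deviceFields returns per-device fields under the behavior the report expects:
// the NVMe health-log temperature is always kept, while a temperature read from
// the ATA vendor attribute table is kept only when collectAttributes is true.
func deviceFields(lines []string, collectAttributes bool) map[string]int64 {
	fields := map[string]int64{}
	for _, line := range lines {
		// The health section is not part of the attribute table, so it is not gated.
		if m := nvmeTemp.FindStringSubmatch(line); m != nil {
			if v, err := strconv.ParseInt(m[1], 10, 64); err == nil {
				fields["temp_c"] = v // assumed field name
			}
			continue
		}
		// Everything below comes from the attribute table and is gated.
		if !collectAttributes {
			continue
		}
		if m := ataAttr.FindStringSubmatch(line); m != nil && m[1] == "194" {
			if v, err := strconv.ParseInt(m[3], 10, 64); err == nil {
				fields["temp_c"] = v
			}
		}
	}
	return fields
}

func main() {
	nvme := []string{"Temperature:                        27 Celsius"}
	ata := []string{"194 Temperature_Celsius     -O---K   073   046   000    -    27 (Min/Max 20/54)"}

	fmt.Println(deviceFields(nvme, false)) // map[temp_c:27]
	fmt.Println(deviceFields(ata, false))  // map[]
	fmt.Println(deviceFields(ata, true))   // map[temp_c:27]
}
```

The actual behavior described above corresponds to the two branches being gated the other way around.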

Additional info:

I believe the issue comes from a misplaced if collectAttributes check in smart.go. I also believe that smart_test.go should be amended to check not only that all required fields are present, but also that all fields that should be excluded are absent.
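A two-sided check of that kind might look like the sketch below. checkFields is a hypothetical helper written for this example, not part of Telegraf's test utilities, and the field names in main are assumed for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// checkFields is a hypothetical helper, not an existing Telegraf function.
// It returns the required fields that are missing from got and the excluded
// fields that are unexpectedly present in got.
func checkFields(got map[string]interface{}, required, excluded []string) (missing, unexpected []string) {
	for _, name := range required {
		if _, ok := got[name]; !ok {
			missing = append(missing, name)
		}
	}
	for _, name := range excluded {
		if _, ok := got[name]; ok {
			unexpected = append(unexpected, name)
		}
	}
	// Sort so that failure messages are stable between runs.
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	// Fields gathered for a non-NVMe device with attributes set to false
	// (assumed names, for illustration only).
	got := map[string]interface{}{
		"health_ok": true,
		"temp_c":    int64(27),
	}

	missing, unexpected := checkFields(got, []string{"health_ok"}, []string{"temp_c"})
	fmt.Printf("missing=%v unexpected=%v\n", missing, unexpected)
}
```

A test that checked only the required list would pass on this input. The excluded list is what would catch the behavior reported here.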

@danielnelson
Contributor

Thanks for the detailed report. I think you are right about what needs to be done. Any chance you would be able to make your changes and open a pull request?

danielnelson added the area/smart and bug (unexpected problem or unintended behavior) labels on Nov 6, 2019
@ryan-peck
Contributor Author

Sure, I can do that
