To err is human. Propagating errors automatically is #devops

https://vsis.online/

  • 2 Posts
  • 50 Comments
Joined 1 year ago
Cake day: June 18th, 2023

  • vsis@feddit.cl (OP) to Linux@lemmy.ml: Is my NVME drive dying?
    8 months ago

    I opened it up. All the cables looked fine. I used a hand blower to clean out the dust, then took out the SSD and blew out the socket and everything around it.

    Now I'm going to monitor whether it keeps happening.

    $ journalctl --since yesterday  | grep -c "nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical"
    16
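
    To watch the trend rather than a single number, the count above can be wrapped in small helpers. This is only a sketch: the function names `count_corrected_errors` and `errors_per_day` are made up for illustration, and the per-day breakdown assumes the journal is printed with `-o short-iso` so that every line starts with a `YYYY-MM-DD` date.

    ```shell
    # Hypothetical helpers; both read journal text from stdin.
    # $1 is the PCI address of the device, e.g. 0000:02:00.0

    # Print the number of corrected PCIe bus errors logged for the device.
    count_corrected_errors() {
        # -c counts matching lines, -F matches the pattern as a fixed string.
        # grep exits non-zero when nothing matches, so "|| true" keeps the
        # function usable under "set -e" (it still prints 0).
        grep -cF "nvme $1: PCIe Bus Error: severity=Corrected" || true
    }

    # Print "DATE COUNT" lines, one per day that had corrected errors.
    # Assumes input produced with "journalctl -o short-iso".
    errors_per_day() {
        grep -F "nvme $1: PCIe Bus Error: severity=Corrected" \
            | cut -c1-10 \
            | sort \
            | uniq -c \
            | awk '{ print $2, $1 }'
    }

    # Example usage (not run here):
    #   journalctl -k --since yesterday | count_corrected_errors 0000:02:00.0
    #   journalctl -k --since "7 days ago" -o short-iso | errors_per_day 0000:02:00.0
    ```

    If the daily count drops toward zero after reseating the drive, that would suggest the contact was the problem; if it stays flat, the cause is probably elsewhere.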
    

  • vsis@feddit.cl (OP) to Linux@lemmy.ml: Is my NVME drive dying?
    8 months ago

    sudo smartctl -a /dev/nvme0

    $ sudo smartctl -a /dev/nvme0
    [sudo] password for ****:
    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.7.6-arch1-2] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Number:                       INTEL HBRPEKNX0202A
    Serial Number:                      BTTE95101RQM512B-1
    Firmware Version:                   G002
    PCI Vendor/Subsystem ID:            0x8086
    IEEE OUI Identifier:                0x5cd2e4
    Controller ID:                      1
    NVMe Version:                       1.3
    Number of Namespaces:               1
    Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
    Namespace 1 Formatted LBA Size:     512
    Local Time is:                      Fri Mar  8 12:09:53 2024 CET
    Firmware Updates (0x14):            2 Slots, no Reset required
    Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
    Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
    Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
    Maximum Data Transfer Size:         32 Pages
    Warning  Comp. Temp. Threshold:     77 Celsius
    Critical Comp. Temp. Threshold:     80 Celsius
    
    Supported Power States
    St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
     0 +     3.50W       -        -    0  0  0  0        0       0
     1 +     2.70W       -        -    1  1  1  1        0       0
     2 +     2.00W       -        -    2  2  2  2        0       0
     3 -   0.0250W       -        -    3  3  3  3     2000    5000
     4 -   0.0040W       -        -    4  4  4  4     5000    9000
    
    Supported LBA Sizes (NSID 0x1)
    Id Fmt  Data  Metadt  Rel_Perf
     0 +     512       0         0
    
    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    SMART/Health Information (NVMe Log 0x02)
    Critical Warning:                   0x00
    Temperature:                        30 Celsius
    Available Spare:                    100%
    Available Spare Threshold:          10%
    Percentage Used:                    32%
    Data Units Read:                    6,877,173 [3.52 TB]
    Data Units Written:                 9,397,485 [4.81 TB]
    Host Read Commands:                 54,359,124
    Host Write Commands:                239,213,047
    Controller Busy Time:               2,412
    Power Cycles:                       536
    Power On Hours:                     6,350
    Unsafe Shutdowns:                   62
    Media and Data Integrity Errors:    0
    Error Information Log Entries:      0
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    
    Error Information (NVMe Log 0x01, 16 of 256 entries)
    No Errors Logged
    
    Self-test Log (NVMe Log 0x06)
    Self-test status: No self-test in progress
    Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
     0   Extended          Completed without error                6334            -     -   -   -    -
     1   Short             Completed without error                6334            -     -   -   -    -
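
    Most of the report above is identification data. As a rough sketch, the handful of health fields that usually matter can be filtered out of the `smartctl -a` text. The function names `smart_summary` and `smart_looks_healthy` are hypothetical, and the check is deliberately simple: it only looks at `Critical Warning` and `Media and Data Integrity Errors`, and it assumes the text layout shown in this output.

    ```shell
    # Hypothetical helpers; both read "smartctl -a" text from stdin.

    # Print only the main health fields of the SMART/Health log.
    smart_summary() {
        grep -E '^ *(Critical Warning|Available Spare|Percentage Used|Media and Data Integrity Errors|Unsafe Shutdowns):'
    }

    # Exit 0 if no critical warning is set and no media errors are logged,
    # exit 1 otherwise (including when the fields are missing).
    smart_looks_healthy() {
        awk '
            /Critical Warning:/                { warn = $NF }
            /Media and Data Integrity Errors:/ { media = $NF }
            END { exit !(warn == "0x00" && media == "0") }
        '
    }

    # Example usage (not run here):
    #   sudo smartctl -a /dev/nvme0 | smart_summary
    #   sudo smartctl -a /dev/nvme0 | smart_looks_healthy && echo "no SMART alarms"
    ```

    For the report above this would pass: `Critical Warning` is `0x00` and `Media and Data Integrity Errors` is `0`. Corrected PCIe bus errors are reported by the kernel, not by the drive's SMART log, so a clean result here does not rule out a link or slot problem.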
    


  • vsis@feddit.cl (OP) to Linux@lemmy.ml: Is my NVME drive dying?
    8 months ago

    I ran a short and a long self-test. Both look good:

    $ sudo smartctl -l selftest /dev/nvme0
    smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.7.6-arch1-2] (local build)
    Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF SMART DATA SECTION ===
    Self-test Log (NVMe Log 0x06)
    Self-test status: No self-test in progress
    Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
     0   Extended          Completed without error                6334            -     -   -   -    -
     1   Short             Completed without error                6334            -     -   -   -    -
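
    For repeated checks, the self-test log can be evaluated by a script instead of by eye. The sketch below assumes the column layout shown in this log (test name in the second column); `selftests_all_passed` is a made-up name. The commands that start the tests are shown only as comments.

    ```shell
    # Start the tests (not run here; each returns immediately and the test
    # continues in the background on the drive):
    #   sudo smartctl -t short /dev/nvme0
    #   sudo smartctl -t long /dev/nvme0

    # Hypothetical helper: reads "smartctl -l selftest" text from stdin.
    # Exits 0 only if the log has at least one entry and every entry
    # reports "Completed without error".
    selftests_all_passed() {
        awk '
            $2 == "Short" || $2 == "Extended" {
                total++
                if ($0 ~ /Completed without error/) ok++
            }
            END { exit !(total > 0 && ok == total) }
        '
    }

    # Example usage (not run here):
    #   sudo smartctl -l selftest /dev/nvme0 | selftests_all_passed && echo "all self-tests passed"
    ```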
    

  • vsis@feddit.cl (OP) to Linux@lemmy.ml: Is my NVME drive dying?
    8 months ago

    […] by replacing the motherboard, by replacing the processor, by reseating the NVME drive in its slot, by verifying that your power supply is reliable…

    I will start with the cheapest option 😅

    I assume the power supply is reliable. Having a battery should make it more stable, I guess.