• How to salvage a degraded mdadm RAID1 with as little data loss as poss

    From Paul Leiber@3:633/10 to All on Saturday, June 20, 2026 18:10:02
    Subject: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Hi everybody,

    I am using an mdadm software RAID1 as a dedicated MariaDB database file system. The devices used for the RAID1 are two partitions of identical size which are LUKS encrypted. The devices are decrypted via entries in /etc/crypttab. The resulting RAID1 is called /dev/md0, formatted as XFS. (For completeness' sake: md0 is then passed through to a database VM which stores the database on the device, but that shouldn't play a role for my questions, IIUC.)

    Some time ago, I noticed that the database content changed after a reboot. Recent changes to the databases were seemingly lost. I couldn't pinpoint the cause for this, but attributed it to an unclean shutdown of the database prior to reboot of the database VM. Data loss in a database of course is not ideal, so I kept on looking. It seems that I have now identified the root cause for the data loss in the RAID1.

    I checked the RAID1:

    root@xxx:~# cat /proc/mdstat
    Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 dm-30[2]
          1073593280 blocks super 1.2 [2/1] [_U]
          bitmap: 7/8 pages [28KB], 65536KB chunk

    The [_U] seems to indicate that the RAID1 is currently degraded and that only one of the two partitions is in use.
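
    The reading of [_U] above can be checked mechanically. A minimal sketch, assuming the mdstat text from this post is pasted into a variable (on a live system one would read /proc/mdstat instead); the function name md_state is made up for illustration:

```shell
#!/bin/sh
# Sample text copied from the /proc/mdstat output in this post.
mdstat_sample='md0 : active raid1 dm-30[2]
      1073593280 blocks super 1.2 [2/1] [_U]'

# Print "degraded" if a member map such as [_U] contains an underscore,
# otherwise print "ok". (md_state is a hypothetical helper name.)
md_state() {
    if printf '%s\n' "$1" | grep -q '\[[U_]*_[U_]*]'; then
        echo degraded
    else
        echo ok
    fi
}

md_state "$mdstat_sample"
```

    An underscore in the bracketed member map marks a missing member, so for this sample the helper reports "degraded".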

    Checking the partitions the RAID1 is based on gives the following output:

    root@xxx:~# mdadm --examine /dev/dm-31
    /dev/dm-31:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x1
         Array UUID : 6834a10d:edb03a51:cef24158:f9abc812
               Name : xxx:0  (local to host xxx)
      Creation Time : Fri Nov  4 16:05:45 2022
         Raid Level : raid1
       Raid Devices : 2

     Avail Dev Size : 2147186655 sectors (1023.86 GiB 1099.36 GB)
         Array Size : 1073593280 KiB (1023.86 GiB 1099.36 GB)
      Used Dev Size : 2147186560 sectors (1023.86 GiB 1099.36 GB)
        Data Offset : 264192 sectors
       Super Offset : 8 sectors
       Unused Space : before=264112 sectors, after=95 sectors
              State : clean
        Device UUID : 01c96166:ee782cc7:57bcf889:2ee53b43

    Internal Bitmap : 8 sectors from superblock
        Update Time : Wed Jun 17 13:17:46 2026
      Bad Block Log : 512 entries available at offset 16 sectors
           Checksum : d46fa108 - correct
             Events : 5397997


       Device Role : Active device 0
       Array State : A. ('A' == active, '.' == missing, 'R' == replacing)


    and


    root@xxx:~# mdadm --examine /dev/dm-30
    /dev/dm-30:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x1
         Array UUID : 6834a10d:edb03a51:cef24158:f9abc812
               Name : xxx:0  (local to host xxx)
      Creation Time : Fri Nov  4 16:05:45 2022
         Raid Level : raid1
       Raid Devices : 2

     Avail Dev Size : 2147186655 sectors (1023.86 GiB 1099.36 GB)
         Array Size : 1073593280 KiB (1023.86 GiB 1099.36 GB)
      Used Dev Size : 2147186560 sectors (1023.86 GiB 1099.36 GB)
        Data Offset : 264192 sectors
       Super Offset : 8 sectors
       Unused Space : before=264112 sectors, after=95 sectors
              State : clean
        Device UUID : 637fc155:8fb21b7c:fff27b71:c7ea1094

    Internal Bitmap : 8 sectors from superblock
        Update Time : Fri Jun 19 21:34:19 2026
      Bad Block Log : 512 entries available at offset 16 sectors
           Checksum : b110fe9d - correct
             Events : 4814810


       Device Role : Active device 1
       Array State : .A ('A' == active, '.' == missing, 'R' == replacing)

    It can be seen that update time and number of events differ between both partitions, which seems to indicate different data. I am assuming that due to some circumstance (wild guess: a race condition when unlocking the LUKS encryption), the RAID1 is more or less randomly using only one of the partitions, which then results in differing database versions, depending on which of the two partitions is currently used.
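
    To make the comparison explicit, here is a small sketch that pulls the event counters out of the two --examine outputs. It assumes the relevant lines have been pasted into variables (the values are the ones shown above); events_of is a made-up helper name:

```shell
#!/bin/sh
# Lines copied from the two `mdadm --examine` outputs in this post.
examine_dm31='Update Time : Wed Jun 17 13:17:46 2026
     Events : 5397997'
examine_dm30='Update Time : Fri Jun 19 21:34:19 2026
     Events : 4814810'

# Print the number that follows "Events :" (hypothetical helper).
events_of() {
    printf '%s\n' "$1" | awk -F' : ' '/Events/ {print $2}'
}

e31=$(events_of "$examine_dm31")
e30=$(events_of "$examine_dm30")

echo "dm-31 events: $e31"
echo "dm-30 events: $e30"
if [ "$e31" -gt "$e30" ]; then
    echo "dm-31 has the higher event count (by $((e31 - e30)))"
else
    echo "dm-30 has the higher or an equal event count"
fi
```

    Note that in these outputs the member with the higher event count (dm-31) carries the older update time, so neither number alone settles which member holds the data worth keeping.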

    I also think that I found a possible cause for this misbehaviour. My /etc/mdadm/mdadm.conf contains just the default settings:

    # mdadm.conf
    #
    # !NB! Run update-initramfs -u after updating this file.
    # !NB! This will ensure that initramfs has an uptodate copy.
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #

    # by default (built-in), scan all partitions (/proc/partitions) and all
    # containers for MD superblocks. alternatively, specify devices to scan, using
    # wildcards if desired.
    #DEVICE partitions containers

    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>

    # instruct the monitoring daemon where to send mail alerts
    MAILADDR root

    # definitions of existing MD arrays

    # This configuration was auto-generated on Fri, 04 Nov 2022 15:52:55 +0100 by mkconf

    Somehow, I neglected to add the RAID1 information for md0 to the configuration file (e. g. by running root@localhost:~# mdadm --detail --scan /dev/md0 >> /etc/mdadm/mdadm.conf). I am not sure whether this actually is the cause and whether adding that information would solve the issue.
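
    For illustration, this is roughly what such an ARRAY entry would look like for this array. It is a sketch built from the UUID and name in the --examine output above; the exact fields printed by mdadm --detail --scan can differ between mdadm versions, so the line is an assumption, not captured output:

```shell
#!/bin/sh
# Values copied from the `mdadm --examine` output in this post.
array_uuid='6834a10d:edb03a51:cef24158:f9abc812'
array_name='xxx:0'

# Assemble an ARRAY line of the kind `mdadm --detail --scan` prints
# (field set assumed; check against the real command's output).
array_line="ARRAY /dev/md0 metadata=1.2 name=$array_name UUID=$array_uuid"
echo "$array_line"

# On the real host, as root, one would instead append the real scan
# output and refresh the initramfs, as the mdadm.conf comments ask:
#   mdadm --detail --scan /dev/md0 >> /etc/mdadm/mdadm.conf
#   update-initramfs -u
```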

    My questions are the following:

    1. Is my analysis valid in principle? Especially: Could the root cause for this issue be that mdadm.conf is missing the information for md0, and could adding that information prevent data loss or inconsistencies in the future?
    2. Can I (re)create the RAID1 md0 or (re-)add the missing partition in such a way that no, or at least not all, information is lost? If yes, how?

    I assume that it might not be possible to sync the data from two different database versions without data loss. If this assumption is correct, I am willing to use one data set (e. g. the one on dm-31) and discard the other data set (e. g. the one on dm-30). Guides I found so far describe how to set up a new RAID1 and copy the data from a partition to the new RAID1. However, I am wondering whether it is possible to (re-)create a RAID1 using just one existing partition (e. g. dm-31) without losing the data on this partition, and then add the other partition to the RAID1?
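
    One way this could be approached is sketched below as a dry run that only prints the commands. It is untested against this setup and rests on assumptions: the device names are placeholders, md0 is not in use by the VM at the time, and a block-level copy of both members has been taken first, because the last two steps destroy whatever is on the discarded member:

```shell
#!/bin/sh
# Dry-run sketch: every command is printed, nothing is executed.
KEEP=/dev/mapper/keep_member        # placeholder: member whose data is kept
DISCARD=/dev/mapper/discard_member  # placeholder: member to be overwritten

run() { echo "+ $*"; }

rebuild_from_kept() {
    # Stop the array, then start it degraded from the wanted member only.
    run mdadm --stop /dev/md0
    run mdadm --assemble --run /dev/md0 "$KEEP"
    # Wipe the md metadata on the other member and add it as a fresh
    # device; md then resyncs it from the kept member.
    run mdadm --zero-superblock "$DISCARD"
    run mdadm --manage /dev/md0 --add "$DISCARD"
}

rebuild_from_kept
```

    Checking the file system and the databases on the degraded array before the --zero-superblock step leaves the other copy available as a fallback.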

    The databases are backed up regularly. However, the backup is incremental, and it seems that the different database versions are messing up the incremental backup, therefore my last valid backup doesn't include the most recent changes to the database. If it is not possible to salvage the data on one or both of the partitions, I could swallow the bitter pill and go back to a previous database state without unacceptable consequences. However, I would like to try to salvage as much data as possible.

    Thank you in advance,

    Paul

    --- PyGate Linux v1.5.17
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Charles Curley@3:633/10 to All on Saturday, June 20, 2026 20:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Sat, 20 Jun 2026 18:01:15 +0200
    Paul Leiber <paul@onlineschubla.de> wrote:

    Somehow, I missed to include the RAID1 information for md0 to the configuration file (e. g. by entering root@localhost:~# mdadm
    --detail --scan /dev/md0 >> /etc/mdadm/mdadm.conf). I am not sure if
    this actually is the cause and adding that information would solve
    the issue.
    My questions are the following:

    1. Is my analysis valid in principle? Especially: Could the root
    cause for this issue be that mdadm.conf is missing the information
    for md0, and could adding that information prevent data loss or inconsistencies in the future?

    I doubt that this is the culprit. The man page for mdadm says, in part:

    Assemble
    Assemble the components of a previously created array
    into an active array. Components can be explicitly given or can
    be searched for. mdadm checks that the components do form a
    bona fide array, and can, on request, fiddle superblock
    information so as to assemble a faulty array.

    So mdadm *should* find both devices. But it might not. And adding
    that line will not hurt. I have a similar line in my mdadm.conf.

    I built my RAID array up a bit differently than you did yours. You made
    your partitions, put LUKS on the partitions, then the RAID on top of
    that. I have the partitions, then the RAID array, LUKS on top of that,
    then LVM, with file systems on top of the LVs. But I know of no reason
    your setup shouldn't work.

    I have found that when I have multiple LUKS partitions, giving them
    all the same passphrase means I need give only one passphrase to
    decrypt on boot.

    2. Can I (re)create the RAID1 md0 or (re-)add the missing partition
    in an easy way that no or at least not all information is lost? If
    yes, how?

    Yes. For the gory details see https://oneuptime.com/blog/post/2026-03-02-how-to-replace-a-failed-disk-in-mdadm-raid-on-ubuntu/view.

    In short,

    * Fail the offending disk. It looks like this has already happened, but
    it shouldn't hurt to do it again.

    * Remove the disk from the array.

    * Add the disk back in again. This should trigger rebuilding, which
    takes a while. During the rebuild, the data should be both readable
    and writable. You may monitor with:

    cat /proc/mdstat
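
    The three steps above map onto mdadm's manage mode roughly as follows. This is a dry-run sketch that only prints the commands; the member name is a placeholder. Keep in mind that the member added back is overwritten with the contents of the active member, so it should be the one whose data can be discarded:

```shell
#!/bin/sh
# Dry-run sketch: every command is printed, nothing is executed.
ARRAY=/dev/md0
MEMBER=/dev/mapper/example_member   # placeholder: member to cycle

run() { echo "+ $*"; }

replace_member() {
    run mdadm --manage "$ARRAY" --fail "$MEMBER"
    run mdadm --manage "$ARRAY" --remove "$MEMBER"
    run mdadm --manage "$ARRAY" --add "$MEMBER"
    # Watch the rebuild progress.
    run cat /proc/mdstat
}

replace_member
```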

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/

  • From David Christensen@3:633/10 to All on Saturday, June 20, 2026 21:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/20/26 09:01, Paul Leiber wrote:
    [...]

    I will interpret the above as MariaDB is storing data on files on an XFS
    file system on an mdadm RAID1 block device built from two LUKS
    containers on two partitions of the same size on two hard disk drives.


    AIUI, mdadm RAID protects you when the disk controller is unable to read
    a block on one disk. When that happens, mdadm will read other disk(s)
    in the array, compute the requested block, and return the requested information to the calling application. (I assume mdadm will also write
    the computed block back to the original disk, write a log entry, and
    take other actions as designed and configured.)


    AIUI neither XFS nor mdadm compute, store, or verify checksums of data
    or metadata on disk. So, if a bit, byte, block, etc., changes on disk unexpectedly, neither XFS nor mdadm will know; they will simply use the information on disk. Whatever is looking at that information (e.g.
    MariaDB or XFS) may or may not notice the corruption. The user may or
    may not notice the corruption.


    To protect against corruption in memory, you need error correction code memory. This requires hardware support on the motherboard and memory
    modules.


    To protect against data corruption on storage, you need a checksumming
    storage system. AIUI btrfs and ZFS are the obvious choices on Debian
    GNU/Linux. Unfortunately, ZFS is not supported OOTB due to licensing
    conflicts; you must install ZFS. If you choose to do so, it is wise to
    also install ZFS on your maintenance/rescue media.


    Over the past several years, I migrated my file storage services and CVS repository from a desktop computer with non-ECC memory, Debian
    GNU/Linux, SATA HDD's, LUKS, mdadm RAID1, and ext4 to an entry-level
    server with ECC memory, FreeBSD, SAS/SATA HBA's, new SAS HDD's, new SAS
    and SATA cables, GELI, and ZFS mirror. The cost was moderate, the
    learning curve was non-trivial, and I caused some non-critical data loss
    along the way, but now everything is accurate and reliable. I suggest
    that you migrate your MariaDB storage similarly.


    David

  • From debian-user@3:633/10 to All on Saturday, June 20, 2026 22:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    David Christensen <dpchrist@holgerdanske.com> wrote:
    AIUI neither XFS nor mdadm compute, store, or verify checksums of
    data or metadata on disk.

    XFS has checksummed metadata for many years now (12?), but it doesn't
    checksum user data.

  • From Robert Heller@3:633/10 to All on Saturday, June 20, 2026 22:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    At Sat, 20 Jun 2026 21:01:32 +0100 debian-user@howorth.org.uk wrote:


    David Christensen <dpchrist@holgerdanske.com> wrote:
    AIUI neither XFS nor mdadm compute, store, or verify checksums of
    data or metadata on disk.

    XFS checksums metadata for many years now (12?), but it doesn't checksum
    user data.

    I've always thought that the hardware controller checksums raw disk blocks (sectors) as part of the low-level I/O processing in the controller hardware's "firmware" and that this is how the controller knows it has a bad block.





    --
    Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
    Deepwoods Software -- Custom Software Services
    http://www.deepsoft.com/ -- Linux Administration Services
    heller@deepsoft.com -- Webhosting Services


  • From Paul Leiber@3:633/10 to All on Sunday, June 21, 2026 01:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 20.06.26 at 20:00, Charles Curley wrote:
    [...]

    I just noticed that I didn't manage to make clear that (1) I don't think that there is one specific failed partition, but that both partitions containing databases seem to work, but not at the same time, and that (2) I want to keep the data on the seemingly failed device.

    Let me explain with an example: I am using KODI to access my video data from different devices. A couple of months ago, I switched KODI to using a centralized database (containing metadata information on movies, watch status, etc.) in order to maintain only one database instead of a database on each device running KODI. The data is stored on the database VM, running MariaDB, which stores the data on the md software RAID1 (at least that's what was supposed to happen). I spent some time configuring the metadata, e.g. correcting mistakes in the movie titles etc. I then noticed that I had mistakenly selected English as the language for the movie descriptions. Because some family members are not fluent in English, I redid the metadata configuration in German. (It was annoying work, therefore I remember it well.) Then, some time later, after a reboot of the hypervisor (and the database VM) due to kernel updates, the movie descriptions were displayed in English again. I attributed this to a corrupt database after the database VM reboot and loaded a database backup from some time ago, where the movie descriptions were still in English. So I did the metadata configuration a third time, again in German. (I guess you can imagine the fun I had.) Then, a couple of days ago, after a reboot of the hypervisor and the database VM, the KODI movie descriptions were displayed in English again. That's when I really started digging, because now it was clear that there were actually two intact, but differing databases. (To be clear: There were some other changes to other databases that were affected in a similar manner which I don't mention in this example, so this issue is not restricted to the KODI database.)

    Based on the available data, I attribute this issue to the RAID1 which seems to select one of two partitions at random when booting the hypervisor. Indications:
    - The last update time in the description of the (seemingly) failed device given by mdadm --examine matches the point in time of the switch from one database version ("German") to the other ("English"), therefore I assume that the switch happens at the software RAID level.
    - A failure at hardware level doesn't seem likely, because how could there suddenly be an older version of a database available in a RAID1 if one device fails and the RAID1 is degraded, and this after entirely rebuilding the database from a backup? And, mind you, this switch to an older version of the database didn't happen just once, but at least two times. The data (in English) simply shouldn't have been available anymore at this point if the RAID1 had been working as intended.

    The most likely explanation to me is that the RAID1 has been running in a degraded state for some time (unnoticed by me), the database changes (e. g. from English to German) were stored to just one of the two partitions, and at some point the RAID1 switched to the other partition after a reboot, containing intact, but older (e. g. English) data. As defective hardware doesn't seem likely, I assume that something in my setup causes this behaviour by md. But of course, I might be wrong and I am open to other explanations. For example, what my assumption fails to explain is why the switch only happens from time to time, and not more often, e.g. after each reboot.

    The example you kindly give is for removing a seemingly failed partition (currently dm-30, "German" database) from an md RAID1, keeping the data on an intact partition (currently dm-31, "English" database) and then re-adding a partition to the RAID1. This is pretty straightforward: the data is kept and replicated from the valid partition to the freshly added one. However, in my case, the dataset I want to keep is on the seemingly failed partition not used in the RAID (currently dm-30, "German").

    Options I see (besides recreating the RAID1 from scratch and using an available backup to restore the data, losing some data):

    1. I could fail the seemingly intact partition or remove the RAID1 entirely, somehow use the seemingly failed partition (dm-30, "German") to create a new RAID without losing the data on it, then add the other partition (dm-31) as a new drive and have the data replicated. I am not sure if this is possible, therefore my question to this list.

    2. Another option is to reboot the hypervisor and hope for a switch of the RAID to the partition containing the more recent version of the database, then follow your guide. But I am not really confident that such a "strategy" is the best choice I have at the moment. Also, I just tried a reboot three times, each time the data in the database is the wrong, old one.

    3. I could also back up the database from the seemingly failed partition in order not to lose data and then use this backup to recreate the RAID1, but I would need to mount that partition, which ended in an error when I tried it.
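
    Regarding option 3: a raid member cannot be mounted directly, because the file system only starts after the md metadata, at the "Data Offset" shown by mdadm --examine (264192 sectors of 512 bytes here). A possible read-only way in is sketched below as a dry run that only prints the commands. It is untested on this setup; the member name, loop device and mount point are placeholders, and it assumes the XFS file system sits directly on md0 without a partition table:

```shell
#!/bin/sh
# Dry-run sketch: every command is printed, nothing is executed.
DATA_OFFSET_SECTORS=264192        # "Data Offset" from mdadm --examine
SECTOR_BYTES=512                  # md superblock offsets count 512-byte sectors
MEMBER=/dev/mapper/stale_member   # placeholder: the member not in the array
LOOPDEV=/dev/loopN                # placeholder: device printed by losetup
MNT=/mnt/inspect                  # placeholder mount point

offset_bytes=$((DATA_OFFSET_SECTORS * SECTOR_BYTES))
run() { echo "+ $*"; }

echo "file system data starts at byte $offset_bytes"
# Expose the data area of the member as a read-only loop device.
run losetup --find --show --read-only --offset "$offset_bytes" "$MEMBER"
# norecovery skips XFS log replay; nouuid allows mounting although a file
# system with the same UUID (the other half of the mirror) may be mounted.
run mount -o ro,norecovery,nouuid "$LOOPDEV" "$MNT"
```

    Because the log is not replayed, the mounted view may be slightly inconsistent; it is meant for copying data off, not for running the database from it.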

    And, of course, I don't want this to happen again, therefore I want to find the root cause for this situation and fix it. If it is not the missing information in /etc/mdadm/mdadm.conf, what else could it be?

    Sorry for the lengthy posts, I don't know how to describe this situation clearly in a shorter way.

  • From nwe@3:633/10 to All on Sunday, June 21, 2026 04:20:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/20/26 6:23 PM, Paul Leiber wrote:

    I just noticed that I didn't manage to make clear that (1) I don't
    think that there is one specific failed partition, but that both
    partitions containing databases seem to work, but not at the same
    time, and that (2) I want to keep the data on the seemingly failed device.

    In case this helps: I think what you are describing is a "split-brain"
    error: https://en.wikipedia.org/wiki/Split-brain_(computing)

    In short you have two versions, each with separate up-to-date data,
    which you will want to merge. Maybe there is someone here who knows a
    good way to do this. I currently have no experience working with such an error, but have read about it some.



  • From David Christensen@3:633/10 to All on Sunday, June 21, 2026 06:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/20/26 13:01, debian-user@howorth.org.uk wrote:
    David Christensen <dpchrist@holgerdanske.com> wrote:
    AIUI neither XFS nor mdadm compute, store, or verify checksums of
    data or metadata on disk.

    XFS checksums metadata for many years now (12?), but it doesn't
    checksum user data.


    Thank you for the clarification:

    https://wiki.archlinux.org/title/XFS#Checksumming


    On 6/20/26 13:26, Robert Heller wrote:
    I've always thought that the hardware controller checksums raw disk
    blocks (sectors) as part of the low-level I/O processing in the
    controller hardware's "firmware" and that this is how the controller
    knows it has a bad block.


    That is also my understanding. The HDD's I own are proprietary, so the engineering documentation is unavailable. But, I seem to recall reading
    an article on the WWW stating that HDD's store additional data in hidden blocks on the media to allow detecting and/or correcting several flipped
    bits.


    On 6/20/26 16:23, Paul Leiber wrote:
    I just noticed that I didn't manage to make clear that (1) I don't
    think that there is one specific failed partition, but that both
    partitions containing databases seem to work, but not at the same
    time, and that (2) I want to keep the data on the seemingly failed
    device.

    Let me explain with an example: I am using KODI to access my video
    data from different devices. A couple of months ago, I switched KODI
    to using a centralized database (containing metadata information on
    movies, watch status, etc.) in order to maintain only one database
    instead of a database on each device running KODI. The data is
    stored on the database VM, running MariaDB, which stores the data on
    the md software RAID1 (at least that's what was supposed to happen).
    I spent some time configuring the metadata, e.g. correcting mistakes
    in the movie titles etc. I then noticed that I mistakenly selected
    English language to display the movie descriptions. Because of
    family members not fluent in English, I redid the metadata
    configuration in German. (It was an annoying work, therefore I
    remember it well.) Then, some time later, after a reboot of the
    hypervisor (and the database VM) due to kernel updates, the
    language of the movie descriptions was displayed in English again. I
    attributed this to a corrupt database after the database VM reboot
    and loaded a database backup from some time ago, where the movie
    description was still in English. So I did the metadata
    configuration a third time, again in German. (I guess you can
    imagine the fun I had.) Then, a couple of days ago, after a reboot
    of the hypervisor and the database VM, the KODI movie description
    was displayed in English again. That's when I really started
    digging, because now it was clear that there were actually two
    intact, but differing databases. (To be clear: There were some other
    changes to other databases that also were affected in a similar
    manner which I don't mention in this example, so this issue is not
    restricted to the KODI database.)

    Based on the available data, I attribute this issue to the RAID1
    which seems to select one of two partitions at random when booting
    the hypervisor. Indications:
    - The last update time in the description of the (seemingly) failed
    device given by mdadm --examine matches the point in time of the
    switch from one database version ("German") to the other
    ("English"), therefore I assume that the switch happens at the
    software RAID level.
    - A failure at hardware level doesn't seem likely, because how could
    there suddenly be an older version of a database available in a
    RAID1 if one device fails and the RAID1 is degraded, and this after
    entirely rebuilding the database from a backup? And, mind you, this
    switch to an older version of the database didn't happen just once,
    but at least two times. The data (in English) simply shouldn't have
    been available anymore at this point if the RAID1 had been working
    as intended.

    The most likely explanation to me is that the RAID1 has been running
    in a degraded state for some time (unnoticed by me), the database
    changes (e. g. from English to German) were stored to just one of
    the two partitions, and at some point the RAID1 switched to the
    other partition after a reboot, containing intact, but older (e. g.
    English) data. As a defective hardware doesn't seem likely, I assume
    that something in my setup causes this behaviour by md. But of
    course, I might be wrong and I am open to other explanations. For
    example, what my assumption fails to explain is why the switch only
    happens from time to time, and not more often, e.g. after each
    reboot.

    The example you kindly give is for removing a seemingly failed
    partition (currently dm-30, "German" database) from a md RAID1,
    keeping the data on an intact partition (currently dm-31, "English"
    database) and then re-adding a partition to the RAID1. This is
    pretty straightforward: the data is kept and replicated from the
    valid partition to the freshly added one. However, in my case, the
    dataset I want to keep is on the seemingly failed partition not used
    in the RAID (currently dm-30, "German").

    Options I see (besides recreating the RAID1 from scratch and using
    an available backup to restore the data, losing some data):

    1. I could fail the seemingly intact partition or remove the RAID1
    entirely, somehow use the seemingly failed partition (dm-30,
    "German") to create a new RAID without losing the data on it, then
    add the other partition (dm-31) as a new drive and have the data
    replicated. I am not sure if this is possible, therefore my question
    to this list.

    2. Another option is to reboot the hypervisor and hope for a switch
    of the RAID to the partition containing the more recent version of
    the database, then follow your guide. But I am not really confident
    that such a "strategy" is the best choice I have at the moment.
    Also, I just tried a reboot three times, each time the data in the
    database is the wrong, old one.

    3. I could also backup the database from the seemingly failed
    partition in order to not lose data and then use this backup to
    recreate the RAID1, but I would need to mount that partition, which
    ended in an error when I tried it.

    And, of course, I don't want this to happen again, therefore I want
    to find the root cause for this situation and fix it. If it is not
    the missing information in /etc/mdadm/mdadm.conf, what else could it
    be?

    Sorry for the lengthy posts, I don't know how to describe this
    situation clearly in a shorter way.


    Do you own a power supply tester? If not, I suggest buying one. When
    you have one, test your power supply. It is possible for one rail to
    fail (e.g. +12 VDC), the computer to boot, and all or part of the
    computer to operate incorrectly. Without a power supply tester, you
    will be chasing seemingly random errors until the power supply goes to
    100% failure and/or you damage/ destroy other hardware.


    Does your computer have ECC memory? If not, I suggest getting a
    computer with ECC memory. In any case, I suggest testing your memory
    with memtest86+ for 24 hours.


    Have you tested your hard disks? If not, I suggest running smartctl(8)
    "--test long". When testing is done, view the results with "--xall".


    Have you validated the filesystem with fsck.xfs(8)? If not, I suggest
    doing so.


    Do you have streams of database transactions since the last known good
    backups? If so, can they be replayed?


    Can you switch the databases to read-only, shutdown, disconnect the
    first disk, boot, backup the database(s), shutdown, connect the first
    disk, disconnect the second disk, boot, backup the database(s),
    shutdown, and connect the second disk? If so, you could then restore
    those backups, and the last known good backups, (with different names,
    read-only) and trouble-shoot. It may or may not be possible to identify
    the newest data and implement queries/ scripts to do the three-way merge.


    David

  • From David Christensen@3:633/10 to All on Sunday, June 21, 2026 07:00:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/20/26 21:47, David Christensen wrote:
    Can you switch the databases to read-only, shutdown, disconnect the
    first disk, boot, backup the database(s), shutdown, connect the first
    disk, disconnect the second disk, boot, backup the database(s),
    shutdown, and connect the second disk? If so, you could then restore
    those backups, and the last known good backups, (with different names, read-only


    Correction -- add ", on a known good database server".


    ) and trouble-shoot. It may or may not be possible to identify
    the newest data and implement queries/ scripts to do the three-way merge.


    David

  • From tomas@3:633/10 to All on Sunday, June 21, 2026 07:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Sat, Jun 20, 2026 at 09:47:23PM -0700, David Christensen wrote:
    [...]
    That is also my understanding. The HDDs I own are proprietary, so the
    engineering documentation is unavailable. But, I seem to recall reading
    an article on the WWW stating that HDDs store additional data in hidden
    blocks on the media to allow detecting and/or correcting several flipped
    bits.

    Much more than that. They do accept some error rate which is corrected
    (it has long been an engineering tradeoff: more density -> higher error
    rate -> better error correction codes):

    "For example, a typical 1 TB hard disk with 512-byte sectors provides
    additional capacity of about 93 GB for the ECC data."

    From: https://en.wikipedia.org/wiki/Hard_disk#Error_rates_and_handling

    Wikipedia: better than "the WWW". Especially in these times.

    Cheers
    --
    t


  • From Charles Curley@3:633/10 to All on Sunday, June 21, 2026 15:10:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Sat, 20 Jun 2026 16:26:59 -0400 (EDT)
    Robert Heller <heller@deepsoft.com> wrote:

    I've always thought that the hardware controller checksums raw disk
    blocks (sectors) as part of the low-level I/O processing in the
    controller hardware's "firmware" and that this is how the controller
    knows it has a bad block.

    Correct.

    Each sector has the data, and enough checksum data that the controller
    can test the data for integrity, and correct some small errors (a few
    bytes, typically).

    When the controller detects an error it attempts to correct it. If it
    succeeds, it returns the data to the computer and re-writes the data to
    the platter. If that write fails, it marks the sector as bad, and
    selects a spare sector to replace it.

    If the attempt to correct the error fails, I believe the drive reports
    the error to the computer, and allocates a spare sector to replace the
    failed one. It is then up to the OS or even application software to
    generate a replacement sector or otherwise handle the problem.

    These days the controller runs surface tests in the background to
    detect and correct errors before they get too big to correct. In
    addition, you can run an extended self-test, which includes a surface
    test, with SMART software.

    There is a fixed supply of spare sectors to be allocated; when that is
    exhausted the OS starts marking bad blocks. These days that means it is
    time to replace the drive. This is why it is important to monitor
    drives for these (and other) failures with SMART software.
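
    As a hedged illustration of the monitoring described above: for ATA
    drives, smartctl -A prints an attribute table, and the raw value of
    Reallocated_Sector_Ct indicates how many spare sectors are already in
    use. The helper name smart_raw and the device /dev/sdX are assumptions
    made for this sketch; attribute names vary by vendor, and NVMe drives
    report their health data in a different format.

```shell
#!/bin/sh
# smart_raw ATTRIBUTE
# Reads `smartctl -A` style output on stdin and prints the raw value
# (10th column) of the named attribute. Prints nothing if the
# attribute is not present in the input.
# NOTE: smart_raw is a hypothetical helper, not part of smartmontools.
smart_raw() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Example call (needs root and a real device, so it is left commented out):
# smartctl -A /dev/sdX | smart_raw Reallocated_Sector_Ct
```

    A raw value that keeps growing between checks is the usual reason to
    plan a replacement before the spare pool runs out.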

    --
    Does anybody read signatures any more?

    https://charlescurley.com
    https://charlescurley.com/blog/

  • From Michel Verdier@3:633/10 to All on Sunday, June 21, 2026 16:20:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-21, Paul Leiber wrote:

    I just noticed that I didn't manage to make clear that (1) I don't think that there is one specific failed partition, but that both partitions containing databases seem to work, but not at the same time, and that (2) I want to keep the data on the seemingly failed device.
    [...]

    I have used RAID for decades and never had the problem you describe, so
    I believe it comes from something other than a RAID failure. RAID
    duplicates data on both partitions. If you don't have a failed partition,
    the duplication is done transparently and quickly (under a second). When a
    partition fails, you can add a new, clean partition, and the original
    one, the partition still in the RAID, is synchronized onto it. As others
    told you, you should use smartd to monitor your disks.

  • From Paul Leiber@3:633/10 to All on Sunday, June 21, 2026 21:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 21.06.26 um 06:47 schrieb David Christensen:
    On 6/20/26 13:01, debian-user@howorth.org.uk wrote:
    David Christensen <dpchrist@holgerdanske.com> wrote:
    AIUI neither XFS nor mdadm compute, store, or verify checksums of data or metadata on disk.

    XFS checksums metadata for many years now (12?), but it doesn't checksum user data.


    Thank you for the clarification:

    https://wiki.archlinux.org/title/XFS#Checksumming


    On 6/20/26 13:26, Robert Heller wrote:
    I've always thought that the hardware controller checksums raw disk blocks (sectors) as part of the low-level I/O processing in the controller hardware's "firmware" and that this is how the controller
    knows it has a bad block.


    That is also my understanding. The HDDs I own are proprietary, so the engineering documentation is unavailable. But, I seem to recall reading an article on the WWW stating that HDDs store additional data in hidden blocks on the media to allow detecting and/or correcting several flipped bits.


    On 6/20/26 16:23, Paul Leiber wrote:
    I just noticed that I didn't manage to make clear that (1) I don't think that there is one specific failed partition, but that both partitions containing databases seem to work, but not at the same time, and that (2) I want to keep the data on the seemingly failed device.

    Let me explain with an example: I am using KODI to access my video data from different devices. A couple of months ago, I switched KODI to using a centralized database (containing metadata information on movies, watch status, etc.) in order to maintain only one database instead of a database on each device running KODI. The data is stored on the database VM, running MariaDB, which stores the data on the md software RAID1 (at least that's what was supposed to happen). I spent some time configuring the metadata, e.g. correcting mistakes in the movie titles etc. I then noticed that I mistakenly selected English language to display the movie descriptions. Because of family members not fluent in English, I redid the metadata configuration in German. (It was an annoying work, therefore I remember it well.) Then, some time later, after a reboot of the hypervisor (and the database VM) due to kernel updates, the language of the movie descriptions was displayed in English again. I attributed this to a corrupt database after the database VM reboot and loaded a database backup from some time ago, where the movie description was still in English. So I did the metadata configuration a third time, again in German. (I guess you can imagine the fun I had.) Then, a couple of days ago, after a reboot of the hypervisor and the database VM, the KODI movie description was displayed in English again. That's when I really started digging, because now it was clear that there were actually two intact, but differing databases. (To be clear: There were some other changes to other databases that also were affected in a similar manner which I don't mention in this example, so this issue is not restricted to the KODI database.)

    Based on the available data, I attribute this issue to the RAID1 which seems to select one of two partitions at random when booting the hypervisor. Indications:
    - The last update time in the description of the (seemingly) failed device given by mdadm --examine matches the point in time of the switch from one database version ("German") to the other ("English"), therefore I assume that the switch happens at the software RAID level.
    - A failure at hardware level doesn't seem likely, because how could there suddenly be an older version of a database available in a RAID1 if one device fails and the RAID1 is degraded, and this after entirely rebuilding the database from a backup? And, mind you, this switch to an older version of the database didn't happen just once, but at least two times. The data (in English) simply shouldn't have been available anymore at this point if the RAID1 had been working as intended.

    The most likely explanation to me is that the RAID1 has been running in a degraded state for some time (unnoticed by me), the database changes (e. g. from English to German) were stored to just one of the two partitions, and at some point the RAID1 switched to the other partition after a reboot, containing intact, but older (e. g. English) data. As a defective hardware doesn't seem likely, I assume that something in my setup causes this behaviour by md. But of course, I might be wrong and I am open to other explanations. For example, what my assumption fails to explain is why the switch only happens from time to time, and not more often, e.g. after each reboot.

    The example you kindly give is for removing a seemingly failed partition (currently dm-30, "German" database) from a md RAID1, keeping the data on an intact partition (currently dm-31, "English" database) and then re-adding a partition to the RAID1. This is pretty straightforward: the data is kept and replicated from the valid partition to the freshly added one. However, in my case, the dataset I want to keep is on the seemingly failed partition not used in the RAID (currently dm-30, "German").

    Options I see (besides recreating the RAID1 from scratch and using an available backup to restore the data, losing some data):

    1. I could fail the seemingly intact partition or remove the RAID1 entirely, somehow use the seemingly failed partition (dm-30, "German") to create a new RAID without losing the data on it, then add the other partition (dm-31) as a new drive and have the data replicated. I am not sure if this is possible, therefore my question to this list.

    2. Another option is to reboot the hypervisor and hope for a switch of the RAID to the partition containing the more recent version of the database, then follow your guide. But I am not really confident that such a "strategy" is the best choice I have at the moment. Also, I just tried a reboot three times, each time the data in the database is the wrong, old one.

    3. I could also backup the database from the seemingly failed partition in order to not lose data and then use this backup to recreate the RAID1, but I would need to mount that partition, which ended in an error when I tried it.

    And, of course, I don't want this to happen again, therefore I want to find the root cause for this situation and fix it. If it is not the missing information in /etc/mdadm/mdadm.conf, what else could it be?

    Sorry for the lengthy posts, I don't know how to describe this situation clearly in a shorter way.


    Do you own a power supply tester? If not, I suggest buying one. When you have one, test your power supply. It is possible for one rail to fail (e.g. +12 VDC), the computer to boot, and all or part of the computer to operate incorrectly. Without a power supply tester, you will be chasing seemingly random errors until the power supply goes to 100% failure and/or you damage/ destroy other hardware.


    There are two identical hard drives (manufacturer, type, size) in my system used for data storage. Both have an identical layout: One large partition used as BTRFS RAID1 (duplicated data and metadata), one small partition which is used for the md software RAID1. BTRFS is working fine, it's just the md software RAID that is not working correctly. I can't rule out an issue with power supply, but it seems unlikely, considering that BTRFS is not having any issue. But I will put a test of the power supply on the list of things to try.

    Does your computer have ECC memory? If not, I suggest getting a computer with ECC memory. In any case, I suggest testing your memory with memtest86+ for 24 hours.

    Yes, it does have ECC memory. I will put the memory test on the list as well.

    Have you tested your hard disks? If not, I suggest running smartctl(8) "--test long". When testing is done, view the results with "--xall".

    Both disks are monitored via smartctl. Automated short and long tests are being done regularly. There are no indications for hardware failure in the smart data.

    Have you validated the filesystem with fsck.xfs(8)? If not, I suggest doing so.

    I just did a check of the file system (using xfs_repair -n), with no errors reported.

    Do you have streams of database transactions since the last known good backups? If so, can they be replayed?

    Not exactly knowing what such a stream is, I guess I don't have one. But I am not sure. Will check.

    Can you switch the databases to read-only, shutdown, disconnect the first disk, boot, backup the database(s), shutdown, connect the first disk, disconnect the second disk, boot, backup the database(s), shutdown, and connect the second disk? If so, you could then restore those backups, and the last known good backups, (with different names, read-only) and trouble-shoot. It may or may not be possible to identify the newest data and implement queries/ scripts to do the three-way merge.

    That's a good suggestion. I will need to check what booting with just one disk could do to the BTRFS filesystem, but this might be a way to force md to use the disk which is currently indicated as failed.

  • From Paul Leiber@3:633/10 to All on Sunday, June 21, 2026 22:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 21.06.26 um 16:12 schrieb Michel Verdier:
    On 2026-06-21, Paul Leiber wrote:

    I just noticed that I didn't manage to make clear that (1) I don't think that
    there is one specific failed partition, but that both partitions containing
    databases seem to work, but not at the same time, and that (2) I want to keep
    the data on the seemingly failed device.
    [...]

    I have used RAID for decades and never had the problem you describe, so
    I believe it comes from something other than a RAID failure. RAID
    duplicates data on both partitions. If you don't have a failed partition,
    the duplication is done transparently and quickly (under a second). When a
    partition fails, you can add a new, clean partition, and the original
    one, the partition still in the RAID, is synchronized onto it. As others
    told you, you should use smartd to monitor your disks.


    My knowledge in IT is limited. I just can describe what I can observe and make guesses. (The md RAID is part of a setup I do for fun at home.) I know that it sounds strange, but my best guess is that there are two differing databases stored on my hard drives. How else can the repeated switch between different data sets be explained?

    To comment on your suggestion to monitor hardware status of the disks: Both disks are monitored using smartd (short and long tests being conducted on a regular basis), smartctl doesn't indicate any issue with the drives.
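
    One hedged way to test the "two differing copies" guess at the md level:
    mdadm --examine prints an Events counter for each member, and at assembly
    time md treats a member whose counter is lower as stale. The sketch below
    compares two saved --examine dumps. The function names events_of and
    newer_member and the file names are hypothetical, chosen only for this
    example. A higher counter only shows which superblock was updated more
    often; it does not prove that this side holds the data you want to keep.

```shell
#!/bin/sh
# events_of
# Reads `mdadm --examine` output on stdin and prints the value of the
# "Events" field (first match only).
events_of() {
    awk -F' *: *' '/^ *Events *:/ { print $2; exit }'
}

# newer_member FILE_A FILE_B
# Prints the name of the dump with the higher event count, or "equal".
# FILE_A and FILE_B are assumed to be saved --examine dumps, e.g.:
#   mdadm --examine /dev/dm-30 > dm-30.txt
#   mdadm --examine /dev/dm-31 > dm-31.txt
newer_member() {
    a=$(events_of < "$1")
    b=$(events_of < "$2")
    if [ "$a" -gt "$b" ]; then
        echo "$1"
    elif [ "$b" -gt "$a" ]; then
        echo "$2"
    else
        echo "equal"
    fi
}

# Example call (file names are placeholders):
# newer_member dm-30.txt dm-31.txt
```

    Comparing the "Update Time" lines of the same two dumps by eye gives a
    second data point alongside the counters.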

  • From Paul Leiber@3:633/10 to All on Sunday, June 21, 2026 23:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 21.06.26 um 01:23 schrieb Paul Leiber:
    Am 20.06.26 um 20:00 schrieb Charles Curley:
    On Sat, 20 Jun 2026 18:01:15 +0200
    Paul Leiber <paul@onlineschubla.de> wrote:

    Somehow, I missed to include the RAID1 information for md0 to the
    configuration file (e. g. by entering root@localhost:~# mdadm
    --detail --scan /dev/md0 >> /etc/mdadm/mdadm.conf). I am not sure if
    this actually is the cause and adding that information would solve
    the issue.
    My questions are the following:

    1. Is my analysis valid in principle? Especially: Could the root
    cause for this issue be that mdadm.conf is missing the information
    for md0, and could adding that information prevent data loss or
    inconsistencies in the future?
    I doubt that this is the culprit. The man page for mdadm says, in part:

          Assemble
                 Assemble the components of a previously created array
                 into an active array. Components can be explicitly given or can
                 be searched for. mdadm checks that the components do form a
                 bona fide array, and can, on request, fiddle superblock
                 information so as to assemble a faulty array.

    So mdadm *should* find both devices. But it might not be. And adding
    that line will not hurt. I have a similar line in my mdadm.conf.

    I built my RAID array up a bit differently than you did yours. You made
    your partitions, put LUKS on the partitions, then the RAID on top of
    that. I have the partitions, then the RAID array, LUKS on top of that,
    then LVM, with file systems on top of the LVs. But I know of no reason
    your setup shouldn't work.

    I have found that when I have multiple LUKS partitions, giving them
    all the same passphrase means I need give only one passphrase to
    decrypt on boot.

    2. Can I (re)create the RAID1 md0 or (re-)add the missing partition
    in an easy way that no or at least not all information is lost? If
    yes, how?
    Yes. For the gory details see
    https://oneuptime.com/blog/post/2026-03-02-how-to-replace-a-failed-disk-in-mdadm-raid-on-ubuntu/view.

    In short,

    * Fail the offending disk. It looks like this has already happened, but
      it shouldn't hurt to do it again.

    * Remove the disk from the array.

    * Add the disk back in again. This should trigger rebuilding, which
      takes a while. During the rebuild, the data should be both readable
      and writable. You may monitor with:

      cat /proc/mdstat

    I just noticed that I didn't manage to make clear that (1) I don't think that there is one specific failed partition, but that both partitions containing databases seem to work, but not at the same time, and that (2) I want to keep the data on the seemingly failed device.

    Let me explain with an example: I am using KODI to access my video data from different devices. A couple of months ago, I switched KODI to using a centralized database (containing metadata information on movies, watch status, etc.) in order to maintain only one database instead of a database on each device running KODI. The data is stored on the database VM, running MariaDB, which stores the data on the md software RAID1 (at least that's what was supposed to happen). I spent some time configuring the metadata, e.g. correcting mistakes in the movie titles etc. I then noticed that I mistakenly selected English language to display the movie descriptions. Because of family members not fluent in English, I redid the metadata configuration in German. (It was an annoying work, therefore I remember it well.) Then, some time later, after a reboot of the hypervisor (and the database VM) due to kernel updates, the language of the movie descriptions was displayed in English again. I attributed this to a corrupt database after the database VM reboot and loaded a database backup from some time ago, where the movie description was still in English. So I did the metadata configuration a third time, again in German. (I guess you can imagine the fun I had.) Then, a couple of days ago, after a reboot of the hypervisor and the database VM, the KODI movie description was displayed in English again. That's when I really started digging, because now it was clear that there were actually two intact, but differing databases. (To be clear: There were some other changes to other databases that also were affected in a similar manner which I don't mention in this example, so this issue is not restricted to the KODI database.)

    Based on the available data, I attribute this issue to the RAID1 which seems to select one of two partitions at random when booting the hypervisor. Indications:
    - The last update time in the description of the (seemingly) failed device given by mdadm --examine matches the point in time of the switch from one database version ("German") to the other ("English"), therefore I assume that the switch happens at the software RAID level.
    - A failure at hardware level doesn't seem likely, because how could there suddenly be an older version of a database available in a RAID1 if one device fails and the RAID1 is degraded, and this after entirely rebuilding the database from a backup? And, mind you, this switch to an older version of the database didn't happen just once, but at least two times. The data (in English) simply shouldn't have been available anymore at this point if the RAID1 had been working as intended.

    The most likely explanation to me is that the RAID1 has been running in a degraded state for some time (unnoticed by me), the database changes (e. g. from English to German) were stored to just one of the two partitions, and at some point the RAID1 switched to the other partition after a reboot, containing intact, but older (e. g. English) data. As a defective hardware doesn't seem likely, I assume that something in my setup causes this behaviour by md. But of course, I might be wrong and I am open to other explanations. For example, what my assumption fails to explain is why the switch only happens from time to time, and not more often, e.g. after each reboot.

    The example you kindly give is for removing a seemingly failed partition (currently dm-30, "German" database) from a md RAID1, keeping the data on an intact partition (currently dm-31, "English" database) and then re-adding a partition to the RAID1. This is pretty straightforward: the data is kept and replicated from the valid partition to the freshly added one. However, in my case, the dataset I want to keep is on the seemingly failed partition not used in the RAID (currently dm-30, "German").

    Options I see (besides recreating the RAID1 from scratch and using an available backup to restore the data, losing some data):

    1. I could fail the seemingly intact partition or remove the RAID1 entirely, somehow use the seemingly failed partition (dm-30, "German") to create a new RAID without losing the data on it, then add the other partition (dm-31) as a new drive and have the data replicated. I am not sure if this is possible, therefore my question to this list.

    2. Another option is to reboot the hypervisor and hope for a switch of the RAID to the partition containing the more recent version of the database, then follow your guide. But I am not really confident that such a "strategy" is the best choice I have at the moment. Also, I just tried a reboot three times, each time the data in the database is the wrong, old one.

    3. I could also backup the database from the seemingly failed partition in order to not lose data and then use this backup to recreate the RAID1, but I would need to mount that partition, which ended in an error when I tried it.

    And, of course, I don't want this to happen again, therefore I want to find the root cause for this situation and fix it. If it is not the missing information in /etc/mdadm/mdadm.conf, what else could it be?

    Sorry for the lengthy posts, I don't know how to describe this situation clearly in a shorter way.






    I managed to rebuild the md RAID1 using the data on the seemingly failed device (partition_1). First, I did a dd dump of partition_2 (currently in use) in order not to lose data. Then, I recreated the RAID1 using the data from partition_1:

    mdadm --stop /dev/md0 # This stops the degraded RAID1
    mdadm --assemble --update=uuid /dev/md0 /dev/mapper/partition_1 # This creates a new RAID1 using partition_1; a new array UUID is required in order for --assemble to work
    mdadm --manage --add /dev/md0 /dev/mapper/partition_2 # This adds partition_2 to the RAID1; the contents of partition_1 are replicated to partition_2 automatically

    md is currently replicating the data (German movie descriptions in Kodi, yay!) from partition_1 to partition_2. I might have to turn partition_2 from "spare" to "active", but I'll let the replication complete first.

    In any case, I set up mdmonitor to alert me if the RAID1 degrades again. That's something I should have thought of earlier.
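
    Alongside mdmonitor, a degraded array can also be spotted directly in
    /proc/mdstat, where an underscore in the member status field (as in the
    [_U] shown earlier in this thread) marks a missing member. A minimal
    sketch, assuming the usual mdstat layout; the function name
    degraded_arrays is hypothetical and not part of mdadm:

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Prints the name of every md array whose member status field
# (e.g. [_U]) contains "_", meaning at least one member is missing.
# Reads FILE in /proc/mdstat format; defaults to /proc/mdstat itself.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }            # start of an array entry
        name != "" && /\[[U_]+\]/ {           # its status line, e.g. [2/1] [_U]
            match($0, /\[[U_]+\]/)
            if (index(substr($0, RSTART, RLENGTH), "_"))
                print name
            name = ""                         # done with this array
        }
    ' "${1:-/proc/mdstat}"
}

# Example: exit non-zero from a cron job if anything is degraded.
# [ -z "$(degraded_arrays)" ] || { echo "degraded: $(degraded_arrays)"; exit 1; }
```

    This only reports the current state; it does not replace the mail
    alerts that mdmonitor sends when an array changes state.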

    We'll see if this issue occurs again. I'll give an update if this is the case.

    Thanks to everybody for trying to help me!

    Paul

    --- PyGate Linux v1.5.17
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Michel Verdier@3:633/10 to All on Monday, June 22, 2026 07:40:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-21, Paul Leiber wrote:

    My knowledge in IT is limited. I just can describe what I can observe and make
    guesses. (The md RAID is part of a setup I do for fun at home.) I know that it
    sounds strange, but my best guess is that there are two differing databases stored on my hard drives. How else can the repeated switch between different data sets be explained?

    I think you should investigate your database installation. The problem
    you described cannot come from RAID1. It could be that the database is
    installed on a "partition" and not on the "md array". Or a backup was
    reverted. Or something else.

  • From Michel Verdier@3:633/10 to All on Monday, June 22, 2026 08:00:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-21, Paul Leiber wrote:

    I managed to rebuild the md RAID1 using the data on the seemingly failed device (partition_1). First, I did a dd dump of partition_2 (currently in usage) in order not to lose data. Then, I recreated the RAID1 using the data from partition_1:

    mdadm --stop /dev/md0 # This stops the degraded RAID1
    mdadm --assemble --update=uuid /dev/md0 /dev/mapper/partition_1 # This creates a new RAID1 using the partition_1, a new array UUID is required in order for --assembly to work
    mdadm --manage --add /dev/md0 /dev/mapper/partition_2 # This adds partition_2 to the RAID1, contents of partition_1 are replicated to partition_2 automatically

    Your commands are strange: the partitions should be the disk partitions
    from /dev and not mapped ones. Or do you have another layer? Where do
    /dev/mapper/partition_1 and /dev/mapper/partition_2 come from?

    Besides this, it is much quicker and safer to go this way:
    - do not stop the md array (and thus the assemble is not needed)
    - remove the failed partition
    mdadm --manage /dev/md0 --remove "failed partition"
    - add the new clean partition
    mdadm --manage /dev/md0 --add "good partition"
    - and let mdadm sync the array
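
    The steps above can be sketched as a dry run. This is only an illustration under assumptions: the function name and device paths are placeholders, the old member is assumed to still be listed in the array, and the commands are printed rather than executed. An extra --fail step is included, since an active member cannot be removed directly. Note that md rebuilds a newly added member from the member that stayed active, so whatever is on the added device gets overwritten.

```shell
# Dry run: print the mdadm calls instead of executing them.
# Set MDADM=mdadm only after checking the printed commands.
MDADM="echo mdadm"

# replace_member ARRAY OLD NEW (hypothetical helper)
replace_member() {
    $MDADM --manage "$1" --fail "$2"     # mark the old member as faulty
    $MDADM --manage "$1" --remove "$2"   # detach it from the array
    $MDADM --manage "$1" --add "$3"      # add the replacement; md then resyncs onto it
}

replace_member /dev/md0 /dev/mapper/old_member /dev/mapper/new_member
```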

  • From tomas@3:633/10 to All on Monday, June 22, 2026 08:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Mon, Jun 22, 2026 at 07:50:10AM +0200, Michel Verdier wrote:
    On 2026-06-21, Paul Leiber wrote:

    I managed to rebuild the md RAID1 using the data on the seemingly failed device (partition_1). First, I did a dd dump of partition_2 (currently in usage) in order not to lose data. Then, I recreated the RAID1 using the data
    from partition_1:

    mdadm --stop /dev/md0 # This stops the degraded RAID1
    mdadm --assemble --update=uuid /dev/md0 /dev/mapper/partition_1 # This creates a new RAID1 using the partition_1, a new array UUID is required in order for --assembly to work
    mdadm --manage --add /dev/md0 /dev/mapper/partition_2 # This adds partition_2 to the RAID1, contents of partition_1 are replicated to partition_2 automatically

    Your commands are strange : the partitions should be the disk partitions
    from /dev and not mapped ones. Or you have another layer ? From where
    come /dev/mapper/partition_1 and /dev/mapper/partition_2 ?

    If I have been following along, the RAID parts are LUKS encrypted devices,
    so to me it does make sense.

    Cheers
    --
    t


  • From Michel Verdier@3:633/10 to All on Monday, June 22, 2026 08:20:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-22, tomas@tuxteam.de wrote:

    Your commands are strange : the partitions should be the disk partitions
    from /dev and not mapped ones. Or you have another layer ? From where
    come /dev/mapper/partition_1 and /dev/mapper/partition_2 ?

    If I have been following along, the RAID parts are LUKS encrypted devices,
    so to me it does make sense.

    RAID is always before LUKS: partition > RAID array > LUKS > filesystem

  • From tomas@3:633/10 to All on Monday, June 22, 2026 09:40:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Mon, Jun 22, 2026 at 08:18:45AM +0200, Michel Verdier wrote:
    On 2026-06-22, tomas@tuxteam.de wrote:

    Your commands are strange : the partitions should be the disk partitions
    from /dev and not mapped ones. Or you have another layer ? From where
    come /dev/mapper/partition_1 and /dev/mapper/partition_2 ?

    If I have been following along, the RAID parts are LUKS encrypted devices, so to me it does make sense.

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem

    Does "is always" mean "should always be" or "has to be" to you?

    As far as I understand OP, their case is the other way around (and I don't
    see why it shouldn't be technically possible: a block device is a block
    device is a block device, after all).

    Cheers
    --
    t


  • From Paul Leiber@3:633/10 to All on Monday, June 22, 2026 09:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 22.06.26 um 07:50 schrieb Michel Verdier:
    On 2026-06-21, Paul Leiber wrote:

    I managed to rebuild the md RAID1 using the data on the seemingly failed
    device (partition_1). First, I did a dd dump of partition_2 (currently in
    usage) in order not to lose data. Then, I recreated the RAID1 using the data
    from partition_1:

    mdadm --stop /dev/md0 # This stops the degraded RAID1
    mdadm --assemble --update=uuid /dev/md0 /dev/mapper/partition_1 # This creates a new RAID1 using the partition_1, a new array UUID is required in order for --assembly to work
    mdadm --manage --add /dev/md0 /dev/mapper/partition_2 # This adds partition_2 to the RAID1, contents of partition_1 are replicated to partition_2 automatically

    Your commands are strange : the partitions should be the disk partitions
    from /dev and not mapped ones. Or you have another layer ? From where
    come /dev/mapper/partition_1 and /dev/mapper/partition_2 ?

    The partitions are LUKS encrypted and are therefore decrypted before being assembled into the RAID array. Hence the mapped devices.

    Beside this it is much quicker and safer to go this way :
    - do not stop the md array (and thus the assemble is not needed)
    - remove the failed partition
    mdadm --manage /dev/md0 --remove "failed partition"
    - add the new clean partition
    mdadm --manage /dev/md0 --add "good partition"
    - and let mdadm sync the array


    There was a partition which md claimed was failed (partition_1) which contained the newer database. There was a partition the degraded array was using (partition_2) which contained the older database. I wanted to keep the newer database on the seemingly failed device.

    I was not sure if

    1) it is possible to remove all devices from an md array (removing partition_2 would have resulted in an array without any partition)
    2) it is possible to add a device with data I want to keep (partition_1) to an array without losing that data (adding partition_2 to the freshly created md array resulted in the data on partition_2 being overwritten, as intended, but I wanted to avoid partition_1 being overwritten with data from partition_2)

    Hence I decided to reassemble the array starting with partition_1 and adding partition_2.

    Are you positive that the procedure you recommend would have ended in the same result?

  • From Paul Leiber@3:633/10 to All on Monday, June 22, 2026 10:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 22.06.26 um 09:35 schrieb tomas@tuxteam.de:
    On Mon, Jun 22, 2026 at 08:18:45AM +0200, Michel Verdier wrote:
    On 2026-06-22, tomas@tuxteam.de wrote:

    Your commands are strange : the partitions should be the disk partitions
    from /dev and not mapped ones. Or you have another layer ? From where
    come /dev/mapper/partition_1 and /dev/mapper/partition_2 ?

    If I have been following along, the RAID parts are LUKS encrypted devices,
    so to me it does make sense.

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem

    "Is always" means for you "should always be" or "has to be"?

    As far as I understand OP, their case is the other way around (and I don't see why it shouldn't be technically possible: a block device is a block device is a block device, after all).

    Tomas' description of my setup is correct, LUKS before RAID. It has been working in the past, and it is working right now again. Is this type of setup recommended? I don't know. BTRFS doesn't show any issues with this setup.

    However, my main suspect for the root cause of the dual-head database is indeed that the LUKS decryption interferes with the md RAID assembly at boot, e.g. through some timing issue or race condition. The database content doesn't change constantly, there are very few writes per day, so I'll rely on daily backups and monitor the RAID closely. There was another kernel update today, so I'll see what happens after a reboot, which is probably what triggered the issue in the past. If another issue occurs, I'll probably have a chance to find more information in the logs now that I know what to look for. If my assumption is confirmed, I'll change the order to RAID before LUKS and restore the data from backup. (Or I'll do it anyway for lack of other ideas...)

  • From David Christensen@3:633/10 to All on Monday, June 22, 2026 10:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/21/26 12:46, Paul Leiber wrote:
    Am 21.06.26 um 06:47 schrieb David Christensen:
    Do you own a power supply tester?

    ... I will put a test of
    the power supply on the list of things to try.


    Good.


    Does your computer have ECC memory?

    Yes, it does have ECC memory.


    Good.


    I will put the memory test on the list as well.


    Choose your testing tool and methodology carefully -- memtest86
    (commercial) vs. memtest86+ (FOSS), memory correction report logging to
    motherboard firmware vs. operating system, etc. See this thread and
    research carefully:

    https://lists.debian.org/debian-user/2026/05/msg00386.html


    Have you tested your hard disks?

    Both disks are monitored via smartctl. Automated short and long tests
    are being done regularly. There are no indications for hardware failure
    in the smart data.


    Good.


    Have you validated the filesystem with fsck.xfs(8)?

    I just did a check of the file system (using xfs_repair -n), with no
    errors reported.


    Good.



    Do you have streams of database transactions since the last known good
    backups? If so, can they be replayed?

    Not exactly knowing what such a stream is, I guess I don't have one. But
    I am not sure. Will check.


    Good.


    Can you switch the databases to read-only, shutdown, disconnect the
    first disk, boot, backup the database(s), shutdown, connect the first
    disk, disconnect the second disk, boot, backup the database(s),
    shutdown, and connect the second disk?

    That's a good suggestion. I will need to check what booting with just
    one disk could do to the BTRFS filesystem, but this might be a way to
    force md to use the disk which is currently indicated as failed.


    Is your OS on the btrfs mirror? I have found that putting the OS on a dedicated SSD makes operations, maintenance, troubleshooting, disaster preparedness/recovery, etc., much easier.


    On 6/21/26 14:45, Paul Leiber wrote:
    I managed to rebuild the md RAID1 using the data on the seemingly
    failed device (partition_1). First, I did a dd dump of partition_2 (currently in usage) in order not to lose data. Then, I recreated the
    RAID1 using the data from partition_1:

    mdadm --stop /dev/md0 # This stops the degraded RAID1
    mdadm --assemble --update=uuid /dev/md0 /dev/mapper/partition_1 # This creates a new RAID1 using the partition_1, a new array UUID is required in order for --assembly to work
    mdadm --manage --add /dev/md0 /dev/mapper/partition_2 # This adds partition_2 to the RAID1, contents of partition_1 are replicated to partition_2 automatically

    md is currently replicating the data (German movie descriptions in
    Kodi, yay!) from partition_1 to partition_2. I might have to turn partition_2 from "spare" to "active", but I'll let the replication
    complete first.

    In any case, I set up mdmonitor to alert me if the RAID1 degrades
    again. That's something I should have thought of earlier.

    We'll see if this issue occurs again. I'll give an update if this is
    the case.

    Thanks to everybody for trying to help me!

    Paul


    Thank you for the curious problem. We all learn when we work together
    on a solution. Please let us know how it works out.


    David

  • From David Christensen@3:633/10 to All on Monday, June 22, 2026 10:50:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/21/26 23:18, Michel Verdier wrote:
    RAID is always before LUKS : partition > RAID array > LUKS >
    filesystem


    On 6/22/26 00:35, tomas@tuxteam.de wrote:
    "Is always" means for you "should always be" or "has to be"?

    As far as I understand OP, their case is the other way around (and I
    don't see why it shouldn't be technically possible: a block device
    is a block device is a block device, after all).


    On 6/22/26 01:04, Paul Leiber wrote:
    Tomas' description of my setup is correct, LUKS before RAID. It has
    been working in the past, and it is working right now again. Is this
    type of setup recommended? I don't know. BTRFS doesn't show any
    issues with this setup.


    Stackable I/O layers are a feature of Linux and other operating systems.
    Depending upon which layers you want, there may be more than one way to
    stack them.


    An advantage of:

    partitions > md RAID > LUKS > filesystem

    Versus:

    partitions > LUKS > md RAID > filesystem

    Is that the former only has to do the encryption once for the RAID
    virtual device, while the latter has to do encryption N times; once for
    each partition.
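
    The two orders can be written out as command sequences. A dry-run sketch under assumptions: function names, partition paths and mapping names are placeholders, and the commands are only printed, because running them would destroy existing data.

```shell
# Print a command instead of executing it.
run() { echo "+ $*"; }

# partitions > md RAID > LUKS > filesystem: one LUKS container on top of the array
stack_raid_then_luks() {
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 "$1" "$2"
    run cryptsetup luksFormat /dev/md0
    run cryptsetup open /dev/md0 md0_crypt
    run mkfs.xfs /dev/mapper/md0_crypt
}

# partitions > LUKS > md RAID > filesystem: one LUKS container per partition
stack_luks_then_raid() {
    n=0
    for part in "$1" "$2"; do
        n=$((n + 1))
        run cryptsetup luksFormat "$part"
        run cryptsetup open "$part" "partition_$n"
    done
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/partition_1 /dev/mapper/partition_2
    run mkfs.xfs /dev/md0
}

stack_raid_then_luks /dev/sdX1 /dev/sdY1
stack_luks_then_raid /dev/sdX1 /dev/sdY1
```

    The first sequence formats and opens a single LUKS container; the second does so once per partition.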


    When you have a layer that combines RAID, volume management, and
    filesystems, such as ZFS and btrfs, the stackable encryption layer must
    be underneath (i.e. the latter of the above two I/O layering configurations).


    For N=2, magnetic hard disk drives, and a 2+ core processor with
    hardware cryptographic acceleration (e.g. Intel AES-NI), your current
    I/O layering configuration should be okay.


    David

  • From Andy Smith@3:633/10 to All on Monday, June 22, 2026 15:40:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Hi,

    The lack of an mdadm.conf should not cause you any issues. It's only
    really used to set non-default options, give a monitoring email address
    and so on.

    udev incrementally assembles MDADM arrays as devices appear. It does not
    need any configuration to do this. In order to end up in the situation
    OP is in, I can only imagine that they rebooted and only one of the
    LUKS devices was set up, so md0 proceeded in a degraded fashion with
    that device.

    What is confusing to me is how OP had an active mdadm array member with
    an event count significantly *behind* the inactive one. It makes me
    think that this may have happened more than once, with different single
    LUKS devices being activated each time.
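
    The event counters can be compared directly. A minimal sketch, assuming the "Events : N" line that mdadm --examine prints for version 1.2 superblocks; the helper names are hypothetical. A higher count only shows which member received metadata updates more recently; after several alternating degraded boots, neither copy needs to be a superset of the other.

```shell
# Extract the event counter from "mdadm --examine" output read on stdin.
events_of() {
    awk '$1 == "Events" && $2 == ":" { print $3; exit }'
}

# newest_member NAME1 EVENTS1 NAME2 EVENTS2: print the name with the higher counter.
newest_member() {
    if [ "$2" -ge "$4" ]; then echo "$1"; else echo "$3"; fi
}

# Usage on a live system (device names as used in this thread):
#   e1=$(mdadm --examine /dev/mapper/partition_1 | events_of)
#   e2=$(mdadm --examine /dev/mapper/partition_2 | events_of)
#   newest_member partition_1 "$e1" partition_2 "$e2"
```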

    The mdadm monitor daemon runs by default and should email you about
    degraded arrays. Without any configuration that would be sending to root@localhost. OP should make sure that these emails will arrive
    somewhere useful, or look into other ways of checking status of mdadm
    arrays. What's happened here was likely trivial to fix at the
    time of first problem but became a complete nightmare that likely
    involved data loss (OP has backup of a device with unique data that
    cannot be integrated).
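
    Besides e-mail, the state can be polled from /proc/mdstat. A minimal sketch, assuming the "[2/1] [_U]" status notation shown earlier in this thread and array names of the form mdN; degraded_arrays is a made-up helper name:

```shell
# Print the name of every array whose member map contains "_" (a missing member).
# Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }
        /blocks/ && match($0, /\[[U_]+\]/) {
            if (substr($0, RSTART, RLENGTH) ~ /_/) print dev
        }
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
# Mail delivery of the monitor can be checked with a test alert:
#   mdadm --monitor --scan --oneshot --test
# The recipient is set with a MAILADDR line in mdadm.conf.
```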

    OP, after sorting out the monitoring I think you need to verify that
    both LUKS devices are always successfully unlocked and available at boot
    so that the RAID 1 assembles fully and properly.

    I think it's unlikely that you have had a hardware failure of the
    underlying drives, though you should of course check your logs and
    smartctl for that. Given that LUKS is in use and is the most complicated
    thing in your storage stack, I'd be looking into whether both LUKS
    devices are being reliably created.
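
    One way to check this is to test for both mapped devices before relying on the array. A minimal sketch; the helper name is hypothetical and the device paths are the ones used in this thread:

```shell
# Print every argument that is not an existing block device.
missing_devices() {
    for dev in "$@"; do
        [ -b "$dev" ] || echo "$dev"
    done
}

# Usage on a live system, e.g. from a boot-time check:
#   missing=$(missing_devices /dev/mapper/partition_1 /dev/mapper/partition_2)
#   [ -z "$missing" ] || echo "LUKS mapping not available: $missing" >&2
```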

    If setting this system up from scratch my preference would be to do the redundancy as near to the hardware as possible and the encryption as far
    away as possible. So I'd put LUKS on md0, not md0 on two LUKS devices.

    Thanks,
    Andy

    --
    https://bitfolk.com/ -- No-nonsense VPS hosting

  • From Paul Leiber@3:633/10 to All on Monday, June 22, 2026 18:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 22.06.26 um 15:30 schrieb Andy Smith:
    Hi,

    The lack of an mdadm.conf should not cause you any issues. It's only
    really used to set non-default options, give a monitoring email address
    and so on.

    O.k. I added an entry to mdadm.conf anyway, it shouldn't hurt at least.

    udev incrementally assembles MDADM arrays as devices appear. It does not
    need any configuration to do this. In order to end up in the situation
    OP is in, I can only imagine that they rebooted and only one of the
    LUKS devices was set up, so md0 proceeded in a degraded fashion with
    that device.

    What is confusing to me is how OP had an active mdadm array member with
    an event count significantly *behind* the inactive one. It makes me
    think that this may have happened more than once, with different single
    LUKS devices being activated each time.

    The switch between active and inactive devices definitely happened more than once. The md array most likely switched between the LUKS devices at boot several times, hence the different event counts. The device with the newer data of course had the higher event count, as it was the one the data had been written to in the weeks before the latest switch. My best guess is also that something happened while the LUKS devices were being created which made md believe that one device was not intact or available.

    The mdadm monitor daemon runs by default and should email you about
    degraded arrays. Without any configuration that would be sending to root@localhost. OP should make sure that these emails will arrive
    somewhere useful, or look into other ways of checking status of mdadm
    arrays. What's happened here was likely trivial to fix at the
    time of first problem but became a complete nightmare that likely
    involved data loss (OP has backup of a device with unique data that
    cannot be integrated).

    Oh well, experience is never too expensive, a German saying goes... This issue will not bug me again as much as it did. The next time I curse, it will be due to a different issue, I am sure. :-)

    OP, after sorting out the monitoring I think you need to verify that
    both LUKS devices are always successfully unlocked and available at boot
    so that the RAID 1 assembles fully and properly.

    mdmonitor is now set up, and I tested that e-mails actually arrive. Latest news: the md array survived a first reboot today.

    I think it's unlikely that you have had a hardware failure of the
    underlying drives, though you should of course check your logs and
    smartctl for that. Given that LUKS is in use and is the most complicated thing in your storage stack, I'd be looking into whether both LUKS
    devices are being reliably created.

    O.k.

    If setting this system up from scratch my preference would be to do the redundancy as near to the hardware as possible and the encryption as far
    away as possible. So I'd put LUKS on md0, not md0 on two LUKS devices.

    Thank you very much for your advice!

    Paul

  • From Max Nikulin@3:633/10 to All on Monday, June 22, 2026 18:40:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 22/06/2026 3:04 pm, Paul Leiber wrote:
    my main suspect for the root cause of the dual-head database

    I have read about a rollback to an earlier filesystem state when a device
    supporting snapshots (LVM or filesystem) was mounted using the FS UUID
    instead of the volume identifier. Snapshots have the same UUID as the real
    device, so it is undefined which one is found first on boot. Just ignore this
    remark if snapshots are not supported at any level of the storage stack
    for your DB.

  • From Michel Verdier@3:633/10 to All on Tuesday, June 23, 2026 08:10:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-22, tomas@tuxteam.de wrote:

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem

    "Is always" means for you "should always be" or "has to be"?

    "has to be". LUKS encrypts a partition in a unique way, so two encrypted partitions are always different and cannot be synced.

    As far as I understand OP, their case is the other way around (and I don't see why it shouldn't be technically possible: a block device is a block device is a block device, after all).

    Perhaps that is the problem, but I don't have enough information about the
    installation.

  • From Michel Verdier@3:633/10 to All on Tuesday, June 23, 2026 08:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 2026-06-22, Paul Leiber wrote:

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem
    "Is always" means for you "should always be" or "has to be"?

    As far as I understand OP, their case is the other way around (and I don't
    see why it shouldn't be technically possible: a block device is a block
    device is a block device, after all).

    Tomas' description of my setup is correct, LUKS before RAID. It has
    been working in the past, and it is working right now again. Is this
    type of setup recommended? I don't know. BTRFS doesn't show any issues
    with this setup.

    So Tomas found your problem. It is at best useless to have
    partition > LUKS > RAID array > filesystem
    I cannot see how it managed to work. It supposes that the two LUKS devices
    are identical, which is nonsense. Also, a small change in data gives a bigger
    change in a LUKS partition, and thus more to sync. I don't know enough about
    LUKS, but I suppose you lose LUKS atomicity during sync.

    However, my main suspect for the root cause of the dual-head database is indeed that the LUKS decryption messes with the md RAID assembly at boot,
    e. g. some timing issue or race condition. The database content doesn't change
    constantly, there are very few writes per day, so I'll rely on daily backups and monitor the RAID closely. There was another kernel update today, so I'll see what happens after a reboot, which probably was what triggered the issue in the past. If another issue occurs, I'll probably have a chance to find more
    information in the logs now that I know what to look for. If my assumption is confirmed, I'll change the order to RAID before LUKS and restore the data from
    backup. (Or I'll do it anyway out of lack of other ideas...)

    You are right to suspect that. Don't wait; change it even if you can't confirm a bug. The good and safe way is
    partition > RAID array > LUKS > filesystem
    And it should also improve performance.

  • From tomas@3:633/10 to All on Tuesday, June 23, 2026 10:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Tue, Jun 23, 2026 at 08:00:11AM +0200, Michel Verdier wrote:
    On 2026-06-22, tomas@tuxteam.de wrote:

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem

    "Is always" means for you "should always be" or "has to be"?

    "has to be". LUKS encrypt a partition in a unique way. So 2 encrypted partitions are always different and cannot be synced.

    I think that is wrong. You don't sync the *encrypted* partitions (how would you?) but the decrypted block layer, one level up. I don't see a reason it wouldn't work.

    As far as I understand OP, their case is the other way around (and I don't see why it shouldn't be technically possible: a block device is a block device is a block device, after all).

    Perhaps the problem but I don't have enough informations on its
    installation.

    OP's initial description was (to me) so clear that I think I understood
    it.

    Cheers
    --
    tomás


  • From tomas@3:633/10 to All on Tuesday, June 23, 2026 10:40:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Tue, Jun 23, 2026 at 08:28:50AM +0200, Michel Verdier wrote:
    On 2026-06-22, Paul Leiber wrote:

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem
    "Is always" means for you "should always be" or "has to be"?

    As far as I understand OP, their case is the other way around (and I don't
    see why it shouldn't be technically possible: a block device is a block
    device is a block device, after all).

    Tomas' description of my setup is correct, LUKS before RAID. It has
    been working in the past, and it is working right now again. Is this
    type of setup recommended? I don't know. BTRFS doesn't show any issues
    with this setup.

    So Tomas found your problem. It is at best useless to have
    partition > LUKS > RAID array > filesystem

    I strongly disagree here.

    I cannot see how it managed to work. It suppose the 2 LUKS are identical which is a nonsense. Also a small change in data gives a bigger change in
    a LUKS partition thus bigger to sync. I don't know enough about LUKS but
    I suppose you loose LUKS atomicity during sync.

    It is not the LUKS containers that are identical; their decrypted layers
    are, ideally. Of course this costs additional processing power (you have
    to de-/encrypt things twice), and I don't (yet) see an advantage to this
    scheme, but it is definitely feasible.

    Cheers
    --
    t


  • From Paul Leiber@3:633/10 to All on Tuesday, June 23, 2026 14:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 23.06.26 um 10:22 schrieb tomas@tuxteam.de:
    On Tue, Jun 23, 2026 at 08:00:11AM +0200, Michel Verdier wrote:
    On 2026-06-22, tomas@tuxteam.de wrote:

    RAID is always before LUKS : partition > RAID array > LUKS > filesystem
    "Is always" means for you "should always be" or "has to be"?

    "has to be". LUKS encrypt a partition in a unique way. So 2 encrypted
    partitions are always different and cannot be synced.

    I think that is wrong. You don't sync the *encrypted* partitions (how would you?) but the decrypted block layer, one level up. I don't see a reason it wouldn't work.

    Tomas is correct. The decrypted devices are assembled and synced, not the encrypted devices.

    And my experience shows that it is not mandatory to have RAID before LUKS. My btrfs RAID1 has been running for years in this way without any issue. Right now, the md RAID1 is doing what it should be doing. (Yeah, that's right, I am watching you, md0!) David has even pointed out in another mail that it is mandatory to use LUKS before RAID in special cases:

    Am 22.06.26 um 10:42 schrieb David Christensen:
    When you have a layer that combines RAID, volume management, and filesystems, such as ZFS and btrfs, the stackable encryption layer must be underneath (e.g. the latter of above two I/O layering configurations).

    I think this is technically correct, as btrfs is a filesystem and doesn't provide a block device that can be encrypted via LUKS, IIUC. Please correct me if I am wrong. (Come to think of it: the btrfs RAID1 was first on the disk, the md RAID1 came much later. Most likely I just transferred the way the btrfs RAID1 is set up to the md RAID1, without thinking.)

    Now, am I saying that LUKS before *md* RAID is a smart setup? No, I am not. There are probably good reasons to do it the other way round. And I still think it is likely that my issue comes from some hiccup in the way the RAID is assembled at boot from the decrypted devices. However, I recommend not jumping to conclusions before we have more data on this. And it might be beneficial to actually find out what the root cause of my issue is in order to be able to fix it. If I have been doing it this way, chances are that somebody else is doing it this way as well...

    Anyway, thanks, I learned a lot again!

    Paul

  • From Paul Leiber@3:633/10 to All on Tuesday, June 23, 2026 14:30:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    Am 22.06.26 um 10:06 schrieb David Christensen:

    (...)


    Is your OS on the btrfs mirror? I have found that putting the OS on a dedicated SSD makes operations, maintenance, trouble-shooting, disaster preparedness/recovery, etc., much easier.

    No, the OS is on a single, separate SSD. The two hard drives (with BTRFS and md RAID1s) are for data storage only. I was considering creating another RAID1 for the OS with a second SSD, but so far I have refrained from doing so.

  • From David Christensen@3:633/10 to All on Tuesday, June 23, 2026 19:40:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/23/26 01:22, tomas@tuxteam.de wrote:
    On Tue, Jun 23, 2026 at 08:00:11AM +0200, Michel Verdier wrote:
    On 2026-06-22, tomas@tuxteam.de wrote:
    On 6/21/26 23:18, Michel Verdier wrote:
    RAID is always before LUKS : partition > RAID array > LUKS > filesystem
    "Is always" means for you "should always be" or "has to be"?

    "has to be". LUKS encrypt a partition in a unique way. So 2 encrypted
    partitions are always different and cannot be synced.

    I think that is wrong.


    +1


    Both configurations work, but have different performance and security considerations:

    * partitions > RAID > encryption > filesystem

    Will encrypt the RAID virtual block device, saving CPU cycles and requiring one passphrase and/or key.

    * partitions > encryption > RAID > filesystem

    Will encrypt each partition, arguably improving security but
    requiring more CPU cycles and passphrases/ keys.
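
    The CPU and key trade-off between the two layerings can be sketched as a toy count. This is only an illustration, not a benchmark: the function name, the layout labels, and the assumption of one distinct key per mirror member are all hypothetical and not part of the thread.

```python
# Toy model (an assumption, not a measurement): count cipher operations and
# secrets for the two stacking orders on an n-way mirror.

def cost_per_block(layout: str, mirrors: int = 2) -> dict:
    """Return cipher work for one logical block and the number of keys.

    "raid-then-crypt": partitions > RAID > encryption > filesystem
    "crypt-then-raid": partitions > encryption > RAID > filesystem
    """
    if layout == "raid-then-crypt":
        # One encrypted device sits above the mirror: each block is
        # encrypted once, then the ciphertext is copied to every member.
        return {"encryptions_per_write": 1, "decryptions_per_read": 1, "keys": 1}
    if layout == "crypt-then-raid":
        # The mirror hands the same plaintext block to every member, and
        # each member's encryption layer encrypts it separately.
        # Assumption: every member has its own key or passphrase.
        return {"encryptions_per_write": mirrors, "decryptions_per_read": 1, "keys": mirrors}
    raise ValueError(f"unknown layout: {layout}")


for name in ("raid-then-crypt", "crypt-then-raid"):
    print(name, cost_per_block(name))
```

    Reads cost the same in both layouts in this model, because a mirror read is served by a single member; the difference shows up on writes and in the number of secrets to manage.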


    My SOHO file server uses ZFS, which combines RAID > filesystem. (ZFS
    native encryption has issues, so I avoid it.) So, the file server must
    use a variation of the latter I/O layering configuration above:

    partitions > encryption > ZFS


    David

  • From tomas@3:633/10 to All on Tuesday, June 23, 2026 20:20:01
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Tue, Jun 23, 2026 at 10:33:01AM -0700, David Christensen wrote:
    [...]
    Both configurations work, but have different performance and security considerations:

    * partitions > RAID > encryption > filesystem

    Will encrypt the RAID virtual block device, saving CPU cycles and requiring one passphrase and/or key.

    * partitions > encryption > RAID > filesystem

    Will encrypt each partition, arguably improving security but requiring more CPU cycles and passphrases/ keys.

    Actually it would reduce security, IMO, because the opponent would have
    to find just one of the two keys (the content is mirrored), thus potentially reducing the key strength by one bit. Not a big deal, granted :)

    Cheers
    --
    tomás


  • From David Christensen@3:633/10 to All on Wednesday, June 24, 2026 00:10:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/23/26 11:10, tomas@tuxteam.de wrote:
    On Tue, Jun 23, 2026 at 10:33:01AM -0700, David Christensen wrote:

    [...]

    Both configurations work, but have different performance and security
    considerations:

    * partitions > RAID > encryption > filesystem

    Will encrypt the RAID virtual block device, saving CPU cycles and
    requiring one passphrase and/or key.

    * partitions > encryption > RAID > filesystem

    Will encrypt each partition, arguably improving security but requiring more CPU cycles and passphrases/ keys.

    Actually it would reduce security, IMO, because the opponent would have
    to find just one of the two keys (the content is mirrored), thus potentially reducing the key strength by one bit. Not a big deal, granted :)

    Cheers


    I agree that successfully cracking two or more disks from an encrypted
    RAID will give an attacker greater confidence in the resulting data and metadata.


    But I would expect a cracking algorithm for an encryption layer with
    on-disk cryptographic details (e.g. LUKS header) would primarily attack
    those on-disk cryptographic details:

    * Assuming a brute-force cracking algorithm, each crack attempt (e.g. passphrase and/or key generated by an iterator) is an independent trial
    and the work is readily partitioned across multiple computers working in parallel. So, cracking 1 LUKS header with N computers will take the
    same average time as cracking any one of 2 to N different LUKS headers
    with N computers.

    * What an attacker wants is a cracking algorithm where each new cracking attempt leverages the results from previous failed attempts. AIUI LUKS, dm-crypt, and other professional cryptographic systems are specifically designed to thwart such attacks. But if you design such an algorithm, you could become famous, make money, become an enemy of the state, go to prison,
    flee into exile, etc.
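
    The brute-force point can be checked with a small exact calculation. This is a sketch under stated assumptions, not a model of any real cracking tool: each header's key is taken to be uniformly random and independent, guesses are tried in a fixed order, every guess costs one KDF evaluation, and the work divides evenly across machines. The function names, the toy keyspace K, and the machine count N are hypothetical.

```python
from fractions import Fraction


def expected_work_single(keyspace: int) -> Fraction:
    """Expected KDF evaluations when every guess goes against one header."""
    # The key is equally likely to be the 1st, 2nd, ..., keyspace-th guess.
    return Fraction(keyspace + 1, 2)


def expected_work_interleaved(keyspace: int) -> Fraction:
    """Expected KDF evaluations until the first of two headers falls,
    alternating guesses between header 1 and header 2."""
    total = 0
    for k1 in range(keyspace):      # position of header 1's key in the guess order
        for k2 in range(keyspace):  # position of header 2's key
            hit1 = 2 * k1 + 1       # header 1 gets evaluations 1, 3, 5, ...
            hit2 = 2 * k2 + 2       # header 2 gets evaluations 2, 4, 6, ...
            total += min(hit1, hit2)
    return Fraction(total, keyspace * keyspace)


K = 200  # toy keyspace, tiny on purpose
N = 8    # hypothetical number of machines sharing the work
single = expected_work_single(K)
either = expected_work_interleaved(K)
print("one header:   ", float(single), "evaluations,", float(single / N), "per machine")
print("either of two:", float(either), "evaluations,", float(either / N), "per machine")
```

    In this model, a second header with an independent key does not reduce the expected work to the first hit for a fixed amount of hardware; the time only drops when more machines are added.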


    I was thinking of what happens if a disk fails, the sysadmin disposes of
    the disk, an attacker obtains the disk, and the attacker successfully
    cracks the encryption. The attacker now has all or part of the
    plaintext data, the plaintext metadata, and the plaintext cryptographic details at the time the disk failed:

    * If encryption was applied on top of RAID and the attacker obtains a
    second encrypted disk, the attacker can use the plaintext cryptographic details from the first disk to crack the second disk. This could be as
    simple as entering the passphrase and/or key from the first disk.

    * If encryption was applied under RAID and the sysadmin used different
    strong passphrases and/or keys on every disk, the plaintext
    cryptographic details from any one cracked disk will not help to crack additional encrypted disks.


    David

  • From tomas@3:633/10 to All on Wednesday, June 24, 2026 07:50:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On Tue, Jun 23, 2026 at 03:04:38PM -0700, David Christensen wrote:
    On 6/23/26 11:10, tomas@tuxteam.de wrote:
    On Tue, Jun 23, 2026 at 10:33:01AM -0700, David Christensen wrote:

    [...]

    Both configurations work, but have different performance and security considerations:

    * partitions > RAID > encryption > filesystem

    Will encrypt the RAID virtual block device, saving CPU cycles and requiring one passphrase and/or key.

    * partitions > encryption > RAID > filesystem

    Will encrypt each partition, arguably improving security but requiring
    more CPU cycles and passphrases/ keys.

    Actually it would reduce security, IMO, because the opponent would have
    to find just one of the two keys (the content is mirrored), thus potentially reducing the key strength by one bit. Not a big deal, granted :)

    Cheers


    I agree that successfully cracking two or more disks from an encrypted RAID will give an attacker greater confidence in the resulting data and metadata.

    No, no: I meant the attacker has to crack *just one of two*, thus
    potentially halving the search time (assuming enough parallelism,
    which seems sensible to assume in these crazy days we live in).

    But I would expect a cracking algorithm for an encryption layer with on-disk cryptographic details (e.g. LUKS header) would primarily attack those
    on-disk cryptographic details:

    * Assuming a brute-force cracking algorithm, each crack attempt (e.g. passphrase and/or key generated by an iterator) is an independent trial and the work is readily partitioned across multiple computers working in parallel. So, cracking 1 LUKS header with N computers will take the same average time as cracking any one of 2 to N different LUKS headers with N computers.

    Now that makes sense to me: space × time is constant, you double the
    one and halve the other. You're right.

    * What an attacker wants is a cracking algorithm where each new cracking attempt leverages the results from previous failed attempts. AIUI LUKS, dm-crypt, and other professional cryptographic systems are specifically designed to thwart such attacks. But if you design such an algorithm, you could become famous, make money, become an enemy of the state, go to prison, flee into exile, etc.

    I'd expect that, yes. Current attacks seem to concentrate on the PBKDF,
    that's why argon2, specifically argon2id [1] [2], is currently recommended
    (it makes highly parallel attacks by SIMD GPUs difficult).

    I was thinking of what happens if a disk fails, the sysadmin disposes of the disk, an attacker obtains the disk, and the attacker successfully cracks the encryption. The attacker now has all or part of the plaintext data, the plaintext metadata, and the plaintext cryptographic details at the time the disk failed:

    Never do that. If the electronics still work to dd to the first sectors
    of the disk, by all means, do.

    * If encryption was applied on top of RAID and the attacker obtains a second encrypted disk, the attacker can use the plaintext cryptographic details
    from the first disk to crack the second disk. This could be as simple as entering the passphrase and/or key from the first disk.

    * If encryption was applied under RAID and the sysadmin used different
    strong passphrases and/or keys on every disk, the plaintext cryptographic details from any one cracked disk will not help to crack additional
    encrypted disks.

    Which you don't need to, since we are talking RAID1, and they should
    have (roughly ;) equal content.
    Other RAID schemata are different, granted.

    Cheers
    --
    t


    --- PyGate Linux v1.5.18
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From David Christensen@3:633/10 to All on Wednesday, June 24, 2026 19:40:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    On 6/23/26 22:39, tomas@tuxteam.de wrote:
    On Tue, Jun 23, 2026 at 03:04:38PM -0700, David Christensen wrote:
    I was thinking of what happens if a disk fails, the sysadmin disposes of the disk, an attacker obtains the disk, and the attacker successfully cracks the encryption. The attacker now has all or part of the plaintext data, the plaintext metadata, and the plaintext cryptographic details at the time the disk failed:

    Never do that. If the electronics still work to dd to the first sectors
    of the disk, by all means, do.


    Using software to write zeroes to a drive will overwrite only the sectors
    that the drive controller allows the host to see; failed/remapped HDD
    sectors will still contain content, as will dirty SSD/USB flash sectors waiting to be erased. If a skilled attacker obtains the drive at this
    point, the remaining data could be compromised.


    ATA Secure Erase is supposed to get more (all?) sectors, but I do not
    know what happens with broken sectors.


    I have heard of people using magnetic erasers for magnetic HDDs.


    I have heard of disk shredding and/or incineration services, but that is
    above my scale.


    My practice has been zeroes and/or secure erase, followed by a 3-pound drilling hammer.


    David

  • From Nicolas George@3:633/10 to All on Wednesday, June 24, 2026 19:50:02
    Subject: Re: How to salvage a degraded mdadm RAID1 with as little data loss as possible?

    David Christensen (HE12026-06-24):
    I have heard of disk shredding and/or incineration services, but that is above my scale.

    I confirm. My chief organized that for our school last year. I think
    they used a hydraulic press, but I was not involved in the process and
    did not go near and see.

    Regards,

    --
    Nicolas George
