I've been running a pretty beefy ZFS server for quite a while now, probably 3-4 years, and I've never really had any problems.
I've had drives go bad before; it happens, especially when running 72 drives (almost 200TB) for all this time. When I built my server and the chassis, I made sure to order an extra drive or two as spares for this reason. When the time comes, it's a simple offline, hotswap, replace, online, scrub (iirc) and I'm back in business. Couldn't ask for anything simpler.
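For reference, the sequence above looks roughly like the sketch below. This is from memory, not a transcript of what I actually typed: the `run` and `replace_drive` helpers and the `DRY_RUN` variable are made up for illustration, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the offline / hotswap / replace / online / scrub sequence.
# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_drive POOL OLD_DEV [NEW_DEV]
# NEW_DEV is optional: if the new disk shows up under the same device name,
# 'zpool replace POOL OLD_DEV' is enough.
replace_drive() {
    pool="$1"; old="$2"; new="${3:-}"
    run zpool offline "$pool" "$old"
    # Physical hot-swap happens here; only wait for it on a real run.
    if [ "$DRY_RUN" != "1" ]; then
        printf 'Swap the disk, then press Enter: '
        read -r _
    fi
    if [ -n "$new" ]; then
        run zpool replace "$pool" "$old" "$new"
        target="$new"
    else
        run zpool replace "$pool" "$old"
        target="$old"
    fi
    # Usually not needed: a successful replace brings the disk online while
    # it resilvers. Kept because it is part of the sequence as I remember it.
    run zpool online "$pool" "$target"
    # Assumption: on a real run you would wait for the resilver to finish
    # before starting the scrub.
    run zpool scrub "$pool"
}

replace_drive Movies c4t8d7
```

In dry-run mode that last line just prints the four `zpool` commands (offline, replace, online, scrub) for the drive I swapped this week.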
So this past week one of my drives died. I pulled it, swapped in a new one, and replaced it without issue. But then I noticed another problem with this pool.
Code:
root@blackhole:/# zpool status Movies
  pool: Movies
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Fri May 2 11:42:36 2014
        787G scanned out of 52.0T at 602M/s, 24h46m to go
        0 repaired, 1.48% done
config:

        NAME        STATE     READ WRITE CKSUM
        Movies      DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c4t7d1  ONLINE       0     0     0
            c4t7d2  ONLINE       0     0     0
            c4t7d3  ONLINE       0     0     0
            c4t7d4  ONLINE       0     0     0
            c4t7d5  ONLINE       0     0     0
            c4t7d6  ONLINE       0     0     0
            c4t7d7  ONLINE       0     0     0
            c4t8d0  ONLINE       0     0     0
            c4t8d1  ONLINE       0     0     0
            c4t8d2  ONLINE       0     0     0
            c4t8d3  ONLINE       0     0     0
          raidz2-1  DEGRADED     0     0     0
            c4t8d4  ONLINE       0     0     0
            c4t8d5  ONLINE       0     0     0
            c4t8d6  ONLINE       0     0     0
            c4t8d7  ONLINE       0     0     0
            c4t9d0  ONLINE       0     0     0
            c4t9d1  ONLINE       0     0     0
            c4t9d2  ONLINE       0     0     0
            c4t9d3  ONLINE       0     0     0
            c4t9d4  ONLINE       0     0     0
            c4t6d5  OFFLINE      0     0     0
            c4t9d6  ONLINE       0     0     0
            c4t9d7  ONLINE       0     0     0

errors: No known data errors
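With this many drives the odd one out is easy to miss in the listing. A small filter like the sketch below pulls out only the rows that are not ONLINE. The function name `list_unhealthy` is made up, and it assumes the usual `zpool status` layout where the device table starts at the `NAME STATE ...` header and ends at the `errors:` line.

```shell
# list_unhealthy: read 'zpool status' text on stdin and print the name and
# state of every row in the config table whose state is not ONLINE.
list_unhealthy() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        $1 == "errors:"               { in_config = 0 }        # end of table
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Live usage would be:  zpool status Movies | list_unhealthy
# Demo on a few rows copied from the output above:
list_unhealthy <<'EOF'
NAME STATE READ WRITE CKSUM
Movies DEGRADED 0 0 0
raidz2-1 DEGRADED 0 0 0
c4t9d4 ONLINE 0 0 0
c4t6d5 OFFLINE 0 0 0
c4t9d6 ONLINE 0 0 0
errors: No known data errors
EOF
```

On those rows it prints `Movies DEGRADED`, `raidz2-1 DEGRADED` and `c4t6d5 OFFLINE`, so the pool and vdev rows show up alongside the single offline disk.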
The drive that WAS bad was c4t8d7. I've already replaced it and got it back online. But notice c4t6d5: it showed up as 'faulted', and I took it offline when I noticed the fault. That was strange, as I never touched that drive, nor were any problems reported for it in my Areca logs. Every previous failure raised an alarm in the Areca logs first, which is what prompted me to replace the drive asap.
The other issue is that this says c4t6d5, when the drive should be c4t9d5.
That drive does exist in my machine, and I tried to replace the one listed with it.
Code:
root@blackhole:/# zpool replace Movies c4t6d5 c4t9d5
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c4t9d5s0 is part of active ZFS pool Movies. Please see zpool(1M).
root@blackhole:/# zpool replace -f Movies c4t6d5 c4t9d5
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c4t9d5s0 is part of active ZFS pool Movies. Please see zpool(1M).
So I'm wondering what I'm doing wrong or what could possibly be happening.
Just for reference: I'm running ESXi/OI with an Areca on VT-d passthrough and napp-it 0.9b3. The drive listed (c4t6d5) is actually in another pool in the same machine, and that pool is functioning perfectly. I just don't understand how this pool ended up with a drive/label that belongs to a different pool, duplicated and in use in both at the same time.
I can provide any additional information you think would help or is needed.
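One way to double-check which pools list a given device name, as a rough sketch: run the output of `zpool status` (no pool argument, so all pools) through a small filter. The function name `pools_listing` is made up, and it assumes each pool's section starts with a `pool: NAME` line.

```shell
# pools_listing DEVICE: read 'zpool status' text for all pools on stdin and
# print the name of every pool whose config table has a row for DEVICE.
pools_listing() {
    awk -v dev="$1" '
        $1 == "pool:" { pool = $2 }   # remember which pool we are inside
        $1 == dev     { print pool }  # device row found in that pool
    '
}

# Usage on the live system:
#   zpool status | pools_listing c4t6d5
```

If the same device name really is listed in two pools, this prints two pool names.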