
Force-fail upgrading of a drive (redux)

josh

Administrator
Staff member
Forum Administrator
Moderator
⭐️⭐️PATRON⭐️⭐️
I know it's been discussed many times over the years, but I wanted to see if someone can give me some specific advice.

I have a 3U Premier with the following drives/config installed:
2,2,2,2,2,2,3,3,3,4,4,6,x,6
(x) is the single remaining open slot. The right-most 6TB is blinking, indicating it's in use as the hot spare, which is as expected.

I have a fresh new 6TB KDISK to install to increase the capacity of my Premier storage.

My inclination is to remove one of the 2TB drives and replace it with this new 6TB, using the "force-fail" method described on this forum over the years. I realize I'll get slightly less storage that way than if I just put it in the last open slot, but I like the idea of fewer spinning hard disks in general: less power draw and fewer things to fail.

Questions:
1) How much more usable storage will I get with my crazy mixed-drive-size setup if I replace a 2TB rather than sticking the new 6TB into the last empty slot? I'm still unclear on the algorithm for calculating this in a mixed drive-size environment.

2) If I do the replace-a-2TB route, what are the specific steps to take? I see old posts here that recommend taking out the old drive with the power on, and others that say to shut it down first. Do I wait some period of time with the old drive gone before putting in the new one? Are reboots needed along the way?

3) Is there some way to monitor how the rebuild process is going, and to know when it is complete and the system is back up and running with all redundancy and features restored?

4) Do I need to notify Kaleidescape and/or my dealer so they don't do anything on their end when they see I have a drive "failure"? (if they even know or monitor such things)

5) I know I'll be running a compromised system for the time the rebuild takes, and there's risk in that, including possible loss of all data... any advice on how likely or unlikely that may be for the time it is in a degraded state?

Thanks!

-josh
 
Just did this with my new-to-me 3U. I actually received it with a dead 750GB and a dead 1TB, so no hot spare. Interestingly, it will still work and boot, just complain.

I also acquired five 4TB NOS drives at the same time. Considering the 750GB drives had to be close to the end of their life, I decided to force-fail as many as I could.

Given I was very concerned about accidentally bumping the other drives, I used the turn-off, swap, turn-on method. On the first go I just added two of the drives. This started the rebuild process and gave me a flashing hot spare.

You can easily see the progress on the main web page. I was averaging about 24 hours to rebuild a 750GB drive. If your system has a dealer attached, they will receive an email stating that a drive has failed. Once I completed all of the swapping, I did have to get Rusty in support to update my system to reflect the current drive configuration; this is not automatic unless the drive is ordered as an add-on for that server.

After the initial two, I would just shut down, swap a 750GB for a 4TB, and turn it back on. The old hot spare would start rebuilding and the new 4TB would flash as the hot spare. Wait 24 hours, repeat.

Given you have a 6TB in use and a 6TB spare, the increased storage would be 6 - 2, so 4TB. Given the 2TB drives are not as old as some, I would think the risk is lower than what I went through, but if we could predict when drives are going to fail, life would be a whole lot easier.

Kevin D.
 
Thanks Kevin.

I'm still confused about whether the best way of doing it is to pull one of the small drives out while the unit is on, and whether I then have to wait some time before putting in the new one.
 
I certainly didn't experiment, but as soon as the drive fails the hot spare will start rebuilding. If a new drive is inserted it becomes the hot spare.

Examining the existing drives and even the NOS drives, there's a lot of cracking in the back plastic that's part of the tensioning system to make the retaining clips work.

I just didn't have enough confidence in the retaining system (or, more likely, myself) to trust that I wouldn't accidentally fail a second drive if the system was running.

Given it only takes about 5 minutes to boot up, I didn't see much of a downside to the turn-off method. I literally turned off, replaced a drive, double-checked all the other drive locks, turned back on, waited a day, and repeated.

Kevin D.
 
If you remove a drive while the server is running, the system will wait 30 seconds before marking that drive as failed/missing. Once that occurs, as Kevin points out, the hot spare will immediately start rebuilding. You should wait for the hot spare rebuild process to begin (blue light on the drive cartridge stops blinking) before inserting the replacement drive (which will then become the hot spare unless it's too small to do so).
 
Perfect. Thanks John.
 
@J.Green - one more question. Is there any way to ultimately reduce the number of drives in my system? I would like to pull multiple 2TB drives and replace them with fewer 6TB drives.
 
The only viable way to do this would be to buy a new full set of drives and then replicate the content from your old drives to the new filesystem. Of course, this would require access to a second server chassis in order to do the replication.
 
Not being overly familiar with whatever file system K is using, but being dangerously familiar with various RAID systems, I'm curious: assuming the same total (unformatted) storage space, is it more reliable to have more drives of a smaller size or fewer drives of a larger size? Considering only drive failures, I would say more drives, but depending on the RAID this isn't always the case.
 
Wow, I have not logged on in a while and totally missed this thread, but I will throw in my $0.02, as I have force-fail upgraded several servers over the years.

First off, it is more stressful than a replication. Mainly for this reason, Kaleidescape does not recommend using this method, especially with very old systems (8+ years on the drives, I am thinking).

When I ran my last force-fail upgrade, I swapped 4TB drives for 6TB drives, and I think it took about 48 hours to rebuild each drive, making it a 26-day process. (The hot spare can be ejected without a problem if you are upgrading all the drives, which saves a couple of days since no rebuild is needed for it.) Still, even at 26 days in the longest-possible-time scenario, a force-fail upgrade was far faster than the 48 days it would have taken to replicate off a fully loaded 3U server with 4TB drives.
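
For anyone planning a similar full-chassis swap, here is a minimal back-of-envelope sketch (Python) of that timing. The ~48-hour-per-rebuild figure and the count of 13 rebuilds (14 slots minus an ejected hot spare) are taken from the paragraph above; they are rough assumptions, not measurements of your system.

```python
# Back-of-envelope timing for a full force-fail upgrade of a 3U server.
# Assumptions (from the figures quoted above, not measured here): ~48 hours
# per rebuild, and 13 rebuilds because the ejected hot spare needs none.

HOURS_PER_REBUILD = 48          # observed average for a 4TB -> 6TB swap
REBUILDS_NEEDED = 13            # 14 slots minus the hot spare

total_days = HOURS_PER_REBUILD * REBUILDS_NEEDED / 24
print(f"Force-fail upgrade, worst case: ~{total_days:.0f} days")   # ~26 days
print("Replicating a full 3U of 4TB drives: ~48 days (as quoted above)")
```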

As for Nomad07's question, RAID provides redundancy, but if you lose 2 drives you have catastrophic data loss. Given that, once one drive is down, the odds of a second drive failing during the window to rebuild the array go up as you have more drives that can fail. As such, I would always recommend getting the largest drives possible and as few as needed to comfortably house your data for the next 12 months.
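
To make that intuition concrete, here is a rough sketch (Python) of the second-failure risk while the array is degraded, assuming independent drive failures. The 3% annual failure rate and 2-day rebuild window are placeholder assumptions; real risk depends on drive age, heat, and workload.

```python
# Why more drives raise the risk while the array is degraded: probability that
# at least one of the surviving drives fails before the rebuild completes.
# The 3% annual failure rate and 2-day window are illustrative assumptions only.

ANNUAL_FAILURE_RATE = 0.03      # assumed per-drive annual failure probability
REBUILD_DAYS = 2                # ~48-hour rebuild window, as described above

def second_failure_risk(surviving_drives: int) -> float:
    """P(at least one surviving drive fails during the rebuild window)."""
    p_window = ANNUAL_FAILURE_RATE * REBUILD_DAYS / 365
    return 1 - (1 - p_window) ** surviving_drives

for n in (3, 12):               # small 1U array vs. nearly full 3U, one drive already down
    print(f"{n} surviving drives: {second_failure_risk(n):.3%} chance of a second failure")
```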

There is no way to reduce the number of drives in an array aside from the aforementioned replication process, which will tie up a second chassis for a long time. Now, here is something interesting. You can replicate from a 3U chassis with 18TB or less of data onto a 1U server with 4x6TB drives and, when finished, remove the 4 drives from the 1U and put them into a 3U chassis and it will work - though the system will expect you to add a 5th drive for a hot spare. If you went this route and were only going to need 12TB of storage, you could replicate onto a 3-drive array in a 1U server and then add your 4th drive when you swap into the 3U server. A 1U server will never hold aside a drive as a spare, but a 3U will. While it has never been explained why, my guess is there are different RAID cards in each chassis and the 3U is configured to have a hot spare while the 1U is not.

Unfortunately, Josh's system cannot be replicated using a 1U but would need a full 3U server to replicate onto. It is 35TB and will grow to 39TB when a 2TB drive is replaced with the 6TB drive.
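
Since Josh asked how to calculate this for a mixed-size array (question 1), here is a small sketch (Python) that reproduces the figures quoted in this thread. The model (usable capacity roughly equals the sum of the active drives, with the single largest drive held aside as the hot spare) is inferred from those figures and is not Kaleidescape's published algorithm.

```python
# Mixed-size capacity arithmetic implied by the numbers in this thread.
# Assumed model (inferred, not official): quoted capacity = sum of active
# drives, with the single largest drive held aside as the hot spare.

def quoted_capacity_tb(drives):
    """Sum of active drives after reserving the largest one as the hot spare."""
    drives = sorted(drives)
    return sum(drives[:-1])

current    = [2]*6 + [3]*3 + [4]*2 + [6]*2   # Josh's 13 installed drives
swap_a_2tb = sorted(current)[1:] + [6]       # force-fail one 2TB, insert the new 6TB
fill_slot  = current + [6]                   # or just drop the 6TB into the open 14th slot

print(quoted_capacity_tb(current))      # 35 TB, matching the figure above
print(quoted_capacity_tb(swap_a_2tb))   # 39 TB, matching the figure above
print(quoted_capacity_tb(fill_slot))    # 41 TB if the new drive simply fills the open slot
```

Under that model, swapping out a 2TB gives about 2TB less total capacity than filling the empty slot, which lines up with Josh's expectation of slightly less storage from the swap.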

I will say that when I ran my force-fail upgrades, my servers were on the best power filters available (balanced power from Equi=Tech) and rack-mounted in a properly cooled room. If that isn't available to you, at a minimum I would recommend making certain that the cooling is adequate, because the server will be putting a lot more demand on the drives and thus creating more excess heat. It will also be at the maximum power draw for a 3U server, as a rebuild is the most intensive operation these servers handle.

When doing a multiple force-fail upgrade, I highly recommend starting with the oldest drives first. That generally coincides with the smallest drives, but if some drives of that size were purchased separately, they might have months or even a year more wear and tear on them; playing the odds, it is best to upgrade the oldest ones first.
 
A 1U server will never hold aside a drive as a spare, but a 3U will. While it has never been explained why, my guess is there are different RAID cards in each chassis and the 3U is configured to have a hot spare while the 1U is not.

It was simply a matter of scale. Assuming all drives in the file system are the same size, declaring one drive a hot spare in a 1U server would represent too large a loss in available storage space.

Take, for example, a 4-disk file system composed of 1 TB drives. Without a hot spare, the actual usable storage space on the server will be 3 TB because one of those drives is used for parity. If you want a hot spare, it would come out of the remaining pool of three drives being used for storage, which would reduce the available storage space to 2 TB.

The same thing happens on a 3U server, but with 14 drives to work with, the loss of two drives' worth of storage (parity and hot spare) isn't as significant a hit. Plus, more drives means a higher probability of failure, so having a hot spare that can immediately replace a failed drive and start rebuilding is a fair trade-off.
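
For illustration, here is the same trade-off as a tiny sketch (Python), assuming equal-size drives, one drive's worth of parity, and an optional hot spare held out of the pool, per the example above.

```python
# Usable space with one drive of parity and an optional hot spare,
# assuming all drives are the same size (as in the example above).

def usable_tb(num_drives: int, drive_tb: float, hot_spare: bool) -> float:
    """(drives - parity - optional spare) * drive size."""
    reserved = 2 if hot_spare else 1          # 1 parity drive, plus the spare if present
    return (num_drives - reserved) * drive_tb

print(usable_tb(4, 1, hot_spare=False))   # 3.0 TB on a 1U without a spare
print(usable_tb(4, 1, hot_spare=True))    # 2.0 TB, too large a hit, hence no spare on a 1U
print(usable_tb(14, 1, hot_spare=True))   # 12.0 TB on a 3U, a much smaller relative loss
```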
 