Collection of my own IT Horror Stories. Feel free to post your own stories in the comments.
Sunday, 7 August 2016
2003 Mysterious D: Drive renegade
I have seen this happen a few times. On a 2003 server, you wake up one morning to the dreaded "This drive is not formatted, would you like to format drive D:".
You normally only see this with a faulty USB stick, or with a new hard drive that actually does need formatting. When you see it on a live server, on the main data partition, any IT engineer's heart will sink.
This issue usually occurs when a server is over 5 years old and due for replacement. It usually indicates that the drives are slowly failing and picking up hard write errors and/or bad sectors.
I'll start with an example which actually happened after this incident. One of my colleagues once called me late in the evening, in quite a panic. The backup on the server had been failing for about a week; it was getting stuck at a certain point and not finishing.
He tried rebooting the server to see if that would clear the issue. However, when the server came back on, something was different. The D drive was no longer accessible. It was present in 'My Computer', but when you tried to open it, it came back with 'Drive D not formatted, etc etc, do you want to format?'.
This was a big problem, as the backup was a week old and this was their main application server. I managed to find a nice bit of software called 'Partition Table Doctor', which scans disks for missing or corrupt partition information. If you search for that software now, you will find it's been bought out and you need to pay for it. However, the version I found was a nice old freeware copy.
It found the partition within seconds and was able to recover it. After rebooting, the partition was still there and we were even able to run the backup again. A tense situation nicely defused!
However, they had lost some data, specifically data that they had been accessing recently. I think the maker of their application software was able to help them out in the end. Either way, we had done our part and made the best of the situation.
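I don't know how Partition Table Doctor works internally, but one common way to find a lost NTFS partition is to walk the disk sector by sector looking for something that looks like an NTFS boot sector: the OEM ID "NTFS    " at byte offset 3 and the 0x55 0xAA signature at the end of the sector. The sketch below is only an illustration of that idea, run against an in-memory fake image. The function names and the fake image are my own inventions, and a real tool does far more validation before touching a partition table.

```python
import io

SECTOR_SIZE = 512
NTFS_OEM_ID = b"NTFS    "      # bytes 3..10 of an NTFS boot sector
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510..511 of a boot sector


def find_ntfs_boot_sectors(stream, sector_size=SECTOR_SIZE):
    """Return the sector numbers in `stream` that look like NTFS boot sectors."""
    hits = []
    sector_number = 0
    while True:
        sector = stream.read(sector_size)
        if len(sector) < sector_size:
            break  # end of image (or a trailing partial sector)
        if sector[3:11] == NTFS_OEM_ID and sector[510:512] == BOOT_SIGNATURE:
            hits.append(sector_number)
        sector_number += 1
    return hits


def make_fake_boot_sector():
    """Build a hypothetical, mostly empty sector carrying only the two markers."""
    sector = bytearray(SECTOR_SIZE)
    sector[3:11] = NTFS_OEM_ID
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    # 63 empty sectors, then a boot sector, then a little padding.
    image = bytes(SECTOR_SIZE) * 63 + make_fake_boot_sector() + bytes(SECTOR_SIZE) * 4
    print(find_ntfs_boot_sectors(io.BytesIO(image)))  # prints [63]
```

On a real failing disk you would only ever run something like this read-only, and ideally against an image of the disk rather than the disk itself.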
Now, back to the original story. This one was not a quick fix; it was a much more time-consuming recovery.
Interestingly, the situations were very similar. This client also had not had a backup for about a week. The engineer who picked up the case presumed the drive was unrecoverable, so they loaded up the classic 'GetDataBack for NTFS', which was the best data recovery software at the time.
They immediately started running recovery scans on the now RAW partition and then began copying the recovered data to a USB drive, which is slow on USB2!
So the client had already lost a day at this point. The D drive contained the Exchange data as well as their company files and accounting data, so they were basically left with just an operating system!
After data recovery had finished, they had to rebuild the shared folders and set up the permissions again. However, after they copied the Exchange database back, it refused to mount, moaning about lost log files. So once again, the same engineer deemed it a lost cause and decided the database would have to be created from scratch and the email recovered from OST files.
At this point, I think on the third day, they sent me to site to finish things off. I had the thrilling task of going round everyone's machines, opening up their cached Outlook profile and backing it up to a PST file, then creating a new profile, linking it up to the new Exchange database and importing the data.
I think all of their mail had been collecting in a catch-all POP mailbox, so the POP3 connector was going to have a busy night bringing down 2 days of email for all the users!
Their accounting database also wasn't working. I think the .QBW database file was OK, but the .TLG (transaction log) file was missing as it had not been successfully recovered, so the database refused to open.
So, without any prior knowledge, I copied the .TLG file from the 7 day old backup into the same folder as the current .QBW database, which then allowed them to open the database successfully. I know now you aren't supposed to do that; however, it was still significantly better than losing a week's worth of transactions. The customer was delighted, so I would chalk that one up to luck.
Then there was a lot of mopping up to do. First came the file permissions: users could not save or edit files, as some areas still had the default permissions assigned.
Then there was restoring files from the backup, where the initial file recovery had either missed things or recovered corrupted files.
All in all, it was a very time-consuming process, and I feel that with a bit more investigation and prior knowledge, this would have been a much quicker recovery. Even if recovering the partition wasn't possible, they may still have been able to soft repair the Exchange database. Recovering the permissions should also have been possible from the backup, using ICACLS. However, the circumstances and the fact that it's 2003 may not have allowed this, so these are merely observations rather than criticisms as such.
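To give a rough idea of the ICACLS approach: the tool can save the ACLs of a folder tree to a file with /save and apply them to another tree with /restore. The sketch below only builds the two command lines; it does not run anything. It assumes icacls is available on the server and that the backup has been restored to a scratch location first. The paths, the ACL file name and the helper function names are all hypothetical.

```python
def icacls_save_command(source_dir, acl_file):
    """Build the icacls call that saves the ACLs of everything under source_dir.

    /t walks subfolders, /c carries on past individual errors.
    """
    return ["icacls", source_dir.rstrip("\\") + "\\*", "/save", acl_file, "/t", "/c"]


def icacls_restore_command(target_dir, acl_file):
    """Build the icacls call that applies a saved ACL file under target_dir."""
    return ["icacls", target_dir.rstrip("\\") + "\\", "/restore", acl_file, "/c"]


if __name__ == "__main__":
    # Hypothetical layout: old backup restored to E:\Restore\Shares,
    # recovered live data sitting in D:\Shares.
    acl_file = "C:\\Temp\\shares_acls.txt"
    print(" ".join(icacls_save_command("E:\\Restore\\Shares", acl_file)))
    print(" ".join(icacls_restore_command("D:\\Shares", acl_file)))
```

The saved file stores paths relative to the folder you saved from, which is why the restore points at the matching parent folder. Anything created after the backup was taken would not be in the ACL file and would still need sorting out by hand.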
Moral of the story? Again it comes down to backups and making sure they are monitored. Also, a quick and brutal decision to format the drive can lead to a longer and more painful recovery process. That said, although the steps taken were time-consuming, they did recover most of the data. So in the end, not a bad result.
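As a very small illustration of what "monitored" could mean, here is a sketch that flags a backup folder as stale when its newest file is older than a threshold. It assumes the backup job drops files into a single folder; the 26 hour threshold and the function names are made up for the example.

```python
import os
import time
from datetime import timedelta


def newest_backup_age(backup_dir, now=None):
    """Return the age of the most recently modified file in backup_dir,
    or None if the folder contains no files at all."""
    now = time.time() if now is None else now
    with os.scandir(backup_dir) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    if not mtimes:
        return None
    return timedelta(seconds=now - max(mtimes))


def backup_is_stale(backup_dir, max_age=timedelta(hours=26), now=None):
    """True if there is no backup file, or the newest one is older than max_age."""
    age = newest_backup_age(backup_dir, now)
    return age is None or age > max_age


if __name__ == "__main__":
    # Hypothetical usage: check the current folder and shout if nothing is fresh.
    if backup_is_stale("."):
        print("WARNING: no recent backup file found")
    else:
        print("Backup looks recent")
```

A fresh file date only shows that the job wrote something. A backup that gets stuck part way through, like the ones in both stories, can still leave a recent file behind, so a check like this should sit alongside reading the backup job's own logs and doing the occasional test restore.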