#1  
Old 12-28-2007, 05:41 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873
Outage: VZ18

Earlier today we noticed that VZ18 was returning errors related to tr/glibc when attempting to log in. We then tried logging in on the console: after the username was entered, no password prompt was displayed, just another login prompt. A reboot was attempted, but GRUB did not fully load and dropped to the default GRUB console. We have booted the server with a rescue CD, but we cannot log in over SSH because it gives the error "Corrupted MAC on input". A Google search for this error suggests that it is indicative of bad RAM or a bad motherboard. At this point we are attempting to bring the server up with minimal RAM to see if that is the issue; otherwise we will swap the motherboard.

We did try mounting the partition containing VPS data, and it is not recognized as a valid filesystem. Once we correct the hardware issue we will attempt to repair the partition and/or perform a filesystem check. There are no guarantees that any of the data will be recoverable. This is not a managed VPS node, so no external backups exist for the data except those made by the VPS administrators.

Our best guess at this point is that the RAM or motherboard failed in such a way that the system remained half-operable and wrote corrupted data to the hard disks.
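
For anyone unfamiliar with the "Corrupted MAC on input" error: SSH attaches a message authentication code (MAC) to each packet, and the receiver drops the connection if the code does not match the data it received. The sketch below is only a toy illustration of that idea in Python, not SSH's actual packet format or key handling; the key, the payload, and the helper names `mac_ok` and `flip_bit` are all made up for the example. It shows that a single flipped bit, the kind of damage faulty RAM or a faulty motherboard can introduce, is enough to make the check fail.

```python
import hashlib
import hmac


def mac_ok(key: bytes, payload: bytes, tag: bytes) -> bool:
    """Return True if tag is the HMAC-SHA256 of payload under key."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison of the two tags.
    return hmac.compare_digest(expected, tag)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated corruption)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical values, for illustration only.
key = b"hypothetical-session-key"
packet = b"example payload"
tag = hmac.new(key, packet, hashlib.sha256).digest()

print(mac_ok(key, packet, tag))               # intact data: check passes
print(mac_ok(key, flip_bit(packet, 3), tag))  # one flipped bit: check fails
```

This is why the error points toward hardware rather than configuration: the data is being altered somewhere between being produced and being verified.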
__________________
Matt Ayres
  #2  
Old 12-28-2007, 07:03 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

The RAM has been swapped and the issue persists. The motherboard is now being replaced.
__________________
Matt Ayres
  #3  
Old 12-28-2007, 07:28 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

The filesystem is now being fsck'd. We are seeing many errors similar to the ones below.

Code:
Block #7 (131072) causes symlink to be too big.  CLEARED.
Inode 13377707 is too big.  Truncate? yes

Block #7 (131072) causes symlink to be too big.  CLEARED.
Inode 13377750, i_size is 321, should be 8192.  Fix? yes

Inode 13377750, i_blocks is 2, should be 4.  Fix? yes

Inode 13377754, i_size is 389, should be 8192.  Fix? yes

Inode 13377754, i_blocks is 2, should be 4.  Fix? yes

Inode 13377786, i_size is 379, should be 8192.  Fix? yes

Inode 13377786, i_blocks is 2, should be 4.  Fix? yes

Inode 13377803, i_size is 360, should be 8192.  Fix? yes

Inode 13377803, i_blocks is 2, should be 4.  Fix? yes

Inode 13377805, i_size is 286, should be 8192.  Fix? yes

I'm unsure how well this bodes for the data.
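
For anyone who wants to gauge the extent of the damage from a saved copy of fsck output like the above, here is a rough Python sketch that tallies the inode fixes. It assumes the output has been captured as text and only matches the "i_size" and "i_blocks" message shape shown in this post; the `summarize` helper and the regular expression are illustrative, not part of any fsck tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   Inode 13377750, i_size is 321, should be 8192.  Fix? yes
FIX_RE = re.compile(r"Inode (\d+), (i_size|i_blocks) is (\d+), should be (\d+)\.")


def summarize(log_text: str):
    """Count fixes per field and collect the distinct inodes affected."""
    fields = Counter()
    inodes = set()
    for line in log_text.splitlines():
        match = FIX_RE.search(line)
        if match:
            inodes.add(int(match.group(1)))
            fields[match.group(2)] += 1
    return fields, inodes


# Sample lines copied from the output above.
sample = """\
Inode 13377750, i_size is 321, should be 8192.  Fix? yes

Inode 13377750, i_blocks is 2, should be 4.  Fix? yes

Inode 13377754, i_size is 389, should be 8192.  Fix? yes

Inode 13377754, i_blocks is 2, should be 4.  Fix? yes
"""

fields, inodes = summarize(sample)
print(dict(fields), len(inodes))
```

A count like this says how many inodes were touched, not whether the file contents behind them are intact, so it is at best a rough indicator.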
__________________
Matt Ayres
  #4  
Old 12-28-2007, 10:30 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

FYI: swapping the motherboard confirmed that it was indeed the problem. A brand new motherboard is now in the server.
__________________
Matt Ayres
  #5  
Old 12-29-2007, 12:02 AM
mptalaga mptalaga is offline
Junior Member
 
Join Date: Sep 2005
Posts: 5

Is the server going through fsck now? My VPS still isn't up.
  #6  
Old 12-29-2007, 12:13 AM
boris boris is offline
Junior Member
 
Join Date: Dec 2007
Posts: 5
User data integrity / ETA for rebooting?

Yup, so when will things be back up? And what is the chance that user data will be OK?
  #7  
Old 12-29-2007, 03:20 AM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

The fsck is still running and will take a few more hours. So far it looks like user data will be recovered, although the server's / and /boot partitions are gone, which means the host node will need to be reinstalled after the fsck before the VPSs can be started.
__________________
Matt Ayres
  #8  
Old 12-29-2007, 12:09 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

Yes, it is still fscking. However, from the looks of things the data is safe. Please allow time for the fsck to complete; it is necessary for data recovery.
__________________
Matt Ayres
  #9  
Old 12-29-2007, 02:46 PM
pythoncoder pythoncoder is offline
Junior Member
 
Join Date: Oct 2005
Posts: 10

I still cannot access my server on VZ18. I can't SSH in or reach it via the control panel. My IP is 207.210.65.103.
  #10  
Old 12-29-2007, 03:25 PM
matta matta is offline
TekTonic Principal
 
Join Date: Aug 2006
Posts: 873

The server is still fsck'ing. Our HyperSpin page shows it as up because SSH is technically up (the server is running from a rescue CD). The fsck is taking a very long time and will take quite a few more hours at least, due to the massive amount of corruption that took place. I can only ask again that you please be patient. We are going through this process in hopes of salvaging any data that we can: even though the unmanaged plans do not advertise backups, I know all too well that this doesn't mean most users make their own.

If you DO have backups of your own, contact support to have a new VPS set up on different hardware, and you can restore there.
__________________
Matt Ayres
All times are GMT.