
Servers & Teamspeak Down

107 replies to this topic

#61
Unknown


    L9: Master

  • Co-Leader
  • 2,491 posts
1,899
Name known to all
  • EvolveHQ:faunknown
  • Admin:19
  • Server:Jay1
  • Alias:Unknown*
  • T-M:ET: 29-16

Donator

Servers are back :) 





#62
daredevil


    Profiler

  • Administrators
  • 25,166 posts
14,970
Guardian of the faith
  • Xbox Live:hellreturn
  • EvolveHQ:hellreturn
  • Admin:21
  • Server:None
  • Alias:hellreturn
  • T-M:1-0
Contributor


I had received an old hard drive as the replacement, which I rejected, and requested a brand new one. That delayed things by 24 hrs.

 

All servers should be up and running. I need to reconfigure the admin mods for the COD4 servers, i.e. install the WAMP stack again, so I will get the admin mod going for the COD4 servers by this weekend, but all servers should be up.

 

Massive camping will be required to attract our regulars back. I am going to work on a few more things over the weekend.

 

Sorry for the downtime; I was on a mini vacation. I will post the vacation pics. Sorry for the delay once again.

 

PS: One more server restart will be required, but I will try to schedule it for around midnight sometime next week. I need to remove the old, non-working HD.

 

Some questions were posted, and I will try to answer them:

1. We don't have RAID 1 on the US machine. If we did, the cost would add up to $25/month. Sorry, but considering the donations throughout the year, it's not possible for all servers. Our 2 Euro machines are on RAID 1, i.e. SSD + 3TB HDs, with 2 drives on each machine.

2. Hot swap is not possible. Hot swap adds extra cost, and since we don't run enterprise-level services, we never added it.

3. In the last 6 years, this is the first time the US machine has had such a long downtime, and that was only because I was on vacation; otherwise it would have been back up within 3 days or so.

4. We have a monitoring system. When a server goes down, the datacenter checks what's wrong and does a basic analysis, which is how the HD failure was detected.

5. Adding Nagios just adds overhead to the system, and since we don't run critical systems, I haven't added it to the Windows machine yet. Linux has its own goodies.
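To illustrate what RAID 1 buys you: every write goes to both disks of the mirror, so data stays readable as long as one disk survives. The sketch below is a toy in-memory model only (the `Mirror` class and its methods are hypothetical, not any real RAID tool), assuming a simple two-disk mirror.

```python
class Mirror:
    """Toy RAID 1: each block is written to every disk that is still alive."""

    def __init__(self, n_disks=2):
        self.disks = [dict() for _ in range(n_disks)]  # block number -> data
        self.failed = set()  # indices of disks that have died

    def write(self, block, data):
        # Mirroring: the same data lands on every surviving disk.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data

    def fail(self, i):
        # Simulate a drive dying; its copy becomes unreachable.
        self.failed.add(i)

    def read(self, block):
        # Any surviving copy is enough to serve the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed and block in disk:
                return disk[block]
        raise IOError("all mirrors lost for block %d" % block)


array = Mirror()
array.write(0, b"server config")
array.fail(0)  # one drive dies, like the single drive in the US machine
print(array.read(0))  # prints b'server config' (served by the second drive)
```

With a single drive (no mirror), the same failure leaves nothing to read from, which is the situation the US machine was in.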

 

Luckily, this time we were able to recover 80% of the data from the bad hard drive. I have started downloading the backup to my local hard drive; at about 100GB of data with a bandwidth cap (so that the server ping doesn't increase), it will take approx. 1-2 days.
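For a sense of scale, moving 100GB in 1-2 days works out to an average of roughly 5-9 Mbit/s. The hypothetical helper below just does that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and ignoring protocol overhead.

```python
def required_mbps(gigabytes, hours):
    """Average rate in Mbit/s needed to move `gigabytes` within `hours`.

    Decimal units: 1 GB = 10**9 bytes, 1 Mbit = 10**6 bits.
    """
    bits = gigabytes * 10**9 * 8
    seconds = hours * 3600
    return bits / seconds / 10**6


for hours in (24, 48):
    # prints 9.26 Mbit/s for 24 h, then 4.63 Mbit/s for 48 h
    print(f"100 GB in {hours} h -> {required_mbps(100, hours):.2f} Mbit/s")
```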

 

The new hard drive is SATA 6Gb/s, compared to the old one, a 6-year-old 3Gb/s drive.
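The "Gb/s" figures are SATA line rates. SATA uses 8b/10b encoding (10 bits on the wire per 8 data bits), so the usable interface bandwidth is 80% of the line rate. A small sketch of that conversion (the function name is hypothetical); the result is an upper bound for the link, not a measured drive speed.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Usable interface bandwidth in MB/s for a SATA link.

    8b/10b encoding: only 8 of every 10 line bits carry data.
    Higher-level protocol overhead is ignored.
    """
    data_bits_per_s = line_rate_gbps * 10**9 * 8 / 10
    return data_bits_per_s / 8 / 10**6  # bits -> bytes -> MB


for rate in (3, 6):
    # prints 300 MB/s for the old 3Gb/s drive, 600 MB/s for the new one
    print(f"SATA {rate} Gb/s -> {sata_payload_mb_per_s(rate):.0f} MB/s")
```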

 

PS: The server lag that players noticed on Jay2 over the last week or so was due to the failing HD. It should all be fixed now.

PS2: I have enabled ICMP replies, so if you still notice lag, fire up WinMTR and let's get the network reports rolling.
 

PS3: I might have to tweak the network drives for max performance, but that should be it.

PS4: Control panel access for leaders and FTP access: I will get it going soon.

 

That pretty much sums up the whole topic. Thank you for the donations and the love towards =F|A=. I am not a politician, so I won't say sugar-coated things; I'll just say thank you, thank you and thank you! I might go for RAID 1 on the US machine as well if I can spare some extra funds.



#63
bigbro


    L8: Grand Teacher

  • ET Member
  • 1,951 posts
881
I am just really nice
  • Admin:14
  • Server:NQ1
  • Alias:BigBro
  • T-M:4-2
I will camp as much as I can

#64
TheJuice

    L4: Apprentice

  • + Silver VIP
  • 456 posts
363
Has a spectacular aura
  • Admin:12
  • Server:Jay2
  • Alias:=F|A=TheJuicy
  • Steam ID:k1nm0f0
  • T-M:2-0

I will camp a lot on Jay2 :D Can I finally go home?



#65
daredevil


    Profiler

  • Administrators
  • 25,166 posts
14,970
Guardian of the faith
  • Xbox Live:hellreturn
  • EvolveHQ:hellreturn
  • Admin:21
  • Server:None
  • Alias:hellreturn
  • T-M:1-0
Contributor

TheJuice wrote: "I will camp a lot on Jay2 :D Can I finally go home?"

 

Yes :)



#66
Night Hunter


    Through the darkness

  • News Reporter
  • 3,820 posts
2,956
Is a bearer of wisdom
  • Xbox Live:Night Hunter322
  • EvolveHQ:Night-Hunter
  • Admin:17
  • Server:Jay1
  • Alias:Panth3r-Night Hunter
  • Steam ID:STEAM_0:0:57409882
  • T-M:ET: 18-13
    COD4: 2-0

Donator

Contributor

daredevil wrote: "Sorry for the downtime; I was on a mini vacation. I will post the vacation pics. Sorry for the delay once again."

 

Danke! No problem; waiting a few days is no problem for me at least, lol! Nice to see that most of the data was recovered.

 

I'll be waiting to see those pics soon!


Edited by Night Hunter, 06 August 2015 - 08:50 PM.


#67
Unknown


    L9: Master

  • Co-Leader
  • 2,491 posts
1,899
Name known to all
  • EvolveHQ:faunknown
  • Admin:19
  • Server:Jay1
  • Alias:Unknown*
  • T-M:ET: 29-16

Donator

daredevil wrote: "(...) We don't have RAID 1 on the US machine. If we did, the cost would add up to $25/month. Sorry, but considering the donations throughout the year, it's not possible for all servers. Our 2 Euro machines are on RAID 1, i.e. SSD + 3TB HDs, with 2 drives on each machine. (...)"

Please briefly list the hard drives we have (brand and model, without specs).



#68
daredevil


    Profiler

  • Administrators
  • 25,166 posts
14,970
Guardian of the faith
  • Xbox Live:hellreturn
  • EvolveHQ:hellreturn
  • Admin:21
  • Server:None
  • Alias:hellreturn
  • T-M:1-0
Contributor

Unknown wrote: "Please briefly list the hard drives we have (brand and model, without specs)."

 

US machine - Windows

500GB - WD - 7200 RPM.

 

Euro machine 

1. Windows machine - RAID 1 - SSD - 256GB Intel 530

2. Linux machine  - RAID 1 - 2TB Seagate

 

 

@ COD4 - Admin mod for Beginners is fixed. The HC admin mod is going to take some time, since I need to fix the B3 bot, i.e. install its MySQL DB tables again.



#69
Unknown


    L9: Master

  • Co-Leader
  • 2,491 posts
1,899
Name known to all
  • EvolveHQ:faunknown
  • Admin:19
  • Server:Jay1
  • Alias:Unknown*
  • T-M:ET: 29-16

Donator

As a technician, I have installed a lot of hard drives in digital and network video recorders. Honestly, WD hard drives, even the Black series, can run about 3-4 years nonstop; then they simply stop working, due to either bad sectors or electronic problems. Still, people buy WD Black drives because of the 5-year warranty.

Personally, I recommend Seagate hard drives. 

 

In terms of SSDs, Intel and Kingston are my favorites. Avoid Crucial SSDs; they break while still under warranty.


Edited by Unknown*, 06 August 2015 - 09:24 PM.


#70
SiD


    Captain Jerkface

  • COD Member
  • 2,856 posts
4,249
Glorious beacon of light
  • Admin:15
  • Server:None
  • T-M:5-4

Donator

Lookit dis mofo. All on a roll'n'shit. Go Dare, go!

 

Good news all around. /cheer



#71
menatwork


    L3: Novice

  • Regular User
  • 175 posts
128
On the road to fame
  • Admin:7
  • Server:Silent #1
  • Alias:w00tw00t

(quoting daredevil's post #62 in full)

 

Wow! This was the status update I was waiting to see. Thank you for your work and for the time you took. I hope you had a nice vacation (these things always happen whenever you go on vacation).

 

If you don't mind, I would like to make some suggestions (obviously I'm not trying to tell you how to do your job, merely trying to help in some way). Your bullet points, one by one:

 

1: This is understandable. Though does the budget allow for $6.99/month? If so, you might wish to consider a Kimsufi server. Kimsufi is an OVH brand; OVH is a huge provider in Europe with a few data centers (I think 4 or 5) that they build and operate. Kimsufi is their 'cheapest' brand, which they use to convince people to move to their soyoustart.com brand and eventually to their main OVH brand. Kimsufi servers are extremely cheap, the cheapest going for $6.99 with a 500GB hard disk, which should be plenty of space to make backups to. I'm also vouching for them: I've used them and am currently on their soyoustart brand. (Hint! Soyoustart also sells dedicated game servers, which might be worth looking into!)

 

If you can consider this, I would definitely recommend looking into rsync as well. rsync makes backups, but instead of 'full' backups each time they are incremental (except the first one, which is obviously full). That might save you from having to recover an entire hard disk, and transferring data from one server to another would be reasonably fast if it's ever needed.

 

2. Yes, this is also understandable. One of my other suggestions was to set up backup 'shims'. For ET, this would mean a small ET server running on another machine: people would connect to it and get the "Server is full, go to <different server>" message. In this case, the DNS entry for silent.clan-fa.com could be updated temporarily, or the IP re-routed (if the servers are at the same provider, that is). My suggestion is that instead of 'server is full', the message would say 'The server is under maintenance' or something like that, so people get 'caught and redirected' to another server. That would be more informative, and I'd guess it would help minimize player losses. Heck, it doesn't even need to be an actual ET server with a config; it merely needs to be a program mimicking ET and its redirect. Something to consider at least as part of a 'rapid response' in case shit hits the fan.

 

* In fact, the IP wouldn't even need to be re-routed: one could boot a Linux live CD and set up iptables to temporarily redirect packets to another IP if needed.

 

3. It always happens when one goes on vacation! Generally when I go on holiday, I let my servers know I'll beat 'em up if they break while I'm away (they have yet to actually listen; defiant machines, I tell ya).

 

4. I get this as well, but perhaps combined with the 1st point it wouldn't be that much overhead: the 'backup' server would run the monitoring installation, and the other servers would merely run an agent reporting their statuses. All in all, a simple I/O check could have given you an edge, and it might even have allowed you to prevent this entire issue altogether. (There's no guarantee, of course, but that's why monitoring is so important. It seems like overhead, but consider the situations I've been in: I had an internship at a data center a few years ago and worked with quite a few of their clients. We monitored the boxes for being up/down but didn't do extensive monitoring on disks and the like, since that wasn't our place; when a box went down and the owner called us to check it out, we would, if it was covered under the agreement, since otherwise we legally couldn't touch the box.) I really can't stress this enough: monitoring = information = rapid response and preemptively preventing problems. It gives a HUGE amount of insight that can also be used to uncover performance issues and the like.

 

5. Nagios is actually something that generally runs on Linux (though you can also use it to monitor Windows machines). What goodies are you referring to specifically, though? top, htop, iotop?
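The rsync suggestion in point 1 boils down to skipping files that haven't changed since the last run. The Python sketch below (a hypothetical `incremental_backup` helper, local directories only) uses the same size-plus-modification-time "quick check" that rsync applies by default; in practice you would simply run rsync itself, e.g. over SSH to the backup box.

```python
import os
import shutil


def incremental_backup(src, dst):
    """Copy files from src to dst, skipping ones that look unchanged.

    A file counts as unchanged when the backup copy already exists with
    the same size and modification time (rsync's default quick check).
    Returns the relative paths that were copied on this run.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.join(dst, rel_root)
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_root, name)
            st = os.stat(s)
            if os.path.exists(d):
                dt = os.stat(d)
                if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                    continue  # unchanged since the last run: skip it
            shutil.copy2(s, d)  # copy2 keeps the mtime for the next comparison
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied
```

The first run copies everything; later runs only copy new or modified files. Unlike rsync, this sketch copies changed files whole and never removes files from the backup that were deleted from the source.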
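The 'shim' from point 2 (a program mimicking ET instead of a real server) could look roughly like the sketch below. It assumes the Quake 3 engine's connectionless packet format that ET is built on: four 0xFF bytes followed by a text command, with the server answering a rejected client via a `print` command (which is how "Server is full." is sent). The function names are hypothetical, and whether a given ET client version shows this text on its connect screen is an untested assumption.

```python
import socket

OOB_PREFIX = b"\xff\xff\xff\xff"  # Quake 3 engine connectionless-packet marker


def build_print_packet(message):
    """Out-of-band 'print' reply carrying a text message for the client."""
    return OOB_PREFIX + b"print\n" + message.encode("ascii") + b"\n"


def serve_once(sock, message):
    """Answer one incoming connectionless packet with the maintenance note.

    Non-OOB packets are ignored. Returns the received payload for logging.
    """
    data, addr = sock.recvfrom(2048)
    if data.startswith(OOB_PREFIX):
        sock.sendto(build_print_packet(message), addr)
    return data


def run_shim(message, host="0.0.0.0", port=27960):
    # 27960 is the default ET port; point the old DNS name or IP here.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        serve_once(sock, message)
```

On a spare box this would be started with something like `run_shim("Server under maintenance, see clan-fa.com")`. It answers every connectionless packet the same way, including server-browser queries; a fuller shim would reply to those properly.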
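The 'simple I/O check' from point 4 could be as small as timing a write followed by fsync and flagging it when it gets slow, in the hope of catching the kind of slowdown players noticed on jay2 while its drive was failing. A hedged sketch: the function names and the 500 ms threshold are illustrative assumptions, and the (exit code, status line) return follows the Nagios plugin convention of 0 = OK, 1 = WARNING.

```python
import os
import tempfile
import time


def fsync_latency_ms(directory, size=1 << 20):
    """Write `size` bytes to a temp file in `directory`, fsync, return ms taken."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)  # force the data onto the disk, not just the OS cache
        return (time.perf_counter() - start) * 1000
    finally:
        os.close(fd)
        os.remove(path)


def check_disk(directory, warn_ms=500.0, samples=3):
    """Nagios-style check returning (exit_code, status_line).

    The threshold is an illustrative guess, not a tuned value.
    """
    worst = max(fsync_latency_ms(directory) for _ in range(samples))
    if worst > warn_ms:
        return 1, f"DISK WARNING - slowest 1 MiB write+fsync took {worst:.1f} ms"
    return 0, f"DISK OK - slowest 1 MiB write+fsync took {worst:.1f} ms"
```

A wrapper script would print the status line and exit with the code, which is what a Nagios agent expects from a plugin.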

 

----

 

Once again, I sincerely hope you had a great vacation, and thank you for the hard work. I hope you'll take my suggestions under advisement, and you should definitely come to Silent or HC sometime when I'm there. Let's see who's the better shooter, eh!

 

Take a break man, you deserve it.

 

Cheers!



#72
.KeLFOuTO!r.


    L6: Expert

  • Clan Friend
  • 964 posts
1,527
Name known to all
  • Admin:10
  • Server:Jay2
  • Alias:.KeLFOuTO!r.
  • T-M:ET: 2-1

Donator

My suggestion to host your servers in one of our datacenters in Europe still stands... ;)

We have smart servers at 1€/month, which is our first blast offer.

 

And we guarantee no loss of data since we provide disaster recovery warranties and redundancy to avoid such losses...



#73
Aniky


    L9: Master

  • Regular User
  • 2,491 posts
2,050
Is a bearer of wisdom

Donator

Great job, hope you had a good time on vacation. If you need anything, let me know :)



#74
Olykiller


    L5: Journeyman

  • Regular User
  • 678 posts
185
On a distinguished road
  • Server:Jay1

On Trackbase, the FA 2 server shows 0 people online, but that's not true; no idea how that happens.



#75
Vanaraud


    L10: Grand Master

  • ET Member
  • 3,416 posts
823
I am just really nice
  • Admin:15
  • Server:Jay2
  • Alias:Vael
  • T-M:ET:2-1
Contributor

The SSD is running in RAID? Good, as I remember that when SSDs die, they die for good. Recovering data from an SSD is nearly impossible (many times harder than from an HDD)...

 

And Kingston for SSDs? They switched to cheaper NAND chips in the V300 series partway through production. That says enough about their trustworthiness... I don't know about other SSD/HDD usage for 24/7 operation, but I stumbled upon this study, and Seagate wouldn't be my first choice: http://www.pcworld.c...ive-makers.html Or have there been any developments in the last 2-3 years?

Toshiba's Deskstar series, more likely, but the study covered the more expensive Deskstar series, and exactly that model with that S/N...

 

Just my 2 cents





