Posts Tagged ‘fail’

5th April
2011
written by Nick Anderson

April 1st was a fairly eventful day. First, I found that the phone system at $work has been misconfigured since it was installed. Second, I got a call from someone I had worked with a few years back: a system with no backups had suffered a RAID hiccup that tanked it a few days earlier, and they needed help with a restore.

The phone system

Several months back a new phone system was installed at my office. It has been a small nightmare ever since; little issues seem to pop up with it constantly. I’ve had reports of people being unable to make long distance calls, and I recently found out that the error they get is the phone system’s default error, “Not enough lines”. So it’s unhelpful and misleading, to say the least. We finally noticed that the errors were specific to long distance numbers within our own area code, so there seemed to be some issue with intraLATA calling.

I contacted the voice support team, provided them some test phone numbers, and got a ticket opened with the carrier. We all finally got on a conference call to troubleshoot the issue. The carrier said they weren’t getting the +1 and asked if 272 was in the local calling table, to which voice support answered “Yes”. Everyone pretty much glossed over that statement, and I had to point out that 272 is a long distance number. That’s when voice support said, “Yeah it looks like there are over 600 numbers in the local calling table.” Well, here in Lawrence, KS there is no way that there are 600 local nxx numbers (nxx is the second set of digits, the three right after the area code; new info to me). So voice support removed 272 from the local calling table, and my test number started working.
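To make the failure mode concrete, here is a minimal Python sketch of how a local calling table decides whether a number goes out with the +1. It is only an illustration, not how the phone system at $work is actually implemented: the function names, the table layout, and the 785 area code and 555 entries are all assumptions.

```python
import re

# Hypothetical local calling table: area code (NPA) -> set of local NXX values.
# The entries are placeholders, not real Lawrence, KS data.
LOCAL_NXX = {"785": {"555"}}


def split_nanp(number):
    """Split a North American number into (npa, nxx, line)."""
    digits = re.sub(r"\D", "", number)
    # Drop a leading country code "1" if the caller dialed it.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, got %r" % number)
    return digits[:3], digits[3:6], digits[6:]


def dial_string(number, local_table=LOCAL_NXX):
    """Return what would be sent to the carrier for this number."""
    npa, nxx, line = split_nanp(number)
    if nxx in local_table.get(npa, set()):
        # Treated as local: sent without the +1.
        return npa + nxx + line
    # Anything else is long distance and needs the +1.
    return "+1" + npa + nxx + line
```

With 272 wrongly listed as local, `dial_string("785-272-0100", {"785": {"272"}})` returns the number without the +1, which matches what the carrier described. Once 272 is removed from the table, the same call returns `+17852720100`.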

Naturally I asked the carrier to provide us a list of local nxx numbers so that we could fix our local calling table. I was surprised that, among all the people on the phone from the carrier, none could provide the list. In fact they all seemed to think it would be extraordinarily difficult to get. Finally they gave me a link to a form on their website that should have spit this info out for me. But of course the form was broken, and every number I tried said that no information was available in the database. When I relayed that, the carrier gave me another website to use: http://www.localcallingguide.com. Now I don’t know about you, but I don’t know how authoritative the data provided by Raymond Chow from Ontario, Canada is. Luckily he does provide an XML query interface, so I was able to figure out how to get the list I needed. I told the carrier I would use that for now, but that they needed to provide an authoritative list by EOD Monday. It’s Tuesday, and voice support says they still haven’t received the list. If the carrier can’t tell you what numbers are local, then they have no business billing you for any long distance!

What? No Backups?

My second bit of April Fools’ Day fun was the old “I have no backups, can you save me?” routine.

The company I used to work for was sold, and I helped them migrate to the new company’s equipment. They had Xen installed on one Dell server, and it ran several virtual machines for the e-commerce website. That Dell server has a PERC 6i controller and 4 drives in a RAID 5 (no spare). I know you’re probably thinking they dropped two drives, but no. They dropped just a single drive. I am pretty sure I have heard of rare data loss on those controllers with just a degraded set, and that’s what appears to have happened to them. The box went into a reboot loop: each time it came up it would kernel panic and reboot again. Did I mention there were NO backups?

We were able to bring up a recovery environment on a bootable USB key and salvage each virtual disk. After that we just had to go back through, re-assemble the virtual machines, and do a few fix-ups to get everything working again. Unfortunately this consumed the majority of my weekend, doing various imports and exports between the recovery environment and the rebuilt production server. Hopefully the close call, plus the several days of downtime before I even got involved, will be enough to convince them it’s time to have backups and to verify that they work.

I should point out one important thing: backups are no good unless they are verified. During the recovery I exported three re-assembled virtual machines. All three exported without error, but two of the three failed to import. I’m glad I tested them, because if I had not, no one would have known the backups were faulty until they went to restore.
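The same idea can be sketched in a few lines of Python: make the backup, restore it somewhere disposable, and compare what comes back against the original. This is a rough illustration only. A tar archive stands in for the virtual machine exports (it is not what was used during the recovery), and `make_backup` and `verify_backup` are hypothetical names.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large disk images do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir, archive_path):
    """Archive every regular file under source_dir, keyed by relative path."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.add(path, arcname=str(path.relative_to(source_dir)))


def verify_backup(source_dir, archive_path):
    """Restore into a scratch directory; return files missing or different."""
    source_dir = Path(source_dir)
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        for original in source_dir.rglob("*"):
            if not original.is_file():
                continue
            relative = original.relative_to(source_dir)
            restored = Path(scratch) / relative
            if not restored.is_file() or sha256_of(restored) != sha256_of(original):
                problems.append(str(relative))
    return sorted(problems)
```

An empty list from `verify_backup` means every file came back byte-for-byte. For virtual machines the real test is still an actual import, since an export can be intact as a file and still fail to import.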

29th June
2010
written by Nick Anderson

Why do applications have such horrible error messages? Non-specific errors are really no more helpful than not logging at all.

I was recently setting up autofs to mount home directories from an NFS server. The little buggers refused to work right. All I was seeing in the logfile was a notice that an attempt was made, and that it failed.

attempting to mount entry /home/nanderso
failed to mount /home/nanderso
How helpful is that? Not very, I can tell you that for sure. I already knew it wasn’t working. Next time you set up autofs, make sure to chmod 0644 /etc/auto.*
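For what it’s worth, autofs treats a map file with the execute bit set as a program map and tries to run it instead of reading it, which is my assumption about why the chmod fixed things. A small Python sketch to spot such files before changing anything (`executable_maps` is just an illustrative name):

```python
import glob
import os
import stat


def executable_maps(pattern="/etc/auto.*"):
    """Return map files matching pattern that have any execute bit set."""
    flagged = []
    for path in sorted(glob.glob(pattern)):
        mode = os.stat(path).st_mode
        any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
        if stat.S_ISREG(mode) and mode & any_exec:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    for path in executable_maps():
        print("executable map:", path)
```

Look over the output before running a blanket chmod. Some distributions ship maps such as /etc/auto.net and /etc/auto.smb as scripts that are meant to be executable.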
27th January
2010
written by Nick Anderson

Ewwww, scary isn’t it? No, it’s not Halloween, but you may have entered the twilight zone. Right, I never touch Microsoft products. Well, in actuality sometimes I do (I just don’t brag about it). Some of the development at $work uses Microsoft’s Mediaroom, and I have a “Personal Server” (great name, right?) that the developers use. I was trying to install the Mediaroom service pack yesterday and took some notes on the process. Some of my friends found it quite entertaining. I found it quite aggravating, as you might imagine. (more…)

15th January
2010
written by Nick Anderson

From time to time I have not-so-pleasant support experiences. Today I had another. (more…)

24th August
2009
written by Nick Anderson

I am a big fan of chat support. I don’t have to drain my battery waiting on hold until it’s my turn in the queue. Plus, when dealing with error messages, it’s infinitely more helpful to be able to copy/paste to the agent. Sadly, finding competent help is still an issue. Here is an excerpt from a recent chat support experience.

(8:39:41 PM) Sheldon He: The SSL will need to be re-issued, since they are IP specific.
(8:40:06 PM) Nick Anderson: ssl certificates are not tied to ip addresses
(8:40:56 PM) Sheldon He: Actually they are. They are domain and IP specific. That’s why it needs a dedicated IP.
(8:43:06 PM) Nick Anderson: ssl is just tied to a name, but you can only run one ssl cert per ip without the Server Name Indicator tls extension to openssl
(8:43:20 PM) Sheldon He: Well, according to our SSLs in the past, the SSL needs to be re-issued when you move from a Shared to a Reseller.
(8:45:40 PM) Sheldon He: I’m unable to change the settings here, this will require an admin. Please email them at [email protected] and we’ll be able to help you with this issue.

Luckily his final answer was correct and whoever answers the support emails was able to complete the ssl cert migration.
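To back up the point about names versus IPs: a certificate lists the host names it is valid for, and Server Name Indication (SNI) is how a client tells the server which name it wants, so that several certificates can share one IP. Here is a minimal Python sketch using the standard ssl module; the function name is mine, and example.com below is only a placeholder host.

```python
import socket
import ssl


def certificate_dns_names(host, port=443, timeout=5.0):
    """Connect with SNI and return the DNS names listed in the certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname is what gets sent in the SNI extension.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
```

Calling `certificate_dns_names("example.com")` needs network access and should return the host names the certificate covers. The server’s IP address only matters for making the connection.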

On a side note, it’s things like this that make me tell people not to use shared hosting accounts. Get yourself a VPS or a dedicated server if you want to do anything worthwhile.

