Posts Tagged ‘linux’
I’m curious what anyone who reads this blog thinks. My first reaction when someone mentions Ubuntu server is to grab the nearest trout and start slapping. Don’t get me wrong, I like Ubuntu. It’s very nice on a workstation, and suitable for my wife, mother, aunt, etc. But do you really think it’s good enough for prime time in the data center? According to a server survey conducted by the Ubuntu marketing team, almost 80% of users see Ubuntu as ready for mission critical use.
I was quite shocked when I first saw that. Then I dug a bit deeper. Turns out they announced the survey on a “variety of Ubuntu forums, websites and other channels”. They do note that “consequently, the results are focused on Ubuntu users and not general Linux users”. I am not saying that I would _never_ run Ubuntu as a server. I just don’t think it’s proven itself yet. Perhaps in another LTS release or two I may consider it. I also know there is a chicken and egg paradox there. It of course won’t prove itself until people start putting it to the test.
I think this also brings up another interesting point. The survey noted that “95% of respondents considered hardware support as important to very important”. Included in the survey was a chart that listed Ubuntu users’ hardware preferences. Not surprisingly, Tower/Desktop PC topped the list, but Dell servers, HP/Compaq x86 servers, IBM x86 servers, and SUN x86_64 servers were also on the list. At the time of the report “Ubuntu Server Edition did not come pre-installed on machines from any major provider”. What happened to the desire for “Enterprise Support”?
I have long been greatly annoyed by “Enterprise Support”. More often than not it just gets in my way and to stay supported I have to do things in a less than optimal way. It would not hurt my feelings if the demand for “Enterprise Support” decreased and the demand for “knowledgeable individuals” increased. I would love to take the money spent on “Enterprise Support” and channel it back into training. Unfortunately I just don’t see that happening.
So what do you think? Do you think Ubuntu is ready for production? What does the lack of vendor support say about Ubuntu or the direction of “Enterprise Support” needs? Are you running Ubuntu in production (for anything serious, not your home server …)? Have any good or bad Ubuntu server stories to share?
Monitoring and analyzing performance is an important task for any sysadmin. Disk I/O bottlenecks can bring applications to a crawl. What are IOPS? Should I use SATA, SAS, or FC? How many spindles do I need? What RAID level should I use? Is my system read or write heavy? These are common questions for anyone embarking on a disk I/O analysis quest. Obligatory disclaimer: I do not consider myself an expert in storage or anything for that matter. This is just how I have done I/O analysis in the past. I welcome additions and corrections. I believe it’s also important to note that this analysis is geared more toward random operations than sequential read/write workloads.
Let’s start at the very beginning … a very good place to start. Hey, it worked for Julie Andrews … So what are IOPS? They are input/output (I/O) operations per second. It’s good to note that IOPS are also referred to as transfers per second (tps). IOPS are important for applications that require frequent access to disk. Databases, version control systems, and mail stores all come to mind.
Great, so now that I know what IOPS are, how do I calculate them? IOPS are a function of rotational speed (aka spindle speed), latency and seek time. The equation is pretty simple: IOPS = 1/(seek + latency), with both times in seconds. Scott Lowe has a good example on his techrepublic.com blog.
Sample drive:
- Model: Western Digital VelociRaptor 2.5″ SATA hard drive
- Rotational speed: 10,000 RPM
- Average latency: 3 ms (0.003 seconds)
- Average seek time: 4.2 ms (r) / 4.7 ms (w), averaging 4.45 ms (rounded to 0.0045 seconds)
Calculated IOPS for this disk: 1/(0.003 + 0.0045) = about 133 IOPS
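The calculation above can be sketched in a few lines of Python. This is only an illustration; the function names are my own, and the latency helper assumes the usual rule of thumb that average rotational latency is the time for half a revolution.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in ms, assumed to be half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def disk_iops(seek_ms, latency_ms):
    """Estimate single-disk IOPS as 1 / (seek + latency), with times in ms."""
    return 1000.0 / (seek_ms + latency_ms)


if __name__ == "__main__":
    latency = avg_rotational_latency_ms(10_000)  # 10k RPM sample drive
    seek = (4.2 + 4.7) / 2                       # average of read and write seek
    print(round(disk_iops(seek, latency)))
```

With the unrounded 4.45 ms seek time this comes out at roughly 134 IOPS; rounding the seek time up to 4.5 ms gives the 133 quoted above. Either way it lands in the same range.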
It’s great to know how to calculate a disk’s IOPS, but for the most part you can get by with commonly accepted averages. Sources vary, of course, but this is what I have seen:
| Rotational Speed (rpm) | IOPS |
| 5400 | 50-80 |
| 7200 | 75-100 |
| 10k | 125-150 |
| 15k | 175-210 |
Should I use SATA, SAS or FC? That’s a loaded question. As with most things, the answer is “it depends”. I don’t want to get into the SATA vs SAS debate. You can do your own research and make your own decisions based on your needs, but I will point out a few things.
- SATA drives only go up to 10k RPM (at the time of this writing)
- SATA is only half duplex (from Tomak in comments)
- Differences in reliability (MTBF, BER); see the interesting article on Real Life Raid Reliability
- Differences in Native Command Queuing (NCQ) and Command Tag Queuing (CTQ)
These factors are key considerations when choosing what kind of drives to use.
What RAID level should I use? Now that you know what IOPS are, how to calculate them, and what kind of drives to use, the next logical question is commonly RAID 5 vs RAID 10. There is a difference in reliability, especially as the number of drives in your raid set increases, but that is outside the scope of this post.
| Raid Level | Write Operations | Read Operations | Notes |
| 0 | 1 | 1 | Write/Read: high throughput, low CPU utilization, no redundancy. |
| 1 | 2 | 1 | Write: only as fast as a single drive. Read: two read schemes are available: read data from both drives, or take data from the drive that returns it first. One gives higher throughput, the other faster seek times. |
| 5 | 4 | 1 | Write: Read-Modify-Write requires two reads and two writes per write request. Lower throughput and higher CPU usage if the HBA doesn’t have a dedicated IO processor. Read: high throughput and low CPU utilization normally; in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on. |
| 6 | 6 | 1 | Write: Read-Modify-Write requires three reads and three writes per write request. Do not use a software implementation if it is available. Read: high throughput and low CPU utilization normally; in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on. |
As you can see in the table above, writes are where you take your performance hit. Now that the penalty or RAID factor is known for different raid levels we can get a good estimate of the theoretical maximum IOPS for a RAID set (excluding caching of course). To do this you take the product of the number of disks and IOPS per disk divided by the sum of the %read workload and the product of the raid factor (see write operations column) and %write workload.
Here is the equation:
$$\mathrm{IOPS}_{max} = \frac{d \times dIOPS}{\%r + (F \times \%w)}$$
d = number of disks
dIOPS = IOPS per disk
%r = read share of the workload, as a fraction (e.g. 0.93 for 93%)
%w = write share of the workload, as a fraction (e.g. 0.07 for 7%)
F = raid factor (write operations column)
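As a rough sketch, the formula described above translates directly into a small Python function. The function and parameter names are mine, and the numbers in the example are hypothetical, chosen only to show the write penalty at work.

```python
def raid_max_iops(disks, iops_per_disk, read_fraction, write_fraction, raid_factor):
    """Theoretical maximum IOPS for a RAID set, ignoring cache.

    read_fraction and write_fraction are shares of the workload (0.0 to 1.0).
    raid_factor is the write penalty from the "Write Operations" column.
    """
    raw_iops = disks * iops_per_disk
    return raw_iops / (read_fraction + raid_factor * write_fraction)


if __name__ == "__main__":
    # Hypothetical example: 8 disks at 100 IOPS each, 70% reads, RAID 5 (factor 4).
    print(round(raid_max_iops(8, 100, 0.70, 0.30, 4), 1))
```

With a read-only workload the result is simply disks times IOPS per disk; the more writes in the mix, the more the raid factor eats into that number.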
Wait a second, where am I supposed to get %read and %write from?
You need to examine your workload. I usually turn to my favorite statistics collector, sysstat. sar -d -p will report activity for each block device and pretty print the device name. I am assuming you already know what block device you are looking to analyze, but if you’re looking for the busiest device just look in the tps column. The rd_sec/s and wr_sec/s columns display the number of sectors read from and written to the device per second. To get the read percentage, divide rd_sec/s by the sum of rd_sec/s and wr_sec/s; the write percentage is wr_sec/s divided by the same sum.
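The equations:
$$\%r = \frac{\mathrm{rd\_sec/s}}{\mathrm{rd\_sec/s} + \mathrm{wr\_sec/s}}$$
$$\%w = \frac{\mathrm{wr\_sec/s}}{\mathrm{rd\_sec/s} + \mathrm{wr\_sec/s}}$$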
An example from my workstation:
Average for sdb rd_sec/s = 1150.80
Average for sdb wr_sec/s = 1166.53
As you can see, my workstation read/write workload is pretty balanced at 49.7% read and 50.3% write. Compare that to a cvs server (don’t get me started on how bad cvs is, it’s just something I have to deal with).
Average for sdb rd_sec/s = 27.78k
Average for sdb wr_sec/s = 2.07k
This server’s workload is extremely read heavy: about 93% read and 7% write. OK, time to analyze the performance.
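For anyone who wants to check the arithmetic, here is a minimal Python sketch of the read/write split using the averages quoted above. The function name is my own, and I am reading 27.78k and 2.07k as 27780 and 2070 sectors per second.

```python
def read_write_split(rd_sec_per_s, wr_sec_per_s):
    """Return (read_fraction, write_fraction) from sar's rd_sec/s and wr_sec/s."""
    total = rd_sec_per_s + wr_sec_per_s
    if total == 0:
        return 0.0, 0.0  # idle device, nothing to split
    return rd_sec_per_s / total, wr_sec_per_s / total


workstation = read_write_split(1150.80, 1166.53)
cvs_server = read_write_split(27780, 2070)

if __name__ == "__main__":
    for name, (reads, writes) in (("workstation", workstation), ("cvs", cvs_server)):
        print(f"{name}: {reads:.1%} read, {writes:.1%} write")
```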
In and of itself, a heavy read workload is not a problem. My problem is user complaints of slowness. I note (again from sysstat collected metrics) that the tps, or average IOPS, on this device is about 574. Again, that’s not an issue in and of itself; we need to know what we can expect from its subsystem. This device happens to be SAN based storage. The raid set it’s on is comprised of four 10kRPM FC drives in a raid 10. Remember from the table above that IOPS for a 10kRPM drive are in the 125-150ish range. We need to calculate the expected IOPS from that raid set using the IOPS equation above, our measured workloads for read/write, the number of disks, and the raid level (10 and 1 are treated the same).
Using the high end of the scale for 10kRPM IOPS per drive (150) results in a maximum theoretical IOPS of 561.79. That’s pretty close to what I am observing (remember cache is not taken into account). So based on these numbers it looks like my storage subsystem is saturated. I guess I better add some spindles. Unfortunately there is no historical data for this system, so I have no way of knowing how many tps I need to aim for.
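If a target were known, the same equation could be turned around to estimate a spindle count. The sketch below is only that: the function name is mine, and the 800 IOPS target is a made-up number, since as noted there is no historical data to pick a real one.

```python
import math


def disks_needed(target_iops, iops_per_disk, read_fraction, write_fraction, raid_factor):
    """Invert the max-IOPS equation to estimate the disks needed for a target.

    Rounds up to a whole disk. RAID 10 also needs an even count, which this
    sketch leaves to the reader.
    """
    workload_cost = read_fraction + raid_factor * write_fraction
    return math.ceil(target_iops * workload_cost / iops_per_disk)


if __name__ == "__main__":
    # Hypothetical target of 800 IOPS on 10k RPM drives (150 IOPS each),
    # with the 93% read / 7% write mix measured on the cvs server, RAID 10.
    print(disks_needed(800, 150, 0.93, 0.07, 2))
```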
Don’t get stuck where I am, having to guess how many spindles need to be added to reduce the pain. Start recording your trends now! Even better, once you start collecting your statistical information, go ahead and set an alert for 65% or 70% utilization of theoretical max IOPS for an extended period, as well as increasingly bothersome alerts going up from there. It’s never good to have to react to performance issues; it’s always better to be proactive. There was absolutely nothing wrong with the sizing of this example raid set 2-4 years ago. Had it been under monitoring the entire time with proper thresholds set, a proper plan could have been made, and spindles could have been added before causing users any pain.
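The alerting idea can be sketched roughly like this in Python. The function name and the 85% critical threshold are my own assumptions; only the 65% starting point comes from the advice above. Using the lowest sample in the window is one simple way to approximate “for an extended period”, so a single spike does not trigger an alert.

```python
def iops_alert_level(tps_samples, theoretical_max, warn=0.65, crit=0.85):
    """Classify sustained utilization of theoretical max IOPS.

    tps_samples is a window of recent tps readings. The lowest reading is
    used so that only sustained load raises an alert.
    """
    if not tps_samples:
        return "ok"
    sustained = min(tps_samples) / theoretical_max
    if sustained >= crit:
        return "critical"
    if sustained >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # The cvs server: about 574 tps observed against a 561.79 theoretical max.
    print(iops_alert_level([574, 560, 580], 561.79))
```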
If you want to use sysstat like I did, you might find this Nagios plug-in that I wrote helpful: check_sar_perf. I use it with Zenoss, but it could be tied into any NMS that records the performance data from a Nagios plug-in.
Go forth, collect, analyze and plan so your users aren’t calling you with issues.
- http://wiki.horde.org/HardwareRequirements
- http://don.blogs.smugmug.com/2007/10/08/hdd-iops-limiting-factor-seek-or-rpm/
- http://blogs.techrepublic.com.com/datacenter/?p=2182
- http://www.sqlservercentral.com/blogs/sqlmanofmystery/archive/2009/12/07/fundamentals-of-storage-systems-raid-an-introduction.aspx
- http://blog.aarondelp.com/2009/10/its-now-all-about-iops.html
- http://adamstechblog.com/2009/02/10/how-to-calculate-iops-ios-per-second/
- http://www.performancewiki.com/diskio-tuning.html
- http://vmtoday.com/2009/12/storage-basics-part-i-intro/
- http://vmtoday.com/2009/12/storage-basics-part-ii-iops/
- http://vmtoday.com/2010/01/storage-basics-part-iii-raid/
- http://vmtoday.com/2010/01/storage-basics-part-iv-interface/
- http://vmtoday.com/2010/03/storage-basics-part-v-controllers-cache-and-coalescing/
- http://vmtoday.com/2010/04/storage-basics-part-vi-storage-workload-characterization/
- http://www.codecogs.com/components/equationeditor/equationeditor.php
Recently a developer came to me and said they were starting to see failed builds, apparently due to open file handle limitations on the build server. In case you’re not aware, by default there are limits on users to ensure they don’t hog all of a system’s resources. Sometimes these limits need to be adjusted.
In my case the “bamboo” user needed more than 1024 open files on occasion. I determined my system had a maximum number of open files of 1572928.
$ cat /proc/sys/fs/file-max
1572928
And my bamboo user has a limit of 1024 based on the output of the ulimit -a command run as the bamboo user.
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
It seems to be an intermittent problem, so I’m pretty sure just doubling the number of open files the bamboo user can have will resolve the issue. To make the change you just need to edit /etc/security/limits.conf. I added new hard and soft limits with these lines.
bamboo hard nofile 2048
bamboo soft nofile 2048
Now let’s just make sure the new limits are in place. No need to reboot; just log in as the bamboo user again and run “ulimit -a”.
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2048
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
As you can see open files is now 2048.
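If you would rather check the limits from inside a process (a build job, for instance) than from a login shell, Python’s standard resource module exposes the same values. A minimal sketch; the helper names are my own, and the file-max path is the Linux one shown earlier.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open file maximum, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft:", soft, "hard:", hard, "system max:", system_file_max())
```

Run as the bamboo user after the change, this should report the new soft and hard limits, assuming the process was started from a session that picked up limits.conf.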
I’m not a fan of OSX and I try to avoid it with the same vigor that I avoid Windows. But I recently needed to have a Linux NFS export mounted on an OSX server. A simple mount server:/export /mymountpoint didn’t work and returned “Operation not permitted”. After a bit of digging I found the solution.
I needed to instruct the client to use a privileged port by adding the “-P” option.
mount -o -P nfssrv:/export /mymount
Now to make it persistent. Of course it’s not as simple as shoving it in /etc/fstab and running “mount -a”. No, OSX has to be difficult. It turned out lookupd got in the way. To fix it I did the following after configuring my fstab.
mkdir /etc/lookupd
echo "LookupOrder Cache NI FF" >> /etc/lookupd/mounts
kill -HUP `cat /var/run/lookupd.pid`
mount -a
Yay, that should mount your NFS export and keep it persistent.
Don’t even start with me about how telnet is horrid. It was outside of my control, but I recently had issues trying to enable telnet on a server. Typically it’s pretty straightforward.
- yum install telnet-server
- chkconfig telnet on
- chkconfig xinetd on
- service xinetd start
Unfortunately for me this was not working. Every time I tried to telnet to the host after enabling it I would get an error message.
telnet host
Trying 203.0.113.10...
Connected to host (203.0.113.10).
Escape character is '^]'.
getaddrinfo: localhost Name or service not known
Connection closed by foreign host.
I tried everything I could think of: disabling selinux, ensuring localhost was in /etc/hosts, connecting to the IP instead of the hostname. Nothing worked. All of my searching just turned up the obligatory “Don’t use telnet, use ssh”.
While that is generally good advice, it’s not very helpful in the event you are restricted to using telnet. Obviously it is something related to name resolution. From both sides the fqdn was resolvable. Then it dawned on me. This environment also has the standard of not using the fqdn as the hostname set in /etc/sysconfig/network. I had not ensured that the short hostname was resolvable. I resolved the error by adding the hostname to /etc/hosts, but adding a default search domain in /etc/resolv.conf would work just as well.
This leads me back to the error message. Really it had nothing to do with “localhost” or “127.0.0.1”. Had it said “host: Name or service not known” I would have chased down the issue much sooner.
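A quick way to catch this kind of problem before telnet does is to ask the system resolver directly. The sketch below uses Python’s standard socket module; the helper names are my own, and it assumes Python goes through the same resolver configuration (/etc/hosts, /etc/resolv.conf) that the telnet daemon uses.

```python
import socket


def resolves(name):
    """Return True if the system resolver can look up the given name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def check_names(names):
    """Map each name to whether it resolves."""
    return {name: resolves(name) for name in names}
```

Running check_names(["localhost", socket.gethostname()]) on the server would show at a glance whether the short hostname is the one that fails to resolve.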
