Thoughts of a technocrat as letter sequences: December 2010

In a previous article, I described a KVM installation of the new RHEL 6 distro. This was a good (fast and inexpensive) way for all of you to give it a go. Now that RHEL 6 is running, let's see how it differs under the hood from its predecessor, RHEL 5. In this article, I draw conclusions after performing a physical bare-metal installation of RHEL 6 on a Dell PowerEdge 1950 server. My next article describes some production XFS considerations.

The very first impression I got on booting the system was a familiar, conservative GNOME 2.28 theme. It works, but it shows its age when compared to the most recent GNOME 2.32 of its Fedora 14 cousin. The same can be said for the KDE environment: RHEL 6 is at KDE 4.3.0, miles away from KDE 4.5.3. Frankly, if you really want the latest and greatest on your desktop/laptop, head for the Fedora distro and leave RHEL for your server silicon.

Nevertheless, RHEL 6 does have some desktop improvements over RHEL 5. In summary, it includes support for Kernel Mode Setting (KMS), DRI2 and OpenGL 3D. The nouveau driver is there, which means that you are more likely to get a decent display (my Dell Precision 490 and E6410s, all wearing Nvidia gear, did, although your mileage might vary depending on your hardware).

On the scripting front, Perl has moved to version 5.10.1 (RHEL 5.5 is still on the archaic 5.8.8) and Python is on 2.6.5 (RHEL 5 was on 2.4.3). For Python fans, there is nothing in the official RHEL repository for Python 3. However, I would expect that in the near future the EPEL repository and/or the IUS Community Project will package something suitable. An improvement in standard utilities like grep is also noticeable (RHEL 6 is on version 2.6.3, whereas RHEL 5 used the ancient 2.5.1). It beats me why a major commercial distribution (RHEL 5) still uses 2.5.1. Grep 2.5.1 is so buggy and untrustworthy that I am very surprised nobody has screamed at the RedHat release engineers to push an upgrade for something as vital as grep! I understand it is good to be conservative, but not to that extent!

At the heart of the operating system runs a 2.6.32 kernel (2.6.32-71.7.1.el6.x86_64 is the RHEL 6 version at the time I am writing this article), with back-ported features and fixes from the 2.6.33 and 2.6.34 vanilla/mainline kernel sources. There is a big change in scalability metrics over the 2.6.18-based kernels in a number of different areas, as shown in the table below.

The big increase in the number of supported CPUs shows a shift of the RedHat development team towards NUMA architectures. With multi-core processors raising the limit every year, 192-CPU systems are suddenly within the range of many mid-range shops, and the trend will continue in the future. The same can be said for "fat RAM" nodes, a popular choice for speedy RDBMS and scientific number-crunching platforms. Ticket spinlocks, Read-Copy-Update (RCU) and Transparent Huge Pages are some of the techniques the RHEL kernel employs to drive CPU and RAM scalability to these levels.

In my honest opinion, the greatest problem in terms of RHEL scalability lies in the area of filesystems. The inadequate 16 TB limit of the officially supported ext3 in RHEL 5 is now raised to 100 TB in RHEL 6, by providing XFS as an officially supported option. The ext4 of RHEL 6 (and RHEL 5.6) is still limited to 16 TB, and the only things it provides are slightly better performance and much improved fsck times.

I was always curious to do a head-to-head comparison between ext3 and ext4 on the same OS platform, so I fired up a RHEL 6 test node on a Dell PowerEdge 1950 with 8 cores and 16 GB of RAM and set up a 12 TB RAID 50 partition (21 x 1 TB SATA disks) on a hardware RAID controller. Using iozone, I provide the comparative results of writing a 32 GB file (2 x the size of RAM) on that 12 TB partition.
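For readers who want to reproduce a similar write test, here is a minimal Python sketch that assembles an iozone command line for a file twice the size of RAM. The exact iozone options used for the article are not stated, so the record size, the flush flag and the target path below are my assumptions, not the original invocation.

```python
def iozone_write_command(ram_gb, target_file, record_size="1m"):
    """Build an iozone command that writes a file twice the size of RAM.

    The record size and the use of -e are assumptions made for this sketch.
    """
    file_size_gb = 2 * ram_gb  # larger than RAM, so the page cache cannot hold it all
    return [
        "iozone",
        "-i", "0",                 # test 0: sequential write and rewrite
        "-s", f"{file_size_gb}g",  # size of the test file
        "-r", record_size,         # record size used for each I/O call
        "-e",                      # include flush (fsync) time in the results
        "-f", target_file,         # test file on the filesystem under test
    ]


# Hypothetical mount point for the RAID 50 partition.
command = iozone_write_command(16, "/mnt/raid50/iozone.tmp")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` once for the ext3 filesystem and once for ext4, keeping every other parameter identical.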

The results indicate that ext4 is not such a big improvement over ext3. However, when I mangle some inodes and bring the fs into an inconsistent state to simulate an fs failure, fsck-ing the ext3 filesystem takes me 2.5 hours, whereas doing the same on ext4 restores the same filesystem in just under 14 minutes! That's a big difference, so for all of you that run multi-TB ext-based filesystems in production environments, ext4 is a big plus for that reason alone.

XFS is a much more scalable filesystem than ext4. The latter is clearly a marginally better performing filesystem than ext3 and is a temporary step on the roadmap towards a more scalable next-generation solution. My view is that RedHat is moving in the right direction, but it needs a lot more than the 100 TB it currently supports on XFS to get a good slice of the Enterprise storage market. Just think of all the people that run their NetApp/Panasas solutions, or all the folk that really love their ZFS on Solaris. The market is there, RedHat has the expertise and it should address the problem really soon. My comment also applies to the clustered filesystem area, where GFS2 is traditionally employed by RedHat. Apart from minor differences in the RHEL 6 GFS2 bundling, I read of no substantial improvements in that area.

Moving on to a more realistic workload comparison between RHEL 5 on ext3 and RHEL 6 on ext4, I could not think of a better example than a heavy-duty RDBMS task. Thus, I utilized a MySQL 5.1.50 server on a RHEL 5.5 box (2.6.18-194.26.1.el5) to perform a heavy-duty task on the 12 TB partition (booting from two sets of identical system disks, in order to switch between RHEL 5 and 6). The workload comes from the field of bioinformatics. The task had 3 steps:

Step 1: Unzipping the entire EMBL release 105 nucleotide database. The release consists of 105 .gz compressed files of total size approximating 138 GB. A small Perl script that forks 8 instances (1 for each available processing core) unzips each one of the 105 files in sequence, without saturating the SATA I/O controller. The end result is just under 1 TB of disk space with all the files uncompressed.
Step 2: A second Perl parsing script utilizes the DBI interface and the DBD::mysql driver to load all the entries into an InnoDB MySQL table. That's right, the whole lot of 194 million entries goes into a single InnoDB table.
Step 3: A text index is created on certain columns of the InnoDB MySQL table that concern the feature table of the sequence database. This is mainly a processor/memory intensive task.
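To illustrate the idea behind Step 1, here is a minimal sketch of a fixed pool of 8 workers that decompresses a directory of .gz files. The original was a Perl script that I do not reproduce here. This is a Python stand-in, and the function names and the directory argument are hypothetical.

```python
import gzip
import shutil
from multiprocessing import Pool
from pathlib import Path


def gunzip(path):
    """Decompress one .gz file next to the original and return the new path."""
    src = Path(path)
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in 1 MiB chunks so memory use stays flat for multi-GB files.
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return str(dst)


def gunzip_all(directory, workers=8):
    """Decompress every .gz file in a directory with a fixed pool of workers."""
    files = sorted(str(p) for p in Path(directory).glob("*.gz"))
    with Pool(processes=workers) as pool:
        # Each worker takes the next file as soon as it finishes the previous one.
        return pool.map(gunzip, files)
```

Calling `gunzip_all("/path/to/release")` from a script (under an `if __name__ == "__main__":` guard) keeps at most 8 decompressions in flight, one per core, which is what stops the job from flooding the disk controller with requests.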

The above tests were performed by using the deadline I/O scheduler:

echo "deadline" > /sys/block/[your_block-disk_identifier_here]/queue/scheduler

This ensures an overall low latency for I/O tasks which can yield notable performance differences over the default cfq scheduler. The results of this benchmark are given below and they show substantial improvements in RHEL 6.
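As an aside on the scheduler setting above: the same sysfs file also reports which scheduler is currently active, by listing the available names with the active one in square brackets. The sketch below parses that format and switches the scheduler. The helper names and the device name `sda` are hypothetical, and writing to the file requires root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Return (active, available) from a line like 'noop anticipatory deadline [cfq]'."""
    names = text.split()
    # The kernel marks the active scheduler with square brackets.
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    available = [n.strip("[]") for n in names]
    return active, available


def scheduler_file(device):
    """Path of the sysfs scheduler file for a block device such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Switch the I/O scheduler of a device; must be run as root."""
    path = scheduler_file(device)
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} is not offered by this kernel: {available}")
    path.write_text(name)
```

For example, `set_scheduler("sda", "deadline")` has the same effect as the echo command, with a check that the kernel actually offers the requested scheduler. The setting does not survive a reboot.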

In total, RHEL 5 took 36.3 hours to complete the workload. In contrast, RHEL 6 reduced the time of completion to 29.1 hours, a saving of 7.2 hours (roughly 20%), which is a staggering difference. The greatest improvement can be attributed to the last step (the MySQL index creation stage). Step 2 comes next, mainly due to the difference in Perl versions and drivers (5.8.8 in RHEL 5 versus 5.10.1 in RHEL 6).

An additional area I normally look into when I benchmark systems is that of network device and protocol performance. In my production environment, I have to use NFS heavily. This is a mixed blessing: NFS was, is and will be a grey area when it comes to tuning and performance, but on the other hand it is a cheap and easy way to distribute large data sets to a moderate number of nodes. RHEL 6 introduces two major differences in this area:

Default use of NFS v4: Amongst RHEL 6 boxes, the default NFS protocol version is v4. RHEL 5 used v3 by default. I am not going to write an essay on NFS v4 features; I am sure most of you know your NFS or can look it up yourselves. The great difference here is that you now have solid IPv6 and encryption support for NFS v4, as well as the ability to combine it with FS-Cache. This latter feature can increase performance in some data distribution scenarios.
Kernel networking enhancements: These include improved drivers for FCoE (Fibre Channel over Ethernet) and RDMA support over 10G Ethernet (RoCE) and InfiniBand for low-latency networking.

In order to put some of these features to the test, I set up a small computer cluster of three RHEL 6 clients (8 cores each) to talk to a single RHEL 6 NFS server over NFS v4. The interconnect consisted of 10G point-to-point links, using Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection adapters. An FS-Cache setup was employed for read-only access to 551 files totalling 105 GB. The files were data sets for the NCBI Blast database utility, a popular tool for life scientists. Under RHEL 5, I used to mount them read-only over NFS v3 on the client nodes. Lots of sysadmins rsync those files to local client disks for better performance. This is also my normal practice. However, here I want to test the two different NFS version scenarios: NFS v3 without FS-Cache in RHEL 5 versus NFS v4 with FS-Cache in RHEL 6.
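The two client-side scenarios differ mainly in their mount options. The sketch below builds both mount commands. The `fsc` option is what asks the NFS client to use FS-Cache, and it only has an effect when the cachefilesd daemon is running on the client. The server name, export path and mount point are hypothetical, and the exact options used in my test are not listed here, so treat this as an approximation.

```python
def nfs_mount_command(server, export, mountpoint, version=4, fscache=True):
    """Build a read-only NFS mount command, optionally backed by FS-Cache."""
    options = ["ro", f"nfsvers={version}"]
    if fscache:
        # 'fsc' enables FS-Cache for this mount; cachefilesd must be running.
        options.append("fsc")
    return [
        "mount", "-t", "nfs",
        "-o", ",".join(options),
        f"{server}:{export}",
        mountpoint,
    ]


# RHEL 5 style scenario: NFS v3, no local cache.
print(" ".join(nfs_mount_command("nfsserver", "/export/blast", "/mnt/blast",
                                 version=3, fscache=False)))
# RHEL 6 style scenario: NFS v4 with FS-Cache.
print(" ".join(nfs_mount_command("nfsserver", "/export/blast", "/mnt/blast")))
```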

The results were a bit strange. At the beginning, when I had only one or two clients accessing the NFS-imported data in RHEL 6, things were substantially slower with FS-Cache than in the RHEL 5 scenario. However, after rebooting to clear any previous cache effects, as soon as I started hitting the NFS-imported data with all three clients and launched 5 NCBI Blast jobs on each of them (3 x 5 = 15 jobs) simultaneously, things started looking better on RHEL 6. This tells me exactly what the RHEL 6 manual states: FS-Cache and NFS v4 are not always performance winners over simple NFS v3 if the client-server traffic is low. However, when you start loading the interconnect, FS-Cache is a good compromise between client reads and network overhead (all these context switches caused by your card).

I am revisiting this test case, as I want to make sure that I get things right before I post numbers, and I will update the blog when I am certain.

That's it for this week. The next article will look into virtualization (KVM) performance, the other big area where RHEL 6 is supposed to give us great benefits.

I think the world of media has been taken hostage by an attention seeker...or maybe the media empires have decided to use an attention seeker to spice up their news. The second scenario seems more probable, even if it seems to be a cynical view.

I am NOT against Wikileaks. I find it useful as a third source of evidence-based news records, which can help to cross-reference bits of information when in doubt about news coming from other sources, despite the fact that sometimes I (and many others) question the source ethics and the authenticity of its published records. This seems to be a general problem with the Internet. It contains a lot of information, but not all of it is credible or useful. The recent diplomatic cable leaks may serve as evidence of various foreign policy misconducts, but if I raise the question "Does a reasonably educated and well informed person need Wikileaks to know the deal behind the US foreign policy, to infer the ties between China and North Korea?", the answer I expect is a definite negative. Thus, I question the noise made about the Wikileaks "revelations".

In fact, I have financially supported Wikileaks (before their accounts were shut down). What I am really against is the behaviour of its founder, for various reasons.

Julian Assange has a background in ethical hacking. There is no universal definition of the term "ethical hacker". I tend to think of it as a reference to a person with advanced technology skills who puts them to good use to reveal the truth or inform people about potentially harmful situations, seeking no financial gain or other rewards from any affected parties. In the information security world, we have the classical argument of software vulnerability disclosures. Some people argue that all vulnerabilities should be made public, whereas others disagree and are of the view that vulnerabilities should be disclosed only to a limited number of parties, on a strict need-to-know-to-fix basis. Personally, I support the second view, and I have never disclosed software vulnerabilities in public.

If I draw a parallel line to the software vulnerability disclosure issue, it would run along the Wikileaks disclosure of vital US sites around the world. I strongly dislike this action, even if I am not a US citizen (or US Government employee). The reason is simple and it has nothing to do with the breaching of any National Security policy. After all, if someone is really determined to do something nasty, surely they will find the resources to do harm without the Wikileaks disclosures. However, any reasonable person understands that revealing strategic infrastructure locations (some of which are not only US based, but they serve collectively many nations) is an act that adds very little to the truth. It is simply a reckless action, bound to also draw the attention of less serious folk with malicious intent.

An equally noteworthy issue with the Wikileaks case is that of the Denial of Service (DoS) attacks on their domain. I am not sure whether the slowness experienced on their domain was due to persistent DoS attacks or simply to the strong demand in anticipation of the forthcoming document leaks (probably both), but this is a strong lesson in distributing important information in a scalable and secure way. This seems to be a job for Peer to Peer (P2P) protocols and not for a number of static HTTP/FTP servers mirrored around the world, which is the usual approach. Torrents distributing the content were of course active from day 1.

My final comment concerns the good old face value of the information origin. I will use an example that comes from the Linux/Unix sysadmin world. Every security-conscious sysadmin (and user) that uses third-party binary package repositories makes sure to validate the packages via either a hash-based checksum (MD5, SHA) or a public repository key prior to installing them on computer systems. These mechanisms make it more difficult for someone to maliciously alter the contents of a binary package and make you install something nasty on your computer. However, they are not a panacea. We have had cases where world-famous open source packages have been compromised. Nevertheless, this is a rare event, and each time we download a Linux kernel, an Apache binary or our latest IDE from our Linux distribution, we trust that the keys have not been compromised. This trust is there because we know that our favourite Linux distribution has capable folk looking after security issues, so we do use the good old face value rule.
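For those who have never done the checksum half of that validation by hand, here is a minimal sketch of comparing a downloaded file against a published SHA-256 digest. The function names are hypothetical and the sketch covers only the hash comparison. It does not verify a repository signing key, which is a separate mechanism handled by the package manager.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's digest with the value published by the repository."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

Note that the check is only as trustworthy as the place the published digest comes from, which is exactly the face value argument made above.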

Wikileaks has appeared so far to be a human-centric entity around the face of Julian Assange. Hence, it would be fair to say that Julian represents to the world the face value of Wikileaks, even if there are probably dozens of people behind the scenes that work to make Wikileaks tick.

It is also understandable that people who reveal the truth become the subject of massive attacks at every level. Mr Assange had been hiding for quite a long time. I find that a bit odd. If Bob Woodward and Carl Bernstein managed to stay alive after revealing one of the largest scandals of US political history during the horrible Nixon era, I am sure Assange could find ways not to hide. In the same way, I am sure that the lack of transparency and instant communication during the Nixon times made it easier for someone to attack journalists then. And they did, but somehow the journalists and the papers stood up to the challenge. No hiding was necessary.

In the same way, if Assange is not willing to understand that he has to face the Swedish prosecutors and clear his name, he will never gain the face value he needs to be trusted. Sweden is not known to be a corrupt state, so if the "rape" and "sexual misconduct" allegations are constructed to halt him and he cares about the truth, he should rise to the challenge and pass the public face of Wikileaks to someone else. Sooner or later, he will face the facts.

Sunday, December 12, 2010

RHEL 6: Part III: First impressions from a sysadmin's point of view

Tuesday, December 7, 2010

The Julian Assange theater from an IT security perspective