I’ve just added a feature that makes it easier for those who want to receive files via Sprend. By adding a parameter to the link URL, the email address will already be filled in. Try out the following link:
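As a purely hypothetical sketch of how such a prefilled link can be built (the parameter name `email` and the URL path here are my placeholders, not necessarily Sprend’s actual format):

```python
from urllib.parse import urlencode

def prefill_link(base_url: str, recipient_email: str) -> str:
    """Append a hypothetical 'email' query parameter so the form is prefilled."""
    return f"{base_url}?{urlencode({'email': recipient_email})}"

# urlencode percent-encodes the '@' as %40.
link = prefill_link("https://www.sprend.com/send", "friend@example.com")
```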
I deployed build 323 this morning. There are only a couple of visible changes, and among those are two simple bug fixes in the email handling (#1, #2). But under the hood quite a few things have changed.
I’ve been refactoring like hell for some days now. The center of my attention has been the core of the system, the code for uploading and downloading. I feel I’ve secured a sound platform for adding some good stuff in the future.
The site was down for almost twenty hours yesterday. Let’s start with the pathetic excuses: Our ISP had a broken cable that took a long time to repair.
Obviously one needs a backup ISP for this kind of situation. I recently asked a question about ISP failover on serverfault.com. There doesn’t seem to be a foolproof and cheap way to do it. I’ll have to re-read the answers and try to learn some more. But I guess any kind of ISP failover is better than none.
Hello. I’m the sysadmin at Sprend. In this post I’ll expand on Arne’s previous post (read that one first) and dive into technical details. So consider yourself warned: this is gonna be fairly nerdy stuff. (Actually, the Wikipedia geek article has a better explanation of the subject of nerds, but when I was a teenager my friends and I here in Sweden always called ourselves nerds, so that’s the expression I’m sticking to.)
Regarding the Java threads eating 99% CPU: this might in part have been caused by us running Linux kernel 2.6.18 or Tomcat 5.5.20, but most likely the reason was Java 5.0.10 (it’s hard to know, since we were too lazy to do any serious debugging or profiling). Also, the Java threads regularly allocated more memory than they were assigned, sometimes to the point of starving the machine of memory, at which point the kernel (or rather its dreaded OOM Killer) always made the unfortunate choice of killing MySQL instead of something less important in order to free up memory. (I didn’t know about the oom_adj setting at the time. Not that it would have helped, considering that Java and MySQL were the only things consuming any significant amount of memory on the server, and both of them had to stay alive.)
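For reference, oom_adj is just a per-process file under /proc. A minimal sketch of adjusting it (the pid is illustrative; on 2.6 kernels the valid range is -17 to 15, and -17 exempts the process from the OOM Killer entirely):

```python
def set_oom_adj(pid: int, adj: int, proc_root: str = "/proc") -> None:
    """Write an OOM Killer adjustment for a process.

    On 2.6 kernels the range is -17..15; -17 exempts the process entirely.
    proc_root is parameterized only so the function can be exercised without root.
    """
    with open(f"{proc_root}/{pid}/oom_adj", "w") as f:
        f.write(str(adj))

# Usage (needs root; the pid is illustrative):
#   set_oom_adj(mysqld_pid, -17)
```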
Aside from CPU usage being reduced on the new server, memory is not “leaking” anymore and Java & MySQL are using fewer threads. That’s partly due to the faster CPU (dual-core Athlon 64 5200+) but also because of more efficient software versions: kernel 2.6.3x, Java 6.x, Tomcat 6.x and MySQL 5.x.

When things were at their worst on the old server, Java grew towards 200 threads and 1 GiB of RAM (the server had 1 GiB of RAM, but no swap, because that would have hurt our bad performance even more). MySQL 4.1.22 behaved more gracefully and stayed below 50 threads and 100 MiB of RAM. On the new server Java stays below 300 MiB of RAM and 120 threads, and MySQL stays below 50 MiB of RAM and 25 threads. Java now seldom occupies more than 100% of one CPU core (often much less than that) and MySQL consumes virtually zero CPU (and that’s how it should be). We had some other minor problems with Java and MySQL as well that disappeared on the new server. As a consequence, Java and MySQL are roughly an order of magnitude more stable now, which is quite nice for me since I don’t need to babysit them anymore.
Regarding moving the db from the USB flash drives to the hard drives: the reason was that the USB drives are slow when MySQL is doing something that causes heavy and sustained disk IO. That’s no surprise, considering that USB flash drives typically have an IO throughput of merely 5-15 MiB/s.
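To put those throughput numbers in perspective, a rough back-of-the-envelope comparison (the sizes and rates below are illustrative round numbers, not measurements from our server):

```python
def transfer_seconds(size_mib: float, rate_mib_per_s: float) -> float:
    """How long a sustained sequential transfer takes at a given rate."""
    return size_mib / rate_mib_per_s

# Rewriting a 1 GiB table at USB-flash speed vs. hard-drive speed:
usb_time = transfer_seconds(1024, 10)    # ~10 MiB/s  -> ~102 seconds
hdd_time = transfer_seconds(1024, 100)   # ~100 MiB/s -> ~10 seconds
```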
Also, I separated the system disk (which holds the operating system) from the data disk (which holds the files being uploaded and downloaded to/from sprend.com). The reason to separate the system disk from the data disk is performance – concurrent reads and writes in particular.

And why is that necessary? Well, our internet connection is a dedicated 100/100 Mbit/s full duplex Ethernet line (which is pretty damn good for a “free of charge web service” provided by a couple of unknown guys). This means we can push a maximum of 25 MiB/s through the line. That’s nothing for our SATA-300 hard drives, which I’ve measured to push approximately 100 MiB/s per drive at peak performance. But, and this is the crux, at peak hours (noon, afternoon and evenings) we typically have something like 30 to 40 simultaneous file transfers in progress. And while the aggregate bandwidth of those transfers seldom goes beyond 15 MiB/s, they do cause simultaneous reads and writes of 30 to 40 different files on the hard drive. This means that the magnetic head inside the hard drive is jumping around like crazy the whole time while accessing the different data blocks belonging to all those files (no matter what you do, the data blocks are gonna get spread out over the platter(s) inside the hard drive over time – especially with our high rate of file creations and deletions – and that’s why the magnetic head has to jump around so much). That in turn translates into increased seek times (and increased wear & tear) on the hard drive.

On the old server we had a combined system and data disk, a PATA/100 disk controller and the XFS file system on the hard drives. That caused the old hard drives to become seriously overworked and slow at peak traffic hours. Now, there’s absolutely nothing wrong with XFS. I’ve done some serious performance comparisons of the Linux journalling file systems ext3, reiser3, JFS and XFS, all on the same Linux installation on non-enterprise hardware, and XFS was the clear winner.
But the newer generation ext4 (with its extents, pre-allocation, delayed allocation and multiblock allocator) in conjunction with the faster SATA-300 disk subsystem and separated system & data disks proved to be highly effective. The load on the hard drives can’t even be noticed any more during peak traffic hours.
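As a quick sanity check on the line-speed figure above (my arithmetic, assuming MiB = 2^20 bytes; the round ~25 figure holds exactly if you count in MB = 10^6 bytes instead):

```python
LINE_BITS_PER_S = 100_000_000  # 100 Mbit/s in each direction

per_direction_mib = LINE_BITS_PER_S / 8 / 2**20   # ~11.9 MiB/s one way
aggregate_mib = 2 * per_direction_mib             # ~23.8 MiB/s full duplex
aggregate_mb = 2 * LINE_BITS_PER_S / 8 / 10**6    # exactly 25.0 MB/s
```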
Of course, ZFS is still the ultimate pr0n when it comes to file systems. Unfortunately, the CDDL license of ZFS and the GPL license of the Linux kernel are incompatible, preventing ZFS from being incorporated into the Linux kernel. But the good news is that there is an all new and shiny Linux native file system in full development right now, which is basically an improved clone of ZFS. It’s called Btrfs (sponsored primarily by Oracle) and when it’s declared stable we’ll switch over to it and get amazing kickass features!
Oh, and the reason that we used USB flash drives is that they’re cheap, noiseless, cold, power efficient and small in physical size (the server has room for them, but not for 2 extra hard drives). All of this except being cheap is also true for SSD drives, which is why we went with USB flash drives instead. SSD drives have blazing performance, but they’re just too expensive at this point in time for this project. Also, SSDs still share a serious technical problem with USB flash – after something like 50-100K writes, individual memory cells will start to fail (even when utilizing wear levelling). But that, and write performance, won’t be a problem in the next generation of SSD drives.
Other points of interest regarding the new server:
A little over a month ago we retired the old server that had been serving us since we started out with Sprend. I guess it’s obvious we needed an upgrade to increase stability and scalability. The 2.6 GHz P4 of the old Dell Inspiron had been running at 100% continuously for quite some time. The Java process stole 99% of the CPU, which in turn made it difficult for MySQL to do its job.
The new machine is working well and the CPU usage is way down. The percentage of failed file transfers is also looking better. Before the switch, 25% of uploads and 11% of downloads failed. With the new server the numbers are 17% failed uploads and 9% failed downloads. Better, but still not very good.
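In relative terms that is a decent step (my arithmetic, using the percentages quoted above):

```python
def relative_drop(before_pct: float, after_pct: float) -> float:
    """Relative reduction of a failure rate, as a percentage."""
    return (before_pct - after_pct) / before_pct * 100

upload_improvement = relative_drop(25, 17)    # 32% fewer failed uploads
download_improvement = relative_drop(11, 9)   # ~18% fewer failed downloads
```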
Both the system drive and the data drive are mirrored with Linux software RAID.
An interesting aspect of the new machine is that the system disk is a USB flash drive (actually two drives, because of mirroring). It took a while for Sysadm to convince me to try that. It seems to work, but we moved the MySQL db from the USB drive to the hard drive for better performance.
The server also got newer versions of Java (6), Tomcat (6.0) and MySQL (5.0). The next step, hardware-wise, is to figure out the simplest way to plug in the second server as well. Adding another dimension always adds quite a bit of complexity.
I also fixed a really naughty SQL bug that made MySQL send 50 000 rows instead of just one (!). I apologize for breaking every ongoing upload and download, which happens whenever a new version is deployed. This fix may actually stabilize our system a bit, which has been rather shaky for some time.
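The actual query isn’t shown here, but one classic way this class of bug arises is a WHERE clause that is accidentally always true. A hypothetical, self-contained illustration using SQLite (not our real schema or query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, filename TEXT)")
conn.executemany("INSERT INTO transfers (filename) VALUES (?)",
                 [(f"file{i}",) for i in range(1000)])

# Buggy: 'id = id' is true for every row, so the whole table comes back.
buggy_rows = conn.execute("SELECT * FROM transfers WHERE id = id").fetchall()

# Fixed: bind the intended id, so exactly one row comes back.
fixed_rows = conn.execute("SELECT * FROM transfers WHERE id = ?", (42,)).fetchall()
```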