This post was delayed by months, but better late than never. A while back I was tasked with building a caching cluster to serve cached API responses. During my research I selected Apache mod_cache, and below are some things I learned while tuning it.
Server Specifications:
Dell PowerEdge 1950s, dual-core Intel Xeon 2.0 GHz
8 GB of RAM
64-bit CentOS 5
Server Application Type:
API Caching Server
Basic Installation:
Installed Apache 2.2.8:
gunzip -c httpd-2.2.8.tar.gz | tar -xvf -
cd httpd-2.2.8
./configure --disable-status --enable-status=shared --enable-rewrite --enable-so --enable-proxy --enable-cache --enable-disk-cache
make
make install
Installed PHP 5.2.8:
gunzip -c php-5.2.8.tar.gz | tar -xvf -
cd php-5.2.8
./configure --with-apxs2=/usr/local/apache2/bin/apxs --enable-module=so --blah --blah --blah
make
make install
Below is the mod_cache portion of my Apache vhost:
<IfModule mod_cache.c>
    <IfModule mod_disk_cache.c>
        CacheEnable disk /
        CacheRoot "/opt/apicache/"
        CacheDirLevels 2
        CacheDirLength 1
        CacheMaxFileSize 1000000
        CacheMinFileSize 1
        CacheIgnoreCacheControl On
        CacheIgnoreNoLastMod On
        CacheIgnoreQueryString Off
        CacheIgnoreHeaders None
        CacheLastModifiedFactor 0.1
        CacheDefaultExpire 3600
        CacheMaxExpire 86400
        CacheStoreNoStore On
        CacheStorePrivate On
    </IfModule>
</IfModule>
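As a rough illustration of how the expiry directives above interact, here is a small Python model. The function name `freshness_lifetime` and the constants are illustrative and not part of Apache. It follows the behavior described in the mod_cache documentation: when a response has no expiry date but does have a Last-Modified date, CacheLastModifiedFactor scales the time since that date; CacheMaxExpire caps the result (the documentation says this cap applies even when the document supplies an expiry date); and CacheDefaultExpire is used when neither date is available. It is a simplified sketch, not Apache's actual code path, and it ignores Cache-Control headers entirely.

```python
from typing import Optional

# Values taken from the vhost config above.
CACHE_DEFAULT_EXPIRE = 3600.0     # CacheDefaultExpire (seconds)
CACHE_MAX_EXPIRE = 86400.0        # CacheMaxExpire (seconds)
CACHE_LAST_MODIFIED_FACTOR = 0.1  # CacheLastModifiedFactor


def freshness_lifetime(now: float,
                       expires: Optional[float] = None,
                       last_modified: Optional[float] = None) -> float:
    """Simplified model of how long a cached response is treated as fresh.

    All arguments are Unix timestamps in seconds. This is an illustration
    (hypothetical helper), not Apache's implementation.
    """
    if expires is not None:
        # An explicit expiry date is used, but still capped by CacheMaxExpire.
        return min(max(0.0, expires - now), CACHE_MAX_EXPIRE)
    if last_modified is not None and last_modified < now:
        # Heuristic: a fraction of the time since the last modification,
        # capped by CacheMaxExpire.
        age = now - last_modified
        return min(age * CACHE_LAST_MODIFIED_FACTOR, CACHE_MAX_EXPIRE)
    # No usable dates: fall back to CacheDefaultExpire.
    return CACHE_DEFAULT_EXPIRE
```

For example, a response last modified 10 hours ago gets 36,000 * 0.1 = 3,600 seconds (one hour) of freshness, while one last modified 30 days ago would compute to 259,200 seconds and is capped at the 86,400 set by CacheMaxExpire.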
The first performance decision I made was to use --enable-disk-cache (mod_disk_cache). After doing some research, I found that, contrary to what you might expect, the disk cache is faster than the memory cache when it comes to how Apache mod_cache interacts with the OS. The reason is that mod_mem_cache has to read each file into memory, copying its data into RAM and then into a kernel buffer in order to deliver it, which is not optimal. With mod_disk_cache on Linux, Apache can use the sendfile API, which does not require the server to read the file before delivering it. The server tells the OS which file to deliver and to which destination, and the OS then reads and delivers the file itself. No read call or user-space memory for the payload is required, and the OS can serve the data straight from the file system cache. In effect, the kernel acts as the buffer, which increases cache speed.
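To make the difference concrete, the sketch below contrasts the two delivery patterns in Python. It assumes Linux (or another platform where `os.sendfile` is available), and the helper names are hypothetical. `send_with_read_write` copies each chunk into a user-space buffer before writing it to the socket, which is roughly what a read-into-memory approach implies, while `send_with_sendfile` hands the file descriptor to the kernel via sendfile(2), so the payload never passes through a user-space buffer. Apache itself is written in C; this only illustrates the system-call pattern, not Apache's code.

```python
import os
import socket
import tempfile
import threading
from typing import Dict, List, Tuple


def send_with_read_write(path: str, sock: socket.socket, chunk: int = 65536) -> int:
    """Copy path to sock through a user-space buffer: read() then send()."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)  # kernel -> user-space copy
            if not buf:
                break
            sock.sendall(buf)    # user-space -> kernel copy
            sent += len(buf)
    return sent


def send_with_sendfile(path: str, sock: socket.socket) -> int:
    """Let the kernel move the bytes with sendfile(2); no user-space buffer."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile may send fewer bytes than requested, so keep looping.
            n = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if n == 0:
                break
            offset += n
    return offset


def _receive_exact(sock: socket.socket, n: int, out: Dict[str, bytes]) -> None:
    """Read n bytes (or until EOF) from sock and store them in out["data"]."""
    chunks: List[bytes] = []
    remaining = n
    while remaining > 0:
        data = sock.recv(min(65536, remaining))
        if not data:
            break
        chunks.append(data)
        remaining -= len(data)
    out["data"] = b"".join(chunks)


def deliver_both_ways(payload: bytes) -> Tuple[bytes, bytes]:
    """Write payload to a temp file, send it over a local socket pair with
    each method, and return what the receiver saw as (read/write, sendfile)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        received: List[bytes] = []
        for sender in (send_with_read_write, send_with_sendfile):
            a, b = socket.socketpair()
            with a, b:
                box: Dict[str, bytes] = {}
                # Receive in a thread so a small socket buffer cannot deadlock.
                reader = threading.Thread(target=_receive_exact,
                                          args=(b, len(payload), box))
                reader.start()
                sender(path, a)
                reader.join()
                received.append(box["data"])
        return received[0], received[1]
    finally:
        os.unlink(path)
```

Both paths deliver identical bytes; the difference is that the sendfile path skips the round trip through a user-space buffer, which is the property the paragraph above credits for mod_disk_cache's speed.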
The second performance issue I saw was that when I set CacheDirLevels and CacheDirLength too high, load skyrocketed. I found that CacheDirLevels 2 and CacheDirLength 1 was the optimal setting for about 50 GB of cache, since it lowered the amount of directory traversal needed on reads and writes.
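The sketch below models how CacheDirLevels and CacheDirLength turn a cache key into an on-disk path. Per the Apache caching documentation, mod_disk_cache hashes the key with MD5, encodes the result as a 22-character string drawn from a 64-character alphabet, and uses the leading characters as directory names. The bit ordering and alphabet order here are simplified assumptions and will not reproduce Apache's exact file names, and the helper names are hypothetical. What it does show correctly is the fan-out arithmetic: a level of length L has 64**L possible names, so 2 levels of length 1 give 64 * 64 = 4,096 leaf directories. The 20-character limit check reflects the documented rule that CacheDirLevels * CacheDirLength must not exceed 20.

```python
import hashlib

# 64-character alphabet: each encoded character carries 6 bits of the hash.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_@"
ENCODED_LENGTH = 22  # 22 * 6 = 132 bits, enough to hold a 128-bit MD5 digest


def encode_key(key: str) -> str:
    """MD5-hash the cache key and encode the digest with ALPHABET."""
    value = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(ENCODED_LENGTH):
        chars.append(ALPHABET[value & 0x3F])  # take the lowest 6 bits
        value >>= 6
    return "".join(chars)


def cache_path(root: str, key: str, dir_levels: int, dir_length: int) -> str:
    """Build root/<level 1>/.../<level N>/<rest of the encoded hash>."""
    if dir_levels * dir_length > 20:
        raise ValueError("CacheDirLevels * CacheDirLength must not exceed 20")
    encoded = encode_key(key)
    dirs = [encoded[i * dir_length:(i + 1) * dir_length] for i in range(dir_levels)]
    rest = encoded[dir_levels * dir_length:]
    return "/".join([root.rstrip("/")] + dirs + [rest])


def leaf_directories(dir_levels: int, dir_length: int) -> int:
    """Number of distinct leaf directories the hierarchy can contain."""
    return len(ALPHABET) ** (dir_levels * dir_length)
```

With the settings from this post (2 levels, length 1) a key lands in a path like /opt/apicache/X/Y/<20 characters>, spread over 4,096 leaf directories. Raising CacheDirLength to 2 at the same depth gives 64**4 = 16,777,216 possible leaf directories, and each extra level adds another path component to walk on every read and write.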
The third performance improvement came from meeting Brian Moon at the Velocity Conference in 2008 (a very bright guy!). I asked Brian how I could optimize the file system for Apache mod_disk_cache, and he suggested some fstab changes, which apply only if you are using ext3. Specifically, he suggested the entry below for my cache partition:
/dev/md1 /opt ext3 defaults,noatime,nodiratime,data=writeback 1 2
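After remounting or rebooting, it is worth confirming that the kernel actually applied these options. Here is a minimal Python sketch (helper names hypothetical) that parses /proc/mounts-style text, whose fields are device, mount point, file system type, options, dump and pass, and reports which wanted options are missing for a mount point. It checks only noatime and data=writeback, since noatime already covers directories. Depending on the kernel version, the data= mode may not be listed in /proc/mounts at all, so a missing data= entry should be treated as "unknown" rather than as proof the option was not applied.

```python
from typing import Dict, Iterable, Set

# Options expected on the cache partition; nodiratime is implied by noatime.
WANTED = ("noatime", "data=writeback")


def mount_options(mounts_text: str) -> Dict[str, Set[str]]:
    """Map each mount point to its set of options.

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    If a mount point appears more than once (stacked mounts), the last
    entry, which is the one currently on top, wins.
    """
    result: Dict[str, Set[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        result[fields[1]] = set(fields[3].split(","))
    return result


def missing_options(mounts_text: str, mount_point: str,
                    wanted: Iterable[str] = WANTED) -> Set[str]:
    """Return the wanted options that are not active on mount_point."""
    options = mount_options(mounts_text)
    if mount_point not in options:
        raise KeyError(f"{mount_point} is not mounted")
    return set(wanted) - options[mount_point]
```

On a live system you would pass in the contents of /proc/mounts, for example `missing_options(open("/proc/mounts").read(), "/opt")`; an empty set means every wanted option is active.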
Setting noatime removes a write for every read. Normally, when a file is read, the system updates the file's inode with an access time so that the last access is recorded, which means a write to the file system. Unless you are running some sort of mirror, you probably do not need the access time written.
Setting nodiratime does the same thing as noatime, but for directories.
*Note 08/09/2009: it has been pointed out to me that noatime implies nodiratime (nodiratime is a subset of noatime), so if you use noatime you don't need the nodiratime entry.
Setting data=writeback means ext3 does not preserve data ordering: data may be written into the file system after its metadata has been committed to the journal, which offers higher throughput. Warning: with this setting, recently modified files can end up with stale or corrupted contents after an unexpected reboot or system crash.
If you look at the graph below, you will see the sharp gain from the tuning outlined above.
Happy tuning, fun stuff!