Caution: This documentation is for Exponential, from version 3.x to 6.x.
For 5.x documentation covering Platform, see the Exponential Platform Documentation; for the difference between legacy and Platform, see the 5.x Architecture overview.

Linux Platform

Network

  •  tcp_wmem

The configuration file is available in

/proc/sys/net/ipv4/tcp_wmem

The ideal value is

(maxcontentsize * maxclients) / pagesize.

This makes it possible to raise the TCP send buffer size above its default capacity (tcp_wmem governs the send side; the receive side is governed by tcp_rmem). But if the server is low on memory, do not change anything, as you would get erratic behaviour.

Reference: http://en.wikipedia.org/wiki/TCP_window_scale_option

FileSystem

  •  noatime

You can mount the filesystem that holds the /kernel and /lib directories with the noatime option. The kernel then no longer writes an access time on every file read, which should save disk I/O.

Apache

Installation

If you plan to compile your own Apache, you might want to use the Intel C++ Compiler (ICC), which gives an overall performance boost of about 10% with little extra effort.

More information about ICC can be found here: http://software.intel.com/en-us/intel-compilers/

You should also think about using specific CFLAGS for your CPU. There is an interesting list of Intel CFLAGS available on the Gentoo Wiki: http://en.gentoo-wiki.com/wiki/Safe_Cflags/Intel

At a minimum, you should consider using the following CFLAGS:

export CFLAGS="-O3 -DREWRITELOG_DISABLED"

The -DREWRITELOG_DISABLED flag compiles mod_rewrite without its rewrite log support.

Please refer to mod_rewrite.c for more information about this: http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/mappers/mod_rewrite.c

Compile modules as shared objects and compile only the ones you need. If you are not sure which modules you need, you can use the following option for ./configure:

--enable-mods-shared=most

If you know exactly which modules you want, then you can use the following option:

--enable-mods-shared="module1 module2 moduleN"

If possible, do not use mod_negotiation; it is not useful for Exponential: http://httpd.apache.org/docs/2.2/mod/mod_negotiation.html

General configuration

If you compiled more modules than necessary, you can still choose not to load them: the only thing you have to do is comment out any unwanted "LoadModule xxxx yyyy" lines.
 You will need to restart Apache after that.

  •  KeepAlive

Reference: http://httpd.apache.org/docs/2.2/mod/core.html#keepalive

KeepAlive should be set to "Off" as it does not help much with dynamic content.

Furthermore, if there are many KeepAlive connections you will generate some overhead on MySQL, because one connection to Apache generally means one connection to MySQL.

So the best advice that can be given here is to disable it by default and enable it only if it is absolutely necessary. The only use case for Exponential is a static version of an Exponential instance, i.e. a site made static by makestaticcache.php.

You might also want to give LingerD a try http://www.iagora.com/about/software/lingerd/

But please note that LingerD is only available for Apache 1.3 and is not compatible with Apache 2.

  •  MaxClients

Reference: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxclients

This is the maximum number of simultaneous requests Apache should handle. Here is a method to calculate a good value:

MaxClients = (RAM - size_all_other_processes)/(size_apache_process)

You can find out how large an Apache process is by running the following command:

ps -ylC apache2 --sort:rss

The RSS value is in kilobytes, so divide it by 1024 to get the value in megabytes.

Using "top" is fine as well: you can try "top -u UID", for example on Debian "top -u 33" (or "top -U www-data").
 Use "free -m" to get the amount of free RAM available and take the "-/+ buffers/cache" line. You can also use "vmstat 2 5" to get more information about running processes, page-ins and page-outs (swap/si, swap/so), and I/O (blocks received/sent).

If you get a result below 20 for "MaxClients", you have a very serious performance issue and you should consider fixing all bad templates first. Note that ignoring the problem and using Varnish to work around it is not the right way to fix the issue.
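The MaxClients formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures (8 GB of RAM, 2 GB used by other processes, 60 MB per Apache process) are assumptions, not measurements from a real server.

```python
def estimate_max_clients(total_ram_mb, other_processes_mb, apache_rss_kb):
    """Apply MaxClients = (RAM - size_all_other_processes) / size_apache_process."""
    # ps reports RSS in kilobytes, so convert it to megabytes first.
    apache_process_mb = apache_rss_kb / 1024
    available_mb = total_ram_mb - other_processes_mb
    if available_mb <= 0 or apache_process_mb <= 0:
        return 0
    # Round down so the estimate never exceeds the available memory.
    return int(available_mb // apache_process_mb)


if __name__ == "__main__":
    # Hypothetical server: 8192 MB of RAM, 2048 MB used by other processes,
    # Apache processes averaging 61440 KB (60 MB) of RSS.
    print(estimate_max_clients(8192, 2048, 61440))
```

With these assumed figures the estimate is well above the threshold of 20 mentioned above; plug in the values reported by "ps" and "free -m" on your own server.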

  •  MaxSpareServers

Reference: http://httpd.apache.org/docs/2.2/mod/prefork.html#maxspareservers

Keep the number high so Apache can handle unexpected traffic spikes or expected traffic spikes (for example between 8-9am, 12-1pm, 7-8pm). Start with "MaxSpareServers 20".

  •  StartServers

Reference: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#startservers

Set it to the expected average number of simultaneous requests. Start with "StartServers 10".

  •  AllowOverride

Reference: http://httpd.apache.org/docs/2.2/mod/core.html#allowoverride

This directive should be set to "None". This will save a lot of disk I/O, because Apache no longer looks for .htaccess files in every directory, and will give you a nice performance improvement. This basically means that any custom configuration for a specific site must be done in its VirtualHost and never in a .htaccess file.

  •  SendBufferSize

Reference: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#sendbuffersize

This is related to the network configuration described earlier in this document. You should consider tweaking this value if you see significant latency, around 100ms or more.

  •  Options

Reference: http://httpd.apache.org/docs/2.2/mod/core.html#options

Use "Options FollowSymLinks" and never "Options SymLinksIfOwnerMatch". This will save a lot of processing time for checking user/group etc.

  •  Timeout

Reference: http://httpd.apache.org/docs/2.2/mod/core.html#timeout

Set to "Timeout 10" and ditch long waiting connections.

  •  MaxRequestsPerChild

Reference: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#maxrequestsperchild

You can start with a value of 5000. If you notice issues, in particular memory leaks, consider reducing this number.

  •  HostnameLookups

Reference: http://httpd.apache.org/docs/2.2/mod/core.html#hostnamelookups

Make sure it is always set to "Off". If it is set to "On", every request triggers a DNS lookup, which generates a lot of network traffic and gives very poor performance.

Disable any input/output filter if possible. The list of these filters is available here (take mod_*.c): http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/filters/

Module specific configuration

  •  mod_expires

You can use the following configuration:

<IfModule mod_expires.c>
ExpiresActive On
# Images and javascript files from the design are not meant to change often
<LocationMatch /design/[^/]+/(images|javascript|stylesheets)/>
ExpiresDefault "access plus 1 week"
</LocationMatch>
# Images and javascript files from the extensions are not meant to change often
<LocationMatch /extension/[^/]+/design/[^/]+/(yui-assets|images|javascript|stylesheets)/>
ExpiresDefault "access plus 1 week"
</LocationMatch>
# Uploaded images and image variations may be meant to change often
<LocationMatch /var/<siteaccess>/storage/(images|images-versioned)/>
ExpiresDefault "access plus 1 week"
</LocationMatch>
</IfModule>

You can also use "ExpiresByType", but using this directive (even though it is easier to read) will force "mod_expires" to act as an output filter and will not be as efficient as the "ExpiresDefault" configuration directive.

  •  mod_rewrite

Make sure "RewriteLog" and "RewriteLogLevel" are disabled.

RewriteLog: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritelog

RewriteLogLevel: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteloglevel

If Apache has been compiled with the following CFLAG "-DREWRITELOG_DISABLED", you do not have to think about this.

  •  mod_status

Disable it if possible or at least use "ExtendedStatus Off".

  •  mod_dir

Using "DirectoryIndex index.php" is enough.

Extra stuff

Logging in a tmpfs

You can consider mounting /var/log/apache in a tmpfs:

mount -t tmpfs tmpfs /var/log/apache

But please note that in some situations using tmpfs may have the opposite effect and make things slower than using a non-tmpfs partition. You must benchmark both solutions before making any decision. Also keep in mind that a tmpfs lives in memory, so the logs are lost on reboot.

MySQL

  •  Note

In order to get the best performance with MySQL, running it on a 64-bit OS is recommended.

Installation

If you consider compiling MySQL yourself, you might find Google's tcmalloc malloc library interesting. It gives good results with multi-threaded applications.
 If you cannot compile MySQL, you can still use Google perftools by preloading the library with the following line:

LD_PRELOAD="/usr/lib/libtcmalloc.so"

Everything is explained at this URL: http://goog-perftools.sourceforge.net/doc/tcmalloc.html

General configuration

  •  key_buffer

Decent start value: 500M
 To find a suitable value for the key buffer, investigate the status variables "key_read_requests" and "key_reads".
 "key_read_requests" is the total number of requests to read a key block from the cache.
 "key_reads" is the number of times MySQL had to access the filesystem to fetch the keys. The lower the number of "key_reads" the better.
 The more memory you allocate to the key buffer, the more requests will be served from the cache. There will always be some keys that need to be read from disk (for example when data changes), so the value will never be zero.
 By comparing the two values you can see the hit ratio of your key buffer: "key_read_requests" should be much larger than "key_reads". A ratio of 99% cached requests is a good number to aim for in a read-intensive environment.
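The comparison described above can be written as a small Python sketch. The function name and the sample counters are made up for illustration; on a real server the two values would come from the "key_read_requests" and "key_reads" status variables.

```python
def key_buffer_hit_ratio(key_read_requests, key_reads):
    """Return the share of key requests served from the key buffer (0.0 to 1.0)."""
    if key_read_requests <= 0:
        # No requests recorded yet, so the ratio is undefined.
        return None
    return 1.0 - (key_reads / key_read_requests)


if __name__ == "__main__":
    # Hypothetical counters: one million requests, five thousand disk reads.
    ratio = key_buffer_hit_ratio(1_000_000, 5_000)
    print("hit ratio: %.1f%%" % (ratio * 100))
```

A result clearly below the 99% target suggests trying a larger key buffer and measuring again.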

  •  table_cache

Decent start value: 4000
 The rule of thumb is that you should multiply the maximum number of connections (described below) by the maximum number of tables used in joins. For example, if the maximum number of connections is set to 400 and queries join at most 10 tables, the table cache should be at least 400 * 10 = 4000.

  •  max_connections

The rule of thumb here is "max_connections = MaxClients" value in Apache.

  •  sort_buffer_size

The sort buffer is per connection, so you must multiply the size of the sort buffer by the maximum number of connections to predict the server memory requirements.
 So if you use a 3MB sort buffer with 400 max connections, the sort buffers alone can use up to 1.2GB of memory.
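The two rules of thumb for table_cache and sort_buffer_size are simple multiplications, sketched here in Python. The function names are hypothetical, and the figures are the ones used in the examples above (400 connections, 10 tables per join, a 3MB sort buffer).

```python
def table_cache_size(max_connections, max_tables_per_join):
    """Rule of thumb: one cache entry per connection and per joined table."""
    return max_connections * max_tables_per_join


def sort_buffer_worst_case_mb(sort_buffer_mb, max_connections):
    """Memory used if every connection allocates its sort buffer at once."""
    return sort_buffer_mb * max_connections


if __name__ == "__main__":
    print(table_cache_size(400, 10))
    print(sort_buffer_worst_case_mb(3, 400))
```

The second figure is a worst case: it is only reached when all connections sort at the same time.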

  •  query_cache_type=1

Enables the query cache

  •  query_cache_limit=1M

Do not cache result sets larger than 1MB

  •  query_cache_size

Use SHOW STATUS LIKE "qcache%"; to see how your configuration changes affect MySQL.

InnoDB specific

  •  Note

There is an interesting article on InnoDB memory usage here: http://www.mysqlperformanceblog.com/2006/05/30/innodb-memory-usage/

  •  innodb_buffer_pool_size

If the server has 8GB of RAM, give 6GB to the InnoDB buffer pool; if it has 10GB, give 8GB; and so on.

  •  innodb_flush_method

Use "O_DIRECT" to avoid double buffering (which does not make much sense with MySQL) and thus reduce disk pressure. To discourage the kernel from swapping you can also try the following, but you might get nasty side effects:

echo 0 > /proc/sys/vm/swappiness

 There is a nice article about this here: http://feedblog.org/2007/09/29/using-o_direct-on-linux-and-innodb-to-fix-swap-insanity/

  •  innodb_log_file_size

Using 256M is a good start and is generally the best configuration.

  •  innodb_log_buffer_size

Using 4M is enough for Exponential needs, even if you use the Exponential cluster which is supposed to store a lot of BLOBs. Since each file is split in 64Kb packets it does not change anything.

  •  innodb_flush_log_at_trx_commit

Use "2" so that the log is not flushed to disk at every commit. The log is then flushed roughly once per second, so if the operating system crashes or power is lost you might lose the transactions made in about the last second (but are you working on such a write-intensive site?).

  •  innodb_thread_concurrency

Using "8" is a decent start.

  •  innodb_file_per_table

Set it to "On" at the very beginning of the server installation. If you want to do it once the site is live, you will have to export data, change the configuration value and re-import everything which will take a lot of time.

  •  transaction-isolation

Use "READ-COMMITTED" if you use Exponential cluster with the eZDB file handler. This might avoid a few problems and it can certainly not make things worse.

PHP

Installation

If you want to compile PHP, you can use ICC as for Apache. The advice about CFLAGS is the same, except that "-DREWRITELOG_DISABLED" is of no use in this case. Feel free to compile PHP as a DSO (i.e. with APXS), since compiling it statically does not seem to help much.

  •  Debian

If you want to use a binary package on Debian, the dotdeb mirrors are recommended: http://www.dotdeb.org/mirrors/

  •  Red Hat Enterprise Linux

Do not use their binaries, they are too old. Compiling PHP by yourself is the only acceptable option on this platform.

Configuration

You can use the following php.ini values:

safe_mode=Off
register_globals=Off
magic_quotes_gpc=Off
magic_quotes_runtime=Off
allow_call_time_pass_reference=Off
expose_php=Off
register_argc_argv=Off
; For non CLI SAPI
always_populate_raw_post_data=Off
; If eZ Components are bundled with Exponential
include_path="."
date.timezone = "<Continent>/<Town>"

APC

A few useful links on APC information:
 http://php.net/apc

You can use the following configuration in php.ini:

; Check with apc.php how full the cache is.
; If it is more than 90% full, increase the size.
apc.shm_size = 512
apc.shm_segments = 1
apc.file_update_protection = 2
apc.max_file_size = 1M

Varnish

Installation

Varnish compiles fine, but you can still use the same CFLAGS (O3, march...) as described above. Using a binary package is also an option.

Configuration

Configuring Varnish is not that easy. You will find all the needed information in the "Using Varnish with Exponential" section.

Using Varnish with Exponential

HTTP Caching tutorial for the impatient

  •  Cache-Control

"The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain."[1]

This header instructs the browser (as well as any other cache in the request/response chain) to cache the response for a given TTL. After the first HTTP request for a specific resource, the browser sends an If-Modified-Since header to validate its copy with the reverse proxy or web server. If the cached copy is still valid, the local copy is used; if it is not, the HTTP transaction continues and the local copy is updated.

  •  Pragma

"The Pragma general-header field is used to include implementation specific directives that might apply to any recipient along the request/response chain." [2]

In practice, this field has been of little use since HTTP 1.1 came out, but it is good practice to unset it by giving it no value.

  •  Expires

"The Expires entity-header field gives the date/time after which the response is considered stale."[3]

The Expires header is quite useful but somewhat more "radical" than Cache-Control. If the browser requests a resource and gets an Expires header returned, it keeps a local copy for the TTL given by Expires and no longer asks for this resource during this period. For example, when you ask for an image foo.jpg, the web server returns the image plus an Expires header set one year in the future. The next time you request this image, it will come from the local copy only. This makes it possible to save quite a lot of HTTP requests.

  •  Etag

"The Etag response-header field provides the current value of the entity tag for the requested variant." [4]

The Etag header is under-used in Exponential. It acts as a sort of unique identifier for each file returned by an HTTP request, so it could for example be really helpful for content re-validation and for updating the browser's cache.
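As a rough illustration of the TTL logic described for Cache-Control and Expires above, the following Python sketch decides whether a cached copy is still fresh. The function names and the plain dict of headers are assumptions made for this example; real browsers and Varnish apply many more rules than this.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Return how long a response may be reused, as a timedelta.

    A max-age directive in Cache-Control takes precedence over Expires.
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        # Expires is an absolute HTTP date; turn it into a duration.
        remaining = parsedate_to_datetime(expires) - response_time
        return max(remaining, timedelta(0))
    return timedelta(0)


def is_fresh(headers, response_time, now):
    """True while the cached copy can be used without contacting the server."""
    return now - response_time < freshness_lifetime(headers, response_time)


if __name__ == "__main__":
    received = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    headers = {"Cache-Control": "public, must-revalidate, max-age=200"}
    print(is_fresh(headers, received, received + timedelta(seconds=100)))
    print(is_fresh(headers, received, received + timedelta(seconds=300)))
```

The sketch expects timezone-aware datetimes and header names spelled exactly as shown; both are simplifications chosen to keep the example short.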

Useful links

Caching in HTTP: http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13
Complete HTTP workflow by Alan Dean

Making Exponential HTTP friendly

  •  site.ini

The only configuration group we have to work with is HTTPHeaderSettings. Basically you have to take all the standard configuration in settings/site.ini and copy it in your siteaccess. Copying everything is required, because if you just override the current configuration values, you will always get the unwanted "no-cache" directive.

As a configuration example:

[HTTPHeaderSettings]
CustomHeader=enabled
OnlyForAnonymous=disabled
HeaderList[]
HeaderList[]=Cache-Control
HeaderList[]=Pragma
Cache-Control[]
Cache-Control[/]=public, must-revalidate, max-age=200
Cache-Control[/Varnish-tests/Article-which-should-be-cached]=public, must-revalidate, max-age=200
Pragma[]
Pragma[/]=
Pragma[/Varnish-tests/Article-which-should-be-cached]=

Cache-Control always overrides Expires. However, experience has shown that Varnish does not really like the presence of both Cache-Control and Expires, so it is highly recommended not to use Expires here.

Making Apache HTTP more caching friendly with binary files

  •  mod_expires

mod_expires is quite powerful, so please read the following documentation which is quite short and clear: http://httpd.apache.org/docs/2.2/mod/mod_expires.html

The configuration for mod_expires is given in the mod_expires section under "Module specific configuration" above. This configuration could be somewhat reduced and re-factored, but it is more readable like this.

With this configuration browsers are forced to cache any binary file which comes from the Exponential design directories:

  •  design/*
  •  extension/*/design/*

The last part forces browsers to cache content images for a week. If an image is updated, it is always done in a new version so the filename will change and there is no risk of getting an un-updated image here. Once you are OK with the configuration, do not forget to reload Apache's configuration with:

sudo /etc/init.d/apache2 reload

A quick "before/after" to show that it is helpful. On the RBA corporate project we have the following numbers:

  •  Without mod_expires configured: 106 HTTP requests
  •  With mod_expires configured: 5 HTTP requests

To add or change the configuration:

  •  Note

In order to get everything working Apache listens on port 8080 and Varnish on port 80.

  •  Note

If you want to know what the difference is between pipe; and pass; there is an interesting page which explains it: http://varnish.projects.linpro.no/wiki/FAQ#ShouldIusepipeorpassinmyVCLcodeWhatisthedifference

Tested with Varnish 2.0.2

backend default
{
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv
{
    set req.backend = default;

    /***************************************************/
    /* Never cache pages with POST parameters */
    /* Typical use case : /user/login or /user/register */
    /* You do not want to get those pages cached */
    /***************************************************/
    if (req.request == "POST")
    {
        pass;
    }

    /******************************************************/
    /* do not use Varnish when the user is authenticated */
    /* This configuration requires the ezvlogin extension */
    /* http://projects.ez.no/ezvlogin */
    /* With the default configuration for LoginSettings */
    /* in vlogin.ini.append.php */
    /******************************************************/
    if (req.http.Cookie ~ "^is_logged_in=true.*$")
    {
        pass;
    }

    /********************************************************/
    /* Returns the item from the cache for any GET request */
    /* All the configuration done in HTTPHeaderSettings in */
    /* Exponential as well as Apache's mod_expires will come */
    /* into play here */
    /********************************************************/
    if (req.request == "GET")
    {
        lookup;
    }

    /*********************************************************/
    /* We want Varnish to return binary files from its cache */
    /* all the configuration we used for mod_expires is used */
    /* in this case */
    /*********************************************************/
    if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico)$")
    {
        lookup;
    }

    /******************************************************/
    /* Nirvana */
    /* Never ever remove this line */
    /* cf: */
    /* http://projects.ez.no/ezvlogin/forum/general/working_with_custom_headers/re_working_with_custom_headers5 */
    /* for more information */
    /******************************************************/
    remove req.http.cookie;
}

sub vcl_fetch
{
    /**************************************************************/
    /* I remove any cookies from the request as Varnish is */
    /* really picky with cookies, and never cache the page */
    /* when they are present which is the safest way to prevent */
    /* issues but which does not help in our case */
    /** **/
    /** **/
    /* Should work with Exponential 3.*, 4.0.* with eZ Vlogin */
    /* if (obj.http.Set-Cookie ~ "^eZSESSID.*=.*$") */
    /** **/
    /** **/
    /* Works with Exponential 4.1 and + */
    /* if (obj.http.Set-Cookie ~ "^is_logged_in=deleted.*$") */
    /**************************************************************/
    if (obj.http.Set-Cookie ~ "^is_logged_in=deleted.*$")
    {
        remove obj.http.Set-Cookie;
    }
}

You might be interested in the following thread: http://projects.ez.no/ezvlogin/forum/general/working_with_custom_headers

Testing your work

Here comes the hard part: testing your site with Varnish is really difficult. The following tools are recommended.

Using Firefox

  •  LiveHTTPHeaders

This can be downloaded at: https://addons.mozilla.org/en-US/firefox/addon/3829

It is possible to filter out what you do not want to see reported in the headers: the "Configuration" tab lets you exclude URLs with a regex. The following one is recommended:

\.gif$|\.jpg$|\.ico$|\.css$|\.js$|\.png$
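To see which URLs an exclusion pattern of this kind hides, it can be tried out in Python first. This is only a sketch: the sample URLs are made up, the dots are escaped so that "." matches a literal dot, and it assumes the extension searches for the pattern anywhere in the URL, which is not confirmed here.

```python
import re

# Same alternatives as the exclusion pattern above, with the dots escaped.
STATIC_ASSET = re.compile(r"\.gif$|\.jpg$|\.ico$|\.css$|\.js$|\.png$")


def is_hidden(url):
    """Return True when the URL would be filtered out of the header view."""
    return STATIC_ASSET.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in ("/design/site/images/logo.png",
                "/extension/foo/design/site/javascript/app.js",
                "/Varnish-tests/Article-which-should-be-cached"):
        print(url, is_hidden(url))
```

Only the page requests themselves remain visible, which is what you want when checking the caching headers sent by Exponential.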

  •  Tips when using CTRL + R and Shift + CTRL + R

If you use Firefox for testing, you should know that each way of refreshing a page sends different HTTP headers in the request, so you may see different behaviour. Here is a summary of what is sent:

CTRL+R: Cache-Control: max-age=0
Shift+CTRL+R: Pragma: no-cache and Cache-Control: no-cache

This is really important, as these request headers can change whether the response comes from a cache or from the backend.

CURL

You can use curl from the command line if you want, something like this:

curl -I http://site.com

or

curl -s -o /dev/null -D - http://site.com

The first command sends a HEAD request; the second sends a GET request and prints only the response headers. Keep in mind that curl keeps no client-side cache, unlike the browser behaviour shown above. It is roughly the equivalent of using Shift + CTRL + R in Firefox. If Varnish is not prepared to look up objects for HEAD requests, the first command will not give the expected result. You can add the following lines to vcl_recv to fix this:

if (req.request == "HEAD")
{
    lookup;
}

Apache Benchmark

Apache Benchmark behaves like multiple curl processes sending GET requests, which means it keeps no client cache and does not send an If-Modified-Since header either. However, this is ideal for brute-forcing your server. Using "ab" is a good idea because it simulates a ton of users on the site who do not have anything in their cache. Varnish takes almost all the requests here.

If you would like to plot "ab" results with GNUPlot, there is an interesting utility available here: http://sourceforge.net/projects/abgraph

Robots.txt

Configuration

Reference: http://en.wikipedia.org/wiki/Robots.txt

To keep Google and other crawlers from hammering your site during traffic spikes, you might want to tell them to crawl the site only during the "stressless" hours, for example at night or early in the morning. Note that Visit-time is a non-standard robots.txt extension, so not every crawler honours it. This can be done like this:

User-agent: *
Visit-time: 0600-0845 # only visit between 06:00 and 08:45 UTC (GMT)

Surviving site switches

It is quite common to work on a migration project. Let's say your customer used CMS XXX and decided to use Exponential, so you do all you have to do and code all you have to code. And then comes the crucial moment when you switch the old application to the new Exponential one. This generally means that old URLs will not be valid anymore in Exponential and it will generate a lot of kernel XX errors to inform visitors the page foo/bar is not available.

In an application switch context, you cannot survive this if you rely only on Exponential. The amount of "old" traffic (i.e. coming from old and obsolete URLs) will be so large that it will just kill your machines. In order to solve this issue Nicolas Pastorino and Jérôme Renard wrote the eZURLMapper extension: http://projects.ez.no/ezurlmapper

With a simple INI file you will be able to generate a set of Apache RewriteRules to use during your switch, so that any URL redirection is done by Apache. Exponential never sees the old URLs, which keeps it more efficient.
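To give an idea of what such generated rules look like, here is a minimal Python sketch that turns a mapping of old URLs to new ones into RewriteRule lines. It is not the eZURLMapper code and does not use its INI format: the function name and the sample paths are hypothetical, and the rules assume they are placed in a VirtualHost, where patterns are matched against the full path including the leading slash.

```python
import re


def rewrite_rules(url_map):
    """Build one permanent-redirect RewriteRule per old URL.

    url_map maps an old path to its new path (hypothetical input format).
    """
    lines = ["RewriteEngine On"]
    for old_path, new_path in sorted(url_map.items()):
        # Escape regex metacharacters so the old path is matched literally.
        pattern = "^" + re.escape(old_path) + "$"
        lines.append("RewriteRule %s %s [R=301,L]" % (pattern, new_path))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up URLs from an imaginary old CMS.
    print(rewrite_rules({
        "/old/news.html": "/News",
        "/old/contact.php": "/Contact-us",
    }))
```

Each rule answers with a 301 redirect straight from Apache, so the request for the old URL never reaches PHP.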

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32
[3] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.21
[4] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19