
Last revised 30 Oct 2009. The benchmarks may be out of date, but the general advice is still good. I have revised the caching sections as 8 years of experience has shown it is the most critical part of high performance: I have added discussions of squid and memcache. If you want to see what has changed, search for this date in this article.

If you like this article, visit my blog, PHP Everywhere for related articles.

A HOWTO on Optimizing PHP

PHP is a very fast programming language, but there is more to optimizing PHP than just speed of code execution.

In this chapter, we explain why optimizing PHP involves many factors which are not code related, and why tuning PHP requires an understanding of how PHP performs in relation to all the other subsystems on your server, and then identifying bottlenecks caused by these subsystems and fixing them. We also cover how to tune and optimize your PHP scripts so they run even faster.

Achieving High Performance

When we talk about good performance, we are not talking about how fast your PHP scripts will run. Performance is a set of tradeoffs between speed versus accuracy versus scalability. An example of speed versus accuracy is your scripts might be tuned to run fast with caching, but the data will tend to grow stale and be less accurate. For an example of speed versus scalability you could write a script that runs fast by loading everything into memory, or write a more scalable one that only loads data in chunks so that it does not exhaust application memory (Updated 30 Oct 2009 from speed vs scalability to speed vs accuracy vs scalability).

In the example below, A.php is a sprinter that can run fast, and B.php is a marathon runner that can jog forever at nearly the same speed. For light loads, A.php is substantially faster, but as the web traffic increases, the performance of B.php drops only a little while A.php runs out of steam.


Let us take a more realistic example to clarify matters further. Suppose we need to write a PHP script that reads a 250K file and generates an HTML summary of the file. We write 2 scripts that do the same thing: hare.php, which reads the whole file into memory at once and processes it in one pass, and tortoise.php, which reads the file one line at a time, never keeping more than the longest line in memory. Tortoise.php will be slower because multiple reads are issued, requiring more system calls.
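The two reading strategies can be sketched as follows. This is only an illustration: the function names are hypothetical, and counting lines and bytes stands in for whatever HTML summary the real scripts would produce.

```php
<?php
// hare: load the entire file into memory, then process it in one pass.
function summarizeHare(string $path): array
{
    $data = file_get_contents($path); // whole file held in RAM at once
    if ($data === false) {
        throw new RuntimeException("cannot read $path");
    }
    $lines = substr_count($data, "\n");
    if ($data !== '' && substr($data, -1) !== "\n") {
        $lines++; // last line has no trailing newline
    }
    return ['lines' => $lines, 'bytes' => strlen($data)];
}

// tortoise: read one line at a time, so only the current line is in memory.
function summarizeTortoise(string $path): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    $lines = 0;
    $bytes = 0;
    while (($line = fgets($fh)) !== false) { // one read call per line
        $lines++;
        $bytes += strlen($line);
    }
    fclose($fh);
    return ['lines' => $lines, 'bytes' => $bytes];
}
```

Both functions return the same summary for the same file; they differ only in how much of the file is held in memory at any moment.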

Hare.php requires 0.04 seconds of CPU and 10 Mb RAM and tortoise.php requires 0.06 seconds of CPU and 5 Mb RAM. The server has 100 Mb free actual RAM and its CPU is 99% idle. Assume no memory fragmentation occurs to simplify things.

With 10 concurrent scripts running, hare.php uses up all the free memory (10 x 10 = 100 Mb), while tortoise.php still has 50 Mb free. The 11th concurrent script brings hare.php to its knees as the server starts using virtual memory, slowing it down to maybe half its original speed; each invocation of hare.php now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php is still running at its normal 0.06 seconds of CPU time.

In the table below, the faster PHP script for each load is marked with an asterisk (*). All figures are CPU seconds required to satisfy the given number of concurrent HTTP requests:

Connections    1 HTTP request    10 HTTP requests    11 HTTP requests
-----------    --------------    ----------------    ------------------------
hare.php       0.04 *            0.40 *              0.88 (runs out of RAM)
tortoise.php   0.06              0.60                0.66 *
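The figures in the table can be reproduced with a few lines of arithmetic. The sketch below is a deliberately simple model, not a measurement: the function name is hypothetical, and the swap penalty of 2 comes from the assumption above that running out of RAM halves the script's speed.

```php
<?php
// Total CPU seconds needed to serve $requests concurrent requests.
// Once the combined memory demand exceeds free RAM, every request is
// assumed to slow down by $swapPenalty (2.0 = half the original speed).
function cpuSecondsForLoad(
    int $requests,
    float $cpuPerRequest,
    int $ramPerRequestMb,
    int $freeRamMb,
    float $swapPenalty = 2.0
): float {
    $perRequest = $cpuPerRequest;
    if ($requests * $ramPerRequestMb > $freeRamMb) {
        $perRequest *= $swapPenalty; // the server has started swapping
    }
    return $requests * $perRequest;
}

foreach ([1, 10, 11] as $n) {
    printf(
        "%2d requests: hare %.2f s, tortoise %.2f s\n",
        $n,
        cpuSecondsForLoad($n, 0.04, 10, 100), // hare.php: 0.04 s, 10 Mb
        cpuSecondsForLoad($n, 0.06, 5, 100)   // tortoise.php: 0.06 s, 5 Mb
    );
}
```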

 

As the above example shows, obtaining good performance is not merely writing fast PHP scripts. High performance PHP requires a good understanding of the underlying hardware, the operating system and supporting software such as the web server and database.

Bottlenecks

The hare and tortoise example has shown us that bottlenecks cause slowdowns. With infinite RAM, hare.php will always be faster than tortoise.php. Unfortunately, the above model is a bit simplistic and there are many other bottlenecks to performance apart from RAM:

(a) Networking

Your network is probably the biggest bottleneck. Let us say you have a 10 Mbit link to the Internet, over which you can pump 1 megabyte of data per second. If each web page is 30k, a mere 33 web pages per second will saturate the line.
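The same estimate can be made for any link speed and page size. The helper below is a hypothetical one-liner that treats 1 megabyte as 1,000,000 bytes and 30k as 30,000 bytes, and ignores protocol overhead.

```php
<?php
// How many whole pages per second fit through a link of the given capacity?
function pagesPerSecond(int $linkBytesPerSecond, int $pageBytes): int
{
    return intdiv($linkBytesPerSecond, $pageBytes);
}

// The example from the text: about 1 megabyte per second, 30k pages.
printf("%d pages per second saturate the line\n", pagesPerSecond(1000000, 30000));
```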

More subtle networking bottlenecks include frequent access to slow network services such as DNS, or allocating insufficient memory for networking software.

(b) CPU

If you monitor your CPU load while sending plain HTML pages over a network, you will find that this does not tax your CPU at all because, as mentioned above, the bottleneck will be the network. However, for the complex dynamic web pages that PHP generates, CPU speed will normally become the limiting factor. Having a server with multiple processors or having a server farm can alleviate this.

(c) Shared Memory

Shared memory is used for inter-process communication, and to store resources that are shared between multiple processes, such as cached data and code. If insufficient shared memory is allocated, any attempt to access resources that use shared memory, such as database connections or executable code, will perform poorly.

(d) File System

Accessing a hard disk can be 50 to 100 times slower than reading data from RAM. File caches using RAM can alleviate this. However low memory conditions will reduce the amount of memory available for the file-system cache, slowing things down. File systems can also become heavily fragmented, slowing down disk accesses. Heavy use of symbolic links on Unix systems can slow down disk accesses too.

Default Linux installs are also notorious for setting hard disk default settings which are tuned for compatibility and not for speed. Use the command hdparm to tune your Linux hard disk settings.

(e) Process Management

On some operating systems, such as Windows, creating new processes is a slow operation. This means CGI applications that fork a new process on every invocation will run substantially slower on these operating systems. Running PHP in multi-threaded mode should improve response times (note: older versions of PHP are not stable in multi-threaded mode).

Avoid overcrowding your web server with too many unneeded processes. For example, if your server is purely for web serving, avoid running (or even installing) X-Windows on the machine. On Windows, avoid running Microsoft Find Fast (part of Office) and 3-dimensional screen savers that result in 100% CPU utilization.

Some of the programs that you can consider removing include unused networking protocols, mail servers, antivirus scanners, hardware drivers for mice, infrared ports and the like. On Unix, I assume you are accessing your server using SSH. Then you can consider removing:

daemons such as telnetd, inetd, atd, ftpd, lpd, smbd
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X

You can also disable at startup various programs by modifying the startup files which are usually stored in the /etc/init* or /etc/rc*/init* directory.

Also review your cron jobs to see if you can remove them or reschedule them for off-peak periods.

(f) Connecting to Other Servers

If your web server requires services running on other servers, it is possible that those servers become the bottleneck. The most common example of this is a slow database server that is servicing too many complicated SQL requests from multiple web servers.

When to Start Optimizing?

Some people say that it is better to defer tuning until after the coding is complete. This advice only makes sense if your programming team's coding is of a high quality to begin with, and you already have a good feel of the performance parameters of your application. Otherwise you are exposing yourselves to the risk of having to rewrite substantial portions of your code after testing.

My advice is that before you design a software application, you should do some basic benchmarks on the hardware and software to get a feel for the maximum performance you might be able to achieve. Then as you design and code the application, keep the desired performance parameters in mind, because at every step of the way there will be tradeoffs between performance, availability, security and flexibility.

Also choose good test data. If your database is expected to hold 100,000 records, avoid testing with only a 100 record database – you will regret it. This once happened to one of the programmers in my company; we did not detect the slow code until much later, causing a lot of wasted time as we had to rewrite a lot of code that worked but did not scale.
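A small timing experiment shows why the size of the test data matters. The sketch below is hypothetical: the arrays stand in for database records, and the two lookup functions stand in for "code that works" and "code that scales". With 100 records both versions look equally fast; with 100,000 the difference becomes visible.

```php
<?php
// Works, but does not scale: one full scan of the records per lookup.
function countHitsLinear(array $records, array $wanted): int
{
    $hits = 0;
    foreach ($wanted as $id) {
        if (in_array($id, $records, true)) {
            $hits++;
        }
    }
    return $hits;
}

// Same result, but builds a hash index once so each lookup is cheap.
function countHitsIndexed(array $records, array $wanted): int
{
    $index = array_flip($records); // record ID => position
    $hits = 0;
    foreach ($wanted as $id) {
        if (isset($index[$id])) {
            $hits++;
        }
    }
    return $hits;
}

// Time a callable and return the elapsed wall-clock seconds.
function timeIt(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

foreach ([100, 100000] as $size) {
    $records = range(1, $size);           // stand-in for database rows
    $wanted  = array_fill(0, 200, $size); // worst case: always the last record
    $linear  = timeIt(function () use ($records, $wanted) {
        countHitsLinear($records, $wanted);
    });
    $indexed = timeIt(function () use ($records, $wanted) {
        countHitsIndexed($records, $wanted);
    });
    printf("%6d records: linear %.4f s, indexed %.4f s\n", $size, $linear, $indexed);
}
```

The absolute timings depend on the machine; what matters is how the gap between the two versions changes as the record count grows.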

 

Tuning Your Web Server for PHP

We will cover how to get the best PHP performance for the two most common web servers in use today, Apache 1.3 and IIS. A lot of the advice here is relevant for serving HTML also.

The authors of PHP have stated that there is no performance nor scalability advantage in using Apache 2.0 over Apache 1.3 with PHP, especially in multi-threaded mode. When running Apache 2.0 in pre-forking mode, the following discussion is still relevant (21 Oct 2003).

(a) Apache 1.3/2.0

Apache is available on both Unix and Windows. It is the most popular web server in the world. Apache 1.3 uses a pre-forking model for web serving. When Apache starts up, it creates multiple child processes that handle HTTP requests. The initial parent process acts like a guardian angel, making sure that all the child processes are working properly and coordinating everything. As more HTTP requests come in, more child processes are spawned to process them. As the HTTP requests slow down, the parent will kill the idle child processes, freeing up resources for other processes. The beauty of this scheme is that it makes Apache extremely robust. Even if a child process crashes, the parent and the other child processes are insulated from the crashing child.

The pre-forking model is not as fast as some other possible designs, but to me that is "much ado about nothing" on a server serving PHP scripts, because other bottlenecks will kick in long before Apache performance issues become significant. The robustness and reliability of Apache are more important.

Apache 2.0 offers operation in multi-threaded mode. My benchmarks indicate there is little performance advantage in this mode. Also be warned that many PHP extensions are not compatible (e.g. GD and IMAP). Tested with Apache 2.0.47 (21 Oct 2003).

Apache is configured using the httpd.conf file. The following parameters are particularly important in configuring child processes (updated 30 Oct 2009- in Apache 2, these settings have been moved to conf/extra/httpd-mpm.conf. Make sure you also uncomment the include extra/httpd-mpm.conf in httpd.conf):

MaxClients (default 256): The maximum number of child processes to create. The default means that up to 256 HTTP requests can be handled concurrently. Any further connection requests are queued.

StartServers (default 5): The number of child processes to create on startup.

MinSpareServers (default 5): The number of idle child processes that should be kept available. If the number of idle child processes falls below this number, 1 child is created initially, then 2 after another second, then 4 after another second, and so forth until 32 children are created per second.

MaxSpareServers (default 10): If more than this number of idle child processes are alive, the extra processes will be terminated.

MaxRequestsPerChild (default 0): Sets the number of HTTP requests a child can handle before terminating. Setting it to 0 means never terminate. Set this to a value between 100 and 10000 if you suspect memory leaks are occurring, or to free under-utilized resources.
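The ramp-up described for MinSpareServers means that a sudden burst of traffic is not absorbed instantly. The sketch below models only the doubling rule as stated above (1, 2, 4, ... up to 32 children per second); the function name is hypothetical, and real Apache behaviour also depends on the other directives.

```php
<?php
// Returns how many children are spawned in each successive second until
// $childrenNeeded processes have been created. The spawn rate starts at 1
// and doubles every second, capped at $maxPerSecond.
function spawnSchedule(int $childrenNeeded, int $maxPerSecond = 32): array
{
    $schedule = [];
    $rate = 1;
    while ($childrenNeeded > 0) {
        $spawned = min($rate, $childrenNeeded);
        $schedule[] = $spawned;
        $childrenNeeded -= $spawned;
        $rate = min($rate * 2, $maxPerSecond);
    }
    return $schedule;
}

$schedule = spawnSchedule(100);
printf(
    "100 extra children take %d seconds: %s\n",
    count($schedule),
    implode(', ', $schedule)
);
```

Under this model a busy site that suddenly needs many more children waits several seconds for them, which is one reason to keep more spare servers ready.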

For large sites, values close to the following might be better:

MinSpareServers 32

MaxSpareServers 64

Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing higher traffic, increase this value to between 256 and 1024.

Other useful performance parameters you can change include:

SendBufferSize (default: set to OS default): Determines the size of the output buffer (in bytes) used in TCP/IP connections. This is primarily useful for congested or slow networks when packets need to be buffered; you then set this parameter close to the size of the largest file normally downloaded. One TCP/IP buffer will be created per client connection.

KeepAlive [on|off] (default On): In the original HTTP specification, every HTTP request had to establish a separate connection to the server. To reduce the overhead of frequent connects, the keep-alive header was developed. Keep-alive tells the server to reuse the same socket connection for multiple HTTP requests. If a separate dedicated web server serves all images, you can disable this option. This technique can substantially improve resource utilization.

KeepAliveTimeout (default 15): The number of seconds to keep the socket connection alive. This time includes the generation of content by the server and acknowledgements by the client. If the client does not respond in time, it must make a new connection. This value should be kept low, as the socket will otherwise be idle for extended periods.

MaxKeepAliveRequests (default 100): Socket connections will be terminated when the number of requests set by MaxKeepAliveRequests is reached. Keep this to a high value below MaxClients or ThreadsPerChild.

TimeOut (default 300): Disconnect when idle time exceeds this value. You can set this value lower if your clients have low latencies.

LimitRequestBody (default 0): Maximum size of a PUT or POST. 0 means there is no limit.

If you do not require DNS lookups and you are not using the htaccess file to configure Apache settings for individual directories you can set:

# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off

# disable htaccess checks
<Directory />
AllowOverride none
</Directory>

If you are not worried about the directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:

Options FollowSymLinks
#Options SymLinksIfOwnerMatch

(b) IIS Tuning

IIS is a multi-threaded web server available on Windows NT and 2000. From the Internet Services Manager, it is possible to tune the following parameters:

Performance Tuning based on the number of hits per day: Determines how much memory to preallocate for IIS. (Performance Tab)

Bandwidth throttling: Controls the bandwidth per second allocated per web site. (Performance Tab)

Process throttling: Controls the CPU% available per web site. (Performance Tab)

Timeout: Default is 900 seconds. Set to a lower value on a Local Area Network. (Web Site Tab)

HTTP Compression: In IIS 5, you can compress dynamic pages, HTML and images. It can be configured to cache compressed static HTML and images. By default compression is off.

HTTP compression has to be enabled for the entire physical server. To turn it on open the IIS console, right-click on the server (not any of the subsites, but the server in the left-hand pane), and get Properties. Click on the Service tab, and select "Compress application files" to compress dynamic content, and "Compress static files" to compress static content.

You can also configure the default isolation level of your web site. In the Home Directory tab under Application Protection, you can define your level of isolation. A highly isolated web site will run slower because it is running as a separate process from IIS, while running a web site in the IIS process is the fastest but will bring down the server if there are serious bugs in the web site code. Currently I recommend running PHP web sites using CGI, or using ISAPI with Application Protection set to high.

You can also use regedit.exe to modify the following IIS 5 registry settings, which are stored at the following location [Updated 30 Oct 2009: Tips for IIS6 and IIS7]:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\

MemCacheSize: Sets the amount of memory that IIS will use for its file cache. By default IIS will use 50% of available memory. Increase if IIS is the only application on the server. Value is in megabytes.

MaxCachedFileSize: Determines the maximum size of a file cached in the file cache in bytes. Default is 262,144 (256K).

ObjectCacheTTL: Sets the length of time (in milliseconds) that objects in the cache are held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads: Sets the number of pool threads to create per processor. Determines how many CGI applications can run concurrently. Default is 4. Increase this value if you are using PHP in CGI mode.

ListenBackLog: Specifies the maximum number of active Keep Alive connections that IIS maintains in the connection queue. Default is 15, and should be increased to the number of concurrent connections you want to support. Maximum is 250.

If the settings are missing from this registry location, the defaults are being used.

High Performance on Windows: IIS and FastCGI

After much testing, I find that the best PHP performance on Windows is offered by using IIS with FastCGI. CGI is a protocol for calling external programs from a web server. It is not very fast because CGI programs are terminated after every page request. FastCGI modifies this protocol for high performance, by making the CGI program persist after a page request, and reusing the same CGI program when a new page request comes in.

As the installation of FastCGI with IIS is complicated, you should use Zend Core for Windows or php.iis.net. This will install PHP and FastCGI for the best performance possible. The Zend Core installer can also install Apache.

This section on FastCGI updated 30 Oct 2009.

PHP's Zend Engine

The Zend Engine is the internal compiler and runtime engine used by PHP. It was developed by Zeev Suraski and Andi Gutmans, and the name Zend is a combination of their first names. In the early days of PHP4, PHP worked in the following fashion:


The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low level binary instructions. Then the opcode was executed and the HTML generated sent to the client. The opcode was flushed from memory after execution.

Today, there is a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts work; all the shaded boxes are optional.


PHP Scripts are loaded into memory and compiled into Zend opcodes. These opcodes can now be optimized using an optional peephole optimizer called Zend Optimizer. Depending on the script, it can increase the speed of your PHP code by 0-50%.

Formerly, the opcodes were discarded after execution. Now the opcodes can optionally be cached in memory using one of several open source products or the Zend Accelerator (formerly Zend Cache), which is a commercial closed source product. The only opcode cache that is compatible with the Zend Optimizer is the Zend Accelerator. An opcode cache speeds execution by removing the script loading and compilation steps. Execution times can improve by 10-200% using an opcode cache.

Where to find Opcode Caches (Modified 30 Oct 2009)

 

For an overview, see this Wikipedia article on PHP Accelerators.

Zend Platform: A commercial opcode cache developed by the Zend Engine team. Very reliable and robust. Visit zend.com for more information.

You will need to test the following open source opcode caches before using them on production servers as their performance and reliability very much depends on the PHP scripts you run.

The eAccelerator is quite popular, and I am using it (Added 28 Feb 2005).

Alternative PHP Cache: pecl.php.net/apc. I believe that PHP6 will come with APC built in.

 

Caching: the Ultimate Speed Booster

One of the secrets of high performance is not to write faster PHP code, but to avoid executing PHP code by caching generated HTML in a file or in shared memory. The PHP script is only run once and the HTML is captured, and future invocations of the script will load the cached HTML. If the data needs to be updated regularly, an expiry value is set for the cached HTML. HTML caching is not part of the PHP language nor Zend Engine, but implemented using PHP code. There are many class libraries that do this. One of them is the PEAR Cache, which we will cover in the next section. Another is the Smarty template library.
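The idea can be sketched in a few lines of plain PHP before turning to the libraries. This is a minimal illustration, not how PEAR Cache or Smarty implement it: cachedHtml() is a hypothetical helper, the cache is a single file, and the expiry check simply compares the file's modification time against a time-to-live in seconds.

```php
<?php
// Return cached HTML if the cache file is younger than $ttlSeconds;
// otherwise run $generate, capture everything it prints, and cache it.
function cachedHtml(string $cacheFile, int $ttlSeconds, callable $generate): string
{
    clearstatcache(true, $cacheFile); // make sure we see the current mtime
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        $html = file_get_contents($cacheFile);
        if ($html !== false) {
            return $html; // cache hit: the generator is never executed
        }
    }

    ob_start();              // capture everything the generator prints
    $generate();
    $html = ob_get_clean();

    // Write to a temporary file first, then rename it into place, so that
    // concurrent readers never see a half-written cache file.
    $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $cacheFile);

    return $html;
}

// Usage: the expensive page body runs at most once every 60 seconds.
$file = sys_get_temp_dir() . '/howto_cache_demo.html';
echo cachedHtml($file, 60, function () {
    echo "<p>Generated at ", date('H:i:s'), "</p>\n";
});
```

The expiry value is the tradeoff between speed and accuracy discussed earlier: a longer time-to-live means fewer executions of the script but staler pages.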

Finally, the HTML sent to a web client can be compressed. This is enabled by placing the following code at the beginning of your PHP script:

<?php

ob_start("ob_gzhandler");

:
:

?>

If your HTML is highly compressible, it is possible to reduce the size of your HTML file by 50-80%, reducing network bandwidth requirements and latencies. The downside is that you need to have some CPU power to spare for compression.
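To get a feel for how compressible a given page is, you can compress its HTML by hand and compare the sizes. The sketch below assumes the zlib extension is available (it provides gzencode()); compressionSavings() is a hypothetical helper, and the sample markup is artificially repetitive, so real pages will save less.

```php
<?php
// Fraction of bytes saved by gzip-compressing $html (0.0 to 1.0).
function compressionSavings(string $html): float
{
    if ($html === '') {
        return 0.0; // nothing to compress
    }
    $compressed = gzencode($html);
    return 1 - strlen($compressed) / strlen($html);
}

// A repetitive table compresses very well; prose-heavy pages less so.
$html = str_repeat("<tr><td>row</td><td>value</td></tr>\n", 500);
printf("gzip saves %.0f%% on this sample\n", compressionSavings($html) * 100);
```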

HTML Caching with PEAR Cache

The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.

The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class which caches all text printed or echoed between the start() and end() functions:

<?php

require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

if ($contents = $cache->start(md5("this is a unique key!"))) {

  # aha, cached data returned
  print $contents;
  print "<p>Cache Hit</p>";

} else {

  # no cached data, or cache expired
  print "<p>Don't leave home without it…</p>"; # place in cache
  print "<p>Stand and deliver</p>"; # place in cache
  print $cache->end(10); # stop buffering and cache the output for 10 seconds

}

?>


Since I wrote these lines, a superior PEAR cache system has been developed: Cache Lite.

The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the "file" storage driver offers the best performance. The second parameter is the storage driver options. The options are "cache_dir", the location of the caching directory, and "filename_prefix", which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.

To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").
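A fixed string is fine for a demonstration, but in a real application the key has to change whenever the generated output would change. One possible approach, shown here as a sketch with a hypothetical cacheKey() helper, is to derive the key from the script name and its request parameters:

```php
<?php
// Build a cache key from the script path and its parameters.
// Sorting the parameters first means ?a=1&b=2 and ?b=2&a=1 share one entry.
function cacheKey(string $script, array $params): string
{
    ksort($params);
    return md5($script . '?' . http_build_query($params));
}

// Usage: one cache entry per distinct report and page number.
echo cacheKey('/report.php', ['page' => 2, 'year' => 2009]), "\n";
```

Anything else that affects the page, such as the logged-in user or the language, would need to be added to the key in the same way.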

The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.

The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.

Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:

<?php

require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/") );
$id = $cache->generateID("this is a unique key");
