Last revised 30 Oct 2009. The benchmarks may be out of date, but the general advice is still good. I have revised the caching sections as 8 years of experience has shown it is the most critical part of high performance: I have added discussions of squid and memcache. If you want to see what has changed, search for this date in this article.
If you like this article, visit my blog, PHP Everywhere, for related articles.
A HOWTO on Optimizing PHP
PHP is a very fast programming language, but there is more to optimizing PHP
than just speed of code execution.
In this chapter, we explain why optimizing PHP involves many factors which
are not code related, and why tuning PHP requires an understanding of how PHP
performs in relation to all the other subsystems on your server, and then identifying
bottlenecks caused by these subsystems and fixing them. We also cover how to
tune and optimize your PHP scripts so they run even faster.
Achieving High Performance
When we talk about good performance, we are not just talking about how fast your
PHP scripts will run. Performance is a set of tradeoffs between speed, accuracy and scalability. An example of speed versus accuracy: your scripts might be tuned to run fast with caching, but the cached data will tend to grow stale and become less accurate. For an example of speed versus scalability, you could write a script that runs fast by loading everything into memory, or write a more scalable one that only loads data in chunks so that it does not exhaust application memory.
(Updated 30 Oct 2009 from speed vs scalability to speed vs accuracy vs scalability.)
In the example below, A.php is a sprinter that can run fast, and B.php is a
marathon runner that can jog forever at nearly the same speed. For light
loads, A.php is substantially faster, but as the web traffic increases, the
performance of B.php drops only a little while A.php simply runs out of steam.
Let
us take a more realistic example to clarify matters further. Suppose we
need to write a PHP script that reads a 250 KB file and generates an HTML
summary of the file. We write two scripts that do the same thing: hare.php, which reads the whole file into memory at once and processes it in one pass, and tortoise.php,
which reads the file one line at a time, never keeping more than the
longest line in memory. Tortoise.php will be slower because it issues multiple
reads, requiring more system calls.
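The two reading strategies can be sketched in a few lines of PHP. The throwaway file and the line-counting "summary" below are illustrative stand-ins for the real 250 KB file and HTML generation:

```php
<?php
// Build a throwaway input file standing in for the 250 KB file.
$file = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($file, str_repeat("a line of text\n", 1000));

// hare.php style: slurp the entire file into memory in one read.
$hareLines = substr_count(file_get_contents($file), "\n");

// tortoise.php style: hold at most one line in memory at a time.
$tortoiseLines = 0;
$fp = fopen($file, 'r');
while (fgets($fp) !== false) {
    $tortoiseLines++;   // "process" the line, then let it go
}
fclose($fp);
unlink($file);

echo "$hareLines $tortoiseLines\n";   // both strategies see 1000 lines
```

Both produce the same result; the difference is that the hare's peak memory use grows with the file, while the tortoise's stays bounded by the longest line.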
Hare.php requires 0.04
seconds of CPU time and 10 MB of RAM, and tortoise.php requires 0.06 seconds of
CPU time and 5 MB of RAM. The server has 100 MB of free physical RAM and its CPU is
99% idle. To simplify things, assume no memory fragmentation occurs.
At 10 concurrent scripts running, hare.php will run out of memory (10 x 10
= 100). At that point, tortoise.php will still have 50 MB of free memory. The
11th concurrent script will bring hare.php to its knees as it starts
using virtual memory, slowing it down to perhaps half its original speed; each
invocation of hare.php now takes 0.08 seconds of CPU time. Meanwhile, tortoise.php
will still be running at its normal 0.06 seconds of CPU time.
In the table below, the CPU seconds required to satisfy the given number of
HTTP requests are shown; the faster script at each load is marked with an asterisk:

              | 1 request | 10 requests | 11 requests
hare.php      | 0.04 *    | 0.40 *      | 0.88 (runs out of RAM)
tortoise.php  | 0.06      | 0.60        | 0.66 *
As the above example shows, obtaining good performance is not merely writing
fast PHP scripts. High performance PHP requires a good understanding of the
underlying hardware, the operating system and supporting software such as the
web server and database.
Bottlenecks
The hare and tortoise example has shown us that bottlenecks cause slowdowns.
With infinite RAM, hare.php will always be faster than tortoise.php. Unfortunately,
the above model is a bit simplistic and there are many other bottlenecks to
performance apart from RAM:
(a) Networking
Your network is probably the biggest bottleneck. Let us say you have a 10 Mbit
link to the Internet, over which you can pump 1 megabyte of data per second.
If each web page is 30k, a mere 33 web pages per second will saturate the line.
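The arithmetic can be checked in a couple of lines, using the same figures as above:

```php
<?php
// A 10 Mbit/s link leaves roughly 1 megabyte/s of usable throughput.
$usableBytesPerSec = 1000000;
$pageSize = 30 * 1000;   // a 30k web page
$pagesPerSec = (int) floor($usableBytesPerSec / $pageSize);
echo $pagesPerSec, " pages per second saturate the line\n";   // 33
```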
More subtle networking bottlenecks include frequent access to slow network
services such as DNS, or allocating insufficient memory for networking software.
(b) CPU
If you monitor your CPU load, you will find that sending plain HTML pages over
a network does not tax your CPU at all because, as we mentioned earlier, the
bottleneck will be the network. However, for the complex dynamic web pages that PHP generates,
your CPU speed will normally become the limiting factor. Having a server with
multiple processors or having a server farm can alleviate this.
(c) Shared Memory
Shared memory is used for inter-process communication, and to store resources
that are shared between multiple processes such as cached data and code. If
insufficient shared memory is allocated any attempt to access resources that
use shared memory such as database connections or executable code will perform
poorly.
(d) File System
Accessing a hard disk can be 50 to 100 times slower than reading data from
RAM. File caches using RAM can alleviate this. However low memory conditions
will reduce the amount of memory available for the file-system cache, slowing
things down. File systems can also become heavily fragmented, slowing down disk
accesses. Heavy use of symbolic links on Unix systems can slow down disk accesses
too.
Default Linux installs are also notorious for hard disk settings that are tuned
for compatibility rather than speed. Use the hdparm command to tune your Linux
hard disk settings.
(e) Process Management
On some operating systems such as Windows creating new processes is a slow
operation. This means CGI applications that fork a new process on every invocation
will run substantially slower on these operating systems. Running PHP in multi-threaded
mode should improve response times (note: older versions of PHP are not stable
in multi-threaded mode).
Avoid overcrowding your web server with too many unneeded processes. For example,
if your server is purely for web serving, avoid running (or even installing)
X-Windows on the machine. On Windows, avoid running Microsoft Find Fast (part
of Office) and 3-dimensional screen savers that result in 100% CPU utilization.
Some of the programs that you can consider removing include unused networking
protocols, mail servers, antivirus scanners, hardware drivers for mice, infrared
ports and the like. On Unix, I assume you are accessing your server using SSH.
Then you can consider removing:
daemons such as telnetd, inetd, atd,
ftpd, lpd and smbd (Samba)
sendmail for incoming mail
portmap for NFS
xfs, fvwm, xinit, X
You can also disable at startup various programs by modifying the startup files
which are usually stored in the /etc/init* or /etc/rc*/init* directory.
Also review your cron jobs to see if you can remove them or reschedule them
for off-peak periods.
(f) Connecting to Other Servers
If your web server requires services running on other servers, it is possible
that those servers become the bottleneck. The most common example of this is
a slow database server that is servicing too many complicated SQL requests from
multiple web servers.
When to Start Optimizing?
Some people say that it is better to defer tuning until after the coding
is complete. This advice only makes sense if your programming team's coding
is of a high quality to begin with, and you already have a good feel of
the performance parameters of your application. Otherwise you are exposing
yourselves to the risk of having to rewrite substantial portions of your
code after testing.
My advice is that before you design a software application, you should
do some basic benchmarks on the hardware and software to get a feel for
the maximum performance you might be able to achieve. Then as you design
and code the application, keep the desired performance parameters in mind,
because at every step of the way there will be tradeoffs between performance,
availability, security and flexibility.
Also choose good test data. If your database is expected to hold 100,000
records, avoid testing with only a 100 record database – you will regret
it. This once happened to one of the programmers in my company; we did
not detect the slow code until much later, causing a lot of wasted time
as we had to rewrite a lot of code that worked but did not scale.
Tuning Your Web Server for PHP
We will cover how to get the best PHP performance for the two most common web
servers in use today, Apache 1.3 and IIS. A lot of the advice here
is relevant for serving HTML also.
The authors of PHP have stated that there is no performance or
scalability advantage in using Apache 2.0 over Apache 1.3 with PHP,
especially in multi-threaded mode. When running Apache 2.0 in pre-forking
mode, the following discussion is still relevant (21 Oct 2003).
(a) Apache 1.3/2.0
Apache is available on both Unix and Windows. It is the most popular
web server in the world. Apache 1.3 uses a pre-forking model
for web serving. When Apache starts up, it creates multiple child
processes that handle HTTP requests. The initial parent process
acts like a guardian angel, making sure that all the child processes
are working properly and coordinating everything. As more HTTP requests
come in, more child processes are spawned to process them. As the
HTTP requests slow down, the parent will kill the idle child processes,
freeing up resources for other processes. The beauty of this scheme
is that it makes Apache extremely robust. Even if a child process
crashes, the parent and the other child processes are insulated
from the crashing child.
The pre-forking model is not as fast as some other possible designs,
but to me it is "much ado about nothing" on a server serving
PHP scripts, because other bottlenecks will kick in long before Apache
performance issues become significant. The robustness and reliability
of Apache is more important.
Apache 2.0 offers operation in multi-threaded mode. My benchmarks
indicate there is little performance advantage in this mode. Also
be warned that many PHP extensions are not compatible (e.g. GD and
IMAP). Tested with Apache 2.0.47 (21 Oct 2003).
Apache is configured using the httpd.conf file. The following parameters are
particularly important in configuring child processes (updated 30 Oct 2009: in Apache 2, these settings have moved to conf/extra/httpd-mpm.conf; make sure you also uncomment the Include conf/extra/httpd-mpm.conf line in httpd.conf):
MaxClients (default 256): The maximum number of child processes to create. The
default means that up to 256 HTTP requests can be handled concurrently. Any
further connection requests are queued.

StartServers (default 5): The number of child processes to create on startup.

MinSpareServers (default 5): The number of idle child processes that should be
kept available. If the number of idle child processes falls below this number,
1 child is created initially, then 2 after another second, then 4 after another
second, and so forth, until 32 children are created per second.

MaxSpareServers (default 10): If more than this number of idle child processes
are alive, the extra processes will be terminated.

MaxRequestsPerChild (default 0): Sets the number of HTTP requests a child can
handle before terminating. Setting it to 0 means a child never terminates. Set
this to a value between 100 and 10000 if you suspect memory leaks are
occurring, or to free under-utilized resources.
For large sites, values close to the following might be better:
MinSpareServers 32
MaxSpareServers 64
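Putting these directives together, a prefork configuration section for a busy server might look something like the sketch below. The exact numbers are illustrative, not prescriptive, and must be sized against the RAM each child consumes:

```apache
<IfModule mpm_prefork_module>
    StartServers          32
    MinSpareServers       32
    MaxSpareServers       64
    MaxClients           256
    MaxRequestsPerChild 1000   # recycle children to contain memory leaks
</IfModule>
```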
Apache on Windows behaves differently. Instead of using child
processes, Apache uses threads. The above parameters are not used.
Instead we have one parameter: ThreadsPerChild which defaults
to 50. This parameter sets the number of threads that can be spawned by
Apache. As there is only one child process in the Windows version, the
default setting of 50 means only 50 concurrent HTTP requests can be
handled. For web servers experiencing higher traffic, increase this
value to between 256 and 1024.
Other useful performance parameters you can change include:
SendBufferSize (default: OS default): Determines the size of the output buffer
(in bytes) used in TCP/IP connections. It is primarily useful for congested or
slow networks, when packets need to be buffered; you then set this parameter
close to the size of the largest file normally downloaded. One TCP/IP buffer
will be created per client connection.

KeepAlive [on|off] (default On): In the original HTTP specification, every HTTP
request had to establish a separate connection to the server. To reduce the
overhead of frequent connects, the keep-alive header was developed. Keep-alive
tells the server to reuse the same socket connection for multiple HTTP
requests. If a separate dedicated web server serves all images, you can disable
this option. This technique can substantially improve resource utilization.

KeepAliveTimeout (default 15): The number of seconds to keep the socket
connection alive. This time includes the generation of content by the server
and acknowledgements by the client. If the client does not respond in time, it
must make a new connection. This value should be kept low, as the socket will
otherwise sit idle for extended periods.

MaxKeepAliveRequests (default 100): Socket connections will be terminated when
the number of requests set by MaxKeepAliveRequests is reached. Keep this at a
high value below MaxClients or ThreadsPerChild.

TimeOut (default 300): Disconnect when idle time exceeds this value. You can
set this value lower if your clients have low latencies.

LimitRequestBody (default 0): Maximum size of a PUT or POST. 0 means there is
no limit.
If you do not require
DNS lookups and you are not using .htaccess files to configure Apache
settings for individual directories, you can set:
# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off
# disable htaccess checks
<Directory />
AllowOverride none
</Directory>
If you are not
worried about directory security when accessing symbolic links,
turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent
additional lstat() system calls from being made:
Options FollowSymLinks
#Options SymLinksIfOwnerMatch
(b) IIS Tuning
IIS is a multi-threaded web server available on Windows NT and
2000. From the Internet Services Manager, it is possible to tune the
following parameters:
Performance Tuning: Based on the number of hits per day; determines how much
memory to preallocate for IIS. (Performance Tab)

Bandwidth throttling: Controls the bandwidth per second allocated per web site.
(Performance Tab)

Process throttling: Controls the CPU% available per web site. (Performance Tab)

Timeout: Default is 900 seconds. Set to a lower value on a Local Area Network.
(Web Site Tab)

HTTP Compression: In IIS 5, you can compress dynamic pages, HTML and images,
and it can be configured to cache compressed static HTML and images. By
default, compression is off. HTTP compression has to be enabled for the entire
physical server. To turn it on, open the IIS console, right-click on the server
(not any of the subsites, but the server in the left-hand pane) and select
Properties. Click on the Service tab, then select "Compress application files"
to compress dynamic content, and "Compress static files" to compress static
content.
You can also configure the default isolation level of your web site.
In the Home Directory tab under Application Protection, you can define
your level of isolation. A highly isolated web site will run slower
because it runs as a separate process from IIS, while running the web
site in the IIS process is fastest but will bring down the server
if there are serious bugs in the web site code. Currently I recommend
running PHP web sites using CGI, or using ISAPI with Application
Protection set to High.
You can also use regedit.exe to modify the following IIS 5 registry settings, stored at the location below [Updated 30 Oct 2009: Tips for IIS6 and IIS7]:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\
MemCacheSize: Sets the amount of memory (in megabytes) that IIS will use for
its file cache. By default IIS will use 50% of available memory. Increase it if
IIS is the only application on the server.

MaxCachedFileSize: Determines the maximum size of a file cached in the file
cache, in bytes. Default is 262,144 (256K).

ObjectCacheTTL: Sets the length of time (in milliseconds) that objects in the
cache are held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads: Sets the number of pool threads to create per processor, which
determines how many CGI applications can run concurrently. Default is 4.
Increase this value if you are using PHP in CGI mode.

ListenBackLog: Specifies the maximum number of active keep-alive connections
that IIS maintains in the connection queue. Default is 15, and should be
increased to the number of concurrent connections you want to support. Maximum
is 250.
If the settings are missing from this registry location, the defaults are being
used.
High Performance on Windows: IIS and FastCGI
After much testing, I find that the best PHP performance on Windows
is offered by using IIS with FastCGI. CGI is a protocol for calling
external programs from a web server. It is not very fast because
CGI programs are terminated after every page request. FastCGI modifies
this protocol for high performance, by making the CGI program persist
after a page request, and reusing the same CGI program when a new
page request comes in.
As the installation of FastCGI with IIS is complicated, you should
use Zend Core for Windows or php.iis.net. This will install PHP and FastCGI
for the best performance possible. The Zend Core installer can also install
Apache.
This section on FastCGI updated 30 Oct 2009.
PHP's Zend Engine
The Zend Engine
is the internal compiler and runtime engine used by PHP. Developed by
Zeev Suraski and Andi Gutmans, the Zend Engine takes its name from
theirs (Zeev and Andi). In the early days of PHP 4, PHP worked in the
following fashion:
1. The PHP script was loaded by the Zend Engine and compiled into Zend opcode.
Opcodes, short for operation codes, are low-level binary instructions.
2. The opcode was executed and the generated HTML was sent to the client.
3. The opcode was flushed from memory after execution.
Today, there are a multitude
of products and techniques to help you speed up this process. In the
following diagram, we show how modern PHP scripts are processed; all the
shaded boxes are optional.
PHP Scripts are loaded into memory and compiled into Zend opcodes. These opcodes
can now be optimized using an optional peephole optimizer called Zend Optimizer.
Depending on the script, it can increase the speed of your PHP code by 0-50%.
Formerly after execution, the opcodes were discarded. Now the opcodes can be
optionally cached in memory using several alternative open source products and
the Zend Accelerator (formerly Zend Cache), which is a commercial closed source
product. The only opcode cache that is compatible with the Zend Optimizer is
the Zend Accelerator. An opcode cache speeds execution by removing the script
loading and compilation steps. Execution times can improve by 10-200% using
an opcode cache.
Where to find Opcode Caches (Modified 30 Oct 2009)
For an overview, see this Wikipedia article on PHP Accelerators.
Zend Platform: A commercial opcode
cache developed by the Zend Engine team. Very reliable and
robust. Visit zend.com
for more information.
You will need to test the following
open source opcode caches before using them on production
servers, as their performance and reliability very much depend
on the PHP scripts you run.
The eAccelerator is quite popular, and I am using it (Added 28 Feb 2005).
Alternative PHP Cache: pecl.php.net/apc. I believe that PHP6 will come with APC built in.
Caching: the Ultimate Speed Booster
One of the secrets of high performance is not to write faster PHP code, but
to avoid executing PHP code by caching generated HTML in a file or in shared
memory. The PHP script is only run once and the HTML is captured, and future
invocations of the script will load the cached HTML. If the data needs to be
updated regularly, an expiry value is set for the cached HTML. HTML caching
is not part of the PHP language or the Zend Engine; it is implemented using
PHP code. There are many class libraries that do this. One of them is the PEAR
Cache, which we will cover in the next section. Another is the Smarty
template library.
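The core idea can be sketched in plain PHP. The cache file name, the "front-page" key and the 10-minute expiry below are illustrative choices; a real library such as PEAR Cache manages keys and expiry for you:

```php
<?php
// Cache generated HTML in a file keyed by a hypothetical page name.
$cacheFile = sys_get_temp_dir() . '/' . md5('front-page') . '.html';
$ttl = 600;   // assumed expiry: 10 minutes

if (file_exists($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
    $html = file_get_contents($cacheFile);   // cache hit: no PHP work done
} else {
    ob_start();                              // cache miss: run the real script
    echo "<p>expensively generated page</p>";
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html);    // capture for later requests
}
echo $html;
```

On the second and subsequent requests within the expiry window, the script body never runs; the saved HTML is served straight from disk.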
Finally, the HTML sent to a web client can be compressed. This is enabled by
placing the following code at the beginning of your PHP script:
<?php
ob_start("ob_gzhandler");
# ... the rest of your script ...
?>
If your HTML is
highly compressible, it is possible to reduce the size of your HTML
file by 50-80%, reducing network bandwidth requirements and latencies.
The downside is that you need to have some CPU power to spare for
compression.
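The potential saving is easy to demonstrate with zlib, which is what ob_gzhandler uses under the hood. The repetitive sample markup below is illustrative; real pages compress less or more depending on content:

```php
<?php
// Compress a block of repetitive HTML, as ob_gzhandler would.
$html = str_repeat("<tr><td>some table row</td></tr>\n", 500);
$gz = gzencode($html);
printf("%d bytes -> %d bytes compressed\n", strlen($html), strlen($gz));
```

Markup-heavy pages are the best case, since tags repeat constantly; already-compressed payloads such as images gain nothing and should not be run through the handler.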
HTML Caching with PEAR Cache
The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.
The most common use of the PEAR Cache is to cache HTML text. To do
this, we use the Output buffering class which caches all text printed
or echoed between the start() and end() functions:
require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/"));

if ($contents = $cache->start(md5("this is a unique key!"))) {
    #
    # aha, cached data returned
    #
    print $contents;
    print "<p>Cache Hit</p>";
} else {
    #
    # no cached data, or cache expired
    #
    print "<p>Don't leave home without it…</p>";  # placed in cache
    print "<p>Stand and deliver</p>";             # placed in cache
    print $cache->end(10);
}
Since I wrote these lines, a superior PEAR caching system has been developed: Cache_Lite.
The Cache
constructor takes the storage driver to use as the first parameter.
File, database and shared memory storage drivers are available; see the
pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that
the "file" storage driver offers the best performance. The second
parameter is the storage driver options. The options are "cache_dir",
the location of the caching directory, and "filename_prefix", which is
the prefix to use for all cached files. Strangely enough, cache expiry
times are not set in the options parameter.
To cache some data, you
generate a unique id for the cached data using a key. In the above
example, we used md5("this is a unique key!").
The start() function uses
the key to find a cached copy of the contents. If the contents are not
cached, an empty string is returned by start(), and all future echo()
and print() statements will be buffered in the output cache, until
end() is called.
The end() function returns
the contents of the buffer, and ends output buffering. The end()
function takes as its first parameter the expiry time of the cache.
This parameter can be the seconds to cache the data, or a Unix integer
timestamp giving the date and time to expire the data, or zero to
default to 24 hours.
Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:
<?php
require_once("Cache.php");
$cache = new Cache("file", array("cache_dir" => "cache/"));
$id = $cache->generateID("this is a unique key");