Archive for June, 2012

Thus far, being able to handle ~70 connections a second isn’t bad, but can we do better?

Yes.

Let’s continue where we left off.

  1. From your WordPress dashboard, select Plugins. This time we’re going to install the WordPress Nginx proxy cache integrator. As before, search and install.
  2. Now edit /etc/nginx/nginx.conf and add
            proxy_cache_path /dev/shm/nginx levels=1:2 keys_zone=czone:16m max_size=32m inactive=10m;
            proxy_temp_path /dev/shm/nginx/tmp;
            proxy_cache_key "$scheme$host$request_uri";
            proxy_cache_valid 200 302 10m;

    directly after the line that has server_names_hash_bucket_size 64;

  3. edit /etc/hosts modifying the entry for localhost
    127.0.0.1 localhost backend frontend
  4. Now edit /etc/nginx/conf.d/default.conf, and change everything before
    # BEGIN W3TC Page Cache cache

    to

    server {
        server_name frontend;
        location / {
            proxy_pass http://backend:8080;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_cache czone;
        }
    }

    server {
        server_name backend;
        root /var/www/;
        listen 8080;
        index index.html index.htm index.php;

        include conf.d/drop;

        location / {
            # This is cool because no PHP is touched for static content
            try_files $uri $uri/ /index.php?q=$uri&$args;
        }

        location ~ \.php$ {
            fastcgi_buffers 8 256k;
            fastcgi_buffer_size 128k;
            fastcgi_intercept_errors on;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass unix:/dev/shm/php-fpm-www.sock;
        }
  5. Restart services with
    service php5-fpm restart
    service nginx restart
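As an aside, here’s how nginx maps the proxy_cache_key above onto files under /dev/shm/nginx: each cached response is stored in a file named after the MD5 of the key, and with levels=1:2 the trailing hex characters of that hash become the subdirectory names. A small Python sketch (the helper name is mine, not anything nginx provides):

```python
import hashlib

def cache_file_path(scheme, host, uri, base="/dev/shm/nginx"):
    """Mimic where nginx places a cached response given
    proxy_cache_key "$scheme$host$request_uri" and levels=1:2.
    nginx names the cache file after the MD5 hex digest of the key
    and builds directory levels from its trailing characters."""
    key = f"{scheme}{host}{uri}"
    digest = hashlib.md5(key.encode()).hexdigest()
    level1 = digest[-1]        # last hex char -> first directory level
    level2 = digest[-3:-1]     # preceding two chars -> second level
    return f"{base}/{level1}/{level2}/{digest}"

print(cache_file_path("http", "frontend", "/"))
```

Watching files appear under /dev/shm/nginx after a few requests is a quick way to confirm the cache is actually being populated.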

Now we’re ready for more speed tests with ApacheBench.

That’s quite a speed-up. This configuration handles about 200 connections a second before performance starts to drop off. Even at 300 connections per second the system still services requests faster than they arrive, though the latency of each request is starting to build. At 400, the system processes requests only just as fast as they come in.
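The arrival-versus-service reasoning above can be sketched as a toy queue model (the function and the ~400 req/s saturation figure are illustrative, taken from the numbers in this paragraph, not measured constants):

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """If requests arrive faster than they are serviced, the unserved
    backlog (and with it, per-request latency) grows linearly; at or
    below the service rate, the server keeps up and backlog stays zero."""
    return max(0.0, (arrival_rate - service_rate) * seconds)

# Treating ~400 req/s as the saturation point observed above:
print(backlog_after(10, 300, 400))  # keeps up: 0.0
print(backlog_after(10, 500, 400))  # falls behind: 1000.0 queued requests
```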

Here’s a dstat trace from a run with 400 concurrent connections. There’s an interesting behavior that occasionally shows up here.

root@linaro-server:/etc/nginx/conf.d# dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  1   0  97   1   0   0|3069B   21k|   0     0 |   0     0 | 144    66 
  0   1  99   0   0   0|   0     0 | 150B 1082B|   0     0 | 207    44 
  0   0 100   0   0   0|   0     0 |  52B  366B|   0     0 | 106    56 
  0   0 100   0   0   0|   0     0 | 104B  492B|   0     0 |  96    50 
  0   0 100   0   0   0|   0     0 | 104B  732B|   0     0 | 104    54 
  3   0  88   1   0   8|   0    40k| 101k   43k|   0     0 | 639    76 
 48  33   0   0   0  19|   0     0 | 196k  375k|   0     0 |2578   705 
 47  33   0   0   0  21|   0     0 | 306k  539k|   0     0 |3391   593 
 45  33   0   0   0  22|   0     0 | 404k 4112k|   0     0 |4545   696 
 63  16   0   0   0  22|   0     0 | 458k 7742k|   0     0 |6007  1656 
 96   4   0   0   0   0|   0     0 |2600B   66k|   0     0 | 525   760 
 94   6   0   0   0   0|   0    48k|3894B   98k|   0     0 | 605   886 
 94   6   0   0   0   0|   0    24k|3940B   98k|   0     0 | 591   809 
 93   7   0   0   0   0|   0    16k|3675B   87k|   0     0 | 564   833 
 97   3   0   0   0   0|   0     0 |3062B   76k|   0     0 | 565   829 
 96   3   0   0   0   0|   0     0 |3432B   87k|   0     0 | 547   828 
 97   3   0   0   0   0|   0    24k|3796B   98k|   0     0 | 561   843 
 96   4   0   0   0   0|   0     0 |3016B   76k|   0     0 | 543   855 
 95   5   0   0   0   0|   0     0 |4078B   98k|   0     0 | 574   809 
 97   3   0   0   0   1|   0     0 |3227B   76k|   0     0 | 563   802 
 96   4   0   0   0   0|   0     0 |3848B   98k|   0     0 | 573   839 
 97   3   0   0   0   0|   0    24k|2600B   66k|   0     0 | 554   750 
 94   5   0   0   0   1|   0     0 |3848B   98k|   0     0 | 581   881 
 97   3   0   0   0   0|   0     0 |3016B   76k|   0     0 | 567   791 
 95   5   0   0   0   0|   0     0 |3432B   87k|   0     0 | 564   840 
 97   3   0   0   0   0|   0     0 |3940B   98k|   0     0 | 567   782 
 38   2  59   0   0   1|   0    48k|  18k   99k|   0     0 | 559   216 
  0   1  99   0   0   0|   0     0 |  52B  366B|   0     0 | 153    62 
  0   1  99   0   0   0|   0     0 | 104B  492B|   0     0 | 112    48 
  0   1  99   0   0   0|   0     0 |  52B  366B|   0     0 | 138    61 
  0   1  99   0   0   0|   0     0 |  52B  366B|   0     0 | 101    46 
  0   0 100   0   0   0|   0     0 |  52B  366B|   0     0 | 110    47 
  0   0 100   0   0   0|   0     0 |  52B  366B|   0     0 | 110    55

These last runs at 400 concurrent connections seem to stick at the end, waiting for a few last requests. We can see this in the ab report.

Server Software:        nginx/1.2.1
Server Hostname:        linaro-server.testdomain.com
Server Port:            80

Document Path:          /
Document Length:        172 bytes

Concurrency Level:      400
Time taken for tests:   20.397 seconds
Complete requests:      3000
Failed requests:        1175
   (Connect: 0, Receive: 0, Length: 1175, Exceptions: 0)
Write errors:           0
Non-2xx responses:      1825
Total transferred:      12837398 bytes
HTML transferred:       12277473 bytes
Requests per second:    147.08 [#/sec] (mean)
Time per request:       2719.560 [ms] (mean)
Time per request:       6.799 [ms] (mean, across all concurrent requests)
Transfer rate:          614.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   31 105.1      1    1076
Processing:     1  842 2772.7     82   20243
Waiting:        1  842 2772.6     81   20242
Total:          2  873 2771.4    146   20264

Percentage of the requests served within a certain time (ms)
  50%    146
  66%    194
  75%    330
  80%    347
  90%   2003
  95%   2206
  98%  13677
  99%  17262
 100%  20264 (longest request)
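The headline numbers in an ab report are all derived from the raw counts; a quick Python check (values copied from the report above, formulas matching how ab defines its summary lines) confirms how they relate:

```python
# Raw values from the ab report
time_taken = 20.397            # "Time taken for tests", seconds
complete_requests = 3000       # "Complete requests"
concurrency = 400              # "Concurrency Level"
total_transferred = 12837398   # "Total transferred", bytes

req_per_sec = complete_requests / time_taken                        # "Requests per second"
time_per_req = concurrency * time_taken / complete_requests * 1e3   # "Time per request" (mean), ms
time_per_req_all = time_taken / complete_requests * 1e3             # mean across all concurrent, ms
transfer_rate = total_transferred / 1024 / time_taken               # KB/s received

print(round(req_per_sec, 2))       # 147.08
print(round(time_per_req_all, 3))  # 6.799
```

Note that the 2719 ms mean "Time per request" is just the across-all-concurrent figure multiplied by the concurrency level.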

It doesn’t always stick, but I sense there’s a software issue here.

Regardless, for a very cheap ARM-based server this seems good. It would be worthwhile to compare against an Intel setup.

Does this apply to real-world needs? Well, the other day I tweeted to @gruber asking what his daringfireball stats were, and he was kind enough to respond with a set of 24-hour stats from his site, which is one of the more popular blogs out there. I don’t know whether he uses WordPress, but regardless, he responded with this:

Presuming an even distribution of requests within an hour, which lets us divide by 3600, the busiest hour’s 15815 requests yield ~4.4 requests a second. It seems this setup would be able to handle that.
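The arithmetic, spelled out:

```python
# Busiest-hour figure from the stats above
requests = 15815
per_second = requests / 3600   # assuming an even distribution across the hour
print(round(per_second, 1))    # 4.4
```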

Update #1: Doing a little digging around last night, I discovered that in the 400-concurrent-request range the kernel is throwing page allocation failures even though it’s not really out of memory. I’ve posted a question to the linaro-dev list about this, to which Andy Green mentioned it might be related to a problem observed on the Raspberry Pi that seems to involve the ethernet driver. Stay tuned.

Graph : Plain vs W3 Total Cache WordPress perf on ARM

This graph shows successive numbers of concurrent connections per second attempted against the rate actually maintained. ab, ApacheBench, is used to drive traffic over a wired gigabit ethernet network. Plain WordPress on ARM handles 8.06 connections per second when dispatching 10 connections per second; at that rate you can see the server is already falling behind, and adding more traffic hastens the point of failure in the form of dropped connections. Turning on W3 Total Cache, we’re able to service 70 connections per second. Once the number of concurrent connections per second goes above 70, the server starts to fall behind and the time to service requests goes up. Within the test’s time period, ramping up to 130 connections per second still works, as long as the wait time does not get so long that it results in a dropped connection. Above 130, the wait time becomes so long that connections start to drop.

Updated: links to the new home for the Linaro-based server image and nginx armhf debs.

In science, being able to reproduce results outside the lab is essential. I thought I would try to reproduce the performance results of this blog post about a high-performance WordPress server using an ARM device. I’ve made updates based on Linaro images, and have prebuilt armhf debs for nginx along with a few setup changes.

In my case I’m using a Panda ES, which is of course a dual-core Cortex-A9 OMAP4460 with 1 GB of RAM.

Let’s get started.

  1. Download the lnmp-server image from here.
  2. Boot the image
  3. apt-get update
  4. apt-get install mysql-server (be sure to set the server password and remember it!)
  5. Download nginx-common_1.2.1-0ubuntu0ppa1~precise_all.deb and nginx-full_1.2.1-0ubuntu0ppa1~precise_armhf.deb from here.
  6. dpkg -i nginx-common_1.2.1-0ubuntu0ppa1~precise_all.deb nginx-full_1.2.1-0ubuntu0ppa1~precise_armhf.deb
  7. apt-get install -f  (this will pull in various deps that nginx needs)
  8. mysql -u root -p
  9. Enter CREATE DATABASE wordpress;
    GRANT ALL PRIVILEGES ON wordpress.* TO 'wp_user'@'localhost' IDENTIFIED BY 'ENTER_A_PASSWORD';
    FLUSH PRIVILEGES;
    EXIT
  10. apt-get install php5-fpm php-pear php5-common php5-mysql php-apc
  11. edit /etc/php5/fpm/php.ini
  12. add to the bottom
    [apc]
    apc.write_lock = 1
    apc.slam_defense = 0
  13. edit /etc/php5/fpm/pool.d/www.conf
  14. replace
    listen = 127.0.0.1:9000

    with

    listen = /dev/shm/php-fpm-www.sock
  15. and then add
    listen.owner = www-data
    listen.group = www-data
    listen.mode = 0660
  16. edit /etc/nginx/nginx.conf
  17. In the http section add
    port_in_redirect off;
  18. find
    # server_names_hash_bucket_size 64;

    change to

    server_names_hash_bucket_size 64;
  19. edit /etc/nginx/conf.d/drop and place the contents of this link into the file
  20. edit /etc/nginx/conf.d/default.conf and place the contents of this link into the file
  21. Within the same file, change all instances of domainname.com to your appropriate domain.
  22. mkdir -p /var/www/
    chmod 775 /var/www
  23. service nginx start
  24. service php5-fpm restart
  25. cd /tmp
    wget http://wordpress.org/latest.tar.gz
    tar xfz latest.tar.gz
    mv wordpress/* /var/www/
  26. cp /var/www/wp-config-sample.php /var/www/wp-config.php
  27. edit /var/www/wp-config.php
  28. Visit this link and replace the fields in the file with the values produced by the web page
  29. In the same file replace the following
    define('DB_NAME', 'wordpress');
    
    define('DB_USER', 'wp_user');
    
    define('DB_PASSWORD', 'whatever you entered for a password');
  30. Visit http://www.yourdomainname.com/wp-admin/install.php
  31. Fill in the fields appropriately
  32. Afterwards, log in
  33. Go to settings -> permalinks, select custom structure and enter
    /%post_id%/%postname%

    Then hit “Save Changes”
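Step 28 above pulls unique secret keys from a generator web page. If that page is unreachable, equivalent random keys can be produced locally, since each define() in wp-config.php just needs a long random string. A hedged sketch (the constant names are the standard wp-config.php ones; the helper function is hypothetical):

```python
import secrets
import string

# The eight standard wp-config.php secret-key constants.
KEYS = ("AUTH_KEY", "SECURE_AUTH_KEY", "LOGGED_IN_KEY", "NONCE_KEY",
        "AUTH_SALT", "SECURE_AUTH_SALT", "LOGGED_IN_SALT", "NONCE_SALT")

# Apostrophe deliberately excluded so the PHP string literals stay valid.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*()-_[]{}<>~`+=,.;:/?|"

def wp_salt_lines():
    """Emit define() lines suitable for pasting into wp-config.php,
    each with a 64-character cryptographically random value."""
    return "\n".join(
        "define('%s', '%s');"
        % (name, "".join(secrets.choice(ALPHABET) for _ in range(64)))
        for name in KEYS)

print(wp_salt_lines())
```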

Now we’re at the point where you can exercise the system: create a first post and do some testing with ab. I did so, and at this point found the numbers weren’t that great.

Time to start enabling caches. Go to the admin page, select Plugins, and then Install New Plugin. Search for the “W3 Total Cache” plugin and install it. After this completes, click Activate Plugin.

Now select the Performance menu on the left side. For all the entries, choose “PHP APC” wherever that option is offered. You’ll also need to specifically enable:

Database Cache
Object Cache

Save all settings, and then select deploy.

Again, at this point you can run ab and collect performance data. I can see from my data that things are much improved and replicating nicely. Data and pretty graphs tomorrow. But I’m far from done yet. Stay tuned!

nginx & Calxeda

Posted: June 22, 2012 in images, linaro, open_source, server

In progress I have an update to the Linaro based server image I’ve created. It fixes a couple of notable bugs.

  1. linaro-media-create would fail due to an already installed kernel.
  2. openssh-server is removed for now – while the package was installed, the host keys were never generated, so unless you knew to manually generate your keys, slogin and friends would fail in unhelpful ways.

Besides the update to this LAMP image, I’ve created another image which replaces apache with nginx. Never heard of nginx? Read more about it here.

Also of note, Calxeda has posted ApacheBench numbers using their new chips. Those can be found here.

The new LAMP server image is located here. It is as yet untested.

The nginx-based image isn’t complete.

Today I went looking for the Linaro server image on snapshots.linaro.org that I had put together last year and, well… umm… err… yeah, not there. Now don’t get upset. Maintaining lots of different reference images takes time, effort and resources. It’s cool.

Rolling up my sleeves, I took the time to create a version of the past server image using armhf with precise. I’ve test-booted it on my Panda boards. It works. Tomorrow I’m intending to run ApacheBench against it.

The live-build config is located at : lp:~tom-gall/linaro/live-helper.config.precise.server

I’ve cross-built this on my Intel box with the a45 version of Linaro’s live-build. You can too.

If you’ve never cross built an image before you can find instructions here.

Or if I have a little time tomorrow I’ll post the image somewhere so you don’t have to rebuild it.

Updated 6/19 : Server image can be downloaded from here.

ARM Server Performance

Posted: June 15, 2012 in linaro, server

One thing I’ve been giving some thought to lately is just how well ARM hardware can stand up when used as a server. Take current Cortex-A9 hardware and do some comparisons… well, I’m glad to say others are thinking about it too. Here are a couple of links that I think are worth your time if you have an interest in this.

I think it would be very interesting to apply some time and effort on server app performance on ARM Linux like what the Linaro Android team has done and see just how far we might be able to push the ARM performance envelope. Fun stuff.

Thoughts on WWDC 2012

Posted: June 8, 2012 in Uncategorized

Next week is Apple’s World Wide Developers Conference (WWDC). Everyone else is posting their predictions, so I will too.

  1. Apple replaces Google Maps with something else. No matter what it is and how it works, let’s just hope it works better than Ping!
  2. New Intel Macs? Yeah. HiDPI? Oh, I sure hope so. Apple likes to command a premium, and taking a step forward in display technology is certainly going to create some serious want in the hearts and minds of those with the cash to spend. USB 3, I would bet, will be there.
  3. Siri third-party access. This might be a little early yet; Siri isn’t even out of beta. What I think we’ll see is Siri 1.0, expanded to more hardware, probably just the iPad 3. Third-party access? I kinda doubt it.
  4. New iPhone hardware? Count me as a doubter. Even though the 4G LTE rumors about Moscone getting wired for it seem pretty real, I just don’t think we’ll see an iPhone revision announced until a little later, towards the fall, which would set the stage for an active fall/Christmas shopping season.
  5. More enterprise support in iOS. I’m sure Apple will continue to want to make inroads with business. I think we’ll see additions to iOS to support what enterprises want, security in particular. To me this means there will be some way to divide the owner’s stuff from the company’s apps and data placed onto the iOS device.
  6. Multi-user, sorta. I think we’re going to see something introduced that allows your iPad to be used by multiple members of the family, keeping preferences, login IDs, and app access differentiated by user.
  7. Apps on Apple TV, but not just any apps. It’s going to happen, but these will be apps that specifically deliver content. With $100+ billion in the bank, I wonder if Apple won’t throttle back on the percentage cut it takes on business through the iTunes store just for this angle of business for a new Apple TV. Will Apple be content with second-run content, or will it further sweeten the pot to try to attract new original content? That’ll be interesting. I don’t think there will be DVR functionality.
  8. OS X.8
  9. I think the way apps communicate back and forth in iOS is about to change. I think we’re going to see something like the NeXTSTEP services of yore implemented, allowing apps to work together. If it happens, this will be big. Just as one can string unix commands together with stdin/stdout and the magic of the pipe, so too this would allow one app to call on other apps on the iOS device to perform specific functions.
  10. I wonder if there won’t be a push to showcase business solutions. Microsoft Office for iOS on stage? I wouldn’t be surprised. Further, I wouldn’t be surprised if other enterprise solutions join it there, just to reinforce the point that iOS devices belong not only in the hands of consumers.

This month is going to be fun. After all, Google IO is just a few weeks away.