
Adding more nodes to the cluster

My Raspberry Pi server cluster has been running for three months, and is now serving 45,000 page views per month. The amount of traffic reaching my site is increasing all the time, and occasionally there are large spikes in traffic from social networking sites.

As the load on the server increases, it's important to make sure it has enough capacity, so I decided to increase its computing power by adding four more Raspberry Pi server nodes to the cluster.

I built two new racks, each holding four Raspberry Pi servers. I daisy-chained two ethernet switches together, with four Pi servers connected to each switch.

I cloned an SD card from one of the existing Pi nodes and set up four almost identical SD cards. The only difference between them is the static IP address in /etc/network/interfaces. I took a backup of my site from the dashboard of my CMS, and used a script to synchronize the content to all of the worker nodes.
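The sync script is nothing elaborate; a sketch of the approach looks like this, assuming the site content lives under /var/www and that each node accepts ssh logins as the pi user (both assumptions for illustration; the worker IP addresses match the BalancerMember entries shown below):

#!/bin/bash
# Push the latest site content from this machine to every worker node.
# /var/www and the pi user are assumptions, not necessarily my exact setup.
for ip in 192.168.1.{2..9}; do
    rsync -az --delete /var/www/ "pi@${ip}:/var/www/"
done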

The next step was to modify the load balancer settings to start using the new nodes. On the load balancer, /etc/apache2/sites-available/default needed to be updated to include the new nodes in the cluster declaration:

<Proxy balancer://rpicluster>
    BalancerMember http://192.168.1.2:80
    BalancerMember http://192.168.1.3:80
    BalancerMember http://192.168.1.4:80
    BalancerMember http://192.168.1.5:80
    BalancerMember http://192.168.1.6:80
    BalancerMember http://192.168.1.7:80
    BalancerMember http://192.168.1.8:80
    BalancerMember http://192.168.1.9:80
    AllowOverride None
    Order allow,deny
    allow from all
    ProxySet lbmethod=byrequests
</Proxy>

When I finished making changes, I ran this command to load them into Apache:

$ sudo /etc/init.d/apache2 reload

This tells Apache to reload its configuration files without restarting. I went to the balancer manager interface at 192.168.0.3/balancer-manager to make sure the new nodes had been added.
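The balancer manager is exposed by a handler block in the Apache config. On an Apache 2.2-style setup like this one it looks roughly like the following; the Allow rule here is an assumption and should be adjusted to match your own network:

<Location /balancer-manager>
    # Serve the mod_proxy_balancer status page at /balancer-manager.
    SetHandler balancer-manager
    # Restrict access to the local network; adjust as needed.
    Order allow,deny
    Allow from 192.168.0.0/24
</Location>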

Testing

I tested the new eight-node Pi cluster using the same tests I ran when I was only using four Pi servers. First, I used siege to generate 200 concurrent requests over a minute:

$ siege -d1 -c200 -t1m http://192.168.0.3/specs.html
Lifting the server siege...      done.
Transactions:                  23492 hits
Availability:                 100.00 %
Elapsed time:                  59.81 secs
Data transferred:              48.93 MB
Response time:                  0.01 secs
Transaction rate:             392.78 trans/sec
Throughput:                     0.82 MB/sec
Concurrency:                    3.81
Successful transactions:       23492
Failed transactions:               0
Longest transaction:            0.63
Shortest transaction:           0.00

The result is very similar to the result for the same test on the cluster with just four nodes (see 'Testing the whole cluster' in Improving cluster performance by tuning Apache).

Next, I ran siege with 800 concurrent requests:

Lifting the server siege...      done.
Transactions:                  76510 hits
Availability:                 100.00 %
Elapsed time:                  59.76 secs
Data transferred:             159.39 MB
Response time:                  0.12 secs
Transaction rate:            1280.29 trans/sec
Throughput:                     2.67 MB/sec
Concurrency:                  148.45
Successful transactions:       76510
Failed transactions:               0
Longest transaction:           13.04
Shortest transaction:           0.00

The longest transaction time has increased, but the throughput and the number of transactions per second have also increased.

Performance with a low number of concurrent requests hasn't really changed, but performance has improved for a larger number of concurrent requests. This is to be expected: adding more nodes doesn't make a cluster faster; it increases the cluster's capacity.

I was surprised that the time for the longest transaction increased. Most requests completed in 20ms or less, and unfortunately siege doesn't record the average response time.
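Apache Bench does report the mean and maximum per-request times, so I ran it against the balancer. A run along these lines produces those figures (the request count and concurrency here are illustrative, not necessarily the exact values I used):

$ ab -n 10000 -c 200 http://192.168.0.3/specs.html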

Tests with Apache Bench, ab, show that the longest transaction time was 5.111 seconds, while the mean transaction time was 0.214 seconds, so an increase in the longest transaction time doesn't mean that overall performance is worse, but it is cause for concern. I logged into the load balancer using ssh, re-ran the tests on my cluster, and then ran the uptime command followed by free on the load balancer:

$ uptime
 13:30:04 up 1 day, 5:20, 1 user, load average: 9.27, 3.52, 1.33
$ free
             total       used       free     shared    buffers     cached
Mem:        498268     487188      11080          0      37328     299752
-/+ buffers/cache:     150108     348160
Swap:       513020         20     513000

The load average figures for the load balancer are much higher than for the servers. The load balancer only has 11MB of RAM left and has started to use swap. When web servers start to use swap space, they slow down dramatically, so I need to look into the performance and memory usage of my load balancer. Using siege to test with 800 concurrent users is testing for the worst case. At the moment my site isn't getting that much traffic, so the performance issues with the load balancer aren't an immediate problem, but it's something I need to look at.
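While I investigate, a simple way to keep an eye on the balancer's memory and load during a test run is to leave something like this running in an ssh session on the load balancer (just a convenience, not part of the original setup):

$ watch -n 5 'uptime; free -m'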

I still don't know how much traffic this system can actually handle because serving real traffic is not the same as testing with siege. I do know that my server can handle at least 45,000 page views a month, and probably a lot more now that I have added more nodes.

