
The "Grand National" Effect - dealing with traffic spikes

The Grand National is one of the largest events in the UK horse racing calendar, and that means hundreds of millions of pounds at stake for the online bookmakers and gambling sites.

It also means a major headache for the web operations teams as they deal with the inevitable traffic spikes hammering their sites, at a time when availability and performance directly equate to money won or lost.

Site Confidence monitored the home pages of ten of the top-ranking online gambling sites on Grand National Saturday, and some fascinating patterns emerged in the lead-up to the race (16:15 BST).

Firstly, let's look at the Contenders…

image 

You can immediately see a range of availability and performance, with some of the traditional “bricks and mortar” bookmakers considerably slower than the pure-play dotcoms like “DotCom #1” and Bet365.

Option 1 – Do nothing… and #fail

image

The first option is the “do nothing and hope for the best” approach, as seen in the graph above for William Hill, a traditional “High Street” betting shop.

There were no changes to the page size (light grey), and the page download speed was highly erratic, giving their customers a poor user experience… and presumably significant revenue leakage.

Option 2 – Peak Capacity

image

Bet365 appear to have weathered the storm without a blip – rock solid response time across the critical period (albeit with a “light” home page anyway).

“DotCom #1” also managed to keep performance fast (averaging around 2.5 s, with some ~5 s spikes) whilst still serving up a “full sized” page.

Presumably both of these “pure play” dotcom sites have a capacity planning model that has sized their infrastructure and application performance at a sufficiently high level to be able to handle the “peaks”, albeit potentially at the price of carrying excess capacity during “average” periods.

image
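To put a rough shape on that trade-off, here is a minimal sketch of the arithmetic behind sizing for the peak. Every number in it (request rates, per-server throughput, headroom) is a hypothetical assumption for illustration, not a measurement from any of the sites above, and `servers_needed` is just an illustrative helper.

```python
import math

def servers_needed(requests_per_sec: float, per_server_rps: float,
                   headroom: float = 0.25) -> int:
    """Servers required to carry a load with a safety margin on top."""
    return math.ceil(requests_per_sec * (1 + headroom) / per_server_rps)

# Hypothetical figures, for illustration only.
average_rps = 400.0      # a "normal" Saturday
peak_rps = 2000.0        # the minutes before the race
per_server_rps = 100.0   # what one web server can sustain

peak_fleet = servers_needed(peak_rps, per_server_rps)
average_fleet = servers_needed(average_rps, per_server_rps)

# Share of a peak-sized fleet that is surplus during "average" periods.
idle_fraction = 1 - average_fleet / peak_fleet

print(f"Sized for peak: {peak_fleet} servers")
print(f"Needed on average: {average_fleet} servers")
print(f"Surplus outside the peak: {idle_fraction:.0%}")
```

Under these made-up numbers most of the fleet is surplus for most of the year, which is the price of never having a bad Grand National.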

Option 3 – Lighten up to speed up

image

A number of sites took a middle line: reduce the page weight and serve a “stripped down” version of the site that needs less capacity per visitor. “High Street #3” (above) stripped their site from ~300 KB down to a super lean 70 KB page.

Blue Square took the same approach, cutting from ~500 KB to ~300 KB… but still suffered some performance hiccups.

image
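As a back-of-the-envelope illustration of why a lighter page stretches the same capacity further, the sketch below assumes outbound bandwidth is the only bottleneck (in practice CPU, database and connection limits matter too). The 1 Gbit/s link is a hypothetical figure; the 300 KB and 70 KB page weights are the approximate “High Street #3” numbers quoted above.

```python
def pages_per_second(link_mbps: float, page_kb: float) -> float:
    """Upper bound on page views/sec if outbound bandwidth is the only limit."""
    bytes_per_sec = link_mbps * 1_000_000 / 8
    return bytes_per_sec / (page_kb * 1000)

link_mbps = 1000.0  # hypothetical 1 Gbit/s of outbound bandwidth

full = pages_per_second(link_mbps, 300)   # normal page, ~300 KB
light = pages_per_second(link_mbps, 70)   # stripped-down page, ~70 KB

print(f"Full page:  ~{full:.0f} page views/sec")
print(f"Light page: ~{light:.0f} page views/sec")
print(f"Gain: {light / full:.1f}x on the same link")
```

The exact figures are not the point; the ratio is. Cutting the page weight to under a quarter buys roughly four times the page views from the same pipe.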

Why good deployment processes count…

One other thing to note… whilst Option 3 is a good balance between “losing money when your site fails” and “losing money by carrying capacity you don’t need”, it does highlight the need for good deployment processes… because otherwise you can end up with an inconsistent site (page size) across your web farm.

In the two graphs below you see a pattern of differing page sizes (presumably as the monitoring agents hit different servers in the server farm)… normally a sure sign of a site that has not been deployed consistently across the web farm!

Interestingly, the first (“High Street #1”) suffered the problem when they rolled out the “light” version, and the second (“High Street #2”) when they tried to roll back to the “normal” version.

Good reasons to review both the deployment plan AND the rollback plan…

image

image
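If you want to catch this pattern automatically rather than by eyeballing a graph, one possible approach is to bucket the page sizes your monitoring agents report and flag any window in which two or more distinct sizes keep turning up. The sketch below is an assumption about how you might do it, not how Site Confidence does it; the function names, the 25 KB bucket width, the 20% threshold and the sample data are all hypothetical.

```python
from collections import Counter

def size_buckets(sizes_kb, bucket_kb=25):
    """Group page-size samples into coarse buckets and count each one."""
    return Counter(round(s / bucket_kb) * bucket_kb for s in sizes_kb)

def looks_inconsistent(sizes_kb, bucket_kb=25, min_share=0.2):
    """True if two or more distinct page sizes each make up a meaningful
    share of the samples taken in the same monitoring window."""
    counts = size_buckets(sizes_kb, bucket_kb)
    total = sum(counts.values())
    significant = [b for b, n in counts.items() if n / total >= min_share]
    return len(significant) >= 2

# Hypothetical samples (KB) from consecutive monitoring checks.
mixed = [300, 72, 298, 70, 301, 71, 69, 299]   # agents hitting old and new servers
clean = [70, 71, 69, 72, 70, 71, 70, 69]       # every server on the light page

print(looks_inconsistent(mixed))
print(looks_inconsistent(clean))
```

Coarse buckets absorb the small natural variation in page size, although a size sitting right on a bucket boundary could be split in two, so the bucket width needs tuning to your own pages.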
