Monday, January 14, 2008

JANET, UKERNA & BT strike again

Almost exactly a year ago, we were attempting to do some demonstrations at BETT when, somewhere on the network, a card in a BT exchange became "unseated" (i.e. not sitting properly in its socket) and connections across the BucksGfL WAN went down. As UKERNA / JANET were at the BETT show, we were able to pester them about what was going on. As is the way with these things, the issue was passed to BT. What happened then was (to my mind) a Keystone Cops-esque tale of ineptitude: it took about three days to sort the problem out. Bear in mind that (as I understand it, and I'm happy to be corrected) this was a card - basically a big version of a network card in a server or PC - which hadn't been pushed in properly. At one stage, a BT engineer was (so I'm told) sent to the wrong exchange to fix the problem. On finding that all was well at said exchange, s/he went off home (understandably) and we were told that the problem had been fixed. Of course, anyone looking at a network connectivity map or trying to access a server or the internet could tell that it wasn't, and it got frustrating going back to the UKERNA / JANET stand and insisting that, no, it really, really wasn't fixed.
Well, this morning (actually, as of yesterday), a similar problem has arisen - almost exactly a year on. Our UKERNA feed went down on Sunday, so it's Monday morning and (I'm assuming) there's no internet access beyond the WAN, and no access to BucksGfL sites from outside the WAN. Here's a simple test for you to see if UKERNA / JANET / BT have got their act together: try visiting this web site and, if the link doesn't work, they haven't fixed it yet...
What spins my head around, though, is that this is almost exactly 12 months on from a very similar issue.
This is actually very pertinent in light of last week's comments by Fronter about Moodle not being reliable enough for a 24/7 operation. Our Moodle servers are sat there in County Hall, humming away (and they should be accessible from inside the WAN, i.e. schools should still be able to see them). Let's be honest: any server, whether it's Moodle, Fronter, Microsoft or whoever, is as vulnerable to this sort of issue as any other. The idea that spending a fortune on a VLE implementation will insulate you from it is a fallacy.
11:30am update - a conversation with someone on our internal BucksCC service desk reveals that the entire UKERNA service for half of the south-east is down. Speaking to someone on the schools' network reveals that BucksGfL services (and hence the Moodles) are still accessible to schools, as they are all inside the BucksGfL WAN and therefore insulated from this. Here's a quote from a JANET update:

Latest update from JANET
Loss of connectivity to several circuits in London (TT:20080113-1)
Update - 14/01/2008 09:40
BT have reported to us that [it] is an SDH fault, the fault has been passed to their SDH team for further investigation. Further updates will follow when available.

There then follows a long list of services / LAs / colleges / universities, marked green or red according to whether they're working or not. Watch this space, or go outside in the sunshine if you have it...

3.30pm update - still nothing, though the lack of contact I've had from schools (other than those who aren't on the BucksGfL) suggests that they can still access their Moodles.

9.15pm update - it's working! A quick attempt to browse to the Winslow Moodle on my phone reveals that everything's peachy now.

Final outcome: the outage lasted from lunchtime on Sunday until between 7 and 8pm on Monday. The cause of the problem was a failed ATM card in a BT circuit somewhere in London. Schools couldn't access the internet but could still work within the network, i.e. they could access their Moodles and send and receive internal emails.

1 comment:

  1. "the lack of contact I've had from schools ... suggests that they can still access their Moodles."

    As you pointed out there's no reason access across WAN connections should be an issue.

    But even if it were, I'm guessing most users within the schools don't bother contacting anyone about anything for the duration of such an outage, as they assume "everything's down" ;)

    I guess it's the poor level of service that schools without a business ADSL line (ya know, one with actual penalty clauses in the contracts, SLAs, and one that big business actually depends on and which therefore has 99.9% reliability) have to suffer. Which is why schools should go solo if they possibly can... right? Beh, if it's good enough for an investment bank and within my price range, then I don't see why not.