Getting a Kindle to connect to "eduroam"
I'll start this with the following warning:
This is nasty. It might destroy your Kindle permanently (although I think it's unlikely). It involves fiddling around with some fairly low-level stuff. Don't do it. If you don't understand some of the terms I have used in this post, then definitely don't do this.
That said:
You need to install the jailbreak and usbnetwork hacks for the Kindle. They are available from http://www.mobileread.com/forums/showthread.php?t=88004
Then you need to go somewhere where there is no "eduroam" available and create your OWN wireless network called eduroam (see, I told you it's nasty) using a WPA-PSK key. My Android phone came in pretty handy for this as I was able to create a temporary Wifi hotspot using the tether functionality.
Associate your Kindle with the temporary hotspot, then turn the hotspot off. You will be very unpopular if you run a hotspot called eduroam and my colleagues in the networking team find out.
Follow the instructions at http://frakira.fi.muni.cz/~antos/2010/12/15/kindle-and-eduroam/. You can get the CA certificate from http://www.terena.org/activities/tcs/repository/AddTrust_External_CA_Root.pem. Use the following for your wpa_config.sh script:
Again, you need to customise this script. If you don't understand what to change, you shouldn't be doing this.
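#!/bin/sh
# Give wpa_supplicant a few seconds to start before talking to it
sleep 6
# Find the id of the first configured network whose line mentions "eduroam"
id="`wpa_cli list_networks | grep eduroam | cut -f1 | sed -n '1p'`"
# Reconfigure that network for WPA-Enterprise (PEAP with MSCHAPv2) and enable it.
# Change the identity, password and ca_cert path to suit.
exec="`wpa_cli << EOF
set_network $id ssid \"eduroam\"
set_network $id scan_ssid 1
set_network $id key_mgmt WPA-EAP
set_network $id pairwise TKIP
set_network $id group TKIP
set_network $id eap PEAP
set_network $id identity \"username@sheffield.ac.uk\"
set_network $id password \"PASSWORD\"
set_network $id phase1 \"peaplabel=0\"
set_network $id phase2 \"auth=MSCHAPV2\"
set_network $id ca_cert \"where-ever-you-put-the-CA-certificate\"
enable_network $id
quit
EOF
`"
# Show what wpa_cli said, which is handy when debugging
echo "$exec"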
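The first line of that script depends on the output of wpa_cli list_networks, which (as far as I know) prints a header followed by one tab-separated line per network: id, SSID, BSSID and flags. If you want to check that part before letting it loose on the Kindle, here's a small sketch that pulls the lookup into a function and runs it against some sample output. The function name network_id_for_ssid is my own invention and the sample output is illustrative, not copied from a Kindle. It matches the SSID column exactly, because the plain grep eduroam in the script could also catch something like "eduroam-guest".

```shell
#!/bin/sh
# Hypothetical helper: print the id of the first network whose SSID
# exactly matches $1, reading `wpa_cli list_networks` output on stdin.
network_id_for_ssid() {
  # Split on tabs; column 1 is the id, column 2 the SSID
  awk -F '\t' -v ssid="$1" '$2 == ssid { print $1; exit }'
}

# Illustrative sample of list_networks output (columns are tab-separated)
sample_output() {
  printf 'network id / ssid / bssid / flags\n'
  printf '0\teduroam-guest\tany\t\n'
  printf '1\teduroam\tany\t[CURRENT]\n'
}

# On the Kindle this would be: wpa_cli list_networks | network_id_for_ssid eduroam
sample_output | network_id_for_ssid eduroam
```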
Restart your Kindle and hope for the best.
I went on holiday
I went on holiday on my motorbike to Valor in Spain and then back up via Valencia, through Toulouse and Cernay in France, then through Luxembourg to Zeebrugge in Belgium.
Here's some video of me riding over El Puerto de la Ragua in the Sierra Nevada.
In total I did around 2800 miles in 11 days.
I love it when a plan (almost) comes together
Last weekend a number of colleagues and I made some fairly major changes to the uSpace setup. Most of them should be unnoticeable to our customers, except that the service will now hopefully be a bit more reliable! There were four main components to this work:
- Upgrade the uSpace software from 4.0.14 to 4.5.5. This brings us up to date with the manufacturer's current release, which gives us a number of benefits, including the ability to properly cluster the front-end web servers (the servers that people actually connect to when they click on uSpace in MUSE or go to http://uspace.shef.ac.uk/).
- Enable clustering across a pair of web servers, rather than the single web server we had before. This means that if we need to update or restart the web servers, we should be able to keep uSpace downtime to a minimum.
- Move the uSpace software from physical to virtual servers running on VMware. This makes it easy for us to scale and manage the service.
- Move the uSpace binary content (e.g. Word documents) out of the database and into a filesystem. Again, this is a technical measure which makes the service easier to manage.
The upgrade process had been thoroughly tested before it was started, but like the best laid schemes of mice and men, it went a bit wrong whilst we were upgrading the live system. This required a slight, unplanned alteration to the upgrade process. It all worked out OK in the end.
There were a couple of user interface bugs which slipped through QA unnoticed, and these were corrected yesterday afternoon and evening. There was also a problem with some uploaded documents not appearing correctly in uSpace. Unfortunately, these documents will have to be uploaded again, but I believe the numbers are very small.
How to make me look like this…
When something is broken and you need help here are some key things that I need to know:
- What were you doing and when were you doing it? Ideally, a step-by-step list in chronological order of what actions you performed. Include things like commands you typed, URLs used, buttons clicked and widgets twizzled. I almost certainly need any usernames you used, but I will almost never need your password, so please don't send it!
- What were you expecting to happen?
- What actually happened? Make sure you include any error messages. Screenshots are OK if it's a graphical application, but if the error message is just in plain text, just send the text.
- When did it last work correctly? This morning? Last week? Never?
If I get all that information up front then I'll look as happy as this.
Andrew Beresford’s Blog 2010-12-14 16:50:40
I’ve been away
Now that I've finished doing stuff towards Oxjam Huddersfield, I've had a bit more spare time on my hands so I took a long weekend to go and visit some friends in France and Germany. Two of my friends live in Freiburg on the edge of the Black Forest so it was a great opportunity to take some pictures.
We went for a walk around a research arboretum and I managed to get a couple of good shots. At lower altitudes the leaves were amazing colours but if you went higher up they had already dropped from most of the trees.
We went to Burg Hohenzollern, the ancestral castle of the Hohenzollern family, who became German Emperors. My German is non-existent, so my friend, who works as a translator and interpreter, was translating for me. The tour guide kept having to ask people to keep quiet, though, so my friend was giving me the concise version as we moved between rooms in the castle. I don't think the tour guide was particularly annoyed by my friend translating for me; he was more annoyed by the people who seemed to be having a casual chat whilst he was trying to give his tour. I can certainly understand that. If it was my job, I wouldn't want to end up with a sore throat from having to talk over people.
What happened to our e-mail servers, or… when RAID arrays go bad
NB: I started writing this blog post back in September, a few days after the initial problems with the Staff and PGR e-mail service, so if it seems like I'm talking about "last week" when actually I'm referring to September, that's why!
NB: This blog post deals with the technical work we did to get the e-mail service working again. It doesn't discuss the decision-making process that we went through. A number of people were involved in planning the work.
I'm sure everyone knows we've had some problems with e-mail over the past few days. I thought I'd write a technical post of what happened and what we did to fix it.
The initial problem started on one of our mail servers, called "gazelle", at 21:02 on the 30th August. The first incident is recorded in the mail logs as:
2010-08-30 21:02:05 1OqAXR-0001j0-05 == <USER>@gazelle.shef.ac.uk R=cyrus_user T=cyrus_deliver defer (-44): LMTP error after RCPT TO:<USER@gazelle.shef.ac.uk>: 451 4.3.0 System I/O error
This error basically indicates a problem with the machine's ability to write the incoming e-mail out to its hard disks. The recipient's name has been changed to protect the innocent, although it happened to be one of our helpdesk staff!
The server also keeps more "low-level" logs which detail the specific problem:
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff000000000007d4d6366716900 (ssd8):
Aug 30 21:02:00 gazelle Error for Command: read(10) Error Level: Retryable
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Requested Block: 483039280 Error Block: 483039280
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number: 63667169-00
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff000000000007d4d6366716900 (ssd8):
Aug 30 21:02:00 gazelle Error for Command: read(10) Error Level: Fatal
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Requested Block: 483039280 Error Block: 483039280
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number: 63667169-00
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] Sense Key: Hardware Error
Aug 30 21:02:00 gazelle scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
The first error is marked as "Retryable", which means that the server can try to talk to the storage again. The second is marked as "Fatal", which means that the storage is now unavailable.
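If you want to see how often the storage has been complaining, the "Error Level:" lines are easy to tally. Here's a minimal sketch; on a live Solaris box you'd feed it the system log (usually /var/adm/messages, but check your syslog.conf), whereas here it reads the two error lines quoted above from a here-document so it can run anywhere. The function name count_error_levels is made up for illustration.

```shell
#!/bin/sh
# Count kernel SCSI errors by severity ("Retryable" vs "Fatal").
# On a live box: count_error_levels < /var/adm/messages
count_error_levels() {
  # Keep only the "Error Level:" lines, take the last word (the level),
  # then count how many times each level appears.
  grep 'Error Level:' | awk '{ print $NF }' | sort | uniq -c
}

count_error_levels <<'EOF'
Aug 30 21:02:00 gazelle Error for Command: read(10) Error Level: Retryable
Aug 30 21:02:00 gazelle Error for Command: read(10) Error Level: Fatal
EOF
```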
So, what happened?
gazelle has a RAID 5 disk array attached to it (if you're not sure what RAID is, the Wikipedia RAID article has a good description). This RAID array is the main storage for all staff e-mail and it is split into 4 "Logical Drives", each with 4 "Volumes" on them, making 16 in total.
RAID 5 is designed to let you put together a group of hard disks so that if any one of them fails, you can keep going with degraded performance. However, if you lose two or more drives from the same group, all the data across those drives is useless. The technical details of this are explained in the Wikipedia article about Parity.
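Since the parity maths is the whole reason one failed disk is survivable and two aren't, here's a toy version using XOR, which is what RAID 5 uses for parity. The numbers are made up and each one stands in for a whole disk block. Real arrays also rotate which disk holds the parity from stripe to stripe, but the arithmetic within a single stripe is the same; with 5 disks, each stripe is 4 data blocks plus 1 parity block.

```shell
#!/bin/sh
# Toy RAID 5 stripe: four data "blocks" (small integers standing in for
# disk blocks) plus one parity block computed with XOR.
d1=23; d2=42; d3=7; d4=99

# Parity is the XOR of all the data blocks in the stripe.
parity=$(( d1 ^ d2 ^ d3 ^ d4 ))

# Lose one disk (say d3): XOR the survivors with the parity to rebuild it.
rebuilt_d3=$(( d1 ^ d2 ^ d4 ^ parity ))
echo "rebuilt d3 = $rebuilt_d3 (original was $d3)"

# Lose two disks (d2 and d3): all we can work out is d2 XOR d3, which
# doesn't tell us either value on its own, so the stripe is gone.
d2_xor_d3=$(( d1 ^ d4 ^ parity ))
echo "with two disks gone we only know d2 ^ d3 = $d2_xor_d3"
```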
In our disk array, each of our Logical Drives is constructed of 5 hard disks. At some point prior to 9:00pm on Monday night, one of the disks in the array failed. When this happens, the array picks a spare disk and reconstructs the data off the failed disk using the parity algorithm. It started doing this automatically. Unfortunately, whilst rebuilding the spare disk, the array marked another disk in the array as failed. This had the effect of taking that entire Logical Drive offline and marking it as dead.
This single failed Logical Drive meant that the 4 volumes on that Logical Drive were now unavailable. That is the equivalent of 25% of the staff e-mail. Normally we would notice such a failure immediately through our automated monitoring, but because 75% of the e-mail was still working, the monitoring systems didn't notice.
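The lesson for monitoring is to check every volume individually rather than just asking whether "mail" works. Here's a rough sketch of that idea, not what our monitoring actually runs: check_volumes is a made-up name and the mount points in the usage comment are hypothetical. Bear in mind that caching can make a dead volume look fine for a little while, and an unmounted mount point is still an empty, listable directory, so a real check would want to look for something known to live on each volume.

```shell
#!/bin/sh
# Report any mail volume that can't be listed, instead of only checking
# that the mail service as a whole answers.
check_volumes() {
  failed=0
  for vol in "$@"; do
    # A volume on a dead logical drive should fail here with an I/O error
    # (or be missing entirely), even though the other volumes are fine.
    if ! ls "$vol" >/dev/null 2>&1; then
      echo "CRITICAL: $vol is not readable"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ] && echo "OK: all $# volumes readable"
  # Non-zero exit status if even one volume is unhappy
  [ "$failed" -eq 0 ]
}

# Example (hypothetical mount points):
# check_volumes /var/spool/imap/vol01 /var/spool/imap/vol02 /var/spool/imap/vol03
```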
When I arrived in the office, I already knew there must be a problem. My mobile had rung about 10 times on my way in to work and when I got in there were a number of the top brass in our office talking with my colleague.
The RAID array itself reported that two disks had failed. However, after a colleague and I pulled some disks out, restarted the controller and reinserted the disks, it claimed the volume was back in a "CRITICAL" (but repairable) state. At this point we attempted to rebuild the critical volume onto a spare disk. This process takes quite a while, up to 10 hours on this array, so we let it run and decided to keep the mail server unavailable whilst it was repairing. We could have re-enabled access, but given the fragile state the array was in, we left the service unavailable. With hindsight, this turned out to be a prudent move, as at about 3pm the array failed in the same way as before.
Up until this point, we were confident that we would be able to recover all the data from the failed array, but now the array had failed again, we decided to start the tape recovery. Tape backups are a very useful thing to have in these circumstances, but they suffer from a couple of flaws:
1. They are slow. Tape restore takes a very long time.
2. Restoring from tape means that we were guaranteed to lose data. Initially this sounds like a strange thing to say, but it is because there is a gap between where the last backup was taken and when the array failed. In that time, many people will have moved e-mail into folders, deleted mail, received new mail. Restoring from a previous tape backup means we lose all of that data.
After a bit of scratching of heads, I thought of another way to start recovering data from the failed array. We started doing that by about 5pm on the Tuesday evening and left it going overnight. That process finished at about 4am the next morning, and I had set my alarm clock so I could catch it and start a second rebuild of the array. I arrived in the office at about 7 and the process was still running, but, as before, it failed at around 11am. By this time we had managed to restore a quarter of the user accounts from the failed volumes onto some other storage, so we took the chance to bring the e-mail service back up for those users. We also kicked off the restore of the remaining affected accounts from tape.
As a last attempt to recover e-mail from the array, I removed some disks and slotted them back in and restarted the disk controllers. This brought the array back to the critical state. In this state it is possible to access the data, but it is not a good way to run the e-mail service. We knew that there would be some e-mails missing in the copies that we had recovered from tape, but if we could recover the e-mails from the failed storage array then we could compare the two and redeliver the missing emails to those accounts.
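The comparison step can be sketched with nothing more than sorted file lists and comm. This is not the tool our e-mail administrator wrote; it's an illustration that assumes both copies share the same mailbox layout, with each message stored as its own file (Cyrus names them after the message UID, e.g. "123.", as far as I recall). Anything in the copy rescued from the array that isn't in the tape restore is a candidate for redelivery. One big caveat: if new mail has already been delivered to the restored mailboxes, UIDs can clash, so a real tool would want to compare Message-ID headers rather than file names. The function name missing_from_restore is made up.

```shell
#!/bin/sh
# List message files that exist in the copy rescued from the failed array
# ($1) but not in the tape restore ($2). Both arguments are directories
# with the same mailbox layout; paths are printed relative to them.
missing_from_restore() {
  list_a=$(mktemp); list_b=$(mktemp)
  (cd "$1" && find . -type f | sort) > "$list_a"
  (cd "$2" && find . -type f | sort) > "$list_b"
  # comm -23: lines only in the first list, i.e. not in the tape restore
  comm -23 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}

# Example (hypothetical paths):
# missing_from_restore /rescue/vol03 /restored/vol03 > to-redeliver.txt
```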
The email server manager wrote a tool to work out which e-mails were missing whilst I worked on getting the data off the fragile array. In these circumstances, it is usually a good idea to use the simplest tool possible, so I chose one of the oldest and most reliable tools available: tar. With the array still in the critical state, we used tar to copy the files off the array and onto our NetApp filers. Over the weekend, our e-mail administrator wrote a tool to redeliver mails that were missing from the gap between the tape backup and when the array failed, and then the following week we ran it to redeliver the mails.
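For anyone who hasn't seen it, the classic way to copy a tree with tar is a pipe between two tar processes: one archives the source to stdout, the other unpacks from stdin in the destination. The sketch below wraps that in a function; the name tar_copy and the paths in the usage comment are hypothetical, and this is the general technique rather than the exact command line we used.

```shell
#!/bin/sh
# Copy a directory tree with a plain tar pipe: one tar reads the source,
# a second one unpacks into the destination.
tar_copy() {
  src=$1 dst=$2
  mkdir -p "$dst" || return 1
  # "f -" means use stdout (c) / stdin (x) as the archive; p keeps permissions.
  # Note the pipeline's exit status only reflects the extracting tar, so
  # watch for error messages from the first tar too.
  (cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)
}

# Example (hypothetical paths):
# tar_copy /failed-array/vol03 /netapp/rescue/vol03
```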
In the end, I don't think we lost any e-mail, although some were missing from inboxes for a few days whilst we recovered them. It's probably worth mentioning that we are currently going through a process to migrate all our e-mail to Google, so we shouldn't have a problem like this again. We'll have different ones instead!
Proof of Concept – accessing filestore from an iPad and other devices
We had an Apple iPad in the office the other day. I'm not a great fan of Apple's mobile devices myself, but the hardware is quite elegant, even if I can't work out why I would want one. Anyway, we had a conversation about how we could access some of the University's resources. One of the things we'd like to be able to do is access our University filestore (home and shared folders) from these types of devices, regardless of whether you are on or off campus.
Someone recommended the "GoodReader" client for accessing files on the iPad, so we downloaded a copy and tried it out. GoodReader supports a few different ways of accessing servers, including WebDAV. WebDAV is an extension to the well-known HTTP protocol which allows files to be uploaded and downloaded. Unfortunately our Novell filestores don't support WebDAV in a way that most devices understand (they use the "Microsoft" way of doing things, which no one else supports), so we can't talk to them directly. However, I discovered a piece of software called Davenport which allows us to translate between SMB (aka CIFS), which the filestores do support, and the WebDAV protocol.
Davenport took a little work to configure. There was a problem with the software that I managed to fix, relating to the way it handles file uploads. It requires the client to send a Content-Length header and not all do. Once that had been sorted, I managed to get it to work.
Enough blab... let me try it!
So, how do you get it to work? Right now there are a number of caveats:
- You can only access BAMFORD (I believe only CICS users are on BAMFORD, but I might be wrong!)
- You can only access it on the University network or via VPN (this may be changed, it would be great if we can provide off-campus access to filestore for mobile devices).
I would also like to make it clear that this is a proof of concept I knocked together in very short time. It may delete all your files, crash your device, kill your dog or all of the above. It is not an official CICS service and it could disappear indefinitely at any time. Having said that, I'm happy to receive any feedback, so feel free to comment on this post.
The URL for the WebDAV service is https://webdave.shef.ac.uk/bamford/ (don't try other names... they won't work!). It uses your normal university username and password for authentication.
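If you'd rather poke at it from a command line than a tablet, curl can speak enough WebDAV to list a folder and upload a file. A rough sketch follows; the function names are made up and I haven't tried these exact commands against the service. PROPFIND with a "Depth: 1" header is the standard WebDAV way of listing a collection, and uploading a regular file with --upload-file lets curl send a Content-Length header, which is what Davenport was fussy about.

```shell
#!/bin/sh
# Sketch of talking to the proof-of-concept WebDAV service with curl.
WEBDAV_URL=${WEBDAV_URL:-https://webdave.shef.ac.uk/bamford/}

# List a folder: PROPFIND reads WebDAV properties, and "Depth: 1" asks
# for the folder plus its immediate contents (the reply is XML).
webdav_list() {
  curl --silent --show-error --fail --user "$1" \
       --request PROPFIND --header 'Depth: 1' "$WEBDAV_URL"
}

# Upload a local file into the folder. Because the URL ends in "/",
# curl appends the local file name to it.
webdav_upload() {
  curl --silent --show-error --fail --user "$1" \
       --upload-file "$2" "$WEBDAV_URL"
}

# Usage (curl prompts for the password if you only give a username):
# webdav_list myusername
# webdav_upload myusername notes.txt
```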
Also, if any Android users want to try it, there are a couple of Android WebDAV clients out there. I tried PadersyncDAV and it worked OK (although it's not a particularly pleasant piece of software).
Installing the Perl DBD::Oracle on SPARC/Solaris 10
Installing the Perl DBD::Oracle driver on Solaris is traditionally a complete ballache. I hate most things about Oracle software packaging and installation instructions, so I thought I’d get my own instructions going.
So here goes;
Compilers
Start by ensuring you have Solaris 10, patched up to date. The first thing we need is a compiler, so take yourself off to http://developers.sun.com/sunstudio/ and get Sun Studio installed. Once you have done that, you should be able to run “version” and get something like this:
[root@cisapplive /]# version
Machine hardware: sun4u
OS version: 5.10
Processor type: sparc
Hardware: SUNW,SPARC-Enterprise
The following components are installed on your system:
Sun Studio 12 update 1
Sun Studio 12 update 1 C Compiler
Sun Studio 12 update 1 C++ Compiler
Sun Studio 12 update 1 Tools.h++ 7.1
Sun Studio 12 update 1 C++ Standard 64-bit Class Library
Sun Studio 12 update 1 Garbage Collector
Sun Studio 12 update 1 Fortran 95
Sun Studio 12 update 1 Debugging Tools (including dbx)
Sun Studio 12 update 1 IDE
Sun Studio 12 update 1 Performance Analyzer (including collect, ...)
Sun Studio 12 update 1 Performance Library
Sun Studio 12 update 1 Scalapack
Sun Studio 12 update 1 LockLint
Sun Studio 12 update 1 Building Software (including dmake)
Sun Studio 12 update 1 Documentation Set
Sun Studio 12 update 1 /usr symbolic links and GNOME menu item
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/cc": Sun C 5.10 SunOS_sparc 2009/06/03
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/CC": Sun C++ 5.10 SunOS_sparc 2009/06/03
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/f90": Sun Fortran 95 8.4 SunOS_sparc 2009/06/03
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/dbx": Sun DBX Debugger 7.7 SunOS_sparc 2009/06/03
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/analyzer": Sun Analyzer 7.7 SunOS_sparc 2009/06/03
version of "/opt/sunstudio12.1/bin/../prod/bin/../../bin/dmake": Sun Distributed Make 7.9 SunOS_sparc 2009/06/03
Oracle Instant Client
Get the 32-bit SPARC Instant Client from http://www.oracle.com/technology/software/tech/oci/instantclient/index.html. Get the 32-bit version even if your Solaris install is 64-bit (this applies on SPARC; it doesn’t apply on x86-64). Your Perl install is 32-bit (don’t believe me? Run file /usr/bin/perl to check) and you need the 32-bit drivers.
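If you want to script that check, the bit you care about is the "ELF 32-bit" or "ELF 64-bit" in the output of file. Here's a small sketch; the function name elf_bits is made up, and the sample line in the comment is roughly what Solaris file prints rather than a copy from this particular server.

```shell
#!/bin/sh
# Work out whether a binary is 32- or 64-bit from `file` output.
# Solaris `file` prints something like:
#   /usr/bin/perl:  ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
elf_bits() {
  case "$1" in
    *"ELF 64-bit"*) echo 64 ;;
    *"ELF 32-bit"*) echo 32 ;;
    *) echo unknown ;;
  esac
}

# On the server: elf_bits "$(file /usr/bin/perl)"
```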
You need the sqlplus, basic and sdk files from the Oracle site.
Extract the zip files into a directory. I put mine in /opt/instantclient. Then ensure your runtime linker knows where to find the libraries. You can use LD_LIBRARY_PATH, although I generally don’t like that method and prefer to use crle instead:
crle -l /lib:/usr/lib:/opt/instantclient
Updating CPAN
The default install of CPAN on Solaris 10 has loads of out-dated modules. Fortunately, it is very easy to update them to current versions. You can start the CPAN shell by running (as root) perl -MCPAN -e shell.
The very first time you use CPAN it will ask you a lot of config questions. One thing I would strongly suggest doing is using gtar instead of the default Solaris tar. The default tar appears to have been written about the same time as Noah built the Ark and as such doesn’t support quite a few of the options modern tar files use (including the CPAN tar file which gets downloaded). Once you have finished answering the questions, you should be presented with a CPAN shell, where you can run:
install Bundle::CPAN
This will update CPAN and all the related modules. Answer “yes” to any questions it gives you and once it has completely finished use reload cpan to get it to refresh itself.
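If you skipped the tar question the first time round, you can point CPAN at GNU tar afterwards from the CPAN shell with o conf tar followed by o conf commit. The sketch below just finds a gtar and prints those two commands for you to paste in. The candidate paths are assumptions: /usr/sfw/bin is where Sun ships GNU tar on Solaris 10, and the other two are typical for third-party packages, so check your own box. The function name first_executable is made up.

```shell
#!/bin/sh
# Print the first path in the argument list that is an executable file.
first_executable() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Print the CPAN shell commands to switch to GNU tar and save the config
if gtar=$(first_executable /usr/sfw/bin/gtar /usr/local/bin/gtar /opt/csw/bin/gtar); then
  printf 'o conf tar %s\no conf commit\n' "$gtar"
else
  echo "no gtar found in the usual places" >&2
fi
```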
Installing DBD::Oracle
This is the last step. At the CPAN shell, use get DBD::Oracle to download the Oracle DBD interface and then make DBD::Oracle to build it. I do these as separate steps because it is very likely that the make test phase of the install will fail. Assuming that the compile is successful, use force install DBD::Oracle to install the Oracle driver. The test suite is very thorough and goes as far as checking for connectivity to a test database and the ability to perform operations on it (which is why I said it is likely to fail). Once the install phase is finished, you are done.
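Because the tests were forced past, it's worth a quick check that the driver actually loads before writing any real code against it; loading DBD::Oracle pulls in the Instant Client shared libraries, so a runtime linker problem (see the crle step) tends to show up here. A small sketch: perl_module_loads is a name I've made up, and it only proves the module and its libraries load, not that you can reach a database.

```shell
#!/bin/sh
# Succeed if Perl can load the named module. Any load error (for
# example a missing shared library) is printed to stderr.
perl_module_loads() {
  perl -M"$1" -e 1
}

# On the Solaris box:
# perl_module_loads DBD::Oracle && echo "DBD::Oracle loads OK"
```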