Posted: Thu 27 Oct 2016, 12:49pm.
We're planning to carry out urgent maintenance work on lustre, the scratch file management system used on the University's High Performance Computing cluster. Work will start at 10.00am on Friday 28 October and we hope to restore normal operation by 3.00pm.
We're sorry for the short notice about this work but we need to take action quickly in order to replace one of the disk arrays which has developed a fault. All jobs on the cluster that use lustre will need to be stopped while the maintenance work is carried out and queues will be paused to prevent new jobs from starting.
If you use the High Performance Computing cluster and anticipate that this will cause problems for running jobs, please contact us as soon as possible.
UPDATE: 28 Oct 2016 - 3.20pm
This afternoon's maintenance work on lustre, the scratch file management system used on the University's High Performance Computing cluster, is progressing but is over-running slightly.
We should be finished by 5.00pm and are sorry for the extra delay.
UPDATE: 28 Oct 2016 - 5.15pm
This afternoon's maintenance work on lustre, the scratch file management system used on the University's High Performance Computing cluster, is taking longer than expected and we haven't been able to get it back up and running in time for the weekend.
The file system is updating itself after the migration of data from the failed array, but it will take longer than planned for this process to complete, partly due to a bug in the version of the file system software we are using.
We're sorry for the inconvenience this will cause over the weekend and we'll make lustre available as soon as we can next week.
All data on the failed array was successfully migrated to other disks, but the maintenance work took longer than planned because of unexpected problems restoring the affected server after the faulty parts were replaced.