Hardware Recovery Update


The website has been restarted and we are working on rebuilding the science database so BOINC can restart soon.



Brief history

On March 1st, we suffered a disk failure that prevented communication between our science and BOINC filesystems, and brought down the website and forum as well. Initially, it looked like a RAID controller failure. What should have been a routine fix turned into a lengthier endeavor when we realized the issue was much more severe. It turned out that the PCI bus had failed, meaning we needed to move all of our disks to an alternate storage system and rebuild the RAID configuration. Fortunately, Sharcnet was able to locate an identical, older storage system that we could use during the recovery.

The data center was able to put all of our disks into a spare system and the rebuilding process began. While the data integrity was confirmed, we could not boot the system; we needed to fix the system disks to work in the new server.

Website restart

On March 13, we finally managed to restart the website and forum databases. Performance and overall availability and functionality remain limited for now due to the ongoing storage recovery and backup efforts. The stats cannot be updated until we fully restart BOINC and download the already processed WUs, but no work will be lost and all credit will be given, as we will extend the time for returning results.

We are immensely grateful for the positivity that we received during the process.

We have ARP, SCC, and MCM updates in the pipeline; they are just waiting for the full recovery from our storage failure.

We will be posting updates on the situation using this thread (last update: March 27), where you can also share any questions you may have about the hardware recovery process. Thank you for your support, patience and understanding.

WCG team