UC leads cloud computing use for big data analytics
The University of Canterbury (UC) is the first university in Australasia to develop cutting-edge training in big-data analysis built on unique access to cloud infrastructure, and to give its staff and students free access to that infrastructure.
Dr Raazesh Sainudiin, a Senior Lecturer in UC's School of Mathematics and Statistics, secured grants from the Databricks Academic Partners Program and Amazon Web Services (AWS) Educate, which give all UC faculty, staff and students free, ongoing access to the companies' enormous cloud-computing infrastructure for academic teaching and research.
This provides UC with huge potential to emerge as a leader in big data analytics in this region of the globe, says Dr Sainudiin, who is part of UC’s Big Data Working Group. He is giving a presentation about UC’s capabilities in industrial research and big data analytics to members of the local tech industry on 3 May.
“In today's digital world, data about every conceivable aspect of life is being collected and amassed at an unprecedented scale. To give you some idea of how much data we are talking about, IBM estimated that a whopping 2.5 exabytes (2,500,000,000,000,000,000 bytes) of data was generated every single day, and that was back in 2012. This massive data could potentially hold answers for many critical questions and problems facing our world today. But to be able to get at these important answers, the first step is to be able to explore and analyse this gargantuan volume of data in a meaningful way,” he says.
“Cloud computing allows you to instantly scale up access to over 10,000 off-site computers, as required by the scale of the real-world big data problem at hand, and complete the data analyses in the least amount of time needed - usually a matter of hours.
“What if all past and present recorded and real-time data of earthquakes on the planet could be analysed simultaneously? Or consider the live analysis of every tweet on Earth. There are on average about 6,000 tweets per second. The scale of such volumes of data is such that they can't be stored, let alone analysed, by one computer or even 100 computers in any sort of reasonable timeframe.”
UC has already established a research cluster (at www.math.canterbury.ac.nz/databricks-projects) with thousands of computer nodes running Apache Spark, a lightning-fast cluster computing engine for large-scale data processing. This locally set-up resource taps into the infrastructure provided by these grants and is being used by UC students in a new course, STAT478: Special Topics in Scalable Data Science. Several of those students are full-time employees in the local tech industry.
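For readers unfamiliar with cluster computing, the basic idea behind engines such as Apache Spark can be sketched in a few lines: split the data into partitions, process each partition independently, then merge the partial results. The sketch below is plain Python, not Spark's API, and it runs on a single machine; the function names and the sample records are hypothetical and chosen only for illustration. On a real cluster, each partition would be handled by a different computer.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: count the words within a single partition."""
    counts = Counter()
    for record in partition:
        counts.update(record.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def partitioned_word_count(records, num_partitions=4):
    """Count words by processing partitions independently, then merging."""
    partitions = split_into_partitions(records, num_partitions)
    # On a cluster, each call to count_words would run on a separate node.
    partial_counts = map(count_words, partitions)
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical sample records standing in for a large text data set.
    sample = [
        "earthquake recorded near the coast",
        "second earthquake recorded inland",
        "no earthquake today",
    ]
    print(partitioned_word_count(sample, num_partitions=2).most_common(2))
```

Because each partition is counted without reference to the others, the result does not depend on how many partitions are used, which is what allows the same analysis to be spread over more machines as the data grows.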
Students are trained to run their own big-data projects as part of their course requirements. This cutting-edge training using cloud infrastructure to solve big-data problems will generate globally competitive graduates for the data industry, with key skills in top-paying technologies listed in the 2016 Developer Survey, Dr Sainudiin says.
With a curriculum created in consultation with the tech industry, the innovative course has been praised by Wynyard Group’s Chief Technical Officer Roger Jarquin.
“We hope that such industry-academia collaborations will continue to be a dynamic training ground for future employees in our growing data industry,” says Jarquin, also an Adjunct Fellow of UC's School of Mathematics and Statistics.
Professor James Smithies, Director of King's Digital Lab, Department of Digital Humanities, King's College London, and former Senior Lecturer in History at UC, says the course in Scalable Data Science is an excellent resource for the digital humanities, and sits very nicely beside activities occurring at King’s Digital Lab (KDL).
“The combination of AWS and Databricks is broadly in line with what we think digital humanities students and researchers will need, and benefits from excellent levels of usability and scalability. This kind of approach is of crucial importance to the future of digital humanities, as researchers move into big data analysis and we seek to provide our students with the tools and experiences they need to succeed in their careers both inside and outside university,” Prof Smithies says.
UC Associate Professor Rick Beatson, recipient of the 2015 UC Innovation Medal, and Dr Sainudiin are giving a technical presentation this month about UC’s capabilities in industrial research and big data analytics to Canterbury Tech (formerly Canterbury Software Cluster), a non-profit organisation of local tech insiders and entrepreneurs (http://canterburytech.nz/events/May-2016-event/). The presentation, held at UC's Centre for Entrepreneurship, is an industrial outreach activity by UC's Big Data Working Group.
Dr Sainudiin is keen to introduce this freely available on-demand scalable computing resource to interested postgraduate students, faculty and staff across the University.
“While this Canterbury Tech outreach event is targeted at local industry, it is important to bring more awareness of the grant's infrastructure, potential benefits and utility to the UC community.”
Dr Sainudiin completed his PhD at Cornell University and a postdoctoral research fellowship at Oxford before joining UC in 2007.