You can find an overview of the SIG here.
We kicked off with a presentation from Duncan to set the scene for the session: "Every journey starts with an idea / Seven ideas for starting your Big Data journey".
Duncan chose to give his presentation via Prezi. You will find a copy of his presentation here. This was an interesting departure from the traditional M$ PowerPoint, and it intrigued me enough to plan on using it for a future presentation myself. Certainly the transitions between bullet points are more dramatic, which some people like and others find nauseating, in equal measure :-)
So the 7 ideas that Duncan presented were:
- Data exhaust
- Crowd Sourcing
- Self knowledge
- Quantified Self
- Consumer Data Locker
- Data Markets
- Open Data
I guess you could consider that 8 ideas, Duncan! This presentation suitably warmed up the attendees for the presentations that followed, starting with Tom Fastner on "Do More with your Data: Deep Analytics". It is fantastic to learn about the scale at which eBay operates. They have an EDW on Teradata at 8+ PB, a Teradata system they call Singularity for semi-structured data at 42+ PB, and unstructured data in Hadoop at 50+ PB. It was also interesting to see that the concurrent user population ranged from 500+ on the EDW, to 150+ on the Singularity system, to 5-10 on Hadoop. He also talked about their behaviour data flow and the value of compression to them.
This was followed by an equally interesting presentation by Professor Mark Whitehorn from the University of Dundee, who was ably assisted by Chris Hillman, a former student who recently joined Teradata EMEA as a Principal Data Scientist. I found their presentation on their research work at Dundee, "Proteomics: Science, Data Science and Analytics", fascinating. Chris turned out to be a true geek, admitting to having built his own Hadoop cluster at home. Rest assured I have since converted him to the even more powerful and productive environment of Teradata Aster. I wasn't sure about his Hadoop cluster at home, but he sent me proof...
Their presentation outlined some of the possibilities of using Big Data analytics techniques on proteomics, which could lead to dramatic improvements in drug discovery and shorten the drug development lifecycle. While this is a highly complicated area, it really outlines an innovative and possibly very critical use case for Big Data analytics. I learnt that mass spectrometry generates 7 GB of raw data per 4 hours, and in excess of 15 TB per year, that needs to be analysed, and that's from only one machine! Teradata is working with the university to bring this research forward. Stay tuned for more updates in this area in the future.
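As a quick sanity check on those figures, here is a minimal Python sketch of the arithmetic. It assumes the instrument runs around the clock all year and that 1 TB = 1000 GB; neither assumption comes from the presentation, and the names are my own.

```python
# Back-of-the-envelope check of the mass spectrometry data volume quoted above.
# Assumptions (not from the presentation): the instrument runs continuously,
# all year, and 1 TB = 1000 GB.

GB_PER_RUN = 7             # raw data per run, as quoted
HOURS_PER_RUN = 4          # duration of one run, as quoted
HOURS_PER_YEAR = 24 * 365  # non-leap year, running non-stop (assumption)


def annual_volume_tb(gb_per_run: float, hours_per_run: float) -> float:
    """Return the yearly raw data volume in TB for a single machine."""
    runs_per_year = HOURS_PER_YEAR / hours_per_run
    return gb_per_run * runs_per_year / 1000


if __name__ == "__main__":
    print(f"{annual_volume_tb(GB_PER_RUN, HOURS_PER_RUN):.2f} TB per year")
```

Under those assumptions a single machine produces roughly 15.3 TB a year, which lines up with the "in excess of 15 TB" figure.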
We then moved on to a panel session, and our speakers were joined by Navdeep Alam from Mzinga, a Teradata Aster customer who presented earlier in the week, and our very own Dr. Judy Bayer from Teradata. We were hoping for a provocative panel session, so we set the title as: "Big Data and Analytics for Greater Competitive Advantage".
Duncan and I brainstormed some questions in advance to kick off the panel session. The questions we posed were:
Q. What is the one word that sums up Big Data for you?
Q. What makes a good Data Scientist?
- Open mind
- Good communicator
- Passion for finding the stories in the data
Q. What is the most important Big Data Analytical Technology and why?
- Fault Tolerance
- Path Analysis
Q. If Big Data fails in 2012, what will be its cause?
- Data Silos
- Stupidity :-)
- Lack of skilled people
- Unreasonable expectations?
Q. If you were starting a Big Data project tomorrow (and could choose to do anything), what would you do?
- Study the universe
- Natural Language Processing
- Projects to benefit society
Did you attend this SIG? If so, what were your impressions?