Donal Daly on Big Data Analytics: May 2012

On Tuesday May 29th, Teradata Aster will be hosting a web cast to discuss the Bridging the Gap--SQL and MapReduce for Big Analytics. Expected duration is 60 minutes and will start at 15:00 CET (Paris,Frankfurt) 14:00 UTC (London). You can register for free here.

We had run this seminar earlier in May but at a time which was more convenient for a US audience. The seminar was well attended and we received good feedback from attendees that encouraged us to rerun it again with some minor changes and at a time more convenient for people in Europe.

If you are considering a big data strategy, confused by all the hype that is out there, believe that Map Reduce = Hadoop? or Hive = SQL?, Then this is an ideal event for a business user to get a summary of the key challenges, the sort of solutions that are out there and the novel and innovative approach that Teradata Aster has taken to maximise time to value for companies considering their first Big Data initiatives.

I will be the moderator for the event, and will introduce Rick F. van der Lans, independent analyst and Managing Director of R20/Consultancy, based in the Netherlands. Rick is an independent analyst, consultant, author and lecturer specializing in Data Warehousing, Business Intelligence, Service Oriented Architectures, and Database Technology. He will be followed by Christopher Hillman from Teradata. Chris, is based in the United Kingdom and recently joined us a Principal Data Scientist. We will have time at the end to address questions from attendees.

During the session we will discuss the following topics:

Understanding MapReduce vs SQL, UDF's, and other analytic techniques

How SQL developers and business analysts can become "data scientists"

Fitting MapReduce into your BI/DW technology stack

Making the power of MapReduce available to the larger business community

So come join us on May 29th. It will be an hour of your time well invested. Register for free here.

At the recent Teradata Universe Conference held in Dublin, Duncan Ross, Simona Firmo and I organised a Special Interest Group (SIG) devoted to Big Data Analytics. We were lucky in the high quality speakers and panelists we had as well as the large attendance of delegates on the last day of the conference. I thought I would try to summarise my reflections from the session.

You can find an overview of the SIG here.

We kicked off with a presentation from Duncan, to set the scene for the session - Every journey starts with an idea / Seven ideas for starting your Big Data journey

Duncan chose to give his presentation via Prezi. You will find a copy of his presentation here. This was an interesting departure from the traditional M$ Powerpoint, and intrigued me enough to plan on using it for a future presentation myself. Certainly the transition between bullet points is more dramatic which some people like and others find nauseating in equal measure :-)

So the 7 ideas that Duncan presented were:

Data exhaust
Crowd Sourcing
Location
Gamification
Self knowledge

Quantified Self
Consumer Data Locker

Data Markets
Open Data

I guess you could consider that 8 ideas Duncan! This presentation suitably warmed up the attendees for presentations from Tom Fastner on Do More with your Data: Deep Analytics. It is fantastic to learn about the scale at which eBay operates. The have an EDW on Teradata at 8+ PB, a Teradata system they call Singularity, for semi structured data at 42+ PB and unstructured data in Hadoop at 50+ PB. It was also interesting to see that the concurrent user population ranged from 500+ with the EDW, to 150+ with the singularity system to 5-10 on Hadoop. He also talked about their behaviour data flow and the value of compression to them.

This was followed by an equally interesting presentation by Professor Mark Whitehorn from the University of Dundee who was ably assisted by Chris Hillman a former student, who recently joined Teradata EMEA as a Principal Data Scientist. I found their presentation fascinating on their research work at Dundee on Proteomics: Science, Data Science and Analytics. Chris turned out to be a true geek, admitting to having built his own Hadoop cluster at home. Rest assured I have since converted him to the even more powerful and productive environment of Teradata Aster. I wasn't sure about his Hadoop cluster at home, but he sent me proof...

Their presentation outlined some of the possibilities of using Big Data analytics techniques on Proteomics, that could lead to dramatic improvements in drug discovery and shorten the drug development lifecycle. While this is a highly complicated area, it really outlines an innovative and possible very critical use case for big data analytics. I learnt that Mass Spectrometry generates 7GB of raw data per 4 hours and in excess of 15 TB per year that needs to be analysed and that's from only one machine! Teradata is working with university to bring this research forward. Stay tuned for more updates in this area in the future.

We then moved onto a panel session and our speakers where joined by Navdeep Alam from Mzinga, a Teradata Aster customer, who presented earlier in the week and our very own Dr. Judy Bayer from Teradata. We were hoping for a provocative panel session, so we set the title as: Big Data and Analytics for Greater Competitive advantage

Duncan and I brainstormed some questions in advance to kick off the panel session. The questions we posed were:

Q. What is the one word that sums up Big data for you?

Q. What makes a good Data Scientist?

Curious,
open mind,
good communicator,
creativity,
passion for finding the stories in the data

Q. What is the most important Big Data Analytical Technology and why?

MPP
Fault Tolerance
R
Visualisation
Path Analysis
Ecosystem

Q. If Big Data Fails in 2012 what will be its cause?

Data Silos
Stupidity :-)
Lack of skilled people
Unreasonable expectations?

Q. If you were starting a Big Data project tomorrow (and could choose to do anything)

what would you do?

Study the universe
Proteomics
Natural Language Processing
Projects to benefit society

Did you attend this SIG? If so, what were your impressions?

Donal Daly on Big Data Analytics

Wednesday, May 23, 2012

Upcoming WebCast: Bridging the Gap--SQL and MapReduce for Big Analytics

Friday, May 11, 2012

Big Data Analytics SIG - Teradata Universe Dublin April 2012