Wednesday, October 24, 2012

My perspective on the Teradata Aster Big Analytics Appliance

Aster Big Analytics Appliance 3H

By now, no doubt you have heard the announcement of our new Teradata Aster Big Analytics Appliance. In the interests of full disclosure, I work for Teradata within the Aster CoE in Europe. Prior to joining Teradata, I was responsible for a large, complex Aster environment built on commodity servers, holding in excess of 30 TB of usable data and run as a 24 x 7 operational environment. So my perspective in this post is from that standpoint, and it also draws on the time when we underwent a hardware refresh and software upgrade.

OK, first of all you procure 30 servers and at least two network switches (for redundancy). When you receive them, it is up to your data centre team to rack them and cable them up. Next, you check that the firmware on each system is the same. Surprise, surprise, it isn't, so a round of upgrades follows. Then you configure the RAID controllers. In this case we went for RAID 0, which maximises space; more on that choice later...

Then it is over to the network team to configure the switches and the VLAN we are going to use. We then put a basic Linux image on the servers so we can carry out some burn-in tests, to make sure all the servers have a similar performance profile. These were useful tests, as we found two servers whose RAID controllers were not configured correctly. It was the result of human error; I guess manually configuring 30 servers can get boring. This burns through a week before we can install Aster, configure the cluster and bring all the nodes online to start the data migration process. Agreed, this is a one-off cost, but in this environment we are responsible for all hardware issues, network issues and Linux issues, with the vendor just supporting Aster. Many customers never count that cost, or the outages caused by these one-off configurations that might otherwise be avoided.
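The firmware and burn-in checks described above are the kind of thing worth scripting rather than eyeballing across 30 servers. What follows is only a minimal sketch in Python, not what we actually ran: the host names, firmware versions, throughput figures and the 20% tolerance are all made up for illustration.

```python
from collections import Counter
from statistics import median


def firmware_outliers(firmware_by_host):
    """Return hosts whose firmware differs from the most common version."""
    most_common_version, _ = Counter(firmware_by_host.values()).most_common(1)[0]
    return sorted(h for h, v in firmware_by_host.items() if v != most_common_version)


def slow_nodes(throughput_by_host, tolerance=0.2):
    """Return hosts whose burn-in throughput is more than `tolerance`
    (as a fraction) below the median of all hosts."""
    mid = median(throughput_by_host.values())
    threshold = mid * (1 - tolerance)
    return sorted(h for h, t in throughput_by_host.items() if t < threshold)


# Hypothetical inventory for a 30-node cluster (illustrative values only).
firmware = {f"node{i:02d}": "2.1.0" for i in range(1, 31)}
firmware["node07"] = "1.9.3"          # one server left on older firmware

# Hypothetical burn-in disk throughput in MB/s (illustrative values only).
throughput = {f"node{i:02d}": 950.0 for i in range(1, 31)}
throughput["node12"] = 410.0          # e.g. a misconfigured RAID controller
throughput["node23"] = 395.0

bad_firmware = firmware_outliers(firmware)
bad_nodes = slow_nodes(throughput)
print("Firmware mismatch:", bad_firmware)
print("Slow in burn-in:  ", bad_nodes)
```

Comparing each node against the median rather than a fixed number means the check still works when you do not know in advance what "good" throughput looks like for a given server model.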

We had some basic system management, as these were commodity servers, but nothing as sophisticated as Teradata Server Management and Teradata Vital Infrastructure. I like that the Server Management software allows me to manage three clusters within the rack logically (e.g. Test, Production, Backup). I also like the proactive monitoring, as it is likely to identify issues before they become an outage for us, and an issue found with one customer can be checked against all customers. If you build and manage your own environment, you don't get that benefit.

Your next consideration when looking at an appliance should be: is it leading edge, and how much thought has gone into the configuration? The Appliance 3H is a major step forward from Appliance 2. From a CPU perspective, it has the very latest processors from Intel, dual 8-core Sandy Bridge at 2.6GHz. Memory has increased to 256GB. Connectivity between nodes is now provided by InfiniBand at 40Gb/s. Disk drives are the newer 2.5-inch size, enabling more capacity per node. Worker nodes use 900GB drives, while the backup and Hadoop nodes use larger 3TB drives. RAID 5 is used for Aster and RAID 6 for the backup and Hadoop nodes. The larger cabinet size also enables better data centre utilisation through the higher density that is possible.

I also like the idea of providing integrated backup nodes. Previously Aster just had the parallel backup software; you had to procure your own hardware and manage it. We also know that all of these components have been tested together, so I am benefiting from their extensive testing, rather than building and testing reference configurations myself.

What this tells me is that Teradata can bring advances in hardware to the marketplace quickly. InfiniBand will make an important difference. For example, for very large dimension tables that I have decided against replicating, joins will run much faster. I also expect a positive impact on backups. In my previous environment, it took us about 8 hours for a full backup of 30 TB or so. Certainly the parallel nature of their backup software could soak up all the bandwidth on a 10Gb connection, so we had to throttle it back. On the RAID choices, I absolutely concur with the RAID 5 choice. If I were building my own 30-node cluster again, I wouldn't have it any other way. While the replication capabilities in Aster protect me against at least any single node failure, with RAID 0 a disk failure will bring that node out of the cluster until the disk is replaced and the node is rebuilt and brought back online. When you have 30+ servers each with 8 drives (240+ disk drives), the most common failure will be the disk drive. With RAID 5, you can replace the drive without any impact on the cluster at all, and you still have the replication capabilities to protect yourself from multiple failures.
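To put rough numbers on the backup and RAID points above, here is a back-of-the-envelope sketch in Python. The 30 TB, 8 hours, 30 servers and 8 drives per server come from my old environment as described. The 10 gigabit per second link speed is my reading of that connection, and the 3% annual drive failure rate is purely an assumed figure for illustration, not a measured one.

```python
def required_gbps(terabytes: float, hours: float) -> float:
    """Average rate in gigabits per second needed to move
    `terabytes` (decimal TB) in `hours`."""
    bits = terabytes * 1e12 * 8
    return bits / (hours * 3600) / 1e9


def raid_usable_fraction(level: int, drives: int) -> float:
    """Fraction of raw capacity left for data: RAID 0 gives up no drives,
    RAID 5 gives up one drive's worth to parity, RAID 6 gives up two."""
    parity_drives = {0: 0, 5: 1, 6: 2}[level]
    return (drives - parity_drives) / drives


LINK_GBPS = 10.0      # assumption: a 10 gigabit per second backup link
ASSUMED_AFR = 0.03    # assumption: 3% of drives fail per year (illustrative)

backup_gbps = required_gbps(terabytes=30, hours=8)
link_share = backup_gbps / LINK_GBPS

total_drives = 30 * 8
expected_failures = total_drives * ASSUMED_AFR

print(f"Average backup rate: {backup_gbps:.2f} Gb/s "
      f"({link_share:.0%} of the assumed link)")
print(f"RAID 0 usable fraction on 8 drives: {raid_usable_fraction(0, 8):.3f}")
print(f"RAID 5 usable fraction on 8 drives: {raid_usable_fraction(5, 8):.3f}")
print(f"Expected drive failures per year across {total_drives} drives: "
      f"{expected_failures:.1f}")
```

A 30 TB backup in 8 hours averages a little over 8 Gb/s, which is why an unthrottled backup could crowd out everything else on a link of that size. On 8 drives, RAID 5 gives up one eighth of the raw capacity compared with RAID 0, which looks like a small price once you expect several drive failures a year across the cluster.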

I also like the option of being able to have a Hadoop cluster tightly integrated as part of my configuration. For example, if I have to store a lot of 'grey data' (log and audit files, etc.) for compliance reasons, I can leverage a lower cost of storage and still do batch transformations and analysis as required. I can then bring a working set of data (the last year, for example) into Aster for deeper analytics. With the transparent capabilities of SQL-H, I can extend those analytics into my Hadoop environment as required.

Of course, purchasing an appliance is a more expensive route than procuring, building and configuring it all yourself. However, most enterprises are not hobbyists, and building this sort of infrastructure is not their core competence, nor is it bringing value to their business. They should be focused on time to value, and with the Teradata Aster Big Analytics Appliance the time to value will be short, as everything is prebuilt, configured, tested and ready to accept data and start performing analytics on it. As I talk to customers across Europe, this message is being well received once you talk through the details.

I'll leave you with this thought: one aspect of big data that I don't hear enough of is Value. To me, the dimensions of Volume, Variety, Velocity and Complexity are not very interesting if you are not providing value by means of actionable insights. I believe every enterprise customer needs a discovery platform capable of executing the analytics that can give them an important edge over their competition. This platform should have the capability to handle structured as well as multi-structured data. It should provide a choice of analytics, whether they be SQL-based, MapReduce or statistical functions. It should provide a host of prebuilt functions to enable rapid progress. It should appeal to power users in the business, by having a SQL interface that works with their existing visualisation tools, and to our most sophisticated data scientists, by providing them with a rich environment to develop their own custom functions as necessary, while enabling them to benefit from the power of both SQL and MapReduce to build out these new capabilities. In summary, that is why I am so excited to be talking with customers and prospects about the Teradata Aster Big Analytics Appliance.

Thursday, October 11, 2012

Aster integration into the Teradata Analytical ecosystem continues at pace…

Not long after Teradata acquired Aster in April last year we outlined a roadmap as to how Aster would integrate into the Teradata Analytical ecosystem.

Aster-Teradata Adapter

Clearly, the first priority was to deliver a high-speed interconnect between Aster and Teradata. The Aster-Teradata adapter is based on the Teradata Parallel Transporter API and provides a high-speed link to transfer data between the two platforms. It allows parallel data transfers between Aster and Teradata, with each Aster Worker connecting to a Teradata AMP. This connector is part of the Aster SQL-MR library, with all data transfers initiated through Aster SQL.
The Aster-Teradata adapter offers fast and efficient data access. Users can build views in the Aster Database on tables stored in Teradata. Aster Database users can access and perform joins on Teradata-stored data as if it were stored in the Aster Database. Data scientists can run Aster’s native analytic modules, such as nPath pattern matching, to explore data in the Teradata Integrated Data Warehouse. Users now have the capability to Investigate & Discover in Teradata Aster, then Integrate & Operationalize in the Data Warehouse.

The example below shows the load_from_teradata connector being called from within an Aster SQL query:

    SELECT userid, age, sessionid, pageid
    FROM nPath(
        ON (
            SELECT * FROM clicks,
                load_from_teradata(
                    ON mr_driver
                    tdpid('EDW')
                    credentials('tduser')
                    query('SELECT userid, age, gender, income FROM td_user_tbl;')
                ) T
            WHERE clicks.userid = T.userid )
        PARTITION BY userid, sessionid
        RESULT ( FIRST(age of A) as age, … )
This example shows the load_to_teradata component, with analytic processing on Aster and the results being sent to a Teradata target table:

    FROM load_to_teradata(
        ON ( aster_target_customers )
        tdpid ('dbc')

Viewpoint Integration

We have just announced Aster integration with Teradata Viewpoint, with the release of Viewpoint 14.01 and Aster 5.0. Viewpoint's single operational view (SOV) monitoring has been extended to include support for Teradata Aster. Teradata wanted to leverage as many of the existing portlets as possible for easier navigation, familiarity, and support for the Viewpoint SOV strategy. So the following existing portlets were extended to include support for Aster:

System Health

Query Monitor

Capacity Heatmap

Metrics Graph

Metrics Analysis

Space Usage

Admin - Teradata Systems

However, not everything that Teradata Aster's differing architecture needs made sense to put into an existing portlet. Therefore there are two new Aster-specific portlets in this release:

Aster Completed Processes

Aster Node Monitor


Check out the complete article on Viewpoint 14.01 release at the Teradata Developer Exchange here.
Stay tuned for some very exciting news coming next week...

Friday, October 05, 2012

Don't be left in the dark... Exciting Teradata Aster Announcement coming soon

We’re gearing up for a big Teradata Aster announcement. Mark your calendars to get the scoop on Oct 17th. We will be hosting a live webinar to announce it. Register here for it.

Believe me, it is going to be amazing...

Wednesday, October 03, 2012

The Big Data Analytics Landscape

As a technologist and evangelist, I find working in the big data marketplace certainly exciting. I am excited by the new products we are bringing to market and how this new functionality really helps to bridge the gap for enterprise adoption. It is also surreal, in terms of the number of blog posts and tweets on Big Data, and there seems to be a new big data conference cropping up on a weekly basis across Europe :-)

It is interesting to monitor other vendors in the marketplace and how they position their offerings. There is certainly a lot of clever marketing going on (that I believe in time will show a lack of substance) and some innovation too. You know who you are... Jumping aboard the bandwagon is one thing, but just because you might have Hadoop and your database + analytics within the same rack, that doesn't mean they are integrated.

The fervor that people bring when discussing open source products is also interesting. Those people who know me know that I am a long-time UNIX guru of over 20 years, from the early BSD distributions, to working with UNIX on mainframes, to even getting Minix to work on PCs before Linux came along. Fun times, but I was young, free and single and enjoyed the technical challenge. I was also working in a research department in a university. However, the argument that Hadoop is free and easy to implement and will one day replace data warehousing doesn't ring true for me. Certainly it has a place and does provide value, but it doesn't come at no cost. Certainly Hortonworks and Cloudera provide distributions that are reducing the installation, configuration and management effort, but you now have multiple distributions starting to go in different directions. MapR, for example?

How many enterprises really want to get that involved in running and maintaining this infrastructure? Surely they should be focused on identifying new insights that provide business benefits or give greater competitive advantage. IT has an important role to play, but it will be the business users, ultimately, who need to leverage the platform to gain these insights.

It is no use getting insights, if you don't take action on them either.

Insight gained from big data analytics should be fed into your existing EDW (if you have one), so it can enhance what you already have, and the EDW provides you with a better means of operationalizing the results.

I say to those people who think Hive is a replacement for SQL, not yet it ain't, it doesn't provide the completeness or performance that a pure SQL engine can provide. You don't replace 30+ years of R&D that quickly...

To the NoSQL folks: this debate is taking on religious fervour at times. NoSQL has a role, but I don't see it replacing the relational database overnight either.

In a previous role I managed a complex DB environment that included a Big Data platform, for a company that operated in the online gaming marketplace in a very much 24 x 7 environment with limited downtime. It was the bleeding edge at times, and growing very fast. If we had had Teradata Aster 5.0 then, my life would have been so much easier. We had an earlier release, but we learned a lot. We proved the value of SQL combined with the MapReduce programming paradigm. We saw the ease of scaling and the reliability. We delivered important insights into various types of fraud, and took action on them, which yielded positive kudos for the company and increased player trust, which is very important in an online marketplace. We were also able to leverage the platform for a novel ODS requirement, and had both executing simultaneously along with various ad-hoc queries. I was also lucky, then and since, to meet real visionaries like Mayank and Tasso, which gives you confidence in the approach and the future direction.

When you think of big data analytics, it is not just about multi-structured data or new data sources. Using SQL/MR, for example, may be the most performant way to yield new insights from existing relational data. Also consider what 'grey data' already exists within your organisation; it may be easier to tap into that first, before sourcing new data feeds. The potential business value should drive that decision though.

Do not underestimate the importance of having a discovery platform as you tackle these new Big Data challenges. Yes, you will probably need new people or, even better, you can train existing analysts to take on these new skills and grow your own data scientists. How easy this approach is will depend on how feature-rich your discovery platform is: how many built-in and useful analytical functions are provided to get you started, before you have to develop specific ones of your own.

I suppose some would say I am rambling with these comments and not expressing them very elegantly, but help is at hand :-). We recently put together a short webinar; I think it is about 20 minutes in duration.

The Big Data Analytics Landscape: Trends, Innovations and New Business Value, featuring Gartner Research Vice President Merv Adrian and Teradata Aster Co-President Tasso Argyros.  In the video, Merv and Tasso, answer these questions and more, including how organizations can find the right solution - to make smarter decisions, take calculated risks, and gain deeper insights than their industry peers.

  • How do you cost-effectively harness and analyze new big data sources?
  • How does the role of a data scientist differ from other analytic professionals?  
  • What skills does the data scientist need?
  • What are the differences between Hadoop, MapReduce, and a Data Discovery Platform?
  • How are these new sources of big data and analytic techniques and technology helping organizations find new truths and business opportunities?
I suggest if you have the time to spare... watch the video

What do you think?

For me, it is all about the analytics and the new insights that can be gained and acted upon.