Category Archives: Flume

Just finished reading: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data

I have just finished reading this book, I was excited about the IBM offering and the concepts around big data at IDUG, but after reading the book I want to find a project I can try this out on. The book can be downloaded from here: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data.

The book is in two parts, Part 1: Big Data from the business prospective and Part 2: Big Data from the technology prospective. The first part of the book as it suggests does not touch on the technical aspects of big data only the benefits to businesses and how we all are already part of the Big Data world. The second part of the book explains at a high level all the different parts of the Hadoop cluster and how you get data in and out and process data in there. The second part also explains the IBM offering into this marketplace in the form of IBM InfoSphere BigInsights and Streams.

The as a high level description first part introduces the concept of the three V’s of big data, Volume, Velocity and Variety, the uses of these V’s in a number of different scenarios all of which are very interesting and I can easily see how it would bring you competitive advantage (probably the point of the case studies). The second part is for the techies explaining what Hadoop is and all of the different parts that make it up with MapReduce, common components and the file system. Also explaining all the other technologies surrounding Big Data such as Hive, Flume and Jaql.

So this is just a very light overview of the book, and well worth a read. I did it on my kindle, sometimes the text varies from page to page as it gets resized but overall it was fine.