Saturday 8 February 2014

Book Review: Pentaho Data Integration Cookbook Second Edition

PACKT Publishing
Writers: Alex Meadows, Adrián Sergio Pulvirenti, María Carina Roldán
Paperback: 462 pages
Link to the book page:
http://www.packtpub.com/pentaho-data-integration-cookbook-second-edition/book
My rating: 5/5
Pentaho Data Integration, also known as Kettle, is one of the best open source tools for extracting, transforming and loading data between different systems. It is integrated within the Pentaho BI suite and covers everything needed to develop and maintain a data warehouse or data mart. Beyond the scope of BI, it lets us handle and transform data in many ways.

This book explains, simply and with numerous examples, how to get the most out of this Pentaho tool. It is aimed both at developers who have basic knowledge of Kettle and at advanced users who want to learn what the new version of the tool has to offer.

Some years ago, around 2011, a first edition was published with a lot of useful recipes:
 http://www.packtpub.com/pentaho-data-integration-4-cookbook/book
As a general recommendation for all chapters, I would add a tip box called something like "advanced trick" that points to some more advanced features. It would always be useful for readers to keep these in mind in case they run into such a situation in the future.

* Chapter 1: Working with Databases
This chapter covers the simplest thing we can do with Kettle: reading data from a database. There are 15 recipes, from establishing a simple database connection with JDBC or JNDI to more advanced ones such as building SQL queries dynamically.

* Chapter 2: Reading and Writing Files
When we work with PDI, data sources can vary widely. This chapter has 14 recipes that show how to read and write data in different kinds of files. One very interesting recipe explains how to read data from an Amazon Web Services S3 instance.
Knowledge of regular expressions is essential to get the most out of this chapter.
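
To give an idea of the kind of expression involved, here is a minimal Python sketch of selecting input files by regular expression. The file names and the pattern are invented for illustration and do not come from the book. PDI itself uses Java regular expressions, but for a simple pattern like this one the syntax is the same.

```python
import re

# Hypothetical file names, invented for this example.
filenames = [
    "sales_2013-12.csv",
    "sales_2014-01.csv",
    "sales_2014-01.csv.bak",
    "customers.csv",
]

# Match "sales_", a four-digit year, a dash, a two-digit month and ".csv".
# The dot is escaped so that it matches a literal dot only.
pattern = re.compile(r"sales_\d{4}-\d{2}\.csv")

# fullmatch requires the whole name to match, so the ".bak" file is skipped.
selected = [name for name in filenames if pattern.fullmatch(name)]

print(selected)
```

Escaping the dot and matching the whole name are the two details that most often go wrong when writing this kind of file filter.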

* Chapter 3: Working with Big Data and Cloud Sources
Nowadays, when we implement a data warehouse, we have new options. Depending on the problem to be modeled, we can make use of NoSQL databases or cloud services such as Salesforce.
This chapter has 8 recipes on how to read and load data using such technologies.

For those wanting to delve deeper into the world of Big Data and Pentaho, the following book is recommended:
http://www.packtpub.com/pentaho-for-big-data-analytics/book

* Chapter 4: Manipulating XML Structures
XML files are very common. This chapter has 10 recipes, ranging from reading a single file to validating its contents against a DTD or an XSD schema definition. Generating XML files and working with RSS feeds (reading and generating them) are also covered.

* Chapter 5: File Management
This may be the most interesting chapter for IT staff such as system administrators. It has 9 recipes that show how to upload and download files, compare their contents, and create ZIP or encrypted files.

As in some previous chapters, knowledge of regular expressions is critical to get the most out of this chapter.

* Chapter 6: Looking for Data
The 8 recipes in this chapter show how, based on various criteria, we can look up data from a database, files or web services using PDI.

* Chapter 7: Understanding and Optimizing Data Flows
When we are dealing with data streams, we often run into the problem of synchronizing them. This chapter has 12 recipes for synchronizing or redirecting our data flows.

* Chapter 8: Executing and Re-using Jobs and Transformations
In PDI there are two ways of structuring a sequence of actions: jobs and transformations. This chapter has 9 recipes for working with them, covering the use of parameters and some specific cases.

* Chapter 9: Integrating Kettle and the Pentaho Suite
In my opinion, this is the most interesting chapter. It lets you unlock the full potential of Pentaho by integrating PDI with the other elements of the suite.

The chapter consists of 6 recipes, including how to create a Pentaho report with PDI and how to populate a dashboard created with CDE using PDI.

Besides knowledge of PDI, following this chapter successfully requires a basic knowledge of the Pentaho suite. Familiarity with the Design Studio tool and the CTools from Webdetails is also recommended.

* Chapter 10: Getting the Most Out of Kettle
This chapter contains a variety of recipes that do not fit in any of the other chapters. Specifically, there are 9 recipes, covering topics such as sending emails with attachments and processing JSON, plus an interesting recipe on tuning transformations and jobs.

* Chapter 11: Utilizing Visualization Tools in Kettle
This chapter has 4 recipes. One is about extending PDI by adding plugins from the Marketplace. As the maintainer of the plugin for extracting and loading data into CiviCRM using PDI, I cannot pass up the opportunity to mention it.
The chapter also has other interesting recipes, such as data profiling with PDI and DataCleaner, and quickly visualizing our business data with AgileBI.

* Chapter 12: Data Analytics
The final chapter of the book consists of three interesting recipes on how to obtain information from our data. We will see how to read data from the SAS analytical suite, obtain statistics using PDI steps, and create a random data set for the WEKA data mining tool.

Thursday 6 February 2014

Book Review: Pentaho 5.0 Reporting By Example Beginner's Guide


PACKT Publishing
Writers: Mariano García Mattío, Dario R. Bernabeu
Paperback: 342 pages
Link to the book page: 
http://www.packtpub.com/pentaho-5-0-reporting-by-example-beginners-guide/book
My Rating: 4/5
Pentaho Report Designer is the best open source tool for creating reports. It is integrated within the Pentaho BI suite and covers all the necessary functions in a reporting tool.
This book explains, simply and with several examples, how to create a report by following a series of steps. No previous knowledge is required.
There are other, more advanced books, such as:
http://www.packtpub.com/pentaho-reporting-3-5-for-java-developers/book
As a general recommendation for all chapters, I would add a tip box called something like "advanced trick" that points to some more advanced features. It would always be useful for readers to keep these in mind in case they run into such a situation in the future.

* Chapter 1: What is Pentaho Report Designer
This chapter walks through the history of PRD and the different types of reports that we usually find in a company. The Pentaho BI suite provides some examples of such reports, so building them is a simple task.

* Chapter 2: Installation and Configuration
This chapter explains how to obtain PRD from the web and the steps needed to make it work (Java drivers to connect to the database). It also recommends allocating more memory to the JVM for better performance, and indicates the database to be used for the examples.
The chapter should clearly indicate the folders where the drivers for the database connections used in PRD must be placed. It would also be important to note where these libraries go in the Pentaho BI Server.

* Chapter 3: Start PRD and the User Interface (UI) Layout
This chapter briefly presents all the elements that PRD provides for creating reports.
We find this chapter a perfect start, but it would be good to explain what to do if there are problems starting the tool. A link to the Pentaho wiki with common problems (permission problems on Unix, Java misconfiguration that prevents startup, ...) would be good.

* Chapter 4: Instant Gratification - Creating your first report with PRD
This chapter introduces the reader to creating a first report with PRD, a "Hello World".
The easiest way to do this is with the wizard that the tool provides. In a few simple steps and with zero technical knowledge, a user can build a report. The chapter does not even mention that this wizard exists, and starts from an empty report instead. Maybe the best approach would be to start with the wizard, create a simple report with it, and then move on to the example introduced in this chapter.

* Chapter 5: Adding a Relational Data Source
This chapter builds on the example created in the previous chapter and makes some modifications to it. Although the examples work with a MySQL database, it would be nice to have mentioned the connection parameters for other DBMS such as PostgreSQL.
There are many advantages to using JNDI connections instead of JDBC connections, so it would be nice to discuss them in this chapter.
This book is mainly aimed at beginners, so it would be worth noting that queries can be constructed automatically by selecting the pencil icon. This is very useful for users who want to create a report but have no SQL knowledge.
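
As an illustration of how the connection parameters differ between DBMS, here is a minimal Python sketch that builds the JDBC URL for MySQL and for PostgreSQL. The helper name `jdbc_url`, the host and the database name are hypothetical and do not come from the book. Only the URL prefixes and default ports are the standard ones for each driver.

```python
# Standard JDBC URL prefixes and default ports for two common DBMS.
JDBC_DEFAULTS = {
    "mysql": ("jdbc:mysql", 3306),
    "postgresql": ("jdbc:postgresql", 5432),
}


def jdbc_url(dbms, host, database, port=None):
    """Build a JDBC URL; jdbc_url is a hypothetical helper for illustration."""
    prefix, default_port = JDBC_DEFAULTS[dbms]
    # Fall back to the DBMS default port when none is given.
    chosen_port = default_port if port is None else port
    return f"{prefix}://{host}:{chosen_port}/{database}"


# Hypothetical host and database name.
print(jdbc_url("mysql", "localhost", "sampledata"))
print(jdbc_url("postgresql", "localhost", "sampledata"))
```

Besides the URL, each DBMS also needs its own JDBC driver library available to PRD.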

* Chapter 6: Adding Groups
This chapter explains clearly how to group our data for display in a report. This is a good way to present data across different pages and sections of the report.

* Chapter 7: Adding Parameters
This chapter shows how to give more freedom to the user to create reports using parameters. The possibility to build cascading parameters so that the selection of one affects the other is also explained.
Based on our experience, it would also be interesting to discuss the different masks that can be applied to data types. Many problems with parameters in PRD are solved by applying the right mask for the data type.

* Chapter 8: Using Formulas in our reports
An interesting and well-explained chapter. The mention of OpenFormula is useful for readers who want to delve into creating formulas in reports.

* Chapter 9: Adding charts
A fairly complete and interesting chapter that covers the different types of charts we can build with PRD.
There is a useful Packt book that serves as an introduction to the different visual options in reports and where each type is applicable:
http://www.packtpub.com/data-visualization-a-successful-design-process/book

* Chapter 10: Adding subreports
Subreports are an essential component in PRD because they let us use different data sources when building a report.
As a tip, it would be important to note that you must be careful with the layout of subreports within the main report. Often you have to adjust the defined page size to avoid overlapping items.

* Chapter 11: Publishing and running reports in Pentaho BA Server
In this chapter we learn to take advantage of the reports created with PRD. We’ll see how to publish these reports in the Pentaho BA Server and even schedule them to run automatically.
We’ll also learn about the other modules that make up Pentaho and, of course, how to easily install the Pentaho BA Server.

* Chapter 12: Making a difference- Reports with hyperlinks and sparklines
This chapter shows how to give even more functionality to our reports. Quite simply, we can build reports and add hyperlinks and three different types of sparkline charts.
We also learn how to filter by a certain value depending on where the user clicks. Easy and powerful!

* Chapter 13: Environment variables, stylesheets and crosstabs
Continuing the theme of the previous chapter, giving more functionality to our reports, we have other display elements such as crosstabs.
We can also apply styles to our reports using CSS. More advanced users can use environment variables to make a report behave in a certain way according to the values they take in the Pentaho BA Server.

* Chapter 14: PRD Reports Embedded in web applications
This chapter is aimed primarily at users with technical knowledge of J2EE. We can take the code that Pentaho provides and customize it to suit our needs.
We’ll learn how to deploy a web server such as Tomcat, configure and use Eclipse, and build a standalone application.
