[box type="note" align="" class="" width=""]This is a book excerpt from Learning Pentaho Data Integration 8 CE – Third Edition written by María Carina Roldán. From this book, you will learn to explore, transform, and integrate your data across multiple sources.[/box]
Today, we will learn how to configure and use the Job Executor step, and how to capture the result filenames of the executed Job.
The Job Executor is a PDI step that allows you to execute a Job several times simulating a loop. The executor receives a dataset, and then executes the Job once for each row or a set of rows of the incoming dataset. To understand how this works, we will build a very simple example. The Job that we will execute will have two parameters: a folder and a file. It will create the folder, and then it will create an empty file inside the new folder. Both the name of the folder and the name of the file will be taken from the parameters. The main transformation will execute the Job iteratively for a list of folder and file names.
Let’s start by creating the Job:
5. Double-click the Create file entry. As File name, type ${FOLDER_NAME}/${FILE_NAME}.
6. Save the Job and test it, providing values for the folder and filename. The Job should create a folder with an empty file inside, both with the names that you provide as parameters.
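Outside PDI, the work this Job does can be sketched in a few lines of Python. This is only an illustration of the Job's logic; the function name is made up, and the two arguments play the role of the Job's named parameters:

```python
import os

def create_folder_and_file(folder_name, file_name):
    """Mimic the sample Job: create a folder, then an empty file inside it.

    folder_name and file_name stand in for the named parameters
    FOLDER_NAME and FILE_NAME.
    """
    os.makedirs(folder_name, exist_ok=True)  # 'Create a folder' entry
    # 'Create file' entry: an empty file at ${FOLDER_NAME}/${FILE_NAME}
    open(os.path.join(folder_name, file_name), "w").close()

create_folder_and_file("folder1", "sample.tmp")
```

Running the sketch once produces the same filesystem effect as one execution of the Job with those parameter values.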
Now create the main Transformation:
6. At the end of the stream, add a Job Executor step. You will find it under the Flow category of steps.
7. Double-click on the Job Executor step.
8. As Job, select the path to the Job created before, for example, ${Internal.Entry.Current.Directory}/create_folder_and_file.kjb
9. Configure the Parameters grid as follows:
11. Run the transformation. The Step Metrics tab in the Execution Results window reflects what happens:
13. Browse your filesystem. You will find all the folders and files just created.
As you can see, PDI executes the Job as many times as there are rows arriving at the Job Executor step, once for every row. Each time the Job executes, it receives values for the named parameters, and creates the folder and file using those values.
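Conceptually, the executor behaves like the following hypothetical loop. The row layout and the run_job stand-in are assumptions made for illustration, not PDI APIs:

```python
import os
import tempfile

base = tempfile.mkdtemp()  # keep the demo self-contained

# Incoming dataset: one row per execution, fields mapped to named parameters.
rows = [
    {"FOLDER_NAME": os.path.join(base, "folder1"), "FILE_NAME": "file1.tmp"},
    {"FOLDER_NAME": os.path.join(base, "folder2"), "FILE_NAME": "file2.tmp"},
    {"FOLDER_NAME": os.path.join(base, "folder3"), "FILE_NAME": "file3.tmp"},
]

def run_job(params):
    # Stand-in for executing create_folder_and_file.kjb with these parameters.
    os.makedirs(params["FOLDER_NAME"], exist_ok=True)
    open(os.path.join(params["FOLDER_NAME"], params["FILE_NAME"]), "w").close()

# The executor: the Job runs once for every incoming row.
for row in rows:
    run_job(row)
```

Three rows in, three Job executions out: one folder and one empty file per row.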
Just like the Transformation Executor steps that you already know, the Job Executor offers a number of settings that let you customize the behavior and the output of the Job to be executed. Let's summarize the options.
The Job Executor doesn’t cause the Transformation to abort if the Job that it runs has errors. To verify this, run the sample transformation again. As the folders already exist, you expect that each individual execution fails. However, the Job Executor ends without error. In order to capture the errors in the execution of the Job, you have to get the execution results. This is how you do it:
5. With the destination step selected, run a preview. You will see the results that you just defined, as shown in the next example:
If you look at the log, you will see the details for the execution, as shown in the following example:
2017/10/26 23:45:53 - create_folder_and_file - Starting entry [Create a folder]
2017/10/26 23:45:53 - create_folder_and_file - Starting entry [Create file]
2017/10/26 23:45:53 - Create file - File [c:/pentaho/files/folder1/sample_50n9q8oqsg6ib.tmp] created!
2017/10/26 23:45:53 - create_folder_and_file - Finished job entry [Create file] (result=[true])
2017/10/26 23:45:53 - create_folder_and_file - Finished job entry [Create a folder] (result=[true])
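The idea of capturing execution results instead of aborting can be sketched as follows. The field names in the dictionaries are modeled on the defaults you see in the step's Execution results tab, but the loop itself is only an illustration, not PDI code:

```python
import os
import tempfile

base = tempfile.mkdtemp()
folder = os.path.join(base, "folder1")
# Two identical rows: the second execution fails because the folder
# and the file already exist, just as in the sample transformation.
rows = [
    {"FOLDER_NAME": folder, "FILE_NAME": "sample.tmp"},
    {"FOLDER_NAME": folder, "FILE_NAME": "sample.tmp"},
]

def run_job(params):
    os.makedirs(params["FOLDER_NAME"])  # no exist_ok: fails on rerun
    open(os.path.join(params["FOLDER_NAME"], params["FILE_NAME"]), "x").close()

results = []
for row in rows:
    try:
        run_job(row)
        results.append({"ExecutionResult": True, "ExecutionNrErrors": 0,
                        "ExecutionLogText": ""})
    except OSError as err:
        # The executor itself doesn't abort; the failure becomes data
        # in the output stream, one result row per execution.
        results.append({"ExecutionResult": False, "ExecutionNrErrors": 1,
                        "ExecutionLogText": str(err)})
```

The downstream steps can then filter on the result field to react to failed executions.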
As you know, jobs don’t work with datasets. Transformations do. However, you can still use the Job Executor to send the rows to the Job. Then, any transformation executed by your Job can get the rows using a Get rows from result step.
By default, the Job Executor executes the Job once for every row in your dataset, but you can change this behavior in the Row Grouping tab of the configuration window, for example, by sending groups of rows to the Job instead of a single row at a time.
If the Job has named parameters—as in the example that we built—you provide values for them in the Parameters tab of the Job Executor step. For each named parameter, you can assign the value of a field or a fixed (static) value. If you execute the Job for a group of rows instead of a single one, the parameters take their values from the first row of data sent to the Job.
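For instance, if the executor were configured to send rows in groups of three, the parameter values would come from the first row of each group. A hypothetical sketch of that grouping logic, not actual PDI code:

```python
rows = [
    {"FOLDER_NAME": "folder1", "FILE_NAME": "a.tmp"},
    {"FOLDER_NAME": "folder1", "FILE_NAME": "b.tmp"},
    {"FOLDER_NAME": "folder1", "FILE_NAME": "c.tmp"},
    {"FOLDER_NAME": "folder2", "FILE_NAME": "d.tmp"},
    {"FOLDER_NAME": "folder2", "FILE_NAME": "e.tmp"},
    {"FOLDER_NAME": "folder2", "FILE_NAME": "f.tmp"},
]

GROUP_SIZE = 3  # group size, as configured in the Row Grouping tab
params_per_execution = []
for i in range(0, len(rows), GROUP_SIZE):
    group = rows[i:i + GROUP_SIZE]
    # Named parameters take their values from the FIRST row of the group;
    # the whole group is still sent to the Job as result rows.
    params_per_execution.append(group[0])
```

Six rows with a group size of three yield two Job executions, parameterized by rows one and four.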
At the output of the Job Executor, you can also get the result filenames. Let's modify the Transformation that we created to show an example of this kind of output:
5. Configure it as shown:
…
… - Write to log.0 -
… - Write to log.0 - ------------> Linenr 1------------------------------
… - Write to log.0 - filename = file:///c:/pentaho/files/folder1/sample_5agh7lj6ncqh7.tmp
… - Write to log.0 -
… - Write to log.0 - ====================
… - Write to log.0 -
… - Write to log.0 - ------------> Linenr 2------------------------------
… - Write to log.0 - filename = file:///c:/pentaho/files/folder2/sample_6n0rhmrpvj21n.tmp
… - Write to log.0 -
… - Write to log.0 - ====================
… - Write to log.0 -
… - Write to log.0 - ------------> Linenr 3------------------------------
… - Write to log.0 - filename = file:///c:/pentaho/files/folder3/sample_7ulkja68vf1td.tmp
… - Write to log.0 -
… - Write to log.0 - ====================
…
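The behavior behind that output can be sketched as follows. Collecting one entry per file the Job touched is an illustration of the Result filenames output, not PDI code, and the file names are placeholders:

```python
import os
import tempfile

base = tempfile.mkdtemp()
rows = [
    {"FOLDER_NAME": "folder1", "FILE_NAME": "sample1.tmp"},
    {"FOLDER_NAME": "folder2", "FILE_NAME": "sample2.tmp"},
    {"FOLDER_NAME": "folder3", "FILE_NAME": "sample3.tmp"},
]

result_filenames = []
for row in rows:
    folder = os.path.join(base, row["FOLDER_NAME"])
    os.makedirs(folder)
    path = os.path.join(folder, row["FILE_NAME"])
    open(path, "w").close()
    # Result filenames output: one row per file created by the execution,
    # which a step like Write to log can then print line by line.
    result_filenames.append(path)

for linenr, name in enumerate(result_filenames, start=1):
    print(f"Linenr {linenr}: filename = {name}")
```

Each executed Job contributes the files it created, so three executions produce three filename rows.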
The example that you just created demonstrated this option with a Job Executor.
We learned how to nest jobs and iterate the execution of jobs. You can learn more about executing transformations iteratively and launching transformations and jobs from the command line in Learning Pentaho Data Integration 8 CE – Third Edition.