Gathering all rejects prior to killing a job

Getting ready

Open the job jo_cook_ch03_0010_validationSubjob. As you can see, the reject flow has been attached, and the main output is being sent to a temporary store (tHashOutput).

How to do it…

  1. Add the tJava, tDie, tHashInput, and tFileOutputDelimited components.
  2. Add an OnSubjobOk trigger from the tFileInputDelimited component to the tJava component.
  3. Add a flow from the tHashInput component to the tFileOutputDelimited component.
  4. Right-click the tJava component, select Trigger, and then Run if. Link the trigger to the tDie component. Click the if link, and add the following code:

    ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) > 0

  5. Right-click the tJava component, select Trigger, and then Run if. Link this trigger to the tHashInput component. Click the if link, and add the following code:

    ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) == 0

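    The job should now have two Run if links leaving the tJava component: one to tDie, and one to tHashInput, which in turn feeds tFileOutputDelimited.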

  6. Drag the generic schema sc_cook_ch3_0010_genericCustomer onto both the tHashInput and tFileOutputDelimited components.
  7. Run the job. You should see that the tDie component is activated, because the file contained two errors.
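
The two Run if conditions above can be tried outside Talend with a minimal sketch. It assumes a plain HashMap as a stand-in for the globalMap that Talend populates at runtime; the class name RunIfSketch and the helper methods shouldDie and shouldContinue are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: in a real job, Talend generates and fills globalMap itself.
class RunIfSketch {
    // Key used in the recipe: the number of lines written to the reject file.
    static final String REJECT_COUNT_KEY = "tFileOutputDelimited_1_NB_LINE";

    // Same expression as the Run if link that leads to tDie.
    static boolean shouldDie(Map<String, Object> globalMap) {
        return ((Integer) globalMap.get(REJECT_COUNT_KEY)) > 0;
    }

    // Same expression as the Run if link that leads to tHashInput.
    static boolean shouldContinue(Map<String, Object> globalMap) {
        return ((Integer) globalMap.get(REJECT_COUNT_KEY)) == 0;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put(REJECT_COUNT_KEY, 2); // pretend two rows were rejected
        System.out.println("die=" + shouldDie(globalMap)
                + ", continue=" + shouldContinue(globalMap));
    }
}
```

Because the two expressions are mutually exclusive, exactly one of the links fires for any reject count: the job either dies or carries on to write the valid rows.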

How it works…

In this exercise, we created a validation stage that runs before the data is processed.

Valid rows are held in temporary storage (tHashOutput) and invalid rows are written to a reject file until all input rows are processed.

The job then checks how many records were rejected (using the Run if links). In this instance, there are invalid rows, so the Run if link to tDie is triggered, and the job is killed.
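
The same validate-first pattern can be sketched in plain Java. This is not the code Talend generates; it is a simplified illustration in which the class ValidateBeforeLoadSketch, the Result holder, and the two-field validation rule HAS_TWO_FIELDS are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ValidateBeforeLoadSketch {
    // Holds both outcomes of the validation pass (hypothetical helper type).
    static final class Result {
        final List<String> valid = new ArrayList<>();   // plays the role of the temporary store
        final List<String> rejects = new ArrayList<>(); // plays the role of the reject file
    }

    // Assumed rule for the example: a row is valid if it has exactly two comma-separated fields.
    static final Predicate<String> HAS_TWO_FIELDS =
            row -> !row.isEmpty() && row.split(",", -1).length == 2;

    // Reads every row before deciding anything, so all rejects are gathered.
    static Result validateAll(List<String> rows, Predicate<String> isValid) {
        Result result = new Result();
        for (String row : rows) {
            if (isValid.test(row)) {
                result.valid.add(row);
            } else {
                result.rejects.add(row);
            }
        }
        return result;
    }

    // Mirrors the Run if decision: fail if anything was rejected, otherwise release the valid rows.
    static List<String> rowsToLoad(Result result) {
        if (!result.rejects.isEmpty()) {
            throw new IllegalStateException(
                    result.rejects.size() + " row(s) rejected; target not written");
        }
        return result.valid;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("1,Alice", "bad", "2,Bob", "");
        Result result = validateAll(rows, HAS_TWO_FIELDS);
        System.out.println("rejects: " + result.rejects);
        try {
            rowsToLoad(result);
        } catch (IllegalStateException e) {
            System.out.println("job killed: " + e.getMessage());
        }
    }
}
```

The point of the design is that nothing reaches the target until the whole input has been checked, so a failure leaves the target untouched and the full list of rejects is available in one pass.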

By ensuring that the data is correct before we start to process it into a target, we know that the data will be fit for writing to the target, and we avoid the need for rollback procedures.

The records captured can then be sent to the support team, who will then have a record of all incorrect rows. These rows can be fixed in situ within the source file and the job simply re-run from the beginning.

There’s more…

This technique is particularly useful when rollback or correction of a job would be complex, or when an input may contain a higher than expected number of errors.

An example would be when there are multiple executions of a job that appends to a target file. If the job fails midway through, then rolling back involves identifying which records were appended to the file by the job before failure, removing them from the file, fixing the offending record, and then re-running. This runs the risk of a second error causing the same thing to happen again.

On the other hand, if the job does not die but a subset of the data is rejected, then the rejects must be corrected and loaded into the target file via a second, manual execution of the job.

So, this method enables us to be certain that our records will not fail to write due to incorrect data, and therefore saves our target from becoming corrupted.

Summary

This article has shown how rejects are collected before killing a job. It has also discussed how rejected rows would otherwise have to be manipulated into the target file.
