Three ages to workload automation

Enterprise-wide workload automation does not happen overnight. Converting nonstandardized batch processing into a centralized workload automation platform can be time consuming and risky. We need to gain a full understanding of the existing batch jobs before moving them onto the new platform: How are the jobs currently scheduled? What are the relationships between them? Which ones have the higher priority? Then, based on the number of jobs and their complexity, we can decide whether the migration should be performed automatically or manually. Once jobs are migrated, a series of tests needs to be performed before moving into production. Each production cutover should be as transparent as possible to normal business operation, or else be performed within the agreed outage window.

Apart from the technical challenges, "people issues" can be the next roadblock. First of all, users need to be educated about the tool. It can take a lot of time for users to accept and get used to the new way of operating. The bigger challenge is to get the application developers to take in and apply the "centralized workload automation" concept, so that during each IT project's development phase they use the built-in features provided by the workload automation platform as much as possible rather than reinventing the wheel.

Forcing users and developers to fully take in the centralized workload automation concept and change their ways of working with batch processing straightaway could ultimately cause the project to fail. Instead, different approaches and actions should be taken in stages, according to the actual condition of the IT environment. We can group these approaches and actions into three "ages", that is, the stone age, iron age, and golden age.

Stone age

Unless we are building an IT infrastructure from scratch, any reasonably sized organization will already have a noticeable number of batch jobs running to serve different business needs. Such processes are either run by the built-in batch features of the OS or the application, or scheduled from homegrown scheduling tools (sometimes they are manual tasks too). For example, these tasks can be:

End of day (EOD) reporting jobs
ERP application's overnight batch
Housekeeping jobs, for example, database backup and log recycling jobs

Depending on the organization's business requirements, the number of batch jobs required to achieve the outcome can start from a few hundred and go up to tens of thousands across a large number of different job execution hosts. In a heterogeneous environment, it is extremely challenging to run cross-platform batch processing by using different tools, especially when the number of tasks is large and the batch window is small. Therefore, these batch processes are the most essential and critical ones to consolidate into a centralized scheduling platform. On the other hand, these types of processing tasks are the "low-hanging fruit": they are relatively easy to identify and migrate, simply because they have already been clearly defined and scheduled by existing batch scheduling mechanisms, which makes it more likely that the job scheduling information can be extracted from these sources.

At the end of the day, it all comes down to the question of how to migrate the jobs into a centralized scheduling platform and how they are going to be triggered in the new environment. "How to migrate" means how the jobs should be extracted from the existing batch scheduling mechanism and how they should be imported into the new environment. This can be done with a job migration program, if one is available; otherwise someone has to manually redefine the jobs from scratch. "How jobs should be triggered" means whether the job should directly trigger the script or command, or use the scheduling tool's extended features (for example, Control-M Control Modules) for batch processing within a particular application.
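As an illustration of the extraction half of "how to migrate", the following is a minimal sketch that assumes the existing jobs live in a Unix crontab. It reads the crontab text and produces neutral job records that a later import step could consume. The JobDefinition record, the parse_crontab function, and the generated job names are hypothetical and introduced only for this example; this is not the input format of any Control-M migration utility.

```python
from dataclasses import dataclass


@dataclass
class JobDefinition:
    """Hypothetical neutral record for one extracted job."""
    name: str
    host: str
    schedule: str  # cron schedule kept verbatim, for example "0 2 * * *"
    command: str


def parse_crontab(text: str, host: str) -> list[JobDefinition]:
    """Extract job records from crontab text (a sketch, not a full cron parser)."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("@"):
            # Shortcut form such as "@daily command"
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            schedule, command = parts
        else:
            fields = line.split(None, 5)
            # Skip environment settings (NAME=value) and malformed lines
            if len(fields) < 6 or "=" in fields[0]:
                continue
            schedule = " ".join(fields[:5])
            command = fields[5]
        # Assumed naming convention: host plus a running number
        name = f"{host}_{len(jobs) + 1:03d}"
        jobs.append(JobDefinition(name, host, schedule, command))
    return jobs


if __name__ == "__main__":
    sample = """# nightly processing
MAILTO=ops@example.com
0 2 * * * /opt/erp/run_batch.sh --full
30 23 * * 1-5 /opt/reports/eod.sh
@daily /opt/housekeeping/rotate_logs.sh
"""
    for job in parse_crontab(sample, "apphost01"):
        print(job)
```

Keeping the schedule string verbatim is a deliberate choice in this sketch: translating it into the target tool's calendar and time settings is a separate step that is easier to review once every job has been inventoried.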

The bottom line is that this stage is all about standardizing the way the existing batch jobs are executed and managed by consolidating them into a centralized tool. The migration process should be relatively straightforward and should not require major modification to application code or to each application's architecture. However, it will permanently change the way users manage and monitor batch jobs. It is the initial step toward standardizing batch management and batch optimization; therefore, we call it the "stone age".

A successful implementation of the "stone age" will undoubtedly benefit the organization. After a while, users will realize how easy it is to manage cross-platform batch flows from a centralized GUI. They no longer need to look at different screens to trigger jobs or monitor a job's execution. Instead, batch jobs are triggered automatically and managed by exception.

Iron age

A lot of organizations stop improving and extending their centralized batch environment once they have completed the stone age. As business rules become more and more complex, it is common to see silos of batch processing in different applications that are related but not linked together, that is, they do not know about the other processing taking place or how it relates to their own. On top of that, there are business process steps that are "patched up" by mechanisms outside the centralized scheduling tool. As a result, batch flows within the centralized scheduling tool are commonly unable to present an end-to-end business process.

One possibility is that these organizations believe they have already got everything that the centralized scheduling tool is capable of: triggering executables at a fixed time on a predefined day. Rather than someone taking the lead to discover and try other features within the batch scheduling tool, people in different parts of the organization tend to develop their own ways of handling more advanced requirements such as event-triggered processing or inter-application file transfers.

In late 2010, I was involved in an EAI development project. During my meetings with some Java developers, I noticed that they still thought batch processing (in Control-M) was all about triggering some program to run at a fixed time and nothing more. Unless developers change their views on batch processing and understand what "workload automation" is about, they won't be able to fully utilize the features provided by a workload automation tool for the applications they develop. As a result, after the application goes live, a large part of the processing will be done by built-in or self-coded scheduling mechanisms while the rest runs in Control-M.

The iron age is about changing how batch processing is initially designed, that is, improving it by fully utilizing the capabilities of the batch scheduling tool. This requires ongoing education and getting application designers and developers to accept and use features that are already available in a centralized scheduling tool rather than reinventing the wheel. This is a lot more challenging than simply extracting and importing batch-processing data from one tool to another during the stone age. Also, the benefits of the iron age are not as easy to measure as those of the stone age, where users instantly get the benefit of managing batch from a centralized scheduling tool.

In reality, application development teams may prefer to write their own code to meet the processing requirements so that they have total control of the application they have developed. Application developers may think "Why should we learn a new tool when we can simply write a few lines of code to achieve event-driven triggering?" or "In the future, if we want to change the way our application works, we might have to log a change request for the scheduling team to modify the batch job in Control-M, whereas with everything done in the code, we will have full control, which saves us a lot of hassle."

A certain degree of politics can also be involved. For example, the management of the application development team may think "If half of our work is done by the scheduling tool, where is our value?" or "We want to look more important to the organization's IT in front of the IT directors!" Another scenario is that the organization outsources its application development. Instead of building a new system from scratch for each project, the outsourcing companies try to modify what they have already implemented in other organizations for a similar project. In such cases, the outsourcing companies will, most of the time, refuse to make any major modifications to the application just to fit into the centralized scheduling tool. They might believe that by doing so, they can ensure that the project gets delivered with minimal time, cost, and risk. But in reality, the result often turns out to be the opposite.

In order to avoid falling into one of the categories mentioned above, the person who is going to promote "iron age" within an organization should work on people rather than expecting everything to turn out fine by only focusing on the technology. At the same time, higher-level management in the organization should provide a level of assistance by enforcing an organization-wide scheduling standard so the organization's IT can get the most out of the centralized batch scheduling platform and therefore maximize the business' ROI on it.

A successfully implemented iron age means that the organization sees its batch flows becoming more meaningful at the business service level (that is, they present the complete business process) and optimized to complete within a shorter batch window by using features provided with the batch scheduling tool (for example, a percentage of the processing is moved into event-triggered batch, which can happen outside the batch window). From an ongoing maintenance point of view, there are fewer homegrown tools to manage. The total time and effort required for application development may also decrease if the developers are familiar with the batch scheduling tool and know how to leverage the available features properly.
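To make the difference between time-triggered and event-triggered batch more concrete, here is a minimal sketch of condition-based triggering: a job becomes eligible as soon as all of the conditions it waits for are present, regardless of the clock, and raises a condition of its own when it finishes. The Job record, the run_on_events function, and all job and condition names are hypothetical; this models the general idea only and does not reproduce how any particular scheduling tool implements it.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job that waits for conditions and raises one on success."""
    name: str
    needs: frozenset[str]  # conditions that must all be present
    emits: str             # condition raised once the job has completed


def run_on_events(jobs, initial_events):
    """Return (jobs in the order they became eligible, jobs still waiting)."""
    conditions = set(initial_events)
    pending = list(jobs)
    order = []
    progressed = True
    while pending and progressed:
        progressed = False
        for job in list(pending):
            if job.needs <= conditions:
                # All prerequisites are present: "run" the job immediately
                order.append(job.name)
                conditions.add(job.emits)
                pending.remove(job)
                progressed = True
    return order, [job.name for job in pending]


if __name__ == "__main__":
    flow = [
        Job("report", frozenset({"invoices_done", "fx_rates_loaded"}), "report_done"),
        Job("invoice", frozenset({"orders_loaded"}), "invoices_done"),
        Job("load_orders", frozenset({"orders_file_arrived"}), "orders_loaded"),
    ]
    ran, waiting = run_on_events(flow, {"orders_file_arrived"})
    print("ran:", ran)
    print("waiting:", waiting)
```

In this example the arrival of the orders file drives load_orders and then invoice, while report keeps waiting because the FX rates condition has not been raised. Linking jobs from different applications through shared conditions like this is one way a flow can come to represent the complete business process.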

Golden age

Golden age refers to the full implementation of workload automation. It is not as easy to achieve as it sounds, because there are a number of prerequisites that need to be met before the organization even considers it.

First of all, the centralized scheduling platform needs to be upgraded to a more up-to-date version that provides the workload automation ability, such as Control-M version 7. Secondly, in order to get the true value from workload automation, the organization needs to have both the stone age and the iron age successfully implemented, that is, jobs in the centralized scheduling tool need to be well defined and present the actual business processes. Furthermore, it depends on how far the organization wants to go down this road in order to reach the pinnacle. The IT department may look at providing the foundation that allows the batch environment to become more dynamic by using resource virtualization and cloud computing technologies.

Once all prerequisites are met, implementing the golden age requires the batch environment designer to work closely with a system architect and application developers. They need to transform the existing batch to become more flexible (moving away from batch jobs' static nature), so that the workload automation tool can schedule jobs according to business policies and route the workload, based on runtime load, to the best available virtual resource for execution. Batch jobs should also be designed by following SOA design principles for reusability and should be loosely coupled.
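The idea of routing workload to the best available resource according to runtime load can be sketched as follows. This is only an assumption about what "best available" might mean: among the hosts with enough free slots, pick the one with the lowest current utilization. The Host record, the slot-based capacity unit, and the route function are hypothetical and do not describe how Control-M or any other product balances load.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical execution host with a fixed number of job slots."""
    name: str
    capacity: int  # maximum concurrent slots
    running: int   # slots currently in use

    @property
    def free(self) -> int:
        return self.capacity - self.running


def route(job_slots: int, hosts: list[Host]):
    """Assign a job needing job_slots (a positive number) to the least-utilized
    host that can fit it. Returns the host name, or None if no host has room."""
    candidates = [h for h in hosts if h.free >= job_slots]
    if not candidates:
        return None  # caller could queue the job or acquire more resources
    best = min(candidates, key=lambda h: h.running / h.capacity)
    best.running += job_slots  # reserve the slots on the chosen host
    return best.name


if __name__ == "__main__":
    pool = [Host("vm-a", 10, 8), Host("vm-b", 10, 3), Host("vm-c", 4, 1)]
    for slots in (2, 2, 20):
        print(slots, "->", route(slots, pool))
```

Returning None when nothing fits is where a more dynamic environment would differ from a static one: instead of failing or waiting, the platform could request an additional virtual resource and retry.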

In the golden age, batch workloads are managed according to business policies and service agreements. The bottleneck of limited machine resources for batch processing is not much of a concern, because resources can be acquired whenever needed. The system can therefore handle a sudden spike in processing requests while still ensuring that processing completes within its agreed batch window or SLA.