Home Data Looking at the different types of Lookup cache

Looking at the different types of Lookup cache

November 20, 2017 - 12:00 am

8323

6 min read

[box type=”note” align=”” class=”” width=””]The following is an excerpt from a book by Rahul Malewar titled Learning Informatica PowerCenter 10.x. We walk through the various types of lookup cache based on how a cache is defined in this article.[/box]

Cache is the temporary memory that is created when you execute a process. It is created automatically when a process starts and is deleted automatically once the process is complete. The amount of cache memory is decided based on the property you define at the transformation level or session level. You usually set the property as default, so as required, it can increase the size of the cache. If the size required for caching the data is more than the cache size defined, the process fails with the overflow error. There are different types of caches available.

Building the Lookup Cache – Sequential or Concurrent

You can define the session property to create the cache either sequentially or concurrently.

Sequential cache

When you select to create the cache sequentially, Integration Service caches the data in a row-wise manner as the records enter the lookup transformation. When the first record enters the lookup transformation, lookup cache gets created and stores the matching record from the lookup table or file in the cache. This way, the cache stores only the matching data. It helps in saving the cache space by not storing unnecessary data.

Concurrent cache

When you select to create cache concurrently, Integration service does not wait for the data to flow from the source; it first caches complete data. Once the caching is complete, it allows the data to flow from the source. When you select a concurrent cache, the performance enhances as compared to sequential cache since the scanning happens internally using the data stored in the cache.

Persistent cache – the permanent one

You can configure the cache to permanently save the data. By default, the cache is created as non-persistent, that is, the cache will be deleted once the session run is complete. If the lookup table or file does not change across the session runs, you can use the existing persistent cache.

Suppose you have a process that is scheduled to run every day and you are using lookup transformation to lookup on the reference table that which is not supposed to change for six months. When you use non-persistent cache every day, the same data will be stored in the cache; this will waste time and space every day. If you select to create a persistent cache, the integration service makes the cache permanent in the form of a file in the $PMCacheDir location. So, you save the time every day, creating and deleting the cache memory.

When the data in the lookup table changes, you need to rebuild the cache. You can define the condition in the session task to rebuild the cache by overwriting the existing cache. To rebuild the cache, you need to check the rebuild option on the session property.

Sharing the cache – named or unnamed

You can enhance the performance and save the cache memory by sharing the cache if there are multiple lookup transformations used in a mapping. If you have the same structure for both the lookup transformations, sharing the cache will help in enhancing the performance by creating the cache only once. This way, we avoid creating the cache multiple times, which in turn, enhances the performance. You can share the cache–either named or unnamed

Sharing unnamed cache

If you have multiple lookup transformations used in a single mapping, you can share the unnamed cache. Since the lookup transformations are present in the same mapping, naming the cache is not mandatory. Integration service creates the cache while processing the first record in first lookup transformation and shares the cache with other lookups in the mapping.

Sharing named cache

You can share the named cache with multiple lookup transformations in the same mapping or in another mapping. Since the cache is named, you can assign the same cache using the name in the other mapping.

When you process the first mapping with lookup transformation, it saves the cache in the defined cache directory and with a defined cache file name. When you process the second mapping, it searches for the same location and cache file and uses the data. If the Integration service does not find the mentioned cache file, it creates the new cache.

If you run multiple sessions simultaneously that use the same cache file, Integration service processes both the sessions successfully only if the lookup transformation is configured for read-only from the cache. If there is a scenario when both lookup transformations are trying to update the cache file or a scenario where one lookup is trying to read the cache file and other is trying to update the cache, the session will fail as there is conflict in the processing.

Sharing the cache helps in enhancing the performance by utilizing the cache created. This way we save the processing time and repository space by not storing the same data multiple times for lookup transformations.

Modifying cache – static or dynamic

When you create a cache, you can configure them to be static or dynamic.

Static cache

A cache is said to be static if it does not change with the changes happening in the lookup table. The static cache is not synchronized with the lookup table.

By default, Integration service creates a static cache. The Lookup cache is created as soon as the first record enters the lookup transformation. Integration service does not update the cache while it is processing the data.

Dynamic cache

A cache is said to be dynamic if it changes with the changes happening in the lookup table. The static cache is synchronized with the lookup table.

You can choose from the lookup transformation properties to make the cache dynamic. Lookup cache is created as soon as the first record enters the lookup transformation. Integration service keeps on updating the cache while it is processing the data. The Integration service marks the record as an insert for the new row inserted in the dynamic cache. For the record that is updated, it marks the record as an update in the cache. For every record that doesn’t change, the Integration service marks it as unchanged.

You use the dynamic cache while you process the slowly changing dimension tables. For every record inserted in the target, the record will be inserted in the cache. For every record updated in the target, the record will be updated in the cache. A similar process happens for the deleted and rejected records.

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Your Quick Introduction to Extended Events in Analysis Services from Blog…

Logging the history of my past SQL Saturday presentations from Blog…

Storage savings with Table Compression from Blog Posts – SQLServerCentral

Daily Coping 31 Dec 2020 from Blog Posts – SQLServerCentral

Learning Essential Linux Commands for Navigating the Shell Effectively

Exploring the Strategy Behavioral Design Pattern in Node.js

How to integrate a Medium editor in Angular 8

Implementing memory management with Golang’s garbage collector

How to create sales analysis app in Qlik Sense using DAR…

Looking at the different types of Lookup cache

Building the Lookup Cache – Sequential or Concurrent

Sequential cache

Concurrent cache

Persistent cache – the permanent one

Sharing the cache – named or unnamed

Sharing unnamed cache

Sharing named cache

Modifying cache – static or dynamic

Static cache

Dynamic cache

LEAVE A REPLY Cancel reply

Must Read in Cloud & Networking

Top life hacks for prepping for your IT certification exam

Learning Essential Linux Commands for Navigating the Shell Effectively

ServiceNow Partners with IBM on AIOps from DevOps.com

Must Read in Data

Learn Transformers for Natural Language Processing with Denis Rothman

Scientific Analysis of Donald Trump’s Tweets on COVID-19 with Transformers

Distributed training in TensorFlow 2.x

Interviews

Learn Transformers for Natural Language Processing with Denis Rothman

Clean Coding in Python with Mariano Anaya

Bringing AI to the B2B world: Catching up with Sidetrade CTO Mark Sheldon [Interview]

On Adobe InDesign 2020, graphic designing industry direction and more: Iman Ahmed, an Adobe Certified Partner and Instructor [Interview]

Is DevOps experiencing an identity crisis? [Interview]

MobilePro

datapro

Programming

Subscribe to our newsletter