10 min read

Content persistence

We have seen so far how metadata is persisted but it is not obvious how content is persisted and associated with its metadata. All sysobjects (objects of type dm_sysobject and its subtypes) other than folders (objects of type dm_folder and its subtypes) can have associated content. We saw that a document can have content in the form of renditions as well as in primary format. How are these content files associated with a sysobject? In other words, how does Content Server know what metadata is associated with a content fi le? How does it know that one content fi le is a rendition of another one? Content Server manages content files using content objects, which (indirectly) point to the physical locations of content files and associate them with sysobjects.

Locating content files

Recall that Documentum repositories can store content in various types of storage systems including a file system, a Relational Database Management System (RDBMS), a content-addressed storage (CAS), or external storage devices. Content Server decides to store each file in a location based on the configuration and the presence of products like Content Storage Services. In general, users are not concerned about where the file is stored since Content Server is able to retrieve the file from the location where it was stored. We will discuss the physical location of a content file without worrying about why Content Server chose to use that location.

Content object

Every content file in the repository has an associated content object, which stores information about the location of the fi le and identifi es the sysobjects associated with it. These sysobjects are referred to as the parent objects of the content object.

A content object is an object of type dmr_content, whose key attributes are listed as follows:

Attribute

Description

parent_count

Number of parent objects

parent_id

List of object IDs of the parent objects

storage_id

Object ID of the store object representing the storage area holding the content.

data_ticket

A value used internally to retrieve the content. The value and its usage depend upon the type of storage used.

i_contents

When the content is stored in turbo storage, this property contains the actual content. If the content is larger than the size of this property (2000 characters for databases other than Sybase, 255 for Sybase), the content is stored in a dmi_subcontent object and this property is unused.

If the content is stored in content addressed storage, it contains the content address.

If the content is stored in external storage, it contains the token used to retrieve the content.

rendition

Identifies if it’s a rendition and its related behavior

0 means original content

1 means rendition generated by server

2 means rendition generated by client

3 means rendition not to be removed when its primary content

is updated or removed

format

Object ID of the format object representing the format of the content

full_content_size

Content file size in bytes, except when the content is stored in external storage

Object-content relationship

Content Server manages content objects while performing content-related operations. Content associated with a sysobject is categorized as primary content or a rendition. A rendition is a content fi le associated with a sysobject that is not its primary content.

Content in the first content file added to a sysobject is called its primary content and its format is referred to as the primary format for the parent object. Any other content added to the parent object in the same format is also called primary content, though it is rarely done by users manually. This ability to add multiple primary content files is typically utilized programmatically by applications for their internal use.

While a sysobject can have multiple primary content files it is also possible for one content object to have multiple parent objects. This just means that a content file can be shared by multiple objects.

Putting it together

The details about content persistence can become confusing due to the number of objects involved and the relationships among various attributes. It becomes even more complicated when the full Content Server capabilities (such as multiple content files for one sysobject) are manifested. We will look at a simple scenario to visually grasp how content persistence works in common situations.

Documentum provides multiple options for locating the content file. DFC provides the getPath() method and DQL provides get_file_url administration method for this purpose. This section has been included to satisfy the reader’s curiosity about content persistence and works through the information manually. This discussion can be treated as supplementary to technical fundamentals..

The sysobject is named paystub.jpg. The primary content file is in jpg format and the rendition is in pdf format, as shown in the following figure:

The following figure shows the objects involved in the content persistence for this document. The central object is of type dm_document. The figure also includes two content objects and one format object. Let’s try to understand the relationships by asking specific questions.

How many content files, primary or renditions, are there for the document paystub.jpg? This question can be answered by looking for the corresponding content objects. We look for dmr_content objects that have the document’s object ID in one of their parent_id values. This figure shows that there are two such content objects.

Which of these content objects represents the primary content and which one is a rendition? This can be determined by looking at the rendition attribute. The content object on the left shows rendition=0, which indicates primary content. The content object on the right shows rendition=2, which indicates rendition generated by client (recall that we manually imported this rendition).

What is the primary format for this document? This is easy to answer by looking at the a_content_type attribute on the document itself. If we need to know the format for a content object we can look for the dm_format object which has the same object ID as the value present in the format property of the content object. In the fi gure above, the format object for the primary content object is shown which represents a JPEG image. Thus, the format determined for the primary content of the object is expected to match the value of a_content_type property of the object. The format object for the rendition is not shown but it would be PDF.

What is the exact physical location of the primary content file? As mentioned in the beginning of this section, there are DFC and DQL methods which can provide this information. For understanding content persistence, we will deduce this manually for a file store, which represents storage on a file system. For other types of storage, an exact location might not be evident since we need to rely on the storage interface to access the content file. Deducing the exact file path requires the ability to convert a decimal number to a hexadecimal (hex) number; this can be done with pen and paper or using one of the free tools available on the Web. Also remember that negative numbers are represented with what is known as a 2’s-complement notation and many of these tools either don’t handle 2’s complement or don’t support enough digits for our purposes.

There are two parts of the file path—the root path for the file store and the path of the file relative to this root path. In order to fi gure out the root path, we identify the fi le store first. Find the dm_filestore object whose object ID is the same as the value in storage_id property of the content object. Then find the dm_location object whose object name is the same as the root property on the file store object. The file_ system_path property on this location object has the root path for the fi le store, which is C:Documentumdatalocaldevcontent_storage_01 in the figure above.

In order to find the relative path of the content fi le, we look at data_ticket (data type integer) on the content object. Find the 8-digit hex representation for this number. Treat the hex number as a string and split the string with path separators (slashes, / or depending on the operating system) after every two characters. Suffi x the right-most two characters with the file extension (.jpg), which can be inferred from the format associated with the content object. Prefix the path with an 8-digit hex representation of the repository ID. This gives us the relative path of the content file, which is 000000108009be.jpg in the figure above. Prefix this path with the file store root path identified earlier to get the full path of the content file.

Content persistence in Documentum appears to be complicated at first sight. There are a number of separate objects involved here and that is somewhat similar to having several tables in a relational database when we normalize the schema. At a high level, this complexity in the content persistence model serves to provide scalability, flexibility by supporting multiple kinds of content stores, and ease of managing changes in such an environment.

Lightweight and shareable object types

So far we have primarily dealt with standard types. Lightweight and shareable object types work together to provide performance improvements, which are significant when a large number of lightweight objects share information. The key performance benefits are in terms of savings in storage and in the time it takes to import a large number of documents that share metadata. These types are suitable for use in transactional and archival applications but are not recommended for traditional content management.

The term transactional content (as in business transactions) was coined by Forrester Research to describe content typically originating from external parties, such as customers and partners, and driving transactional back-office business processes. Transactional Content Management (TCM) unifi es process, content, and compliance to support solutions involving transactional content. Our example scenario of mortgage loan approval process management is a perfect example of TCM. It involves numerous types of documents, several external parties, and sub-processes implementing parts of the overall process. Lightweight and shareable types play a central role in the High Volume Server, which enhances the performance of Content Server for TCM.

A lightweight object type (also known as LwSO for Lightweight SysObject ) is a subtype of a shareable type. When a lightweight object is created, it references an object of its shareable supertype called the parent object of the lightweight object. Conversely, the lightweight object is called the child object of the shareable object. Additional lightweight objects of the same type can share the same parent object. These lightweight objects share the information present in the common parent object rather than each carrying a copy of that information.

In order to make the best use of lightweight objects we need to address a couple of questions. When should we use lightweight objects? Lightweight objects are useful when there are a large number of attribute values that are identical for a group of objects. This redundant information can be pushed into one parent object and shared by the lightweight objects.

What kind of information is suitable for sharing in the parent object? System-managed metadata, such as policies for security, retention, storage, and so on, are usually applied to a group of objects based on certain criteria. For example, all the documents in one loan application packet could use a single ACL and retention information, which could be placed into the shareable parent object. The specific information about each document would reside in a separate lightweight object.

Lightweight object persistence

Persistence for lightweight objects works much the same way it works for objects of standard types, with one exception. A lightweight object is a subtype of a shareable type and these types have their separate tables as usual. For a standard type, each object has separate records in all of these tables, with each record identified by the object ID of the object. However, when multiple lightweight objects share one parent object there is only one object ID (of the parent object) in the tables of the shareable type. The lightweight objects need to refer to the object ID of the parent object, which is different from the object ID of any of the lightweight objects, in order to access the shared properties. This reference is made via an attribute named i_sharing_parent, as shown in the last figure.

LEAVE A REPLY

Please enter your comment!
Please enter your name here