Alfresco 3 Business Solutions: Document Migration Strategies


Alfresco 3 Business Solutions

Practical implementation techniques and guidance for delivering business solutions with Alfresco

  • Deep practical insights into the vast possibilities that exist with the Alfresco platform for designing business solutions.
  • Each and every type of business solution is implemented through the eyes of a fictitious financial organization – giving you the right amount of practical exposure you need.
  • Packed with numerous case studies which will enable you to learn in various real-world scenarios.
  • Learn to use Alfresco’s rich API arsenal with ease.
  • Extend Alfresco’s functionality and integrate it with external systems.

The Best Money CMS project is now in full swing. We have designed and implemented the folder structure with its business rules, and we have created the domain content model. It is now time to start importing existing documents into the Alfresco repository. Most companies that implement an ECM system, and Best Money is no exception, have a substantial number of files that they want to import, classify, and make searchable in the new CMS.

Planning and preparing for the document migration actually has to start much earlier, as there are many things to prepare:

  • Who is going to manage sorting out files that should be migrated?
  • What is the strategy and process for the migration?
  • What sort of classification should be done during the import?
  • What filesystem metadata needs to be preserved during the import?
  • Do we need to write any temporary scripts or rules just for the import?

Document migration strategies

The first thing we need to do is to figure out how the document migration is actually going to be done. There are several ways of making this happen. We will discuss a couple of different ways, such as via the CIFS interface and via tools. There are also some general strategies that apply to any migration method.

General migration strategies

There are some common things that need to be done no matter which import method is used, such as setting up a document migration staging area.

Document staging area

The end users need to be able to copy or move the documents they want to migrate to a staging area that mirrors the new folder structure we have set up in Alfresco. The best way to set up the staging area is to copy the folder structure from Alfresco via CIFS. Once this is done, the end users can start copying files to the staging area. However, it is a good idea to train the users in the new folder structure before they start copying documents to it. We should talk to them about the folder structure changes, the rules and naming conventions that have been set up, the idea behind them, and why they should be followed.

If we do not train the end users in the new folder structure, they will not honor it, and the old structure will get mixed into the new one during the document migration, which is not something we want. We planned and implemented the new structure for both today's and future requirements, and we do not want it broken before we even start using the system.

The end users will typically work with the staging area over a period of time, ideally a couple of weeks. They need time to think about which documents they want to migrate and whether any reorganization is needed. Some documents might also need to be renamed.

Preserving Modified Date on imported documents

We know that Best Money wants the Modified Date of every file to be preserved during the import, as they have a review process that depends on it. This means that we have to use an import method that can preserve the Modified Date of the network drive files when they are merged into the Alfresco repository. The CIFS interface cannot be used for this, as it sets the Modified Date to the current date.

There are a couple of methods that can be used to import content into the repository and preserve the Modified Date:

  • Create an ACP file via an external tool and then import it
  • Custom code the import with the Foundation API and turn off the Audit Aspect before the import
  • Use an import tool that also has the possibility to turn off the Audit Aspect

At the time of writing (using Alfresco 3.3.3 Enterprise and Alfresco Community 3.4a), there is no easy way to import files and preserve the Modified Date. When a file is added via Alfresco Explorer, Alfresco Share, FTP, CIFS, the Foundation API, the REST API, and so on, the Created Date and Modified Date are set to “now”, so we lose all the Modified Date data that was set on the files on the network drive.

The Created Date, Creator, Modified Date, Modifier, and Access Date are all so-called audit properties, which Alfresco manages automatically if a node has the cm:auditable aspect applied. If we try to set these properties via one of the APIs during an import, the values we set will not be kept.

Most people want to import files via CIFS or via an external import tool, and Alfresco is working towards preserving dates with both of these import methods. Currently, there is a solution for adding files via the Foundation API while preserving the dates, which custom tools can use. The Alfresco product itself also needs this functionality, for example in the Transfer Service Receiver, so that the dates can be preserved when it receives files.

The new solution that enables the use of the Foundation API to set Auditable properties manually has been implemented in version 3.3.2 Enterprise and 3.4a Community. To be able to set audit properties do the following:

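  1. Inject the policy behavior filter into the Spring bean of the class that should do the property update:

    <property name="behaviourFilter" ref="policyBehaviourFilter"/>

  2. Then, in the class, turn off the auditable behavior before the update. This has to be done inside a new transaction, as in the following example:

    RetryingTransactionCallback<Object> txnWork =
        new RetryingTransactionCallback<Object>() {
      public Object execute() throws Exception {
        // Stop Alfresco from overwriting the audit properties in this transaction
        behaviourFilter.disableBehaviour(ContentModel.ASPECT_AUDITABLE);

  3. Then, in the same transaction, update the Created or Modified Date:

        // nodeService, nodeRef and someDate are assumed to be available in the class
        nodeService.setProperty(nodeRef, ContentModel.PROP_MODIFIED, someDate);
        . . .
        return null;
      }
    };
    // Assumes an injected TransactionService; run in a new read-write transaction
    transactionService.getRetryingTransactionHelper()
        .doInTransaction(txnWork, false, true);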
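To make the idea behind these steps concrete, here is a toy in-memory model in plain JavaScript. It is not Alfresco code, and AuditableStore, setProperty and withAuditingDisabled are hypothetical names: while the auditable behavior is active, every update re-stamps cm:modified, so an explicitly set date is lost; disabling the behavior around the work lets the value stick.

```javascript
// Toy, in-memory model of the auditable behaviour (NOT Alfresco code).
// AuditableStore, setProperty and withAuditingDisabled are hypothetical names.
class AuditableStore {
  constructor(now = () => new Date()) {
    this.now = now;              // clock used to stamp audit properties
    this.nodes = new Map();      // node id -> properties object
    this.auditingEnabled = true; // stands in for the cm:auditable behaviour
  }

  create(nodeId, props = {}) {
    const stamp = this.now();
    this.nodes.set(nodeId, { ...props, "cm:created": stamp, "cm:modified": stamp });
  }

  setProperty(nodeId, name, value) {
    const props = this.nodes.get(nodeId);
    props[name] = value;
    // While the behaviour is active, every update re-stamps cm:modified,
    // so a Modified Date we set explicitly is immediately overwritten.
    if (this.auditingEnabled) {
      props["cm:modified"] = this.now();
    }
  }

  // Runs the work with the behaviour disabled and always re-enables it afterwards.
  withAuditingDisabled(work) {
    this.auditingEnabled = false;
    try {
      return work();
    } finally {
      this.auditingEnabled = true;
    }
  }
}

const store = new AuditableStore(() => new Date("2011-01-01T00:00:00Z"));
store.create("doc1");
const original = new Date("2008-05-20T10:00:00Z");

store.setProperty("doc1", "cm:modified", original);
console.log(store.nodes.get("doc1")["cm:modified"].toISOString()); // 2011-01-01T00:00:00.000Z

store.withAuditingDisabled(() => store.setProperty("doc1", "cm:modified", original));
console.log(store.nodes.get("doc1")["cm:modified"].toISOString()); // 2008-05-20T10:00:00.000Z
```

The try/finally block only approximates the real behavior filter, which is scoped to the transaction in which the behavior is disabled.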


With JDK 6, the Modified Date is the only file date we can read (java.io.File exposes only lastModified()), so no other file metadata is available when reading the files from the CIFS share. JDK 7 introduces the NIO.2 file API, which gives access to more metadata. So, if we are implementing an import tool that creates an ACP file, we could use JDK 7 and preserve the Created Date, the Modified Date, and potentially other metadata as well.

Post-migration processing scripts

When the document migration has been completed, we might want to do further processing of the documents, such as setting extra metadata. This is especially needed when documents are imported into Alfresco via the CIFS interface, which does not allow any custom metadata to be set during the import. There might also be situations, as in the case of Best Money, where many of the imported documents have older filenames (that is, following an older naming convention) containing important metadata that should be extracted and applied to the new document nodes.

For post migration processing, JavaScript is a convenient tool to use. We can easily define Lucene queries for the nodes we want to process, as the rules have applied domain document types such as Meeting to the imported documents, and we can use regular expressions to match and extract the metadata we want to apply to the nodes.
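Because Lucene phrase values are themselves enclosed in double quotes, the query strings in these scripts nest quotes inside JavaScript string literals, which is easy to get wrong. The following is a minimal sketch of a helper that assembles such a query. buildQuery and quoteLucene are hypothetical names, and bmc:meeting is an assumed QName for the Meeting type, used only for illustration. The sketch sticks to ES5 syntax (var and function), assuming the script engine that runs repository scripts may not support newer syntax.

```javascript
// Hypothetical helpers for building Lucene queries in post-migration scripts.

// Wraps a value in double quotes, escaping any embedded double quote or backslash.
function quoteLucene(value) {
  return '"' + String(value).replace(/["\\]/g, "\\$&") + '"';
}

// Builds a query that requires (+) a path, a type, and optionally an aspect.
function buildQuery(path, type, aspect) {
  var clauses = ["+PATH:" + quoteLucene(path), "+TYPE:" + quoteLucene(type)];
  if (aspect) {
    clauses.push("+ASPECT:" + quoteLucene(aspect));
  }
  return clauses.join(" ");
}

// Usage: "bmc:meeting" is an assumed QName for the Meeting type.
var meetingQuery = buildQuery("/app:company_home/cm:Meetings//*", "bmc:meeting");
console.log(meetingQuery);
// +PATH:"/app:company_home/cm:Meetings//*" +TYPE:"bmc:meeting"
```

In a repository script, logger.log would take the place of console.log, and the resulting string would be passed to search.luceneSearch together with the store reference.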

Search restrictions when running post-migration scripts

What we have to keep in mind when running these post-migration scripts is that the repository now contains a lot of content, so each query we run might well return many more than 1,000 rows, and 1,000 rows is the default maximum number of rows that a search returns.

To allow up to 5,000 rows to be returned, we have to change the permission check configuration (Alfresco checks the permissions on each node being accessed, so that the user running the query does not get back content that he or she should not have access to). The relevant settings are system.acl.maxPermissionCheckTimeMillis and system.acl.maxPermissionChecks. Open the file located in the alfresco/tomcat/shared/classes directory and add the following properties:

# The maximum time spent pruning results (was 10000)
# The maximum number of results to perform permission checks against (was 1000)

Unwanted Modified Date updates when running scripts

So we have turned off the audit feature during the document migration, or made some custom code changes to Alfresco, so that each document's Modified Date is preserved during the import. Then we have turned auditing on again so that the system behaves the way users expect.

The last thing we want now is for all those preserved Modified Dates to be reset to the current date when we update metadata, and this is exactly what will happen if we run the post-migration scripts with the audit feature turned on. So this is important to think about, unless you want to start the document migration all over again.

Versioning problems when running post-migration scripts

Another thing that can cause problems is having versioning turned on for documents that we update in the post-migration scripts. We might then see the following error:

org.alfresco.service.cmr.version.VersionServiceException: 07120018
The current implementation of the version service does not support
the creation of branches.

By default, a new version is created even when we only update properties (metadata). This can cause errors such as the preceding one, and we might not even be able to check out and check in the document. To prevent this error and turn off versioning during property updates once and for all, we can set the following property at the same time as we set the other domain metadata in the scripts (here, node stands for the script variable that references the document being updated):

node.properties["cm:autoVersionOnUpdateProps"] = false;

Setting this property to false effectively turns off versioning during any property/metadata update for the document.

Another thing that can be a problem is folders that have been set up as versionable by mistake. The most likely reason for this is that we forgot to restrict the Versioning Rule to cm:content (instead of “All Items”). Folders in the workspace://SpacesStore store do not support versioning.

The WCM system comes with an AVM store that supports advanced folder versioning and change sets. Note that the WCM system can also store its data in the Workspace store.

So we need to update the versioning rule to apply only to content, and remove the versionable aspect from all folders that have it applied, before we can update any content in these folders. Here is a script that removes the cm:versionable aspect from every folder that has it applied:

var store = "workspace://SpacesStore";
var query = "PATH:\"/app:company_home//*\" AND TYPE:\"cm:folder\" " +
    "AND ASPECT:\"cm:versionable\"";
var versionableFolders = search.luceneSearch(store, query);

for each (var versionableFolder in versionableFolders) {
    // Remove the aspect so that content in this folder can be updated again
    versionableFolder.removeAspect("cm:versionable");
    logger.log("Removed versionable aspect from folder: " + versionableFolder.name);
}
logger.log("Removed versionable aspect from " +
    versionableFolders.length + " folders");

Post-migration script to extract legacy meeting metadata

Best Money has a lot of documents to migrate to the Alfresco repository, and many of them have filenames that follow a certain naming convention. This is the case for the imported meeting documents. The naming convention for the old documents is not exactly the same as the new meeting naming convention, so we have to write the regular expression a little differently.

An example of a filename following the new naming convention is 10En-FM.02_3_annex1.doc, and the same filename following the old naming convention is 10Eng-FM.02_3_annex1.doc. The difference is that the old naming convention does not use a two-character language code but instead one of the following codes: Arabic, Chinese, Eng|eng, F|Fr, G|Ger, Indonesian, Jpn, Port, Rus|Russian, Sp, Sw, Tagalog, Turkish. What we are interested in extracting is the language code and the department code, and the following script does that with a regular expression:

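// Regular expression definition: two digits, the legacy language code (group 1),
// a hyphen, and the department code (group 2, assumed to be uppercase letters
// such as FM), for example 10Eng-FM.02_3_annex1.doc
var re = new RegExp("^\\d{2}(Arabic|Chinese|Eng|eng|F|Fr|G|Ger|Indonesian|Jpn|" +
    "Port|Rus|Russian|Sp|Sw|Tagalog|Turkish)-([A-Z]+)\\.");

var store = "workspace://SpacesStore";
// All documents (cm:content and its subtypes) below /Company Home/Meetings
var query = "+PATH:\"/app:company_home/cm:Meetings//*\" +TYPE:\"cm:content\"";
var legacyContentFiles = search.luceneSearch(store, query);

for each (var legacyContentFile in legacyContentFiles) {
    if (re.test(legacyContentFile.name)) {
        var language = getLanguageCode(RegExp.$1);
        var department = RegExp.$2;
        logger.log("Extracted and updated metadata (language=" + language
            + ")(department=" + department + ") for file: " + legacyContentFile.name);
        if (legacyContentFile.hasAspect("bmc:document_data")) {
            // Set some metadata extracted from file name
            legacyContentFile.properties["bmc:language"] = language;
            legacyContentFile.properties["bmc:department"] = department;

            // Make sure versioning is not enabled for property updates
            legacyContentFile.properties["cm:autoVersionOnUpdateProps"] = false;
            legacyContentFile.save();
        } else {
            logger.log("Aspect bmc:document_data is not set for document " +
                legacyContentFile.name);
        }
    } else {
        logger.log("Did NOT extract metadata from file: " + legacyContentFile.name);
    }
}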

/**
 * Convert from legacy language code to new 2 char language code
 *
 * @param parsedLanguage legacy language code
 * @return the new two-character language code, or "" if it is unknown
 */
function getLanguageCode(parsedLanguage) {
    if (parsedLanguage == "Arabic") {
        return "Ar";
    } else if (parsedLanguage == "Chinese") {
        return "Ch";
    } else if (parsedLanguage == "Eng" || parsedLanguage == "eng") {
        return "En";
    } else if (parsedLanguage == "F" || parsedLanguage == "Fr") {
        return "Fr";
    } else if (parsedLanguage == "G" || parsedLanguage == "Ger") {
        return "Ge";
    } else if (parsedLanguage == "Indonesian") {
        return "In";
    } else if (parsedLanguage == "Ital") {
        return "";
    } else if (parsedLanguage == "Jpn") {
        return "Jp";
    } else if (parsedLanguage == "Port") {
        return "Po";
    } else if (parsedLanguage == "Rus" || parsedLanguage == "Russian") {
        return "Ru";
    } else if (parsedLanguage == "Sp") {
        return "Sp";
    } else if (parsedLanguage == "Sw") {
        return "Sw";
    } else if (parsedLanguage == "Tagalog") {
        return "Ta";
    } else if (parsedLanguage == "Turkish") {
        return "Tu";
    } else {
        logger.log("Invalid parsed language code: " + parsedLanguage);
        return "";
    }
}
This script can be run from any folder and it will search for all documents under the /Company Home/Meetings folder or any of its subfolders. All the documents that are returned by the search are looped through and matched with the regular expression. The regular expression defines two groups: one for the language code and one for the department. So after a document has been matched with the regular expression it is possible to back-reference the values that were matched in the groups by using RegExp.$1 and RegExp.$2.
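RegExp.$1 and RegExp.$2 are legacy, non-standard static properties that every successful match anywhere in the script overwrites, so reading them later is fragile. A minimal sketch of an alternative reads the groups from the array returned by exec() and replaces the if/else chain in getLanguageCode() with a lookup table. LEGACY_NAME, LANGUAGE_CODES and parseLegacyFileName are hypothetical names, and the department pattern [A-Z]+ is an assumption based on the example code FM.

```javascript
// Sketch: parse a legacy meeting file name with exec() and a lookup table.
// LEGACY_NAME, LANGUAGE_CODES and parseLegacyFileName are hypothetical names.
// The department pattern [A-Z]+ is an assumption based on the example "FM".
var LEGACY_NAME = new RegExp("^\\d{2}(Arabic|Chinese|Eng|eng|F|Fr|G|Ger|Indonesian|Jpn|" +
    "Port|Rus|Russian|Sp|Sw|Tagalog|Turkish)-([A-Z]+)\\.");

// Same codes as getLanguageCode(); anything not listed maps to ""
var LANGUAGE_CODES = {
  Arabic: "Ar", Chinese: "Ch", Eng: "En", eng: "En", F: "Fr", Fr: "Fr",
  G: "Ge", Ger: "Ge", Indonesian: "In", Jpn: "Jp", Port: "Po",
  Rus: "Ru", Russian: "Ru", Sp: "Sp", Sw: "Sw", Tagalog: "Ta", Turkish: "Tu"
};

// Returns { language, department } for a legacy file name, or null if it does not match.
function parseLegacyFileName(fileName) {
  var match = LEGACY_NAME.exec(fileName); // match[1] = language, match[2] = department
  if (match === null) {
    return null;
  }
  var language = Object.prototype.hasOwnProperty.call(LANGUAGE_CODES, match[1])
      ? LANGUAGE_CODES[match[1]] : "";
  return { language: language, department: match[2] };
}

var parsed = parseLegacyFileName("10Eng-FM.02_3_annex1.doc");
console.log(parsed.language + " " + parsed.department); // En FM
console.log(parseLegacyFileName("10En-FM.02_3_annex1.doc")); // null (new naming convention)
```

Because the regular expression has no g flag, exec() always starts matching at the beginning of the string, so the same object can safely be reused for every file in the loop.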

When the language and department properties are set, we also set the cm:autoVersionOnUpdateProps property to false, so that we do not run into versioning problems during the update.

