The primary benefit of using surrogate characteristics is the ability to process larger volumes of data using fewer resources. This comes with a compromise: an understanding that the processing will not need the original value. As soon as the original value is required, an overhead is added to the processing, because the relationship must be looked up to obtain the original value before that value can be processed.
This highlights that the implementation of surrogate characteristics should not be entered into quickly. While they might seem like the ultimate solution for performance optimisation, their use should be thoroughly investigated and tested. The best way to understand them is to remember that they have a life cycle from beginning to middle to end. This involves:
- Beginning: Overhead to become a surrogate characteristic;
- Middle: Benefit of using fewer resources for processing;
  - Less database table space for storage;
  - Less memory (RAM) used;
  - Less network traffic between the application and database services;
  - Faster database processing for sort and group-by commands;
  - All DataMart extractors from a DataStore change log table;
  - All DataStore activations (new table into the active table);
  - Queries on DataStore via a MultiProvider;
  - Normalise text fields in posted transaction data;
- Middle: Overhead to look up the related original value for processing;
- Middle: Benefit to logical operators for processing;
- Middle: Benefit of using fewer resources in more processes;
  - Down-stream in additional DataStores and Cubes;
  - Cross-stream in Transformation lookups;
- End: Overhead to return to the original value.
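The life cycle above can be sketched outside of BW in a few lines of Python. This is only an illustration of the idea, not how BW implements it: the `SurrogateMap` class, the `total_by_surrogate` function and the sample postings are all hypothetical names and data made up for the example.

```python
from collections import defaultdict


class SurrogateMap:
    """Hypothetical in-memory stand-in for a surrogate lookup table."""

    def __init__(self):
        self._id_by_value = {}   # original value -> surrogate id
        self._value_by_id = []   # surrogate id -> original value

    def encode(self, value):
        """Beginning: assign (or reuse) a compact integer id for a value."""
        sid = self._id_by_value.get(value)
        if sid is None:
            sid = len(self._value_by_id)
            self._id_by_value[value] = sid
            self._value_by_id.append(value)
        return sid

    def decode(self, sid):
        """End: look up the original value behind a surrogate id."""
        return self._value_by_id[sid]


def total_by_surrogate(rows):
    """Middle: group and sum on the integer id only, with no lookup."""
    totals = defaultdict(int)
    for sid, amount in rows:
        totals[sid] += amount
    return dict(totals)


texts = SurrogateMap()
posted = [
    ("Freight charge, north region", 100),
    ("Customs duty", 40),
    ("Freight charge, north region", 60),
]

# Beginning: overhead of converting each text into its surrogate id.
encoded = [(texts.encode(text), amount) for text, amount in posted]

# Middle: processing works on small integers instead of long text.
totals = total_by_surrogate(encoded)

# End: overhead of returning to the original value for reporting.
report = {texts.decode(sid): amount for sid, amount in totals.items()}
print(report)
```

The grouping step never touches the text, which is where the saving comes from. Any step in the middle that needs the text itself has to call `decode` first, and that is the lookup overhead described in the list.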
Keep in mind that the use of a surrogate characteristic is usually localised to a single system. The unique id is generated within the context of the system where it was created. Once it is transferred to another system, beyond the control of the mechanism that generated it, the unique id cannot be guaranteed to remain unique (100% integral). For example: extracting data from a regional BW system to a consolidated enterprise reporting BW system.
When data flows between systems, beyond the control of the unique id generator, the data should be transferred as the original value to enable a smooth reconciliation between the two systems. The introduction of the Real Time Data Acquisition (RDA) feature in BW v7 has made this process even easier. It makes it possible to extract related attributes directly from a transaction data DataStore with no custom ABAP code required in the extractor.
“Surrogate characteristics are not useful at every point in a data model”
Alternatively, if you can guarantee that the target BW system will be a ‘master data’ slave to the source system, and the target surrogate characteristic is compounded to a source system identifier, then you can safely re-use the source system surrogate value. This avoids the overhead of converting the transaction data back into the original values for the transfer between systems. It does require separate master data extractors for each surrogate characteristic, to provide the original values for reporting in the target system.
Please consider this as a very real option, especially when you have to transfer a lot of transaction data across a Wide Area Network (WAN) to the enterprise head office in another country. The financial and time-to-send savings on data transfer can easily justify the project time required to implement this data model.
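To show why compounding to a source system identifier makes the re-use safe, here is a minimal Python sketch. The system names `REG_A` and `REG_B`, the surrogate tables and the `load_master_data` helper are all invented for the illustration; in BW the equivalent would be the master data extractors and the compounded characteristic described above.

```python
# Hypothetical surrogate tables from two regional systems. Each system
# generated its ids independently, so id 1 means something different in each.
region_a = {1: "Freight charge", 2: "Customs duty"}
region_b = {1: "Storage fee", 2: "Freight charge"}


def load_master_data(target, source_system, surrogate_table):
    """Load one source system's surrogate table into the target,
    keyed by the compound (source system, surrogate id)."""
    for sid, value in surrogate_table.items():
        target[(source_system, sid)] = value


# Target master data, kept as a slave copy of each source system.
target_master = {}
load_master_data(target_master, "REG_A", region_a)
load_master_data(target_master, "REG_B", region_b)

# Transaction data is transferred with the surrogate id, not the text.
transactions = [("REG_A", 1, 100), ("REG_B", 1, 25), ("REG_B", 2, 60)]

# Reporting in the target resolves the compounded key back to the value.
report = {}
for source_system, sid, amount in transactions:
    value = target_master[(source_system, sid)]
    report[value] = report.get(value, 0) + amount
print(report)
```

Without the source system in the key, id 1 from the two regions would collide in the target. With it, the same id can be carried across the network unchanged and still resolves to the right original value.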
What has been my experience with using surrogate characteristics?