Defining a Surrogate Characteristic

By definition, a surrogate is a replacement, a substitute.

A concatenated characteristic could be considered a surrogate, because technically a new InfoObject was created to replace several other InfoObjects in the DataStore key field list. However, this is such a limited form of substitution that ‘concatenated’ and ‘surrogate’ characteristics will still be treated as two clearly separate definitions.

The primary distinction between these two definitions is whether the original value remains “human readable”. A concatenated characteristic still contains the raw, original value. An SAP BW surrogate characteristic uses a completely different value to represent the original; one that a human can read but that has no significant semantic relationship to the original value. For example, an integer used as nothing more than a unique identifier.
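The contrast can be sketched outside of BW in a few lines of Python. This is only an illustration: the helper names (concatenated_key, SurrogateRegistry), the "/" separator, and the sample document and item numbers are assumptions made for the example, not SAP objects.

```python
def concatenated_key(*parts):
    """Join the original key values; the raw values stay visible in the result."""
    return "/".join(parts)


class SurrogateRegistry:
    """Hypothetical helper: hands out one integer per distinct original key."""

    def __init__(self):
        self._ids = {}  # original key parts -> surrogate integer

    def surrogate_for(self, *parts):
        key = tuple(parts)
        if key not in self._ids:
            # The next free integer; it says nothing about the original value.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = SurrogateRegistry()

# Concatenated: the document number and item number can still be read off.
concat = concatenated_key("4500000012", "00010")

# Surrogate: just an identifier, meaningless without the registry.
surrogate = registry.surrogate_for("4500000012", "00010")

print(concat, surrogate)
```

The concatenated value can be split back into its parts by eye, while the surrogate only becomes meaningful again through a lookup.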

“Surrogate Characteristics have a benefit at every step
of the journey through the data model”

The most efficient implementation of a surrogate characteristic uses a native unsigned integer data type at the word size of the underlying platform (32 or 64 bits). An unsigned 32-bit integer is a good compromise, because a lot of existing code handles the bulk processing of 32-bit fields very efficiently.

The simplest implementation of a surrogate characteristic contains only one attribute. This creates a normalised list of unique master data values, each of which can replace the original value in a record. For example, the Purchasing Line Item extractor (2LIS_02_ITM) delivers a text field (EKPO-TXZ01) that is posted as transaction data. Create a surrogate characteristic with 0PROD_DESCR as its attribute.

Within the BW ETL above the staging layer, replace the 0PROD_DESCR characteristic in the transaction data with the new surrogate characteristic, and use the attribute relationship when you want the real product description value (TXZ01). The transformation layer in your data model will now use only 4 bytes to store the text field on each transaction data record, instead of the original 40 characters (80 bytes in Unicode). Multiply that by the millions of records you might have in the data set and the database space saved starts to become significant.
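As a rough sketch of the replacement and of the storage arithmetic, assuming 2 bytes per character and a 4-byte surrogate as described above: the field name SURR_ID, the sample records, the record counts, and the simple cost model for the master data list are all hypothetical and only serve the illustration.

```python
TEXT_BYTES = 40 * 2      # 40 characters at 2 bytes each
SURROGATE_BYTES = 4      # one 32-bit integer


def build_surrogates(records, field):
    """Replace `field` in every record with an integer ID.

    Returns the rewritten records and the master data list (ID -> text).
    """
    id_by_text = {}
    rewritten = []
    for rec in records:
        # Reuse the ID for a known text, otherwise assign the next integer.
        sid = id_by_text.setdefault(rec[field], len(id_by_text) + 1)
        slim = {k: v for k, v in rec.items() if k != field}
        slim["SURR_ID"] = sid
        rewritten.append(slim)
    master = {sid: text for text, sid in id_by_text.items()}
    return rewritten, master


def bytes_saved(record_count, distinct_texts):
    """Net saving: per-record reduction minus an assumed master data cost."""
    per_record = TEXT_BYTES - SURROGATE_BYTES
    master_cost = distinct_texts * (TEXT_BYTES + SURROGATE_BYTES)
    return record_count * per_record - master_cost


records = [
    {"DOC": "1", "TXZ01": "Steel bolt M8"},
    {"DOC": "2", "TXZ01": "Copper wire"},
    {"DOC": "3", "TXZ01": "Steel bolt M8"},
]
slim_records, master = build_surrogates(records, "TXZ01")
print(slim_records)
print(master)
print(bytes_saved(10_000_000, 50_000))
```

The saving grows with the number of transaction records, while the cost of the master data list grows only with the number of distinct descriptions, so the technique pays off most when the same texts repeat often.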

A more complex surrogate characteristic will have many attributes, with the surrogate master data list becoming a normalised list of unique combinations of the attribute values. While this can increase the space saved in each transaction data record, it adds a counterproductive overhead as soon as you want to use one of the attributes in a Transformation to implement business rules.
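A minimal sketch of the multi-attribute case, again in plain Python rather than BW objects: the class name CombinationSurrogate, the attribute names, and the sample values are assumptions for illustration. It shows both the compression (several fields become one integer) and the overhead (an attribute can only be read back through a lookup).

```python
class CombinationSurrogate:
    """Hypothetical helper: one integer per unique combination of attributes."""

    def __init__(self, attributes):
        self.attributes = tuple(attributes)
        self._id_by_combo = {}
        self._combo_by_id = {}

    def encode(self, record):
        """Return the surrogate ID for the record's attribute combination."""
        combo = tuple(record[name] for name in self.attributes)
        sid = self._id_by_combo.get(combo)
        if sid is None:
            sid = len(self._id_by_combo) + 1
            self._id_by_combo[combo] = sid
            self._combo_by_id[sid] = combo
        return sid

    def attribute(self, sid, name):
        """The extra lookup a business rule needs to see an original value."""
        return self._combo_by_id[sid][self.attributes.index(name)]


surrogate = CombinationSurrogate(["VENDOR", "MATERIAL", "REQUISITIONER"])

first = surrogate.encode(
    {"VENDOR": "V100", "MATERIAL": "M-01", "REQUISITIONER": "SMITH"}
)
second = surrogate.encode(
    {"VENDOR": "V100", "MATERIAL": "M-02", "REQUISITIONER": "SMITH"}
)
repeat = surrogate.encode(
    {"VENDOR": "V100", "MATERIAL": "M-01", "REQUISITIONER": "SMITH"}
)

# Reading one attribute back requires going through the master data list.
print(first, second, repeat, surrogate.attribute(second, "MATERIAL"))
```

Every call to attribute() stands in for the lookup a Transformation would need, which is why attributes that drive business rules are poor candidates for inclusion in the combination.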

A solid analysis of the business rules will highlight which characteristics in the record are “only along for the ride to the reporting layer” and will not be used in any business logic that affects the record. For example, in the Purchasing Line Item extractor (2LIS_02_ITM) these are typically the base (most detailed) characteristics of a dimension: vendor, supplier, receiver, shipper, material, requisitioner, agreement, etc. Metadata and grouping characteristics, by contrast, will probably be involved in business logic: document type, document category, purchasing group, purchasing organisation, transfer process, status flags, etc.

The time spent analysing the business logic and implementing complex surrogate characteristics is well worth the effort because it creates a “horizontal compression of a record” where many wide fields are replaced by a single unique identifier.

It is best practice not to use the SAP BW surrogate characteristic technique on any InfoObjects that have been enabled for use in authorisation. The impact on security and analysis authorisations is never worth the time and complexity added by having to move a characteristic-based authorisation over to an attribute-driven authorisation.

It is worth noting that using surrogate characteristics in Cubes for reporting is counterproductive, as you are effectively forcing the overhead of transitive attributes into the Cube, Aggregates, MultiProvider, and Queries with no significant advantage gained. The fundamental implementation of a normal Cube already uses surrogate characteristics for the dimensions (through SIDs, DIMs and the Fact Table).