Implied, Truncated and Surrogate Characteristics

Chances are that you will never experience the problem of filling up the 16 key fields with Concatenated Characteristics and still have more fields from the records full unique key definition to be added. Reaching this true technical limitation is possible but not probable. The BW DataStore Limit is 16 key fields; each one is a Characteristic that can hold a maximum of 60 characters.

The total usable characters is 960 (16×60). If this technical limitation is reached then you can no longer utilise the Concatenated Characteristic technique and will have to use a technique that implements shorter values.

What other options are available?

Most of us know the ‘Value Type for Reporting (0VTYPE)’ characteristic. It has approximately half a dozen official values. ‘010’ for Actuals, ‘020’ for Plan, ‘040’ for Commitment, etc.

An Implied Characteristic is one that exists as a definition only. It is never declared, instantiated, created in memory or saved in any storage format. It exists only because the DataSet has been ‘Implied’ to be it, usually by way of a text description somewhere. For Example: The Plan Cube must obviously hold ‘Plan’ data. Implied Characteristics can save a lot of storage space.

Despite the obvious storage benefit, the use of Implied Characteristics is not recommended. If anything, it is a best practice to go to the extra effort to create and add any implied semantic meaning into the record, redundantly on all records in the DataSet. You have probably seen the use of 0VTYPE = ‘010’ in an Actuals Cube and 0VTYPE = ‘020’ in a Plan Cube. Take that same idea and run with it for all other implied aspects of all DataSets in your entire Data Warehouse. For Example: Add a characteristic that identifies the Application Component (‘CC’ for Cost Centre, ‘PY’ for Payroll, ‘PO’ for Purchase Order, etc).

Once the DataSets start to fill out with the meta-data that was implied but is now real in the record; you will find that the Transformations and Queries are a lot easier to create, debug and enhance because there are no assumptions (implied knowledge). The selection filters in the code and restricted key figures are very explicit. You might be tempted to use Implied Characteristics (by leaving the data out of the record) to enable the full unique key to fit within the DataStore 16 key field limit but the benefit gained in the ETL and query definition far out weights the effort required to implement an alternative solution like Surrogate Characteristics.

“a Surrogate Characteristic leverages the same aspects
of a Truncated Characteristic but without the risk”

A Truncated Characteristic is when the master data value is stripped of excess characters that provide no real benefit and a shorter value is used instead. For Example: We could assess the complete master data list of values for 0VTYPE and determine that we will replace it with just the second character. Now only 1 character is required to represent the same semantic value as the original 3 character value. ‘010’ becomes ‘1’, ‘020’ becomes ‘2’, ‘040’ becomes ‘4’, etc. This provides the benefit of still representing the same semantic meaning but doing it using less space.

The Master Data Modeling for a Truncated Characteristic can be simplified by creating a dedicated attribute that will contain the truncated value. This attribute is then used in the transaction data record instead of the original characteristic. This enables Transformation lookup to be easily implemented and also allows for the “Real to Truncated” value relationship to be seen in reconciliation queries. For Example: The 0MATERIAL characteristic is usually defined as a CHAR 18 with a CHAR 2 compound to Source System. Upon reviewing the master data list you notice that the leading 10 characters are always a zero (0). A Truncated Characteristic would be created as a CHAR 8 with a CHAR 2 compound to Source System. The benefit is 10 less characters on every record, allowing more fields to be placed into a DataStore key field list.

As a pre-caution, add a safety check into the Transformation to guarantee the truncated characters are insignificant. For Example: Ensure the 10 leading characters removed from 0MATERIAL are always zeros (0). Any failing records should be marked as an error and not allowed to load through further. This safety check will ensure data integrity remains intact. At this point the BW Administrator is now probably cursing the developer who implemented this Data Model as the reporting users want their data but the failing record will not be available and the key figure totals will be wrong. This leaves the BW Administrator with a nightmare to fix as none of the available solutions are quick or easy. No matter how much the business analyst guaranteed this situation would never happen, as a DataModeler, it is always better to implement a solution that will always load.

Even with the benefits to storage space saved, less data transferred from the database to application server, less memory consumed and faster block processing of internal tables; the use of Truncated Characteristics is not recommend.

A Surrogate Characteristic does not introduce any data integrity problems, the need for a safety check or suffer from missing reporting data. The Characteristic still gets a new attribute, which is used as a replacement in the transaction data but the value of this new master data list is a unique number. It is an Integer (INT4) and is the exact same technique as the SIDs (Master Data in a Cube Dimension) and DIMs (Cube Dimension in a Fact Table).

The use of Surrogate Characteristics is a DataModeling best practice. There is a lot more information required to cleanly implement a Surrogate Characteristic. At least you now understand the limitations of Implied Characteristics and Truncated Characteristics; enough to know that the implementation of a Surrogate Characteristic is worth the effort to ensure data integrity and not leave a data integrity surprise for the BW Administrator.