One of the data modelling constraints of a DataStore is the limit on how many characteristics can be placed in the key field list. It is a design constraint enforced from the database layer up to the application layer.
The BW application intentionally keeps a direct correlation between the object model of a DataStore and the actual database table used to store the data. This makes sense when you remember that a DataStore exists for fast storage and processing of data.
“a DataStore is optimised to balance between read and write demands”
This is in contrast to a cube, which implements a database layer model with relational tables. This involves the use of SIDs, DIMs and up to three different types of fact table. This representation is segmented and nowhere near as flat as the application layer's perspective of a data record. It is one of the fundamental differences between a DataStore and a cube.
“a cube is optimised for reading”
A DataStore inherits some of its technical limitations from the database layer. This is fine because you have chosen the right tool for the right job: a DataStore for Extraction, Transformation and Loading (ETL).
What do you do when the full unique key of a record exceeds the key field limit of the object it is going to be stored in?
This scenario is less than ideal; it is frustrating to have the data model constrained by technical limitations.
The first and obvious solution is to use a cube instead of a DataStore. This will meet the full unique key requirements of the record but at what price?
A cube is not the right tool for the job when your goal is to extract, transform and load data as fast as possible.
There are a number of reasons why using a cube in the ETL is quick to build but costly in the long term:
- A cube will always auto-generate SIDs for the characteristics used;
- Processing overhead to dissect a flat record structure into a SID, DIM and Fact table relationship;
- Compressing the cube is not an option because the request dimension is needed for downstream delta loading;
- Cube indexing overhead due to Oracle's bitmap index problem with parallel writing/locking;
- Reading millions of records from a non-indexed cube is noticeably slower;
- Deleting and creating the cube indexes gets slower with more data;
- Loading further downstream is a 1-to-1 relationship: every record in the cube is passed on;
- Unable to leverage a DataStore's true delta record generation (see the sketch after this list);
- Reconciliation of raw data in a cube is not possible due to the use of DIMs and SIDs;
- A cube forces reconciliation queries to be created and used;
- A DataStore allows the BW Administrator to review the raw data in the table.
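To make the last two points concrete, the sketch below is a rough, hypothetical approximation in Python (not ABAP or actual BW code) of how a standard DataStore derives true delta records during activation: each incoming record is compared with the image in the active table, and before/after images are written to a change log so that downstream targets receive only the net change. The structures, record mode values and function names are simplifications for illustration only.

```python
# Rough approximation (not BW code) of change log generation during
# DataStore activation: compare the incoming record with the active
# image and write before/after images as delta records.

active_table = {}   # key -> key figure values currently active
change_log = []     # delta records handed on to downstream targets


def activate(key, new_values):
    """Activate one incoming record and write its delta to the change log."""
    old_values = active_table.get(key)
    if old_values is None:
        # First time this key is seen: write a new image only.
        change_log.append({"key": key, "recordmode": "N", **new_values})
    else:
        # Changed record: the before image reverses the old key figures,
        # the after image carries the new values.
        change_log.append({"key": key, "recordmode": "X",
                           **{k: -v for k, v in old_values.items()}})
        change_log.append({"key": key, "recordmode": " ", **new_values})
    active_table[key] = new_values


activate("DOC-1", {"AMOUNT": 100})
activate("DOC-1", {"AMOUNT": 120})

# The before (-100) and after (+120) images sum to the +20 net change
# for any additive target downstream.
for delta_record in change_log:
    print(delta_record)
```

A staging cube has no equivalent comparison step, which is why everything loaded into it is passed on downstream 1-to-1.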
Resist the implementation of a cube in the staging layer and the transformation layer of the data model.
The only time you will not have a choice is when the extractor is defined in the source system as being ‘Cube Only’ or ‘Not DataStore Compatible’.
There are four data modelling techniques available that enable a DataStore, with its 16 key field limit, to hold unique records that have more than 16 fields in their full unique key definition:
- An implied characteristic (Not recommended);
- A truncated characteristic (Not recommended);
- A concatenated characteristic (Recommended);
- A surrogate characteristic (Best practice).
A data modelling solution can use any combination of those techniques.
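As a rough illustration of the two recommended techniques, the Python sketch below (outside BW, with purely hypothetical field names and lengths) shows the underlying idea: several source key fields are either joined into one delimited value (a concatenated characteristic) or replaced by a single derived identifier (a surrogate characteristic), which then occupies a single key field in the DataStore. The hash-based surrogate is only a stand-in; in a real model the surrogate would typically be assigned in a transformation, from a lookup table or from a number range.

```python
import hashlib

# Hypothetical record whose full unique key has more fields than the
# DataStore key can hold; the field names are illustrative only.
record = {
    "DOC_NUMBER": "4500001234",
    "DOC_ITEM": "000010",
    "SCHED_LINE": "0001",
    "PLANT": "1000",
    # ... further fields that push the key past the 16 field limit ...
}

EXCESS_KEY_FIELDS = ["DOC_NUMBER", "DOC_ITEM", "SCHED_LINE", "PLANT"]


def concatenated_characteristic(rec, fields, max_len=60):
    """Concatenated characteristic: join several key fields into one
    delimited value that occupies a single key field in the DataStore."""
    value = "/".join(rec[f] for f in fields)
    if len(value) > max_len:
        raise ValueError("Concatenated value exceeds the characteristic length")
    return value


def surrogate_characteristic(rec, fields):
    """Surrogate characteristic: derive one stable identifier for the
    combination of key fields; a hash stands in for the lookup table or
    number range that would normally assign the surrogate."""
    raw = "|".join(rec[f] for f in fields)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:25].upper()


print(concatenated_characteristic(record, EXCESS_KEY_FIELDS))
print(surrogate_characteristic(record, EXCESS_KEY_FIELDS))
```

Whichever technique is chosen, it is worth keeping the original source fields as data fields alongside the new key so the full business key can still be reconciled back to the source system.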
- Are there any staging Cubes in your Data Model?
- [Yes] Can you identify the record's full unique key from a business point of view?