A DataStore versus a Staging Cube

One of the data modeling constraints of a DataStore is the limit on how many characteristics you can put in its key field list: a maximum of 16 key fields. This constraint is a design strategy enforced from the database layer up to the application layer.

The BW application layer intentionally keeps a direct correlation between the object model of a DataStore and the actual database table used to store the data. This makes sense when you remember that a DataStore exists for fast storage and processing of data.

“a DataStore is optimised to balance
between read and write demands”
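
To make that flatness concrete, here is a minimal sketch of a DataStore active table expressed as an ABAP structure. The field names are illustrative assumptions, not a real object; BW generates the actual active table as /BIC/A<name>00 with the same flat layout.

    * Sketch only: a DataStore active table is one flat table whose
    * columns mirror the object model one-to-one (names assumed).
    TYPES: BEGIN OF ty_active_table,
             " key fields (maximum of 16)
             doc_number  TYPE c LENGTH 10,
             item_number TYPE c LENGTH 6,
             " data fields
             amount      TYPE p LENGTH 15 DECIMALS 2,
             currency    TYPE c LENGTH 5,
           END OF ty_active_table.

One record in the object model is one row in this table, which is what keeps both writes and raw data reviews cheap.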

This is in contrast to a cube, which implements a relational star schema at the database layer. This involves the use of SIDs, DIMs and two fact tables (the uncompressed F table and the compressed E table). The representation is segmented and nowhere near as flat as the application layer's perspective of a data record. This is one of the fundamental differences between a DataStore and a cube.

“a cube is optimised for reading”
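
To illustrate how segmented that star schema is, the sketch below shows the kind of join needed just to read one characteristic and one key figure back out of a cube. The table and field names follow the generated /BIC/F (fact), /BIC/D (dimension) and /BIC/S (SID) naming pattern, but the specific objects are illustrative assumptions.

    * Sketch only: reading a flat record back out of a cube means
    * joining the fact table to a dimension table and then to the
    * SID table of each characteristic (names assumed).
    SELECT s~/bic/zcustomer AS customer,
           SUM( f~/bic/zamount ) AS amount
      FROM /bic/fzsales AS f
      INNER JOIN /bic/dzsales1 AS d
        ON f~key_zsales1 = d~dimid
      INNER JOIN /bic/szcustomer AS s
        ON d~sid_zcustomer = s~sid
      GROUP BY s~/bic/zcustomer
      INTO TABLE @DATA(lt_result).

A DataStore read, by contrast, is a single SELECT against one flat table.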

A DataStore inherits some of its technical limitations from the database layer. This is acceptable when you have chosen the right tool for the job: a DataStore for Extraction, Transformation and Loading (ETL).

What do you do when the full unique key of a record exceeds the 16 key field limit of the DataStore it is being loaded into?

This scenario is less than ideal; it is frustrating to have the data model constrained by a technical limitation.

The first and obvious solution is to use a cube instead of a DataStore. This will meet the full unique key requirement of the record, but at what price?

A cube is not the right tool for the job when your goal is to extract, transform and load data as fast as possible.

There are a number of reasons why using a cube in the ETL layer is quick to build but costly in the long term:

  • A cube always auto-generates SIDs for the characteristics used;
  • Processing overhead to dissect a flat record structure into a SID, DIM and fact table relationship;
  • Compressing the cube is not an option because the request dimension is needed for downstream delta loading;
  • Cube indexing overhead, given Oracle's bitmap index locking problems during parallel writes;
  • Querying millions of records in a non-indexed cube is noticeably slower;
  • Deleting and re-creating the cube indexes gets slower as data volumes grow;
  • Loading further downstream is a 1 to 1 relationship rather than a reduced delta;
  • Unable to leverage a DataStore's true delta record generation (see the sketch after this list);
  • Reconciliation of raw data in a cube is not possible due to the use of DIMs and SIDs;
  • A cube forces reconciliation queries to be created and used;
  • A DataStore, in contrast, allows the BW Administrator to review the raw data directly in the table.
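
To make the delta point concrete, here is an illustrative sketch, with assumed values, of what the DataStore change log generates when the amount on record 4711/10 is overwritten from 100 to 120:

    Key        0RECORDMODE           AMOUNT
    4711/10    X  (before image)     100-
    4711/10       (after image)      120

The downstream target can be updated additively with the net effect of +20. A staging cube cannot produce these images; it only ever hands on the records exactly as they were loaded.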

Resist implementing a cube in the staging and transformation layers of the data model.

The only time you will not have a choice is when the extractor is defined in the source system as being ‘Cube Only’ or ‘Not DataStore Compatible’.

There are four data modeling techniques that enable a DataStore to stay within its 16 key field limit and still hold unique records whose full unique key definition contains more than 16 fields:

  • Use an implied characteristic (Not recommended);
  • Use a truncated characteristic (Not recommended);
  • Use a concatenated characteristic (Recommended; see the first sketch below);
  • Use a surrogate characteristic (Best practice; see the second sketch below).

A data modeling solution can use any combination of those techniques.
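
As a rough illustration of the two recommended techniques, first the concatenated characteristic: the overflowing key parts are joined into a single key field in a transformation field routine. The routine shape follows the BW 7.x transformation pattern, but the field and method names are assumptions:

    * Sketch only: build one concatenated key field from the key
    * parts that no longer fit into the key field list, using a
    * separator that cannot occur in the data itself.
    METHOD compute_zconckey.
      CONCATENATE source_fields-doc_number
                  source_fields-item_number
                  source_fields-sched_line
             INTO result SEPARATED BY '/'.
    ENDMETHOD.

And the surrogate characteristic: the overflowing key parts are mapped to a short generated ID through a custom mapping table, so the same business key always resolves to the same surrogate. Everything here (the table zsurr_map, the number range object ZSURR) is a hypothetical sketch, not a delivered object:

    * Sketch only: resolve the full business key to a stable
    * surrogate ID kept in a custom mapping table.
    METHOD compute_zsurrkey.
      DATA ls_map TYPE zsurr_map.

      " Reuse the surrogate if this business key was seen before.
      SELECT SINGLE * FROM zsurr_map INTO ls_map
        WHERE doc_number  = source_fields-doc_number
          AND item_number = source_fields-item_number
          AND sched_line  = source_fields-sched_line.

      IF sy-subrc <> 0.
        " First occurrence: draw the next ID from a number range
        " and persist the mapping for future loads.
        ls_map-doc_number  = source_fields-doc_number.
        ls_map-item_number = source_fields-item_number.
        ls_map-sched_line  = source_fields-sched_line.
        CALL FUNCTION 'NUMBER_GET_NEXT'
          EXPORTING
            nr_range_nr = '01'
            object      = 'ZSURR'
          IMPORTING
            number      = ls_map-surr_id.
        INSERT zsurr_map FROM ls_map.
      ENDIF.

      result = ls_map-surr_id.
    ENDMETHOD.

The surrogate keeps the DataStore key short and stable while the full business key remains reconcilable through the mapping table.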

  • Are there any staging cubes in your data model?
  • [Yes] Can you identify the record's full unique key from a business point of view?