As you develop more complex extraction, transformation and loading (ETL) scenarios, you will naturally implement more business rules using ABAP in the transformation routines. This demands a deeper level of understanding, because the greater flexibility and control also comes with the ability to make a real mess of the data.
Do you load data out of a DataStore in delta mode? If yes, keep reading. This does not apply to data that is guaranteed to always be loaded in full mode. This discussion focuses on a transformation built between two DataStores.
“The RECORD field identifies the sequence of processing for the records in the DataPacket”
When coding in a transformation start routine or end routine, it is important to pay attention to the “RECORD” field. It identifies the sequential order of the records delivered to the routine by the DataProvider. You must make sure that the records are still in that sequence when the start routine or end routine has finished implementing the business rules. Take a look at any code you’ve introduced:
- Does it have a delete statement for the source_package internal table?
- Does it sort the source_package internal table?
- Does it make a copy of the source_package internal table, process it and then put it back?
If you answered yes to any of the above then you are at risk of introducing a data integrity problem.
    " Example 1: deleting rows from the package
    delete source_package where calday is initial.
    sort source_package by record.

    " Example 2: re-sorting the package for a business rule
    field-symbols: <wa_sp> like line of SOURCE_PACKAGE.
    sort source_package by employee dateto descending.
    loop at source_package assigning <wa_sp>.
      ...
    endloop.
    sort source_package by record.
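The third item on the checklist, working on a copy of the package, follows the same pattern. A minimal sketch, assuming the fragment sits inside a generated start routine where SOURCE_PACKAGE already exists and that the source has the EMPLOYEE and DATETO fields used above; lt_work and <wa_work> are illustrative names only:

```abap
" Fragment for a generated start routine, where SOURCE_PACKAGE already
" exists. lt_work and <wa_work> are illustrative names only.
data: lt_work like source_package.
field-symbols: <wa_work> like line of source_package.

" Process a copy so the business rule can re-order it freely.
lt_work = source_package.
sort lt_work by employee dateto descending.
loop at lt_work assigning <wa_work>.
  " ... business rule goes here ...
endloop.

" Put the processed rows back, then restore the delivery sequence.
source_package = lt_work.
sort source_package by record.
```

The copy carries the RECORD values along with the rest of each row, so the final sort can restore the delivery sequence no matter how the copy was re-ordered.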
The RECORD field is important when the ETL uses DataStores. A DataStore DataProvider supplies records from its change log table in the sequence in which they were activated in that DataProvider DataStore. Did you notice the common code in the above examples?
    sort source_package by record.
Keep in mind that if your code does not change the sequence of the records in the source_package, you do not need to force the sort by the RECORD field.
When there have been multiple activations in the DataProvider DataStore, the records delivered downstream are usually combined into a single request and sent to the DataTarget. This single request may contain multiple occurrences of the same record, one for each change made in the individual activations in the DataProvider. Your start routine and end routine must honour the sequence in which these records are delivered to the DataTarget. If you get them out of order, you will write the wrong record to the DataTarget, because the last record processed for a given full unique key is always the one shown in the DataTarget DataStore active table.
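To make the overwrite effect concrete, here is a small standalone simulation. It is a sketch only: the report name, the row type and the DOC_NO and AMOUNT fields are all hypothetical, and the sorted table merely stands in for the active table of a DataTarget DataStore.

```abap
report z_record_sequence_demo.

" Hypothetical standalone illustration; none of these names come from BW.
types: begin of ty_row,
         record type i,            " delivery sequence, like the RECORD field
         doc_no type c length 10,  " stands in for the full unique key
         amount type i,
       end of ty_row.

data: lt_package type standard table of ty_row,
      ls_row     type ty_row,
      lt_active  type sorted table of ty_row with unique key doc_no.

" Two activations changed the same key: first 100, then 150.
ls_row-record = 1. ls_row-doc_no = 'A1'. ls_row-amount = 100.
append ls_row to lt_package.
ls_row-record = 2. ls_row-doc_no = 'A1'. ls_row-amount = 150.
append ls_row to lt_package.

" A business rule re-orders the package and forgets to restore it.
sort lt_package by amount descending.

" Uncomment the next line to restore the delivery sequence:
" sort lt_package by record.

" Overwrite semantics: the last row processed for a key wins.
loop at lt_package into ls_row.
  modify table lt_active from ls_row.
  if sy-subrc <> 0.
    insert ls_row into table lt_active.
  endif.
endloop.

" Show what ended up in the simulated active table.
loop at lt_active into ls_row.
  write: / ls_row-doc_no, ls_row-amount.
endloop.
```

As written, the row with RECORD 1 is processed last, so the simulated active table ends up holding the stale amount of 100. With the sort by record restored, the row with RECORD 2 is processed last and the amount is 150.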
The scenario of multiple activation requests being merged into a single request (upon extraction from the change log table) normally only occurs when the process chains are designed to load multiple DataProviders into the same DataTarget before subsequently loading all that data further downstream. Stated another way: even though only one request is loaded into the downstream DataTarget, the records in that request are actually made up of multiple activation requests within the DataProvider DataStore.
Given that this is a common data modeling scenario, it becomes clear that you need to add one more check to your code review list before releasing the transport/solution from the development system.
What other ABAP scenarios do I have in my code review list?