SCD2 (date range) – marking the dates
To implement SCD2 by maintaining flags, perform the following steps:
- In the designer, navigate to Tools | Mapping Designer | Mapping | Wizard | Slowly Changing Dimensions, as shown in the following screenshot:
A new window will pop up, asking you the name (
m_SCD2_DATE_RANGE
) of the new SCD mapping. Select Type 2 Dimension - keep a full history of the changes in the target, as we are implementing SCD2, and click on Next. - The next screen will ask you to select the source. Select a source from the drop-down list. We are using
EMP_FILE.txt
as the source file for our reference. Also, specify the name of the target you wish to create. We will name the targetEMPLOYEE_SCD2_DATE_RANGE
in this book for our reference. Then, click on Next. - In the next window, select
EMPLOYEE_ID
as Logical Key Field. Also, addLOCATION
under Fields to compare the changes, and click on Next. - The next screen asks you to choose the option to maintain the history in the target. Select Mark the dimension records with their effective date range and click on Finish.
- The wizard generates a complete mapping in your Mapping Designer Workspace. Make the necessary changes to the mapping if required.
- Change the target data type from the flat file to the
Oracle
table, as shown in the following figure:
When we create a mapping using this option, the wizard create three additional columns in the target table:
PM_PRIMARY_KEY
: The wizard generates the primary key for each row to be inserted into the target. Note thatEMPLOYEE_ID
will not be the primary key in the table.PM_BEGIN_DATE
: The wizard loadsSYSTEMDATE
for eachNewFlag
andChangeFlag
record inserted into the table.PM_END_DATE
: The wizard loadsSYSTEMDATE
for each updated record inserted into the table, indicating the end date of the record.
The Informatica PowerCenter SCD2 mapping uses the LKP_GetData
lookup transformation to look up the data in the target table and uses the EXP_DetectChanges
expression transformation to compare the data with the source data. Based on the comparison, the expression transformation marks a record as NewFlag
or ChangedFlag
. The mapping is divided into three flows:
- The
FIL_InsertNewRecord
filter transformation allows only theNewFlag
record to pass further and filterChangedFlag
from the first flow. It passes theNewFlag
records toUPD_ForceInserts
, which inserts these records into theEMPLOYEE_SCD2_DATE_RANGE
target instance. TheSEQ_GenerateKeys
sequence generator generates the primary key for eachNewFlag
record. TheEXP_KeyProcessing_InsertNew
expression transformation loadsSYSTEMDATE
into thePM_BEGIN_DATE
column and leavesPM_END_DATE
as null. This indicates that the new record has been added from the date loaded inPM_BEGIN_DATE
. - The
FIL_InsertChangedRecord
filter transformation allows only theChangedFlag
record to get passed toUPD_ChangedInserts
, which inserts changed records in theEMPLOYEE_SCD2_DATE_RANGE1
target instance. For everyChangedFlag
record, theEXP_KeyProcessing_InsertChanged
expression transformation loadsSYSTEMDATE
into thePM_BEGIN_DATE
column and leavesPM_END_DATE
as null. This indicates that the changed record has been added, and the changed row now contains the current data. - The
FIL_UpdateChangedRecord
filter transformation passes the primary key of the previous value for everyChangedFlag
record toUPD_ChangedUpdate
, which inserts changed records into the target, which isEMPLOYEE_SCD2_DATE_RANGE2
. TheEXP_CalcToDate
expression transformation loadsSYSTEMDATE
intoPM_END_DATE
to indicate that the row now contains the historical data.
Let's understand each transformation that is used in the SCD2 mapping:
- Source qualifier (
SQ_EMP_FILE
): This extracts the data from the file or table that you used as the source in the mapping. It passes data to the downstream transformations, that is, lookup, expression, and filter transformation. - Lookup (
LKP_GetData
): This is used to look up the target table. It caches the existing data fromEMPLOYEE_SCD2_DATE_RANGE
. TheEMPLOYEE_ID=IN_EMPLOYEE_ID
condition will compare the data with the source table and target table. It passes the data based on the comparison with the expression transformation. - Expression (
EXP_DetectChanges
): This receives the data from the upstream transformation and based on that, it creates two flags, which areNewFlag
andChangedFlag
:- Condition for NewFlag:
IIF(ISNULL(PM_PRIMARYKEY), TRUE, FALSE)
- Condition for ChangedFlag:
IIF(NOT ISNULL(PM_PRIMARYKEY) AND (DECODE(LOCATION,PM_PREV_LOCATION,1,0)=0), TRUE, FALSE)
Based on the condition, it passes the data to downstream filter transformations.
- Condition for NewFlag:
- Filter (
FIL_InsertNewRecord
): This filters the records that come from the upstream expression transformation and are marked asChangedFlag
; it allows records withNewFlag
to get passed to theUPD_ForceInserts
update strategy. - Filter (
FIL_InsertChangedRecord
): This filters the records that come from the upstream expression transformation and are marked asNewFlag
; it allows records withChangedFlag
to get passed to theUPD_ForceInserts
update strategy. - Filter (
FIL_UpdateChangedRecord
): This filters the records that come from the upstream expression transformation and are marked asNewFlag
; it allows records withChangedFlag
to get passed to theUPD_ChangedUpdate
update strategy. For each row marked asChangedFlag
, it passes the primary key of the previous version toUPD_ChangedUpdate
. - Update strategy (
UPD_ForceInserts
): This uses theDD_INSERT
condition to insert data into theEMPLOYEE_SCD2_DATE_RANGE
target instance. - Update strategy (
UPD_ChangedInserts
): This uses theDD_INSERT
condition to insert data into theEMPLOYEE_SCD2_DATE_RANGE1
target instance. - Update strategy (
UPD_ChangedUpdate
): This uses theDD_UPDATE
condition to overwrite the existingLOCATION
value into theEMPLOYEE_SCD2_DATE_RANGE2
target instance. - Sequence generator (
SEQ_GenerateKeys
): This generates a sequence of values for each new row marked asNewFlag
that comes into target, getting incremented by1
. It passes the value toEXP_KeyProcessing_InsertNew
. - Expression (
EXP_KeyProcessing_InsertNew
): This loads the generated value in thePM_PRIMARYKEY
column in theEMPLOYEE_SCD2_DATE_RANGE
target instance. It loadsSYSTEMDATE
into thePM_BEGIN_DATE
column in the target, marking the start of the record. - Expression (
EXP_KeyProcessing_InsertChanged
): This loads the generated value in thePM_PRIMARYKEY
column in theEMPLOYEE_SCD2_DATE_RANGE1
target instance. It loadsSYSTEMDATE
into thePM_BEGIN_DATE
column in the target, marking the start of the record in theEMPLOYEE_SCD2_DATE_RANGE1
target instance. - Expression (
EXP_CalcToDate
): This usesSYSTEMDATE
to update thePM_END_DATE
column in theEMPLOYEE_SCD2_DATE_RANGE2
target instance in an existing record, indicating that the record is not current anymore. - Target (
EMPLOYEE_SCD2_DATE_RANGE
): This is the target table instance that accepts new records in the target table. - Target (
EMPLOYEE_SCD2_DATE_RANGE1
): This is the target table instance that accepts changed records in the target table. - Target (
EMPLOYEE_SCD2_DATE_RANGE2
): This is the target table instance that allows updates to existing records in the target table.