Learning Informatica PowerCenter 9.x
上QQ阅读APP看书,第一时间看更新

SCD2 (date range) – marking the dates

To implement SCD2 by maintaining flags, perform the following steps:

  1. In the designer, navigate to Tools | Mapping Designer | Mapping | Wizard | Slowly Changing Dimensions, as shown in the following screenshot:
    SCD2 (date range) – marking the dates

    A new window will pop up, asking you the name (m_SCD2_DATE_RANGE) of the new SCD mapping. Select Type 2 Dimension - keep a full history of the changes in the target, as we are implementing SCD2, and click on Next.

    SCD2 (date range) – marking the dates
  2. The next screen will ask you to select the source. Select a source from the drop-down list. We are using EMP_FILE.txt as the source file for our reference. Also, specify the name of the target you wish to create. We will name the target EMPLOYEE_SCD2_DATE_RANGE in this book for our reference. Then, click on Next.
    SCD2 (date range) – marking the dates
  3. In the next window, select EMPLOYEE_ID as Logical Key Field. Also, add LOCATION under Fields to compare the changes, and click on Next.
    SCD2 (date range) – marking the dates
  4. The next screen asks you to choose the option to maintain the history in the target. Select Mark the dimension records with their effective date range and click on Finish.
    SCD2 (date range) – marking the dates
  5. The wizard generates a complete mapping in your Mapping Designer Workspace. Make the necessary changes to the mapping if required.
    SCD2 (date range) – marking the dates
  6. Change the target data type from the flat file to the Oracle table, as shown in the following figure:
    SCD2 (date range) – marking the dates

When we create a mapping using this option, the wizard create three additional columns in the target table:

  • PM_PRIMARY_KEY: The wizard generates the primary key for each row to be inserted into the target. Note that EMPLOYEE_ID will not be the primary key in the table.
  • PM_BEGIN_DATE: The wizard loads SYSTEMDATE for each NewFlag and ChangeFlag record inserted into the table.
  • PM_END_DATE: The wizard loads SYSTEMDATE for each updated record inserted into the table, indicating the end date of the record.

The Informatica PowerCenter SCD2 mapping uses the LKP_GetData lookup transformation to look up the data in the target table and uses the EXP_DetectChanges expression transformation to compare the data with the source data. Based on the comparison, the expression transformation marks a record as NewFlag or ChangedFlag. The mapping is divided into three flows:

  • The FIL_InsertNewRecord filter transformation allows only the NewFlag record to pass further and filter ChangedFlag from the first flow. It passes the NewFlag records to UPD_ForceInserts, which inserts these records into the EMPLOYEE_SCD2_DATE_RANGE target instance. The SEQ_GenerateKeys sequence generator generates the primary key for each NewFlag record. The EXP_KeyProcessing_InsertNew expression transformation loads SYSTEMDATE into the PM_BEGIN_DATE column and leaves PM_END_DATE as null. This indicates that the new record has been added from the date loaded in PM_BEGIN_DATE.
  • The FIL_InsertChangedRecord filter transformation allows only the ChangedFlag record to get passed to UPD_ChangedInserts, which inserts changed records in the EMPLOYEE_SCD2_DATE_RANGE1 target instance. For every ChangedFlag record, the EXP_KeyProcessing_InsertChanged expression transformation loads SYSTEMDATE into the PM_BEGIN_DATE column and leaves PM_END_DATE as null. This indicates that the changed record has been added, and the changed row now contains the current data.
  • The FIL_UpdateChangedRecord filter transformation passes the primary key of the previous value for every ChangedFlag record to UPD_ChangedUpdate, which inserts changed records into the target, which is EMPLOYEE_SCD2_DATE_RANGE2. The EXP_CalcToDate expression transformation loads SYSTEMDATE into PM_END_DATE to indicate that the row now contains the historical data.

Let's understand each transformation that is used in the SCD2 mapping:

  • Source qualifier (SQ_EMP_FILE): This extracts the data from the file or table that you used as the source in the mapping. It passes data to the downstream transformations, that is, lookup, expression, and filter transformation.
  • Lookup (LKP_GetData): This is used to look up the target table. It caches the existing data from EMPLOYEE_SCD2_DATE_RANGE. The EMPLOYEE_ID=IN_EMPLOYEE_ID condition will compare the data with the source table and target table. It passes the data based on the comparison with the expression transformation.
  • Expression (EXP_DetectChanges): This receives the data from the upstream transformation and based on that, it creates two flags, which are NewFlag and ChangedFlag:
    • Condition for NewFlag: IIF(ISNULL(PM_PRIMARYKEY), TRUE, FALSE)
    • Condition for ChangedFlag: IIF(NOT ISNULL(PM_PRIMARYKEY) AND (DECODE(LOCATION,PM_PREV_LOCATION,1,0)=0), TRUE, FALSE)

    Based on the condition, it passes the data to downstream filter transformations.

  • Filter (FIL_InsertNewRecord): This filters the records that come from the upstream expression transformation and are marked as ChangedFlag; it allows records with NewFlag to get passed to the UPD_ForceInserts update strategy.
  • Filter (FIL_InsertChangedRecord): This filters the records that come from the upstream expression transformation and are marked as NewFlag; it allows records with ChangedFlag to get passed to the UPD_ForceInserts update strategy.
  • Filter (FIL_UpdateChangedRecord): This filters the records that come from the upstream expression transformation and are marked as NewFlag; it allows records with ChangedFlag to get passed to the UPD_ChangedUpdate update strategy. For each row marked as ChangedFlag, it passes the primary key of the previous version to UPD_ChangedUpdate.
  • Update strategy (UPD_ForceInserts): This uses the DD_INSERT condition to insert data into the EMPLOYEE_SCD2_DATE_RANGE target instance.
  • Update strategy (UPD_ChangedInserts): This uses the DD_INSERT condition to insert data into the EMPLOYEE_SCD2_DATE_RANGE1 target instance.
  • Update strategy (UPD_ChangedUpdate): This uses the DD_UPDATE condition to overwrite the existing LOCATION value into the EMPLOYEE_SCD2_DATE_RANGE2 target instance.
  • Sequence generator (SEQ_GenerateKeys): This generates a sequence of values for each new row marked as NewFlag that comes into target, getting incremented by 1. It passes the value to EXP_KeyProcessing_InsertNew.
  • Expression (EXP_KeyProcessing_InsertNew): This loads the generated value in the PM_PRIMARYKEY column in the EMPLOYEE_SCD2_DATE_RANGE target instance. It loads SYSTEMDATE into the PM_BEGIN_DATE column in the target, marking the start of the record.
  • Expression (EXP_KeyProcessing_InsertChanged): This loads the generated value in the PM_PRIMARYKEY column in the EMPLOYEE_SCD2_DATE_RANGE1 target instance. It loads SYSTEMDATE into the PM_BEGIN_DATE column in the target, marking the start of the record in the EMPLOYEE_SCD2_DATE_RANGE1 target instance.
  • Expression (EXP_CalcToDate): This uses SYSTEMDATE to update the PM_END_DATE column in the EMPLOYEE_SCD2_DATE_RANGE2 target instance in an existing record, indicating that the record is not current anymore.
  • Target (EMPLOYEE_SCD2_DATE_RANGE): This is the target table instance that accepts new records in the target table.
  • Target (EMPLOYEE_SCD2_DATE_RANGE1): This is the target table instance that accepts changed records in the target table.
  • Target (EMPLOYEE_SCD2_DATE_RANGE2): This is the target table instance that allows updates to existing records in the target table.