Wednesday, February 16, 2011

Handling duplicate rows with Flat file as source






Either of the two methods below can be used to remove duplicate rows when the source is a flat file.
1)      Using Aggregator Transformation
2)      Using Rank Transformation
This example uses the file below as the source.


Using Aggregator Transformation

Design a mapping as shown below:


Sorter Transformation

The Sorter sorts the rows by roll, so that rows with the same roll reach the Aggregator together.


Aggregator Transformation

The Aggregator groups the rows by roll, so only one row per roll is sent to the target. When the ports carry no aggregate expression, the Aggregator passes the last row of each group to the target.
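
For readers who want to see the logic outside the Designer, here is a minimal Python sketch of what the Sorter and Aggregator steps do together. It is an illustration only, not Informatica code. The sample rows and the function name dedupe_last_per_group are hypothetical, and rows are assumed to be (roll, name) pairs.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows as (roll, name); not the actual source file.
rows = [
    (2, "Bob"),
    (1, "Alice"),
    (2, "Bob"),
    (3, "Carol"),
    (1, "Alice"),
]

def dedupe_last_per_group(rows):
    """Sort by roll, then keep the last row of each roll group."""
    # Sorter step: order the rows by roll. sorted() is stable, so rows
    # with the same roll keep their original relative order.
    ordered = sorted(rows, key=itemgetter(0))
    # Aggregator step: group by roll and keep only the last row per group.
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(0))]

result = dedupe_last_per_group(rows)
print(result)
```

Each roll appears once in the result. If rows sharing a roll differ in other columns, the row that arrives last within the group is the one kept.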



Output

You will get only one row per roll in the target file as shown below.


Using Rank Transformation

Design a mapping as shown below:


Rank Transformation

Set the properties for the Rank Transformation as shown below. This groups the rows by Roll and ranks the rows within each group by Name. The Top option ranks from the greatest value downward, so setting Number of Ranks to 1 returns only the row with the highest Name in each group.
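
The grouping and ranking behaviour can be sketched in Python as follows. This is a rough illustration, not Informatica code. The sample rows and the function name top_ranked_per_group are hypothetical, and names are compared with Python's default string ordering, which may differ from the sort order configured for a session.

```python
# Hypothetical sample rows as (roll, name); not the actual source file.
rows = [
    (1, "Alice"),
    (2, "Bob"),
    (1, "Alice"),
    (2, "Bobby"),
    (3, "Carol"),
]

def top_ranked_per_group(rows):
    """Group by roll and keep the row with the greatest name (Top, 1 rank)."""
    best = {}
    for roll, name in rows:
        # Keep a name only if it ranks strictly higher than the current best,
        # so an exact duplicate never replaces the row already held.
        if roll not in best or name > best[roll]:
            best[roll] = name
    # Return one (roll, name) pair per roll, ordered by roll for readability.
    return sorted(best.items())

result = top_ranked_per_group(rows)
print(result)
```

Exact duplicates collapse to a single row. Where two rows share a roll but have different names, only the one with the greater name survives.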


Output

You will get only one row per roll in the target as shown below.
 
