Friday, June 24, 2011

Unconnected Lookup Transformation -- Returning Multiple Values

> We can return multiple values from an Unconnected Lookup transformation. We just need to override the default lookup SQL query; in one of the return ports we can concatenate the values using ||.

> The concatenated value is returned from the lookup to the calling expression. In the expression we can split the concatenated string (the multiple values concatenated in the lookup) back into its parts and pass them to the target…

SELECT DEPT.DNAME||' '||DEPT.LOC as OUT_MUL_Values, DEPT.DNAME as DNAME, DEPT.LOC as LOC, DEPT.DEPTNO as DEPTNO FROM DEPT

> Before overriding the query, create the return port in the Lookup transformation (here OUT_MUL_Values)

> Split OUT_MUL_Values in an expression:

O_DNAME = REG_EXTRACT(OUT_MUL_Values,'(\w+)\s+(\w+)',1)
O_LOC   = REG_EXTRACT(OUT_MUL_Values,'(\w+)\s+(\w+)',2)
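The split the two REG_EXTRACT calls perform can be sketched in Python (an illustration, not Informatica syntax). Note that the `(\w+)\s+(\w+)` pattern assumes neither value contains an embedded space, so a LOC such as 'NEW YORK' would need a different delimiter:

```python
import re

def split_concatenated(value):
    # Group 1 -> O_DNAME, group 2 -> O_LOC, mirroring the two
    # REG_EXTRACT(OUT_MUL_Values, '(\w+)\s+(\w+)', n) calls.
    m = re.match(r'(\w+)\s+(\w+)', value)
    return (m.group(1), m.group(2)) if m else (None, None)

print(split_concatenated('RESEARCH DALLAS'))  # ('RESEARCH', 'DALLAS')
```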

> Connect to target...

So here we are getting multiple values from an Unconnected Lookup transformation.


Thursday, June 23, 2011

2nd Workflow needs to run after 1st workflow finishes

Solution - 1

> Create worklet_1 to run all of the sessions that you are running in wkf1, and create another worklet, worklet_2, to run all of the sessions in wkf2.

> Now create a new Workflow and connect
start - > worklet_1 -> worklet_2.

> Edit the connection link between worklet_1 -> worklet_2 and set the condition as
$worklet_1.Status = SUCCEEDED.

---------------------------------------------------------------------------------------------

Solution - 2

> In this case you have to create a flag file at the end of the wkf1 run.. e.g. with a command task that runs
touch xyz.txt ......

> In wkf2, use an Event-Wait task with a file-watch event on that flag file.

> With this solution, if you run wkf2 it will wait for the flag file to appear.. until it does, it
won't run
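Conceptually, the Event-Wait task's file-watch behaviour is a polling loop like the Python sketch below (the actual waiting is done by the task itself; the path, timeout and poll interval here are made-up illustrations):

```python
import os
import time

def wait_for_flag(path, timeout=300.0, poll=5.0):
    # Poll for the flag file that wkf1's touch command creates,
    # the way a file-watch Event-Wait task does; give up after
    # `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return False
```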



-- Ram

Friday, June 17, 2011

If some columns are grouped in an Aggregator and 'Sorted Input' is not checked, what will happen? Does it sort the data first and then group by, or directly group by?

      If your incoming data is already sorted but we forgot to enable the option in the Aggregator transformation, the Integration Service still treats the rows as unsorted and scans from the first record onwards. It sorts the data on the Group By port first and then does the grouping..

     For example, we have a NAMES column which contains names, and we check Group By on NAMES; it then sorts the data first and groups with the corresponding columns.

SID    SNAME
1      Ram
1      Pothina
2      Anil
2      Kumar
3      Doppal

In the above example, if we group by SNAME, it sorts the names and then does the Group By:

2              Anil
3              Doppal
2              Kumar
1              Pothina
1              Ram
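The same constraint exists in Python's itertools.groupby, which, like the Aggregator with 'Sorted Input' checked, only groups correctly when the input is already sorted on the key; a quick sketch of the sort-then-group behaviour:

```python
from itertools import groupby

rows = [(1, 'Ram'), (1, 'Pothina'), (2, 'Anil'),
        (2, 'Kumar'), (3, 'Doppal')]

# Like the Aggregator without 'Sorted Input': sort on the
# group-by key (SNAME) first, then group.
rows_sorted = sorted(rows, key=lambda r: r[1])
grouped = [name for name, _ in groupby(rows_sorted, key=lambda r: r[1])]
print(grouped)  # ['Anil', 'Doppal', 'Kumar', 'Pothina', 'Ram']
```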



-- Ram

Tuesday, June 14, 2011

To load only required rows........

If you want to load only the 3rd, 5th and 10th records..

SELECT * FROM (SELECT ROWNUM AS RNO,EMP.* FROM EMP)
WHERE RNO IN (3,5,10)
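The same selection, sketched in Python with a 1-based position standing in for Oracle's ROWNUM:

```python
def pick_rows(rows, wanted):
    # Keep only the rows whose 1-based position (ROWNUM's role
    # in the SQL above) is in the wanted set.
    return [r for i, r in enumerate(rows, start=1) if i in wanted]

print(pick_rows(list('ABCDEFGHIJ'), {3, 5, 10}))  # ['C', 'E', 'J']
```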
 







-- Ram

Monday, June 13, 2011

Sumup of Current Record and Previous Record

I have a source table which has only one column, a quantity,
and I want the sum of the current and previous rows in the target table.

The problem is something like this...

Source  Target
Field       Field
1              1
2              3
3              5
4              7
5              9
6              11

Solutions
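No solution is filled in here; the usual approach is an Expression transformation with a variable port that carries the previous row's value forward. The row-by-row logic, sketched in Python:

```python
def running_pair_sum(values):
    # Output current + previous for each row; the first row has
    # no previous value, so it is output as-is -- mirroring an
    # Expression transformation whose variable port remembers
    # the previous row.
    out, prev = [], 0
    for v in values:
        out.append(v + prev)
        prev = v
    return out

print(running_pair_sum([1, 2, 3, 4, 5, 6]))  # [1, 3, 5, 7, 9, 11]
```

The output matches the Target column in the example above.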





-- Ram


REG_MATCH() -- Example..

Source is a flat file. The Names column contains non-alphabetic characters at any position, like
  (0-9), (!, @, ... and so on). We need to separate the data.

Names
Anudeep
Rajes4h
Pravee@n
Ra5mesh
Kuldeep
$reddy
Maya_

Target should be loaded as..

Target1          Target2
Anudeep          Rajes4h
Kuldeep          Pravee@n
                 Ra5mesh
                 $reddy
                 Maya_

Solution
Use the Informatica data comparison function REG_MATCH( ) to achieve this..
> Use a Router group condition  REG_MATCH(Names,'^[a-zA-Z]+$')  -- matching names go to Target1, the default group to Target2.
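The routing can be checked in Python with the same pattern (illustration only; the anchored `^[a-zA-Z]+$` of REG_MATCH is what re.fullmatch reproduces):

```python
import re

names = ['Anudeep', 'Rajes4h', 'Pravee@n', 'Ra5mesh',
         'Kuldeep', '$reddy', 'Maya_']

# REG_MATCH(Names, '^[a-zA-Z]+$'): purely alphabetic -> Target1,
# anything with digits, symbols or underscores -> Target2.
target1 = [n for n in names if re.fullmatch(r'[a-zA-Z]+', n)]
target2 = [n for n in names if not re.fullmatch(r'[a-zA-Z]+', n)]
print(target1)  # ['Anudeep', 'Kuldeep']
print(target2)  # ['Rajes4h', 'Pravee@n', 'Ra5mesh', '$reddy', 'Maya_']
```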


-- Ram


Sunday, June 12, 2011

Half Records to one Target Half to another -- If Source is a Flat File ..

If your Source is FLAT FILE..


Half_Half_Loading_Flatfile.jpg
Design the two pipelines as shown above..
First, create a mapping variable $$Rec_Count


Second Pipeline

> The idea is to pick up the row count using an Aggregator transformation with the COUNT() function..... CNT = COUNT(*)
> Using an Expression transformation, that value is assigned to the mapping variable: SETVARIABLE($$Rec_Count, CNT)
> A Filter transformation is used with the condition set to FALSE, because no records need to be loaded into the target from this pipeline.
Here our only aim is to find the row count and assign it to the mapping variable..


First Pipeline


> In the Expression transformation create a variable port Rec_count and increment it by 1 -- Rec_count + 1
This generates a sequence number without using a Sequence Generator transformation.
> Create an output port  O_Rec_count = Rec_count
> Pass all ports to a Router.. create a group with the condition

Rec_Count <= ($$Rec_Count / 2)

Here $$Rec_Count is mapping variable.....
> Connect the respective groups to the targets.. the default group goes to the second target..

NOTE :  In this example the second pipeline should execute first, so set the
Target Load Order : second pipeline first -- first pipeline next
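The Router condition's effect can be sketched in Python, with len(rows) standing in for the $$Rec_Count value the second pipeline computes:

```python
def split_half(rows):
    # Row i (1-based, the Rec_count variable port) goes to the
    # first target when i <= total/2 -- the Router condition
    # Rec_Count <= ($$Rec_Count / 2); the default group gets the rest.
    total = len(rows)
    first = [r for i, r in enumerate(rows, 1) if i <= total / 2]
    second = [r for i, r in enumerate(rows, 1) if i > total / 2]
    return first, second

print(split_half([10, 20, 30, 40, 50]))  # ([10, 20], [30, 40, 50])
```

With an odd row count the extra row falls to the default group, as the integer comparison above shows.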

Incremental Reading - Only one row at one session run..

How to load the first record on the first run, the second record on the 2nd run, the 3rd record on the 3rd run... into the target table?

I understood your requirement as: when you run the session the first time it has to load only the 1st record; on the second run it has to load only the 2nd record, and so on... right? If this is your requirement, you can achieve it using the mapping-variable concept.... Assuming the EMP schema as my source..

> Design the mapping as below..
incremental_load.jpg
> Create and connect the NEXTVAL port to the Expression transformation, and set the Sequence Generator properties: check Cycle (be clear with the properties)
> Create a mapping variable $$Var.
> In the Expression transformation create a variable port with the expression  --  var_assign = SETVARIABLE($$Var, $$Var + 1)
> In the Filter transformation write the condition :   seq_no = $$Var
> Connect to the target..
  


   For the first run, $$Var holds its initial value 1 and the first record is loaded, and $$Var is incremented by 1 in the Expression transformation. On the second run $$Var becomes 2 and only the 2nd record is loaded into the target.. and so on..


-- Ram

Half records to one target another Half to other..

--  In order to achieve this we need to design two pipelines..


-- In the first pipeline override the SQL Query as
  SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.MGR, EMP.HIREDATE, EMP.SAL, EMP.COMM, EMP.DEPTNO
  FROM EMP
  WHERE ROWNUM <= (SELECT COUNT(*)/2 FROM EMP)


-- In the Second pipeline override the SQL Query as
  
SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.MGR, EMP.HIREDATE, EMP.SAL, EMP.COMM, EMP.DEPTNO
FROM
 EMP
MINUS
SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.MGR, EMP.HIREDATE, EMP.SAL, EMP.COMM, EMP.DEPTNO
FROM
 EMP WHERE ROWNUM <=(SELECT COUNT(*)/2 FROM EMP)


 -- Ram






I want to skip the first 5 rows while loading into the target?



> If your source is a flat file and you need to skip the first 5 rows, simply set the property while importing the flat-file definition (generally row 1 goes as the field names), so:
   Start Import Rows at : 6

>  If your source is an RDBMS, override the default SQL query as..

  SELECT * FROM EMP
  MINUS
  SELECT * FROM EMP WHERE rownum <=5

  so that records 6 to n are extracted by the Source Qualifier..


> Another way..
   generate sequence numbers using a Sequence Generator transformation,
   and in the Filter transformation put the condition : seq_num > 5,
   so that only records 6 to n are loaded into the target.
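All three variants implement the same "drop the first 5" logic, which in Python is just:

```python
from itertools import islice

def skip_first(rows, n=5):
    # Equivalent of the filter condition seq_num > 5: drop the
    # first n rows and keep the rest.
    return list(islice(rows, n, None))

print(skip_first(range(1, 11)))  # [6, 7, 8, 9, 10]
```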

 

Thursday, June 9, 2011

While performing incremental reading using a mapping variable with SETVARIABLE, how can I increment a date value? Can you explain in detail which function I have to use?

> Generally, incremental reading is done using date comparisons:
src_date_field > $$Mapping_var_date  ... etc., to load only the recent records..


> Our sample scenario is ... we need to extract the records ' whose update/insert date is 3 days before today's date '..
So I have set the mapping variable as...


       SETVARIABLE($$Date_To_Extract, ADD_TO_DATE(SYSDATE,'DD',-3))


- Here $$Date_To_Extract is the mapping variable...
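ADD_TO_DATE(SYSDATE, 'DD', -3) is plain date arithmetic; the Python equivalent:

```python
from datetime import date, timedelta

# Same arithmetic as ADD_TO_DATE(SYSDATE, 'DD', -3): the cutoff
# stored in $$Date_To_Extract is three days before today.
date_to_extract = date.today() - timedelta(days=3)

# Rows with src_date_field > date_to_extract are then extracted.
print(date_to_extract)
```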


-- Ram

Wednesday, June 8, 2011

Even Records - Odd Records Loading..

How to load odd records into one target and even records into another, if my source is an RDBMS? And for flat files?


If your Source is RDBMS

Design two pipelines with different targets as you wish.. the order of ports should be the same..

In SQ transformation Override the query as...
In first pipeline SQ
-- SELECT * FROM EMP WHERE (ROWID,0) IN (SELECT ROWID,MOD(ROWNUM,2) FROM EMP)
In second pipeline SQ
-- SELECT * FROM EMP WHERE (ROWID,1) IN (SELECT ROWID,MOD(ROWNUM,2) FROM EMP)

If your Source is a Flat file..

Generate sequence numbers by deriving a new port in an Expression transformation (or using a Sequence Generator transformation).
Create a Router with one group:
> Group1   --   mod(seq_no,2) != 0  -- connect to the Odd-records table
> Default group  -- connect to the Even-records table...
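The flat-file routing comes down to the parity of the sequence number, as a quick Python check shows:

```python
rows = ['r1', 'r2', 'r3', 'r4', 'r5']

# Group1: mod(seq_no, 2) != 0 -> odd-positioned rows;
# the default group gets the even-positioned rows.
odd  = [r for i, r in enumerate(rows, 1) if i % 2 != 0]
even = [r for i, r in enumerate(rows, 1) if i % 2 == 0]
print(odd)   # ['r1', 'r3', 'r5']
print(even)  # ['r2', 'r4']
```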


- Ram

Tuesday, June 7, 2011

Dynamically Changing Flatfile Name

How to change the target file name at session run time


If your Informatica version is 8.x or later, you can go ahead as Swetha suggested..
> Add a 'FileName' port to your target definition as a new port.
> and set the date format setting for that port in Properties..
> Derive one output port in the Expression transformation with the correct datatype and precision
> O_FileName :  'Tgt_File_'||TO_CHAR(SESSSTARTTIME,'DD-MM-YYYY_HH24-MI')||'.csv'
Here, instead of SYSDATE it is better to use SESSSTARTTIME (and note that '/' and ':' are not valid in file names, hence the '-' separators).

Another way is....

> In the workflow, create a workflow variable $$Se_End_Time
> In the flow, create a Session task and an Assignment task, and connect them
> In the Assignment task / Expressions:
$$Se_End_Time   =  TO_CHAR($Session_Name.EndTime,'DD-MM-YYYY')

> Create a Parameter file
[Folder_Name.WF:wkf_Name.ST:session_Name]
$FileName=Tgt_File_
$DatePart=$$Se_End_Time


In the above file ..
> $$Se_End_Time is the Workflow variable....

In the Session Properties...
> Output Filename :  $FileName$DatePart.csv
> and set the parameter file path ...
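Either way, the file name is just the session-start time formatted into a string; in Python (with a hypothetical start time standing in for SESSSTARTTIME, and filesystem-safe separators):

```python
from datetime import datetime

# Hypothetical session start; SESSSTARTTIME plays this role in
# the Expression transformation.
sess_start = datetime(2011, 6, 7, 14, 30)

# 'Tgt_File_' || TO_CHAR(..., 'DD-MM-YYYY_HH24-MI') || '.csv'
file_name = 'Tgt_File_' + sess_start.strftime('%d-%m-%Y_%H-%M') + '.csv'
print(file_name)  # Tgt_File_07-06-2011_14-30.csv
```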


-- Ram

Friday, May 27, 2011

Working With Decision Task


How can I execute part of a daily workflow only on Sundays using a decision task?

> Put a Decision task just before the main session which needs to run only on Sunday.
> In the Decision task, give the condition as TO_CHAR(SYSDATE,'DAY') = 'SUNDAY'.
> In the link to the downstream session, give the condition
$Decision_Task.Condition = TRUE AND $Decision_Task.Status = SUCCEEDED.

Wednesday, March 16, 2011

Java Transformation Example

Converting one row into multiple rows... using a Java transformation..
Consider my source is a flat file and the data is like
ORDER_NO,ORDER_NAME,CHAIN
101,SOAP,4
102,CREAM,3
103,SHAMPOO,2
104,OIL,1

Now I want my target as..
 Target
ORDER_NO,ORDER_NAME,CHAIN
101 SOAP 4
101 SOAP 4
101 SOAP 4
101 SOAP 4
102 CREAM 3
102 CREAM 3
102 CREAM 3
103 SHAMPOO 2
103 SHAMPOO 2
104 OIL 1



I am going to achieve this using a Java transformation
> Create the Java transformation and create the input group and output group with ports as shown in the diagram.
> Connect the ports from the SQ to an Expression transformation (not strictly needed) and then to the input group of the Java transformation.

> Open the Java transformation properties, choose the Java Code tab, and write the code as below..


> Compile the Java code and fix any errors.. once the code compiles successfully, create the session and workflow, run the session, and observe the output..
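The Java code itself is not shown here (it was in a screenshot), but the logic it implements, emitting each input row CHAIN times via generateRow(), can be sketched in Python:

```python
def expand_rows(rows):
    # Emit each (ORDER_NO, ORDER_NAME, CHAIN) row CHAIN times,
    # the way the Java transformation calls generateRow() in a loop.
    out = []
    for order_no, order_name, chain in rows:
        out.extend([(order_no, order_name, chain)] * chain)
    return out

src = [(101, 'SOAP', 4), (102, 'CREAM', 3),
       (103, 'SHAMPOO', 2), (104, 'OIL', 1)]
print(len(expand_rows(src)))  # 10 output rows: 4 + 3 + 2 + 1
```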


-- Ram


Monday, February 21, 2011

 


Database Indexing for the Lookup SQL - A myth

I have seen people suggesting an index to improve the performance of any SQL. This suggestion is incorrect - many times. Especially when talking about indexing the condition-port columns of a Lookup SQL, it is far more "incorrect".

Before explaining why it is incorrect, I will try to detail the functionality of the Lookup. To explain the stuff with an example, we take the usual HR-schema EMP table. I have EMPNO, ENAME, SALARY as columns in the EMP table.

Let us say there is a lookup in an ETL mapping that checks for a particular EMPNO and returns ENAME and SALARY from the Lookup. Now, the output ports for the Lookup are "ENAME" and "SALARY"; the condition port is "EMPNO". Imagine that you are facing performance problems with this Lookup and one of the suggestions was to index the condition port.

As suggested (incorrectly) you create an index on EMPNO column in the underlying database table. Practically, the SQL the lookup executes is going to be this:

SELECT EMPNO, ENAME, SALARY
FROM EMP
ORDER BY EMPNO, ENAME, SALARY;

The data resulted from this query is stored in the Lookup cache and then, each record from the source is looked up against this cache. So, the checking against the condition port column is done in the Informatica Lookup cache and "not in the database". So any index created in the database has no effect for this.

You may be wondering if we can replicate the same indexing in the Lookup cache. You don't have to worry about it: PowerCenter creates an "index" cache and a "data" cache for the Lookup. In this case the condition-port data - "EMPNO" - is indexed and hashed in the "index" cache, and the rest, along with EMPNO, is found in the "data" cache.

I hope now you understand why indexing condition port columns doesn't increase performance.

Having said that, I want to take you to a different kind of lookup, where you would have disabled the caching. In this kind of Lookup there is no cache. Every time a row is sent into the lookup, the SQL is executed against the database. In this scenario the database index "may" work. But if the performance of the lookup is a problem, then the "cache-less" lookup itself may be the problem.

I would go for a cache-less lookup only if my source record count is less than the number of records in my lookup table. In this case ONLY, indexing the condition ports will work. Everywhere else, it is a mere chance of luck that makes the database pick up the index.

I'm sure you will have many questions regarding Lookups after reading this blog article. I'm ready to answer, fire away.
-- By RK...


Designing the Mapping/Workflow Efficiently
(Performance-wise....)
  1. I would always suggest you think twice before using an Update Strategy, though it adds a certain level of flexibility to the mapping. If you have a straight-through mapping which takes data from the source and directly inserts all the records into the target, you wouldn’t need an Update Strategy.
  2. Use a pre-SQL delete statement if you wish to delete specific rows from the target before loading. Use the truncate option in the session properties if you wish to clean the table before loading. I would avoid a separate pipeline in the mapping that runs before the load with an Update Strategy transformation.
  3. You have 3 sources and 3 targets with one-on-one mapping. If the load is independent according to business requirement, I would create 3 different mappings and 3 different session instances and they all run in parallel in my workflow after my “Start” task. I’ve observed that the workflow runtime comes down between 30-60% of serial processing.
  4. PowerCenter is built to work on high volumes of data, so let the server be completely busy. Induce parallelism as far as possible into the mapping/workflow.
  5. Needless to say, if any transformation waits for the complete source data to arrive before it can proceed further in the mapping, use a Sorter transformation to speed up the process. For example, an Aggregator transformation or a Joiner transformation will perform well if sorted data is given to them, and a Sorter transformation at the start makes them efficient.
  6. Push as much filtering as possible into the SQL override of the source-qualifier/lookup to make the database handle most of the filtering as databases are experts in doing that stuff than using a filter transformation.
  7. As I always say, the above task is similar to manual push-down optimization.
Try to push as much processing as possible into the database source or target based on the necessity. Few instances where we can do so.
Using analytics
Instead of using an Aggregator transformation, or a combination of sorter/expression/filter on the occasions when they are demanded, we can make use of the analytic functions of the database. For example: while reading your source data from the table, you may need to identify the last record from the source. For this requirement, the first option that comes to mind is something like this:
- One source pipeline picks up the data records and a second pipeline counts the number of source records; you then equate that number to the current record number. When they are equal, it implies that the current record is the last record from the source.
- People resort to a COUNT(*) and GROUP BY in SQL, or an aggregator/sorter/expression/filter combination, for the same. Instead of this, the analytic functions provide you greater flexibility. The same task can be achieved with the SQL below.

SYNTAX
SELECT col1,col2,col2,LEAD(col1) OVER (ORDER BY input_sort_criteria) next_record
FROM table_name
Ex:
SELECT EMPNO,ENAME,DEPTNO, LEAD(ENAME) OVER (ORDER BY EMPNO) NEXT_RECORD FROM EMP
For the last record with the given "input_sort_criteria", the value will be NULL.
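What LEAD gives you, sketched in Python over rows already in input-sort order; a None (NULL) next-value marks the last record:

```python
def with_lead(rows):
    # Pair each row with the next row's value, like
    # LEAD(col) OVER (ORDER BY ...); the last row gets None (NULL).
    return [(r, rows[i + 1] if i + 1 < len(rows) else None)
            for i, r in enumerate(rows)]

print(with_lead(['A', 'B', 'C']))  # [('A', 'B'), ('B', 'C'), ('C', None)]
```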

Avoid Stored Procedures
I have seen many mappings with unnecessary usage of stored procedures. Stored procedures "are" performance killers. If run for every record in the source, stored procedures consume much of the workflow run-time: a database connection has to be established each time for a record, and then the execution takes place in the database. As you can see, this behaviour leads to the worst-performing interfaces.
A general guideline is that when you can do it in PowerCenter using transformations, do not use stored procedures.
Conclusion:
        The two sections in this article may seem to conflict with each other. In the first section I urged you to push as much processing as possible to the database, and in the second section I suggested you avoid the database.
         When you connect to the database using a Lookup, a Source Qualifier, or pre/post-SQL procedures, you are connecting to the database only ONCE for all the source records. When you connect using a Stored Procedure transformation that is called for each record, you are connecting to the database as many times as there are records in the source.
Ex: When you have a vehicle (a loading/unloading truck) at your disposal, would you take 100 bricks from one place to the other at a time, or each brick at a time?