Category Archives: Informatica

Partition, Index in SQL


Partition:

The most widely used partitioning scheme for improving performance is range partitioning. The data in a table is physically partitioned based on the column we choose, and the ranges can be daily, weekly, monthly or quarterly depending on the volume of data being processed. With quarterly ranges, for example, the data is split into four partitions per year.

CREATE TABLE t1(c1 INT) IN tbsp1, tbsp2, tbsp3

PARTITION BY RANGE (c1)

(STARTING (1) ENDING (99) EVERY (33))
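For date-driven loads, the same idea applies with a date column. The lines below are a minimal sketch in the same DB2-style syntax, assuming a hypothetical sales table, showing one partition per quarter:

CREATE TABLE sales (sale_date DATE, amount DECIMAL(10,2))
PARTITION BY RANGE (sale_date)
(STARTING '2013-01-01' ENDING '2013-12-31' EVERY 3 MONTHS)
-- four range partitions, one per quarter; each row is routed to a partition by its sale_date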

Index:

An index is primarily used to get better query performance for searches. We can create an index on a field or a set of fields, much like a primary key index, but it may also cause performance degradation. We can create either a clustered or a non-clustered index. Creating a clustered index can degrade performance: it sorts the data in key order, and keeping the table sorted every time results in longer execution times.
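For example, in the same DB2-style syntax (the table, columns and index names here are placeholders):

-- non-clustered index on a lookup column
CREATE INDEX idx_emp_dept ON emp (dept_id);
-- clustered index: keeps the data pages in key order, which is heavier to maintain
CREATE INDEX idx_emp_hire ON emp (hire_date) CLUSTER;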

Best practice:

Create the partition and the index on the same set of columns, like below:

……partition by range (Region) create index <idx_name>(Region)
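Spelled out in full, that guidance might look like the sketch below. SALES, its columns and the index name are assumptions; the point is simply that the range-partitioning column and the indexed column are the same (Region):

CREATE TABLE sales (region INT NOT NULL, amount DECIMAL(10,2))
PARTITION BY RANGE (region)
(STARTING (1) ENDING (4) EVERY (1));
-- index on the same column the table is partitioned by
CREATE INDEX idx_sales_region ON sales (region);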

Pushing data to Hyperion Planning Applications from Informatica 9.1


With the end-of-life (EOL) notice from Informatica for its 8.x versions, migrating to a 9.x version is mandatory to get continued support from Informatica.

Hyperion Planning application

On analyzing our old 8.1 environment, we found that 9.1 does not support the Hyperion Planning application connection, but we still need to push data to the Planning application. So we replace the existing Planning adapter targets in the mappings with a delimited flat file. This flat file then acts as the source for the data push to Hyperion Planning.

We can push data to the Planning application by two methods:

  1. Java code which makes use of Hyperion planning library files.
  2. Through ODI (Oracle Data Integrator)

Method: 1 (Using Java Code)

It involves two steps,

  1. Java code

We first need to develop a Java program that makes use of the HSP.jar library file of the Hyperion Planning application. The program flow includes locking a dimension before doing the data push, checking for UDAs, and finally saving the dimension member using the saveMember() function.

  2. Batch file

This batch file should call the above Java program, and the parameters are the Planning application name, server, RMI port number, user name, password and delimiter.

In the post-success command of a session, we call this batch file along with the parameters. The problem was that it took longer to complete, and it was also noted that throughput was very low while saving the records in the Planning application using saveMember().

Method: 2 (Using ODI)

The reason we went on to implement ODI is that no new licenses are needed if we are already using the Hyperion suite of applications.

Steps in ODI Designer:

  1. Install and configure ODI on the same server where Informatica is installed.
  2. Reverse engineer the Hyperion Planning knowledge module.
  3. Create planning models (targets) and file models.
  4. Create interfaces for all the UDAs, attributes and dimensions, and specify log file names.
  5. Create packages for each dimension, say Customer; the flow should run its attribute interfaces first, then UDA interfaces if applicable, and finally the Customer interface.
  6. Create scenarios (the scenarios will be called from a batch file) and publish the package.

Steps in Topology Manager:

We need to create physical schemas and a context for the different Planning environments. Generally there are two Planning applications, viz. budgeting and forecasting, for each of the Planning DEV, UACC and PROD environments.

Scenarios can be started from the command prompt, and the syntax is:

startscen <<scenario name>> <<version no>> <<context name>> <<trace level>>

A batch file needs to be written to call the scenarios from the command task:

set EXECPATH=D:\Oracle\ODI\oracledi\bin

cd /d "%EXECPATH%"

call startscen.bat %SCENARIONAME% %VERSION% %CONTEXT% %LOGLEVEL% %TRACELEVEL%

In the Informatica Workflow Designer, a worklet variable needs to be created to specify the context name at run time, and its value is passed in the parameter file. Call the batch file created specifically for this purpose in the post-session command task.

Conclusion

On comparing both methods, we found that with ODI the time taken for the data load was consistent and much closer to the time taken by the Hyperion Planning adapter in Informatica 8.1 than the time taken by the Java code approach.

Connecting Informatica and Social Media (Facebook)


Informatica PowerExchange is a data access tool used to connect to and access enterprise-wide data irrespective of the source type: SQL Server, Oracle, Netezza, Sybase, Salesforce, Teradata, Greenplum, all kinds of mainframe data, SAP data, etc. PowerExchange can similarly be used to access Hadoop data.

The most interesting part is using Informatica to access social media data on Facebook, LinkedIn and Twitter, which nowadays is some of the most current real-time data available. Using Informatica, we can collect data from any of these social media platforms and provide it to sales and marketing teams, who can leverage it to meet their sales and revenue targets.

Consider the scenario where an FMCG or mobile company launches a new product: social media is immediately flooded with people's comments and opinions, which decide whether the product is a success or a flop in the market. Based on the outcome, the product can be tweaked and released as, say, version 1.1, and management can plan and budget their revenue forecasts and profits. The power of PowerExchange can be used along with PowerCenter to reap the maximum benefit.

Implementing Informatica Partitions


Implementing Informatica Partitions.

Connected and unconnected lookups in Informatica


I was caught off guard when I was asked about the scenarios where connected and unconnected lookups are used in Informatica. Below are some of those scenarios and how to use connected and unconnected lookups.

Generally a lookup is used to fetch data from another relational table or flat file that is not available in the actual source. For example, consider the case where we have a DeptId column in the EMP table, which is used as the source, but we need DeptName, which is present in the DEPT table. In such cases we can use a Lookup transformation on the DEPT table, and the condition for retrieval will be based on the key column, which is DeptId.
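Conceptually, the lookup answers the query below for each source row. EMP, DEPT and the column names are just the illustrative ones from the example, and with a cached lookup Informatica actually reads DEPT into the cache once and matches rows in memory rather than querying per row:

SELECT DeptName
FROM DEPT
WHERE DeptId = ?   -- the DeptId of the current EMP source row is the lookup condition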

A connected lookup works exactly as described above. The problem is that for each and every row in the source it looks up the value from a potentially huge cache, bringing down performance when the volume of data to be processed is large.

An unconnected lookup is used where we need to look up only, say, 1000 rows out of millions of rows, i.e. we know we need to populate DeptName and only 1000 rows will meet the functional condition. There are also other cases, such as when we need to look up a value only if the Salary is above 10000 and skip the lookup for other Salary values.

We can use an unconnected lookup by calling it from an Expression transformation with a parameter that is passed to the input port of the Lookup transformation. An unconnected lookup always returns a single value, so it behaves much like a function call.

Informatica-Variable Ports


Variable ports can be used in places where you need data from previous rows. But the order in which we create the variable ports is very important. Generally the order is

1. Input Ports

2. Variable Ports

3. Output Ports

But the order can change according to the output we need.

Here is a small example of the usage of a variable port.

Suppose we need to generate a running number for the students within each Dept in Informatica. We can use variable ports to generate these numbers: one variable port holds the previous row's Dept, and a counter variable port is incremented while the Dept stays the same and reset to 1 when a new Dept starts.

The screenshot below shows the mapping.

The properties of the Expression transformation are shown in the screenshot below.
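For readers who cannot see the screenshots, the result the variable ports build up row by row is roughly what the SQL below produces. STUDENT and its columns are hypothetical, and the analytic function is only an analogy for the counter held in the variable port:

SELECT Dept,
       StudentName,
       ROW_NUMBER() OVER (PARTITION BY Dept ORDER BY StudentName) AS StudentNo
FROM STUDENT
-- StudentNo restarts at 1 for every Dept, just as the variable-port counter resets
-- when the incoming (sorted) rows move to a new department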

Cache memories in Informatica


Most people know that there are three different types of caches in Informatica, viz.

1. Static Cache

2. Dynamic Cache

3. Persistent Cache

Let me explain how exactly these caches work and when they should be used.

Static Cache

This has a session lifetime; once the session is complete, the cache is deleted. It brings all the records from the lookup table into the cache and is the default cache for lookups. Informatica takes time to build the cache, which affects performance. It should be used only when the lookup tables in the mapping aren't used anywhere else in the workflow.

Dynamic Cache

It should be used when the databases are large. The principle behind it is that records are brought into the cache from the database one by one. If a record is already present (identified using the key columns), that particular record is not brought into the cache again. It helps when the database has a lot of redundant records.

Persistent Cache

The lifetime of this kind of cache is the entire workflow. When a particular table is used in many sessions across the workflow, it can be made a persistent cache in the Lookup transformation properties. This improves performance considerably.

Performance Tuning in Informatica


We use Informatica for data integration, and the choice is primarily due to its efficiency. But sometimes we aggravate the situation through wrong design decisions.

Below are some of the things I came across in my project that improve efficiency and lead to well-performing mappings.

1. Remove unused ports in the Source Qualifier.

2. If you use an Aggregator transformation, enable the Sorted Input option.

There are two ways to feed it sorted data:

2.1. Keep a Sorter transformation before the Aggregator and sort on the key columns alone.

2.2. Avoid the Sorter by using an ORDER BY clause in the SQL override of the Source Qualifier (see the sketch after this list).

3. Enable persistent cache for the lookup tables used in Lookup transformations, if the same tables are being used in upcoming mappings.

4. Use parameter files for relational connections in the sources and targets by specifying parameter variables like $Source and $Target. Specify the path of the files on the UNIX server in the workflow properties. This eliminates the need to set connections manually, which becomes tedious when you move mappings and sessions across environments like DEV, QA and PROD.

5. Restore the tracing level to Normal in the transformations after testing, so that unnecessary data is not collected in the repository.

6. Enable version control so that changes are tracked. On UNIX servers, create UNIX groups so that only users of a particular group have access.
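
To illustrate point 2.2, the SQL override of the Source Qualifier can sort on the Aggregator's group-by keys so that the Sorted Input option works without a separate Sorter transformation. EMP and its columns here are hypothetical:

SELECT DeptId, Salary
FROM EMP
ORDER BY DeptId   -- sorted on the group-by key of the downstream Aggregator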

Recovery Strategies in Informatica


While executing a workflow, we are happy if it runs successfully without any of its sessions or other tasks failing. But there are cases where sessions fail with a fatal error, which can happen due to a problem in creating logs by the Integration Service.
The entire effort goes to waste if the target tables are used in more than one workflow. The worst case arises when tables need to be truncated and loaded again, since there may be weekly, biweekly or monthly runs, depending mostly on how the reports are used by senior management.
To avoid this painful situation, Informatica provides plenty of options that can be used according to the different scenarios.

Workflow Level:
In the workflow properties, set Enable HA Recovery, and don't forget to set the number of times a session should be restarted before the workflow is failed.
The above option helps in case of session failures due to connection interruptions or fatal errors.

Session Level:
Select any of the three options:
1. Restart task
2. Fail task and continue workflow
3. Resume from last checkpoint

In case of dependencies on targets, set options like fail parent if this task fails.