In the previous postings in my Oracle BI Apps series, I’ve looked at the BI Apps architecture, installation and configuration, and performing the initial data load. As one of the key benefits of the BI Apps is its extensibility, one of the first things I’d want to know if I was a customer is exactly how you go about extending it. Oracle have a published methodology for adding new data into the packaged data warehouse both in terms of new rows, new columns and even new data sources, but as this is all based around the Informatica ETL tool, that Siebel originally did an OEM deal with and Oracle now support at least until they can get ODI up to speed, the first thing you’re really going to need to know is how to work with Informatica. The good news is that if you’re familiar with Oracle Warehouse Builder the two tools are actually fairly similar, at least in terms of the developer GUI, and so if you’re comfortable with one, as I am, you’ll quickly pick up the other.
So what I thought I’d do then, before diving into the methodology used for extending the data warehouse, is run through an end-to-end example of building some Informatica Mappings, creating a workflow and using it to load some data into a set of Oracle tables. The data and scenario I’m going to use is actually from our OWB class that we teach, where at one point we load data from a set of normalized tables and file into a set of staging tables – this will allow me to compare building the scenario in OWB and then in Informatica.
Informatica PowerCenter comes in both server and client versions, with all of the design work being carried out in the client. To start off the project then, I connect to the PowerCenter Repository Manager and create a new folder in my repository, which performs a similar function to a project in OWB.
You can see in the repository all of the folders containing mappings to take data out of the various Oracle BI Apps source systems – Oracle E-Business Suite, Peoplesoft, SAP, Siebel – plus any custom ones you add. It’s worth noting at this point that Informatica PowerCenter, as provided by the BI Apps, is licensed only for loading the BI Apps data warehouse, you can’t use it for anything else, so what I’m doing here is only just for learning, you can’t adopt PowerCenter as your general ETL tool unless you go out an license it for general use.
Once you create your folder, you then switch over within the same application to the “Source Analyzer”. This is a feature of the Repository Manager, like the Oracle Module wizards in OWB, that connects to a set of relational tables and imports the metadata into the repository. This seems to work off of ODBC rather than native Oracle connections, however when mappings run later on they can be configured to use OCI in the same way that OBIEE lets you. Suffice to say that connecting to non-Oracle sources is very easy, much easier than OWB or even ODI where you have to locate, and then fiddle around with specific JDBC drivers before you can work with these sources in a project.
Once you’ve brought your sources in it’s a similar process to bring in your target tables, again this is done via ODBC with the ability to switch to OCI during the mapping phase.
Flat files are also pretty straightforward to bring in, with an Excel-style import wizard that performs much the same function as the OWB one.
Of course it’s at this stage that you start to notice a few differences between Informatica and OWB. With OWB, the wizard that steps you through file importation is building an SQL*Loader control file, whereas of course the Informatica equivalent is setting up some metadata that drives an internal Informatica file reading process, there’s no SQL*Loader being used. Also, in OWB you often take these file definitions and turn them into external tables whereas there’s no real concept of this in Informatica. The mapping process is even more different, but I’ll get on to this in a moment.
Once you’ve imported in your sources and targets, you can review the list of tables in the Repository Manager.
Now that we’ve got some data in place, it’s time to do the first mapping. In this one, I’m taking data from three product source tables, joining it and loading it into a staging table. In OWB, the mapping would look like this:
The equivalent in Informatica, using the Informatica Designer application, would look like this:
A few points to note. Firstly, it’s actually very similar to OWB; each operator has input and output mappings, you drag and drop sources and targets onto the palette and then connect them using lines and transformation operators. On first glance, OWB has a richer set of transformations out of the box (there’s no equivalent of name and address matching, match-merge and so on) whilst Informatica has better out of the box support for alternative data sources and targets such as XML, CDC, HTTP and so on. Data sources in Informatica mappings are accessed through a “Source Qualifier” object that lets you filter the data before it goes into the mapping (removing the need for filter operators just after objects), and interestingly joins only seem to support two inputs whereas of course in OWB you can add several.
When I started thinking about joins it got me thinking about how, under the covers, Informatica is doing the data integration and loading. With OWB, the database does all the integration using SQL selects, inserts, joins, table functions, merge commands and so on, whereas in Informatica you’ve got a separate hub engine that does the work. Now in some ways that seems to offer advantages – you can just drag and drop any data sources and targets together and the Informatica Integration Service just gets on with working out the optimal, row-by-row way to get data from A to B. With OWB, you’ve got to be careful about the SQL that you’re getting OWB to generate which offers advantages and disadvantages – in the hands of an expert you can built some pretty fast, efficient integration, but it’s fairly easy to get OWB to generate some pretty hairy SQL if you’ve got multiple extract and load stages in a single mapping. I suspect with Informatica the environment is more controlled, more predictable but looses some of the capability to produce super-fast ETL routines that you could potentially get with OWB. I’ll have to see how things transpire once we start using the tool on more projects.
So once you’ve pulled all your mappings together it’s time to create a workflow, which in OWB terms is a process flow. Creating a workflow is pretty similar to performing the same task in OWB, you start off by adding tasks to the workflow process, connect them to the relevant data sources (this is where you can switch to an OCI connection) and string them all together.
Then it’s just a case of pressing the “Go” button and checking that the workflow has executed correctly. In OWB you’d use the Control Center Manager to do this, in Informatica you’d use the Workflow Monitor, in my case most of the processes have run but one of them, that uses a flat file, has failed.
Taking a look at the error log, I can see that any flat files obviously need to be in the /infa_shared/src_files directory.
After copying the file to the correct location the process then executed correctly.
So, that’s a simple set of mappings taken place. Of course there’s a lot more to Informatica than just this, but I think now we’re ready to use the tool in earnest to start customizing the BI Apps data warehouse. More on that in a couple of days.
SL
While I found OWB easier to pick up as a tool, Informatica has a lot of functinality which makes it more flexible. Yes, they are both similar in many ways too which is why I found it easy to switch between them.