Oracle Analytics: January 2009

As you would expect, the debate still goes on as to what is happening on the OBIEE ETL front.

As all you ex-Siebel people will know Informatica is the system that every uses for their databases. It was licensed by Siebel to run their CRM ETL Packaged Applications. Then when Oracle bought out Siebel they stated that Informatica was still the supported platform, even though they had their own tool, OWB. Now, OWB appears to be a great tool, but Oracle did not want to re-write all the Application (Mappings, Workflows, DAC, Documentation) and force their customers onto new software.

However, they then bought an ETL product company and launched another ETL (or ELT) tool called ODI - Oracle Data Integrator. The immediate speculation was that this would replace the Informatica tool in the Applications development. At the conferences I went to last year Oracle were still saying that Informatica was the main tool, but hinted at the the possible use of ODI in the future. In this document from 2007 they only mentioned OWB in the BI suite! and This Oracle Fusion roadmap does not state which tool is king.

Now there is a rumour that Oracle are beginning to use ODI for some applications. Mark Silialks (Oracle employee) pointed out in a debate on LinkedIn that the Oracle stated direction for Financial Analytics is the use of ODI - I haven't verfied this yet, but I think Greg Hutchinson summed it up nicely when he said that "Using ODI for Financial Analytics busts one of the key design advantages....conforming dimensions across all 42 OBI EE modules. So imagine a company that buys an OBI EE SLA now. Does this mean they must support both ODI for one part and Informatica for the rest? What about upgrades? or buying additional modules? "

It is clear from customers that they like Informatica because it is skillset that is already known in the business and can be available from the market (contrators, consultancies, employees).
They have also invested a great deal of time customising - thats what I do! - Adding missing functionality, adapting for changes made in the OLTP, adding other external data.
They do not want to have to repeat all of that work, learn a new skillset and be held to ransom on rates because of a skills shortage.

I'll keep looking out for more signs from Oracle but for now stick to the Informatica stuff.

A Quick recap:

From Part 1 we saw that you can organise logical tables in a normalised fashion, and in a simple model the SQL will work without errors. The use of a normalised model in the Data warehouse is not something that you would normally see being advocated by the Warehouse experts, and yet we face pressure from clients to reduce the development time and complexity of the warehouse, which can lead to so called sub-optimal performance.
From Part 2 we saw that organising the logical model into a proper star led to the most efficient SQL being generated. This SQL will enable STAR Transformation to work in an Oracle DB. We also see that for this to work you need all the relevant foreign keys on the fact.

In this section we are going to explore the Hybrid design. This design is common to most implementations that I have seen.
The logical model is designed as a star. However, the physical model is a Snowflake.

The LOGICAL MODEL.

I have only shown one dimension here for simplicity, but there can be several dimensions attached to the fact.

Here I have shown three logical tables to demonstrate the different methods for the Logical Table Sources (LTS)
If you look at the Logical table sources in W_PERSON_D, you will see that I have three set up, with the LTS name being the same as the underlying Physical table:

In this example each one contains just one Physical Table. The first one contains just the HYBRID_PERSON_D:

And the column mapping is...

If we look at the column mapping for PERSON_DX

We can see that only those columns that exist in the physical table are matched to the Logical Columns. This will force the engine to use this LTS if any of these fields are used. Also note that I have mapped the ROW_WID field to the DX field that contains the Person Row WID.

Now let’s look at a report being run using this Logical Dimension.

No errors and the SQL does the magic of combining the separate Logical Sources

select T5260."FULL_NAME" as c1,
T5303."ACCNT_WID" as c2,
T5303."BRICK_WID" as c3,
sum(T5274."BLAH_VALUE") as c4
from
"W_XREF_PERSON_M" T5303,
"W_PERSON_D" T5260,
"W_PERSON_F" T5274
where ( T5260."ROW_WID" = T5274."CONTACT_WID"
and T5260."ROW_WID" = T5303."CONTACT_WID" )
group by T5260."FULL_NAME", T5303."ACCNT_WID", T5303."BRICK_WID"

And if I just want to query against the Name:

Then the SQL shows that only the relevant tables are queried:

select T5260.`FST_NAME` as c1,

sum(T5274.`BLAH_VALUE`) as c2

from `W_PERSON_D` T5260, `W_PERSON_F` T5274

where ( T5260.`ROW_WID` = T5274.`CONTACT_WID` )

group by T5260.`FST_NAME`

(unfortunately I am using Access for my sample database and Access does NOT support comments in SQL, otherwise you would see which LTS has been used)

Now let’s look at Person V2

Here we can see that I have added just one LTS

But that LTS has three physical sources,

And if we run a request:
(the same one as above)

Then this time the SQL is as follows:

select T5260.`FST_NAME` as c1, sum(T5274.`BLAH_VALUE`) as c2

from `W_PERSON_D` T5260, `W_XREF_PERSON_M` T5303, `W_PERSON_F` T5274

where ( T5260.`ROW_WID` = T5274.`CONTACT_WID`

and T5260.`ROW_WID` = T5303.`CONTACT_WID` )

group by T5260.`FST_NAME`

As you can see W_PERSON_D and W_XREF_PERSON_M are included in the SQL but not W_PERSON_DX? Why? W_XREF_PERSON_M is not even used in the request columns and it does not help the SQL to answer the request!
We know that PERSON_DX is joined to PERSON_D using a physical foreign key, whereas PERSON_M is using a logical key. So let experiment with the keys to see what happens.
I changed the join between PERSON_D and PERSON_DX to a physical join, not a physical foreign key, and re-ran the report, which resulted in the following SQL.

select T5260.`FST_NAME` as c1, sum(T5274.`BLAH_VALUE`) as c2

from `W_PERSON_D` T5260, `W_PERSON_DX` T5267, `W_XREF_PERSON_M` T5303, `W_PERSON_F` T5274

where ( T5260.`ROW_WID` = T5267.`ROW_WID` and T5260.`ROW_WID` = T5274.`CONTACT_WID` and T5260.`ROW_WID` = T5303.`CONTACT_WID` )

Group by T5260.`FST_NAME`

This time the PERSON_DX tables was added into the SQL even though it is not used in the request.

Now let’s have a look at a third way. This combines the two methods above of creating the LTS’s.
Here I have created the dimension with 3 LTS

Each one contains the W_PERSON_D table. The first is on it’s own, the second contains both Person D and DX, the third Person D and XREF M table. The choice for the developer with this set-up is what fields to map where. Take ‘First Name’, do you map this to just the first LTS or all three?
If I map it to all three then run the following report

I get the following SQL

select T5260.`FST_NAME` as c1, T5267.`ATTRIB_11` as c2, sum(T5274.`BLAH_VALUE`) as c3

from `W_PERSON_D` T5260, `W_PERSON_DX` T5267, `W_PERSON_F` T5274

where ( T5260.`ROW_WID` = T5267.`ROW_WID` and T5260.`ROW_WID` = T5274.`CONTACT_WID` )

group by T5260.`FST_NAME`, T5267.`ATTRIB_11`

order by 1, 2

One piece of SQL gets all the answers
Now if I map ‘First name’ only to the HYBRID_PERSON_D LTS and re-run the report, I get

select T5260.`FST_NAME` as c1, T5267.`ATTRIB_11` as c2, sum(T5274.`BLAH_VALUE`) as c3

from `W_PERSON_D` T5260, `W_PERSON_DX` T5267, `W_PERSON_F` T5274

where ( T5260.`ROW_WID` = T5267.`ROW_WID` and T5260.`ROW_WID` = T5274.`CONTACT_WID` )

group by T5260.`FST_NAME`, T5267.`ATTRIB_11`

order by 1, 2

ITS THE SAME! The Analytics engine just puts together the SQL it needs, irrespective of your chosen method of LTS.

Summary

So what is going on? In the example model we have seen that
it does not matter what logical or physical model is used, OBIEE still works.
It does not matter how you set-up your logical table sources, OBIEE will put them together in a small query to get the results.
If you use a physical join, then OBIEE will always use the joined to table in the query, even if it’s not necessary.
If you use a Physical Foreign Key, and have the joined table in an LTS, it will only use the joined table when it is needed in the results.

So does design Matter?
Given the evidence so far, it would appear that you can do without a specific design. A complete beginner can do what they like and OBIEE will deal with it. So beginners can get away with building however they choose. If your dataset is small then performance will not be an issue, and if speed of response is bad then designers can always blame the network/db/server/web! When a project starts out, or is just a prototype then you do not need to consider the future addition of facts and dimensions so who cares what the technical stuff looks like. If you want to impress the client by saying you can install in 3 weeks then you had better forget design cause this takes longer, and once you have wowed the client you will get further work and no-one else will see your build (I know who you are!).

If a client has gone for the cheapest option, ie offshore, then they do not care about the quality of the work done, so build it however you like – they will never know.
But, I care what work I produce, and I normally work on very large databases which will not forgive poorly written SQL. If you have an Oracle environment then you want to maximise the use of Star transformations. For any large system you will want to take advantage if summaries. In my simple model I have used tables which join using the Person unique code. I have also not used any aggregations. There are no Hierarchies. There are no Level based measures. There are no visibility filters in place.
All of these features are what make OBIEE a useful tool for the analysis of large amounts of data, by end users with no need to fully understand SQL. Without these features, Drilling through data levels to find the specific issues would take too long. Summary reports would be too slow and users would revert back to Access, Excel and SAS. In fact, if your users still insist on using Excel or Access for a large amount of their analysis then your installation of OBIEE DESIGN is failing (You cannot totally remove Excel, in fact you should be encouraging some integration with OBIEE).

In the real world the joins in a dimension can be numerous, not use the Dimension keys and sometimes involve complex joins. The joining of tables in a one-to-many relationship can result in multiple results for a single dimension e.g. Person, which when used with a fact causes duplication of facts. If you have multiple Logical Table Sources then you have to be sure they will work together. If the user were to choose a field from each source do they get an error? If so why?

How do you incorporate Dimension attributes, for example a Contact could be a Doctor who works at a particular Hospital, but later changes to another Hospital. Do you store the doctors hospital in the facts where contact exists (true Star Schema style) or in the Contact dimension, or linking from Contact to Account (Snowflake) or in an SCD? If you use a pure normalised model then this is not a problem, or is it? How do you reflect history? Do all the patients the doctor has treated over time get added up in his new hospital?

So my point is, yes you can do quick and dirty, and you can use normalised schemas or snowflakes or Pure Stars in OBIEE. But please do so knowing the consequences, and have good reasons for doing so. Don’t blindly follow a star schema for small implementations, which need minimal amounts of ETL. Do use as pure a star as possible for very large implementations, or where multiple facts share common dimensions.

Oracle Analytics

Wednesday, January 21, 2009

Another New Blogger

Monday, January 19, 2009

Localisation of OBIEE

Thursday, January 08, 2009

How is your Croatian?

Informatica vs ODI

Friday, January 02, 2009

OBIEE Snowflakes and Stars - Part 3

The Cowes

Labels

Blogroll

My Blog List

Analytics Links

Sailing Links

Other Links

Contributors

What is your current Rate for OBIEE Consultancy (Euro per day)

Followers

Blog Archive