Wednesday, December 10, 2008

Stars and Snowflakes

Part 2

Continuing on the discussion about OBIEE (Oracle BI EE) structure of the Logical Layer - please read Part 1 before this chapter.

To set the context, The discussion is around your choices in table design in the warehouse and I am only referring to the reporting in OBIEE on the datawarehouse, and your design considerations for Logical Tables, logical Table Sources and Physical Joins in the Physical Layer. Another point to note is that I am not too concerned about the theoretical difference between Snowflakes and Normalised, in the example shown in Part 1 you could call that a Snowflake or a Normalised design. Some have described Snowflakes as semi-normalised stars which is a fair decription, but the important point for the discussion is that there is at least one join between dimensions, which instantly makes it a non-star design, and is on the road to normalisation.

In part three I will look at a mixed model - Logical stars with normalised physical tables.

I do believe the actual dataWarehouse design and ETL routines should follow what you decide in your logical design considerations. Having said that though, I don't think that the whole reporting system can be developed in a linear path, you often have to loop back at each stage when more clarity is found.

And finally, Please join in the discussion by posting your comments.

Now, on with the blog...

From Part 1 some of you may be thinking that I'm advocating that Stars are not always required in the warehouse. Well, I tried to put together a fair case for the use of normalised design as best as I could. Now, lets have a look at the case for a pure star.

Continuing with the Contact example used in part 1, we can redraw the model in a logical and physical pure star.


















Running the same report.















And obviously it works, but look at the SQL:


SELECT

T3489."ATTRIB_11" as c1,

sum(T3517."BLAH_VALUE") as c2

from "W_PERSON_DX" T3489,

"W_PERSON_F" T3517

where ( T3489."ROW_WID" = T3517."CONTACT_WID" )

group by T3489."ATTRIB_11"order by 1



So this time the join is direct in the physical layer, and therefore the server does not add in another table to get the answer.
Is this reduction in joins THE main reason why you design a pure star? Or are there other reasons why star = good?


So on the face of it there is no problem with either method. The star method was very efficient in the query, minimising the number of joins. But at what cost to the ETL process? Here, in this example, none because it joined to an existing key, however if you were talking about say, Account Id, then the mapping would have to look it up and add it to the target table. Of course the Star Transformation in the db will struggle with the normalised schema (or will it – answers on a postcard).

But. I cheated. Twice. Firstly, this is a very simple model, with little data in, so design is not vital.
Secondly, what if I want to do this …..


Part 2A

Let's build a hybrid. Surely we can use a star and introduce some level of normalisation and all will be well? Consider the following design.





The above 'star' has all the physical dimension tables in the single dimension logical table.

We have a logical star. With Physical table sources arranged in a physical star, except we have a few ‘spokes’ around some of the physical dimension tables (this is essentially what mosst people would call a Snowflake!). A common use here would be a ‘List of Values’.

Does it work? and is it 'better'?

There are two considerations

  1. Errors is Analytics generating SQL
  2. Duplication of data.

I'll explore these in part 3.

2 comments:

stijngabriels said...

Hi Adrian,
today I had to think about this article, when reading this topic on the OBIEE forum:
Simple Join - BI Answers Report

Reading some of the questions on the forum I also get the impression that more and more people are reporting against an existing transactional schema instead of an ETL-driven datawarehouse with proper star schemas.
The problems they have are caused by data which is not "perfect" for reporting and they are not allowed to change the data. Now they expect OBIEE to do the trick.. In the end I think this will result in bad design (like using outer joins where you don't really need to).
Regards,
Stijn Gabriels

Palani said...

Yeah very true that many customers would be happy to report with OLTP rather than using costly Warehouse.

But the other Problems i do see apart from Outer Join are

1) Flattening Option
Say you have another Child entity to Person that makes "many side" in the Child end.
E.g Person Phone which stores Fax number, Pager, handset, work and home number based on some phone type column
Now with Star approach you would have flattened.
Other wise it would have a bad UI experience when you pull Person Phone number and Measure say sales amount, you would get the measure repeated for each phone number.
to worsen that think of many child entities like Address, email etc to person


2) SCD Type 2 equivalent?

In OLTP we have effective dated entities (most probably modeled as Dimensions) or versioned entities where the natural key is not unique by itself but with help of effective dates or version numbers.
You can ETL the correct surrogate key for each fact record and populate it.
Adding to that how about event fact which is not point in time but for Period
e.g I have person event fact which has person id and Person event and Person event start date and Person event end date?
How could we handle it without that ETL option in Snow flaking/ Normalizing?


3) Normalized Transaction Entitles

We use to have rich Facts in Star, in terms of dimensions
But in OLTP you have got this Fact data not coming directly from one table.
Though can be bridged with help of SQL or Materialized View, but then think about the complexity
+ We can’t truly exploit the aggregate awareness feature of OBIEE until its persisted.

I am not fan of ETL to raise these issues, I am also facing similar heat to mimic what ETLed star could do directly on OLTP :(

On the Positive Side We have Real Time Data and few reports and Dashboard might work, but customer should be refrain from creating Ad-Hoc reports, as there can be many cases where it might not be functionally correct also

The Cowes

The Cowes
Cowes Racing