Thursday, February 25, 2010

DW Concepts

Data Warehouse Concepts:

Slice :
Lets say there we want to look at year long sales for 3 particular stores(a,b,c). This concept is called slicing. The slicing of the members allows us to focus only on these three members across all other dimensions.

Dice:
The dicing concept means that you put multiple members from a dimension on an axis and then put multiple members from a different dimension on another axis. This allows you to view the interrelationship of members from different dimensions.

Drill-down and drill-up , Roll-down and Roll-up:
Drilling in multidimensional terminology means going from one hierarchy level to another. In other words, drill-down can be defined as the capability to browse through information, following a hierarchical structure.

Drill-across :
Drill-across is a method where you drill from one dimension to another. e.g. earlier we were looking at sales by region of products, now we drill across and now looking at sales by store in a region of products.

Grain:
The grain conveys the level of detail associated with the fact tablemeasurements. Identifying the grain also means deciding on the level of detailyou want to be made available in the dimensional model. The more detailthere is, the lower the level of granularity. The less detail there is, the higherthe level of granularity.

Each fact and dimension table is said to have its own grain or granularity. In other words, each table (either fact or dimension) will have some level of detail associated with it. The grain of the dimensional model is the finest level of detail implied by the joining of the fact and dimension tables.


It is possible to have multiple grains in one fact table. This can be accommodated by adding a column called the granularity flag e.g. you may have data at daily, weekly, monthly level.


Inversion Key Entry : Non Unique Index


Surrogate Keys:
Surrogate keys are keys that are maintained within the data warehouse instead of the natural keys taken from source data systems.
  • Data tables in various OLTP source systems may use different keys for the same entity. It may also be possible that a single key is being used by different instances of the same entity. This means that different customers might be represented using the same key across different OLTP systems.
  • Surrogate keys provide the means to maintain data warehouse information when dimensions change.
  • Natural OLTP system keys may change or be reused in the source data systems. This situation is less likely than others, but some systems have reuse keys belonging to obsolete data or for data that has been purged.
  • One simple way improve performance
  • Changes or realignment of the employee identification number should be carried in a separate column in the table, so information about the employee can be reviewed or summarized, regardless of the number of times the employee record appears in the dimension table.


Degenerate dimensions: Degenerate dimensions are dimensions without any attributes. They are not typical dimensions, but often simply a transaction number that is placed inside the fact table.

Conformed dimensions :

Slowly changing dimensions:

Garbage dimensions:
A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, and status flags. The garbage dimension is also referred to as a junk dimension.

Role-playing dimension:

A single dimension which is expressed differently in a fact table using views is
called a role-playing dimension.

Multi-valued dimensions:

Late arriving Dimensions:

No comments: