ORACLE DUAL TABLE;
SQL> SELECT LEVEL FROM DUAL CONNECT BY LEVEL < 4
SQL> SELECT COUNT(*) FROM DUAL;
COUNT(*)
----------
1
SQL> DELETE FROM DUAL;
DELETE FROM DUAL
*
ERROR at line 1:
ORA-01031: insufficient privileges
SQL> INSERT INTO DUAL VALUES ('X');
INSERT INTO DUAL VALUES ('X')
*
ERROR at line 1:ORA-01031: insufficient privileges
CONNECT SYS AS SYSDBA
SQL> INSERT INTO DUAL VALUES ('X');
SQL> COMMIT;
SQL> SELECT COUNT(*) FROM DUAL;
2
SQL> SELECT * FROM DUAL;
D
-
X
SQL> SELECT USER FROM DUAL;
USER
------------------------------
SYS
DUAL is owned by SYS. SYS owns the data dictionary, therefore DUAL is part of the data dictionary. Never update the dd table directly.
Tuesday, April 27, 2010
Thursday, February 25, 2010
DW Concepts
Data Warehouse Concepts:
Slice :
Lets say there we want to look at year long sales for 3 particular stores(a,b,c). This concept is called slicing. The slicing of the members allows us to focus only on these three members across all other dimensions.
Dice:
The dicing concept means that you put multiple members from a dimension on an axis and then put multiple members from a different dimension on another axis. This allows you to view the interrelationship of members from different dimensions.
Drill-down and drill-up , Roll-down and Roll-up:
Drilling in multidimensional terminology means going from one hierarchy level to another. In other words, drill-down can be defined as the capability to browse through information, following a hierarchical structure.
Drill-across :
Drill-across is a method where you drill from one dimension to another. e.g. earlier we were looking at sales by region of products, now we drill across and now looking at sales by store in a region of products.
Grain:
The grain conveys the level of detail associated with the fact tablemeasurements. Identifying the grain also means deciding on the level of detailyou want to be made available in the dimensional model. The more detailthere is, the lower the level of granularity. The less detail there is, the higherthe level of granularity.
Each fact and dimension table is said to have its own grain or granularity. In other words, each table (either fact or dimension) will have some level of detail associated with it. The grain of the dimensional model is the finest level of detail implied by the joining of the fact and dimension tables.
It is possible to have multiple grains in one fact table. This can be accommodated by adding a column called the granularity flag e.g. you may have data at daily, weekly, monthly level.
Inversion Key Entry : Non Unique Index
Surrogate Keys:
Surrogate keys are keys that are maintained within the data warehouse instead of the natural keys taken from source data systems.
Slice :
Lets say there we want to look at year long sales for 3 particular stores(a,b,c). This concept is called slicing. The slicing of the members allows us to focus only on these three members across all other dimensions.
Dice:
The dicing concept means that you put multiple members from a dimension on an axis and then put multiple members from a different dimension on another axis. This allows you to view the interrelationship of members from different dimensions.
Drill-down and drill-up , Roll-down and Roll-up:
Drilling in multidimensional terminology means going from one hierarchy level to another. In other words, drill-down can be defined as the capability to browse through information, following a hierarchical structure.
Drill-across :
Drill-across is a method where you drill from one dimension to another. e.g. earlier we were looking at sales by region of products, now we drill across and now looking at sales by store in a region of products.
Grain:
The grain conveys the level of detail associated with the fact tablemeasurements. Identifying the grain also means deciding on the level of detailyou want to be made available in the dimensional model. The more detailthere is, the lower the level of granularity. The less detail there is, the higherthe level of granularity.
Each fact and dimension table is said to have its own grain or granularity. In other words, each table (either fact or dimension) will have some level of detail associated with it. The grain of the dimensional model is the finest level of detail implied by the joining of the fact and dimension tables.
It is possible to have multiple grains in one fact table. This can be accommodated by adding a column called the granularity flag e.g. you may have data at daily, weekly, monthly level.
Inversion Key Entry : Non Unique Index
Surrogate Keys:
Surrogate keys are keys that are maintained within the data warehouse instead of the natural keys taken from source data systems.
- Data tables in various OLTP source systems may use different keys for the same entity. It may also be possible that a single key is being used by different instances of the same entity. This means that different customers might be represented using the same key across different OLTP systems.
- Surrogate keys provide the means to maintain data warehouse information when dimensions change.
- Natural OLTP system keys may change or be reused in the source data systems. This situation is less likely than others, but some systems have reuse keys belonging to obsolete data or for data that has been purged.
- One simple way improve performance
- Changes or realignment of the employee identification number should be carried in a separate column in the table, so information about the employee can be reviewed or summarized, regardless of the number of times the employee record appears in the dimension table.
Degenerate dimensions: Degenerate dimensions are dimensions without any attributes. They are not typical dimensions, but often simply a transaction number that is placed inside the fact table.
Conformed dimensions :
Slowly changing dimensions:
Garbage dimensions:
A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, and status flags. The garbage dimension is also referred to as a junk dimension.
Role-playing dimension:
A single dimension which is expressed differently in a fact table using views is
called a role-playing dimension.
Multi-valued dimensions:
Late arriving Dimensions:
Wednesday, February 10, 2010
Interview Questions : Informatica
This discussion is around the Informatica design related interview questions and queries.
~~ Replace a Dynamic lookup cache in a map with a static with optimal performance. We have a dimension table which is in SCD2 with the following structure (id, attr, datetimestamp,status) where :
id : is the key being used from the source itself. Assuming for simplicity of the scenario else use a synthetic key.
attr: change in attribute will result in a new record to be inserted in dim table with status='C' and rest records for this id needs to be marked as 'H'.
Consider the source table with structure (id, attr, datetimestamp). We need to pupulate the dimension from the source.
Design should be such that: For multiple records for the same source key, we should perform updates/inserts only once i.e. if there are multiple records for single id in source, to process first record we first update earlier record as 'H' and then insert new as 'C' and then do the same for other recs with similar id. Rather than following this we issue a single update statement to update existing record as 'H', insert last updated record (with max date) for the id as 'C' and rest all we insert as 'H'.
~~ Maintain SCD2 using 'C' and 'H' flags along with timestamp in the most optimised manner using Informatica as ETL tool.
~~ Transpose rows to cols and cols to rows without using Normalizer.
Assume source in the format (id, subject, marks) and we need to populate a target with structure (id, marks_s1, marks_s2, marks_s3, ...50).
Other way round, we have the source as (id,marks_s1,marks_s2,marks_s3...50) and we need the data in target as (id, subject, marks)
~~ Salary: We have the data as follows (id, name, dept, sal). We need to populate the following target (id,name,dept,sal,d_sal) where id is the id of the person, a unique identifier; name : name of the person ; dept is dept in which person works, sal is the sal he is drawing and d_sal is the difference between the salary of the person and the average salary of the dept in which the person is working.
There are numerous employees working in various depts and at any given time an employee is working only for a single dept.
~~ A big fat table which gets loaded with a flat file provided by business. File got loaded into the table but The file provided by business had 10% wrong records. Now, business provided a new file with ALL the records along with the corrected ones.
Which is the best approach to upload the corrected records?
~~ Many files to be uploaded: Assuming 1000 files to be uploaded all with different structures and data. Design logic such that we can load the data from all these files using a SINGLE mapping with multiple sessions.
~~ Sanity Check: There is a flat file which needs to be checked for sanity. Check can be related to data type of a column, format of data or business checks. We need to perform all checks. Now there can be different scenarios:
We may like to upload a file in case there are no failiures, or upload the file in any case or upload in case not more than 10% records have been rejected.
Similarly we like to capture the errors in the file in different formats in error table such as capture each error as different records with a unique record id, or capture errors for a single line in the file as a single record, or capture all the errors for all the lines of the file as single line in error table with the records
~~ Rejects and Reloads: Describe the rejects and error handling strategy.
~~ Duplicates: There are duplicates in the source file. Either we need to completely skip the duplicates while processing and commit only distinct rows, or we may need to put all the first instance of records in one table and route all the other into another or route all the instances of the records having count more than 1 into one table and all the non duplicate records into another table.
~~ Replicate: There is a table in source. We need to capture all the changes in this and replicate the same in target without using Informatica. What could be the options.
~~ Incremental aggregation: Implement incremental aggregation in the mapping without using session property.
~~ Lookup vs SQL Override : When will you prefer a lookup and when a SQL override. Can there be scenarios where a lookup cant be replaced with a SQL override or vice versa from the functionality perspective.
~~ Connected vs Unconnected Lookup: Describe various factors for considering an unconnected lookup vs a connected in the mapping logic.
~~ Replace a Dynamic lookup cache in a map with a static with optimal performance. We have a dimension table which is in SCD2 with the following structure (id, attr, datetimestamp,status) where :
id : is the key being used from the source itself. Assuming for simplicity of the scenario else use a synthetic key.
attr: change in attribute will result in a new record to be inserted in dim table with status='C' and rest records for this id needs to be marked as 'H'.
Consider the source table with structure (id, attr, datetimestamp). We need to pupulate the dimension from the source.
Design should be such that: For multiple records for the same source key, we should perform updates/inserts only once i.e. if there are multiple records for single id in source, to process first record we first update earlier record as 'H' and then insert new as 'C' and then do the same for other recs with similar id. Rather than following this we issue a single update statement to update existing record as 'H', insert last updated record (with max date) for the id as 'C' and rest all we insert as 'H'.
~~ Maintain SCD2 using 'C' and 'H' flags along with timestamp in the most optimised manner using Informatica as ETL tool.
~~ Transpose rows to cols and cols to rows without using Normalizer.
Assume source in the format (id, subject, marks) and we need to populate a target with structure (id, marks_s1, marks_s2, marks_s3, ...50).
Other way round, we have the source as (id,marks_s1,marks_s2,marks_s3...50) and we need the data in target as (id, subject, marks)
~~ Salary: We have the data as follows (id, name, dept, sal). We need to populate the following target (id,name,dept,sal,d_sal) where id is the id of the person, a unique identifier; name : name of the person ; dept is dept in which person works, sal is the sal he is drawing and d_sal is the difference between the salary of the person and the average salary of the dept in which the person is working.
There are numerous employees working in various depts and at any given time an employee is working only for a single dept.
~~ A big fat table which gets loaded with a flat file provided by business. File got loaded into the table but The file provided by business had 10% wrong records. Now, business provided a new file with ALL the records along with the corrected ones.
Which is the best approach to upload the corrected records?
~~ Many files to be uploaded: Assuming 1000 files to be uploaded all with different structures and data. Design logic such that we can load the data from all these files using a SINGLE mapping with multiple sessions.
~~ Sanity Check: There is a flat file which needs to be checked for sanity. Check can be related to data type of a column, format of data or business checks. We need to perform all checks. Now there can be different scenarios:
We may like to upload a file in case there are no failiures, or upload the file in any case or upload in case not more than 10% records have been rejected.
Similarly we like to capture the errors in the file in different formats in error table such as capture each error as different records with a unique record id, or capture errors for a single line in the file as a single record, or capture all the errors for all the lines of the file as single line in error table with the records
~~ Rejects and Reloads: Describe the rejects and error handling strategy.
~~ Duplicates: There are duplicates in the source file. Either we need to completely skip the duplicates while processing and commit only distinct rows, or we may need to put all the first instance of records in one table and route all the other into another or route all the instances of the records having count more than 1 into one table and all the non duplicate records into another table.
~~ Replicate: There is a table in source. We need to capture all the changes in this and replicate the same in target without using Informatica. What could be the options.
~~ Incremental aggregation: Implement incremental aggregation in the mapping without using session property.
~~ Lookup vs SQL Override : When will you prefer a lookup and when a SQL override. Can there be scenarios where a lookup cant be replaced with a SQL override or vice versa from the functionality perspective.
~~ Connected vs Unconnected Lookup: Describe various factors for considering an unconnected lookup vs a connected in the mapping logic.
Subscribe to:
Posts (Atom)