This discussion is around the Informatica design related interview questions and queries.
~~ Replace a Dynamic lookup cache in a map with a static with optimal performance. We have a dimension table which is in SCD2 with the following structure (id, attr, datetimestamp,status) where :
id : is the key being used from the source itself. Assuming for simplicity of the scenario else use a synthetic key.
attr: change in attribute will result in a new record to be inserted in dim table with status='C' and rest records for this id needs to be marked as 'H'.
Consider the source table with structure (id, attr, datetimestamp). We need to pupulate the dimension from the source.
Design should be such that: For multiple records for the same source key, we should perform updates/inserts only once i.e. if there are multiple records for single id in source, to process first record we first update earlier record as 'H' and then insert new as 'C' and then do the same for other recs with similar id. Rather than following this we issue a single update statement to update existing record as 'H', insert last updated record (with max date) for the id as 'C' and rest all we insert as 'H'.
~~ Maintain SCD2 using 'C' and 'H' flags along with timestamp in the most optimised manner using Informatica as ETL tool.
~~ Transpose rows to cols and cols to rows without using Normalizer.
Assume source in the format (id, subject, marks) and we need to populate a target with structure (id, marks_s1, marks_s2, marks_s3, ...50).
Other way round, we have the source as (id,marks_s1,marks_s2,marks_s3...50) and we need the data in target as (id, subject, marks)
~~ Salary: We have the data as follows (id, name, dept, sal). We need to populate the following target (id,name,dept,sal,d_sal) where id is the id of the person, a unique identifier; name : name of the person ; dept is dept in which person works, sal is the sal he is drawing and d_sal is the difference between the salary of the person and the average salary of the dept in which the person is working.
There are numerous employees working in various depts and at any given time an employee is working only for a single dept.
~~ A big fat table which gets loaded with a flat file provided by business. File got loaded into the table but The file provided by business had 10% wrong records. Now, business provided a new file with ALL the records along with the corrected ones.
Which is the best approach to upload the corrected records?
~~ Many files to be uploaded: Assuming 1000 files to be uploaded all with different structures and data. Design logic such that we can load the data from all these files using a SINGLE mapping with multiple sessions.
~~ Sanity Check: There is a flat file which needs to be checked for sanity. Check can be related to data type of a column, format of data or business checks. We need to perform all checks. Now there can be different scenarios:
We may like to upload a file in case there are no failiures, or upload the file in any case or upload in case not more than 10% records have been rejected.
Similarly we like to capture the errors in the file in different formats in error table such as capture each error as different records with a unique record id, or capture errors for a single line in the file as a single record, or capture all the errors for all the lines of the file as single line in error table with the records
~~ Rejects and Reloads: Describe the rejects and error handling strategy.
~~ Duplicates: There are duplicates in the source file. Either we need to completely skip the duplicates while processing and commit only distinct rows, or we may need to put all the first instance of records in one table and route all the other into another or route all the instances of the records having count more than 1 into one table and all the non duplicate records into another table.
~~ Replicate: There is a table in source. We need to capture all the changes in this and replicate the same in target without using Informatica. What could be the options.
~~ Incremental aggregation: Implement incremental aggregation in the mapping without using session property.
~~ Lookup vs SQL Override : When will you prefer a lookup and when a SQL override. Can there be scenarios where a lookup cant be replaced with a SQL override or vice versa from the functionality perspective.
~~ Connected vs Unconnected Lookup: Describe various factors for considering an unconnected lookup vs a connected in the mapping logic.
No comments:
Post a Comment