I have a scenario where I have to read 2 files that update the same table (a temp staging table). This comes from the source system's limitation on the number of columns it can export. As a workaround we split the data into 2 files, where the 2nd file also contains the first file's primary key so we know which record to update...
Here is my problem...
The table that needs to be updated contains 9 columns. File 1 contains 5 of them and file 2 contains the other 4.
File 1 inserts 100 rows and leaves the other 4 columns as nulls, ready for file 2 to update them.
File 2 updates 10 rows but fails on 90 rows due to incorrect data.
So only 10 rows are successfully updated and ready to be processed, while 90 are incorrect. I still want to process the good 10, but I can't afford to attempt processing on the broken ones...
The easy solution would be to remove the incorrect rows from the temp table whenever an error occurs in the 2nd file's package, by running a SQL query against the table using the primary keys that exist in both files. But when the error occurs at the Flat File source, I can't get the primary key.
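Just to illustrate what I mean (StagingTable and File2FailedKeys are made-up names, not the real schema), the cleanup step would be something along these lines:

    -- StagingTable / File2FailedKeys are example names only, to show the idea
    DELETE s
    FROM   StagingTable AS s
    WHERE  s.PrimaryKey IN (SELECT e.PrimaryKey FROM File2FailedKeys AS e);

But in practice there is nothing like File2FailedKeys to query, because the failing keys never make it out of the Flat File source.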
What would be the best suggestion? Should I rather fail the whole package if 1 row bombs out? I can't put any logic into the following package that does the master file update/insert from the temp table, because of the nature of the data.
Regards
Mike
You have more than one way to accomplish that, I think.
You could add an extra column to the staging table to act as a flag indicating whether a row was properly updated by the 2nd file; then further steps would filter the rows based on the value of that column.
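Roughly something like this, where StagingTable, IsComplete and the column names are only placeholders:

    -- Placeholder names: a bit flag that file 2's update switches on
    ALTER TABLE StagingTable ADD IsComplete bit NOT NULL DEFAULT 0;

    -- The 2nd file's package sets the flag on every row it manages to update
    UPDATE StagingTable
    SET    Col6 = ?, Col7 = ?, Col8 = ?, Col9 = ?, IsComplete = 1
    WHERE  PrimaryKey = ?;

    -- Downstream steps only pick up the completed rows
    SELECT * FROM StagingTable WHERE IsComplete = 1;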
Or...
Why don't you create 2 staging tables, one for each file? Then you can use SQL statements to join them, perform some data quality checks, and decide which rows are going to be processed and which ones will be rejected.
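For example (again, table and column names are invented just to show the shape of it):

    -- Staging_File1 holds file 1's 5 columns, Staging_File2 holds the key plus file 2's 4 columns
    -- (all names here are examples only)
    SELECT f1.PrimaryKey,
           f1.Col1, f1.Col2, f1.Col3, f1.Col4,
           f2.Col5, f2.Col6, f2.Col7, f2.Col8
    FROM   Staging_File1 AS f1
    INNER JOIN Staging_File2 AS f2
            ON f2.PrimaryKey = f1.PrimaryKey;
    -- Rows in Staging_File1 with no match here are the ones to reject or look at manually.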
You could also use a Merge Join transform with an inner join prior to loading the table: take the good-record pipelines from the two flat file sources into the Merge Join and set up an inner join on the key from each file. The pipeline output from the Merge Join would then have only the ten records that had a key match, with all 9 columns, which you can then load to the SQL table destination.

Thanks, I used the flag method and it's doing fine.
Much appreciated, Rafael.
Mike