ETL Assistant – Getting Error Row Description and Column Information Dynamically

This article attempts to provide one solution to the problem of row error management in SSIS.  It’s one option, built specifically for dynamic column mapping scenarios, but it could probably be adapted for static situations as well.

TLDR:

Management of Bad Rows in SSIS

For ETL, SSIS does a fine job of letting you manage the basics of copying one column of data in some source table to another column in a destination table.  Assuming all goes well, you wind up extracting/transforming/loading that data.

If things don’t go well, however…

Exception handling is a central part of any development task and usually winds up representing a significant chunk of your time and code. You wind up covering any number of “what ifs” like:

  • What if I failed to connect to a system?
  • What if I expected data and didn’t get any?
  • What if my expected data type overflowed?
  • What if something totally unanticipated happened?

ETL Assistant Error Logger - Basic SSIS DFT Without Error Handling

If you’ve used SSIS for ETL you’re accustomed to the idea of data flow paths inside of a transformation.  You connect a source component to a destination component via either a green line (“good output”) or a red line (“bad / error output”).  This is great stuff.  Say you query some rows from a source database table and want to send the rows to a destination database table – you simply wire up the green line from the source to the destination and map the columns.  Done.  Walk away.

But what about the implied red line for bad rows?  What if you actually have an issue with the transformation?  Two immediate reasons come to mind:

  • The data was truncated in some way (ex: casting an Oracle number(20,0) to a SQL Server int)
  • Some other unanticipated error occurred (for the sake of explanation, let’s say a primary key violation on insert)

ETL Assistant - SSIS DFT Error Row Redirection

Usually what you’d do with a static transformation is simply use row redirection to handle the exception.  A common solution is to log your error information to a shared error log table for later review.  By attaching the appropriate error output to your destination you “channel” the row information to that destination so you have a hope of figuring out what happened and what you can do about it.

SSIS usually works really well for these situations, with the exception of two nagging challenges you’ll see come up a lot in discussion forums:

  • “My row failed – how do I get the error description?”
  • “My row failed – how do I tell which row failed?”

Error description is fairly straightforward and I’m not going to get into it too much – there’s a great step-by-step example at http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx which is very instructive.

Error row identifier, though, is a bit more complex because of the way SSIS works.

Error Columns and Lineage IDs

I’m going to preface this next section by noting that I don’t have a super clear picture on the internals of how SSIS column flow works, but I get a sense of it.  Please feel free to comment / email me and I’ll update anything that needs correcting.


Let’s say you have a row with an integer column “employee_id” which is the primary key on a table.  What you see is a single presentation of that column “employee_id” – it’s labeled that way throughout your data transformation flow, so to you it’s “the same” throughout the flow.  What SSIS sees internally, however, is something completely different.

If you dig a bit you’ll find you have a unique representation of this column at each point throughout the flow of your SSIS package.  That single “column” (“employee_id”) has to be treated uniquely at each input, output, and error output for each step.  Beyond flow direction (ex: input column vs. output column), the column itself may change data types, names, or even value as it flows through your package.  SSIS needs to keep track of that “column” at each point throughout the flow and treat it as though it’s unique.  So how does it do that?  LineageID.

There’s a great article on SQL Server Central (http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/65730/) that touches on some of this.  The article describes LineageID as:

It’s an integer and it’s unique throughout the data flow. When buffers are reused, the Lineage ID doesn’t change – it’s the same column at the input and output. When buffers are copied, a new column is created – which gets a new (unique) Lineage ID.

That means that as the column “employee_id” flows through the DFT, it gets a unique Lineage ID - for each input and output copy of itself.  And, typically, you have…

  • An input column
  • An output column for “good” data
  • An output column for errors

Taking the “employee_id” example from the “OLE DB Source” step in our DFT we’d have:

  • Input (Lineage ID = 33)
  • Source Error Output (Lineage ID = 35)
  • Good Output (Lineage ID = 34)

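To make that bookkeeping concrete, here’s a minimal sketch (using the made-up IDs from the list above – this is my illustration of the idea, not SSIS internals) of the kind of Lineage ID → column lookup we ultimately want:

```csharp
using System.Collections.Generic;

// Illustration only: one logical column ("employee_id") as SSIS tracks it --
// a distinct Lineage ID for each copy at each point in the flow.
// The IDs are the example values from the article, not real ones.
public static class LineageSketch
{
    static readonly Dictionary<int, string> LineageToColumn =
        new Dictionary<int, string>
        {
            { 33, "employee_id (Input)" },
            { 34, "employee_id (Good Output)" },
            { 35, "employee_id (Source Error Output)" }
        };

    // Given the Lineage ID reported on an error row, recover the column.
    public static string Resolve(int lineageId)
    {
        string name;
        return LineageToColumn.TryGetValue(lineageId, out name)
            ? name
            : "(unknown column)";
    }
}
```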
ETL Assistant - SSIS DFT Lineage ID Flow

Great!  No problem.  As long as we know the LineageIDs related to our steps we can backtrack to determine the mapping to “column name” and voila – we know which column failed.  We can simply look up the column by LineageID using “FindColumnByLineageID” in a script task (http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtsbuffermanager100.findcolumnbylineageid.aspx).  Magic.

Not so fast.  One small but critical catch: metadata about a task step is only available within the scope of that task step.  Meaning – once we get past “OLE DB Source” I can see “Lineage ID,” but I can’t easily track back to determine the mapping of Lineage ID to column name.  So – if you want to write out error row information (specifically “column name”) in a second DFT (to your error log, for example) there’s no way to look up that name, because the metadata about Lineage ID is no longer in scope – it’s only available to the prior step.  Incredibly frustrating.

Getting Error Column Information With Static DFTs

For static packages this can be addressed a few ways. The general strategy is to map the Lineage IDs / IDs to column information at design time and then use that information to look up the information you need.

Couple of quick links you may find handy.

Again – for static packages, these approaches can mostly, if not completely, solve the issue and leave you in a far better position to determine which rows failed.  I’m not going to go into them since you can read up online.

So What About a Dynamic DFT?

Note that the links I provided above address design time gathering / mapping of column information.   What do you do about a runtime situation?  We started digging into the CozyRoc dynamic DFT about a year ago.  Basic dynamic mappings worked great.  You can easily remap columns at runtime and, assuming all goes well, you’re done.  But if things don’t go well – what then?

We need to catch and log those bad rows.  But – we can’t map columns / Lineage ID information at design time because that negates the entire point of using a dynamic DFT – you won’t know any of the required information. It’s just not there.  Now that issue with the resolution of metadata from prior steps comes into play.  We can’t generate column information at design time and we can’t inspect metadata from ancestor steps within a DFT.  They’re out of scope.

I’ll admit that when I first looked at this I was stumped.  And incredibly frustrated.  There was this great opportunity to really let SSIS rock using CozyRoc’s dynamic DFT, but the inability to handle bad rows in a data warehousing solution is a showstopper (keep in mind the issue here is an SSIS design constraint, not a CozyRoc fault).  Following the examples for handling static mappings online (thank you very much, above-linked article authors), we had the notion that we should be able to pull some of the DFT information out at runtime and approach the problem somewhat similarly.

  • Upon startup, obtain a list of all columns, their IDs, and their Lineage IDs
  • Store that list in a collection
  • Use the IDs / Lineage IDs from the errors to look up the corresponding record in our collection
  • Profit
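
In data-structure terms the plan is nothing fancier than a dictionary keyed by ID / Lineage ID.  A minimal sketch of steps 1–3 (the class and method names here are mine, not CozyRoc’s API):

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the strategy above: build the collection as columns
// are mapped at runtime, then consult it when an error row arrives.
public class ColumnMap
{
    // key: column ID / Lineage ID, value: column name
    private readonly Dictionary<int, string> _map = new Dictionary<int, string>();

    // Steps 1-2: called once per column as the dynamic DFT maps it
    public void Register(int id, string columnName)
    {
        if (!_map.ContainsKey(id))
            _map.Add(id, columnName);
    }

    // Step 3: called from the error path with the ID reported on the bad row
    public string LookupFailedColumn(int errorColumnId)
    {
        string name;
        return _map.TryGetValue(errorColumnId, out name) ? name : "(not mapped)";
    }
}
```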

I rang up CozyRoc and discussed the situation with their engineers.  They immediately understood my intentions and mailed me back a quick sample of some code that exploited a fantastic capability of their dynamic DFT – the ability to add script to the DFT itself. (Thanks, CozyRoc!)  Not code via a script task within the DFT, but on the DFT directly.

CozyRoc DFT+ (http://www.cozyroc.com/ssis/data-flow-task) notes that you can apply script on the DFT by accessing…

  • Advanced tab – specifies advanced task options.
  • Script page – specifies data flow task script, which is used for Setup tab customization.

Aha.  And the magic snippet they supplied me…

public void OnColumnAdded(IDTSComponentMetaData100 component, bool isInput, string colName)
{
    //do stuff
}

Great!  They provided event hooks for the dynamic column mapping!  So now I can detect when a column is added to the DFT flow, add it to my reference collection of column information, and then access that collection within the DFT to derive column information critical to error logging.

This will let me take “Lineage ID” 12345 at any point throughout the flow and figure out that it was column “employee_name_concat” or whatever and log that.  We’re in business.

Something to note here.  Handling row truncation behavior is trickier when you’re doing this dynamically.  You can no longer manually set “redirect on truncation” on a column-by-column basis, so you just extend the magic DFT+ column binding event to do it for you.


if (!isInput)
{
      IDTSOutputColumn100 column = component.OutputCollection[0].OutputColumnCollection[colName];
      column.TruncationRowDisposition = DTSRowDisposition.RD_RedirectRow;
      column.ErrorRowDisposition = DTSRowDisposition.RD_RedirectRow;
}

Done. Setting row disposition behavior accomplished.

From there we wrote up the nastier parts of the whole exercise – the entire collection lookup mechanism to derive column information.  We did that as a script task within the body of the DFT.

ETL Assistant - SSIS DFT Error Script Task

The script task pulls rows out of the buffer and evaluates row position to resolve the Lineage ID / ID and determine…

  • Source column name (EX: “first_name”)
  • Source primary key name (EX: “employee_id”)
  • Source primary key value (EX: “12345”)
  • Error description (using ComponentMetaData.GetErrorDescription)
  • Error data (so we can quickly eyeball the offending column)

You’ll note I said “primary key name” – we felt it was “good enough” for the moment to avoid dealing with compound keys.  That’s definitely a shortcoming, but for the time being we felt that was acceptable since it matched our existing static ETL error handling process.  It’s definitely something that needs to be addressed, though.  We also cheat by explicitly passing in the primary key as an element of the process (we derive it at an earlier step) – again, in consulting speak, an “opportunity for improvement.”
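
Boiled down to plain C#, the “good enough” key handling amounts to something like this (hypothetical names; in the real package this happens against the SSIS buffer, not a dictionary):

```csharp
using System.Collections.Generic;

// Sketch of the "cheat": a package variable tells us which column is the key,
// and we copy its name/value into record_id_dsc / record_id for the error log.
// Single-column keys only -- compound keys are the acknowledged gap.
public static class KeyColumnCheat
{
    public static KeyValuePair<string, string> ResolveKey(
        IDictionary<string, string> row,   // column name -> value for the error row
        string keyColumnName)              // from the package variable, e.g. "employee_id"
    {
        string value;
        row.TryGetValue(keyColumnName, out value);
        // Key becomes record_id_dsc, Value becomes record_id
        return new KeyValuePair<string, string>(keyColumnName, value ?? "");
    }
}
```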

Putting it All Together

Now that we’ve touched on the ideas, let’s see it work.  Rather than walk you through the entire step-by-step process of building a package, I’m going to suggest you download the Dynamic DFT Error Handler Sample Project.  I’ll quickly touch on the major points of how the sample works.

The download includes some SQL scripts to set up…

  • [etl_proxy_table, etl_proxy_table_stg, etl_proxy_table_src] We use some fake placeholder “proxy” tables so you can set up data bindings in the DFT+.  CozyRoc also suggests you use THUNK_COLUMNs to do this, but I’ve found using these placeholder tables to be very helpful.  The reason we use them is that the magic OnColumnAdded method only fires when a column is actually added to the DFT.  If you statically map any of the columns, the entire error handling approach will fail because those “static” columns will never be added to our column collection.  Huge thank-you to CozyRoc for clueing me in on that.
  • [etl_errors] our error logging table. YMMV, but remember if you change this you also need to adjust the scripts in the DFT.
  • [demo_source_table, demo_dest_table] our source and destination tables.  We’re big Simpsons fans over here, so I’ve provided appropriate sample data.

The overall package has a few steps:

  • ["Set Table Information"] - A cheater Script Task to mimic pulling table configuration information.  In a production scenario you’d likely want to provide configuration elements from either a config file or, better yet, a configuration table.
  • ["SQL Get First table_keycol name"] - An Execute SQL task which we’ll use to pull out primary key information from our destination table.  This just uses INFORMATION_SCHEMA to look up your target table and pull back the first column for the primary key.  If you use unique constraints or something else, just tweak the SQL or overwrite the destination variable.
  • ["Truncate Destination Table"] - A second Execute SQL task to truncate our destination table (for a full load)
  • ["Data Flow Task Plus"] - A CozyRoc DFT+ task for our dynamic loading.  The brains of the operation.
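
For reference, the INFORMATION_SCHEMA lookup in “SQL Get First table_keycol name” might look something like this sketch (the exact parameter wiring is an assumption – check it against the download):

```sql
-- Sketch: first primary key column for the destination table.
-- Only the first ordinal is taken, matching the single-key limitation noted earlier.
SELECT TOP 1 kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON  kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
  AND kcu.TABLE_NAME = tc.TABLE_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND tc.TABLE_NAME = ?  -- destination table name from the package variable
ORDER BY kcu.ORDINAL_POSITION;
```

The result lands in the destination variable that the error-handler script later reads as the “key column” name.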

ETL Assistant - Dynamic Error Handling - Overall Package Step Flow

We also have variables.  In our production deployment we have lots and lots of variables.

The major points here are:

  • table_colmap is a System.Object that is our collection of column names, IDs, and Lineage IDs for all columns in our DFT.  I scoped this to our DFT+ task because it’s specific to that task, but you could get away with scoping it to the package.
  • Everything else.  We’re more or less mimicking the variables we used in previous articles.

ETL Assistant - Dynamic Error - Variables

Let’s move on to the DFT.  Open up the DFT+.  You’re going to see two main paths:

  • We had an issue obtaining the source data (right side).  Yes, this does happen.  Case in point – you have a date of “-4444 AD” in Oracle.  The OLEDB driver we use for Oracle really doesn’t like that.  Or even a 44-digit numeric.
  • We had an issue writing to the destination table (left side).

In both paths we simply channel the error rows to our error handler script task to process the buffer and do its magic.  I cheat by seeding the flow with additional error columns we overwrite within the task.  Mainly because I’m too lazy to magically add columns to the buffer myself from within the script task.

ETL Assistant - Dynamic Error Flow DFT+

Let’s give it a whirl and see what happens.

I’ve intentionally created opportunities for problems.

column_name     Source                       Destination
                DATA_TYPE          MAX_LEN   DATA_TYPE          MAX_LEN
employee_id     int                NULL      int                NULL
employee_guid   uniqueidentifier   NULL      uniqueidentifier   NULL
email_addr      varchar            20        varchar            15
first_nm        varchar            20        varchar            10
last_nm         varchar            20        varchar            10
awesomeness     bigint             NULL      int                NULL
create_dts      datetime           NULL      datetime           NULL
modified_dts    datetime           NULL      datetime           NULL

The destination columns will have conversion issues with

  • email_addr length
  • first_nm length
  • last_nm length
  • awesomeness (rating) size

employee_id   email_addr             first_nm      last_nm       awesomeness
1             jjones@test.org        Jimbo         Jones         25
2             captain@test.org       Horatio       McCallister   100000
3             homer@test.org         Homer         Simpson       25000
4             marge@test.org         Marjorie      Simpson       250000000000
5             cruiser@test.org       Waylon        Smithers      100
6             bart@test.org          Bartholomew   Simpson       25
7             lisasimpson@test.org   Lisa          Simpson       25

If we run the package and review our error log we’ll see failures related to the problem columns above.  (Note that I’ve removed some elements of the exception log here solely for formatting.)


error_id  record_id  record_id_dsc  column_nm    error_id     error_dsc                                                                                   error_data
8         2          employee_id    email_addr   -1071607689  The data value cannot be converted for reasons other than sign mismatch or data overflow.   captain@test.org
9         4          employee_id    awesomeness  -1071607686  Conversion failed because the data value overflowed the type used by the provider.          250000000000
10        5          employee_id    email_addr   -1071607689  The data value cannot be converted for reasons other than sign mismatch or data overflow.   cruiser@test.org
11        6          employee_id    first_nm     -1071607689  The data value cannot be converted for reasons other than sign mismatch or data overflow.   Bartholomew
12        7          employee_id    email_addr   -1071607689  The data value cannot be converted for reasons other than sign mismatch or data overflow.   lisasimpson@test.org

“You’re failing, Seymour! What is it about you and failure?”

There you go – row exceptions being logged for various issues with data from the dynamic DFT.

How Denali Should Fix This

We’re eagerly anticipating Denali for several reasons, but one fantastic piece of news is that SSIS in Denali should let us bypass most if not all of the issues with LineageID.  As Jorg Klein notes in one of his blog posts (http://sqlblog.com/blogs/jorg_klein/archive/2011/07/22/ssis-denali-ctp3-what-s-new.aspx):

SSIS always mapped columns from source to transformations or destinations with the help of lineage ids. Every column had a unique metadata ID that was known by all components in the data flow. If something changed in the source this would break the lineage ids and raised error messages like: The external metadata column collection is out of synchronization with the data source columns.
To fix this error you would re-map all broken lineage ids with the “Restore Invalid Column References Editor”.
In Denali lineage-ids are no longer used. Mappings are done on column names, which is great because you can now use auto map on column names and even copy/paste pieces of another data flow and connect them by mapping the corresponding column names.

Fan.  Tastic.  Couldn’t come soon enough.  Granted, you’ll have to upgrade to Denali to make use of this, but there are so many other compelling reasons to migrate (http://www.brentozar.com/sql/sql-server-denali-2011-2012/) that this is just icing on the cake.


Appendix – Code

This code is provided in the download, but for quick access / reference I’m also including it here.

DFT+ Column Collection Script

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
 
/*
 
 Add references to ...
     CozyRoc.SSISPlus.2008
     Microsoft.SqlServer.DTSPipelineWrap
     Microsoft.SQLServer.DTSRuntimeWrap 
 
 */
 
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using CozyRoc.SqlServer.SSIS.Attributes;
using CozyRoc.SqlServer.SSIS;
 
using System.Collections;
using System.Collections.Generic;
 
namespace ST_44af5cee356540e294c47d0aa17d41ed.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    [DataFlowColumnAdded("OnColumnAdded")]//CozyRoc annotation
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
 
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
 
        public void OnColumnAdded(IDTSComponentMetaData100 component, bool isInput, string colName)
        {
 
            try
            {
                if (!isInput)
                {
                    IDTSOutputColumn100 column = component.OutputCollection[0].OutputColumnCollection[colName];
                    column.TruncationRowDisposition = DTSRowDisposition.RD_RedirectRow;
                    column.ErrorRowDisposition = DTSRowDisposition.RD_RedirectRow;
                }
                if (isInput)
                {
 
                    IDTSInputColumn100 column = component.InputCollection[0].InputColumnCollection[colName];
                    Dictionary<int, string> colmap = new Dictionary<int, string>();
                    Variables variables = null;
 
                    try
                    {
                        Dts.VariableDispenser.LockOneForWrite("User::table_colmap", ref variables);
                        if (variables["User::table_colmap"].Value is Dictionary<int, string>)
                        {
                            colmap = (Dictionary<int, string>)variables["User::table_colmap"].Value;
                        }
                        colmap.Add(column.ID, column.Name);
                        variables["User::table_colmap"].Value = colmap;//put the column collection back into the variable
                    }
                    catch (Exception exi)
                    {
                    }
                    finally
                    {
                        variables.Unlock();
                    }
                }
            }
            catch
            {
            }
 
        }
 
        public void Main()
        {
 
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

Error Row Handler Script

/* Microsoft SQL Server Integration Services Script Component
*  This is CozyRoc Script Component Plus Extended Script
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/
 
using System;
using System.Text;
 
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
 
using System.Collections;
using System.Collections.Generic; //for our dictionaries / lists
 
/*
 * HOW IT WORKS:
 * ======================================================================
 * Thanks to CozyRoc's great sample code (thanks, CozyRoc! :), we're able to rip through the
 * set of columns and find the error info critical for our logging / fixing.  We get the basic
 * column info on PreExecute() and store the column names, column lineage IDs, and column relative
 * position ("index") in two separate dictionaries for later.  We use LineageID as the key for those
 * and the later on during Input_ProcessInputRow to look up those names and IDs so we can pull back
 * data from the buffer and then also UPDATE the buffer to overwrite our custom error info columns
 *
 * Dictionary 1: Set of column names ("colnames") key: LineageID, value: column.Name
 * Dictionary 2: Set of column relative positions ("colids") key: LineageID, value colIndex
 *
 * PreExecute - set up objects for later.  IDs for columns, dictionaries, variables, etc.
 * Input_ProcessInputRow - the "real work" of adjusting / setting the values in the columns
 *
 * SETUP - READ THIS OR IT WON'T WORK
 * ======================================================================
 * REQUIRED INPUT COLUMNS
 * -------
 * We anticipate the following input columns being present (sent to the script task as inputs)
 *
 * Standard "Error Output" columns from tasks
 * ------
 * ErrorColumn      MSFT - The Lineage ID for the error column
 * ErrorCode        MSFT - The SSIS error code
 *
 * Additional error columns specific to our purposes.  You can reuse these or update the column names
 * ------
 * error_id         CUSTOM - Same as the SSIS error code, but we need them for our table
 * column_nm        CUSTOM - The name of the column where the error occurred
 * record_id_dsc    CUSTOM - the column name for the "primary key" column (EX: employee_id)
 * record_id        CUSTOM - the value/ID for the "primary key" column so you can look up the row later
 *                              EX:"12345" in column "employee_id"
 *
 * error_id         CUSTOM - the SSIS error (same as ErrorCode, but for my purposes we left it here)
 * error_dsc        CUSTOM - the human-readable description of the SSIS error EX: "The data was truncated."
 *
 * REQUIRED VARIABLES
 * -------
 * NOTE: You MUST set these up as a read-only variables within your script task.
 *
 * Package Variable: @colmap (dictionary) - the collection of column names and IDs for our dynamic columns
 *                                        - this is set in the outer DFT+ OnColumnAdded()
 *                                        - we use this to pull out the full list of columns since we can't get ahold
 *                                        - of the prior step's column IDs/LineageIDs when we're in this script task
 *
 * Package Variable: @table_keycol (string) - the name of the column that represents your primary key
 *                                              EX: "employee_id"
 *
 * This is a cheap hack, but for my situation I'm OK with that.  We don't necessarily know what a "key"
 * column is at this point - primary key, I mean here.  So to get around that we set that value in a variable
 * within the overall package.  We then use that variable to say "oh, that's the key column" later and retrieve
 * the column name and the column value so we can write out our primary key reference info.  You'll see the
 * obvious limitation - we don't support compound primary keys.  But neither does my logging table, so...
 * 
 
*/
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
 
    private int[] m_idx;
 
    private string key_col_name;        //"name" of primary key column.  EX: "employee_id".
    private int key_col_id;             //Relative column index / position of our "primary key" column
    //single primary key column.  Does not handle compound primary keys.  Retrieve this from a package variable since we want to handle this
    //dynamically and can't automatically determine it from within the package at runtime
 
    private Dictionary<int, string> colnames;        //column names keyed by LineageID, for later use within row processing
    private Dictionary<string, int> colpositions;    //column index keyed by column name
    private Dictionary<int, int> colidsbyposition;   //LineageID keyed by column index
    private Dictionary<int, int> colids;             //column index keyed by LineageID, for later use within row processing
    private Dictionary<int, string> colmap;          //column names keyed by ID, built by the DFT+ OnColumnAdded script
 
    //Internal column tracking numbers.
    //You could probably avoid using these as separate variables, but...
    // 1. I'm not that clever
    // 2. I really, really wanted to explicitly watch them as they moved around
    private int i_error_code_id;
    private int i_error_column_id;
    private int i_error_id;
    private int i_column_nm;
    private int i_record_id;
    private int i_record_id_dsc;
    private int i_error_dsc;
    private int i_error_data;
 
    StringBuilder _sbColIDs = new StringBuilder();
    StringBuilder _sbErrorCols = new StringBuilder();
 
    private bool isSourceErrorOutput = false;// = true;
    private string _OLEDBSourceType = "";
 
    public override void PreExecute()
    {
        base.PreExecute();
 
        colnames = new Dictionary<int, string>();
        colpositions = new Dictionary<string, int>();
        colids = new Dictionary<int, int>();
        colidsbyposition = new Dictionary<int, int>();

        colmap = new Dictionary<int, string>();
 
        try
        {
            if (Variables.tablecolmap is Dictionary<int, string>)
            {
                colmap = (Dictionary<int, string>)Variables.tablecolmap;
            }
        }
        catch (Exception exi)
        {
        }
 
        IDTSInput100 input = base.ComponentMetaData.InputCollection[0];
        IDTSVirtualInput100 virtInput = input.GetVirtualInput();
        int colsCount = virtInput.VirtualInputColumnCollection.Count;
        m_idx = new int[colsCount];
        for (int colIndex = 0; colIndex < colsCount; colIndex++)
        {
            IDTSVirtualInputColumn100 column = virtInput.VirtualInputColumnCollection[colIndex];

            //================================================================
            //pull out the error codes and column IDs
            if (string.Compare(column.Name, "ErrorColumn", true) == 0)
            {
                i_error_column_id = colIndex;
            }
            if (string.Compare(column.Name, "ErrorCode", true) == 0)
            {
                i_error_code_id = colIndex;
            }
            if (string.Compare(column.Name, "error_id", true) == 0)
            {
                i_error_id = colIndex;
            }
            if (string.Compare(column.Name, "column_nm", true) == 0)
            {
                i_column_nm = colIndex;
            }
            if (string.Compare(column.Name, "record_id", true) == 0)
            {
                i_record_id = colIndex;
            }
            if (string.Compare(column.Name, "record_id_dsc", true) == 0)
            {
                i_record_id_dsc = colIndex;
            }
            if (string.Compare(column.Name, "error_dsc", true) == 0)
            {
                i_error_dsc = colIndex;
            }
            if (string.Compare(column.Name, "error_data", true) == 0)
            {
                i_error_data = colIndex;
            }

            //add our column names to our list for later use
            colnames.Add(column.LineageID, column.Name); //column.LineageID used to look up index of error column name in row
            colids.Add(column.LineageID, colIndex); //column.LineageID used to look up index of error column index position in row
            colidsbyposition.Add(colIndex, column.LineageID);
            try
            {
                colpositions.Add(column.Name, colIndex);
            }
            catch { }
            try
            {
                //is this column the "key" column we're using to identify the key values for the row? EX: primary key
                //NOTE: we're only doing this for a single member of a compound primary key
                if (string.Compare(column.Name, Variables.tablekeycol, true) == 0)//true = ignore case during comparison
                {
                    key_col_id = colIndex;
                    key_col_name = column.Name;
                }
            }
            catch { }
            //================================================================
            m_idx[colIndex] = base.HostComponent.BufferManager.FindColumnByLineageID(
                input.Buffer,
                column.LineageID);
        }
    }

    public override void PostExecute()
    {
        base.PostExecute();
    }

    public override void Input_ProcessInputRow(InputBuffer Row)
    {
        int colsCount = m_idx.Length;
        int cColLineageKey;
        if (colsCount > 0)
        {
            try
            {
                //stuff the errocode into the error_id column
                Row.Buffer[m_idx[i_error_id]] = Row.Buffer[m_idx[i_error_code_id]];
            }
            catch (Exception ex)
            {
            }
 
            try
            {
                //get the value for the "primary key" column
                Row.Buffer[m_idx[i_record_id]] = Row.Buffer[m_idx[key_col_id]];
            }
            catch (Exception ex)
            {
            }
 
            try
            {
                //get the value for the "primary key" column
                Row.Buffer[m_idx[i_record_id_dsc]] = key_col_name;
            }
            catch (Exception ex)
            {
            }
 
            try
            {
                //get the error description
                Row.Buffer[m_idx[i_error_dsc]] = (ComponentMetaData.GetErrorDescription(int.Parse(Row.Buffer[m_idx[i_error_code_id]].ToString())));
            }
            catch (Exception ex)
            {
            }
 
            try
            {
                //get the name and value of the column that failed.
                if (i_error_column_id > 0 && i_error_column_id < m_idx.Length)
                {
                    string columnName = "";
                    int currentposition = -1;
                    //ErrorColumn carries the Lineage ID of the failing column
                    cColLineageKey = int.Parse(Row.Buffer[m_idx[i_error_column_id]].ToString());
                        if (!isSourceErrorOutput)
                        {
                            if (colmap.TryGetValue(cColLineageKey, out columnName))
                            {
                                //use the lineage_id to pull the column name
                                //columnName should be set
                                if (cColLineageKey > 0 && columnName != null && columnName.Length > 0)
                                {
                                    if (colpositions.TryGetValue(columnName, out currentposition))
                                    {
                                        //use the lineage_id to pull the column name
                                        //current position should be set
                                    }
                                }
                                if (cColLineageKey != null && cColLineageKey > 0 && currentposition >= 0)//&& currentposition != null)
                                {
                                    if (colidsbyposition.TryGetValue(currentposition, out cColLineageKey))
                                    {
                                        //use the lineage_id to pull the column name
                                        //current position should be set
                                    }
                                }
                            }
                            else
                            {
                                cColLineageKey = cColLineageKey + 1;
                            }
                        }
                        else
                        {
                            //probably a "source error output"
                            cColLineageKey = cColLineageKey + 1;
                            //MAJOR MAJOR MAJOR HACK
                            //apparently, we do NOT persist the ORIGINAL LINEAGEID from source to output, so we need to... adjust... the number.
                            // this is EXCEPTIONALLY RISKY, but since MS "adjusts" the output rows for errors to be "different" from the "it works!" destination
                            // we don't have much of a choice.  In reviewing the #'s it appears they consistently increment for errors, so we need to increment the
                            // index here to find the right value.  Horrible stuff.  Likely to break.  Enjoy.
                        }
                    }
 
                    //Retrieve the column name from the column names dictionary and place it in the error info
                    if (i_column_nm > 0 && i_column_nm < colsCount)
                    {
                        if (cColLineageKey > 0)
                        {
                            if (colnames.TryGetValue(cColLineageKey, out value))
                            {
                                //use the lineage_id to pull the column name
                                Row.Buffer[m_idx[i_column_nm]] = value;
                            }
                        }
                    }
                    //get the missing column value for the key found at the identified "error column"
                    //had issues where the column blew up because of data type conversion issues, so the try/catch here helps handle that
                    try
                    {
                        if (i_error_data > 0 && i_error_data < colsCount)
                        {
                            if (cColLineageKey > 0)
                            {
                                if (colids.TryGetValue(cColLineageKey, out colvalue))
                                {
                                    //use the lineage ID to pull the failing column's buffer index
                                    //NOTE: "bad" data MAY be totally thrown out here, which is why we're using the try/catch
                                    //if the custom CozyRoc row processor dies due to formatting errors then this will throw an exception
                                    //we're just going to ignore that and roll on by
                                    //probably worth revisiting at a later date to see if we can get at the bad data anyway
                                    Row.Buffer[m_idx[i_error_data]] = Row.Buffer[m_idx[colvalue]].ToString();
 
                                }
                            }
                        }
                    }
                    catch (Exception vEx)
                    {
                        //swallow: if the bad value can't be read at all, we still keep the rest of the error row
                    }
 
                }
                else
                {
                    //no usable error column was reported (e.g. a row-level error such as a PK violation)
                }
            }
            catch (Exception ex)
            {
            }
        }
 
    }
 
}
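The script above populates buffer columns named error_id, error_dsc, column_nm, error_data, record_id, and record_id_dsc, which ultimately land in a shared error-log destination. The exact schema isn't shown in this post, so the following T-SQL is only a hypothetical sketch of what such a table might look like; the table name, extra columns, and data types are assumptions to adapt to your environment.

```sql
-- Hypothetical shared error-log destination table.
-- Column names match the buffer columns the script populates;
-- everything else (table name, sizes, audit column) is an assumption.
CREATE TABLE dbo.etl_error_log (
    etl_error_log_id INT IDENTITY(1, 1) PRIMARY KEY,
    error_id         INT            NULL,  -- SSIS ErrorCode from the failed row
    error_dsc        NVARCHAR(2000) NULL,  -- text from GetErrorDescription(ErrorCode)
    column_nm        NVARCHAR(128)  NULL,  -- name of the column that failed
    error_data       NVARCHAR(MAX)  NULL,  -- offending value, when it can be read
    record_id        NVARCHAR(255)  NULL,  -- value of the "primary key" column
    record_id_dsc    NVARCHAR(128)  NULL,  -- name of the "primary key" column
    load_dts         DATETIME       NOT NULL DEFAULT (GETDATE())
);
```

With a table shaped like this, the error output of every dynamic data flow can share one destination, and a nightly review query can group by error_id and column_nm to spot systemic problems.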


7 thoughts on “ETL Assistant – Getting Error Row Description and Column Information Dynamically”

  1. Pingback: Introducing ETL Assistant – A Dynamic, Templated Approach to SSIS | Informatics @ Northwestern Weblog

  2. Dear Whitley,

    Excellent article - really, this is very helpful to me. But one thing: the attached zip file has one dtsx package, and when I tried to open that package it would not open.
    Can you please reload the package or modify it?

    Regards,
    Ravikumar

  3. Thanks for this terrific example. Seems to be one of the few coherent walk-thrus of using cozy roc components for dynamic mapping and error handling.
    Just had a question on this - when I try to run the sample after setting up the database as outlined, I get the error “Column ‘THUNK_COLUMN’ cannot be found at the datasource”… any clues? The data source is set up as specified.

  4. Unfortunately, the attached package does not work. The error I receive is “Column ‘THUNK_COLUMN’ cannot be found at the datasource”. But it works when the error-processing branch after the OLE DB Destination is removed. I suppose this is a problem with the Cozy Roc Dynamic Data Flow component.

  5. Hi,

    I have downloaded the package at http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/dynamic_dft_error_handler_example.zip

    I have tried adding it to projects both in 2005 and 2008R2, But it fails with this error

    TITLE: Microsoft Visual Studio
    ——————————

    The package failed to load due to error 0xC0010014 “One or more error occurred. There should be more specific errors preceding this one that explains the details of the errors. This message is used as a return value from functions that encounter errors.”. This occurs when CPackage::LoadFromXML fails.

    ——————————
    ADDITIONAL INFORMATION:

    The package failed to load due to error 0xC0010014 “One or more error occurred. There should be more specific errors preceding this one that explains the details of the errors. This message is used as a return value from functions that encounter errors.”. This occurs when CPackage::LoadFromXML fails.

    ——————————
    BUTTONS:

    OK
    ——————————

  6. Good day! I could have sworn I’ve been to this site before but after browsing through a few of the articles I realized it’s new to me.
    Anyways, I’m certainly delighted I came across
    it and I’ll be book-marking it and checking back regularly!

  7. Thank you Eric for this great discussion.
    But the sample code didn’t work for me. I used BIDS 2008 on SQL Server 2008 R2. I got the package and the SQL script; I was able to run the SQL script but unable to open the package.
    Thanks
    This is the error msg I received when I tried to add the package to my project (FYI I didn’t get the soln file). Any help would be appreciated.
    TITLE: Microsoft Visual Studio
    ——————————

    The package failed to load due to error 0xC0010014 “One or more error occurred. There should be more specific errors preceding this one that explains the details of the errors. This message is used as a return value from functions that encounter errors.”. This occurs when CPackage::LoadFromXML fails.

    ——————————
    ADDITIONAL INFORMATION:

    The package failed to load due to error 0xC0010014 “One or more error occurred. There should be more specific errors preceding this one that explains the details of the errors. This message is used as a return value from functions that encounter errors.”. This occurs when CPackage::LoadFromXML fails.

    ——————————
    BUTTONS:

    OK
    ——————————
