<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Informatics @ Northwestern Weblog</title>
	<atom:link href="http://informatics.northwestern.edu/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://informatics.northwestern.edu/blog</link>
	<description>NUBIC is a team dedicated to creating web applications and software tools expressly for clinical and translational research at Northwestern</description>
	<lastBuildDate>Tue, 01 May 2012 13:23:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Using the MERGE Statement in SSIS Via Stored Procedure</title>
		<link>http://informatics.northwestern.edu/blog/uncategorized/2012/05/using-the-merge-statement-in-ssis-via-stored-procedure/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=using-the-merge-statement-in-ssis-via-stored-procedure</link>
		<comments>http://informatics.northwestern.edu/blog/uncategorized/2012/05/using-the-merge-statement-in-ssis-via-stored-procedure/#comments</comments>
		<pubDate>Tue, 01 May 2012 12:54:12 +0000</pubDate>
		<dc:creator>Nick Smith</dc:creator>
				<category><![CDATA[EDW]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=503</guid>
		<description><![CDATA[(this is a modified version of the article I wrote for SQLServerCentral.com published on 1/23/2012, &#8216;Using the MERGE Statement in SSIS Via A Stored Procedure&#8217;) Background In our Enterprise Data Warehouse (EDW) within the Northwestern University Biomedical Informatics Center (NUBIC), &#8230; <a href="http://informatics.northwestern.edu/blog/uncategorized/2012/05/using-the-merge-statement-in-ssis-via-stored-procedure/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p style="text-align: left" align="center"><em>(this is a modified version of the article I wrote for SQLServerCentral.com published on 1/23/2012, &#8216;<a href="http://www.sqlservercentral.com/articles/EDW/77100/" target="_blank">Using the MERGE Statement in SSIS Via A Stored Procedure</a>&#8217;)</em></p>
<p style="text-align: left" align="center"><strong>Background</strong></p>
<p>In our Enterprise Data Warehouse (EDW) within the Northwestern University Biomedical Informatics Center (NUBIC), exact copies of many source systems exist. From that source system data we create several data marts to organize clinical data in meaningful ways to assist with reporting. Our current process to populate such data marts is to do the following.</p>
<ol>
<li>Stage the data in a staging table</li>
<li>Delete rows from the destination/production table that exist in the staging table</li>
<li>Insert the remaining data from the staging table into the production table</li>
</ol>
<p>Many times in our SSIS ETL packages we will overlap slightly with the last load date/time to make sure all new data is collected. This practice causes numerous delete transactions on the production table on a daily basis before the new rows are inserted from staging. For the most part, these deletes are unnecessary since many of the deleted rows are re-inserted with the same data (because of the deliberate overlap mentioned earlier). Over the long term, these unnecessary deletes can cause fragmentation issues.</p>
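<p>For illustration, steps 2 and 3 of the process above might look roughly like the following. This is only a sketch: it borrows the table and column names from the example used later in this article, assumes <strong>financial_nbr</strong> identifies a row, and the wrapping transaction is an assumption rather than a description of our actual packages.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Hypothetical sketch of the delete-then-insert load (not our production code)
BEGIN TRANSACTION;

-- Step 2: delete production rows that also exist in the staging table
DELETE tgt
FROM edw.adventure_hospital_dm.visits AS tgt
WHERE EXISTS (SELECT 1
              FROM edw.staging.stg_visits AS src
              WHERE src.financial_nbr = tgt.financial_nbr);

-- Step 3: insert the staged rows into the production table
INSERT INTO edw.adventure_hospital_dm.visits
    (name, financial_nbr, medical_number, registration_date, meta_orignl_load_dts, meta_update_dts)
SELECT name, financial_nbr, medical_number, registration_date, meta_orignl_load_dts, meta_update_dts
FROM edw.staging.stg_visits;

COMMIT TRANSACTION;</pre></div></div>

<p>Every staged row that already exists in production is deleted and then written again, even when nothing about it has changed.</p>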
<h2><strong>Problem</strong></h2>
<p>To avoid these excessive deletes we investigated the option of a MERGE statement in place of the 3-step process listed previously.  This functionality initially looked very promising since it would allow us to stage whatever data we needed and then that data could simply be merged into the production data.  The MERGE would update rows that already existed in production and would insert the rows that did not exist.  This would eliminate the unnecessary deletes that occur in our current process.</p>
<p>Our evaluation process uncovered some issues with the MERGE statement and SSIS.</p>
<ol>
<li><strong>There is currently no simple MERGE task available in SSIS 2008.</strong> The easiest way to do this is to create an SSIS Execute SQL task and manually type out the MERGE statement.</li>
<li><strong>SSIS Execute SQL Task MERGE statement maintenance.</strong> If we were to use the Execute SQL Task to execute a MERGE statement, the statements would be rather large. Many of our data mart tables are very wide and are prone to having new columns added to track new clinical information. Manually entering the MERGE statement in an Execute SQL Task creates maintenance issues if a data mart were ever to be modified. If changes to the data mart occur, the MERGE statement would need to be manually updated as well to reflect such changes.</li>
<li><strong>SSIS Execute SQL Task character limit.</strong> Manual entry of the merge statement into the query editor window of the SSIS Execute SQL task is subject to a 32,767 character limit. Again, many of our data mart tables are very wide and contain several columns where this could potentially be an issue.</li>
<li><strong>SSIS Execute SQL Task manual entry errors.</strong> Even if the 32,767 character limit is not met, the MERGE statement itself can still be very long. To type all of the statement manually creates several opportunities for typographical errors, including omission of specific columns.</li>
</ol>
<p>If we were going to use the MERGE statement via SSIS, we needed a solution that would be easy to use and would also resolve the issues we uncovered in our investigation.</p>
<h2>Proposed Solution</h2>
<p>Our solution was to create a stored procedure that could dynamically generate the MERGE statement outside of the SSIS Execute SQL Task and execute it.  This article will walk through step-by-step how we built our stored procedure.  An example MERGE statement is provided below.  As we review each step we will see how the stored procedure would generate this example MERGE statement.</p>
<p>To begin, let’s first review the MERGE statement and its construction (here is a link to a very useful SQLServerCentral technical article which provides a thorough overview of the <a href="http://www.sqlservercentral.com/articles/SQL+Server+2008/64365/"><span style="text-decoration: underline">MERGE statement</span></a>). A MERGE is composed of the following parts.</p>
<ul>
<li><strong>A source table  </strong>(this would be our staging table)</li>
<li><strong>A target table</strong> (this would be our production table)</li>
<li><strong>A predicate</strong> (how to join the source and target tables)</li>
<li><strong>An Update</strong> command to be executed when the source and target rows match on the predicate</li>
<li><strong>An Insert</strong> command to be executed when the source row does not match a target row</li>
<li><strong>A Delete </strong>command to be executed when the target rows do not exist in the source</li>
</ul>
<p>For our purposes there was no need for the <strong>delete</strong> command because we <em>want</em> to keep the rows on the target table that do not exist in the source table. Here is the example MERGE statement that we will attempt to generate and execute via this stored procedure.  Within this example we have a staging/source table called <strong>edw.staging.stg_visits</strong> and a production/target table called <strong>edw.adventure_hospital_dm.visits</strong>.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">MERGE</span> <span style="color: #993333; font-weight: bold;">INTO</span>
<span style="color: #808080; font-style: italic;">--1)  Target table</span>
edw<span style="color: #66cc66;">.</span>adventure_hospital_dm<span style="color: #66cc66;">.</span>visits tgt
<span style="color: #993333; font-weight: bold;">USING</span>
<span style="color: #808080; font-style: italic;">--2)  Source table</span>
edw<span style="color: #66cc66;">.</span>staging<span style="color: #66cc66;">.</span>stg_visits src
<span style="color: #993333; font-weight: bold;">ON</span>
<span style="color: #808080; font-style: italic;">--3)  Predicate</span>
src<span style="color: #66cc66;">.</span>financial_nbr <span style="color: #66cc66;">=</span> tgt<span style="color: #66cc66;">.</span>financial_nbr
<span style="color: #808080; font-style: italic;">--4)  Update the Target table when matched</span>
<span style="color: #993333; font-weight: bold;">WHEN</span> matched <span style="color: #993333; font-weight: bold;">THEN</span>
<span style="color: #993333; font-weight: bold;">UPDATE</span>
<span style="color: #993333; font-weight: bold;">SET</span> tgt<span style="color: #66cc66;">.</span>name                 <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>name<span style="color: #66cc66;">,</span>
    tgt<span style="color: #66cc66;">.</span>financial_nbr        <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>financial_nbr<span style="color: #66cc66;">,</span>
    tgt<span style="color: #66cc66;">.</span>medical_number       <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>medical_number<span style="color: #66cc66;">,</span>
    tgt<span style="color: #66cc66;">.</span>registration_date    <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>registration_date<span style="color: #66cc66;">,</span>
    tgt<span style="color: #66cc66;">.</span>meta_orignl_load_dts <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>meta_orignl_load_dts<span style="color: #66cc66;">,</span>
    tgt<span style="color: #66cc66;">.</span>meta_update_dts      <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>meta_update_dts
<span style="color: #808080; font-style: italic;">--5)  Insert into the Target table when not matched</span>
<span style="color: #993333; font-weight: bold;">WHEN</span> <span style="color: #993333; font-weight: bold;">NOT</span> matched <span style="color: #993333; font-weight: bold;">THEN</span>
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #66cc66;">&#40;</span>
    name<span style="color: #66cc66;">,</span>
    financial_nbr<span style="color: #66cc66;">,</span>
    medical_number<span style="color: #66cc66;">,</span>
    registration_date<span style="color: #66cc66;">,</span>
    meta_orignl_load_dts<span style="color: #66cc66;">,</span>
    meta_update_dts
    <span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span>
    src<span style="color: #66cc66;">.</span>name<span style="color: #66cc66;">,</span>
    src<span style="color: #66cc66;">.</span>financial_nbr<span style="color: #66cc66;">,</span>
    src<span style="color: #66cc66;">.</span>medical_number<span style="color: #66cc66;">,</span>
    src<span style="color: #66cc66;">.</span>registration_date<span style="color: #66cc66;">,</span>
    src<span style="color: #66cc66;">.</span>meta_orignl_load_dts<span style="color: #66cc66;">,</span>
    src<span style="color: #66cc66;">.</span>meta_update_dts<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>Before the stored procedure was built we determined that a useful MERGE stored procedure would need to satisfy the following requirements.</p>
<ol>
<li>Accept parameters to enter the Source database/schema/table and the Target database/schema/table.</li>
<li>Automatically determine the predicate between the Source and Target tables</li>
<li>Automatically determine whether a Source column data type matches a Target column data type and store the matching columns in a temp table</li>
<li>Avoid updating the primary key column(s) on the Target table</li>
<li>Generate a dynamic SQL MERGE statement based on the matched columns stored in the temp table</li>
<li>Execute the dynamic SQL MERGE statement</li>
</ol>
<p><strong>Step 1: Accept parameters to enter the Source database, schema, and table along with the Target database, schema, and table.</strong></p>
<p>Our goal for this solution was for it to be easy to use.  We wanted to be able to execute a stored procedure and simply tell it what two tables to merge.  To do this we chose to create a stored procedure with 8 parameters.  The first 3 parameters (database, schema, and table name) are used to pass in the Source table.  The second 3 parameters (database, schema, and table name) are used to pass in the Target table.  Even though the stored procedure will automatically generate the matching predicate (which is discussed in Step #2), we created the 7th parameter as an option to manually pass in a comma-separated list of predicate items to match on if the tables being merged either do not have a primary key or the user would like to match on something other than the primary key.  The 8th parameter (@debug) is also optional: passing in 1 returns just the MERGE statement text without executing it.</p>
<p>Here is the section of code that was used to create the stored procedure along with its arguments and necessary variables.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">PROCEDURE</span> <span style="color: #66cc66;">&#91;</span>adventure_hospital<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">.</span><span style="color: #66cc66;">&#91;</span>generate_merge<span style="color: #66cc66;">&#93;</span>
                @SrcDB          SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Source database</span>
                @SrcSchema      SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Source schema</span>
                @SrcTable       SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Source table</span>
                @TgtDB          SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Target database</span>
                @TgtSchema      SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Target schema</span>
                @TgtTable       SYSNAME<span style="color: #66cc66;">,</span>         <span style="color: #808080; font-style: italic;">--Name of the Target table</span>
                @predicate      SYSNAME  <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span> <span style="color: #808080; font-style: italic;">--(optional)Override to automatic predicate generation.  A comma-separated list of predicate match items</span>
                @debug          <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">NULL</span>  <span style="color: #808080; font-style: italic;">--(optional)Pass in 1 to kick out just the MERGE statement text without executing it</span>
<span style="color: #993333; font-weight: bold;">AS</span>
<span style="color: #993333; font-weight: bold;">BEGIN</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @merge_sql      NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--overall dynamic sql statement for the merge</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @columns_sql    NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--the dynamic sql to generate the list of columns used in the update, insert, and insert-values portion of the merge dynamic sql</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @pred_sql       NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;	<span style="color: #808080; font-style: italic;">--the dynamic sql to generate the predicate/matching-statement of the merge dynamic sql (populates @pred)</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @pk_sql         NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--the dynamic sql to populate the @pk table variable that holds the primary keys of the target table</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @updt           NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--contains the comma-separated columns used in the UPDATE portion of the merge dynamic sql (populated by @columns_sql)</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @<span style="color: #993333; font-weight: bold;">INSERT</span>         NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--contains the comma-separated columns used in the INSERT portion of the merge dynamic sql (populated by @columns_sql)</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @vals           NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--contains the comma-separated columns used in the VALUES portion of the merge dynamic sql (populated by @columns_sql)</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @pred           NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--contains the predicate/matching-statement of the merge dynamic sql (populated by @pred_sql)</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @pred_param     NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> @predicate;  <span style="color: #808080; font-style: italic;">--populated by @predicate.  used in the dynamic generation of the predicate statement of the merge</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @pred_item      NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--used as a placeholder of each individual item contained within the explicitly passed in predicate</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @done_ind       <span style="color: #993333; font-weight: bold;">SMALLINT</span> <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span>;   <span style="color: #808080; font-style: italic;">--used in the dynamic generation of the predicate statement of the merge</span>
	<span style="color: #993333; font-weight: bold;">DECLARE</span> @dsql_param     NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">500</span><span style="color: #66cc66;">&#41;</span>;  <span style="color: #808080; font-style: italic;">--contains the necessary parameters for the dynamic sql execution</span></pre></div></div>
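<p>To make the parameter list concrete, here is a sketch of how the procedure might be called for the example tables. It assumes the call is issued from the database in which the procedure was created; the parameter values are illustrative only.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Hypothetical call: let the procedure derive the predicate from the Target table's primary key
EXEC adventure_hospital.generate_merge
     @SrcDB     = 'edw',
     @SrcSchema = 'staging',
     @SrcTable  = 'stg_visits',
     @TgtDB     = 'edw',
     @TgtSchema = 'adventure_hospital_dm',
     @TgtTable  = 'visits';

-- Hypothetical call: supply an explicit predicate and only return the generated MERGE text
EXEC adventure_hospital.generate_merge
     @SrcDB     = 'edw',
     @SrcSchema = 'staging',
     @SrcTable  = 'stg_visits',
     @TgtDB     = 'edw',
     @TgtSchema = 'adventure_hospital_dm',
     @TgtTable  = 'visits',
     @predicate = 'financial_nbr,medical_number',  -- no spaces: the list items are not trimmed
     @debug     = 1;</pre></div></div>

<p>With the second call, the loop shown in Step 2 turns the comma-separated list into the predicate <strong>src.[financial_nbr] = tgt.[financial_nbr] and src.[medical_number] = tgt.[medical_number]</strong>.</p>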

<p><strong>Step 2: Automatically determine the predicate used between the Source and Target tables</strong><br />
Before the predicate is automatically generated there must be a check to see whether or not a predicate was passed in as a parameter. If the predicate parameter is populated then the comma-separated list that was passed in is broken down into its individual items and the predicate matching statement is generated and assigned to @pred.</p>
<p>If the predicate was not passed in as a parameter, a dynamic SQL statement (@pred_sql) is generated to query the INFORMATION_SCHEMA.TABLE_CONSTRAINTS and INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE views using the Target database, schema, and table parameters. The INFORMATION_SCHEMA views are automatically available in SQL Server 2000 and above. Since the output of the dynamic SQL (@pred_sql) needs to be collected and assigned to a variable (@pred), a parameter definition (@dsql_param) needs to be passed in so that the output can be returned and assigned appropriately. Once the dynamic SQL (@pred_sql) and the dynamic SQL parameter definition (@dsql_param) have been created, they can be executed. The output can then be assigned to @pred. We now have our predicate statement.</p>
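<p>The output-parameter technique described above can be sketched in isolation as follows. The variable names mirror the procedure, but the inline list of column names is a stand-in for the INFORMATION_SCHEMA query, and the exact parameter definition and <strong>sp_executesql</strong> call are assumptions about how such a call is typically written.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Minimal sketch: return a value from dynamic SQL through an OUTPUT parameter
DECLARE @pred_sql   NVARCHAR(MAX);
DECLARE @dsql_param NVARCHAR(500);
DECLARE @pred       NVARCHAR(MAX);

-- Stand-in query: concatenates one "src.col = tgt.col" term per row into @predsqlout
SET @pred_sql = N'SELECT @predsqlout = COALESCE(@predsqlout + '' and '', '''')'
              + N' + ''src.'' + c.name + '' = tgt.'' + c.name'
              + N' FROM (VALUES (N''financial_nbr''), (N''medical_number'')) AS c(name)';

-- Parameter definition: declares @predsqlout inside the dynamic batch as an OUTPUT parameter
SET @dsql_param = N'@predsqlout NVARCHAR(MAX) OUTPUT';

-- Execute the dynamic SQL and copy its output into @pred
EXEC sp_executesql @pred_sql, @dsql_param, @predsqlout = @pred OUTPUT;

SELECT @pred AS predicate;</pre></div></div>

<p>Without the OUTPUT parameter, a variable assigned inside the dynamic batch would not be visible to the calling procedure, because the dynamic SQL runs in its own scope.</p>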

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/****************************************************************************************
* This generates the matching statement (aka Predicate) statement of the Merge.        *
* If a predicate is explicitly passed in, use that to generate the matching statement. *
* Else execute the @pred_sql statement to decide what to match on and generate the     *
* matching statement automatically.                                                    *
****************************************************************************************/</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">IF</span> @pred_param <span style="color: #993333; font-weight: bold;">IS</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>
  <span style="color: #808080; font-style: italic;">-- If a comma-separated list of predicate match items were passed in via @predicate</span>
  <span style="color: #993333; font-weight: bold;">BEGIN</span>
  <span style="color: #808080; font-style: italic;">-- These next two SET statements do basic clean-up on the comma-separated list of predicate items (@pred_param)</span>
  <span style="color: #808080; font-style: italic;">-- if the user passed in a predicate that begins with a comma, strip it out</span>
  <span style="color: #993333; font-weight: bold;">SET</span> @pred_param <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CASE</span> <span style="color: #993333; font-weight: bold;">WHEN</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>ltrim<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">','</span> <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">,</span><span style="color: #66cc66;">&#40;</span>charindex<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">','</span><span style="color: #66cc66;">,</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">+</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>LEN<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ELSE</span> @pred_param <span style="color: #993333; font-weight: bold;">END</span>
  <span style="color: #808080; font-style: italic;">--if the user passed in a predicate that ends with a comma, strip it out</span>
  <span style="color: #993333; font-weight: bold;">SET</span> @pred_param <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CASE</span> <span style="color: #993333; font-weight: bold;">WHEN</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>rtrim<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>LEN<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">','</span> <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span>LEN<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">-</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ELSE</span> @pred_param <span style="color: #993333; font-weight: bold;">END</span>
  <span style="color: #808080; font-style: italic;">-- End clean-up of @pred_param</span>
  <span style="color: #808080; font-style: italic;">-- loop through the comma-separated predicate that was passed in via the parameter and construct the predicate statement</span>
  WHILE <span style="color: #66cc66;">&#40;</span>@done_ind <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
    <span style="color: #993333; font-weight: bold;">BEGIN</span>
    <span style="color: #993333; font-weight: bold;">SET</span> @pred_item <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CASE</span> <span style="color: #993333; font-weight: bold;">WHEN</span> charindex<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">','</span><span style="color: #66cc66;">,</span>@pred_param<span style="color: #66cc66;">&#41;</span> &gt; <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">,</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">,</span><span style="color: #66cc66;">&#40;</span>charindex<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">','</span><span style="color: #66cc66;">,</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">-</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ELSE</span> @pred_param <span style="color: #993333; font-weight: bold;">END</span>
    <span style="color: #993333; font-weight: bold;">SET</span> @pred_param <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">,</span><span style="color: #66cc66;">&#40;</span>charindex<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">','</span><span style="color: #66cc66;">,</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">+</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>LEN<span style="color: #66cc66;">&#40;</span>@pred_param<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
    <span style="color: #993333; font-weight: bold;">SET</span> @pred <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CASE</span> <span style="color: #993333; font-weight: bold;">WHEN</span> @pred <span style="color: #993333; font-weight: bold;">IS</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">COALESCE</span><span style="color: #66cc66;">&#40;</span>@pred<span style="color: #66cc66;">,</span><span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'src.['</span> <span style="color: #66cc66;">+</span> @pred_item <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'] = '</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'tgt.['</span> <span style="color: #66cc66;">+</span> @pred_item <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">']'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">ELSE</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">COALESCE</span><span style="color: #66cc66;">&#40;</span>@pred<span style="color: #66cc66;">,</span><span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">' and '</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'src.['</span> <span style="color: #66cc66;">+</span> @pred_item <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'] = '</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'tgt.['</span> <span style="color: #66cc66;">+</span> @pred_item <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">']'</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">END</span>
    <span style="color: #993333; font-weight: bold;">SET</span> @done_ind <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CASE</span> <span style="color: #993333; font-weight: bold;">WHEN</span> @pred_param <span style="color: #66cc66;">=</span> @pred_item <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">ELSE</span> <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">END</span>
    <span style="color: #993333; font-weight: bold;">END</span>
  <span style="color: #993333; font-weight: bold;">END</span>
<span style="color: #993333; font-weight: bold;">ELSE</span>
  <span style="color: #808080; font-style: italic;">-- If an explicit list of predicate match items was NOT passed in then automatically construct the predicate</span>
  <span style="color: #808080; font-style: italic;">-- match statement based on the primary keys of the Source and Target tables</span>
  <span style="color: #993333; font-weight: bold;">BEGIN</span>
  <span style="color: #993333; font-weight: bold;">SET</span> @pred_sql <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">' SELECT @predsqlout = COALESCE(@predsqlout+'</span><span style="color: #ff0000;">' and '</span><span style="color: #ff0000;">','</span><span style="color: #ff0000;">''</span><span style="color: #ff0000;">')+'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">'('</span><span style="color: #ff0000;">''</span><span style="color: #ff0000;">'+'</span><span style="color: #ff0000;">'src.'</span><span style="color: #ff0000;">'+column_name+'</span><span style="color: #ff0000;">' = tgt.'</span><span style="color: #ff0000;">'+ccu.column_name)'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' FROM '</span> <span style="color: #66cc66;">+</span>
                  @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc_tgt'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' INNER JOIN '</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span><span style="color: #ff0000;">'.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' ON tc_tgt.CONSTRAINT_NAME = ccu.Constraint_name'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' AND tc_tgt.table_schema = ccu.table_schema'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' AND tc_tgt.table_name = ccu.table_name'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' WHERE'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' tc_tgt.CONSTRAINT_TYPE = '</span><span style="color: #ff0000;">'Primary Key'</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' and tc_tgt.table_catalog = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' and tc_tgt.table_name = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' and tc_tgt.table_schema = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">''</span>
  <span style="color: #993333; font-weight: bold;">SET</span> @dsql_param <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'	@predsqlout nvarchar(max) OUTPUT'</span>
&nbsp;
  <span style="color: #993333; font-weight: bold;">EXEC</span> sp_executesql
  @pred_sql<span style="color: #66cc66;">,</span>
  @dsql_param<span style="color: #66cc66;">,</span>
  @predsqlout <span style="color: #66cc66;">=</span> @pred OUTPUT;
<span style="color: #993333; font-weight: bold;">END</span></pre></div></div>

<p>The benefit of generating the predicate this way is that it automatically handles a primary key made up of multiple columns on the Target table. Every primary key column is evaluated and added to the predicate statement. The example output from the @pred variable is listed here.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">src<span style="color: #66cc66;">.</span>financial_nbr <span style="color: #66cc66;">=</span> tgt<span style="color: #66cc66;">.</span>financial_nbr</pre></div></div>
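
<p>The COALESCE concatenation and the OUTPUT parameter used above can be tried in isolation. The sketch below is not part of the original procedure: it replaces the INFORMATION_SCHEMA lookup with two hard-coded key columns (financial_nbr plus a hypothetical facility_id) to show how a two-column key would produce a two-part predicate.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Sketch only: the key column names are hard-coded for illustration.
-- facility_id is a hypothetical second key column.
DECLARE @sql    NVARCHAR(MAX),
        @params NVARCHAR(MAX),
        @pred   NVARCHAR(MAX);
&nbsp;
-- Each row appends one "src.[col] = tgt.[col]" term; COALESCE supplies
-- an empty string for the first row so no leading ' and ' is produced.
SET @sql = N'SELECT @predsqlout = COALESCE(@predsqlout + '' and '', '''') +
                    ''src.['' + k.column_name + ''] = tgt.['' + k.column_name + '']''
             FROM (VALUES (''financial_nbr''), (''facility_id'')) AS k(column_name)';
&nbsp;
SET @params = N'@predsqlout NVARCHAR(MAX) OUTPUT';
&nbsp;
-- The OUTPUT parameter carries the assembled string back to @pred.
EXEC sp_executesql @sql, @params, @predsqlout = @pred OUTPUT;
&nbsp;
SELECT @pred AS pred;</pre></div></div>

<p>The order in which the terms are concatenated is not guaranteed, which does not matter here because the terms are simply AND-ed together.</p>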

<p><strong>Step 3: Automatically determine if a Source column data type matches the corresponding Target column data type and store the matching columns in a temp table</strong><br />
To do this we first need to create the temp table (@columns) that will hold all the columns shared between the Source and the Target tables. These columns must have the same name and the same data type. By ‘same data type’ I mean both that the data type is the same and that the length or precision of the Target table column is equal to or greater than that of the Source table column. For example, the medical_number column on the Source table may be varchar(10). As long as the corresponding medical_number column on the Target/Production table is varchar(10) or larger, the logic will consider that a match and store that column in the temp table.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--Create the temporary table to collect all the columns shared</span>
<span style="color: #808080; font-style: italic;">--between both the Source and Target tables.</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">DECLARE</span> @<span style="color: #993333; font-weight: bold;">COLUMNS</span> <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #66cc66;">&#40;</span>
 table_catalog            <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 table_schema             <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 <span style="color: #993333; font-weight: bold;">TABLE_NAME</span>               <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 column_name              <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 data_type                <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 character_maximum_length <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 numeric_precision        <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 src_column_path          <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
 tgt_column_path          <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span>
<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Once the @columns temp table has been created, a dynamic SQL statement (@columns_sql) is generated to populate it. The @columns_sql statement takes the values passed in via parameters and uses them to query the INFORMATION_SCHEMA.COLUMNS view for matching columns between the Source and Target tables. The statement is then executed and the rows it returns are inserted directly into the @columns temp table.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/************************************************************************************************
* Generate the dynamic sql (@columns_sql) statement that will                                  *
* populate the @columns temp table with the columns that will be used in the merge dynamic sql *
* The @columns table will contain columns that exist in both the source and target             *
* tables that have the same data types.                                                        *
************************************************************************************************/</span>    
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @columns_sql <span style="color: #66cc66;">=</span>
<span style="color: #ff0000;">'SELECT
tgt.table_catalog,
tgt.table_schema,
tgt.table_name,
tgt.column_name,
tgt.data_type,
tgt.character_maximum_length,
tgt.numeric_precision,
(src.table_catalog+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+src.table_schema+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+src.table_name+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+src.column_name) AS src_column_path,
(tgt.table_catalog+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+tgt.table_schema+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+tgt.table_name+'</span><span style="color: #ff0000;">'.'</span><span style="color: #ff0000;">'+tgt.column_name) AS tgt_column_path
FROM
     '</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.information_schema.columns tgt
     INNER JOIN '</span> <span style="color: #66cc66;">+</span> @SrcDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.information_schema.columns src
       ON tgt.column_name = src.column_name
       AND tgt.data_type = src.data_type
       AND (tgt.character_maximum_length IS NULL OR tgt.character_maximum_length &gt;= src.character_maximum_length)
       AND (tgt.numeric_precision IS NULL OR tgt.numeric_precision &gt;= src.numeric_precision)
     WHERE tgt.table_catalog     = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     AND tgt.table_schema        = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     AND tgt.table_name          = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     AND src.table_catalog       = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @SrcDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     AND src.table_schema        = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @SrcSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     AND src.table_name          = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @SrcTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">'
     ORDER BY tgt.ordinal_position'</span>
&nbsp;
     <span style="color: #808080; font-style: italic;">--execute the @columns_sql dynamic sql and populate @columns table with the data</span>
     <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> @<span style="color: #993333; font-weight: bold;">COLUMNS</span>
     <span style="color: #993333; font-weight: bold;">EXEC</span> sp_executesql @columns_sql</pre></div></div>

<p>Now that the @columns temp table has been populated we have a list of all the columns that will be referenced in the overall MERGE statement.</p>
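<p>One edge case may be worth checking in the length comparison above: INFORMATION_SCHEMA.COLUMNS reports a CHARACTER_MAXIMUM_LENGTH of -1 for varchar(max) and nvarchar(max) columns, so a plain &gt;= comparison would treat a (max) Target column as smaller than any sized Source column. The following is a minimal sketch, not part of the original procedure, of one way to express the rule; the sample lengths are made up for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Sketch only: treat -1 (the "max" length) as larger than any sized length.
-- The (src_len, tgt_len) pairs below are illustrative sample values.
SELECT t.src_len,
       t.tgt_len,
       CASE
         WHEN t.tgt_len = -1 THEN 'match'             -- Target is (max): anything fits
         WHEN t.src_len = -1 THEN 'no match'          -- Source is (max), Target is sized
         WHEN t.tgt_len &gt;= t.src_len THEN 'match'  -- Target at least as wide as Source
         ELSE 'no match'
       END AS length_check
FROM (VALUES (10, 10), (10, 50), (50, 10), (10, -1), (-1, 10)) AS t(src_len, tgt_len);</pre></div></div>
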
<p><strong>Step 4:  Has logic to NOT update the primary key column(s) on the target table</strong></p>
<p>While investigating the efficiency gains of using the MERGE statement we found that we were not getting the gains we expected.  We already knew that we were updating all columns within the MERGE statement when the predicate match was satisfied.  The problem is that when you update a row&#8217;s primary key (even if you are setting it to its original value), the update is essentially handled as an insert, and you end up losing a lot of efficiency. Thanks to Paul White for his blog post &#8220;<a title="The Impact of Non-Updating Updates" href="http://sqlblog.com/blogs/paul_white/archive/2010/08/11/the_2D00_impact_2D00_of_2D00_update_2D00_statements_2D00_that_2D00_don_2D00_t_2D00_change_2D00_data.aspx">The Impact of Non-Updating Updates</a>&#8221;.</p>
<p>To prevent the primary key from being updated within the MERGE update we created a temp table to record the primary key column(s) from the target table.  This temp table is later referenced to filter out the primary key column(s) when the update portion of the MERGE statement (@updt) is generated in the next step.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/**************************************************************************************
* Create the temporary table to collect all the primary key columns                  *
* These primary key columns will be filtered out of the update portion of the merge  *
* We do not want to update any portion of clustered index for performance            *
**************************************************************************************/</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">DECLARE</span> @pk <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #66cc66;">&#40;</span>
  column_name              <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">100</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NULL</span>
<span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @pk_sql <span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'SELECT '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'ccu.column_name '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'FROM '</span> <span style="color: #66cc66;">+</span>
              @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc_tgt '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'INNER JOIN '</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span><span style="color: #ff0000;">'.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'ON tc_tgt.CONSTRAINT_NAME = ccu.Constraint_name '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'AND tc_tgt.table_schema = ccu.table_schema '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'AND tc_tgt.table_name = ccu.table_name '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'WHERE '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'tc_tgt.CONSTRAINT_TYPE = '</span><span style="color: #ff0000;">'Primary Key'</span><span style="color: #ff0000;">' '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'and tc_tgt.table_catalog = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">' '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'and tc_tgt.table_name = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">' '</span> <span style="color: #66cc66;">+</span>
              <span style="color: #ff0000;">'and tc_tgt.table_schema = '</span><span style="color: #ff0000;">''</span> <span style="color: #66cc66;">+</span> @TgtSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">''</span><span style="color: #ff0000;">' '</span> 
&nbsp;
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> @pk
<span style="color: #993333; font-weight: bold;">EXEC</span> sp_executesql @pk_sql</pre></div></div>
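
<p>When debugging, it can help to run a similar lookup as a plain query against the current database. The sketch below assumes a hypothetical Target table dbo.patient and uses INFORMATION_SCHEMA.KEY_COLUMN_USAGE (a different view from the one used in the procedure) so that the key columns can be listed in key order.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Sketch only: dbo.patient is a hypothetical table name.
-- Lists the primary key column(s) of one table in the current database.
SELECT kcu.COLUMN_NAME,
       kcu.ORDINAL_POSITION
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON  tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME
  AND tc.TABLE_SCHEMA    = kcu.TABLE_SCHEMA
  AND tc.TABLE_NAME      = kcu.TABLE_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND tc.TABLE_SCHEMA    = 'dbo'
  AND tc.TABLE_NAME      = 'patient'
ORDER BY kcu.ORDINAL_POSITION;</pre></div></div>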

<p><strong>Step 5:  Generate a dynamic SQL MERGE statement based on the matched columns stored in the temp table</strong></p>
<p>For the overall MERGE statement there are 3 sets of comma-separated columns that need to be generated from the data collected in the @columns temp table. These will be the update columns, insert columns, and the insert-values columns. In our example MERGE statement the 3 sets of columns are outlined below in red.</p>
<p><img src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/merge_column_lists.png" alt="" border="0" /></p>
<p>For each one of the 3 sets of columns a query needs to be executed against the @columns temp table. We needed a way to query the columns on the @columns temp table and loop through them, creating a comma-separated list. Since such a query will return more than one row, if you try to assign the output of this query to a variable you will receive the error &#8220;Subquery returned more than 1 value.&#8221;</p>
<p>To get around this we used SQL Server’s FOR XML functionality. For anyone not familiar with FOR XML, simply put, it allows you to run a query and format all of the output as XML, returned as a single value rather than as multiple rows. We essentially ran the same query but used FOR XML PATH('') which allowed us to loop through the data results by generating an XML version of the data with absolutely no XML formatting. We then cast the result to NVARCHAR(MAX) so that it can be handled as an ordinary string.</p>
<p>What you are left with is a string that contains all the comma-separated columns that can be used for the MERGE statement. Obviously, this is not the intended use of FOR XML, but this suited our needs nicely.</p>
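<p>The technique can be tried on its own before reading the procedure code. The following is a minimal sketch under stated assumptions: the @cols table variable and its three sample column names stand in for the @columns temp table and are for illustration only.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Sketch only: @cols and its sample rows are illustrative stand-ins.
DECLARE @cols TABLE (column_name VARCHAR(100));
INSERT INTO @cols VALUES ('name'), ('financial_nbr'), ('medical_number');
&nbsp;
DECLARE @list NVARCHAR(MAX);
&nbsp;
-- The selected expression has no column name and PATH('') supplies no
-- row element, so the rows collapse into one unformatted string.
SET @list = CAST((SELECT ',' + '[' + column_name + ']'
                  FROM @cols
                  FOR XML PATH(''))
                 AS NVARCHAR(MAX));
&nbsp;
-- Drop the leading comma, as the final MERGE statement does with SUBSTRING.
SELECT SUBSTRING(@list, 2, LEN(@list)) AS column_list;</pre></div></div>

<p>One caveat with this approach: characters that are special in XML (&amp;, &lt; and &gt;) come back as entities, which is harmless for ordinary column names but would matter for names that contain them.</p>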
<p>Here is the section of the stored procedure used to generate the comma-separated columns used for the update portion of the overall MERGE statement.  You can see below that the @pk temp table we created in the previous step is now referenced to filter out primary key column(s) from the update.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--1) List of columns used for Update Statement</span>
<span style="color: #808080; font-style: italic;">--Populate @updt with the list of columns that will be used to construct the Update Statement portion of the Merge</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @updt <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CAST</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">',tgt.['</span> <span style="color: #66cc66;">+</span> column_name <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'] = src.['</span> <span style="color: #66cc66;">+</span> column_name <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">']'</span>
            <span style="color: #993333; font-weight: bold;">FROM</span> @<span style="color: #993333; font-weight: bold;">COLUMNS</span> c
            <span style="color: #993333; font-weight: bold;">WHERE</span> c<span style="color: #66cc66;">.</span>column_name !<span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'meta_orignl_load_dts'</span>                               <span style="color: #808080; font-style: italic;">--we do not want the original time the row was created to be overwritten</span>
            <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">EXISTS</span> <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">'x'</span> <span style="color: #993333; font-weight: bold;">FROM</span> @pk p <span style="color: #993333; font-weight: bold;">WHERE</span> p<span style="color: #66cc66;">.</span>column_name <span style="color: #66cc66;">=</span> c<span style="color: #66cc66;">.</span>column_name<span style="color: #66cc66;">&#41;</span>  <span style="color: #808080; font-style: italic;">--we do not want the primary key columns updated for performance</span>
            <span style="color: #993333; font-weight: bold;">FOR</span> XML PATH<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
            <span style="color: #993333; font-weight: bold;">AS</span> NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>
            <span style="color: #66cc66;">&#41;</span></pre></div></div>

<p><em>**Special Note**   For the columns used in the update portion of the MERGE statement, we filtered out the <strong>meta_orignl_load_dts</strong> column. The EDW uses this column as metadata to record when a row was originally written out to the data mart. Upon an update, we do not want this column to be overwritten with the <strong>meta_orignl_load_dts</strong> from the staging/source table, because we want to preserve the original load date and time of that row. The date and time the row was updated is recorded in the meta column <strong>meta_update_dts</strong>.</em></p>
<p>Second, the comma-separated list of columns used for the insert portion of the overall MERGE statement needs to be generated. Here is the section of the stored procedure used to generate it.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--2) List of columns used for Insert Statement</span>
<span style="color: #808080; font-style: italic;">--Populate @insert with the list of columns that will be used to construct the Insert Statement portion of the Merge</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CAST</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">','</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'['</span> <span style="color: #66cc66;">+</span> column_name <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">']'</span>
              <span style="color: #993333; font-weight: bold;">FROM</span> @<span style="color: #993333; font-weight: bold;">COLUMNS</span>
              <span style="color: #993333; font-weight: bold;">FOR</span> XML PATH<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
              <span style="color: #993333; font-weight: bold;">AS</span> NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>
              <span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Here is the output of the @insert variable for our example.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #66cc66;">,</span>name
<span style="color: #66cc66;">,</span>financial_nbr
<span style="color: #66cc66;">,</span>medical_number
<span style="color: #66cc66;">,</span>registration_date
<span style="color: #66cc66;">,</span>meta_orignl_load_dts
<span style="color: #66cc66;">,</span>meta_update_dts</pre></div></div>

<p>Third, here is the section of the stored procedure used to create the comma-separated list of columns used for the insert-values portion of the overall MERGE statement.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">--3) List of columns used for Insert-Values Statement</span>
<span style="color: #808080; font-style: italic;">--Populate @vals with the list of columns that will be used to construct the Insert-Values Statement portion of the Merge</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @vals <span style="color: #66cc66;">=</span> <span style="color: #993333; font-weight: bold;">CAST</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #ff0000;">',src.'</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'['</span> <span style="color: #66cc66;">+</span> column_name <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">']'</span>
            <span style="color: #993333; font-weight: bold;">FROM</span> @<span style="color: #993333; font-weight: bold;">COLUMNS</span>
            <span style="color: #993333; font-weight: bold;">FOR</span> XML PATH<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">''</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
            <span style="color: #993333; font-weight: bold;">AS</span> NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">MAX</span><span style="color: #66cc66;">&#41;</span>
            <span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Here is the output of the @vals variable for our example.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>name
<span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>financial_nbr
<span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>medical_number
<span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>registration_date
<span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>meta_orignl_load_dts
<span style="color: #66cc66;">,</span>src<span style="color: #66cc66;">.</span>meta_update_dts</pre></div></div>

<p><strong>Step 6: Execute the dynamic SQL MERGE statement</strong><br />
Now that we have the Source database/schema/table, the Target database/schema/table, the predicate matching statement, and the update/insert/values comma-separated lists of columns, we have everything we need to generate the entire dynamic SQL MERGE statement. Once the MERGE statement is generated it then gets executed automatically.</p>
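<p>To make the goal of this step concrete, here is a hand-written sketch of the kind of statement the procedure generates, run against two temp tables. The table names (#src, #tgt), the reduced column list, and the sample rows are assumptions for illustration; the real statement is built from the parameters and column lists described above.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;">-- Sketch only: #tgt and #src stand in for the Target and Source tables.
CREATE TABLE #tgt (
  financial_nbr        INT PRIMARY KEY,
  name                 VARCHAR(50),
  meta_orignl_load_dts DATETIME,
  meta_update_dts      DATETIME
);
CREATE TABLE #src (
  financial_nbr        INT PRIMARY KEY,
  name                 VARCHAR(50),
  meta_orignl_load_dts DATETIME,
  meta_update_dts      DATETIME
);
&nbsp;
INSERT INTO #tgt VALUES (1, 'Old Name', '20120101', '20120101');
INSERT INTO #src VALUES (1, 'New Name', '20120501', '20120501');
INSERT INTO #src VALUES (2, 'Second Patient', '20120501', '20120501');
&nbsp;
-- The update list leaves out the primary key and meta_orignl_load_dts;
-- the insert and values lists include every shared column.
MERGE INTO #tgt tgt
USING #src src
ON src.[financial_nbr] = tgt.[financial_nbr]
WHEN MATCHED THEN UPDATE
  SET tgt.[name] = src.[name], tgt.[meta_update_dts] = src.[meta_update_dts]
WHEN NOT MATCHED THEN INSERT ([financial_nbr],[name],[meta_orignl_load_dts],[meta_update_dts])
  VALUES (src.[financial_nbr],src.[name],src.[meta_orignl_load_dts],src.[meta_update_dts]);
&nbsp;
SELECT * FROM #tgt ORDER BY financial_nbr;
&nbsp;
DROP TABLE #src;
DROP TABLE #tgt;</pre></div></div>

<p>After the MERGE, row 1 keeps its original meta_orignl_load_dts while its name and meta_update_dts come from the source row, and row 2 is inserted as a new row.</p>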
<p>Here is the section of the stored procedure used to pull everything together into the overall MERGE statement (@merge_sql) and then execute it.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/*************************************************************************************
*  Generate the final Merge statement using the following...                        *
*    -The parameters (@TgtDB, @TgtSchema, @TgtTable, @SrcDB, @SrcSchema, @SrcTable) *
*    -The predicate matching statement (@pred)                                      *
*    -The update column list (@updt)                                                *
*    -The insert column list (@insert)                                              *
*    -The insert-value column list (@vals)                                          *
*    -Filter out Primary Key from the update (updating primary key essentially      *
*     turns the update into an insert and you lose all efficiency benefits)         *
*************************************************************************************/</span>
&nbsp;
<span style="color: #993333; font-weight: bold;">SET</span> @merge_sql <span style="color: #66cc66;">=</span> <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">' MERGE into '</span> <span style="color: #66cc66;">+</span> @TgtDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.'</span> <span style="color: #66cc66;">+</span> @TgtSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.'</span> <span style="color: #66cc66;">+</span> @TgtTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">' tgt '</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' using '</span> <span style="color: #66cc66;">+</span> @SrcDB <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.'</span> <span style="color: #66cc66;">+</span> @SrcSchema <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">'.'</span> <span style="color: #66cc66;">+</span> @SrcTable <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">' src '</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' on '</span> <span style="color: #66cc66;">+</span> @pred <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' when matched then update '</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' set '</span> <span style="color: #66cc66;">+</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@updt<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span> LEN<span style="color: #66cc66;">&#40;</span>@updt<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' when not matched then insert ('</span> <span style="color: #66cc66;">+</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@<span style="color: #993333; font-weight: bold;">INSERT</span><span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span> LEN<span style="color: #66cc66;">&#40;</span>@<span style="color: #993333; font-weight: bold;">INSERT</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">')'</span> <span style="color: #66cc66;">+</span>
                  <span style="color: #ff0000;">' values ( '</span> <span style="color: #66cc66;">+</span> <span style="color: #993333; font-weight: bold;">SUBSTRING</span><span style="color: #66cc66;">&#40;</span>@vals<span style="color: #66cc66;">,</span> <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">,</span> LEN<span style="color: #66cc66;">&#40;</span>@vals<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">+</span> <span style="color: #ff0000;">');'</span>
                 <span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #808080; font-style: italic;">--Either execute the final Merge statement to merge the staging table into production</span>
<span style="color: #808080; font-style: italic;">--Or kick out the actual merge statement text if debug is turned on (@debug=1)</span>
<span style="color: #993333; font-weight: bold;">IF</span> @debug <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">1</span>
  <span style="color: #993333; font-weight: bold;">BEGIN</span>
  <span style="color: #808080; font-style: italic;">-- If debug is turned on simply select the text of merge statement and return that</span>
  <span style="color: #993333; font-weight: bold;">SELECT</span> @merge_sql;
  <span style="color: #993333; font-weight: bold;">END</span>
<span style="color: #993333; font-weight: bold;">ELSE</span>
  <span style="color: #993333; font-weight: bold;">BEGIN</span>
  <span style="color: #808080; font-style: italic;">-- If debug is not turned on then execute the merge statement</span>
  <span style="color: #993333; font-weight: bold;">EXEC</span> sp_executesql @merge_sql;
<span style="color: #993333; font-weight: bold;">END</span></pre></div></div>

<p>The SUBSTRING() function is applied to @updt, @insert, and @vals to remove the leading comma from each list of columns. This could have been done earlier in the stored procedure, but we decided to take care of it here.</p>
<p>Here is the output of the @merge_sql variable from our example.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">MERGE</span> <span style="color: #993333; font-weight: bold;">INTO</span> edw<span style="color: #66cc66;">.</span>adventure_hospital_dm<span style="color: #66cc66;">.</span>visits tgt
<span style="color: #993333; font-weight: bold;">USING</span> edw<span style="color: #66cc66;">.</span>staging<span style="color: #66cc66;">.</span>stg_visits src
<span style="color: #993333; font-weight: bold;">ON</span> src<span style="color: #66cc66;">.</span>financial_nbr <span style="color: #66cc66;">=</span> tgt<span style="color: #66cc66;">.</span>financial_nbr
<span style="color: #993333; font-weight: bold;">WHEN</span> matched <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #993333; font-weight: bold;">UPDATE</span>
<span style="color: #993333; font-weight: bold;">SET</span> tgt<span style="color: #66cc66;">.</span>name <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>name<span style="color: #66cc66;">,</span>
tgt<span style="color: #66cc66;">.</span>medical_number <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>medical_number<span style="color: #66cc66;">,</span>
tgt<span style="color: #66cc66;">.</span>registration_date <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>registration_date<span style="color: #66cc66;">,</span>
tgt<span style="color: #66cc66;">.</span>meta_orignl_load_dts <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>meta_orignl_load_dts<span style="color: #66cc66;">,</span>
tgt<span style="color: #66cc66;">.</span>meta_update_dts <span style="color: #66cc66;">=</span> src<span style="color: #66cc66;">.</span>meta_update_dts
<span style="color: #993333; font-weight: bold;">WHEN</span> <span style="color: #993333; font-weight: bold;">NOT</span> matched <span style="color: #993333; font-weight: bold;">THEN</span> <span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #66cc66;">&#40;</span>
name<span style="color: #66cc66;">,</span>
financial_nbr<span style="color: #66cc66;">,</span>
medical_number<span style="color: #66cc66;">,</span>
registration_date<span style="color: #66cc66;">,</span>
meta_orignl_load_dts<span style="color: #66cc66;">,</span>
meta_update_dts
<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span>
src<span style="color: #66cc66;">.</span>name<span style="color: #66cc66;">,</span>
src<span style="color: #66cc66;">.</span>financial_nbr<span style="color: #66cc66;">,</span>
src<span style="color: #66cc66;">.</span>medical_number<span style="color: #66cc66;">,</span>
src<span style="color: #66cc66;">.</span>registration_date<span style="color: #66cc66;">,</span>
src<span style="color: #66cc66;">.</span>meta_orignl_load_dts<span style="color: #66cc66;">,</span>
src<span style="color: #66cc66;">.</span>meta_update_dts<span style="color: #66cc66;">&#41;</span>;</pre></div></div>
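<p>The leading-comma trimming is easier to see outside of dynamic SQL. The sketch below is a Python illustration only, not part of the stored procedure: it builds the three lists the same way (every item prefixed with a comma), drops the first character as SUBSTRING(@updt, 2, LEN(@updt)) does, and assembles a statement of the same shape. The helper name build_merge_sql and the shortened column list are assumptions made for the example.</p>

```python
def build_merge_sql(target, source, key, columns):
    """Assemble a MERGE statement in the same shape as the procedure's output.

    Hypothetical helper for illustration; the real work happens in T-SQL.
    """
    updt = insert = vals = ""
    for col in columns:
        # Every item is appended with a leading comma, including the first.
        # The join key is left out of the SET list, as in the example output.
        if col != key:
            updt += ",tgt.{0} = src.{0}".format(col)
        insert += "," + col
        vals += ",src." + col

    # Equivalent of SUBSTRING(@x, 2, LEN(@x)): drop the first character.
    updt, insert, vals = updt[1:], insert[1:], vals[1:]

    return (
        "MERGE INTO {0} tgt USING {1} src ON src.{2} = tgt.{2} "
        "WHEN MATCHED THEN UPDATE SET {3} "
        "WHEN NOT MATCHED THEN INSERT ({4}) VALUES ({5});"
    ).format(target, source, key, updt, insert, vals)


sql = build_merge_sql(
    "edw.adventure_hospital_dm.visits",
    "edw.staging.stg_visits",
    "financial_nbr",
    ["name", "financial_nbr", "medical_number"],
)
print(sql)
```

<p>Appending the comma unconditionally keeps the loop free of "is this the first column?" checks, at the cost of one trim per list at the end.</p>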

<h2>Conclusion</h2>
<p>We are currently working towards utilizing this stored procedure within our SSIS packages for populating datamarts. Using our current process, a simplified version of one of our SSIS packages would look like the example below.</p>
<p><img src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/merge_ssis_pkg_before.png" alt="" border="0" /></p>
<p>By using the stored procedure to merge the data from staging into the production table we can replace the last 2 tasks with one Execute SQL task. That Execute SQL task simply executes the stored procedure. Using this method the example SSIS package would now look like this.</p>
<p><img src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/11/merge_ssis_pkg_after.png" alt="" border="0" /></p>
<p>Here is the SQL code that is executed by the Execute SQL task.</p>
<p><img src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/04/ssis_pkg_sql_task.gif" alt="" border="0" /></p>
<p>Although this stored procedure was a bit tedious to build, it is worth it. We now have a stored procedure to run from SSIS that suits our data warehousing needs for merging staging data into production.</p>
<p>There are many benefits to merging data via a stored procedure.</p>
<ul>
<li>Easily executed through an SSIS SQL Task</li>
<li>Easily reusable</li>
<li>Automatically builds and executes the merge statement so there is no need for manual creation or maintenance of the merge SQL statement.</li>
<li>Reduces fragmentation since there are fewer deletes performed on the production table</li>
<li>Since rows are updated instead of deleted and reinserted, the ETL package runs faster</li>
<li>Overcomes the SSIS Execute SQL Task's 32,767-character limit</li>
<li>Preserves original row load date and time (since the row is updated instead of deleted and re-inserted)</li>
</ul>
<div>Here is a link to the entire generate_merge stored procedure SQL code: <a title="Generate_merge stored procedure" href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/05/generate_merge.txt">generate_merge</a>.</div>
<p><strong>In Practice</strong><br />
The largest table the EDW imports from the primary inpatient Electronic Health Record (EHR) source system stores clinical results and contains about 2.4 billion rows. Currently we merge between 600,000 and 900,000 rows of data from that source system table into the corresponding table within our EDW. With our old two-step process of deleting rows from the production table that exist in the staging table and then inserting new rows, the whole process took around 2 hours to complete each night. Once we implemented the merge stored procedure in the ETL package, that time was reduced to about 6 minutes! The bulk of the processing time in the old process was due to the deletes performed on a table that contained over 2 billion rows.</p>
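<p>The update-in-place idea can be tried on a small scale. The sketch below is an illustration in Python with an in-memory SQLite database, not our production code. SQLite has no MERGE statement, so the "when matched" and "when not matched" branches are emulated here with an UPDATE followed by an INSERT; the two-column tables and their contents are simplified assumptions.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (financial_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stg_visits (financial_nbr INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO visits VALUES (1, 'old name'), (2, 'unchanged');
    INSERT INTO stg_visits VALUES (1, 'new name'), (3, 'brand new');
""")

# "When matched": update production rows in place instead of deleting them.
updated = conn.execute("""
    UPDATE visits
       SET name = (SELECT s.name FROM stg_visits s
                    WHERE s.financial_nbr = visits.financial_nbr)
     WHERE financial_nbr IN (SELECT financial_nbr FROM stg_visits)
""").rowcount

# "When not matched": insert only the staging rows missing from production.
inserted = conn.execute("""
    INSERT INTO visits (financial_nbr, name)
    SELECT s.financial_nbr, s.name
      FROM stg_visits s
     WHERE s.financial_nbr NOT IN (SELECT financial_nbr FROM visits)
""").rowcount

rows = conn.execute("SELECT * FROM visits ORDER BY financial_nbr").fetchall()
print(updated, inserted, rows)
```

<p>No DELETE is issued at any point: the matched row is rewritten where it sits, and only the row that is new to production is inserted.</p>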
<p><strong>Limitations</strong><br />
This stored procedure is meant for data warehousing type functionality.  It assumes you are using a Staging-To-Production type of model where the staging table and production table are almost identically built with the same column names and very similar data types. Also, this stored procedure does not utilize the delete functionality of the MERGE statement.</p>
<p><strong>Resources</strong><br />
Dhaneenja, T. (2008). Understanding the MERGE DML Statement in SQL Server 2008.</p>
<p class="lib_citation"><em>  SQLServerCentral.com</em>. Retrieved from</p>
<p class="lib_citation"><a href="http://www.sqlservercentral.com/articles/SQL+Server+2008/64365/">  http://www.sqlservercentral.com/articles/SQL+Server+2008/64365/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/uncategorized/2012/05/using-the-merge-statement-in-ssis-via-stored-procedure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NUBIC&#8217;s FedEx Day</title>
		<link>http://informatics.northwestern.edu/blog/edw/2012/03/nubics-fedex-day/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=nubics-fedex-day</link>
		<comments>http://informatics.northwestern.edu/blog/edw/2012/03/nubics-fedex-day/#comments</comments>
		<pubDate>Mon, 26 Mar 2012 18:11:59 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[EDW]]></category>
		<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[24-hour delivery]]></category>
		<category><![CDATA[agile projects]]></category>
		<category><![CDATA[fedex day]]></category>
		<category><![CDATA[nubic]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=774</guid>
		<description><![CDATA[NUBIC decided to do a FedEx Day (following the Atlassian model). It&#8217;s our first FedEx Day, and we had about 1/3rd of our staff build or collaborate on projects between the NUBIC software development and EDW groups. There was also unanimous &#8230; <a href="http://informatics.northwestern.edu/blog/edw/2012/03/nubics-fedex-day/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/03/packing-box.png"><img class="alignleft size-full wp-image-796" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/03/packing-box.png" alt="" width="73" height="60" /></a><a href="http://www.nucats.northwestern.edu/clinical-research-resources/data-collection-biomedical-informatics-and-nubic/bioinformatics-overview.html">NUBIC</a> decided to do a <strong>FedEx Day</strong> (following <a href="http://confluence.atlassian.com/display/DEV/Atlassian+FedEx+Days">the Atlassian model</a>). It was our first FedEx Day, and about a third of our staff built or collaborated on projects across the NUBIC <a href="https://github.com/nubic">software development</a> and <a href="http://informatics.northwestern.edu/blog/edw/">EDW</a> groups. There was also unanimous agreement that it was a good exercise, and that we should do it again.</p>
<p>Here&#8217;s a rundown of what was built in 24 hours (in the order in which they were presented):</p>
<p>&nbsp;</p>
<p><strong>Project: </strong><a href="https://github.com/NUBIC/bundle-recorder-jenkins-plugin">Bundle Recorder</a><br />
<strong>People: </strong><a href="https://github.com/rsutphin">Rhett Sutphin</a></p>
<p><a href="https://github.com/NUBIC/bundle-recorder-jenkins-plugin">Bundle Recorder</a> is a <a href="http://jenkinsci.org/">Jenkins</a> plugin for tracking the gem dependencies used in each build of a <a href="http://gembundler.com/">bundler</a>-using Ruby project. It stores bundler&#8217;s <strong>Gemfile.lock</strong> at the end of each build and provides a way to view changes between adjacent builds. While it can be used with any Ruby project, it will be most useful for gems, since <a href="https://groups.google.com/d/msg/ruby-bundler/ldJjA8ivqFs/OYXruxbgbwYJ">you don&#8217;t usually commit the Gemfile.lock for gems</a>.</p>
<p>I built this plugin to address an issue I ran into more than a few times: NUBIC <a href="https://public-ci.nubic.northwestern.edu/">uses Jenkins</a> for continuous integration. One of the things we use it for is to perform nightly tests of our various gems to ensure that they continue to work as new versions of their own dependencies are released. Sometimes a new dependency does cause a failure &#8212; that&#8217;s good as far as it goes; it means the builds are doing their jobs. However, when a nightly fails, it&#8217;s often hard to tell what changed &#8212; by default all we have is the console output from <strong>bundle update</strong>. It lists the dependencies and versions but, because we also use these builds to verify that none of the dependencies have been yanked from <strong>rubygems.org</strong> and so reinstall everything, not which ones just changed, and they are in no particular order. By parsing the bundler lockfile, Bundle Recorder can give a summary of just what&#8217;s changed.</p>
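<p>As a rough illustration of the idea (not the plugin's actual code, which runs inside Jenkins), the Python sketch below reads the locked versions out of two Gemfile.lock texts and reports only the gems whose versions differ. The function names and the sample lockfile contents are made up for the example, and the parser only handles the "name (version)" entries under a specs: section.</p>

```python
import re

# Top-level entries under "specs:" are indented exactly four spaces;
# their own dependencies are indented six and are skipped here.
SPEC_LINE = re.compile(r"^ {4}(\S+) \((\S+)\)$", re.MULTILINE)


def locked_versions(lockfile_text):
    """Return {gem name: locked version} from Gemfile.lock text."""
    return dict(SPEC_LINE.findall(lockfile_text))


def changed_gems(previous, current):
    """Map each gem whose version differs to (old, new); None means absent."""
    names = sorted(set(previous) | set(current))
    return {
        name: (previous.get(name), current.get(name))
        for name in names
        if previous.get(name) != current.get(name)
    }


yesterday = """GEM
  remote: https://rubygems.org/
  specs:
    rake (0.9.2)
    rspec (2.8.0)
      rspec-core (= 2.8.0)
    rspec-core (2.8.0)
"""
today = yesterday.replace("2.8.0", "2.9.0")

print(changed_gems(locked_versions(yesterday), locked_versions(today)))
```

<p>Comparing two parsed lockfiles sidesteps the ordering problem in the console output entirely, since only the differing entries are reported.</p>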
<p>&nbsp;</p>
<p><strong>Project: </strong>Natural Language Processing (<a href="http://en.wikipedia.org/wiki/Natural_language_processing">NLP</a>) Abstraction Tool<br />
<strong>People: </strong><a href="http://www.linkedin.com/pub/luke-rasmussen/5/6b2/538">Luke Rasmussen</a>, <a href="http://www.linkedin.com/pub/tuan-nguyen/10/0/54a">Tuan Nguyen</a>, <a href="http://www.linkedin.com/in/thomasjelbert">Thomas Elbert</a>, <a href="http://www.linkedin.com/pub/schneider-daniel/7/667/a45">Daniel Schneider</a></p>
<p>Our project was to create a web application to allow someone to go in and make annotations on text documents.  Why is this useful?  When you&#8217;re working with text documents in any type of automated fashion, being able to have a &#8220;gold standard&#8221; to validate against is really important.  For NLP, this gold standard can help with machine learning efforts to improve the NLP engine.  It can also help verify if documents have had all of the patient information removed, and annotate where anything was missed.  The main goals were to create a tool that is:</p>
<ul>
<li>lightweight</li>
<li>easy to learn and use quickly</li>
<li>built on top of the UIMA framework (<a href="http://uima.apache.org/">http://uima.apache.org/</a>)</li>
</ul>
<p>Making this all happen from design to development in 24 hours was quite a challenge, but a fun one.  We split into a &#8220;data team&#8221; and a &#8220;UI team&#8221;.  The data team worked on getting documents from the EDW so the annotation tool could access them.  The UI team developed the annotation system, and the web services to connect everything together.  Probably the biggest challenge was trying to tone down what we wanted the tool to be able to do (the infamous scope creep – yes, even programmers do it) to the point we could get something working in the time allowed.  Getting free time to work on a pet project is probably the greatest gift a programmer can receive, and we had a lot of fun making the system work and talking about what we&#8217;d like to do with it.  We&#8217;re definitely hoping for another FedEx Day in the future!</p>
<p>&nbsp;</p>
<p><strong>Project: </strong><a href="https://github.com/normalocity/tardis">tardis &#8211; your friend in time</a><br />
<strong>People: </strong><a href="http://jefflunt.com">Jeff Lunt</a></p>
<p><strong>tardis</strong> is a grapher for events against a timeline. I wanted a simple way to graph arbitrary events against a timeline using the <a href="http://www.simile-widgets.org/timeline/">MIT Simile timeline</a> widget. The goal was to provide a visual representation of events in web apps, for example graphing the incidence of application errors against time and against other events such as server maintenance, server load, etc. The hoped-for outcome is a timeline that covers every level of your stack (host, VM usage, process load, web app framework and errors, user load, etc.) all in one view, so that it&#8217;s easier to correlate problems and changes in application performance with events at other levels in the stack.</p>
<p>It&#8217;s important to realize that typical graphing libraries, which plot numerical data across two axes, don&#8217;t really handle momentary or duration events, or if they do, they don&#8217;t do it well. That&#8217;s what makes the <a href="http://www.simile-widgets.org/timeline/">MIT Simile timeline</a> widget especially useful for this purpose.</p>
<p><strong>tardis</strong> can also be used to post any series of events, even if it&#8217;s not software related, via a simple RESTful API, to be documented and published in the near future.</p>
<p>&nbsp;</p>
<p><strong>Project: </strong>tahoe file system implementation<br />
<strong>People: </strong>Dong Fu</p>
<p><a href="https://tahoe-lafs.org/trac/tahoe-lafs">Tahoe-LAFS</a> is an open-source distributed file system project that implements the <a href="http://en.wikipedia.org/wiki/Principle_of_Least_Authority">Principle of Least Authority (POLA)</a> and <a href="http://en.wikipedia.org/wiki/Redundant_array_of_inexpensive_nodes">Reliable Array of Independent Nodes (RAIN)</a>.  By leveraging an independent pool of SAN storage devices through WAN or LAN links, the software presents end-users with a fault-tolerant and secure resource for backup and online storage.  For the FedEx Day, I was able to set up a virtual machine with Tahoe-LAFS installed and configured.  I was also able to register the test VM onto the public storage Grid to demonstrate its potential.  Other aspects of the product, such as performance, fault-tolerance, and cross-platform compatibility will be explored at the next available opportunity.</p>
<p>A one-page description of the design of Tahoe-LAFS can be found <a href="https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/about.rst">here</a>.</p>
<p>&nbsp;</p>
<p><strong>Project: </strong>nubic bootstrap &#8211; virtual machine build automation using Puppet<br />
<strong>People: </strong><a href="https://twitter.com/#!/williamjdix">William Dix</a>, John Dzak, Dong Fu</p>
<p><strong>From Will:</strong> Onboarding new developers is a slow and difficult process.  Installing dependencies, particularly Oracle, is time consuming to say the least.  To improve this process, I wanted to build a VM with Oracle installed and set it up as a <a href="http://vagrantup.com/">Vagrant</a> base box (Vagrant is a Ruby gem which provides easy management of VirtualBox VMs).  Using Vagrant, a developer can quickly set up a new Oracle VM on their local machine whenever there is a need.</p>
<p>My project was not as successful as I would have liked, primarily because of the time required to move large VMs around, convert them to the proper formats, etc. Because of the time it took just to get the VMs in the right place and in the right format, I was not able to do other desired tasks like automated schema loading.</p>
<p><strong>From John: </strong>So, my FedEx Day project was the nubic bootstrap project that I worked on with Dong and William.  Initially I was thinking of working on a set of scripts that would make setting up an Oracle VM easier, but after talking with William and Dong we decided to take an approach similar to how <a href="http://gembundler.com/">bundler</a> works where you specify your dependencies in a file (shown below), and run the gem command `<strong>nubic_bootstrap install</strong>`, and all those dependencies are installed.  Here&#8217;s an example config file:</p>
<p><strong>example nubic_bootstrap.yml</strong></p>
<pre>vms:
  oracle:
    schemas: [cc_pers, cc_notis]
    database: //localhost:15210/XE</pre>
<p>I was also hoping to build on some of the functionality that bcdatabase/Oracle offered while building the gem.  After a couple of hours of work I realized that invoking the commands I needed on the guest VM was trickier than I initially thought, since I needed to know information about the guest VM to invoke certain commands (e.g. the Oracle home).  As time started running out, I retreated to the initial, more basic idea of commands to import/export databases in the VM, along with hard coding some of the settings.  The gem ended up being very hard coded, but it could download the Oracle VM and import a schema into it.  Also, as we neared the end of FedEx Day, William found a way to invoke commands inside a VM using Vagrant, which could be useful in the future.  I think once I have some more free time on my hands I will continue with this project, since it is something our group could use when migrating to full application environments inside the VM for projects that use Oracle.</p>
<p>&nbsp;</p>
<p><strong>Project: </strong>The cost of changing health insurance, to insurance companies<br />
<strong>People: </strong><a href="http://fsmweb.northwestern.edu/faculty/facultyProfile.cfm?xid=20839">Justin Starren</a></p>
<div>Justin presented a statistical analysis of when it makes sense for health insurance customers to change providers, as well as when providers might encourage good customers to stay, and poor customers to change to other providers. It contained findings regarding what the cost of losing good customers was to insurance companies, among others.</div>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/edw/2012/03/nubics-fedex-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Continuous Delivery reading and resource list</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=continuous-delivery-reading-and-resource-list</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 19:36:11 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[continuous delivery]]></category>
		<category><![CDATA[continuous deployment]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[release management]]></category>
		<category><![CDATA[software deployment]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=572</guid>
		<description><![CDATA[Update 3/28/2012: In order to manage server configuration and code deployments, I&#8217;m experimenting with a free, hosted Chef account from OpsCode.com. I&#8217;m also following this blog post for the application deployment-specific questions and solutions. &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;- My technical project for the &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Update 3/28/2012: </strong>In order to manage server configuration and code deployments, I&#8217;m experimenting with a free, hosted Chef account from <a href="http://www.opscode.com/">OpsCode.com</a>. I&#8217;m also following this blog post for the application deployment-specific questions and solutions.</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-</p>
<p>My technical project for the year, I&#8217;ve decided, is to build a continuous delivery system inside the NUBIC dev team.</p>
<p>Here&#8217;s a quick reading list of source materials that I&#8217;m using to learn how to do it (blog posts to follow as I document the process of building the system internally):</p>
<ul>
<li>The <a href="http://www.amazon.com/Continuous-Delivery-Deployment-Addison-Wesley-ebook/dp/B003YMNVC0/ref=sr_1_1?s=digital-text&amp;ie=UTF8&amp;qid=1329420497&amp;sr=1-1">Continuous Delivery book</a></li>
<li>The <a href="http://continuousdelivery.com/">Continuous Delivery blog</a></li>
<li><a href="http://radar.oreilly.com/2009/03/continuous-deployment-5-eas.html">Eric Ries&#8217; 5-step primer to Continuous Deployment</a></li>
<li><a href="http://www.startuplessonslearned.com/2010/01/case-study-continuous-deployment-makes.html">Case Study: Continuous deployment makes releases non-events</a></li>
<li><a href="http://engineering.imvu.com/2010/04/09/imvus-approach-to-integrating-quality-assurance-with-continuous-deployment/">IMVU’s Approach to Integrating Quality Assurance with Continuous Deployment</a></li>
<li><a href="http://venturehacks.com/articles/five-whys">A series on the principle of &#8220;The 5 whys&#8221;</a></li>
<li><a href="http://en.wikipedia.org/wiki/Pareto_principle">The Pareto principle, a.k.a. the 80/20 rule</a></li>
</ul>
<p>These things couple very well with additional practices that NUBIC embraces as part of its software development process, including:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Test-driven_development">Test Driven Development</a></li>
<li><a href="http://en.wikipedia.org/wiki/Behavior_Driven_Development">Behavior Driven Development</a></li>
<li><a href="http://en.wikipedia.org/wiki/Continuous_Integration">Continuous Integration</a></li>
<li><a href="https://github.com/nubic">Releasing much of our code as open source</a></li>
<li><a href="http://en.wikipedia.org/wiki/Kaizen">Kaizen &#8211; continuous improvement philosophy</a></li>
<li><a href="http://www.joelonsoftware.com/articles/fog0000000043.html">Doing well on the &#8220;Joel Test&#8221;</a></li>
</ul>
<p>Finally, <a href="http://www.thoughtworks-studios.com/">ThoughtWorks Studios</a> has a commercial product called <a href="http://www.thoughtworks-studios.com/go-agile-release-management">Go</a> for automated release management. A couple of people from ThoughtWorks also happen to be the authors of the book on Continuous Delivery.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/continuous-delivery-reading-and-resource-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Information leakage, and the many places our data goes</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=information-leakage-and-the-many-places-our-data-goes</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 17:55:08 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[anthony castillo]]></category>
		<category><![CDATA[authentication security]]></category>
		<category><![CDATA[leaky information]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[stackoverflow]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=730</guid>
		<description><![CDATA[Digital information is, without careful design, pretty leaky. It gets transferred all over the place, it gets logged, it gets backed up, and it often lives in those backups for a long time. This post discusses just one way that &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Digital information is, without careful design, pretty leaky. It gets transferred all over the place, it gets logged, it gets backed up, and it often lives in those backups for a long time.</p>
<p>This post discusses just one way that sensitive information is transparently taken out of your hands, and put into places you didn&#8217;t expect, or weren&#8217;t obvious from the use of a given technology.</p>
<p>My example is this: putting username/password combinations <strong>in a URL</strong>, as outlined in <a href="http://stackoverflow.com/questions/4980912/username-and-password-in-https-url">this question on StackOverflow</a> (you&#8217;ll probably want to read through this, for some background). While the implementation question posed is a simple one, and may make certain programming tasks simpler, it&#8217;s turned out to be a bad idea, because of the potential of that information leaking into unintended places.</p>
<p>As a follow up to that StackOverflow question, I received an email from <a href="http://twitter.com/#/alanthonyc">Anthony Castillo</a> (you can follow his blog at <a href="http://inquirious.com/">inquirious.com</a>), that delved deeper into this problem, specifically in an iOS context. The discussion is germane to anyone thinking about the unintended consequences of transferring data over networks:</p>
<blockquote><p>Hi Jeff,</p>
<p>[...]</p>
<p>I asked a question similar to one of yours on SO (<a href="http://stackoverflow.com/questions/4980912/username-and-password-in-https-url" target="_blank">regarding https://username:password@service.com type urls</a>).</p>
<p>I&#8217;m trying to decide if it is safe *enough* for me to use that format for an iOS project I&#8217;m working on. Just wondering if you went with it or if you ended up doing something else.</p>
<p>&nbsp;</p></blockquote>
<p>I replied&#8230;</p>
<blockquote><p>Hey Anthony, [...]</p>
<p>The short answer to your follow up is &#8220;no &#8211; it&#8217;s not considered good enough&#8221;. Security, to some extent, is a matter of managing risk. Username+password in HTTPS links potentially increases your risk with basically no reward. Any logs on systems between you and the destination system that capture that information (or even non-secure logs on your own machine) effectively make using HTTPS useless for this purpose, since you wind up putting your credentials in the clear.</p>
<p>No bueno.</p>
<p>I originally wrote this question because I was trying to automate a connection between a script on my local machine and my source repository, thereby preventing a password prompt from coming up at all, so the script could run unattended. The problem I was trying to solve was automating a web app deployment. However, I found that this wasn&#8217;t the best thing to do, for the following reasons:</p>
<ol>
<li>It&#8217;s better for security by far to use key-based authentication (rather than username+password), and connect over SSH for deployments and other automated processes, rather than HTTPS. The reasons for this are many, but the benefits are basically that automation can happen pretty easily after you get this set up, and because you&#8217;re not using passwords, the encryption is generally considered less likely to be broken, since your &#8220;secret key&#8221; is a high-security key, rather than a human-chosen password.</li>
<li>Having fully automated deployments was troublesome without <strong>also</strong> having an automatic rollback process. In my case, if a developer wasn&#8217;t involved in the deployment, that was a bigger problem than the inconvenience of having to be physically present to enter one&#8217;s password. [...]</li>
</ol>
<p>Back to your question, HTTPS connections are, as I understand them, tunnels directly between the client and the server, so theoretically security should be okay. However, you never know how many machines that request goes through before that tunnel is established (your ISP/wireless provider, for example, is one place that might get a hold of the URL <strong>with</strong> the username+password in it). Ideally, whatever code library you&#8217;re using to make the connection should drop the username+password from the URL while establishing the tunnel. However, there&#8217;s no guarantee, and it&#8217;s just better never to store your username+password in plain text <strong>anywhere</strong> that you don&#8217;t absolutely have to do so, especially since there are much better, and more secure, alternatives that are just as easy to set up. In any case where you&#8217;ve determined that it is absolutely necessary to store credentials in plain text (I can&#8217;t think of any in the modern world in which we live), it&#8217;s critical that the file system on which they are stored is a system <strong>completely within your control, and not accessible via the Internet, </strong>lest someone get access to it. This includes not only the file system itself, but any locations/systems, onsite or off, to which that file system is backed up. In the real world, guaranteeing such a situation is difficult at best, and in practical terms, is probably closer to impossible.</p>
<p>As for your iOS app, that&#8217;s a bit trickier, depending on your situation, because you&#8217;re not talking about one key, and one server, you&#8217;re talking about a bunch of users, and a server they connect to. However, I would think that you could achieve key-based authentication for your users by giving them an API/application key that they could copy/paste into your iOS app that would essentially accomplish the same thing. If you need to go over HTTPS, you can pass the key in as a POST parameter to the session, and allow secure data transfer between your app and the server, in addition to the key acting as authentication (this user is who they say they are). I&#8217;m not an iOS expert, but one of my colleagues mentioned that iOS offers some encrypted storage on the device to store sensitive information such as credentials, and that it&#8217;s been available to iOS devices at least since iOS v4 (give or take). [..]</p></blockquote>
<p>Anthony replies&#8230;</p>
<blockquote><p>Hi Jeff, [...]</p>
<p>&nbsp;</p>
<p>After reading your email, I started digging around the web a bit more. Please let me know what you think of the following:</p>
<p><a href="http://en.wikipedia.org/wiki/Basic_access_authentication" target="_blank"><strong>Basic Access Authentication</strong></a> - this is the &#8220;<em><a href="mailto:username%3Apassword@service.com" target="_blank">username:password@service.com</a></em>&#8221; format that we have been discussing. This is as opposed to <a href="http://en.wikipedia.org/wiki/Digest_Access_Authentication" target="_blank">Digest Access Authentication</a>. (There may be others as well.)</p>
<ul>
<li>Basic authentication can be used with <em>either</em> HTTP or HTTPS.</li>
<li>With HTTP, it is definitely insecure.</li>
<li>When used with HTTPS, that implies SSL encryption over the whole connection.</li>
</ul>
<p>So&#8230;I <em>think</em> it&#8217;s okay to use basic access authentication, <strong>as long as</strong> it&#8217;s over HTTPS. (<a href="http://stackoverflow.com/questions/3464454/https-and-basic-authentication#3464462" target="_blank">1</a>) (<a href="http://en.wikipedia.org/wiki/HTTP_Secure" target="_blank">2</a>)</p>
<p>However, the point you make about not knowing how many machines a request goes through before reaching the server leaves me with slight doubts.</p>
<p>Perhaps we have slightly different choices here because you have control over both the client and server sides of the connection. Hence your ability to choose a protocol involving an API key.</p>
<p>As for me, I am building an app against a third-party web service. This means I have to go along with whatever protocol they have in place. In this case, it&#8217;s username/password authentication over HTTPS.</p>
<p>Maybe it <em>isn&#8217;t</em> completely secure, but it might still be the best I can do given my choices. (I think that&#8217;s my conclusion.) I don&#8217;t believe there is a way for me to force a different authentication method on the service to which I&#8217;m connecting.</p>
<p>Regarding encrypted storage on iOS, that is indeed a standard feature offered by Apple. They have what is known as a &#8220;Keychain&#8221; feature that allows you to store and encrypt data on an app by app basis. I basically let the OS handle the details of that for me. (Pretty nice.)</p></blockquote>
<p>And finally&#8230;</p>
<blockquote><p>Anthony,</p>
<p>In your situation, it sounds like you&#8217;re right &#8211; if this is the only authentication method provided (and they don&#8217;t publish per-user API keys) you&#8217;re kind of up the creek on that one. You should still avoid putting the username/password in the URL itself if at all possible, depending on the limitations of the API requests. I would, if possible, contact the vendor and ask that they add it [key-based authentication]. In the mean time, I think you&#8217;re right that you&#8217;re stuck, according to what you&#8217;ve outlined.</p>
<p>The digest authentication looks like your best bet, but isn&#8217;t considered as secure as key-based authentication (as noted in the Wikipedia article you linked to), but it looks like you already know that. [...]</p></blockquote>
<p>So, it&#8217;s always a good idea to think about where your information is not only stored, but where it&#8217;s transferred to, and what that might mean in terms of that information getting out of your control inadvertently. Thanks to <a href="http://twitter.com/#/alanthonyc">Anthony Castillo</a> for the deeper discussion, and a specific example to which we could apply the principle.</p>
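<p>For readers who haven't looked at what Basic access authentication actually sends, here is a minimal Python sketch (not part of the original exchange). It assumes a made-up URL, <code>https://user:pass@service.example/api?x=1</code>, and two hypothetical helper functions. It pulls the credentials out of the URL and builds the <code>Authorization</code> header instead, so the URL that may end up in logs carries no password. Base64 is an encoding, not encryption, so the header is still only as safe as the HTTPS connection carrying it.</p>

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (clean_url, username, password) with the userinfo part removed.

    Hypothetical helper for illustration; it does not handle IPv6 hosts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, parts.username, parts.password


def basic_auth_header(username, password):
    """Build the header value used by Basic access authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# The made-up URL below stands in for the "username:password@service.com" format.
clean_url, user, password = split_credentials("https://user:pass@service.example/api?x=1")
headers = {"Authorization": basic_auth_header(user, password)}
```

<p>The request would then be made against <code>clean_url</code> with <code>headers</code> attached, using whatever HTTP library you prefer.</p>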
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/02/information-leakage-and-the-many-places-our-data-goes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why RTFM doesn&#8217;t work</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=why-rtfm-doesnt-work</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 20:30:07 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[5-whys]]></category>
		<category><![CDATA[rtfm]]></category>
		<category><![CDATA[step-by-step instructions]]></category>
		<category><![CDATA[technical support]]></category>
		<category><![CDATA[training issues]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=601</guid>
		<description><![CDATA[It&#8217;s not the users&#8217; fault. Honestly, it&#8217;s not. When answering a technical support question, have you ever asked someone, &#8220;Did you read the manual?&#8221; Well, put away your superiority complex for a moment, and realize that your users are wondering &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div>
<p>It&#8217;s <strong>not</strong> the users&#8217; fault. Honestly, it&#8217;s not.</p>
<p>When answering a technical support question, have you ever asked someone, &#8220;Did you read the manual?&#8221; Well, put away your superiority complex for a moment, and realize that your users are wondering why they need a manual in the first place.</p>
<p>Manuals stink, plain and simple, so stop using them whenever possible. If you&#8217;ve got a complex application, website (or really any training process whatsoever), and you feel that you aren&#8217;t receiving the respect you deserve for writing that 900-page, 100% comprehensive training manual, stop spending time on trying to improve the manual, and instead change the system.</p>
</div>
<div>Here are a few things you can try that are very simple, and very effective:</div>
<div></div>
<ul>
<li><strong>Ask why</strong> &#8211; first and foremost you need to understand why the user is having a problem with your application, then you need to correct the flaw that is causing the problem in the first place, thereby eliminating the need for the user to ask the question at all. A great method for accomplishing this is <a title="5 Whys" href="http://en.wikipedia.org/wiki/5_Whys">5 Whys</a>. During the process of asking &#8220;why&#8221; it&#8217;s important to always be gracious about honest feedback, and curious about how people arrive at their state of confusion. Once you&#8217;ve figured out what&#8217;s at the root of the problem, it&#8217;s usually a trivial thing to change it.</li>
<li><strong>Show, don&#8217;t tell</strong> &#8211; create a short training video that shows people how to use it, rather than trying to explain it via text and pictures. If your training video can&#8217;t correctly explain it in less than three minutes, your app is either too complex, or your video is trying to do too much. Either fix your app, or sharpen the focus of your video. Great examples of awesome instructional videos are the videos that <a href="http://help.squarespace.com/customer/portal/articles/14410-squarespace-platform-overview-video-">introduce SquareSpace</a>. They are short, focused on a single topic each, and (in the case of SquareSpace) linked directly from the pages in which the related question might be raised in the user&#8217;s mind. A user is editing a webpage and wants to know how to add an image? The video for editing pages is linked from the page editing screen. Simple. It&#8217;s true that they still maintain a searchable collection of videos that any user can simply watch, but the fact of the matter is that pretty much no one is going to go through this library and watch all the videos <strong>first</strong>. Users will typically try something, and only when they fail, will they ask for help.</li>
<li><strong>Protect users from accidents</strong> &#8211; there are many times that users will do things that they don&#8217;t know are dangerous until it&#8217;s too late, and they can&#8217;t go back! Whenever possible, provide an &#8220;undo&#8221; function that allows users to fix mistakes with a simple click or keystroke. This method is often far superior to shifting all responsibility to the user, and presenting them with, &#8220;Are you sure?! You cannot undo this!&#8221; sorts of messages. Those messages make users fearful, cause them to stop and call you for help making a decision about what to do, and ultimately shift blame to the user when simply providing an &#8220;undo&#8221; function largely prevents the problem from happening in the first place. Even the most seasoned users will occasionally make mistakes. These people aren&#8217;t &#8220;dumb&#8221;; they&#8217;re just human after all. Do you really want to have to recover lost data, or blame them for the mistake, when your system could simply protect users from such accidents in the first place?</li>
<li><strong>Automate it</strong> &#8211; sometimes people make mistakes when doing repetitive tasks, because humans aren&#8217;t as good at doing highly repetitive things accurately 100% of the time, as compared to computers. This problem is exacerbated by processes that have multiple steps, where a mistake in any one of the steps can cause the whole process to break down. Try helping the users of your site or application by pre-filling in values for forms, automatically inserting reasonable default values, or better yet, just completely automate the process whenever possible. If there&#8217;s no reason that a human really needs to be involved in a process, take them out of the loop and save everyone some time and energy.</li>
<li><strong>Language is imprecise</strong> &#8211; step-by-step instructions, no matter how detailed and precise, no matter how carefully worded, are difficult to follow. Users get lost in lengthy instructions, misunderstand or misinterpret technical terms, and simply don&#8217;t want to read instructions anyway. Providing users with a glossary of terms (thinking that the manual should explain itself) isn&#8217;t really the answer either. So, use pictures instead of words when possible, and video instead of pictures when possible. The complications of interpreting language are part of why IKEA&#8217;s assembly instructions contain no words, only pictures.</li>
</ul>
<div>
<div id="attachment_705" class="wp-caption alignnone" style="width: 721px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/ikea_assembly_instructions.jpg"><img class="size-full wp-image-705" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/ikea_assembly_instructions.jpg" alt="" width="711" height="492" /></a><p class="wp-caption-text">The introductory page that explains how to avoid damaging your new furniture during assembly, and what to do if you need help or are confused. Pretty clear, yes? (1) put a carpet or rug under the pieces while assembling them, (2) if you&#039;re confused, look in the manual for a picture that shows what to do, and (3) call IKEA. Note that the last picture isn&#039;t a person on a phone calling IKEA - it&#039;s literally a handset connected to IKEA. When I see this, I think only two words: &quot;phone IKEA&quot;. The implication is uncomplicated, and clear. Also note this caption of those four pictures took an entire paragraph. Not very efficient, friendly, or helpful, is it?</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2012/01/why-rtfm-doesnt-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ETL Assistant – Getting Error Row Description and Column Information Dynamically</title>
		<link>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=etl-assistant-getting-error-row-description-and-column-dynamically</link>
		<comments>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/#comments</comments>
		<pubDate>Mon, 16 Jan 2012 07:15:40 +0000</pubDate>
		<dc:creator>Eric Whitley</dc:creator>
				<category><![CDATA[EDW]]></category>
		<category><![CDATA[ETL Assistant]]></category>
		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=498</guid>
		<description><![CDATA[SSIS does a fine job at letting you manage "garden path" ETL, but many face the challenge of how to manage row failures. Which row failed? Why did it fail? Error row handling is a central part of any development task and usually winds up representing a significant chunk of your time and code. In this article we'll step you through how to overcome SSIS's design-time-only availability of error row information by creating a runtime dynamic error row handler using CozyRoc's tool kit for SSIS. <a href="http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This article is going to attempt to provide one solution to the question of row error management in SSIS.  It&#8217;s one option, specially constructed for dynamic column mapping scenarios, but could probably be exploited for static situations as well.</p>
<h2>TLDR:</h2>
<ul>
<li>Download the sample package (<a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/dynamic_dft_error_handler_example.zip">Dynamic DFT Error Handler Sample Project</a>)</li>
<li>Run the SQL script</li>
<li>Open up the package and make sure your connections are set up appropriately</li>
</ul>
<h2>Management of Bad Rows in SSIS</h2>
<p>For ETL, SSIS does a fine job at letting you manage the basics of copying one column of data in some source table to another column of data in a destination table.  Assuming all goes well, you wind up extracting/transforming/loading that data.</p>
<p>If things don&#8217;t go well, however&#8230;</p>
<p>Exception handling is a central part of any development task and usually winds up representing a significant chunk of your time and code. You wind up covering any number of &#8220;what ifs&#8221; like:</p>
<ul>
<li>What if I failed to connect to a system?</li>
<li>What if I expected data and didn&#8217;t get any?</li>
<li>What if my expected data type overflowed?</li>
<li>What if something totally unanticipated happened?</li>
</ul>
<p><a style="font-style: normal; line-height: 24px; text-decoration: underline;" href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_1_vanilla.png"><img class="size-full wp-image-649 alignright" style="border-style: initial; border-color: initial; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #eeeeee;" title="etl_asst_error_log_1_vanilla" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_1_vanilla.png" alt="ETL Assistant Error Logger - Basic SSIS DFT Without Error Handling" width="202" height="185" /></a>If you&#8217;ve used SSIS for ETL you&#8217;re accustomed to the idea of data flow paths inside of a transformation.  You connect a source component to a destination component via either a green line (&#8220;good output&#8221;) or a red line (&#8220;bad / error output&#8221;).  This is great stuff.  Say you query some rows from a source database table and want to send the rows to a destination database table &#8211; you simply wire up the green line from the source to the destination and map the columns.  Done.  Walk away.</p>
<p>But what about the implied red line for bad rows?  What if you actually have an issue with the transformation?  Two immediate reasons come to mind:</p>
<ul>
<li>The data was truncated in some way (cast my Oracle number(20,0) to a SQL int)</li>
<li>Some other unanticipated error occurred (for the sake of explanation, let&#8217;s say a primary key violation on insert)</li>
</ul>
<p><a style="font-style: normal; line-height: 24px; text-decoration: underline; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;" href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_2_error_redirection.png"><img class="size-full wp-image-651 alignright" style="border-style: initial; border-color: initial; margin-top: 0.4em; background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: #eeeeee;" title="etl_asst_error_log_2_error_redirection" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_2_error_redirection.png" alt="ETL Assistant - SSIS DFT Error Row Redirection" width="314" height="267" /></a></p>
<p>Usually what you&#8217;d do with a static transformation is simply use row redirection to handle the exception.  A common solution is to log your error information to a shared error log table for later review.  By attaching the appropriate error output to your destination you &#8220;channel&#8221; the row information to that destination so you have a hope of figuring out what happened and what you can do about it.</p>
<p>SSIS usually works really well for these situations, with the exception of two nagging challenges you&#8217;ll see come up a <em>lot</em> in discussion forums:</p>
<ul>
<li>&#8220;My row failed &#8211; how do I get the error description?&#8221;</li>
<li>&#8220;My row failed &#8211; how do I tell which row failed?&#8221;</li>
</ul>
<p>Error description is fairly straightforward and I&#8217;m not going to get into it too much &#8211; there&#8217;s a great step-by-step example at (<a href="http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx">http://consultingblogs.emc.com/jamiethomson/archive/2005/08/08/1969.aspx</a>) which is very instructive.</p>
<p>Error row identifier, though, is a bit more complex because of the way SSIS works.</p>
<h2>Error Columns and Lineage IDs</h2>
<div style="border: 1px solid yellow; background-color: #ffffcc; padding: 5px;">I&#8217;m going to preface this next section by noting that I don&#8217;t have a super clear picture on the internals of how SSIS column flow works, but I get a sense of it.  Please feel free to comment / email me and I&#8217;ll update anything that needs correcting.</div>
<p>&nbsp;</p>
<p>Let&#8217;s say you have a row with an integer column &#8220;employee_id&#8221; which is the primary key on a table.  What you see is a single presentation of that column &#8220;employee_id&#8221; &#8211; it&#8217;s labeled that way throughout your data transformation flow, so to you it&#8217;s &#8220;the same&#8221; throughout the flow.  What SSIS sees internally, however, is something completely different.  If you dig a bit you&#8217;ll find you have a <em>unique</em> representation of this column at each point throughout the flow of your SSIS package.  That single &#8220;column&#8221; (&#8220;employee_id&#8221;) has to be treated uniquely at each input, output, and error output for each step.  Beyond needing to understand how to treat flow direction (ex: input column vs output column), the column itself may change data types, names, or even value as it flows through your package.  SSIS needs to keep track of that &#8220;column&#8221; at each point throughout the flow and treat it as though it&#8217;s unique.  So how does it do that?  LineageID.</p>
<p>There&#8217;s a great article on SQL Server Central (<a href="http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/65730/" target="_blank">http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/65730/</a> ) that touches on some of this.  The article describes lineageid as</p>
<blockquote><p>It’s an integer and it’s unique throughout the data flow. When buffers are reused, the Lineage ID doesn’t change – it’s the same column at the input and output. When buffers are copied, a new column is created – which gets a new (unique) Lineage ID.</p></blockquote>
<p>That means that as the column &#8220;employee_id&#8221; flows through the DFT, it gets a unique Lineage ID -<em> for each input and output copy of itself</em>.  And, typically, you have&#8230;</p>
<ul>
<li>An input column</li>
<li>An output column for &#8220;good&#8221; data</li>
<li>An output column for errors</li>
</ul>
<p>Taking the &#8220;employee_id&#8221; example from the &#8220;OLE DB Source&#8221; step in our DFT we&#8217;d have:</p>
<ul>
<li>Input (ID = 33)</li>
<li>Source Error Output (Lineage ID = 35)</li>
<li>Good Output (Lineage ID = 34)</li>
</ul>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_6_lineage_id_11.png"><img class="alignnone size-full wp-image-663" title="etl_asst_error_log_6_lineage_id_1" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_6_lineage_id_11.png" alt="ETL Assistant - SSIS DFT Lineage ID Flow" width="674" height="339" /></a></p>
<p>Great!  No problem.  As long as we know the LineageIDs related to our steps we can backtrack to determine the mapping to &#8220;column name&#8221; and voila &#8211; we know which row failed.  We can simply look up the column by LineageID using &#8220;FindColumnByLineageID&#8221; in a script task (<a href="http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtsbuffermanager100.findcolumnbylineageid.aspx">http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.wrapper.idtsbuffermanager100.findcolumnbylineageid.aspx</a>).  Magic.</p>
<p>Not so fast.  One small, but critical catch.  Metadata about a task step is only available within the scope of that task step. Meaning &#8211; once we get past &#8220;OLE DB Source&#8221; I can see &#8220;Lineage ID,&#8221; but I can&#8217;t easily track back to determine the <em>mapping</em> of Lineage ID to column name.  So &#8211; if you want to write out error row information (specifically &#8220;column name&#8221;) in a second DFT (to your error log, for example) there&#8217;s no way to look up that name &#8211; because the metadata about LineageID is no longer in scope &#8211; it&#8217;s only available to the <em>prior</em> step.  Incredibly frustrating.</p>
<h2>Getting Error Column Information With Static DFTs</h2>
<p>For static packages this can be addressed a few ways. The general strategy is to map the Lineage IDs / IDs to column information at <em>design time</em> and then use that information to look up the information you need.</p>
<p>Couple of quick links you may find handy.</p>
<ul>
<li>How to Find Out Which Column Caused SSIS to Fail? (<a href="http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx">http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx</a>)</li>
<li>Error Output&#8217;s Description (Component on CodePlex) (<a href="http://eod.codeplex.com/">http://eod.codeplex.com/</a> )</li>
<li>eLog (<a href="http://ssisctc.codeplex.com/wikipage?title=eLog&amp;referringTitle=Home">http://ssisctc.codeplex.com/wikipage?title=eLog&amp;referringTitle=Home</a>)</li>
</ul>
<p>Again &#8211; for static packages, these can mostly if not completely solve the issue and leave you in a far better position to determine which rows failed.  I&#8217;m not going to go into these since you can read up online.</p>
<h2>So What About a Dynamic DFT?</h2>
<p>Note that the links I provided above address <em>design time</em> gathering / mapping of column information.   What do you do about a <em>runtime</em> situation?  We started digging into the CozyRoc dynamic DFT about a year ago.  Basic dynamic mappings worked <em>great</em>.  You can easily remap columns at runtime and, assuming all goes well, you&#8217;re done.  But if things don&#8217;t go well &#8211; what then?</p>
<p>We need to catch and log those bad rows.  But &#8211; we can&#8217;t map columns / Lineage ID information at design time because that negates the entire point of using a dynamic DFT &#8211; you won&#8217;t know <em>any</em> of the required information. It&#8217;s just not there.  Now that issue with the resolution of metadata from prior steps comes into play.  We can&#8217;t generate column information at design time and we can&#8217;t inspect metadata from ancestor steps within a DFT.  They&#8217;re out of scope.</p>
<p>I&#8217;ll admit that when I first looked at this I was stumped.  And incredibly frustrated.  There was this great opportunity to really let SSIS <strong><em>rock</em></strong> using CozyRoc&#8217;s dynamic DFT, but the inability to handle bad rows in a data warehousing solution is a showstopper (keep in mind the issue here is an <em>SSIS design constraint</em>, <strong><em>not</em></strong> a CozyRoc fault).  Following the examples for handling static mappings online (thank you very much, above-linked article authors), we had the notion that we should be able to pull some of the DFT information out at runtime and approach the problem somewhat similarly.</p>
<ul>
<li>Upon startup, obtain a list of all columns, their IDs, and their Lineage IDs</li>
<li>Store that list in a collection</li>
<li>Use the IDs / Lineage IDs from the errors to look up the corresponding record in our collection</li>
<li>Profit</li>
</ul>
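<p>To make the idea concrete, the steps above can be sketched outside of SSIS. This is a plain Python illustration, not SSIS or CozyRoc API code: <code>ColumnInfo</code> and <code>ColumnRegistry</code> are hypothetical names, the ID 33 / Lineage ID 34 pair is borrowed from the &#8220;employee_id&#8221; example earlier, and the <code>first_name</code> IDs are made up.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical record describing one column at one point in the flow."""
    component: str   # e.g. "OLE DB Source"
    name: str        # e.g. "employee_id"
    column_id: int
    lineage_id: int


class ColumnRegistry:
    """Collects column metadata up front so it can be looked up later by ID."""

    def __init__(self):
        self._by_lineage_id = {}
        self._by_column_id = {}

    def register(self, info):
        # Index the same record under both IDs, since an error row
        # may report either one.
        self._by_lineage_id[info.lineage_id] = info
        self._by_column_id[info.column_id] = info

    def find(self, error_column):
        """Resolve the ID reported with an error row, trying Lineage ID first."""
        return self._by_lineage_id.get(error_column) or self._by_column_id.get(error_column)


# Steps 1 and 2: gather the columns and store them in a collection.
registry = ColumnRegistry()
registry.register(ColumnInfo("OLE DB Source", "employee_id", column_id=33, lineage_id=34))
registry.register(ColumnInfo("OLE DB Source", "first_name", column_id=40, lineage_id=41))

# Step 3: look up the ID that came back with the error row.
hit = registry.find(34)      # resolves to the employee_id column
miss = registry.find(99999)  # unknown IDs come back as None
```

<p>In a real package the collection would be filled from the data flow&#8217;s own metadata rather than by hand; the sketch only shows the shape of the lookup.</p>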
<p>I rang up CozyRoc and discussed the situation with their engineers.  They immediately understood my intentions and mailed me back a quick sample of some code that exploited a fantastic capability of their dynamic DFT &#8211; the ability to <em>add script to the DFT itself</em>. (Thanks, CozyRoc!)  Not code via a script task <em>within</em> the DFT, but on the DFT directly.</p>
<p>CozyRoc DFT+ (<a href="http://www.cozyroc.com/ssis/data-flow-task">http://www.cozyroc.com/ssis/data-flow-task</a>) notes that you can apply script on the DFT by accessing&#8230;</p>
<ul>
<li><strong>Advanced</strong> tab &#8211; specifies advanced task options.</li>
<li><strong>Script</strong> page &#8211; specifies data flow task script, which is used for <strong>Setup</strong> tab customization.</li>
</ul>
<p>Aha.  And the magic snippet they supplied me&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">public <span style="color: #993333;">void</span> OnColumnAdded<span style="color: #009900;">&#40;</span>IDTSComponentMetaData100 component<span style="color: #339933;">,</span> bool isInput<span style="color: #339933;">,</span> string colName<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
      <span style="color: #666666; font-style: italic;">// do stuff, e.g. record the column for later lookup</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Great!  They provided event hooks for the dynamic column mapping!  So now I can detect when a column is added to the DFT flow, add it to my reference collection of column information, and then access that collection within the DFT to derive column information critical to error logging.</p>
<p>This will let me take &#8220;Lineage ID&#8221; 12345 at <em>any</em> point throughout the flow and figure out that it was column &#8220;employee_name_concat&#8221; or whatever and log that.  We&#8217;re in business.</p>
<p>Something to note here.  Handling row truncation behavior is trickier when you&#8217;re doing this dynamically.  You can no longer manually address the need to &#8220;redirect on truncation&#8221; on a column by column basis, so you just extend the magic DFT+ column binding event to do it for you.</p>
<p>&nbsp;</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>isInput<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
      IDTSOutputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">OutputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">OutputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      column.<span style="color: #202020;">TruncationRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
      column.<span style="color: #202020;">ErrorRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Done. Setting row disposition behavior accomplished.</p>
<p>From there we wrote up the nastier parts of the whole exercise &#8211; the entire collection lookup mechanism to derive column information.  We did that as a script task within the body of the DFT.</p>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_7_dft_script.png"><img class="alignnone size-full wp-image-671" title="etl_asst_error_log_7_dft_script" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_7_dft_script.png" alt="ETL Assistant - SSIS DFT Error Script Task" width="344" height="556" /></a></p>
<p>The script task pulls rows out of the buffer and evaluates row position to resolve the Lineage ID / ID and determine&#8230;</p>
<ul>
<li>Source column name (EX: &#8220;first_name&#8221;)</li>
<li>Source primary key name (EX: &#8220;employee_id&#8221;)</li>
<li>Source primary key value (EX: &#8220;12345&#8221;)</li>
<li>Error description (using ComponentMetaData.GetErrorDescription)</li>
<li>Error data (so we can quickly eyeball the offending column)</li>
</ul>
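<p>As a rough sketch of how those five pieces fit together, here is a stand-alone C# version that uses plain dictionaries in place of the SSIS buffer.  Everything named here (ErrorRecord, ErrorRecordBuilder, the describeError delegate, lineage ID 101) is an assumption for illustration.  In the real script component the description comes from ComponentMetaData.GetErrorDescription, which only exists inside a running SSIS component, so a delegate stands in for it.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System;
using System.Collections.Generic;
&nbsp;
// Hypothetical shape of one logged error row.
public class ErrorRecord
{
    public string ColumnName;        // source column name
    public string KeyName;           // primary key column name
    public string KeyValue;          // primary key value for the failed row
    public int ErrorCode;            // SSIS error code
    public string ErrorDescription;  // human-readable description
    public string ErrorData;         // the offending value
}
&nbsp;
public static class ErrorRecordBuilder
{
    public static ErrorRecord Build(
        IDictionary&lt;int, string&gt; colnames,  // lineage ID -&gt; column name
        IDictionary&lt;string, object&gt; row,    // column name -&gt; value for the failed row
        string keyColumn,
        int errorColumn,
        int errorCode,
        Func&lt;int, string&gt; describeError)    // stand-in for GetErrorDescription
    {
        string name;
        if (!colnames.TryGetValue(errorColumn, out name)) name = &quot;(unknown)&quot;;
&nbsp;
        object keyValue, badValue;
        row.TryGetValue(keyColumn, out keyValue);
        row.TryGetValue(name, out badValue);
&nbsp;
        return new ErrorRecord
        {
            ColumnName = name,
            KeyName = keyColumn,
            KeyValue = Convert.ToString(keyValue),
            ErrorCode = errorCode,
            ErrorDescription = describeError(errorCode),
            ErrorData = Convert.ToString(badValue)
        };
    }
&nbsp;
    public static void Main()
    {
        var colnames = new Dictionary&lt;int, string&gt;();
        colnames[101] = &quot;first_nm&quot;; // lineage ID 101 is made up
&nbsp;
        var row = new Dictionary&lt;string, object&gt;();
        row[&quot;employee_id&quot;] = 6;
        row[&quot;first_nm&quot;] = &quot;Bartholomew&quot;;
&nbsp;
        ErrorRecord rec = Build(colnames, row, &quot;employee_id&quot;, 101, -1071607689,
            code =&gt; &quot;SSIS error &quot; + code);
        Console.WriteLine(rec.KeyName + &quot;=&quot; + rec.KeyValue + &quot; &quot; + rec.ColumnName + &quot;: &quot; + rec.ErrorData);
    }
}</pre></div></div>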
<p>You&#8217;ll note I said &#8220;primary key name&#8221; &#8211; we felt it was &#8220;good enough&#8221; for the moment to avoid dealing with compound keys.  That&#8217;s definitely a shortcoming, but for the time being we felt that was acceptable since it matched our existing static ETL error handling process.  It&#8217;s definitely something that needs to be addressed, though.  We also cheat by explicitly passing in the primary key as an element of the process (we derive it at an earlier step) &#8211; again, in consulting speak, an &#8220;opportunity for improvement.&#8221;</p>
<h2>Putting it All Together</h2>
<p>Now that we&#8217;ve touched on the ideas, let&#8217;s see it work.  Rather than walk you through the entire step-by-step process of building a package, I&#8217;m going to suggest you download the <a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/dynamic_dft_error_handler_example.zip">Dynamic DFT Error Handler Sample Project</a>.  I&#8217;ll quickly touch on the major points of how the sample works.</p>
<p>The download includes some SQL scripts to set up&#8230;</p>
<ul>
<li>[<strong>etl_proxy_table</strong>, <strong>etl_proxy_table_stg</strong>, <strong>etl_proxy_table_src</strong>] We use some fake placeholder &#8220;proxy&#8221; tables so you can set up data bindings in the DFT+.  CozyRoc also suggests you use THUNK_COLUMNs to do this, but I&#8217;ve found using these placeholder tables to be very helpful.  The reason we use these is that the magic OnColumnAdded method <em>only fires when a column is actually added</em> to the DFT.  If you statically map any of the columns, the entire error handling approach will fail because we won&#8217;t have those &#8220;static&#8221; columns added to our column collection.  Huge thank-you to CozyRoc for clueing me in on that.</li>
<li>[<strong>etl_errors</strong>] our error logging table. YMMV, but remember if you change this you also need to adjust the scripts in the DFT.</li>
<li>[<strong>demo_source_table, demo_dest_table</strong>] our source and destination tables.  We&#8217;re big Simpsons fans over here, so I&#8217;ve provided appropriate sample data.</li>
</ul>
<p>The overall package has a few steps:</p>
<ul>
<li><strong>["Set Table Information"]</strong> - A cheater <strong>Script Task</strong> to mimic pulling table configuration information.  In a production scenario you&#8217;d likely want to provide configuration elements from either a config file or, better yet, a configuration table.</li>
<li><strong>["SQL Get First table_keycol name"]</strong> - An <strong>Execute SQL</strong> task which we&#8217;ll use to pull out primary key information from our destination table.  This just uses INFORMATION_SCHEMA to look up your target table and pull back the first column for the primary key.  If you use unique constraints or something else, just tweak the SQL or overwrite the destination variable.</li>
<li><strong>["Truncate Destination Table"]</strong> - A second <strong>Execute SQL</strong> task to truncate our destination table (for a full load)</li>
<li><strong>["Data Flow Task Plus"]</strong> - A <strong>CozyRoc DFT+</strong> task for our dynamic loading.  The brains of the operation.</li>
</ul>
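<p>For the primary key lookup step, the query could look something like the sketch below.  This is an assumption about its shape, not the SQL shipped in the download: the PrimaryKeyLookup class, the @schema / @table parameter names, and the use of SqlClient-style named parameters are all mine (an OLE DB connection would use ? placeholders instead).  The INFORMATION_SCHEMA views and columns are standard SQL Server ones.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System;
using System.Data;
&nbsp;
public static class PrimaryKeyLookup
{
    // First primary key column by ordinal position for one table.
    public const string Query = @&quot;
SELECT TOP 1 kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
  ON kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
 AND kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
  AND tc.TABLE_SCHEMA = @schema
  AND tc.TABLE_NAME = @table
ORDER BY kcu.ORDINAL_POSITION;&quot;;
&nbsp;
    // Returns null when the table has no primary key.
    // The caller is expected to pass an already-open connection.
    public static string FirstKeyColumn(IDbConnection openConnection, string schema, string table)
    {
        using (IDbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = Query;
            AddParameter(cmd, &quot;@schema&quot;, schema);
            AddParameter(cmd, &quot;@table&quot;, table);
            object result = cmd.ExecuteScalar();
            return (result == null || result == DBNull.Value) ? null : (string)result;
        }
    }
&nbsp;
    private static void AddParameter(IDbCommand cmd, string name, string value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}</pre></div></div>

<p>Taking only the first column mirrors the single-key limitation described earlier; a compound key would need the TOP 1 removed and every returned column carried through to the error log.</p>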
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_5_process_flowl.png"><img class="alignnone size-full wp-image-682" title="etl_asst_error_log_5_process_flowl" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_5_process_flowl.png" alt="ETL Assistant - Dynamic Error Handling - Overall Package Step Flow" width="710" height="595" /></a></p>
<p>We also have variables.  In our production deployment we have lots and lots of variables.</p>
<p>The major points here are:</p>
<ul>
<li><strong>table_colmap</strong> is a System.Object that is our collection of column names, IDs, and Lineage IDs for all columns in our DFT.  I scoped this to our DFT+ task because it&#8217;s specific to that task, but you could get away with scoping it to the package.</li>
<li>Everything else.  We&#8217;re more or less mimicking the variables we used in previous articles.</li>
</ul>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_3_variables.png"><img title="etl_asst_error_log_3_variables" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_3_variables.png" alt="ETL Assistant - Dynamic Error - Variables" width="654" height="381" /></a></p>
<p>Let&#8217;s move on to the DFT.  Open up the DFT+.  You&#8217;re going to see two main paths:</p>
<ul>
<li>We had an issue obtaining the source data.  (right side) Yes.  This does happen.  Case in point &#8211; you have a date of &#8220;-4444 AD&#8221; in Oracle.  The OLEDB driver we use for Oracle really doesn&#8217;t like that.  Or even a 44 digit numeric.</li>
<li>We had an issue writing to the destination table. (left side)</li>
</ul>
<div>In both paths we simply channel the error rows to our error handler script task to process the buffer and do its magic.  I cheat by seeding the flow with additional error columns we overwrite within the task.  Mainly because I&#8217;m too lazy to magically add columns to the buffer myself from within the script task.</div>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_4_error_flow_final.png"><img class="alignnone size-full wp-image-685" title="etl_asst_error_log_4_error_flow_final" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2012/01/etl_asst_error_log_4_error_flow_final.png" alt="ETL Assistant - Dynamic Error Flow DFT+" width="492" height="652" /></a></p>
<p>Let&#8217;s give it a whirl and see what happens.</p>
<p>I&#8217;ve intentionally created opportunities for problems.</p>
<table width="564" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="118" />
<col width="30" />
<col width="111" />
<col width="77" />
<col width="33" />
<col width="117" />
<col width="78" /> </colgroup>
<tbody>
<tr>
<td width="118" height="21"><strong>Column</strong></td>
<td width="30"></td>
<td width="111"><strong>Source</strong></td>
<td width="77"></td>
<td width="33"></td>
<td width="117"><strong>Destination</strong></td>
<td width="78"></td>
</tr>
<tr>
<td height="17"><strong>column_name</strong></td>
<td></td>
<td><strong>DATA_TYPE</strong></td>
<td><strong>MAX_LEN</strong></td>
<td></td>
<td><strong>DATA_TYPE</strong></td>
<td><strong>MAX_LEN</strong></td>
</tr>
<tr>
<td height="17">employee_id</td>
<td></td>
<td>int</td>
<td>NULL</td>
<td></td>
<td>int</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">employee_guid</td>
<td></td>
<td>uniqueidentifier</td>
<td>NULL</td>
<td></td>
<td>uniqueidentifier</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">email_addr</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">15</span></td>
</tr>
<tr>
<td height="17">first_nm</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">10</span></td>
</tr>
<tr>
<td height="17">last_nm</td>
<td></td>
<td>varchar</td>
<td align="right">20</td>
<td></td>
<td>varchar</td>
<td align="right"><span style="color: #ff0000;">10</span></td>
</tr>
<tr>
<td height="17">awesomeness</td>
<td></td>
<td>bigint</td>
<td>NULL</td>
<td></td>
<td><span style="color: #ff0000;">int</span></td>
<td>NULL</td>
</tr>
<tr>
<td height="17">create_dts</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
</tr>
<tr>
<td height="17">modified_dts</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
<td></td>
<td>datetime</td>
<td>NULL</td>
</tr>
</tbody>
</table>
<p>The destination columns will have conversion issues with</p>
<ul>
<li>email_addr length</li>
<li>first_nm length</li>
<li>last_nm length</li>
<li>awesomeness (rating) size</li>
</ul>
<table width="567" border="0" cellspacing="0" cellpadding="0">
<colgroup>
<col width="81" />
<col width="161" />
<col width="107" />
<col width="101" />
<col width="117" /> </colgroup>
<tbody>
<tr>
<td width="81" height="17"><strong>employee_id</strong></td>
<td width="161"><strong>email_addr</strong></td>
<td width="107"><strong>first_nm</strong></td>
<td width="101"><strong>last_nm</strong></td>
<td width="117"><strong>awesomeness</strong></td>
</tr>
<tr>
<td align="right" height="17">1</td>
<td>jjones@test.org</td>
<td>Jimbo</td>
<td>Jones</td>
<td align="right">25</td>
</tr>
<tr>
<td align="right" height="17">2</td>
<td><span style="color: #ff0000;">captain@test.org</span></td>
<td>Horatio</td>
<td>McCallister</td>
<td align="right">100000</td>
</tr>
<tr>
<td align="right" height="17">3</td>
<td>homer@test.org</td>
<td>Homer</td>
<td>Simpson</td>
<td align="right">25000</td>
</tr>
<tr>
<td align="right" height="17">4</td>
<td>marge@test.org</td>
<td>Marjorie</td>
<td>Simpson</td>
<td align="right"><span style="color: #ff0000;">250000000000</span></td>
</tr>
<tr>
<td align="right" height="17">5</td>
<td><span style="color: #ff0000;">cruiser@test.org</span></td>
<td>Waylon</td>
<td>Smithers</td>
<td align="right">100</td>
</tr>
<tr>
<td align="right" height="17">6</td>
<td>bart@test.org</td>
<td><span style="color: #ff0000;">Bartholomew</span></td>
<td>Simpson</td>
<td align="right">25</td>
</tr>
<tr>
<td align="right" height="17">7</td>
<td><span style="color: #ff0000;">lisasimpson@test.org</span></td>
<td>Lisa</td>
<td>Simpson</td>
<td align="right">25</td>
</tr>
</tbody>
</table>
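<p>If you want to sanity-check which sample rows should trip which column, the arithmetic is easy to sketch in plain C#.  The DestinationFitCheck class and Misfits method are hypothetical names; the widths (15, 10, 10) and the int range come straight from the destination column definitions above.  This only mimics the comparison, it doesn&#8217;t run anything through SSIS.</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System;
using System.Collections.Generic;
&nbsp;
public static class DestinationFitCheck
{
    // Destination widths from the column table above.
    private const int EmailMax = 15, FirstMax = 10, LastMax = 10;
&nbsp;
    // Lists the columns in one sample row that would not fit the destination.
    public static List&lt;string&gt; Misfits(string email, string first, string last, long awesomeness)
    {
        var bad = new List&lt;string&gt;();
        if (email.Length &gt; EmailMax) bad.Add(&quot;email_addr&quot;);
        if (first.Length &gt; FirstMax) bad.Add(&quot;first_nm&quot;);
        if (last.Length &gt; LastMax) bad.Add(&quot;last_nm&quot;);
        if (awesomeness &gt; int.MaxValue || awesomeness &lt; int.MinValue) bad.Add(&quot;awesomeness&quot;);
        return bad;
    }
&nbsp;
    public static void Main()
    {
        // Row 4: bigint value too large for an int destination.
        Console.WriteLine(string.Join(&quot;, &quot;, Misfits(&quot;marge@test.org&quot;, &quot;Marjorie&quot;, &quot;Simpson&quot;, 250000000000L).ToArray()));
        // Row 6: 11-character first name into varchar(10).
        Console.WriteLine(string.Join(&quot;, &quot;, Misfits(&quot;bart@test.org&quot;, &quot;Bartholomew&quot;, &quot;Simpson&quot;, 25L).ToArray()));
    }
}</pre></div></div>

<p>Run against row 2, the same check flags both email_addr and last_nm (&#8220;McCallister&#8221; is 11 characters), while the error log that follows names only email_addr for that row.  Presumably only one failing column is recorded per redirected row.</p>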
<p>If we run the package and review our error log we&#8217;ll see failures related to the highlighted columns.  (Note that I&#8217;ve removed some elements of the exception log here solely for formatting)</p>
<p>&nbsp;</p>
<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="49" height="17">error_id</td>
<td width="47">record_id</td>
<td width="59">record_id_dsc</td>
<td width="73">column_nm</td>
<td width="82">error_id</td>
<td width="515">error_dsc</td>
<td width="161">error_data</td>
</tr>
<tr>
<td align="right" height="34">8</td>
<td align="right">2</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>captain@test.org</td>
</tr>
<tr>
<td align="right" height="34">9</td>
<td align="right">4</td>
<td>employee_id</td>
<td>awesomeness</td>
<td align="right">-1071607686</td>
<td width="315">Conversion failed because the data value overflowed the type used by the provider.</td>
<td align="right">250000000000</td>
</tr>
<tr>
<td align="right" height="34">10</td>
<td align="right">5</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>cruiser@test.org</td>
</tr>
<tr>
<td align="right" height="34">11</td>
<td align="right">6</td>
<td>employee_id</td>
<td>first_nm</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>Bartholomew</td>
</tr>
<tr>
<td align="right" height="34">12</td>
<td align="right">7</td>
<td>employee_id</td>
<td>email_addr</td>
<td align="right">-1071607689</td>
<td width="315">The data value cannot be converted for reasons other than sign mismatch or data overflow.</td>
<td>lisasimpson@test.org</td>
</tr>
</tbody>
</table>
<p>&#8220;You&#8217;re failing, Seymour! What is it about you and failure?&#8221;</p>
<p>There you go &#8211; row exceptions being logged for various issues with data from the dynamic DFT.</p>
<h2>How Denali Should Fix This</h2>
<p>We&#8217;re eagerly anticipating Denali for several reasons, but one fantastic piece of news is that SSIS in Denali should let us bypass most if not all of the issues with LineageID.  As Jorg Klein notes in one of his blog posts (<a href="http://sqlblog.com/blogs/jorg_klein/archive/2011/07/22/ssis-denali-ctp3-what-s-new.aspx">http://sqlblog.com/blogs/jorg_klein/archive/2011/07/22/ssis-denali-ctp3-what-s-new.aspx</a>):</p>
<blockquote><p>SSIS always mapped columns from source to transformations or destinations with the help of lineage ids. Every column had a unique metadata ID that was known by all components in the data flow. If something changed in the source this would break the lineage ids and raised error messages like: The external metadata column collection is out of synchronization with the data source columns.<br />
To fix this error you would re-map all broken lineage ids with the “Restore Invalid Column References Editor”.<br />
In Denali lineage-ids are no longer used. Mappings are done on column names, which is great because you can now use auto map on column names and even copy/paste pieces of another data flow and connect them by mapping the corresponding column names.</p></blockquote>
<p>Fan.  Tastic.  Couldn&#8217;t come soon enough.  Granted, you&#8217;ll have to upgrade to Denali to make use of this, but there are so many other compelling reasons to migrate (<a href="http://www.brentozar.com/sql/sql-server-denali-2011-2012/">http://www.brentozar.com/sql/sql-server-denali-2011-2012/</a>) that this is just icing on the cake.</p>
<p>&nbsp;</p>
<h1>Appendix &#8211; Code</h1>
<p>This code is provided in the download, but for quick access / reference I&#8217;m also including it here.</p>
<h2>DFT+ Column Collection Script</h2>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">using System<span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Data</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Windows</span>.<span style="color: #202020;">Forms</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/*
&nbsp;
 Add references to ...
     CozyRoc.SSISPlus.2008
     Microsoft.SqlServer.DTSPipelineWrap
     Microsoft.SQLServer.DTSRuntimeWrap 
&nbsp;
 */</span>
&nbsp;
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using CozyRoc.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">SSIS</span>.<span style="color: #202020;">Attributes</span><span style="color: #339933;">;</span>
using CozyRoc.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">SSIS</span><span style="color: #339933;">;</span>
&nbsp;
using System.<span style="color: #202020;">Collections</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Collections</span>.<span style="color: #202020;">Generic</span><span style="color: #339933;">;</span>
&nbsp;
namespace ST_44af5cee356540e294c47d0aa17d41ed.<span style="color: #202020;">csproj</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #009900;">&#91;</span>System.<span style="color: #202020;">AddIn</span>.<span style="color: #202020;">AddIn</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;ScriptMain&quot;</span><span style="color: #339933;">,</span> Version <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;1.0&quot;</span><span style="color: #339933;">,</span> Publisher <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #339933;">,</span> Description <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>
    <span style="color: #009900;">&#91;</span>DataFlowColumnAdded<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;OnColumnAdded&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span><span style="color: #666666; font-style: italic;">//CozyRoc annotation</span>
    public partial class ScriptMain <span style="color: #339933;">:</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Tasks</span>.<span style="color: #202020;">ScriptTask</span>.<span style="color: #202020;">VSTARTScriptObjectModelBase</span>
    <span style="color: #009900;">&#123;</span>
&nbsp;
        <span style="color: #339933;">#region VSTA generated code</span>
        <span style="color: #000000; font-weight: bold;">enum</span> ScriptResults
        <span style="color: #009900;">&#123;</span>
            Success <span style="color: #339933;">=</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">DTSExecResult</span>.<span style="color: #202020;">Success</span><span style="color: #339933;">,</span>
            Failure <span style="color: #339933;">=</span> Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">DTSExecResult</span>.<span style="color: #202020;">Failure</span>
        <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
        <span style="color: #339933;">#endregion</span>
&nbsp;
        public <span style="color: #993333;">void</span> OnColumnAdded<span style="color: #009900;">&#40;</span>IDTSComponentMetaData100 component<span style="color: #339933;">,</span> bool isInput<span style="color: #339933;">,</span> string colName<span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>isInput<span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
                    IDTSOutputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">OutputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">OutputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                    column.<span style="color: #202020;">TruncationRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
                    column.<span style="color: #202020;">ErrorRowDisposition</span> <span style="color: #339933;">=</span> DTSRowDisposition.<span style="color: #202020;">RD_RedirectRow</span><span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>isInput<span style="color: #009900;">&#41;</span>
                <span style="color: #009900;">&#123;</span>
&nbsp;
                    IDTSInputColumn100 column <span style="color: #339933;">=</span> component.<span style="color: #202020;">InputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">InputColumnCollection</span><span style="color: #009900;">&#91;</span>colName<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                    Dictionary&lt;int, string&gt; colmap <span style="color: #339933;">=</span> new Dictionary&lt;int, string&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                    Variables variables <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">null</span><span style="color: #339933;">;</span>
&nbsp;
                    try
                    <span style="color: #009900;">&#123;</span>
                        Dts.<span style="color: #202020;">VariableDispenser</span>.<span style="color: #202020;">LockOneForWrite</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #339933;">,</span> ref variables<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span>.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> colmap.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            colmap <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>Dictionary&lt;int, string&gt;<span style="color: #009900;">&#41;</span>variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span><span style="color: #339933;">;</span>
                        <span style="color: #009900;">&#125;</span>
                        <span style="color: #b1b100;">else</span>
                        <span style="color: #009900;">&#123;</span>
                        <span style="color: #009900;">&#125;</span>
                        colmap.<span style="color: #202020;">Add</span><span style="color: #009900;">&#40;</span>column.<span style="color: #202020;">ID</span><span style="color: #339933;">,</span> column.<span style="color: #202020;">Name</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                        variables<span style="color: #009900;">&#91;</span><span style="color: #ff0000;">&quot;User::table_colmap&quot;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">Value</span> <span style="color: #339933;">=</span> colmap<span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//put the column collection back into the variable</span>
                    <span style="color: #009900;">&#125;</span>
                    catch <span style="color: #009900;">&#40;</span>Exception exi<span style="color: #009900;">&#41;</span>
                    <span style="color: #009900;">&#123;</span>
                    <span style="color: #009900;">&#125;</span>
                    finally
                    <span style="color: #009900;">&#123;</span>
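                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>variables <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> variables.<span style="color: #202020;">Unlock</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//LockOneForWrite may have thrown before assigning variables</span>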
                    <span style="color: #009900;">&#125;</span>
                <span style="color: #009900;">&#125;</span>
            <span style="color: #009900;">&#125;</span>
            catch
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
        <span style="color: #009900;">&#125;</span>
&nbsp;
        public <span style="color: #993333;">void</span> Main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
&nbsp;
            Dts.<span style="color: #202020;">TaskResult</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span><span style="color: #009900;">&#41;</span>ScriptResults.<span style="color: #202020;">Success</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<h2>Error Row Handler Script</h2>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/* Microsoft SQL Server Integration Services Script Component
*  This is CozyRoc Script Component Plus Extended Script
*  Write scripts using Microsoft Visual C# 2008.
*  ScriptMain is the entry point class of the script.*/</span>
&nbsp;
using System<span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Text</span><span style="color: #339933;">;</span>
&nbsp;
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
using Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Runtime</span>.<span style="color: #202020;">Wrapper</span><span style="color: #339933;">;</span>
&nbsp;
using System.<span style="color: #202020;">Collections</span><span style="color: #339933;">;</span>
using System.<span style="color: #202020;">Collections</span>.<span style="color: #202020;">Generic</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//for our dictionaries / lists</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/*
 * HOW IT WORKS:
 * ======================================================================
 * Thanks to CozyRoc's great sample code (thanks, CozyRoc! :), we're able to rip through the
 * set of columns and find the error info critical for our logging / fixing.  We get the basic
 * column info on PreExecute() and store the column names, column lineage IDs, and column relative
 * position (&quot;index&quot;) in two separate dictionaries for later.  We use LineageID as the key for those
 * and then later on during Input_ProcessInputRow to look up those names and IDs so we can pull back
 * data from the buffer and then also UPDATE the buffer to overwrite our custom error info columns
 *
 * Dictionary 1: Set of column names (&quot;colnames&quot;) key: LineageID, value: column.Name
 * Dictionary 2: Set of column relative positions (&quot;colids&quot;) key: LineageID, value colIndex
 *
 * PreExecute - set up objects for later.  IDs for columns, dictionaries, variables, etc.
 * Input_ProcessInputRow - the &quot;real work&quot; of adjusting / setting the values in the columns
 *
 * SETUP - READ THIS OR IT WON'T WORK
 * ======================================================================
 * REQUIRED INPUT COLUMNS
 * -------
 * We anticipate the following input columns being present (sent to the script task as inputs)
 *
 * Standard &quot;Error Output&quot; columns from tasks
 * ------
 * ErrorColumn      MSFT - The Lineage ID for the error column
 * ErrorCode        MSFT - The SSIS error code
 *
 * Additional error columns specific to our purposes.  You can reuse these or update the column names
 * ------
 * error_id         CUSTOM - Same as the SSIS error code, but we need them for our table
 * column_nm        CUSTOM - The name of the column where the error occurred
 * record_id_dsc    CUSTOM - the column name for the &quot;primary key&quot; column (EX: employee_id)
 * record_id        CUSTOM - the value/ID for the &quot;primary key&quot; column so you can look up the row later
 *                              EX:&quot;12345&quot; in column &quot;employee_id&quot;
 *
 * error_id         CUSTOM - the SSIS error (same as ErrorCode, but for my purposes we left it here)
 * error_dsc        CUSTOM - the human-readable description of the SSIS error EX: &quot;The data was truncated.&quot;
 *
 * REQUIRED VARIABLES
 * -------
 * NOTE: You MUST set these up as read-only variables within your script task.
 *
 * Package Variable: @colmap (dictionary) - the collection of column names and IDs for our dynamic columns
 *                                        - this is set in the outer DFT+ OnColumnAdded()
 *                                        - we use this to pull out the full list of columns since we can't get ahold
 *                                        - of the prior step's column IDs/LineageIDs when we're in this script task
 *
 * Package Variable: @table_keycol (string) - the name of the column that represents your primary key
 *                                              EX: &quot;employee_id&quot;
 *
 * This is a cheap hack, but for my situation I'm OK with that.  We don't necessarily know what a &quot;key&quot;
 * column is at this point - primary key, I mean here.  So to get around that we set that value in a variable
 * within the overall package.  We then use that variable to say &quot;oh, that's the key column&quot; later and retrieve
 * the column name and the column value so we can write out our primary key reference info.  You'll see the
 * obvious limitation - we don't support compound primary keys.  But neither does my logging table, so...
 * 
&nbsp;
*/</span>
<span style="color: #009900;">&#91;</span>Microsoft.<span style="color: #202020;">SqlServer</span>.<span style="color: #202020;">Dts</span>.<span style="color: #202020;">Pipeline</span>.<span style="color: #202020;">SSISScriptComponentEntryPointAttribute</span><span style="color: #009900;">&#93;</span>
public class ScriptMain <span style="color: #339933;">:</span> UserComponent
<span style="color: #009900;">&#123;</span>
&nbsp;
    private <span style="color: #993333;">int</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> m_idx<span style="color: #339933;">;</span>
&nbsp;
    private string key_col_name<span style="color: #339933;">;</span>        <span style="color: #666666; font-style: italic;">//&quot;name&quot; of primary key column.  EX: &quot;employee_id&quot;.</span>
    private <span style="color: #993333;">int</span> key_col_id<span style="color: #339933;">;</span>             <span style="color: #666666; font-style: italic;">//Relative column index / position of our &quot;primary key&quot; column</span>
    <span style="color: #666666; font-style: italic;">//single primary key column.  Does not handle compound primary keys.  Retrieve this from a package variable since we want to handle this</span>
    <span style="color: #666666; font-style: italic;">//dynamically and can't automatically determine it from within the package at runtime</span>
&nbsp;
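    private Dictionary&lt;int, string&gt; colnames<span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">//LineageID -&gt; column name; collection to store our colnames for later use within row processing section</span>
    private Dictionary&lt;string, int&gt; colpositions<span style="color: #339933;">;</span>       <span style="color: #666666; font-style: italic;">//column name -&gt; column index position</span>
    private Dictionary&lt;int, int&gt; colidsbyposition<span style="color: #339933;">;</span>      <span style="color: #666666; font-style: italic;">//column index position -&gt; LineageID</span>
    private Dictionary&lt;int, int&gt; colids<span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">//LineageID -&gt; column index position; collection to store our column ids for later use within row processing section</span>
    private Dictionary&lt;int, string&gt; colmap<span style="color: #339933;">;</span>           <span style="color: #666666; font-style: italic;">//LineageID -&gt; column name, loaded from the tablecolmap package variable</span>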
&nbsp;
    <span style="color: #666666; font-style: italic;">//Internal column tracking numbers.</span>
    <span style="color: #666666; font-style: italic;">//You could probably avoid using these as separate variables, but...</span>
    <span style="color: #666666; font-style: italic;">// 1. I'm not that clever</span>
    <span style="color: #666666; font-style: italic;">// 2. I really, really wanted to explicitly watch them as they moved around</span>
    private <span style="color: #993333;">int</span> i_error_code_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_column_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_column_nm<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_record_id<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_record_id_dsc<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_dsc<span style="color: #339933;">;</span>
    private <span style="color: #993333;">int</span> i_error_data<span style="color: #339933;">;</span>
&nbsp;
    StringBuilder _sbColIDs <span style="color: #339933;">=</span> new StringBuilder<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    StringBuilder _sbErrorCols <span style="color: #339933;">=</span> new StringBuilder<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    private bool isSourceErrorOutput <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">false</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">// = true;</span>
    private string _OLEDBSourceType <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;&quot;</span><span style="color: #339933;">;</span>
&nbsp;
    public override <span style="color: #993333;">void</span> PreExecute<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        base.<span style="color: #202020;">PreExecute</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        colnames <span style="color: #339933;">=</span> new Dictionary&lt;int, string&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colpositions <span style="color: #339933;">=</span> new Dictionary&lt;string, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colids <span style="color: #339933;">=</span> new Dictionary&lt;int, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        colidsbyposition <span style="color: #339933;">=</span> new Dictionary&lt;int, int&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        colmap <span style="color: #339933;">=</span> new Dictionary&lt;int, string&gt;<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        try
        <span style="color: #009900;">&#123;</span>
            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>Variables.<span style="color: #202020;">tablecolmap</span>.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> colmap.<span style="color: #202020;">GetType</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
                colmap <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>Dictionary&lt;int, string&gt;<span style="color: #009900;">&#41;</span>Variables.<span style="color: #202020;">tablecolmap</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
        catch <span style="color: #009900;">&#40;</span>Exception exi<span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
        IDTSInput100 input <span style="color: #339933;">=</span> base.<span style="color: #202020;">ComponentMetaData</span>.<span style="color: #202020;">InputCollection</span><span style="color: #009900;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        IDTSVirtualInput100 virtInput <span style="color: #339933;">=</span> input.<span style="color: #202020;">GetVirtualInput</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #993333;">int</span> colsCount <span style="color: #339933;">=</span> virtInput.<span style="color: #202020;">VirtualInputColumnCollection</span>.<span style="color: #202020;">Count</span><span style="color: #339933;">;</span>
        m_idx <span style="color: #339933;">=</span> new <span style="color: #993333;">int</span><span style="color: #009900;">&#91;</span>colsCount<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        for (int colIndex = 0; colIndex &lt; colsCount; colIndex++)
        {
            IDTSVirtualInputColumn100 column = virtInput.VirtualInputColumnCollection[colIndex];
&nbsp;
            <span style="color: #666666; font-style: italic;">//================================================================</span>
            <span style="color: #666666; font-style: italic;">//pull out the error codes and column IDs</span>
            if (string.Compare(column.Name, &quot;ErrorColumn&quot;, true) == 0)
            {
                i_error_column_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;ErrorCode&quot;, true) == 0)
            {
                i_error_code_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_id&quot;, true) == 0)
            {
                i_error_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;column_nm&quot;, true) == 0)
            {
                i_column_nm = colIndex;
            }
            if (string.Compare(column.Name, &quot;record_id&quot;, true) == 0)
            {
                i_record_id = colIndex;
            }
            if (string.Compare(column.Name, &quot;record_id_dsc&quot;, true) == 0)
            {
                i_record_id_dsc = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_dsc&quot;, true) == 0)
            {
                i_error_dsc = colIndex;
            }
            if (string.Compare(column.Name, &quot;error_data&quot;, true) == 0)
            {
                i_error_data = colIndex;
            }
&nbsp;
            <span style="color: #666666; font-style: italic;">//add our column names to our list for later use</span>
            colnames.Add(column.LineageID, column.Name); <span style="color: #666666; font-style: italic;">//column.LineageID used to look up index of error column name in row</span>
            colids.Add(column.LineageID, colIndex); <span style="color: #666666; font-style: italic;">//column.LineageID used to look up index of error column index position in row</span>
            colidsbyposition.Add(colIndex, column.LineageID);
            try
            {
                colpositions.Add(column.Name, colIndex);
            }
            catch { }
&nbsp;
            try
            {
                <span style="color: #666666; font-style: italic;">//is this column the &quot;key&quot; column we're using to identify the key values for the row? EX: primary key</span>
                <span style="color: #666666; font-style: italic;">//NOTE: we're only doing this for a single member of a compound primary key</span>
                if (string.Compare(column.Name, Variables.tablekeycol, true) == 0) <span style="color: #666666; font-style: italic;">//true = ignore case during comparison</span>
                {
                    key_col_id = colIndex;
                    key_col_name = column.Name;
                }
            }
            catch { }
            <span style="color: #666666; font-style: italic;">//================================================================</span>
&nbsp;
            m_idx[colIndex] = base.HostComponent.BufferManager.FindColumnByLineageID(
                input.Buffer,
                column.LineageID);
        }
    }
&nbsp;
    public override void PostExecute()
    {
        base.PostExecute();
    }
&nbsp;
    public override void Input_ProcessInputRow(InputBuffer Row)
    {
        int colsCount = m_idx.Length;
        int cColLineageKey;
&nbsp;
        if (colsCount &gt; 0)
        <span style="color: #009900;">&#123;</span>
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//stuff the error code into the error_id column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_code_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the value for the &quot;primary key&quot; column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_record_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>key_col_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//write the name of the &quot;primary key&quot; column</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_record_id_dsc<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> key_col_name<span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the error description</span>
                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_dsc<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>ComponentMetaData.<span style="color: #202020;">GetErrorDescription</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span>.<span style="color: #202020;">Parse</span><span style="color: #009900;">&#40;</span>Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_code_id<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">ToString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
&nbsp;
            try
            <span style="color: #009900;">&#123;</span>
                <span style="color: #666666; font-style: italic;">//get the name and value of the column that failed.</span>
                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_error_column_id <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_error_column_id <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_error_column_id  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colmap.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out columnName<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                <span style="color: #666666; font-style: italic;">//columnName should be set</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>cColLineageKey <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> cColLineageKey <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> columnName <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> columnName.<span style="color: #202020;">Length</span> <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colpositions.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>columnName<span style="color: #339933;">,</span> out currentposition<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                    <span style="color: #009900;">&#123;</span>
                                        <span style="color: #666666; font-style: italic;">//use the column name to pull the column position</span>
                                        <span style="color: #666666; font-style: italic;">//current position should be set</span>
                                    <span style="color: #009900;">&#125;</span>
                                <span style="color: #009900;">&#125;</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>cColLineageKey <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> cColLineageKey <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> currentposition <span style="color: #339933;">&gt;=</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span><span style="color: #666666; font-style: italic;">//&amp;&amp; currentposition != null)</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colidsbyposition.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>currentposition<span style="color: #339933;">,</span> out cColLineageKey<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                    <span style="color: #009900;">&#123;</span>
                                        <span style="color: #666666; font-style: italic;">//use the column position to pull the lineage_id as seen by this component</span>
                                        <span style="color: #666666; font-style: italic;">//cColLineageKey should be set</span>
                                    <span style="color: #009900;">&#125;</span>
                                <span style="color: #009900;">&#125;</span>
                            <span style="color: #009900;">&#125;</span>
                            <span style="color: #b1b100;">else</span>
                            <span style="color: #009900;">&#123;</span>
                                cColLineageKey <span style="color: #339933;">=</span> cColLineageKey <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                        <span style="color: #b1b100;">else</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #666666; font-style: italic;">//probably a &quot;source error output&quot;</span>
                            cColLineageKey <span style="color: #339933;">=</span> cColLineageKey <span style="color: #339933;">+</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
                            <span style="color: #666666; font-style: italic;">//MAJOR MAJOR MAJOR HACK</span>
                            <span style="color: #666666; font-style: italic;">//apparently, we do NOT persist the ORIGINAL LINEAGEID from source to output, so we need to... adjust... the number.</span>
                            <span style="color: #666666; font-style: italic;">// this is EXCEPTIONALLY RISKY, but since MS &quot;adjusts&quot; the output rows for errors to be &quot;different&quot; from the &quot;it works!&quot; destination</span>
                            <span style="color: #666666; font-style: italic;">// we don't have much of a choice.  In reviewing the numbers it appears they consistently increment for errors, so we need to increment the</span>
                            <span style="color: #666666; font-style: italic;">// index here to find the right value.  Horrible stuff.  Likely to break.  Enjoy.</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
&nbsp;
                    <span style="color: #666666; font-style: italic;">//Retrieve from the column names dictionary and place column name in error info</span>
                    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_column_nm <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_column_nm <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_column_nm  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                        <span style="color: #009900;">&#123;</span>
                            <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colnames.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out value<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column name</span>
                                Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_column_nm<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> value<span style="color: #339933;">;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
                    <span style="color: #666666; font-style: italic;">//get the missing column value for the key found at the identified &quot;error column&quot;</span>
                    <span style="color: #666666; font-style: italic;">//had issues where the column blew up because of data type conversion issues, so try/catch is here to help handle this</span>
                    try
                    <span style="color: #009900;">&#123;</span>
                        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>i_error_data <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">null</span> <span style="color: #339933;">&amp;&amp;</span> i_error_data <span style="color: #339933;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #339933;">&amp;&amp;</span> i_error_data  <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
                            <span style="color: #009900;">&#123;</span>
                                <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>colids.<span style="color: #202020;">TryGetValue</span><span style="color: #009900;">&#40;</span>cColLineageKey<span style="color: #339933;">,</span> out colvalue<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                                <span style="color: #009900;">&#123;</span>
                                    <span style="color: #666666; font-style: italic;">//use the lineage_id to pull the column position, then copy that column's value</span>
                                    <span style="color: #666666; font-style: italic;">//NOTE: &quot;bad&quot; data MAY be totally thrown out here, which is why we're using the try/catch</span>
                                    <span style="color: #666666; font-style: italic;">//if the custom CozyRoc row processor dies due to formatting errors then this will throw an exception</span>
                                    <span style="color: #666666; font-style: italic;">//we're just going to ignore that and roll on by</span>
                                    <span style="color: #666666; font-style: italic;">//probably worth revisiting at a later date to see if we can get at the bad data anyway</span>
                                    Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>i_error_data<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> Row.<span style="color: #202020;">Buffer</span><span style="color: #009900;">&#91;</span>m_idx<span style="color: #009900;">&#91;</span>colvalue<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#93;</span>.<span style="color: #202020;">ToString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
                                <span style="color: #009900;">&#125;</span>
                            <span style="color: #009900;">&#125;</span>
                        <span style="color: #009900;">&#125;</span>
                    <span style="color: #009900;">&#125;</span>
                    catch <span style="color: #009900;">&#40;</span>Exception vEx<span style="color: #009900;">&#41;</span>
                    <span style="color: #009900;">&#123;</span>
                    <span style="color: #009900;">&#125;</span>
&nbsp;
                <span style="color: #009900;">&#125;</span>
                <span style="color: #b1b100;">else</span>
                <span style="color: #009900;">&#123;</span>
                <span style="color: #009900;">&#125;</span>
            <span style="color: #009900;">&#125;</span>
            catch <span style="color: #009900;">&#40;</span>Exception ex<span style="color: #009900;">&#41;</span>
            <span style="color: #009900;">&#123;</span>
            <span style="color: #009900;">&#125;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Developing for NUBIC</title>
		<link>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=developing-for-nubic</link>
		<comments>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 19:46:32 +0000</pubDate>
		<dc:creator>Jeff Lunt</dc:creator>
				<category><![CDATA[NUBIC Development]]></category>
		<category><![CDATA[dev environment]]></category>
		<category><![CDATA[nubic-dev]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=574</guid>
		<description><![CDATA[NUBIC is the Northwestern University Biomedical Informatics Center in Chicago. Our developers write software, computation, and data analysis tools that support medical research. There is a famous blog post written by Joel Spolsky titled &#8220;The Joel Test.&#8221; It describes some &#8230; <a href="http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.nucats.northwestern.edu/clinical-research-resources/data-collection-biomedical-informatics-and-nubic/bioinformatics-overview.html">NUBIC</a> is the Northwestern University Biomedical Informatics Center in Chicago. Our developers write software, computation, and data analysis tools that support medical research.</p>
<hr />
<p>There is a famous blog post written by <a href="http://www.joelonsoftware.com/AboutMe.html">Joel Spolsky</a> titled <a href="http://www.joelonsoftware.com/articles/fog0000000043.html">&#8220;The Joel Test.&#8221;</a> It describes some key parts of what makes a given development shop a great (or scary) place to work. It&#8217;s also a key metric used on <a href="http://careers.stackoverflow.com/">careers.stackoverflow.com</a>, where every employer posting a job is encouraged to rate themselves and post their own score on &#8220;The Joel Test.&#8221;</p>
<p>I use &#8220;The Joel Test&#8221; as one of the ways that I critique a potential employer, what sort of value they place on developers, and software development as a practice. I&#8217;m especially passionate about software, more than any other professional or personal pursuit in my life, and I want to work for places that value the work I do for them just as highly as I value the work myself. As such, I think &#8220;The Joel Test&#8221; is a pretty good indicator of that aspect of a workplace.</p>
<p>Since I tend to agree with the values put forth in &#8220;The Joel Test,&#8221; I look for a place that scores highly. After working inside NUBIC for just a few months, I&#8217;m pretty happy with what I see here. However, rather than providing a score for NUBIC (I obviously think we score very high), I&#8217;ll just lay out what we do here:</p>
<ol>
<li><strong>Do you use source control?</strong> We have an internal Git server, and we also <a href="https://github.com/nubic">publish much of our code on GitHub</a>.</li>
<li><strong>Can you make a build in one step?</strong> We have a <a href="http://jenkins-ci.org/">Jenkins CI server</a> that is tied to our Git project repositories. It automatically runs our test suite when new code is committed to the master branch. It also sends out automated emails when a build fails.</li>
<li><strong>Do you make daily builds?</strong> As often as code is committed, it is built and tested. For projects under active development it&#8217;s common to see multiple builds per day.</li>
<li><strong>Do you have a bug database?</strong> We use a combination of <a href="http://www.redmine.org/">Redmine</a> for internal projects, and <a href="https://github.com/">GitHub</a> for open source projects.</li>
<li><strong>Do you fix bugs before writing new code?</strong> We prioritize bugs over new features. For the purpose of this question I&#8217;m defining bugs as things that are broken in the system, as opposed to simply inconvenient, or inefficient. However, we definitely label inconvenient or inefficient code as a bug if it is preventing meaningful work from getting done (rightfully so), and will prioritize it as such.</li>
<li><strong>Do you have an up-to-date schedule?</strong> We scope and schedule work for all of our projects, and keep clients in the loop at every step of the process, whether we&#8217;re ahead or behind schedule.</li>
<li><strong>Do you have a spec?</strong> The vast majority of projects don&#8217;t have specs for every feature; some of them are as simple as, &#8220;Can you change this thing from red to blue, and move it to the left by 5 pixels?&#8221; which I suppose qualifies as a spec, but may be communicated verbally as opposed to going through a formal process. However, anytime there&#8217;s major work to be done, or any place that we feel has lots of ambiguity, we work it out verbally or on a whiteboard, and then commit the details of that session back to the bug tracking system. After a project launch we also produce documentation for internal purposes, as well as training documents and screencasts for the benefit of end users.</li>
<li><strong>Do programmers have quiet working conditions?</strong> This is one of my favorite things about NUBIC. Though we do work largely in a set of cubicles, we maintain a quiet working environment in several ways. First and foremost, we respect each other by taking conversations that we expect to last more than a minute or two away from the common workspace, and into side offices with closed doors that keep the sound contained. Second, most of us also have noise-cancelling headphones that we use to either listen to our favorite music while coding, or simply to soften the ambient office noise (which is nearly silent most of the time anyway). Third, we prefer asynchronous communication (email + IM) over coming to a person&#8217;s desk unannounced. Unless you&#8217;re on the support rotation for the week, email can safely be checked just 2-3 times a day, and on a typical day I will have fewer than two IM conversations. We also like to eat lunch together often, and those lunches sometimes serve as conversation starters for projects, in place of formal meetings.</li>
<li><strong>Do you use the best tools money can buy?</strong> We&#8217;re not talking about the most expensive tools &#8211; we&#8217;re talking about the best tools to do the job. Besides personal preferences, the tools we&#8217;re given are a help, not a hindrance to our work. For example, I personally prefer <a href="http://www.fogcreek.com/fogbugz/">FogBugz</a> for issue tracking over <a href="http://github.com">GitHub</a>, due to its integrated and more advanced project management features, but GitHub is by no means a hindrance to the workflow. There are also regular, open discussions that occur regarding tool choice. Programmers are not only free to experiment with various tools, but encouraged to do so, bringing their experiences back to the group so everyone can benefit. We maintain a coding library that holds maybe 100 books that have been collected over the years, and cover topics ranging from various programming languages, to guides on design, typeface choice, and the art of software craftsmanship. As for the day-to-day tools, we&#8217;re lucky to have a solid, flexible hardware setup. Here&#8217;s a photo of the standard issue desk: 15&#8243; MacBook Pro + Apple Cinema Display, and push-pin friendly walls for hanging up notes or creating a makeshift <a href="http://www.infoq.com/resource/articles/hiranabe-lean-agile-kanban/en/resources/image5.jpg">kanban</a> board.
<p><div id="attachment_608" class="wp-caption alignnone" style="width: 459px"><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/1219111201.jpg"><img class="size-full wp-image-608" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/1219111201.jpg" alt="" width="449" height="337" /></a><p class="wp-caption-text">MacBook Pro, Apple cinema display, whiteboard for prototyping </p></div></li>
<li><strong>Do you have testers?</strong> In addition to test- and behavior-driven development practices that each developer employs, we have people who specialize in user experience and UI design. We also encourage Usability Testing, of the sort outlined in Steve Krug&#8217;s <a href="http://www.amazon.com/Rocket-Surgery-Made-Easy-Yourself/dp/0321657292">&#8220;Rocket Surgery Made Easy&#8221;</a>. Finding a coworker interested in pair programming is also very easy. In NUBIC, responsibility for testing software is a partnership between developers, clients, and groups that verify projects on staging systems, as well as anyone else who is simply interested in assisting a given project. <strong>We do not, however, have dedicated testers</strong> whose job it is to do nothing but think of ways to break the system (in order to prevent it). There&#8217;s been quite a bit of discussion about this, and we&#8217;re hoping that as our group grows, we&#8217;ll be able to add dedicated testers. The reasons for wanting dedicated testers are many, but I think the greatest benefit to having them is that having someone whose job it is to think of ways to break things is fundamentally a different role than a developer (whose job it is to think of ways to make things work). Both sides are important.</li>
<li><strong>Do new candidates write code during their interview?</strong> Developers are expected to be able to demonstrate their ability to implement a few common algorithms, as well as design a small system, live, during the interview. If you&#8217;re awesome at software, it&#8217;s not a big deal. It basically proves that you have three things: a solid background in algorithms, an ability to reason about a system and accept feedback from others on its design, and an ability to ask intelligent questions of the client about what they want.</li>
<li><strong>Do you do hallway usability testing?</strong> Some people do this more than others, but anytime you&#8217;re implementing a new feature where you&#8217;d like some feedback, it&#8217;s encouraged that you consult with people around you. The culture in NUBIC is such that I&#8217;ve never run into a person who wasn&#8217;t willing to give you five minutes of their time to examine something you&#8217;re working on. This often turns into a discussion of several different approaches to implementation, and helps lead to software that benefits from multiple perspectives, without adding a bunch of wasted time to the project schedule in formal meetings.</li>
</ol>
<p>On top of that, one of my favorite things about working at Northwestern is the focus on collaboration, and leveraging the large community of brilliant people around you. Northwestern is one of the leading research universities in the country, and it has no shortage of smart people who are passionate about what they do.</p>
<p>The panoramic views of Lake Michigan aren&#8217;t bad either.</p>
<p><a href="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/10281108181.jpg"><img class="alignnone size-full wp-image-616" src="http://informatics.northwestern.edu/blog/wp-content/uploads/2011/12/10281108181.jpg" alt="" width="2048" height="1536" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/nubic-dev-2/2011/12/developing-for-nubic/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Is Your Mobile Phone a HIPAA Violator?</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=is-your-mobile-phone-a-hipaa-violator</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 17:11:03 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>
		<category><![CDATA[HIT Policy]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=560</guid>
		<description><![CDATA[The recent firestorm over the discovery of a rootkit on many Mobile phones has raised the specter of federal wiretap violations, as discussed in a recent Forbes article.  The rootkit manufacturer, Carrier IQ, denies that it collected keystrokes.  However, a &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The recent firestorm over the discovery of a rootkit on many mobile phones has raised the specter of federal wiretap violations, as discussed in a recent <a title="Phone 'Rootkit' Maker Carrier IQ May Have Violated Wiretap Law In Millions Of Cases" href="http://www.forbes.com/sites/andygreenberg/2011/11/30/phone-rootkit-carrier-iq-may-have-violated-wiretap-law-in-millions-of-cases/">Forbes article.</a>  The rootkit manufacturer, Carrier IQ, <a href="http://www.pcmag.com/article2/0,2817,2397156,00.asp">denies that it collected keystrokes</a>.  However, a recent video post appears to show the software <a href="http://www.geek.com/articles/mobile/security-researcher-responds-to-carrieriq-with-video-proof-20111129/">doing exactly that</a>.</p>
<p>What has not hit the press yet is the issue of potential HIPAA violations.  It is important to remember that under HITECH, covered entities are responsible for breaches even if they <em>didn&#8217;t know, and by reasonable diligence would not have known.</em>  In other words, if your phone sent PHI to the phone company, you are potentially <a title="HIPAA Act Enforcement Interim Final Rule" href="http://www.hhs.gov/ocr/privacy/hipaa/administrative/enforcementrule/enfifr.pdf">liable for $100 to $50,000 per violation</a> (which will probably be interpreted as each compromised message) and up to $1.5 million total.</p>
<p>Sprint, AT&amp;T and T-Mobile admit to using Carrier IQ.  Verizon says it does not.  I&#8217;m sure that my hospital is not ready to ban all non-Verizon phones&#8230;yet.</p>
<p><del>This is not unique to mobile phones.  The new <a href="http://www.pcmag.com/article2/0,2817,2397014,00.asp">Kindle Fire </a>provides web browsing, so you might use it to access your web-based EHR, right?  However, the actual rendering of the web page is done in the Amazon Cloud and a compressed version of the page is sent to your device.  From a security standpoint, this means that Amazon must  execute what amounts to a <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack">man-in-the-middle attack</a> on your secure browsing session.  Whether Amazon looks at your data, or not, is irrelevant.  They can do it at any time, and you would be none the wiser.</del></p>
<p>Thanks to Matt, who pointed out that EFF had a nice evaluation of <a href="https://www.eff.org/2011/october/amazon-fire’s-new-browser-puts-spotlight-privacy-trade-offs">security on the Kindle Fire.</a>  It turns out that HTTPS sessions bypass the cloud rendering engine, so things are not as dire as I had thought.  Amazon also assures us that they are not logging content.  Perhaps, but since they are logging the URL and session token, they can probably reproduce much of the content at a later date.</p>
<p>Ultimately, this is why domain-specific privacy laws (e.g. health privacy, mail privacy or video rental privacy) are doomed to fail.  Our digital lives are far too interconnected for any service provider to create separate services to comply with the separate regulations.  As a health care organization, we cannot generally share location data more precise than a 3-digit zip code.  The phone company can collect, and give to advertisers, my location down to a few feet!  I&#8217;m sure that the phone company could tell, if it wanted to, what chronic medications I take simply based on which lab I go to each month.  We need a fundamental right of digital privacy.  That right needs to be based on the concept of information use, rather than on information access.  Anyone who uses my digital data should admit that they use it and justify why.</p>
<p>A good analogy for this is credit ratings.  Anyone who uses a credit rating to grant or deny a service is required to say what information they used to make the decision.  If it appears that someone used information that they should not have, like age or race, they need to be able to explain how they reached the decision without using that data.  The same should be true of all of the digital &#8220;breadcrumbs&#8221; that we scatter across the digital landscape as we attempt to live our digital lives.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/12/is-your-mobile-phone-a-hipaa-violator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Medical Science and the Blogosphere</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=medical-science-and-the-blogosphere</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/#comments</comments>
		<pubDate>Wed, 30 Nov 2011 00:42:25 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=536</guid>
		<description><![CDATA[Electronic publication of results is rapidly supplanting conventional scientific journals.  The challenge is to separate fact from fiction.  While peer review is far from perfect, it is still pretty good.  In the blogosphere, volume can overwhelm substance, and truthiness can &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Electronic publication of results is rapidly supplanting conventional scientific journals.  The challenge is to separate fact from fiction.  While peer review is far from perfect, it is still pretty good.  In the blogosphere, volume can overwhelm substance, and <a title="Wikipedia definition" href="http://en.wikipedia.org/wiki/Truthiness">truthiness</a> can overwhelm truth.  When the blogosphere gets combined with legal threats,  rational discourse can be silenced.  Rhys Morgan, a high school student, got a lesson in legal intimidation when he questioned an unpublished medical therapy.  His <a href="http://rhysmorgan.co/2011/11/threats-from-the-burzynski-clinic/">description of the ordeal</a> is both enlightening and cautionary.  I applaud both his articulateness and his tenacity.</p>
<p>As we increasingly use electronic media to accelerate the spread of scientific and medical knowledge, we need to think seriously about how we ensure that the information we spread is the most correct, not just the loudest, truthiest, sexiest, or best financed.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/11/medical-science-and-the-blogosphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Best iPad Stylus</title>
		<link>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=best-ipad-stylus</link>
		<comments>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 19:41:26 +0000</pubDate>
		<dc:creator>Justin Starren</dc:creator>
				<category><![CDATA[CID Chief Info-dude]]></category>

		<guid isPermaLink="false">http://informatics.northwestern.edu/blog/?p=524</guid>
		<description><![CDATA[As we try to use iPads to replace the medical clipboard, the need for an iPad writing instrument becomes paramount.  As every fountain pen user will attest, writing instruments are very personal items.  There have been many queries on the &#8230; <a href="http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As we try to use iPads to replace the medical clipboard, the need for an iPad writing instrument becomes paramount.  As every fountain pen user will attest, writing instruments are very personal items.  There have been many queries on the web trying to find the best iPad stylus.  There have even been <a title="stylus reviews" href="http://www.imedicalapps.com/2011/02/ipad-stylus-review-best-handwriting-touch-screen/">reviews</a>.  To me, there are two main criteria for a good stylus.  First, low friction.  Finding a stylus with the right friction on the iPad screen is not easy, especially since it changes with the amount of finger grease on the glass.  Second, accuracy.  I want to be able to write with the same resolution as a pen on paper.</p>
<p>While I have not tried every stylus on the market, I have tried a number.  The major problem with the rubber-tipped styli is that the friction seems to change as the rubber gets worn.  The iFaraday stylus uses a cloth tip that glides smoothly and does not seem to change as much as the rubber.  The <a title="iFaraday store" href="http://www.ifaraday.com/store.html">iFaraday Artist, Firm Dome</a> is the best I have found by far.</p>
<p>Of course, nothing is perfect.  I would love to see the firm dome available in the Rx stylus, which has a cap to cover the tip when not in use.  I would also love to see the Rx cap be clip-on, rather than screw-on.  Even so, this has become my go-to stylus for note taking, sketching, or making marginal notes in papers I read on the iPad.</p>
]]></content:encoded>
			<wfw:commentRss>http://informatics.northwestern.edu/blog/cid/2011/11/best-ipad-stylus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

